avery

Impossible to do with RegEx?

14 posts in this topic

Does anyone know whether this is impossible to do with RegEx? (Warning: headache material)

To populate an Array with:

Example #1
04/19/2005  09:16 AM            16,384 BUILTIN\Administrators filename.doc
1111111111  22222222            333333 44444444444444444444444444444444444

Example #2
04/19/2005  09:16 AM            16,384 BUILTIN\Administrators filename.doc
1111111111  22222222            333333 4444444444444444444444 555555555555

The blank areas are not tabs; unfortunately, they are spaces.

I also understand that a login name could include spaces, so I figured "Example #1" is either impossible or too likely to produce bad results.

I figured "Example #2" might be doable if I understood regex better.

I tried to use StringSplit, but the delimiters are not consistent enough for me to get good results.

Please, if there are any regex gurus out there, help me. These things hurt my head worse than anything else.

I understand I am asking for a lot of help. I'll donate $10 to jon@autoitscript.com to help with his hosting bills if anyone is willing to try and help me out with my regex.

Thanks for reading my post.

Respectfully,

Avery Howell

Merry Christmas or Happy Holidays!

The autoitscript.com domain runs on its own physical and dedicated server and currently handles 30GB of traffic per day.

The hosting fees are paid for by user donations and my own money. Please make a donation if you feel AutoIt is worth supporting. No amount is too small - it all helps ;)

Thanks,

Jon



Oh, c'mon it wasn't that har... uhmm... I mean...

That was tough! Here you go:

#include <Array.au3>

Global $aInput[3] = ["03/18/2007  08:16 AM               987 BUILTIN\Users SmallFile.doc", _
        "05/20/2008  12:01 PM            16,384 BUILTIN\Administrators filename.doc", _
        "04/19/2005  09:16 AM         2,316,384 BUILTIN\Guests BigFile.doc"]

; Groups: 1 = date, 2 = time with AM/PM, 3 = size, 4 = owner and file name (rest of the line)
Global $sRegExp = "(\d{2}/\d{2}/\d{4})(?:\s+)(\d{2}:\d{2}\s[[:alpha:]]{2})(?:\s+)([0-9,]+)(?:\s+)(.+)"

For $n = 0 To UBound($aInput) - 1
    $aRET = StringRegExp($aInput[$n], $sRegExp, 3)
    If IsArray($aRET) Then
        _ArrayDisplay($aRET, $n & ":  $aRET")
    Else
        ConsoleWrite($n & ":  Error" & @LF)
    EndIf
Next

Make that donation to AutoIt commensurate with the extreme effort this required!

;)



I was just going to offer another pattern example to achieve the same thing... however, I had a thought that maybe this is one larger fileread or string.

So...

#include <Array.au3>

Global $s_string = "04/19/2005  09:16 AM            16,384 BUILTIN\Administrators filename.doc" & @CRLF
$s_string &= "1111111111  22222222            333333 44444444444444444444444444444444444" & @CRLF
$s_string &= "08/24/2006  11:23 PM            6 BUILTIN\Administrators filename.doc" & @CRLF
$s_string &= "1111111111  22222222            333333 4444444444444444444444 555555555555"

; If we have a large string, we can do this in two parts ( or one if you want to step 4)
; Get just the lines that are valid
Global $a_just_lines = _myString_GetValidLinesArray($s_string)
If IsArray($a_just_lines) = 0 Then Exit
_ArrayDisplay($a_just_lines)

; If we are not skipping the above ( not using Step 4 )
; Then we can send each individual line and get the 4 parts of the values returned
Global $a_sep_data
For $i = 0 To UBound($a_just_lines) - 1
    $a_sep_data = _myString_GetValidDataArray($a_just_lines[$i])
    _ArrayDisplay($a_sep_data)
Next

Func _myString_GetValidLinesArray($s_string)
    Local $s_pattern = "(\d{2}/\d{2}/\d{4}\s+\d+:\d+\s+(?:AM|PM)\s+[\d,]+\s+.+?)(?:\v|\z)"
    Return StringRegExp($s_string, $s_pattern, 3)
EndFunc

Func _myString_GetValidDataArray($s_string)
    Local $s_pattern = "(\d{2}/\d{2}/\d{4})\s+(\d+:\d+\s+(?:AM|PM))\s+([\d,]+)\s+(.+?)(?:\v|\z)"
    Return StringRegExp($s_string, $s_pattern, 3)
EndFunc



Don't forget to emphasize what a huge level of effort this requires. We'd hate to see avery feel like a Scrooge at Christmas, now wouldn't we?

;)



Not the worst case to work with; only 1 group out of 4 is "not known" to you (it may have white spaces or not).

You know for sure that the first and third groups do not contain any white space. You also know that the second group contains exactly one white space.

It can easily be done without StringRegExp (easy for me, because StringRegExp is still a matter of trial and error for me) this way:

- StringStripWS with flag 4 (strip double or more spaces between words)

- StringSplit for " " (white space)

- [1] is the first group (date)

- [2] & [3] is "time"

- [4] is "size"

- what's left is the last group
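
The steps above could be sketched roughly like this. It is only a minimal sketch: the variable names ($sLine, $aTok, $aOut) are made up for illustration, and it assumes the owner or file name never contains a run of two or more spaces, since flag 4 would collapse those as well.

```autoit
#include <Array.au3>

Local $sLine = "04/19/2005  09:16 AM            16,384 BUILTIN\Administrators filename.doc"

; Flag 4 collapses runs of spaces between words into a single space.
Local $aTok = StringSplit(StringStripWS($sLine, 4), " ")

; $aTok[0] holds the token count; a usable line needs at least 5 tokens.
If $aTok[0] >= 5 Then
    Local $aOut[4]
    $aOut[0] = $aTok[1]                  ; date
    $aOut[1] = $aTok[2] & " " & $aTok[3] ; time plus AM/PM
    $aOut[2] = $aTok[4]                  ; size

    ; Whatever is left is the last group (owner and file name), rejoined with spaces.
    Local $sRest = $aTok[5]
    For $i = 6 To $aTok[0]
        $sRest &= " " & $aTok[$i]
    Next
    $aOut[3] = $sRest

    _ArrayDisplay($aOut)
EndIf
```

Splitting the last group further into owner and file name is a separate problem, because a plain space split cannot tell where one ends and the other begins.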

It could have been worse: the other groups might or might not contain white space, or might not be present at all ... and you were speaking about headaches ;)



Here is another attempt at using the string of numbers in each example as a template for the entries of an array.

#include <Array.au3>

Global $s_string = "04/19/2005 09:16 AM     16,384 BUILTIN\Administrators filename.doc" & @CRLF
$s_string &= "1111111111 22222222   333333 44444444444444444444444444444444444" & @CRLF
$s_string &= "08/24/2006 11:23 PM   16,384 BUILTIN\Administrators filename.doc" & @CRLF
$s_string &= "1111111111 22222222   333333 4444444444444444444444 555555555555"

Local $temp
$aInput = StringSplit(StringRegExpReplace($s_string, "([ ]+)", " "), @CRLF, 3)

For $Ex = 0 To UBound($aInput) - 1 Step 2
    Local $Pat = StringRegExp($aInput[$Ex + 1], "([^ ]+)", 3)

    Local $aArray[UBound($Pat)]
    ConsoleWrite($aInput[$Ex] & @CRLF)
    $Num = 1
    For $i = 0 To StringLen($aInput[$Ex + 1] & " ")
        If StringMid($aInput[$Ex + 1] & " ", $i, 1) = $Num Then
            $temp &= StringMid($aInput[$Ex], $i, 1)
        EndIf
        If StringMid($aInput[$Ex + 1] & " ", $i, 1) = " " Then
            $aArray[$Num - 1] = $temp
            $Num += 1
            ConsoleWrite($Num & " " & $temp & @CRLF)
            $temp = ""
        EndIf
    Next
    _ArrayDisplay($aArray)
Next


Don't forget to emphasize what a huge level of effort this requires. We'd hate to see avery feel like a Scrooge at Christmas, now wouldn't we?

You rang?

Seriously though, my contribution to this matter is thus:

The given output looks exactly like the output of the "dir /q" command.

If this is correct, then the owner field is fixed at 23 characters: longer names run straight into the filename with no space in between, and shorter names are padded with spaces.
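
If that fixed width holds, the owner and the filename can be cut apart by position instead of by guessing at spaces. Here is a rough sketch of the idea. The 23-character width and the single space between size and owner are taken from the post above and the sample line, not verified against every Windows version, and the names $sLine, $sPattern and $aParts are just for illustration. Directory entries (which show <DIR> instead of a size) are not handled.

```autoit
#include <Array.au3>

Local $sLine = "04/19/2005  09:16 AM            16,384 BUILTIN\Administrators filename.doc"

; Assumption: the owner column is exactly 23 characters wide and is
; separated from the size by a single space.
Local $sPattern = "^(\d{2}/\d{2}/\d{4})\s+(\d{1,2}:\d{2}\s[AP]M)\s+([\d,]+)\s(.{23})(.*)$"

; Flag 1 returns the captured groups of the first match, or sets @error.
Local $aParts = StringRegExp($sLine, $sPattern, 1)
If @error Then
    ConsoleWrite("Line does not match the assumed layout" & @CRLF)
Else
    $aParts[3] = StringStripWS($aParts[3], 2) ; drop the padding after the owner
    _ArrayDisplay($aParts, "date | time | size | owner | file")
EndIf
```

With this approach spaces inside the owner or the filename no longer matter, because the split point comes from the column width rather than from the content.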



Regex Pattern..

Single Line

[\d/:,]+(?:\s[AP]M)?|[A-Z]+\\.*(?=\s)|[a-z.]+

Doesn't work if the login name has spaces. Fixed:

Multiline mode:

(*ANYCRLF)(?m)[\d/:,]+(?:\s[AP]M)?|[A-Z]+\\.*(?=\s\S+$)|[a-z.]+
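
For anyone wanting to try the multiline pattern, a minimal way to run it might look like the sketch below. The second sample line and its owner "BUILTIN\Power Users" are invented for illustration, and the AM/PM part is written as \s[AP]M so that both AM and PM are picked up. Note the pattern's own limits: the filename alternative only covers lowercase letters and dots.

```autoit
#include <Array.au3>

; Two sample lines; the second one has a made-up owner containing a space.
Local $sData = "04/19/2005  09:16 AM            16,384 BUILTIN\Administrators filename.doc" & @CRLF & _
        "08/24/2006  11:23 PM                 6 BUILTIN\Power Users notes.txt"

Local $sPattern = "(*ANYCRLF)(?m)[\d/:,]+(?:\s[AP]M)?|[A-Z]+\\.*(?=\s\S+$)|[a-z.]+"

; Flag 3 returns every match in one flat array. The pattern has no capture
; groups, so each element is a whole match (date, time, size, owner, file).
Local $aAll = StringRegExp($sData, $sPattern, 3)
If IsArray($aAll) Then _ArrayDisplay($aAll)
```

Because the result is one flat array, the fields of each input line have to be regrouped afterwards, for example by stepping through it five elements at a time.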

Another attempt.

#include <Array.au3>

Global $s_string = "04/19/2005 09:16 AM     16,384 BUILTIN\Administrators filename.doc" & @CRLF
$s_string &= "1111111111 22222222 333333 44444444444444444444444444444444444" & @CRLF
$s_string &= "08/24/2006 11:23 PM 16,384 BUILTIN\Administrators filename.doc" & @CRLF
$s_string &= "1111111111 22222222 333333 4444444444444444444444 555555555555"


$aInput = StringSplit(StringRegExpReplace($s_string, "([ ]+)", " "), @CRLF, 3)

For $Ex = 0 To 1
    Local $aResult = StringRegExp($aInput[$Ex], "(.{10}) *(.{8}) *(.{6}) *(.*)", 3)
    _ArrayDisplay($aResult)
Next
For $Ex = 2 To 3
    Local $aResult2 = StringRegExp($aInput[$Ex], "(.{10}) *(.{8}) *(.{6}) *(.*?) (.*)", 3)
    _ArrayDisplay($aResult2)
Next


Just a point, but unless strings containing spaces are enclosed in quotes (which it looks like they aren't) then I don't think #2 can be done by any method. #1 should be feasible if times and dates are assumed to be in a regular format.

For example, if you have "domain\user name file.txt" there is no way of telling which section 'name' belongs to, so you cannot separate #4 from #5.
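
The ambiguity is easy to see by running the same string through a lazy and a greedy version of an otherwise identical pattern. This is only an illustration with made-up variable names; neither result is "the right one", which is exactly the problem.

```autoit
Local $sTail = "domain\user name file.txt"

; Lazy first group: stops at the first space.
Local $aLazy = StringRegExp($sTail, "^(.+?) (.+)$", 1)

; Greedy first group: runs up to the last space.
Local $aGreedy = StringRegExp($sTail, "^(.+) (.+)$", 1)

ConsoleWrite("lazy:   [" & $aLazy[0] & "] [" & $aLazy[1] & "]" & @CRLF)
ConsoleWrite("greedy: [" & $aGreedy[0] & "] [" & $aGreedy[1] & "]" & @CRLF)
; lazy:   [domain\user] [name file.txt]
; greedy: [domain\user name] [file.txt]
```

Both splits are valid matches, so without extra information (quoting, a fixed column width, or a known list of owners) the regex has no way to pick between them.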


Thanks guys. I still think it was a hard regex.

My example listed the numbers under the data to show the array index I was hoping each piece would land in, but I'm pretty sure these awesome regexes would work either way, correct? The 111, 222, 333, etc. are not in the original data source I'm looking to parse.

I will make the donation just as I promised, and it was totally worth it even though some of you think it was easy. I've always struggled with regex for some reason. Maybe someone will buy me a regex book for Christmas; it was on my list to Santa.



Told you it can be done without using RegEx. I agree, RegEx results in shorter and faster code, and for the RegEx gurus nothing is easier, but there are always workarounds. There is always at least one other way to do it.



@avery:

Have you checked http://www.regular-expressions.info ? RegEx looked like voodoo to me as well until I took the time to read a good deal of its material. They have pretty clear examples of all the features (including more advanced topics such as lookarounds and greediness), along with explanations of how each match is performed.

PS: Some motivation for you, if you need it:


"Wait, forgot to escape a space. Wheeeeee[taptaptap]eeeeee."


@danielkza: Your linky seemed to be a mashup of Cameron Laird's personal notes on "Regular Expressions" and Regular-Expressions.info

;)

