gcue

a job for REGEXP?

gcue

i have a LARGE text file (48212 lines - 2.5MB) with lots of information

i would like to capture into an array each instance of the following (spaces are shown as they are in the text file)

id : 7

name : Status

of course "7" and "Status" always change. i am trying to get them into an array with 2 columns (col 1: id, col 2: name)

i would provide an excerpt but the file changes a lot throughout - but please let me know if it's needed.

much appreciation in advance!

Edited by gcue

Ascend4nt

Something like this might be what you are looking for. If you are looking for ID's that aren't numerical, or Names that contain more than one word, you'll have to give a better example so the Regular Expression can be altered.

Note that $sText should be set by you (the text from the file)

$iOffset=1
$i=0
$iArraySize=10
Dim $aIDName[$iArraySize][2]
While 1
    $aMatches=StringRegExp($sText,"(?s)id \: (\d+).*?name \: (\w+)",1,$iOffset)
    $iOffset=@extended
    If @error Then ExitLoop
    $aIDName[$i][0]=$aMatches[0]
    $aIDName[$i][1]=$aMatches[1]
    $i+=1
    If $i>$iArraySize-1 Then
        $iArraySize+=10
        ReDim $aIDName[$iArraySize][2]
    EndIf   
WEnd
ReDim $aIDName[$i+1][2]

*edit: Offset should start at 1

Edited by Ascend4nt

gcue

array was empty when i did _arraydisplay($aidname) =(

actually the id's ARE numerical... here's an excerpt of the txt file:

export-timestamp: 1280228784

begin schema

name : HPD:Help Desk

timestamp : 1272320092

export-version : 9

schema-type : 1

admin-sub-perm : 0

get-list-flds : 21\1000000161\15\0\\1000000000\50\0\\7\15\0\\1000000164\8\0\\1000000169\8\0\\1000000018\15\0\\1000000019\10\0\\1000000560\30\0\\1000003009\25\0\\1000000

get-list-flds : 869\16\0\\1000000217\25\0\\1000000218\25\0\\301921200\10\0\\260000001\15\0\\200000006\25\0\\1000000035\10\0\\240001002\20\0\\1000000099\19\0\\6\20\0\\2\

get-list-flds : 7\0\\1000002613\20\0\

default-vui : Default User View

obj-props : 6\90002\4\26\Remedy Incident Management\90007\2\1\90008\4\1\1\90009\4\17\BMC:Incident Mgmt\90010\2\1\90011\2\1\

vui {

id : 399990088

name : 536895968

label : Default User View

}

vui {

id : 399990106

name : New5368959682

label : Dialog View

}

vui {

id : 399990109

name : New5368959683

label : Dialog View - Assignee

}

vui {

id : 399990127

name : New536895968

label : Dialog View - Create

}

vui {

id : 399990128

name : NewNew5368959682

label : Dialog View - Create (Simple)

}

vui {

id : 399990154

name : NewNew536895968

label : Dialog View - Modify

}

vui {

id : 399990155

name : NewNewNew536895968

label : Dialog View - Modify (Simple)

}

vui {

id : 399996597

name : New5368959684

label : RA

}

field {

id : 1

name : Entry ID

datatype : 4

create-mode : 2

option : 3

maxlength : 15

menu-style : 1

qbe-match-op : 3

fulltext-optns : 0

default : INC

}

many thanks!!!

Ascend4nt

Worked fine for me when I copied the text to a file and read it into $sText. However the ReDim at the end was off by 1. Also, one of your name labels contains spaces, so you'll need to adjust the PCRE.

Here's one that grabs the name from the first letter up to the end of line:

#include <Array.au3>
$sFilename=@DesktopDir&"\test.txt"
$sText=FileRead($sFilename)
$iOffset=1
$i=0
$iArraySize=10
Dim $aIDName[$iArraySize][2]
While 1
    $aMatches=StringRegExp($sText,"(?s)id \: (\d+).*?name \: ([^\v]+)",1,$iOffset)
    $iOffset=@extended
    If @error Then ExitLoop
    $aIDName[$i][0]=$aMatches[0]
    $aIDName[$i][1]=$aMatches[1]
    $i+=1
    If $i>$iArraySize-1 Then
        $iArraySize+=10
        ReDim $aIDName[$iArraySize][2]
    EndIf
WEnd
If $i Then
    ReDim $aIDName[$i][2]
    _ArrayDisplay($aIDName)
EndIf

gcue

hmm i still didn't get anything... just 9 empty rows/columns if i force the arraydisplay

what's PCRE?

thanks for your help =)

Ascend4nt

Please tell me you're using the latest version of AutoIt.

Save the example text that *you* provided to a path like the one I listed, and retry.

Oh, and PCRE stands for Perl Compatible Regular Expressions.

SmOke_N

Well, PCRE has been explained to you before, when we/I told you to look up the tuts if you were going to use Regex.

You know, the links that were provided for you in the many other non-regex attempt threads you've made?


Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

gcue

hey Ascend4nt

i am using the latest version of autoit.

you're right, i tried the example i posted and it works great!

i guess the paste doesn't work so well.. i've included a small portion of it as an attachment.

hey smoke,

i have tried going through the tutorials and have definitely tried whenever it comes up. it's definitely hard for me to understand. i go through what others write and i sorta get the syntax but sometimes still don't see the order of how it's written. i have even tried using regex coach and regex magic to see if i can figure out the output, and read the tutorials from those progs. dunno why i can't get it.. it's tough! i try to learn from people's examples too. i hate asking.

Ascend4nt

There's more than one space between 'id'/'name' and the ':' in your text file, which is why it failed (it needs \s+ there). Just in case there are situations where there is more than one space between the ':' and the number/name, I've added '\s+' there as well.
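
To make the difference concrete, here is a minimal sketch comparing the two patterns on a single line. The sample string is made up for illustration (it only assumes several spaces around the colon, as described above); the first pattern should report @error = 1 (no match), while the \s+ version should capture the number.

```autoit
; Hypothetical sample line with several spaces around the colon.
Local $sLine = "id     :   399990088"

; A single literal space in the pattern cannot match a run of spaces,
; so StringRegExp sets @error to 1 (no match).
Local $aOne = StringRegExp($sLine, "id : (\d+)", 1)
Local $iErr = @error
ConsoleWrite("literal spaces: @error = " & $iErr & @CRLF)

; \s+ accepts one or more whitespace characters, so this one matches.
Local $aMany = StringRegExp($sLine, "id\s+:\s+(\d+)", 1)
If Not @error Then ConsoleWrite("with \s+: captured id = " & $aMany[0] & @CRLF)
```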

Here's hopefully the last version. One more thing - if 'name' always follows 'id' on the next line, it could be modified further to just consume whitespace up to name. You could also add 'label' as a 3rd column, but I'll leave that for you to figure out. Here's what it looks like now:

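#include <Array.au3>
$sFilename=@DesktopDir&"\test.txt"
$sText=FileRead($sFilename)
$iOffset=1 ; StringRegExp offsets are 1-based
$i=0 ; number of id/name pairs stored so far
$iArraySize=10
Dim $aIDName[$iArraySize][2]
While 1
    ; Flag 1 returns the first match at or after $iOffset: [0]=id, [1]=name
    $aMatches=StringRegExp($sText,"(?s)id\s+\:\s+(\d+).*?name\s+\:\s+([^\v]+)",1,$iOffset)
    $iOffset=@extended ; where the next search should start
    If @error Then ExitLoop ; no more matches
    $aIDName[$i][0]=$aMatches[0]
    $aIDName[$i][1]=$aMatches[1]
    $i+=1
    If $i>$iArraySize-1 Then
        ; array is full: grow it by another 10 rows
        $iArraySize+=10
        ReDim $aIDName[$iArraySize][2]
    EndIf
WEnd
If $i Then
    ReDim $aIDName[$i][2] ; trim the unused rows
    _ArrayDisplay($aIDName)
EndIf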
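
As for the "consume whitespace up to name" idea mentioned above, a rough sketch might look like the following. It assumes 'name' always sits on the next non-blank line after 'id'; the sample string is invented for the demo and is not taken from the real file. On this sample it should print the first pair, "1 -> Entry ID".

```autoit
; Hypothetical two-record sample; @CRLF & @CRLF mimics a blank line between entries.
Local $sSample = "id   : 1" & @CRLF & @CRLF & "name  : Entry ID" & @CRLF & "datatype : 4" & @CRLF & _
        "id   : 7" & @CRLF & @CRLF & "name  : Status" & @CRLF

; \s+ after the digits consumes only whitespace (including line breaks), so the
; match cannot jump over unrelated lines the way the lazy .*? can.
; With no dot left in the pattern, the (?s) option is no longer needed.
Local $aPair = StringRegExp($sSample, "id\s+:\s+(\d+)\s+name\s+:\s+([^\v]+)", 1)
If Not @error Then ConsoleWrite($aPair[0] & " -> " & $aPair[1] & @CRLF)
```

The same pattern can be dropped into the loop above in place of the longer one if the file really does keep 'name' directly after 'id'.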

seandisanti

Personally, i suck at regular expressions, as evidenced by an example i posted to solve someone else's issue earlier where even after grabbing the data i wanted i couldn't be bothered to ruin my pattern to remove a trailing slash that i was picking up. Every time i have to use them, i read up enough to muddle through the issue at hand using the regexp tester tool linked in the tutorial in the helpfile, and as soon as i have the pattern i need, everything i've kind of picked up in the time it took to get it working goes right out of my head. don't kill yourself trying to memorize the rules etc, but i swear if i can get a mostly working pattern using just the helpfile (and the expression tester linked therein) anybody can. personally the approach i take is to jumble everything together into one big nasty string (so i don't have to worry about newline characters, etc.), then use StringRegExp() mode 3 to return arrays, because i'm comfortable with arrays, and the other string functions to fix that data.
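
A minimal sketch of the mode 3 part of that approach, applied to the id/name problem in this thread. It skips the "jumble into one string" step, reuses the pattern and test file path posted earlier (both are assumptions about your setup), and relies on mode 3 returning all capture groups of all matches in one flat array.

```autoit
#include <Array.au3>

; Assumes the same test file path used earlier in the thread.
Local $sText = FileRead(@DesktopDir & "\test.txt")

; Flag 3 returns the capture groups of every match in one flat 1D array:
; [id1, name1, id2, name2, ...]
Local $aFlat = StringRegExp($sText, "(?s)id\s+:\s+(\d+).*?name\s+:\s+([^\v]+)", 3)
If Not @error Then
    ; Reshape the flat array into rows of (id, name).
    Local $iRows = Int(UBound($aFlat) / 2)
    Local $aIDName[$iRows][2]
    For $i = 0 To $iRows - 1
        $aIDName[$i][0] = $aFlat[2 * $i]
        $aIDName[$i][1] = $aFlat[2 * $i + 1]
    Next
    _ArrayDisplay($aIDName)
EndIf
```

This trades the offset loop and the ReDim bookkeeping for a single call plus a plain For loop.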

gcue

good tip.. i think thats the way i learn too...

thanks for the post =)

seandisanti

glad to help
