a job for REGEXP?


gcue

I have a LARGE text file (48212 lines, 2.5 MB) with lots of information.

I would like to capture each instance of the following into an array (spacing is shown as it appears in the text file):

id : 7

name : Status

Of course, "7" and "Status" vary from entry to entry. I am trying to get them into an array with 2 columns (col 1: id, col 2: name).

I would provide an excerpt, but the file varies a lot throughout; please let me know if one is needed.

Much appreciation in advance!

Edited by gcue

Something like this might be what you are looking for. If you are looking for IDs that aren't numeric, or names that contain more than one word, you'll have to give a better example so the regular expression can be altered.

Note that you need to set $sText yourself (to the text read from the file).

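$iOffset=1      ; StringRegExp offsets are 1-based
$i=0            ; number of id/name pairs stored so far
$iArraySize=10
Dim $aIDName[$iArraySize][2]
While 1
    ; flag 1: return the capture groups of the first match at or after $iOffset
    $aMatches=StringRegExp($sText,"(?s)id \: (\d+).*?name \: (\w+)",1,$iOffset)
    If @error Then ExitLoop     ; no more matches
    $iOffset=@extended          ; @extended holds the position just past this match
    $aIDName[$i][0]=$aMatches[0]
    $aIDName[$i][1]=$aMatches[1]
    $i+=1
    If $i>$iArraySize-1 Then    ; array is full: grow it by 10 rows
        $iArraySize+=10
        ReDim $aIDName[$iArraySize][2]
    EndIf
WEnd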
ReDim $aIDName[$i+1][2]

*edit: Offset should start at 1

Edited by Ascend4nt

The array was empty when I did _ArrayDisplay($aIDName) =(

Actually, the IDs ARE numerical... here's an excerpt of the txt file:

export-timestamp: 1280228784

begin schema

name : HPD:Help Desk

timestamp : 1272320092

export-version : 9

schema-type : 1

admin-sub-perm : 0

get-list-flds : 21\1000000161\15\0\\1000000000\50\0\\7\15\0\\1000000164\8\0\\1000000169\8\0\\1000000018\15\0\\1000000019\10\0\\1000000560\30\0\\1000003009\25\0\\1000000

get-list-flds : 869\16\0\\1000000217\25\0\\1000000218\25\0\\301921200\10\0\\260000001\15\0\\200000006\25\0\\1000000035\10\0\\240001002\20\0\\1000000099\19\0\\6\20\0\\2\

get-list-flds : 7\0\\1000002613\20\0\

default-vui : Default User View

obj-props : 6\90002\4\26\Remedy Incident Management\90007\2\1\90008\4\1\1\90009\4\17\BMC:Incident Mgmt\90010\2\1\90011\2\1\

vui {

id : 399990088

name : 536895968

label : Default User View

}

vui {

id : 399990106

name : New5368959682

label : Dialog View

}

vui {

id : 399990109

name : New5368959683

label : Dialog View - Assignee

}

vui {

id : 399990127

name : New536895968

label : Dialog View - Create

}

vui {

id : 399990128

name : NewNew5368959682

label : Dialog View - Create (Simple)

}

vui {

id : 399990154

name : NewNew536895968

label : Dialog View - Modify

}

vui {

id : 399990155

name : NewNewNew536895968

label : Dialog View - Modify (Simple)

}

vui {

id : 399996597

name : New5368959684

label : RA

}

field {

id : 1

name : Entry ID

datatype : 4

create-mode : 2

option : 3

maxlength : 15

menu-style : 1

qbe-match-op : 3

fulltext-optns : 0

default : INC

}

Many thanks!!!


Worked fine for me when I copied the text to a file and read it into $sText. However, the ReDim at the end was off by 1. Also, one of your names contains spaces ("Entry ID"), so you'll need to adjust the PCRE.
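To see why a multi-word name breaks the first pattern, here is a quick check against the "Entry ID" line from the excerpt. This is a minimal sketch of my own, not code from the thread. `\w+` stops at the first non-word character (the space), while `[^\v]+` keeps going until the end of the line.

```autoit
; Single line from the excerpt (single spaces, as pasted in the forum)
Local $sLine = "name : Entry ID"

; (\w+) stops at the first non-word character, i.e. the space
Local $aWord = StringRegExp($sLine, "name \: (\w+)", 1)
; ([^\v]+) takes everything up to the end of the line
Local $aToEol = StringRegExp($sLine, "name \: ([^\v]+)", 1)

ConsoleWrite($aWord[0] & @CRLF)  ; Entry
ConsoleWrite($aToEol[0] & @CRLF) ; Entry ID
```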

Here's one that grabs the name from its first character up to the end of the line:

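#include <Array.au3>
$sFilename=@DesktopDir&"\test.txt"
$sText=FileRead($sFilename)
$iOffset=1                      ; StringRegExp offsets are 1-based
$i=0                            ; number of id/name pairs stored so far
$iArraySize=10
Dim $aIDName[$iArraySize][2]
While 1
    ; name: capture everything up to the end of the line, so multi-word names survive
    $aMatches=StringRegExp($sText,"(?s)id \: (\d+).*?name \: ([^\v]+)",1,$iOffset)
    If @error Then ExitLoop     ; no more matches
    $iOffset=@extended          ; resume the search just past this match
    $aIDName[$i][0]=$aMatches[0]
    $aIDName[$i][1]=$aMatches[1]
    $i+=1
    If $i>$iArraySize-1 Then    ; array is full: grow it by 10 rows
        $iArraySize+=10
        ReDim $aIDName[$iArraySize][2]
    EndIf
WEnd
If $i Then                      ; trim unused rows, but only if anything matched
    ReDim $aIDName[$i][2]
    _ArrayDisplay($aIDName)
EndIf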

Please tell me you're using the latest version of AutoIt.

Save the example text that *you* provided to a path like the one I used, and retry.

Oh, and PCRE stands for Perl Compatible Regular Expressions.


  • Moderators

Hmm, I still didn't get anything... just 9 empty rows/columns if I force the _ArrayDisplay.

What's PCRE?

Thanks for your help =)

Well, PCRE has been explained to you before, when we/I told you to look up the tutorials if you were going to use regex.

You know, the links that were provided to you in the many other threads you've made without attempting regex?

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


Hey Ascend4nt,

I am using the latest version of AutoIt.

You're right, I tried the example I posted and it works great!

I guess the paste doesn't work so well... I've attached a small portion of the file.

Hey Smoke,

I have tried going through the tutorials, and I definitely try whenever it comes up, but it's hard for me to understand. I go through what others write and I sort of get the syntax, but sometimes I still don't follow the order in which it's written. I have even tried using Regex Coach and RegexMagic to see if I can figure out the output, and I've read the tutorials from those programs. Dunno why I can't get it... it's tough! I try to learn from people's examples too. I hate asking.


There's more than one space between 'id'/'name' and ':' in your text file, which is why it failed (the pattern needs \s+ there). Just in case there are situations where there's more than one space between the ':' and the number/name, I've added '\s+' there as well.
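Here is a minimal check of that diagnosis. It assumes an `id` line padded with several spaces before the colon. The exact padding in the attached file isn't shown in the thread, so the spacing below is a guess. With flag 0, StringRegExp only reports whether the pattern matched (1) or not (0).

```autoit
; Hypothetical id line with padded spacing (exact padding is an assumption)
Local $sLine = "id       : 399990088"

; Original pattern: exactly one space between "id" and ":"
Local $iOld = StringRegExp($sLine, "id \: (\d+)", 0)
; Revised pattern: one or more whitespace characters on each side of ":"
Local $iNew = StringRegExp($sLine, "id\s+\:\s+(\d+)", 0)

ConsoleWrite("old: " & $iOld & @CRLF) ; old: 0
ConsoleWrite("new: " & $iNew & @CRLF) ; new: 1
```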

Here's hopefully the last version. One more thing: if 'name' always follows 'id' on the next line, the pattern could be modified further to just consume whitespace up to 'name' instead of using .*?. You could also add 'label' as a 3rd column, but I'll leave that for you to figure out. Here's what it looks like now:

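#include <Array.au3>
$sFilename=@DesktopDir&"\test.txt"
$sText=FileRead($sFilename)
$iOffset=1                      ; StringRegExp offsets are 1-based
$i=0                            ; number of id/name pairs stored so far
$iArraySize=10
Dim $aIDName[$iArraySize][2]
While 1
    ; \s+ accepts one or more whitespace characters around each ':'
    $aMatches=StringRegExp($sText,"(?s)id\s+\:\s+(\d+).*?name\s+\:\s+([^\v]+)",1,$iOffset)
    If @error Then ExitLoop     ; no more matches
    $iOffset=@extended          ; resume the search just past this match
    $aIDName[$i][0]=$aMatches[0]
    $aIDName[$i][1]=$aMatches[1]
    $i+=1
    If $i>$iArraySize-1 Then    ; array is full: grow it by 10 rows
        $iArraySize+=10
        ReDim $aIDName[$iArraySize][2]
    EndIf
WEnd
If $i Then                      ; trim unused rows, but only if anything matched
    ReDim $aIDName[$i][2]
    _ArrayDisplay($aIDName)
EndIf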
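The two suggestions above (consume only whitespace between the id and name lines, and add 'label' as a 3rd column) could be combined roughly as follows. This is a sketch, not code from the thread, and it rests on a few assumptions of my own. The inline sample text and its padded spacing stand in for the attached file. The label group is optional so that blocks without a label (like `field`) still match. ConsoleWrite replaces _ArrayDisplay so the sketch runs without a GUI.

```autoit
; Sample text standing in for FileRead(@DesktopDir & "\test.txt") (layout assumed)
Local $sText = "vui {" & @CRLF & _
        "   id     : 399990088" & @CRLF & _
        "   name   : 536895968" & @CRLF & _
        "   label  : Default User View" & @CRLF & _
        "}" & @CRLF & _
        "field {" & @CRLF & _
        "   id     : 1" & @CRLF & _
        "   name   : Entry ID" & @CRLF & _
        "   datatype : 4" & @CRLF & _
        "}" & @CRLF

; id, then only whitespace up to name (no .*?), then an optional label line
Local $sPattern = "id\s+\:\s+(\d+)\s+name\s+\:\s+([^\v]+)(?:\s+label\s+\:\s+([^\v]+))?"
Local $aRows[10][3], $iRows = 0, $iOffset = 1, $aMatches
While 1
    $aMatches = StringRegExp($sText, $sPattern, 1, $iOffset)
    If @error Then ExitLoop                 ; no more matches
    $iOffset = @extended                    ; resume just past this match
    If $iRows = UBound($aRows) Then ReDim $aRows[$iRows + 10][3]
    $aRows[$iRows][0] = $aMatches[0]        ; id
    $aRows[$iRows][1] = $aMatches[1]        ; name
    ; the label group may be missing from the result when it did not match
    If UBound($aMatches) > 2 Then $aRows[$iRows][2] = $aMatches[2]
    $iRows += 1
WEnd
If $iRows Then ReDim $aRows[$iRows][3]      ; trim unused rows
For $r = 0 To $iRows - 1
    ConsoleWrite($aRows[$r][0] & " | " & $aRows[$r][1] & " | " & $aRows[$r][2] & @CRLF)
Next
```

Dropping `.*?` means a name is only accepted when it sits directly after its id, separated by whitespace alone. An id that has no name line therefore can't pair up with the name from a later block.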

Personally, I suck at regular expressions. As evidence, in an example I posted earlier to solve someone else's issue, even after grabbing the data I wanted I couldn't be bothered to ruin my pattern to remove a trailing slash I was picking up. Every time I have to use them, I read up just enough to muddle through the issue at hand using the regexp tester tool linked in the tutorial in the help file. As soon as I have the pattern I need, everything I've picked up in the time it took to get it working goes right out of my head.

Don't kill yourself trying to memorize the rules, but I swear if I can get a mostly working pattern using just the help file (and the expression tester linked therein), anybody can. Personally, the approach I take is to jumble everything together into one big nasty string (so I don't have to worry about newline characters, etc.). Then I use StringRegExp() mode 3 to return arrays, because I'm comfortable with arrays and with the other string functions for fixing up that data.
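For the id/name problem, the mode 3 approach might look like this. A single StringRegExp call with flag 3 returns every match as one flat array (id, name, id, name, ...). That array is then folded into the 2-column array from the original question. This is only a sketch. The inline sample text and its padded spacing are an assumed stand-in for the real file.

```autoit
; Sample text standing in for the file contents (layout assumed)
Local $sText = "vui {" & @CRLF & _
        "   id     : 399990106" & @CRLF & _
        "   name   : New5368959682" & @CRLF & _
        "   label  : Dialog View" & @CRLF & _
        "}" & @CRLF & _
        "field {" & @CRLF & _
        "   id     : 1" & @CRLF & _
        "   name   : Entry ID" & @CRLF & _
        "}" & @CRLF

; Flag 3 returns all matches at once as a flat 1D array:
; one element per capture group per match, i.e. [id1, name1, id2, name2, ...]
Local $aFlat = StringRegExp($sText, "(?s)id\s+\:\s+(\d+).*?name\s+\:\s+([^\v]+)", 3)
If @error Then
    ConsoleWrite("No id/name pairs found" & @CRLF)
    Exit 1
EndIf

; Fold the flat array into two columns: col 0 = id, col 1 = name
Local $iPairs = Int(UBound($aFlat) / 2)
Local $aIDName[$iPairs][2]
For $j = 0 To $iPairs - 1
    $aIDName[$j][0] = $aFlat[2 * $j]
    $aIDName[$j][1] = $aFlat[2 * $j + 1]
    ConsoleWrite($aIDName[$j][0] & " | " & $aIDName[$j][1] & @CRLF)
Next
```

Both capture groups in this pattern are mandatory, so the flat array always comes in id/name pairs. That is what makes the simple "every two elements is one row" folding safe here.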

Good tip... I think that's the way I learn too...

Thanks for the post =)
