elektron Posted May 28, 2012

I'm writing a program to correct some formatting issues in print images of an HCFA-1500 form generated by an old SCO/UNIX system. Here's the deal.

The input: a text file. The output: a text file. I can't show the input due to HIPAA regulations, and I can't really show the program either, but I can tell you what I need and how the input is read into the program. Essentially:

    $hFile = FileOpen("PRINTIMAGE.txt", $READ_ONLY)
    $content = FileRead($hFile, -1) ; read the file to the end into a variable

So now the entire contents of the file are in this one variable, and I can go through that string and apply regex matches to it. Here's the problem, though: in many cases it's not as simple as a regex replace. I need to find the character position of the substring that matches my regular expression and apply some formatting to it. Can someone help me?

-- elektron
jchd Posted May 28, 2012

Without further specification, there's nothing anyone can do for you. Remember a regexp is a program which will apply rules that you create to match, extract or replace relevant data. If you can't make the rules explicit, no regex is going to automagically appear!
elektron Posted May 29, 2012

The problem is not writing the regex that produces the matches. The problem is finding a way, in AutoIt, to get the location of the substring that matches a given regular expression. Is that possible? Here's what I want:

    ; Continuation of code posted earlier
    ; ...
    ;
    $iPos = StringRegExMatch($content, $regexp) ; Returns the character position of the first regex match.
Melba23 (Moderators) Posted May 29, 2012

elektron,

If you set the flag parameter to 1 or 2, the position just past the end of the match is returned in @extended. Here is a short example:

    #include <Array.au3>

    Global $aArray[3]
    $sString = "Bla bla bla"
    $iOffset = 1
    For $i = 0 To 2
        ; Search for the next "a", starting at $iOffset
        $aTemp = StringRegExp($sString, "a", 1, $iOffset)
        ; @extended is where the next search should start
        $iOffset = @extended
        ; One character back is the last character of the match
        $aArray[$i] = $iOffset - 1
    Next
    _ArrayDisplay($aArray)

That should allow you to locate your match.

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind
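To get the start of a match rather than its end, one option is to request the full match (flag 2) and subtract its length from @extended. This is only a minimal sketch: the helper name _RegExpFindPos is made up for illustration and is not a built-in, and it assumes the pattern never matches an empty string.

```autoit
; Hypothetical helper, not part of AutoIt: returns the 1-based position of
; the first character of the first match at or after $iStart, or 0 if
; nothing matches.
Func _RegExpFindPos($sText, $sPattern, $iStart = 1)
    ; Flag 2 returns the full match in element [0], followed by any groups.
    Local $aMatch = StringRegExp($sText, $sPattern, 2, $iStart)
    ; Copy both macros right away; later function calls overwrite them.
    Local $iError = @error, $iNext = @extended
    If $iError Then Return SetError($iError, 0, 0)
    ; @extended points just past the match, so step back by the match length.
    Return $iNext - StringLen($aMatch[0])
EndFunc

Local $sString = "Bla bla bla"
Local $iPos = _RegExpFindPos($sString, "la")
ConsoleWrite("First match starts at position " & $iPos & @CRLF)
```

Flag 2 is used here instead of flag 1 because, once the pattern contains capturing groups, flag 1 returns only the groups, and the length of the whole match would no longer be available.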
elektron Posted May 29, 2012

That's great. It works! But I ended up changing my algorithm completely.

    Func parseFile($fileData)
        ;MsgBox(0, "", $fileData)
        ; If you're not me and you're reading this crap, then I just want to tell you, I'm sorry for you. Here DEFINITELY be dragons.
        $sigRegExp = '([\s]{6}[SIGNATURE]+[\s][ON]{2}[\s][FILE]{4}[\s]+)([0-9]{2}[\s][0-9]{2}[\s][0-9]{2}[\s]+)([\s]{6}[SIGNATURE]+[\s][ON]{2}[\s][FILE]{4})'
        $svcRegExp = '((((([0-9]{2})[\s][\s]?){3})+)[\d]{2}\s\d\s{3}\d{5}\s{15}\d\s{5}(([\d\d]{2}\s){2})\s\d\s{8}\d{10})'
        $fileByLine = StringSplit($fileData, @LF)
        $returnResult = ""
        For $i = 1 To $fileByLine[0] Step 1
            ; Flag 0 returns 1 if the line matches, 0 otherwise
            $sigResult = StringRegExp($fileByLine[$i], $sigRegExp, 0)
            $svcResult = StringRegExp($fileByLine[$i], $svcRegExp, 0)
            If $sigResult = 1 Then ; We've found a match
                ; Split the line into single characters ($buf[0] holds the count)
                $buf = StringSplit($fileByLine[$i], "")
                ; Move columns 44-45 to 46-47, then write "20" in their place
                $buf[46] = $buf[44]
                $buf[47] = $buf[45]
                $buf[44] = "2"
                $buf[45] = "0"
                $correctedString = ""
                For $j = 1 To $buf[0] Step 1
                    $correctedString &= $buf[$j]
                Next
                $fileByLine[$i] = $correctedString
            EndIf
            If $svcResult = 1 Then ; We've found a match
                $buf = StringSplit($fileByLine[$i], "")
                ; Move columns 4-5 to 3-4, then write "20" at columns 5-6
                $buf[3] = $buf[4]
                $buf[4] = $buf[5]
                $buf[5] = "2"
                $buf[6] = "0"
                $buf[11] = " "
                $buf[12] = " "
                $buf[14] = " "
                $buf[15] = " "
                $buf[17] = " "
                $buf[18] = " "
                $buf[19] = ""
                $correctedString = ""
                For $k = 1 To $buf[0] Step 1
                    ; Insert one extra space before column 45
                    If $k = 45 Then
                        $correctedString &= " "
                    EndIf
                    $correctedString &= $buf[$k]
                Next
                $fileByLine[$i] = $correctedString
                ; ConsoleWrite($i & ". " & $correctedString)
            EndIf
            $returnResult &= $fileByLine[$i] & @LF
        Next
        Return $returnResult
    EndFunc

It's nasty, but not bad for a half hour's worth of coding. Each regex matches an entire line that I need to pick out of the data. Thanks for the help though!
Robjong Posted May 30, 2012 (edited)

Hi,

Your pattern should work, but if you care, here is some advice for tidying it up.

There is no need to wrap an escape sequence in a character set:

    [\d] -> \d
    [0-9] -> \d
    [\d\d] -> \d

If you want to match words, do not put them in a character set; a character set matches any one of the characters in the set. For example, "[ON]{2}" will match ON as well as NO (and OO and NN).

    [ON]{2} -> (ON) ; using a capturing group
    [ON]{2} -> (?:ON) ; using a non-capturing group
    [ON]{2} -> ON ; just the word

It will be something like this:

    $sigRegExp = '\s{6}SIGNATURE\sON\sFILE\s+\d{2}\s\d{2}\s\d{2}\s{6,}SIGNATURE\sON\sFILE'
    $svcRegExp = '(((((\d{2})\s\s?){3})+)\d{2}\s\d\s{3}\d{5}\s{15}\d\s{5}((\d\d\s){2})\s\d\s{8}\d{10})'

Edited May 30, 2012 by Robjong
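The difference between a character set and a literal word is easy to check with StringRegExp in its default mode (flag 0), which returns 1 on a match and 0 otherwise. A small sketch with made-up test strings; the anchors ^ and $ are added here only so that each test string has to match as a whole:

```autoit
; [ON]{2} means "two characters, each of which is O or N", so it accepts
; ON, NO, OO and NN. The literal ON accepts only that exact sequence.
Local $aTests[4] = ["ON", "NO", "OO", "NN"]
Local $iSet, $iWord
For $i = 0 To UBound($aTests) - 1
    $iSet = StringRegExp($aTests[$i], "^[ON]{2}$")
    $iWord = StringRegExp($aTests[$i], "^ON$")
    ConsoleWrite($aTests[$i] & ": set=" & $iSet & " word=" & $iWord & @CRLF)
Next
```

The same reasoning applies to [SIGNATURE]+ and [FILE]{4} in the original pattern: they accept any mix of those letters, not just the words themselves.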
elektron Posted June 10, 2012

Awesome, thanks for this.
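Coming back to the parseFile function earlier in the thread: since its corrections all happen at fixed columns, the character-by-character split and rejoin could probably be replaced by slicing the line with StringLeft and StringMid. A rough sketch only; _OverwriteAt is a hypothetical helper name, the sample line is invented, and the helper assumes the replacement fits inside the line.

```autoit
; Hypothetical helper: overwrite part of $sLine with $sNew, starting at the
; 1-based column $iCol. The line keeps its length as long as the replacement
; does not run past the end of the line.
Func _OverwriteAt($sLine, $iCol, $sNew)
    Return StringLeft($sLine, $iCol - 1) & $sNew & StringMid($sLine, $iCol + StringLen($sNew))
EndFunc

; Made-up data (not real form content): write "20" over columns 7-8.
Local $sLine = "01 02 xx12"
ConsoleWrite(_OverwriteAt($sLine, 7, "20") & @CRLF)
```

Inserting characters, as opposed to overwriting them, would need StringMid to resume at $iCol instead of skipping ahead by the length of the replacement.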