RegExp StringPos

elektron

I'm writing a program to correct some formatting issues in print images of an HCFA-1500 form generated by an old SCO/UNIX system. Here's the deal.

The input: a text file.

The output: a text file.

I can't show the input due to HIPAA regulations, and I can't really show the program either, but I can let you guys know what I need and how the input is parsed into the program.

Essentially:

#include <FileConstants.au3>

$hFile = FileOpen("PRINTIMAGE.txt", $FO_READ) ; open the file for reading

$content = FileRead($hFile) ; read the file to the end into a variable

So now the entire contents of the file are dumped into this one little variable, and I can go through that string and apply regex matches to it. Here's the problem, though: in many cases it's not as simple as a regex replace. I need to find the character position of the substring that matches my regular expression and apply some formatting to it.

Can someone help me?

-- elektron

jchd

Without further specification, there's nothing anyone can do for you. Remember a regexp is a program which will apply rules that you create to match, extract or replace relevant data. If you can't make the rules explicit, no regex is going to automagically appear!


elektron

The problem is not writing the regex that produces the matches; the problem is finding a way, in AutoIt, to get the location of the substring that matches a given regular expression. Is that possible? Here's what I want:

; Continuation of code posted earlier

; ...

; $iPos = StringRegExMatch($content, $regexp) ; Returns the character position of the first regex match.

Melba23

elektron,

If you set the flag parameter to 1 or 2, @extended holds the offset of the character just after the match, i.e. where the next search would start - here is a short example:

#include <Array.au3>

Global $aArray[3]

$sString = "Bla bla bla"

$iOffset = 1

For $i = 0 To 2
    $aTemp = StringRegExp($sString, "a", 1, $iOffset)
    $iOffset = @extended
    $aArray[$i] = $iOffset - 1
Next

_ArrayDisplay($aArray)

That should allow you to locate your match. ;)

M23
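
Building on the @extended technique in the post above, here is a minimal sketch of a helper that returns where a match starts instead of where the next search would begin. `_RegExpPos` is a hypothetical name, not an AutoIt built-in. The sketch assumes that flag 2 puts the full match in element 0 of the returned array and that @extended holds the offset just past the match.

```autoit
; Hypothetical helper (not a built-in): returns the 1-based position where
; the first match of $sPattern starts in $sText, searching from $iStart.
; Returns 0 if there is no match or the pattern is invalid.
Func _RegExpPos($sText, $sPattern, $iStart = 1)
    ; Flag 2: array of matches with the full match in element 0
    Local $aMatch = StringRegExp($sText, $sPattern, 2, $iStart)
    ; Save both macros at once, before any other function call resets them
    Local $iErr = @error, $iNext = @extended
    If $iErr Then Return 0
    ; Assumption: @extended is the offset just past the end of the match
    Return $iNext - StringLen($aMatch[0])
EndFunc

; Usage: position of the first lowercase "bla" in the sample string
ConsoleWrite(_RegExpPos("Bla bla bla", "bla") & @CRLF)
```

Subtracting the length of the full match is what turns the "next offset" into a start position, so the same helper should work for multi-character matches as well as the single-character case in the example above.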


elektron

That's great. It works! But I ended up changing my algorithm completely.

Func parseFile($fileData)
;MsgBox(0, "", $fileData)

; If you're not me and you're reading this crap, then I just want to tell you, I'm sorry for you. Here DEFINITELY be dragons.
$sigRegExp = '([\s]{6}[SIGNATURE]+[\s][ON]{2}[\s][FILE]{4}[\s]+)([0-9]{2}[\s][0-9]{2}[\s][0-9]{2}[\s]+)([\s]{6}[SIGNATURE]+[\s][ON]{2}[\s][FILE]{4})'
$svcRegExp = '((((([0-9]{2})[\s][\s]?){3})+)[\d]{2}\s\d\s{3}\d{5}\s{15}\d\s{5}(([\d\d]{2}\s){2})\s\d\s{8}\d{10})'
$fileByLine = StringSplit($fileData, @LF)
$returnResult = ""
For $i = 1 To $fileByLine[0] Step 1
  $sigResult = StringRegExp($fileByLine[$i], $sigRegExp, 0)
  $svcResult = StringRegExp($fileByLine[$i], $svcRegExp, 0)
  If( $sigResult = 1 ) Then ; We've found a match
   $buf = StringSplit($fileByLine[$i], "")
   $buf[46] = $buf[44]
   $buf[47] = $buf[45]
   $buf[44] = "2"
   $buf[45] = "0"
   $correctedString = ""
   For $j = 1 To $buf[0] Step 1
    $correctedString &= $buf[$j]
   Next
   $fileByLine[$i] = $correctedString
  EndIf
  If( $svcResult = 1 ) Then ; We've found a match
   $buf = StringSplit($fileByLine[$i], "")
   $buf[3] = $buf[4]
   $buf[4] = $buf[5]
   $buf[5] = "2"
   $buf[6] = "0"
   $buf[11] = " "
   $buf[12] = " "
   $buf[14] = " "
   $buf[15] = " "
   $buf[17] = " "
   $buf[18] = " "
   $buf[19] = ""
   $correctedString = ""
   For $k = 1 To $buf[0] Step 1
    If( $k = 45 ) Then
     $correctedString &= " "
    EndIf
    $correctedString &= $buf[$k]
   Next
   $fileByLine[$i] = $correctedString
   ; ConsoleWrite($i & ". " & $correctedString)
  EndIf
  $returnResult &= $fileByLine[$i] & @LF
Next
Return $returnResult
EndFunc

It's nasty, but not bad for a half-hour's worth of coding. Each regex matches an entire line that I need to correct in the data. Thanks for the help though!
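
The character shuffle in the signature branch of the function above (copy two characters two places to the right, then write "20" where they used to be) can also be written with plain string functions instead of splitting the line into an array. This is a minimal sketch, not the posted program: `_ExpandYear` is a hypothetical helper name, and it assumes the line is at least $iPos + 3 characters long.

```autoit
; Hypothetical helper: put the century "20" in front of the two characters
; at $iPos, overwriting the two characters that follow them, so the line
; keeps its original length.
Func _ExpandYear($sLine, $iPos)
    Return StringLeft($sLine, $iPos - 1) & "20" & StringMid($sLine, $iPos, 2) & StringMid($sLine, $iPos + 4)
EndFunc

; Usage on made-up data: the year "13" at position 7 becomes "2013"
ConsoleWrite(_ExpandYear("01 02 13  x", 7) & @CRLF) ; prints "01 02 2013x"
```

Called as `_ExpandYear($fileByLine[$i], 44)`, this performs the same four assignments as the signature branch without rebuilding the string character by character.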

Robjong

Hi,

Your pattern should work, but if you care, here is some advice for tidying it up.

There is no need to wrap a single escape sequence in a character set.

[\d]   -> \d
[0-9]  -> \d
[\d\d] -> \d

If you want to match words, do not put them in a character set: a character set matches any one of the characters in the set. For example, "[ON]{2}" will match ON as well as NO, NN and OO.

[ON] -> (ON)   ; using a capturing group
[ON] -> (?:ON) ; using a non-capturing group
[ON] -> ON     ; just the word

It will be something like this:

$sigRegExp = '\s{6}SIGNATURE\sON\sFILE\s+\d{2}\s\d{2}\s\d{2}\s{6,}SIGNATURE\sON\sFILE'
$svcRegExp = '(((((\d{2})\s\s?){3})+)\d{2}\s\d\s{3}\d{5}\s{15}\d\s{5}((\d\d\s){2})\s\d\s{8}\d{10})'
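
The difference between a character set and a literal word is easy to check directly. A minimal sketch using the default flag 0, where StringRegExp returns 1 for a match and 0 for no match; the test strings are made up for illustration.

```autoit
; A character set with a quantifier accepts any mix of its characters
ConsoleWrite(StringRegExp("NO", "[ON]{2}") & @CRLF) ; 1: N then O both belong to the set
; A literal word only matches that exact sequence
ConsoleWrite(StringRegExp("NO", "ON") & @CRLF)      ; 0: the word ON is not in "NO"
ConsoleWrite(StringRegExp("ON", "ON") & @CRLF)      ; 1
```

This is why the looser pattern can accept lines the literal version would reject.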
elektron

Awesome, thanks for this.
