florisch, posted June 5, 2007:

I have tried a lot, but I don't get it. I have a text file that looks like this:

    Date 04.06.2007 13:20
    Start XXX
    blabla lotsoftext in lots of lines

    Date 04.06.2007 13:22
    Start YYY
    blabla lotsoftext in lots of lines

    Date 04.06.2007 16:22
    Start ZZZ
    blabla lotsoftext in lots of lines

    Date 04.06.2007 17:23
    Start XXX
    blabla lotsoftext in lots of lines

I am searching for the text block(s) containing "XXX" or "YYY", grabbing everything from "Date" to the next "Date" or EOF.

To solve this, I would search for "XXX" with StringInStr() and cut the text into two pieces. Then I search for "Date" in the first part, matching from the end, and in the second part, matching from the beginning. I repeat all that until "XXX" is not found anymore:

    Local $sLogText = FileRead("RegExpTest.txt")
    Local $sSearchText = "XXX"
    Local $i = 1
    Local $pos, $startpos, $temp, $endpos
    While 1
        ; position of the $i-th occurrence of the search text
        $pos = StringInStr($sLogText, $sSearchText, 0, $i)
        If $pos = 0 Then ExitLoop
        ; last "Date" before the hit (a negative occurrence searches from the right)
        $startpos = StringInStr(StringLeft($sLogText, $pos), "Date", 0, -1)
        ; first "Date" after the hit, counted from the trimmed string
        $temp = StringInStr(StringTrimLeft($sLogText, $pos), "Date")
        If $temp = 0 Then
            ; no further "Date": run to EOF (+1 so the last character is not cut off)
            $endpos = StringLen($sLogText) + 1
        Else
            $endpos = $pos + $temp
        EndIf
        $i += 1
        MsgBox(0, "Entry found", StringMid($sLogText, $startpos, $endpos - $startpos))
    WEnd

How can I do this in a better/shorter/simpler way with StringRegExp? I fooled around with regular expressions but could not get it to work. By the way, it always complains when I try to use [:ascii:].

Anybody up for this little challenge? I promise to learn from a StringRegExp solution :-)
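About the [:ascii:] complaint: in a PCRE-based AutoIt (which current releases are), POSIX named classes such as [:ascii:] are only recognized inside a bracket expression, so the class has to be written [[:ascii:]]. A bare [:ascii:] is rejected as a bad pattern. The sample string below is made up for illustration.

```autoit
; Sketch: POSIX named classes are only valid inside a bracket expression.
Local $sSample = "Date 04.06.2007 13:20" ; hypothetical sample line

; Correct: [[:ascii:]] is a bracket expression containing the ASCII class.
; Flag 0 (the default) returns 1 if the pattern matches, 0 otherwise.
Local $iAllAscii = StringRegExp($sSample, "^[[:ascii:]]+$")
ConsoleWrite("Only ASCII characters: " & $iAllAscii & @CRLF)

; Wrong: a bare [:ascii:] outside brackets is rejected by PCRE,
; which StringRegExp reports as @error = 2 (bad pattern).
StringRegExp($sSample, "[:ascii:]")
Local $iErr = @error
ConsoleWrite("Bare [:ascii:] gives @error = " & $iErr & @CRLF)
```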
Sokko, posted June 5, 2007:

Better/shorter/simpler is what regular expressions are all about. I tested this code on the data you posted, so it should work fine, assuming the entire file looks like that. Any questions?

    $aRegex = StringRegExp($sLogText, "Date (.*?)\r\nStart " & $sSearchText & "\r\n((?:.|\r\n)*?)\r\n\r\n", 2)

If a matching block is found, $aRegex is an array:

- $aRegex[0] contains the entire matching block (from "Date" up to and including the blank line)
- $aRegex[1] contains the date of the block ("04.06.2007 13:20")
- $aRegex[2] contains the text of the block ("lots of text in lots of lines")

In addition, @extended contains the offset of the character in $sLogText that immediately follows the matching block, so you can trim the string and try again to get another match.

If no matching block is found, @error is set to 1 and $aRegex is NOT an array (so make sure you check that first).
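To collect every matching block rather than just the first, the @extended value can be fed back in. This sketch passes it as StringRegExp's optional offset parameter (available in current AutoIt releases) instead of trimming the string. The inline sample text is hypothetical and, like the pattern above, assumes CRLF line endings and a blank line after every block.

```autoit
; Sketch: walk through all matches by feeding @extended back in as the start offset.
; Hypothetical sample in the format the pattern expects (CRLF endings, blank line after every block).
Local $sLogText = "Date 04.06.2007 13:20" & @CRLF & "Start XXX" & @CRLF & "blabla" & @CRLF & @CRLF & _
        "Date 04.06.2007 13:22" & @CRLF & "Start YYY" & @CRLF & "blabla" & @CRLF & @CRLF & _
        "Date 04.06.2007 17:23" & @CRLF & "Start XXX" & @CRLF & "blabla" & @CRLF & @CRLF
Local $sSearchText = "XXX"
Local $sPattern = "Date (.*?)\r\nStart " & $sSearchText & "\r\n((?:.|\r\n)*?)\r\n\r\n"

Local $iOffset = 1, $iFound = 0, $aRegex
While 1
    ; Flag 2: [0] = whole block, [1] = date, [2] = block text
    $aRegex = StringRegExp($sLogText, $sPattern, 2, $iOffset)
    If @error Then ExitLoop ; @error 1 = no more matches, 2 = bad pattern
    $iOffset = @extended    ; first character after this block
    $iFound += 1
    ConsoleWrite("Block " & $iFound & " (" & $aRegex[1] & "): " & $aRegex[2] & @CRLF)
WEnd
```

Passing the offset keeps $sLogText intact, so positions stay relative to the original file contents.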
Sokko, posted June 6, 2007:

Did my code work for your file? This thread quickly dropped to page 5, so I figured you might have missed the reply.
florisch, posted June 11, 2007:

> Did my code work for the file you have? This has quickly dropped down to page 5 so I figured you might have missed the reply.

Thanks a lot for your help. I left early last week and was on vacation until now; sorry for the delay.

Your regex looks fine and I will try to use it, but (there is always a "but" :-) while it finds the first occurrence, the last block does not have an empty line at the end. Also, "lots of lines" may itself contain empty lines. Therefore I changed the regex to:

    $aRegex = StringRegExp($sLogText, "Date (.*?)\r\nStart " & $sSearchText & "\r\n((?:.|\r\n)*?)(?:\r\nDate|\z)", 2)
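A quick way to check the revised pattern is a small hypothetical sample in which the body of the XXX block contains an empty line and the text ends without a trailing blank line. With flag 2, $aRegex[2] then holds the whole body, including the embedded blank line, because the lazy body only stops at "\r\nDate" or at the absolute end (\z).

```autoit
; Sketch: the revised pattern on a hypothetical sample.
; The XXX block contains an empty line and the text ends without a trailing blank line.
Local $sLogText = "Date 04.06.2007 16:22" & @CRLF & "Start ZZZ" & @CRLF & "blabla" & @CRLF & @CRLF & _
        "Date 04.06.2007 17:23" & @CRLF & "Start XXX" & @CRLF & "first line" & @CRLF & @CRLF & "line after an empty line"
Local $sSearchText = "XXX"
Local $aRegex = StringRegExp($sLogText, "Date (.*?)\r\nStart " & $sSearchText & "\r\n((?:.|\r\n)*?)(?:\r\nDate|\z)", 2)
If Not @error Then
    ; [1] = date, [2] = the whole body, embedded blank line included
    ConsoleWrite("Date: " & $aRegex[1] & @CRLF & "Text: " & $aRegex[2] & @CRLF)
EndIf
```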
Sokko, posted June 12, 2007:

> I ran out of ideas. How do I supress the last group?

Reverse your thinking. Instead of trying to remove the last group, capture everything except the last group with another group:

    $aRegex = StringRegExp($sLogText, "(Date (.*?)\r\nStart " & $sSearchText & "\r\n((?:.|\r\n)*?))(?:\r\nDate|\z)", 2)

This shifts the return values around a bit: $aRegex now has indexes 0 to 3 (the new captured group is index 1). With this pattern it's not possible to exclude the trailing "Date" from the full match in $aRegex[0]. Your problem is that the sequence you're trying to capture has no definite terminator inside the block; the only way to see whether you've reached the end of the sequence is to step outside it. Blame the designer of the file format.
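An alternative sketch, assuming a PCRE-based AutoIt (which current releases are): a lookahead (?=...) checks for the next "Date" without consuming it, so the full match stops right before it and no extra group is needed. Because the next block's "Date" is left in place, adjacent matching blocks are not skipped, which makes flag 3 (all matches in one call) usable. The alternation (?:XXX|YYY) covers both search terms from the first post; the sample text is hypothetical.

```autoit
; Sketch: lookahead instead of a consuming terminator group.
; Hypothetical sample modelled on the first post (CRLF line endings, no blank line at EOF).
Local $sLogText = "Date 04.06.2007 13:20" & @CRLF & "Start XXX" & @CRLF & "blabla" & @CRLF & @CRLF & _
        "Date 04.06.2007 13:22" & @CRLF & "Start YYY" & @CRLF & "blabla" & @CRLF & @CRLF & _
        "Date 04.06.2007 16:22" & @CRLF & "Start ZZZ" & @CRLF & "blabla" & @CRLF & @CRLF & _
        "Date 04.06.2007 17:23" & @CRLF & "Start XXX" & @CRLF & "blabla"
Local $sSearchText = "(?:XXX|YYY)" ; non-capturing, so there are still exactly 2 groups

; (?=\r\nDate|\z) only looks ahead; the next block's "Date" stays available for the next match.
; Blocks followed by a blank line keep one trailing CRLF in their text.
Local $sPattern = "Date (.*?)\r\nStart " & $sSearchText & "\r\n((?:.|\r\n)*?)(?=\r\nDate|\z)"

; Flag 3 returns the captured groups of every match in one flat array:
; [date 1, text 1, date 2, text 2, ...]
Local $aAll = StringRegExp($sLogText, $sPattern, 3)
If Not @error Then
    For $j = 0 To UBound($aAll) - 1 Step 2
        ConsoleWrite("Date: " & $aAll[$j] & @CRLF & "Text: " & $aAll[$j + 1] & @CRLF)
    Next
EndIf
```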
florisch, posted June 12, 2007 (edited June 12, 2007 by florisch):

> Reverse your thinking. Instead of trying to remove the last group, capture everything except the last group with another group. [...] Blame the designer of the file format.

Thanks for showing me new ways. And, well, for the file format I have to blame myself :-)