JBtje

StringRegExp

9 posts in this topic

Hello,

I've been trying to get my regular expressions right, though for some reason they seem to do something different from what they should...

$data = "hello" & @CRLF & "world" & @CRLF & "how" & @CRLF & "are" & @CRLF & "you?"

$zz = StringRegExp( $data, '[^.]+how', 1)
msgbox(0, "1", $zz[0]) ; Returns everything!? why ?


$zz = StringRegExp( $data, '[.]+', 1)
;~ msgbox(0, "2", $zz[0]) ; Returns nothing!? why ?

$zz = StringRegExp( $data, '[\r\n.]+', 1)
msgbox(0, "3", $zz[0]) ; Returns line break (or space)!? why ?

Situation 1, as seen above, should return all characters that are not within the [] (^ should do that), so it should return just one line break!

What it does, it returns everything :| how come?

Situation 2 should return the first word "hello", but it returns nothing...

Situation 3 should return everything, but returns only one linebreak!?

Can anyone explain what I'm doing wrong? Or is this some sort of weird bug?

Greetz,

JB

You might want to read through this: PCRE Specs

You might also want to use capture brackets '()' to capture exactly what you want.

Also, in the directory "program files\autoit3\examples\helpfile" there is a file "StringRegExpGUI.au3" that you can use to test the regexp engine. The radio button options are (top to bottom) 0, 1, 3.


"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian Kernighan

#3 ·  Posted (edited)

Hello JBtje,

SRE can be a fickle thing, and I have found it to be one of the most difficult functions in AutoIt to learn and understand.

First, your third example is a correct way to obtain the result you wanted from the first example. The reason your third example does not do what you expected is that you're asking it to capture what is inside the []: \r = carriage return (@CR), \n = linefeed (@LF), so you're asking it to capture @CRLF. In the first, you're asking it to capture everything but '.', repeated by the '+', followed by the literal 'how', so your end result from $data would be everything up to and including 'how'. This example, which is your third, is what you want:

$zz1 = StringRegExp( $data, '[\r\n.]+', 1)
MsgBox(0,"1",$zz1[0])

In your second example, you're asking it to capture '.'. Since there are no periods in your $data string, you get nothing. To capture just the first word, these two examples will do the trick:

$zz2a = StringRegExp( $data, '(.*?)' & @CRLF, 1)
$zz2b = StringRegExp( $data, '([^\r\n]+)' & @CRLF, 1) ; \r and \n need backslashes; the () keep the line break out of the result
MsgBox(0,"2","First Example= " & $zz2a[0] & @CRLF _
            &"Second Example= " & $zz2b[0])

As for your third example, I explained what it does under the first example. I'm not sure exactly what your goal is here, but at a guess, you want the sentence on one line. If I am correct, you could use:

$zz = StringRegExpReplace($data,@CRLF,' ')
msgbox(0, "3", $zz)

; if you need a linefeed at the end, then return as
$zz = StringRegExpReplace($data,@CRLF,' ') & @CRLF
MsgBox(0,"3",$zz)

I hope this helps.

Realm

Edited by Realm

My Contributions: Unix Timestamp: Calculate Unix time, or seconds since Epoch, accounting for your local timezone and daylight savings time. RegEdit Jumper: A Small & Simple interface based on Yashied's Reg Jumper Function, for searching Hives in your registry.  


The first example actually returns

hello

world

how

For me

In the second, by the way, the dot is read as a literal when used in square brackets.

StringRegExp($data, "^\b\w+\b\v", 1) gets the first word.

I don't know why you would ever have to use an SRE to return everything ($data already holds it) but if that's what you want then

StringRegExp($data, "(?s)^.+$", 1) will do it.
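To check the results above without a message box, here is a minimal sketch that re-runs the three patterns from the first post and prints each match with the line breaks made visible. The helper name _Show and the variable names are made up for this illustration; only StringRegExp, StringReplace and ConsoleWrite are AutoIt built-ins.

```autoit
; Sketch only: _Show is a hypothetical helper, not part of AutoIt.
Local $sData = "hello" & @CRLF & "world" & @CRLF & "how" & @CRLF & "are" & @CRLF & "you?"

_Show($sData, '[^.]+how')   ; negated set: a run of non-dot characters, then the literal "how"
_Show($sData, '[.]+')       ; set containing only a literal dot
_Show($sData, '[\r\n.]+')   ; set containing CR, LF and a literal dot

Func _Show($sText, $sPattern)
    Local $aMatch = StringRegExp($sText, $sPattern, 1)
    If @error Then
        ; @error is non-zero when nothing matched (or the pattern is bad)
        ConsoleWrite($sPattern & " -> no match" & @CRLF)
        Return ""
    EndIf
    ; Replace CR and LF with visible markers so line breaks in the match can be seen
    Local $sVisible = StringReplace(StringReplace($aMatch[0], @CR, "<CR>"), @LF, "<LF>")
    ConsoleWrite($sPattern & " -> " & $sVisible & @CRLF)
    Return $aMatch[0]
EndFunc
```

Run from SciTE, the first pattern should print hello<CR><LF>world<CR><LF>how, the second should report no match, and the third should print a single <CR><LF>.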

Look in my signature for the PCRE Toolkit to test SREs and SRERs


George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"


#5 ·  Posted (edited)

Thank you all for the replies.

Regular expressions are not that "hard" to use --> in PHP I have no problems with them.

Though... when there are weird situations like this, where a "." is read as a literal character (the dot) and not as "anything besides the newline character", then it will be hard :| to understand…

Why would you want a "." to behave differently within square brackets than outside of them? That makes no sense to me.... How come it's done this weird way?

Thank you for the explanation! Even though this makes no sense to me, I get the error ;)

Thanks!

JB

@Edit:

My actual goal with this is to remove a "big" part of the info from a file... This file, 59 KB in size, may or may not contain certain words in the middle, but always contains those words in the "footer" (the last part of the file).

Now with

$sText = StringRegExpReplace($sText, "Footerword(?s).+", "")

this code I can strip this last part of the file, so the SRE won't search in there.

For some reason this function seems to be really slow! Also, 13% CPU on a quad-core 2.8 GHz seems a little bit high for such a "simple" task. How come?

Since it's just a matter of removing the data from a certain word onward, is it maybe smarter to use another function? --> _StringBetween seems to do the trick...
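For cutting everything from a fixed word onward, one non-regex option is a plain substring search. The sketch below assumes the marker is a fixed, case-sensitive string; _StripFrom and the sample text are made up for this illustration, while StringInStr and StringLeft are AutoIt built-ins.

```autoit
; Sketch only: _StripFrom is a hypothetical helper, not part of AutoIt.
Func _StripFrom($sText, $sMarker)
    ; Third parameter 1 = case-sensitive search, matching the regex's default behaviour
    Local $iPos = StringInStr($sText, $sMarker, 1)
    If $iPos = 0 Then Return $sText       ; marker not found: leave the text untouched
    Return StringLeft($sText, $iPos - 1)  ; keep only what comes before the marker
EndFunc

; Made-up sample standing in for the real file contents
Local $sText = "body line" & @CRLF & "Footerword and the rest" & @CRLF & "more footer"
Local $sBody = _StripFrom($sText, "Footerword")
ConsoleWrite($sBody)
```

This cuts at the first occurrence of the marker, which is also where the greedy "Footerword(?s).+" replacement starts, so for this task the two should give the same result.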

Edited by JBtje


Why would you want a "." to behave differently within square brackets than outside of them? That makes no sense to me.... How come it's done this weird way?

From the helpfile:

Matching Characters

[ ... ] Match any character in the set. e.g. [aeiou] matches any lower-case vowel. A contiguous set can be defined using a dash between the starting and ending characters. e.g. [a-z] matches any lower case character. To include a dash (-) in a set, use it as the first or last character of the set. To include a closing bracket in a set, use it as the first character of the set. e.g. [][] will match either [ or ]. Note that special characters do not retain their special meanings inside a set, with the exception of \\, \^, \-,\[ and \] match the escaped character inside a set.

You will find the same in the official docs for PCRE.
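A small sketch of that rule, using made-up sample data: the same "." is a wildcard outside a set and a plain dot inside one, and a set such as [\s\S] is one common way to get "every character, line breaks included" inside brackets.

```autoit
; Sketch only: $sSample is made-up data, not from the thread.
Local $sSample = "a.b" & @LF & "c"

; Outside a set, "." is the wildcard: any character except a line break (without (?s))
Local $aAny = StringRegExp($sSample, '.+', 1)
; Inside a set, "." is just a literal dot
Local $aDot = StringRegExp($sSample, '[.]+', 1)
; A set that matches every character, line breaks included
Local $aAll = StringRegExp($sSample, '[\s\S]+', 1)

ConsoleWrite("wildcard: " & StringLen($aAny[0]) & " characters" & @CRLF)
ConsoleWrite("literal : " & $aDot[0] & @CRLF)
ConsoleWrite("full set: " & StringLen($aAll[0]) & " characters" & @CRLF)
```

With this sample the wildcard should stop at the line break after "a.b", the [.] set should find only the dot, and [\s\S] should cover all five characters.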



I rest my case...

I was not even aware the same occurred in "all" PCRE :| Seems I work too little with it?

Though, running one more test seems to have f*cked up my Windows 7 layout.

A file, 3090 lines, 59KB.

FooterText at line: 2532

A simple script:

Global $file
$file = FileOpen( "blaat.txt", 0)
If $file = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
EndIf
$data = FileRead($file)
FileClose($file)
$data = StringRegExpReplace($data, "FooterText(?s).+", "")
msgbox(0, "3", $data)

After running the script from SciTE, the screen went black (5 sec). Screen came back with the msgbox-borders(!) shown (rest of the box transparent). Nothing happened after that. Also the windows 7 toolbar on the bottom of the screen turned in-transparent (is that a word?)

Running the script for a second, third and fourth time, gives the expected result within a sec (but might be because the memory remembers some of its earlier searches?)

Anyhow: I would not recommend this function (to myself) to use with “big” size of data...
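One way to narrow down where the time goes is to time the replace on its own and print only the lengths instead of showing the whole text in a message box. This is a diagnostic sketch, not an explanation of the screen glitches: the filler text is synthetic (it only mimics the line count and marker position given above), and all variable names are made up for this illustration.

```autoit
; Sketch only: build synthetic data of roughly the described shape
; (3090 lines, "FooterText" at line 2532) instead of reading blaat.txt.
Local $sData = ""
For $i = 1 To 3090
    If $i = 2532 Then $sData &= "FooterText" & @CRLF
    $sData &= "line " & $i & " of filler text" & @CRLF
Next

; Time only the regular expression step
Local $hTimer = TimerInit()
Local $sStripped = StringRegExpReplace($sData, "FooterText(?s).+", "")
Local $fMs = TimerDiff($hTimer)

; Report sizes rather than dumping tens of kilobytes into a MsgBox
ConsoleWrite("Replace took " & Round($fMs, 1) & " ms; " & _
        StringLen($sData) & " -> " & StringLen($sStripped) & " characters" & @CRLF)
```

If the replace itself turns out to be quick, the display step would be the next thing to look at.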


You didn't post any of your actual code for us to analyze what seems to be happening here and without that we can't help you.


George


#9 ·  Posted (edited)

You didn't post any of your actual code for us to analyze what seems to be happening here and without that we can't help you.

I actually thought it would be something that could not be reproduced. Though... after running multiple tests, it keeps occurring in different forms.

Once, my whole background was gone after running the script with a program open (gone = black background), though when I pressed the desktop icon it did show the desktop again.

It has also occurred that, after the screen went black for a second, all programs were "invisible". Minimizing and maximizing them made them come back, except for the MsgBox!

In the Test.rar attachment you will find:

Script.png -> Taskbar and header of the program are NOT transparent: caused by running the script

test.au3 -> test file with the script provided as above: run it and it will (at least here) cause problems.

test.txt -> 72KB file: This is the HTML code of a site I retrieve certain information from. (yes, some sort of "bot" but only to read/display info...)

Untitled-1.jpg -> I closed explorer.exe (so you don't see all the icons on my desktop ;)). This is what happened directly after running the script. Note: My mouse does NOT hover over the "show desktop" button!

Untitled-2.jpg -> SciTE window is brought back, the other 2 (one of them, with title "3" = msgbox) are still transparent.

System info:

Windows 7 Ultimate N x64

Intel core i7-860 @ 2.80 GHz

Graphic card: ATI radeon HD5700 series

Greetingz,

JB

Test.rar

Edited by JBtje
