RegExp - has anyone seen this library before?


136 replies to this topic

#121 vim

Posted 06 October 2006 - 05:05 PM

For those wanting to learn more about regular expressions, I have found an online copy of Mastering Regular Expressions, 2nd Edition here. I am reading it myself in an attempt to better understand regular expressions.


Thanks 'this-is-me' !

Great find. I'm surprised it's available online. O'Reilly normally doesn't do that.
I have their CD "The Perl CD Bookshelf", which has very good info.

ViM







#122 thomasl

Posted 06 October 2006 - 08:14 PM

No named capturing groups then? Too bad; maybe some day... I'll just be thrilled to finally see RegExps well-supported. :lmao:

Try this:

$s = "Why not test this for yourself?"
$p = "((?P<named>.\s.).+(?P=named))"
$b = StringRegExp($s, $p, 3)
For $i = 0 To UBound($b) - 1
    ConsoleWrite("!" & $b[$i] & "!" & @CRLF)
Next

However, named groups don't seem, as yet, to work in StringRegExpReplace(). If you send Jon a bottle of champagne maybe he'll consider implementing that.
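For anyone without the beta at hand, here is a minimal sketch of what that pattern does, written in Python because its `re` module accepts the same `(?P<name>...)` and `(?P=name)` syntax. It only illustrates named groups and backreferences in general and makes no claim about AutoIt's own behaviour; the variable names are made up for the example.

```python
import re

s = "Why not test this for yourself?"

# (?P<named>.\s.) captures "any char, whitespace, any char".
# (?P=named) then requires exactly the same three characters again.
p = r"((?P<named>.\s.).+(?P=named))"

m = re.search(p, s)
whole = m.group(1)         # outer group, fetched by number
named = m.group("named")   # inner group, fetched by name

print(f"!{whole}!")  # !t test t!
print(f"!{named}!")  # !t t!
```

The first candidate, "y n" in "Why not", never repeats, so the engine moves on to "t t" in "not test", which occurs again in "test this".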

#123 sohfeyr

Posted 08 October 2006 - 07:09 PM

I think I must be missing something in this implementation. (Thought about posting in the support forum, but since there were already similar posts in this thread... If it belongs there, fine, if it belongs here, fine.)

$ln = 'CallCommand»_EditLineReplace|33|"»_1"|"»_2"|'
$x = StringRegExp($ln, '\|"([^"]+)"\|', 1)
$y = StringRegExp($ln, '\|([^|]+)\|', 1)
For $n In $x
    ConsoleWrite($n & @CRLF)
Next
ConsoleWrite("----" & @CRLF)
For $n In $y
    ConsoleWrite($n & @CRLF)
Next


The output:
»_1
----
33


I was expecting:
»_1
»_2
----
33
"»_1"
"»_2"


Any idea what's wrong? $x[1] and $y[1] both result in errors.
Modes 1 and 3 work as above, but 2 and 4 give me this:
Variable must be of type "Object".:
For $n in $x
For $n in $x^ ERROR

Edited by sohfeyr, 08 October 2006 - 07:25 PM.


#124 sohfeyr

Posted 08 October 2006 - 07:41 PM

Okay... I got
\|"([^"]+)"|\|([^|]+)

to work in RegExBuddy, but AutoIt is still only returning one capture, the 33

#125 /dev/null

Posted 08 October 2006 - 07:52 PM

$ln = 'CallCommand»_EditLineReplace|33|"»_1"|"»_2"|'

With data in this form, StringSplit() with "|" as the delimiter would be much easier than RegExp...

Cheers
Kurt
__________________________________________________________
(l)user: Hey admin slave, how can I recover my deleted files?
admin: No problem, there is a nice tool. It's called rm, like recovery method. Make sure to call it with the "recover fast" option like this: rm -rf *

#126 sohfeyr

Posted 08 October 2006 - 08:09 PM

With data in this form, StringSplit() with "|" as the delimiter would be much easier than RegExp...


Oh, I'm sorry - I forgot to say what I was trying to accomplish.
I want to be able to do things like:
CMD»FNC|Nparam|"Qparam|Rparam"|SParam

so that the returned groups are:
Nparam
Qparam|Rparam
SParam

With StringSplit, I'd get:
Nparam
"Qparam
Rparam"
SParam

#127 /dev/null

Posted 08 October 2006 - 08:10 PM

[quote name='sohfeyr' post='248723' date='Oct 8 2006, 09:09 PM']I think I must be missing something in this implementation. (Thought about posting in the support forum, but since there were already similar posts in this thread... If it belongs there, fine, if it belongs here, fine.)

$ln = 'CallCommand»_EditLineReplace|33|"»_1"|"»_2"|'
$x = StringRegExp($ln, '\|"([^"]+)"\|', 1)
$y = StringRegExp($ln, '\|([^|]+)\|', 1)
For $n In $x
    ConsoleWrite($n & @CRLF)
Next
ConsoleWrite("----" & @CRLF)
For $n In $y
    ConsoleWrite($n & @CRLF)
Next
[/quote]

Now for the bad news. PHP preg_match_all() returns the correct values (>>_1 and >>_2); StringRegExp(), however, still returns just >>_1.

I guess there is still a problem with the global search of StringRegExp()!

@Jon. Could you please check that? BTW: Do you know PCRE Workbench? It helps a lot to test patterns!

http://www.renatomancuso.com/software/pcre...reworkbench.htm

Cheers
Kurt

Edited by /dev/null, 08 October 2006 - 10:31 PM.


#128 /dev/null

Posted 08 October 2006 - 08:50 PM

Oh, I'm sorry - I forgot to say what I was trying to accomplish.
I want to be able to do things like:
CMD»FNC|Nparam|"Qparam|Rparam"|SParam

so that the returned groups are:
Nparam
Qparam|Rparam
SParam

O.K. in this case you can use this little tokenizer...

AutoIt         
$ln = 'CallCommand»_EditLineReplace|33|"»_1"|"»_2"|'
$ln = 'CMD»FNC|Nparam|"Qparam|Rparam"|SParam'

Global $field_delimiter = '|'
Global $string_delimiter = '"'

$retarray = _tokenizer($ln)

For $n In $retarray
    MsgBox(0, "", $n)
Next

; Splits $string at $field_delimiter, ignoring delimiters inside quoted parts.
; Element [0] of the returned array holds the number of fields.
Func _tokenizer($string)
    Local $chars = StringSplit($string, "")
    Local $fieldnr = 1
    Local $instring = 0
    Local $token = ""
    Dim $token_array[2]

    For $i = 1 To UBound($chars) - 1
        Switch $chars[$i]
            Case $field_delimiter
                If Not $instring Then
                    ; End of a field: store the token and grow the array
                    $token_array[$fieldnr] = $token
                    $fieldnr = $fieldnr + 1
                    ReDim $token_array[$fieldnr + 1]
                    $token = ""
                Else
                    ; Delimiter inside quotes is ordinary text
                    $token = $token & $chars[$i]
                EndIf

            Case $string_delimiter
                ; Toggle the in-string state; the quote itself stays in the token
                $instring = Not $instring
                $token = $token & $chars[$i]

            Case Else
                $token = $token & $chars[$i]
        EndSwitch
    Next

    $token_array[$fieldnr] = $token
    $token_array[0] = $fieldnr
    Return $token_array
EndFunc   ;==>tokenizer


Cheers
Kurt

Edited by /dev/null, 08 October 2006 - 09:23 PM.


#129 sohfeyr

Posted 08 October 2006 - 09:14 PM

O.K. in this case you can use this little tokenizer...


Thanks Kurt! I was hoping to do it using reg exps, but that'll do the job nicely for now.
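For what it's worth, a regular expression can handle this kind of quoted field. Below is a rough sketch in Python rather than AutoIt, so treat it as an illustration of the pattern and not as tested StringRegExp() usage. The names `field` and `params` are invented for the example, and it assumes quotes are never escaped inside a field.

```python
import re

ln = 'CMD»FNC|Nparam|"Qparam|Rparam"|SParam'

# After each pipe, take either a quoted field (group 1, quotes dropped)
# or a run of non-pipe characters (group 2).
field = re.compile(r'\|(?:"([^"]*)"|([^|]*))')

# findall returns one (quoted, bare) tuple per field; only one side is filled.
params = [quoted or bare for quoted, bare in field.findall(ln)]

print(params)  # ['Nparam', 'Qparam|Rparam', 'SParam']
```

The quoted alternative is listed first so that a field starting with `"` is read up to its closing quote, pipes included. A trailing `|` at the end of the line would show up as one extra empty field.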

#130 Jon

Posted 08 October 2006 - 09:30 PM

I got just a single match in pcretest.exe too, so unless someone can understand why, I can't fix it. :lmao:

#131 /dev/null

Posted 08 October 2006 - 09:39 PM

I got just a single match in pcretest.exe too, so unless someone can understand why I can't fix it. :lmao:


where can I download pcretest.exe?

I thought PCRE itself has no "global" option, which was the reason why you implemented StringRegExp() like php preg_match_all(). Isn't that correct?

BTW: PCRE Workbench returns ">>_1" for the simple Search and ">>_1" + ">>_2" for the Grep tool, at least I interpret it like that. Grep should be equal to the global search of StringRegExp.

Cheers
Kurt

#132 Jon

Posted 08 October 2006 - 09:44 PM

where can I download pcretest.exe?

I thought PCRE itself has no "global" option, which was the reason why you implemented StringRegExp() like php preg_match_all(). Isn't that correct?

BTW: PCRE Workbench returns ">>_1" for the simple Search and ">>_1" + ">>_2" for the Grep tool, at least I interpret it like that. Grep should be equal to the global search of StringRegExp.

Cheers
Kurt

http://www.autoitscript.com/autoit3/files/beta/autoit/

Do global matches like perl:
/pattern/g

#133 /dev/null

Posted 08 October 2006 - 09:52 PM

Do global matches like perl:
/pattern/g

Ah, O.K. then it works as it should. His first pattern was wrong...

NON global match:

  re> /\|"([^"]+)"/
data> CallCommand»_EditLineReplace|33|"»_1"|"»_2"|
 0: |"\xaf_1"
 1: \xaf_1


global match:

  re> /\|"([^"]+)"/g
data> CallCommand»_EditLineReplace|33|"»_1"|"»_2"|
 0: |"\xaf_1"
 1: \xaf_1
 0: |"\xaf_2"
 1: \xaf_2


EDIT: Strange, now it also works with AutoIt !???! It seems I somehow messed up the regexp pattern. Can somebody please check that?


$ln = 'CallCommand»_EditLineReplace|33|"»_1"|"»_2"|'
$x = StringRegExp($ln, '\|"([^"]+)"', 3)
$y = StringRegExp($ln, '\|([^|]+)\|', 1)
For $n In $x
    MsgBox(0, "", $n & @CRLF)
Next
MsgBox(0, "", "----" & @CRLF)
For $n In $y
    MsgBox(0, "", $n & @CRLF)
Next
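A likely reason the first pattern found only one quoted field even in a global search: its trailing `\|` consumes the pipe that the next field needs as its leading delimiter. The sketch below shows this in Python, whose `re.search` and `re.findall` stand in for the non-global and global (`/g`) cases. It illustrates the regex behaviour only and is not AutoIt code; the variable names are made up.

```python
import re

ln = 'CallCommand»_EditLineReplace|33|"»_1"|"»_2"|'

# Non-global: stop at the first match (pcretest without /g).
first = re.search(r'\|"([^"]+)"', ln).group(1)

# Global: resume scanning where the previous match ended (like /g).
all_quoted = re.findall(r'\|"([^"]+)"', ln)

# With a trailing \| the first match swallows the pipe that the
# second field would need as its leading delimiter.
with_trailing_pipe = re.findall(r'\|"([^"]+)"\|', ln)

# A lookahead checks for the closing pipe without consuming it.
with_lookahead = re.findall(r'\|"([^"]+)"(?=\|)', ln)

print(first)               # »_1
print(all_quoted)          # ['»_1', '»_2']
print(with_trailing_pipe)  # ['»_1']
print(with_lookahead)      # ['»_1', '»_2']
```

The same reasoning applies to `\|([^|]+)\|`: in a global search it would skip every second field unless the closing pipe is moved into a lookahead.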



Cheers
Kurt

Edited by /dev/null, 08 October 2006 - 09:59 PM.


#134 SlimShady

Posted 08 October 2006 - 10:16 PM

[quote name='/dev/null' post='248793' date='Oct 8 2006, 11:52 PM']Ah, O.K. then it works as it should. His first pattern was wrong...

NON global match:

  re> /\|"([^"]+)"/
data> CallCommand»_EditLineReplace|33|"»_1"|"»_2"|
 0: |"\xaf_1"
 1: \xaf_1

global match:

  re> /\|"([^"]+)"/g
data> CallCommand»_EditLineReplace|33|"»_1"|"»_2"|
 0: |"\xaf_1"
 1: \xaf_1
 0: |"\xaf_2"
 1: \xaf_2

EDIT: Strange, now it also works with AutoIT !???! It seems I somehow messed up the regexp pattern. Can somebody please check that?

$ln = 'CallCommand»_EditLineReplace|33|"»_1"|"»_2"|'
$x = StringRegExp($ln, '\|"([^"]+)"', 3)
$y = StringRegExp($ln, '\|([^|]+)\|', 1)
For $n In $x
    MsgBox(0, "", $n & @CRLF)
Next
MsgBox(0, "", "----" & @CRLF)
For $n In $y
    MsgBox(0, "", $n & @CRLF)
Next
[/quote]


#135 sohfeyr

Posted 09 October 2006 - 06:15 AM

Ah, O.K. then it works as it should. His first pattern was wrong...

Thank you all for your help :ph34r:

Pride dictates I mention I tried several variations on both of those expressions before posting. I respect you guys waaay too much to waste your time on something I haven't already pounded on for a while.

I admit, though, that I didn't pay much attention to Mode 3 because I didn't immediately understand how it was different from Mode 1.

As long as the door is open for feedback on the help file - it would be nice if there was some explanation of the difference between a match and a global match. Perhaps comments could be added to the script in the help file to show the output, and using ConsoleWrite with tabs instead of MsgBox? Just a thought. :lmao:

Oh, I almost forgot: when I run the sample from StringRegExp in the help file, I get:
>"C:\Program Files\AutoIt3\SciTE\AutoIt3Wrapper\AutoIt3Wrapper.exe" /run /beta /ErrorStdOut /in "C:\Program Files\AutoIt3\beta\Examples\Helpfile\StringRegExp.au3" /autoit3dir "C:\Program Files\AutoIt3\beta" /UserParams
>Running AU3Check (1.54.4.0)  params:  from:C:\Program Files\AutoIt3\beta
C:\Program Files\AutoIt3\beta\Examples\Helpfile\StringRegExp.au3(4,113) : ERROR: StringRegExp() [built-in] called with wrong number of args.
$array = StringRegExp('<test>a</test> <test>b</test> <test>c</Test>', <(?i)test>(.*?)</(?i)test>', 1, $nOffset)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~^
C:\Program Files\AutoIt3\beta\Examples\Helpfile\StringRegExp.au3 - 1 error(s), 0 warning(s)
!>AU3Check ended.rc:2
>Running:(3.2.1.8):C:\Program Files\AutoIt3\beta\autoit3.exe "C:\Program Files\AutoIt3\beta\Examples\Helpfile\StringRegExp.au3"
+>AutoIT3.exe ended.rc:0
>Exit code: 0   Time: 49.255


The message boxes do appear, and they (casually, at 11:30 at night) appear to be correct, so this looks like an AU3Check issue to me.

#136 steve8tch

Posted 09 October 2006 - 10:03 AM

@Jon,

In @Nutster's RE code there was an option for

\# Position. Record the current character location in the test string into the returned content array.


I think this would just add the "offset value" to the returned array. At the moment we would not see the offset value when doing a global search, but it is used internally by AutoIt when building the array.

Would it be possible to add this back in? (Or is there another switch to get this info? I can't find it :lmao: )
eg

$str = 'abccabccabcc'
$ptn = '(cc)'
return
cc
cc
cc

old StringRegExp
$str = 'abccabccabcc'
$ptn = '(cc)\#'
return
cc
4
cc
8
cc
12 <-- bug in old version - this value was not returned

Thanks again for your help

Background. I have a couple of scripts in production that use this figure to determine the order in which other parts of the script are processed. I now need to add further functionality to these scripts and I am keen to use the newer RegExp engine - it's on average 10x as quick at processing regular expressions...
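To make the request concrete, this is roughly the information being asked for, sketched in Python, where each match object carries its offsets. Whether the old `\#` counted positions exactly this way is an assumption, but the end offsets below happen to agree with the 4, 8, 12 in the example above. The names `text` and `hits` are invented for the sketch.

```python
import re

text = "abccabccabcc"

# finditer yields match objects, which know where each capture sits.
# start() and end() are 0-based; end() points just past the capture.
hits = [(m.group(1), m.start(1), m.end(1)) for m in re.finditer(r"(cc)", text)]

for captured, start, end in hits:
    print(captured, start, end)
# cc 2 4
# cc 6 8
# cc 10 12
```

With both the text and the offset of every match available, a script can order its later processing by position, as described in the background above.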

#137 Marc

Posted 09 October 2006 - 01:11 PM

Great find. I'm surprised it's available online. O'Reilly normally doesn't do that.


Hmmmm... lemme see:

1) O'Reilly doesn't do that
2) it's not on O'Reilly's servers but on some obscure Asian server
3) "brought to you by TeamLib" which seems to be some warez group

If I were you, I wouldn't let O'Reilly see this link, could be their lawyers would not like it very much. :lmao:

Best regards
Marc
It's my job to comfort the disturbed and to disturb the comfortable.



