martin Posted March 14, 2009

From the help for StringRegExp:

(?: ... ) Non-capturing group. Behaves just like a normal group, but does not record the matching characters in the array nor can the matched text be used for back-referencing.

'... does not record the matching characters in the array' made me expect that those characters would not be in the array returned, but

    $test = 'abcd123mmm'
    $ans = StringRegExp($test, '(?:abcd123)\S*', 2) ; $test might have a number of lines
    If @error Then
        ConsoleWrite(@error & @CRLF)
    Else
        ConsoleWrite($ans[0] & @CRLF)
    EndIf

gives a result which includes the characters I don't want to be captured. How should I do it?

It looks like the description in the help for StringRegExp actually only applies to StringRegExpReplace, because this

    $test2 = 'fffabcd123kkkk'
    $r = StringRegExpReplace($test2, '(?:abcd123).*', 'qqqq')
    ConsoleWrite($r & @CRLF)

gives fffqqqq as expected (hoped?).

But then the help says that by default '.' matches any character except new line, but this

    $test2 = 'fffabcd123kkkk' & @LF & '**********'
    $r = StringRegExpReplace($test2, '(?:abcd123).*', 'qqqq')
    ConsoleWrite($r & @CRLF)

returns a string which includes the @LF and the following characters. If I change @LF for @CR it makes no difference. If instead of '.*' I use '\S*' to match any non-whitespace character then I still get the @LF or @CR included, but I think these are whitespace characters.

I do not understand some very basic things here.

Serial port communications UDF Includes functions for binary transmission and reception.
printing UDF Useful for graphs, forms, labels, reports etc.
Add User Call Tips to SciTE for functions in UDFs not included with AutoIt and for your own scripts.
Functions with parameters in OnEvent mode and for Hot Keys One function replaces GuiSetOnEvent, GuiCtrlSetOnEvent and HotKeySet.
UDF IsConnected2 for notification of status of connected state of many urls or IPs, without slowing the script.
Robjong Posted March 14, 2009 (edited)

Hey,

This might explain some of it. The non-capturing group will not be excluded from the match, but it will not create an extra match.

    #include <Array.au3>

    $test = 'abcd123mmm'

    ; this will return 1 match (the full string)
    $ans = StringRegExp($test, '(\w+(?:123).*)', 3)
    If @error Then
        ConsoleWrite("Error: " & @error & @CRLF)
    Else
        _ArrayDisplay($ans)
        ConsoleWrite($ans[0] & @CRLF)
    EndIf

    ; but if we were to use a capturing group it would return 2 matches (full string and 123)
    $ans = StringRegExp($test, '(\w+(123).*)', 3)
    If @error Then
        ConsoleWrite("Error: " & @error & @CRLF)
    Else
        _ArrayDisplay($ans)
        ConsoleWrite($ans[0] & @CRLF)
    EndIf

    $ans = StringRegExp($test, '(?:abcd123)(.*)', 3)
    If @error Then
        ConsoleWrite("Error: " & @error & @CRLF)
    Else
        ConsoleWrite($ans[0] & @CRLF)
    EndIf

I got it now I think.

Edited March 14, 2009 by Robjong
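On the newline question from the first post, here is a minimal sketch. It assumes a current AutoIt build in which '.' does not match line-break characters unless the (?s) option is set; the variable names are made up for illustration, and StringReplace is used only to make the @LF visible in the console.

```autoit
; Sketch only: assumes '.' stops at line breaks by default and that (?s) lifts that limit.
Local $sTest = 'fffabcd123kkkk' & @LF & '**********'

; Default: '.*' stops in front of the @LF, so the @LF and the second line are never matched.
Local $sDefault = StringRegExpReplace($sTest, '(?:abcd123).*', 'qqqq')
ConsoleWrite(StringReplace($sDefault, @LF, '<LF>') & @CRLF)

; With (?s) the dot also matches line breaks, so '.*' runs to the end of the string.
Local $sDotAll = StringRegExpReplace($sTest, '(?s)(?:abcd123).*', 'qqqq')
ConsoleWrite(StringReplace($sDotAll, @LF, '<LF>') & @CRLF)
```

Under those assumptions the @LF and the asterisks stay in the first result because they were never part of the match, not because they were matched and kept. The same reasoning would apply to '\S*', which also stops in front of a line break.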
PsaltyDS Posted March 14, 2009

I think you want return type = 3, and you need parens around the group you want "(\S*)". Like this:

    #include <Array.au3>

    $test = 'abcd123mmm' & @CRLF & 'xyzmmm' & 'abcd123yyy'
    For $n = 1 To 3
        $ans = StringRegExp($test, '(?:abcd123)(\S*)', $n) ; $test might have a number of lines
        If @error Then
            ConsoleWrite(@error & @CRLF)
        Else
            _ArrayDisplay($ans, "Type = " & $n)
        EndIf
    Next

Valuater's AutoIt 1-2-3, Class... Is now in Session!
For those who want somebody to write the script for them: RentACoder
"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law
martin Posted March 14, 2009 Author

Quote (PsaltyDS): I think you want return type = 3, and you need parens around the group you want "(\S*)". ...

Thank you PsaltyDS.

But I'm not at all happy (comfortable) with all this. It seems that the important thing is that if you have a non-capturing group with StringRegExp then you must have at least one other group, or it does not behave as non-capturing. I didn't think of that, obviously, and it is certainly not something I would have guessed from reading the help. Plus, as one of my examples showed, it isn't the case with StringRegExpReplace, so in my opinion the behaviour is wrong. (But then in my opinion most of the world is wrong.)

Also, if I only want the first match then the flag of 1 is OK, although the help says it returns an array of matches, which it doesn't. If I need all the matches then I must use 3 for the flag, which correctly (IMO) causes the non-capturing group to be omitted. Again, it is not at all obvious to me from the help and I suspect it is not obvious to many more people. The flag of 2 should give an array of matches including the full match, but like the flag of 1 it doesn't and only returns the first match. I don't know whether this is because the help is misleading, or StringRegExp is faulty, or I still don't get it.

Anyway, thanks again PsaltyDS for finding a solution for me.
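To make the flag discussion above concrete, here is a small sketch that runs the same text through flags 1, 2 and 3, once without and once with a capturing group. The helper _ShowFlags is hypothetical, and the sample text puts a line break before the last entry, which is an assumption rather than the exact string used in the thread.

```autoit
#include <Array.au3>

; Sketch only: the sample text and the helper below are illustrative assumptions.
Local $sTest = 'abcd123mmm' & @CRLF & 'xyzmmm' & @CRLF & 'abcd123yyy'

_ShowFlags($sTest, '(?:abcd123)\S*')   ; no capturing group at all
_ShowFlags($sTest, '(?:abcd123)(\S*)') ; one capturing group after the non-capturing one

; Hypothetical helper: print what each array-returning flag gives for one pattern.
Func _ShowFlags($sSubject, $sPattern)
    For $iFlag = 1 To 3
        Local $aRes = StringRegExp($sSubject, $sPattern, $iFlag)
        If @error Then
            ConsoleWrite($sPattern & ' flag ' & $iFlag & ': no match' & @CRLF)
        Else
            ConsoleWrite($sPattern & ' flag ' & $iFlag & ': ' & _ArrayToString($aRes, '|') & @CRLF)
        EndIf
    Next
EndFunc
```

The point to look for in the output: without a capturing group each flag has only the whole match to hand back, so the 'abcd123' prefix shows up. Once (\S*) is added, flags 1 and 3 should return only the captured text, while flag 2 returns the whole match followed by the capture. Flags 1 and 2 stop after the first match; flag 3 carries on through the string.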
Authenticity Posted March 14, 2009

It does return an array of matches as expected, but it does not go on to search the rest of the string globally. If you had 10 capturing parentheses and there was an overall successful match, you can expect a 10-element array, but with option 3 you don't know the array size in advance. Non-capturing parentheses are returned as well if there was nothing else to capture. They're called non-capturing because in a case like this:

/(?:t|to)op/

you may just want to group alternatives but not capture them. I believe it's quite clear.
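A short sketch of the grouping-only use described above, with the /(?:t|to)op/ pattern run against a made-up test string and flag 3. The variable names and the sample text are assumptions for illustration.

```autoit
#include <Array.au3>

; Sketch only: 'top toop' is a made-up sample containing one hit for each alternative.
Local $sText = 'top toop'

; Non-capturing: the parentheses only group the alternatives 't' and 'to'.
Local $aGrouped = StringRegExp($sText, '(?:t|to)op', 3)
If Not @error Then ConsoleWrite('grouped:  ' & _ArrayToString($aGrouped, '|') & @CRLF)

; Capturing: the same alternation now records which alternative was taken.
Local $aCaptured = StringRegExp($sText, '(t|to)op', 3)
If Not @error Then ConsoleWrite('captured: ' & _ArrayToString($aCaptured, '|') & @CRLF)
```

With the non-capturing form there is nothing to capture, so the array should hold the whole matches (the words themselves). With the capturing form the array should hold only the pieces matched inside the parentheses.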
Ascend4nt Posted March 14, 2009

I always use 'capturing' parentheses when I need to capture a specific portion of a match, or get an array of information, otherwise it will return the full match. I think that should pretty much be your practice, martin. The only case where I would say you don't need parentheses is the match/no-match option (0) of StringRegExp, although sometimes you'll still need parentheses to do 'OR's.

Take part of one of my more fun explorations into PCREs, grabbing Registry Key data from a RegEdit file:

    #include <array.au3>

    $sRegFileStr = "Windows Registry Editor Version 5.00" & @CRLF & _
            @CRLF & _
            "[HKEY_CURRENT_USER\Software]" & @CRLF & _
            '@=""' & @CRLF & _
            @CRLF & _
            "[HKEY_CURRENT_USER\Software\akey]" & @CRLF & _
            @CRLF & _
            "[HKEY_CURRENT_USER\Software\akey\subkey]" & @CRLF & _
            @CRLF & _
            "[HKEY_CURRENT_USER\Software\7-zip]" & @CRLF & _
            '"Lang"="-"' & @CRLF & _
            '"Path"="C:\\Program Files\\7-Zip"' & @CRLF & _
            @CRLF & _
            "[HKEY_CURRENT_USER\Software\7-zip\Compression]"

    $aRegKeys = StringRegExp($sRegFileStr, "\[(HKEY_(?:CURRENT_USER|LOCAL_MACHINE|USERS)\\.*)\]", 3)
    _ArrayDisplay($aRegKeys)

Note how I make sure it *starts* with [, but I don't want to capture it, so I can either leave it outside of the 'capturing' parentheses or put it in a non-capturing group (?:\[). Then inside you'll notice I capture the whole key *inside* the brackets, but I have to explicitly tell the PCRE engine that I don't want to capture 'CURRENT_USER|LOCAL_MACHINE|USERS' separately, though I *do* want to capture them as part of the 'grander' capture. (Otherwise, you'll wind up with a 2nd capture for the inner group.)

Hrmm.. hope you can understand that..?
My contributions: Performance Counters in Windows - Measure CPU, Disk, Network etc Performance | Network Interface Info, Statistics, and Traffic | CPU Multi-Processor Usage w/o Performance Counters | Disk and Device Read/Write Statistics | Atom Table Functions | Process, Thread, & DLL Functions UDFs | Process CPU Usage Trackers | PE File Overlay Extraction | A3X Script Extract | File + Process Imports/Exports Information | Windows Desktop Dimmer Shade | Spotlight + Focus GUI - Highlight and Dim for Eyestrain Relief | CrossHairs (FullScreen) | Rubber-Band Boxes using GUI's (_GUIBox) | GUI Fun! | IE Embedded Control Versioning (use IE9+ and HTML5 in a GUI) | Magnifier (Vista+) Functions UDF | _DLLStructDisplay (Debug!) | _EnumChildWindows (controls etc) | _FileFindEx | _ClipGetHTML | _ClipPutHTML + ClipPutHyperlink | _FileGetShortcutEx | _FilePropertiesDialog | I/O Port Functions | File(s) Drag & Drop | _RunWithReducedPrivileges | _ShellExecuteWithReducedPrivileges | _WinAPI_GetSystemInfo | dotNETGetVersions | Drive(s) Power Status | _WinGetDesktopHandle | _StringParseParameters | Screensaver, Sleep, Desktop Lock Disable | Full-Screen Crash Recovery
Wrappers/Modifications of others' contributions: _DOSWildcardsToPCRegEx (original code: RobSaunder's) | WinGetAltTabWinList (original: Authenticity)
UDF's added support/programming to: _ExplorerWinGetSelectedItems | MIDIEx UDF (original code: eynstyne)
(All personal code/wrappers centrally located at Ascend4nt's AutoIT Code)
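The "2nd capture for the inner group" mentioned in the registry example can be seen with a sketch like the following. The two-line sample is made up and much shorter than the .reg text above; only the pattern is taken from the post, once as written and once with the inner group changed to a capturing one.

```autoit
#include <Array.au3>

; Sketch only: a made-up two-key sample, not the original .reg text.
Local $sSample = '[HKEY_CURRENT_USER\Software]' & @CRLF & '[HKEY_LOCAL_MACHINE\System]'

; Inner alternation non-capturing (as in the post): one array element per key.
Local $aKeys = StringRegExp($sSample, '\[(HKEY_(?:CURRENT_USER|LOCAL_MACHINE|USERS)\\.*)\]', 3)
If Not @error Then ConsoleWrite(UBound($aKeys) & ' elements: ' & _ArrayToString($aKeys, ' | ') & @CRLF)

; Inner alternation capturing: each key is followed by the hive name it matched.
Local $aBoth = StringRegExp($sSample, '\[(HKEY_(CURRENT_USER|LOCAL_MACHINE|USERS)\\.*)\]', 3)
If Not @error Then ConsoleWrite(UBound($aBoth) & ' elements: ' & _ArrayToString($aBoth, ' | ') & @CRLF)
```

The first call should give two elements (the two key paths without their brackets), the second four, because every match then contributes both the outer and the inner capture. That doubling is the reason to prefer (?: ... ) for groups that exist only to hold alternatives.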
martin Posted March 14, 2009 Author

Quote (Authenticity): ... I believe it's quite clear.

No, I can confidently say it's not at all clear. That just sounds like nonsense to me.
martin Posted March 14, 2009 Author

Quote (Ascend4nt): I always use 'capturing' parentheses when I need to capture a specific portion of a match, or get an array of information, otherwise it will return the full match. ...

That's a lot more helpful; your example makes perfect sense to me and so does your explanation. I know that the help says that (..) will return the text matched in the group, but I expected that if a capturing group returns the text then a non-capturing group will not return the text, even if there is no other capturing group. I accept that I just have to learn that this is the way it is. I will take your advice and use the brackets for capturing groups wherever possible. Thanks, Ascend4nt.