KJohn Posted August 29, 2007 (edited)

How would I craft a StringRegExp call to obtain the following behavior? (Case-insensitive throughout.)

Assume the string to be evaluated is: Galapagos

The call should return non-zero only if the substring matches from the first character. For example, matches would include G, Ga, Gal, Gala, gAlaP, gaLApa, etc. The following would not be matches: ala, Pago, lapagos, etc.

I did this once a long time ago, but the single comparison in StringRegExp proved to be much slower than all the multiple comparisons made by StringInStr (SiS retries from successive starting characters until a match is found or the end of the string is reached). Maybe it had something to do with the way I had written it. Maybe not. Please help.

Speed is my primary concern. I'm latching on to the single-comparison benefit of SRE. If you can think of an even better way to accomplish the above, I would be really grateful.

Edited August 29, 2007 by Koshy John
randallc Posted August 29, 2007 (edited)

Hi,

Here's one way; but if you are searching the result of a FileRead, you will see in my RegExp func that you need things like (?m) to get every line too.

    #include <Array.au3>

    $s_FileRead = "Galapagos"
    $s_Searches = "gAlAp"
    ;~ $s_Searches=StringReplace(StringReplace($s_Searches, "|", " | "), ".", "\.")
    $patternReg = '^(?i)' & $s_Searches & '.*$'
    Local $asList = StringRegExp($s_FileRead, $patternReg, 3)
    If IsArray($asList) Then _ArrayDisplay($asList, " Matches for " & $s_Searches & " in " & $s_FileRead)
    If Not IsArray($asList) Then MsgBox(0, "", "No Match for " & $s_Searches & " in " & $s_FileRead)
    ;=============================================
    $s_Searches = "AlAp"
    ;~ $s_Searches=StringReplace(StringReplace($s_Searches, "|", " | "), ".", "\.")
    $patternReg = '^(?i)' & $s_Searches & '.*$'
    ; $asList is already declared above, so it is only reassigned here
    $asList = StringRegExp($s_FileRead, $patternReg, 3)
    If IsArray($asList) Then _ArrayDisplay($asList, " Matches for " & $s_Searches & " in " & $s_FileRead)
    If Not IsArray($asList) Then MsgBox(0, "", "No Match for " & $s_Searches & " in " & $s_FileRead)

Best, Randall

PS: if you are looking at speed in large RegExp calls, you need to compile with ANSI, not Unicode (unless that has been fixed too? I haven't tested it lately, but there has been a huge speed difference).

Edited August 29, 2007 by randallc

ExcelCOM... AccessCom.. Word2... FileListToArrayNew...SearchMiner... Regexps...SQL...Explorer...Array2D.. _GUIListView...array problem...APITailRW
KJohn Posted August 29, 2007 (edited)

The one line I was looking for was:

    '^(?i)'&$s_Searches&'.*$'

Could you explain that to me? This is what I understand:

(?i) - case insensitive
^ - but why is this there? (I thought it meant "match any character not in the set".)
'.*$' - and what does this stand for?

Quote:
There are a few parts of AutoIt that don't yet have full Unicode support. These are:
- Send and ControlSend: instead, use ControlSetText or the Clipboard functions.
- Regular expressions: to reduce the size of AutoIt, the regular expression engine is currently compiled in ANSI mode.
- Console operations are converted to ANSI.
These limits will be addressed in future versions if possible.

Technically, that means the RegExp engine is compiled in ANSI in both versions. So does it really make a difference? From a performance point of view, would it make sense to compile scripts in ANSI mode (assuming the script will only be running on English-language systems)?

Edited August 29, 2007 by Koshy John
KJohn Posted August 29, 2007

Ah, forget it. Regular expressions are slower whether the stub is ANSI or Unicode. ANSI compilation is a little faster, but the difference is negligible.
randallc Posted August 29, 2007

Quote (KJohn): ^ - but why is this there? '.*$' - and what does this stand for?

"^" is the marker for the beginning of a line.
$s_Searches is the search string.
"$" is the marker for the end of a line.
So '.*' is "." (any character), any number of times, even zero, after the search string; then '$' is the end of the line.

Best, randall

(When I last checked, the speed difference on a large result (say a 30% match) from a huge file, e.g. 80 MB, was about 100x in favour of ANSI, but negligible for small results. I want to process a huge file with one RegExp call, not loop over repeated calls and slow it down, so this becomes significant. And I am still puzzling over how I got such a fast result given all the potential pitfalls with RegExp callbacks; just lucky for a change with my first attempt! Best, Randall)
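For the original yes/no prefix test, the pattern explained above can be reduced a little. The following is a minimal sketch, assuming a recent AutoIt build with the PCRE engine; the helper name _IsPrefix is hypothetical. Two changes are not from the posts above: the trailing '.*$' is dropped, since it only widens the matched text and is not needed for a true/false answer, and the search text is wrapped in \Q...\E so that characters such as "." or "|" are taken literally.

```autoit
; Hypothetical helper: returns 1 if $sTest is a case-insensitive prefix of $sString, else 0.
Func _IsPrefix($sString, $sTest)
    ; (?i)    case-insensitive matching
    ; ^       anchor at the start of the subject string
    ; \Q..\E  treat the search text literally (assumes $sTest does not itself contain "\E")
    Return StringRegExp($sString, "(?i)^\Q" & $sTest & "\E")
EndFunc

ConsoleWrite(_IsPrefix("Galapagos", "gAlaP") & @CRLF)   ; prefix, so 1
ConsoleWrite(_IsPrefix("Galapagos", "lapagos") & @CRLF) ; matches only mid-string, so 0
```

With the default flag (0), StringRegExp returns only 1 or 0 and builds no result array, which seems the closest fit to the "single comparison" asked for in the first post.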
KJohn Posted August 29, 2007

Quote (randallc): I want to process a huge file with one RegExp call, not loop over repeated calls and slow it down.

You do realize that running a RegExp on an entire 80 MB file will load the entire 80 MB into RAM, right? 80 MB of RAM may not be much for many of us, but there are a lot of people in this world still on 256 MB, especially in developing countries.
SmOke_N (Moderators) Posted August 29, 2007

Quote (KJohn): You do realize that running a RegExp on an entire 80 MB file will load the entire 80 MB into RAM, right?

Do you know some other function available to us that doesn't?

Signature: Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.
weaponx Posted August 29, 2007 (edited)

Am I missing something? Isn't this a lot easier:

    $String = "Galapagos"
    $Test = "gala"
    If StringLeft(StringUpper($String), StringLen($Test)) == StringUpper($Test) Then
        MsgBox(0, "", "Match found")
    EndIf

Edited August 29, 2007 by weaponx
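A possible simplification of the snippet above, offered as a sketch rather than as weaponx's own code: in AutoIt the = operator compares strings case-insensitively (== is the case-sensitive form), so both StringUpper calls can be dropped. The function name _StartsWith is hypothetical.

```autoit
; Hypothetical helper: True if $sTest is a case-insensitive prefix of $sString.
Func _StartsWith($sString, $sTest)
    ; Compare only the first StringLen($sTest) characters.
    ; "=" on two strings ignores case; "==" would be case-sensitive.
    Return StringLeft($sString, StringLen($sTest)) = $sTest
EndFunc

If _StartsWith("Galapagos", "gala") Then MsgBox(0, "", "Match found")
```

This still makes two function calls per test (StringLen and StringLeft), so whether it beats a single StringInStr call is something to measure rather than assume.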
SmOke_N (Moderators) Posted August 29, 2007

    $s_FileRead = "Galapagos"
    $s_Searches = "gAlAp"
    _StringMatch($s_FileRead, $s_Searches)
    If @error Then
        MsgBox(16, "Error", "Match not found")
    Else
        MsgBox(64, "Success", "Match found")
    EndIf

    Func _StringMatch($sInStr, $sVerify)
        If StringRegExp($sInStr, "(?s)(?i)(?m:^|[\s|,|\.|\?\:])" & $sVerify) Then Return 1
        Return SetError(1, 0, 0)
    EndFunc

Are you trying to do something like this?
MrCreatoR Posted August 29, 2007 (edited)

Quote (randallc): "^" is the marker for the beginning of a line.

Where can I read about it? As far as I know, to match the beginning of the string you use \A, and to match the end of the string (not the line) you use \z. But ^ is for negation: matching anything that is not one of the characters following it.

About the matches... Koshy John: why can't you just use StringInStr()?

Sorry, it seems that I misunderstood the request.

Edited August 29, 2007 by MsCreatoR
randallc Posted August 29, 2007

Quote (MrCreatoR): Where can I read about it? As far as I know, to match the beginning of the string you use \A, and to match the end of the string (not the line) you use \z.

Hi,

Start with Wikipedia:

[^ ] Matches a single character that is not contained within the brackets. For example, [^abc] matches any character other than "a", "b", or "c". [^a-z] matches any single character that is not a lowercase letter from "a" to "z". As above, literal characters and ranges can be mixed.

^ Matches the starting position within the string. In multiline mode, it matches the starting position of any line.

See the Wikipedia regexp article, or the tute on the forum (link in my sig).

Best, randall
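The three meanings discussed above (^ as an anchor, \A as a start-of-string anchor, and [^...] as a negated set) can be told apart with a two-line subject. This is a small sketch assuming the PCRE behaviour of recent AutoIt builds; the sample text is made up for illustration.

```autoit
; Two lines: the word being searched for starts the SECOND line only.
Local $sText = "Iguana" & @LF & "Galapagos"

; Without (?m), ^ anchors only at the very start of the subject.
ConsoleWrite(StringRegExp($sText, "(?i)^gala") & @CRLF)   ; 0

; With (?m), ^ also anchors at the start of every line.
ConsoleWrite(StringRegExp($sText, "(?im)^gala") & @CRLF)  ; 1

; \A always means start of subject, even in multiline mode.
ConsoleWrite(StringRegExp($sText, "(?im)\Agala") & @CRLF) ; 0

; Inside brackets, ^ negates the set: one character that is not "g" or "G", then "ala".
ConsoleWrite(StringRegExp("Galapagos", "[^gG]ala") & @CRLF) ; 0, since only "G" precedes "ala"
```

For a single-line subject such as "Galapagos", ^ and \A behave the same, so either anchor fits the original question.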
randallc Posted August 29, 2007

Quote (KJohn): 80 MB of RAM may not be much for many of us, but there are a lot of people in this world still on 256 MB, especially in developing countries.

The good news for an index on a smaller machine is that the hard disk would likely be smaller than 40 GB, so the index file would be under 10 MB!

Best, Randall
KJohn Posted September 1, 2007

Quote (SmOke_N): Do you know some other function available to us that doesn't?

What I meant was not to load the whole file completely. Reading line by line would be a better option.
KJohn Posted September 1, 2007

Quote (weaponx): Am I missing something? Isn't this a lot easier: If StringLeft(StringUpper($String), StringLen($Test)) == StringUpper($Test) Then ...

What you said is indeed easier (I've tried that before manually), but it's much slower than StringInStr by itself, even with all its excess comparisons. The whole point of attempting to do the comparison only once was to make it faster.
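Speed claims like the one above are easy to check locally. The following is a rough timing sketch using TimerInit and TimerDiff; the run count and test strings are arbitrary assumptions, and the numbers will vary by machine and AutoIt version, so no results are quoted here.

```autoit
Local $sString = "Galapagos", $sTest = "gAlaP"
Local $iRuns = 100000 ; arbitrary repeat count for illustration
Local $hTimer, $bHit
; Build the pattern once, outside the loop, so only matching is timed.
Local $sPattern = "(?i)^\Q" & $sTest & "\E"

; Candidate 1: StringInStr (case-insensitive by default); a prefix match returns position 1.
$hTimer = TimerInit()
For $i = 1 To $iRuns
    $bHit = (StringInStr($sString, $sTest) = 1)
Next
ConsoleWrite("StringInStr : " & Round(TimerDiff($hTimer)) & " ms" & @CRLF)

; Candidate 2: StringLeft plus the case-insensitive "=" operator.
$hTimer = TimerInit()
For $i = 1 To $iRuns
    $bHit = (StringLeft($sString, StringLen($sTest)) = $sTest)
Next
ConsoleWrite("StringLeft  : " & Round(TimerDiff($hTimer)) & " ms" & @CRLF)

; Candidate 3: anchored regular expression.
$hTimer = TimerInit()
For $i = 1 To $iRuns
    $bHit = (StringRegExp($sString, $sPattern) = 1)
Next
ConsoleWrite("StringRegExp: " & Round(TimerDiff($hTimer)) & " ms" & @CRLF)
```

Note that StringInStr keeps scanning when the prefix is absent, so timing it with a long non-matching subject as well would give a fairer picture than this short matching one.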
randallc Posted September 1, 2007

Quote (KJohn): What I meant was not to load the whole file completely. Reading line by line would be a better option.

Hi,

I haven't tested lately, but I always thought that reading line by line would surely mean a slower looping function than reading everything at once?

Best, Randall
KJohn Posted September 1, 2007

Quote (randallc): I always thought that reading line by line would surely mean a slower looping function than reading everything at once?

Reading line by line is slightly slower, but it all depends on how you will be processing the file. Reading the whole file can be slower on low-memory systems if there is hard disk thrashing (page file swapping).

P.S. I'm readying a test script to bring out the differences in speed.
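The two approaches being compared can be sketched side by side. This is an illustration only, not the test script mentioned above: the file name index.txt and the prefix are hypothetical, and the patterns assume the PCRE engine of recent AutoIt builds.

```autoit
Local $sPath = "index.txt" ; hypothetical file with one entry per line
Local $sPrefix = "gala"    ; hypothetical search prefix

; Approach 1: line by line. Only one line is held in memory at a time.
Local $hFile = FileOpen($sPath, 0) ; 0 = read mode
If $hFile = -1 Then Exit 1         ; file could not be opened
Local $iHits = 0, $sLine
While 1
    $sLine = FileReadLine($hFile)
    If @error Then ExitLoop        ; end of file (or read error)
    If StringRegExp($sLine, "(?i)^\Q" & $sPrefix & "\E") Then $iHits += 1
WEnd
FileClose($hFile)
ConsoleWrite("Line by line: " & $iHits & " matching lines" & @CRLF)

; Approach 2: whole file in one call. The full contents are loaded into memory,
; and (?m) lets ^ and $ anchor at every line; flag 3 returns all matches as an array.
Local $aLines = StringRegExp(FileRead($sPath), "(?im)^\Q" & $sPrefix & "\E.*$", 3)
If IsArray($aLines) Then
    ConsoleWrite("Whole file  : " & UBound($aLines) & " matching lines" & @CRLF)
Else
    ConsoleWrite("Whole file  : 0 matching lines" & @CRLF)
EndIf
```

The first approach trades speed for a small, constant memory footprint; the second makes one engine call but needs the whole file in RAM, which is the trade-off debated in the posts above.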
KJohn Posted September 1, 2007

http://www.autoitscript.com/forum/index.php?showtopic=52253

This is being discussed in the AutoIt Feature Requests forum.