littlebigman, posted April 21, 2010 (edited April 27, 2010)

Hello

I started learning AutoIt today, and I like it very much. However, I couldn't find why StringRegExp() doesn't seem to be able to extract a bit of information from a web page that I copy/paste from the Google Chrome browser into the clipboard:

    WinWaitActive("List of companies - Chrome")
    Sleep(500)

    ;Select and copy current web page, and look for pattern using regex
    _ClipBoard_Empty()
    Send("^a^c")
    $clipboard = _ClipBoard_GetData()
    $nbrfound = StringRegExp($clipboard, '^(\d+) companies ', 1)

    ;Check that array is valid, to avoid "Subscript used with non-Array variable"
    If Not IsArray($nbrfound) Then
        MsgBox(48,"Error","Not an array")
    Else
        MsgBox(0,"My title",$nbrfound[0])
    EndIf

Thank you for any hint.

enaiman, posted April 21, 2010

First make sure that you really get the content of the web page in the clipboard. A simple MsgBox before the regex will show you what you are dealing with.

littlebigman, posted April 22, 2010

Thanks for the tip. Unfortunately, StringRegExp() still fails, although the contents of the page that I copy into the clipboard and then into a variable (as either default or $CF_TEXT) are successfully displayed in a MsgBox.

I'm beginning to wonder if what is displayed by MsgBox() isn't what the variable really contains, which would explain why StringRegExp() fails.

Here's the code:

    #include <Clipboard.au3>

    ;=========== 1. Empty clipboard and check that it's really empty
    ClipPut("")
    $clipboard = _ClipBoard_GetData()
    MsgBox(0,"Checking contents of clipboard",$clipboard)

    ;=========== 2. Wait for browser window to be displayed
    WinWaitActive("List of companies - Chrome")
    Sleep(500)

    ;=========== 3. Copy current page to clipboard
    Send("^a^c")
    Sleep(500)

    ;Makes no difference: Text is displayed OK in MsgBox, regardless
    ;$clipboard = _ClipBoard_GetData($CF_UNICODETEXT)
    ;$clipboard = _ClipBoard_GetData($CF_TEXT)
    $clipboard = _ClipBoard_GetData()
    MsgBox(0,"Contents of clipboard",$clipboard)

    ;=========== 4. Use regex to extract information from text
    ;^123 companies
    $nbrfound = StringRegExp($clipboard, '^(\d+) companies', 1)
    If @error <> 0 Then
        ;Error=1 Array is invalid. No matches.
        MsgBox(48,"Error",@error)
    Else
        MsgBox(0,"Pattern found",$nbrfound[0])
    EndIf

Has someone already struggled with text copied from a web page into the clipboard?

Thank you.

99ojo, posted April 22, 2010

Hi,

for more debugging, insert this line straight after your StringRegExp call:

    $nbrfound = StringRegExp($clipboard, '^(\d+) companies ', 1)
    If @error Then MsgBox(0, "", "Error: " & @error & " Extended: " & @extended)

See the help file, Flag = 1 or 2:

    @error   Meaning
    0        Array is valid. Check @extended for next offset.
    1        Array is invalid. No matches.
    2        Bad pattern, array is invalid. @extended = offset of error in pattern.

;-))
Stefan

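The error table above can be turned into a small check that reports each case separately. This is only a sketch: the test string and the variable names ($sText, $aFound, $iErr, $iExt) are made up for illustration, and @error and @extended are copied right away because the next function call resets them.

```autoit
; Illustrative input; in the thread this would be the clipboard text.
Local $sText = "123 companies"

Local $aFound = StringRegExp($sText, "^(\d+) companies", 1)
; Copy both macros immediately: any later function call overwrites them.
Local $iErr = @error, $iExt = @extended

Switch $iErr
    Case 0
        ; Array is valid; $iExt holds the offset where a next search would start.
        MsgBox(0, "Match", $aFound[0] & " (next offset: " & $iExt & ")")
    Case 1
        ; Pattern compiled fine, but nothing in the text matched.
        MsgBox(48, "No match", "The pattern did not match the text.")
    Case 2
        ; The pattern itself is malformed; $iExt points into the pattern.
        MsgBox(16, "Bad pattern", "Error at pattern offset " & $iExt)
EndSwitch
```

Separating case 1 from case 2 tells you whether to look at the text or at the pattern.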
littlebigman, posted April 22, 2010 (edited April 22, 2010)

Thanks Stefan for the tip. I do get an error (Error 1 Extended 0).

Investigating further, I'm seeing some unexpected behavior:

1. @CRLF doesn't add a #13#10 to a string, and just returns garbage (e.g. "0"):

    $clipboard = "dontgetit"
    ;Displayed OK
    MsgBox(0,"Contents of clipboard",$clipboard)

    $clipboard = "dontgetit" + @CRLF
    ;Why "0" ?
    MsgBox(0,"Contents of clipboard",$clipboard)

2. StringRegExp() works OK when I'm using a single string, but fails when working on the web page (obviously filled with CRLFs...) that I copy into the clipboard:

    ;========== GOOD
    $clipboard = "123 companies"
    $nbrfound = StringRegExp($clipboard, "^(\d+) companies", 1)
    ;Displays "123", as expected
    If @error Then
        MsgBox(0,"", "Error: " & @error & " Extended: " & @extended)
    Else
        MsgBox(0,"Result",$nbrfound[0])
    EndIf

    ;========== BAD
    ClipPut("")
    WinWaitActive("List of companies - Chrome")
    Sleep(500)
    Send("^a^c")
    Sleep(500)
    $clipboard = _ClipBoard_GetData()
    $nbrfound = StringRegExp($clipboard, "^(\d+) companies", 1)
    If @error Then
        MsgBox(0,"", "Error: " & @error & " Extended: " & @extended)
    Else
        MsgBox(0,"Result",$nbrfound[0])
    EndIf
    ;==========

I have a couple of questions:

1. Has someone successfully used StringRegExp() with more than just a single-line string?
2. How can I build a string with CRLFs?

Thank you.

Tvern, posted April 22, 2010

Instead of

    $clipboard = "dontgetit" + @CRLF

use

    $clipboard = "dontgetit" & @CRLF

For the regexp, I think you might be able to use "\v", which matches any vertical whitespace character.

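The difference between the two operators is that + is arithmetic and & is string concatenation. With +, both operands are converted to numbers first, and a string that does not look like a number converts to 0, which is where the "0" in the MsgBox comes from. A minimal sketch (the variable names are made up for illustration):

```autoit
; "+" adds numbers: both strings are converted to numbers before adding.
Local $vSum = "dontgetit" + @CRLF

; "&" joins strings, which is what is needed to append a line break.
Local $sJoined = "dontgetit" & @CRLF

; Show both results; the second one ends with a real CR+LF pair.
ConsoleWrite("With +: " & $vSum & @CRLF)
ConsoleWrite("With &: " & $sJoined)
ConsoleWrite("Length with &: " & StringLen($sJoined) & @CRLF)
```

Run from SciTE, the output appears in the console pane; swap ConsoleWrite for MsgBox if you prefer a dialog.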
littlebigman, posted April 22, 2010

Thanks for the tip on &/+.

I don't know what a "vertical whitespace character" is compared to just a space character (" " or \s).

It seems like StringRegExp() can only work one line at a time:

    ;Found OK
    $clipboard = "123 a single line"
    $nbrfound = StringRegExp($clipboard, "^(\d+)", 1)
    MsgBox(0,"Pattern found",$nbrfound[0])

    $clipboard = "first line" & @CRLF & "123 companies"
    $nbrfound = StringRegExp($clipboard, "^(\d+)", 1)
    ;Error 1 Extended 0
    If @error Then
        MsgBox(0,"", "Error: " & @error & " Extended: " & @extended)
    Else
        MsgBox(0,"Pattern found",$nbrfound[0])
    EndIf

Before I change the code to loop through each line of the clipboard and call StringRegExp() on it, can someone confirm that StringRegExp() can only work with a single line?

Thank you.

Tvern, posted April 22, 2010 (edited April 22, 2010)

I'd say space ( Chr(32) ) is a horizontal whitespace character and @LF/@CR ( Chr(10) / Chr(13) ) are vertical whitespace characters, which implies that StringRegExp() can work on multiple lines, but that's just my interpretation of the help file. I haven't tested it.

Edit: this regexp works on a multi-line string:

    #include <Array.au3>

    Local $string
    For $i = 0 To 10
        $string &= "this is line " & $i & @CRLF ;create a multi-line string
    Next
    MsgBox(0,"test",$string) ;check if we really created a multi-line string
    $result = StringRegExp($string,"(this is line \d*)",3) ;search for matching strings
    _ArrayDisplay($result) ;display the result

littlebigman, posted April 22, 2010

Thanks Tvern. I ran the sample above, and it does work... but I'm still unsuccessfully using StringRegExp() to extract data from the clipboard. I have no idea what else I could try :-/

This is all the more frustrating since if I paste the clipboard into UltraEdit and paste the regex, UE has no problem finding the pattern.

    ;====== 1. Wait for browser
    WinWaitActive("List of companies - Chrome")
    Sleep(500)

    ;====== 2. Copy page to clipboard
    Send("^a^c")
    Sleep(500)

    ;====== 3. Copy clipboard to string
    $clipboard = _ClipBoard_GetData()
    ;OK I can see the pattern I'm looking for
    MsgBox(0,"Contents of clipboard",$clipboard)

    ;====== 4. Extract data from string
    $nbrfound = StringRegExp($clipboard, "^(\d+) companies", 3)
    If @error Then
        MsgBox(0,"", "Error: " & @error & " Extended: " & @extended)
    Else
        _ArrayDisplay($nbrfound)
    EndIf

If someone has an idea... Maybe StringRegExp() needs some extra setting to work on a big block of text?

Tvern, posted April 22, 2010

Sounds like the pattern just isn't right for what you are trying to do. If you add an example string and the desired output, I'm sure the usual regexp gurus will be on it like flies on you know what.

littlebigman, posted April 22, 2010

Go for it boys ;-)

    $clipboard = "Dummy" & @CRLF & "123 companies" & @CRLF & "Dummy dummy"
    $nbrfound = StringRegExp($clipboard, "^(\d+) companies", 3)
    If @error Then
        MsgBox(0,"", "Error: " & @error & " Extended: " & @extended)
    Else
        _ArrayDisplay($nbrfound)
    EndIf

Malkey, posted April 22, 2010

Try,

    #include <Array.au3>

    Local $clipboard = "Dummy" & @CRLF & "123 companies" & @CRLF & "Dummy dummy" & @CRLF & _
            " 321 companies" & @CRLF & "121 companies" & @CRLF

    ; This RE pattern with flag = 3 captures only the digit/s at the beginning of any line
    ; which are followed by a space then "companies".
    $nbrfound = StringRegExp($clipboard, '(?m)^(\d+) companies', 3) ; Use 3 for global match
    If @error Then
        MsgBox(0, "", "Error: " & @error & " Extended: " & @extended)
    Else
        _ArrayDisplay($nbrfound)
    EndIf

littlebigman, posted April 22, 2010

Damn, PCRE is single-line by default: without (?m), ^ only matches at the very start of the string, not at the start of each line.

The CHM file doesn't say: is it possible to configure PCRE in AutoIt (to activate PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and PCRE_EXTENDED) so that I don't have to include settings in the pattern every time?

enaiman, posted April 22, 2010

I wonder: why do you use the clipboard? There is no need to involve any copy-paste. There are a couple of _IE functions to help: _IEBodyReadHTML, _IEDocReadHTML.

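A rough sketch of that approach, under one important assumption: the _IE functions from IE.au3 automate Internet Explorer, so the page has to be open in Internet Explorer rather than Chrome. The sketch uses _IEBodyReadText, a sibling of the functions named above that returns the visible text instead of the HTML source, so that the line-anchored pattern from the thread still applies. The title fragment "List of companies" comes from the thread; the variable names are made up for illustration.

```autoit
#include <IE.au3>
#include <Array.au3>

; Attach to an already open Internet Explorer window whose title
; contains this text (title matching is the default mode of _IEAttach).
Local $oIE = _IEAttach("List of companies")
If @error Then
    MsgBox(48, "Error", "Could not attach to an Internet Explorer window")
    Exit
EndIf

; Read the visible text of the page body, with no clipboard involved.
Local $sText = _IEBodyReadText($oIE)

; Same pattern as in the thread, with (?m) so ^ matches at every line start.
Local $aFound = StringRegExp($sText, "(?m)^(\d+) companies", 3)
If @error Then
    MsgBox(0, "", "Error: " & @error & " Extended: " & @extended)
Else
    _ArrayDisplay($aFound)
EndIf
```

To parse the markup instead, _IEBodyReadHTML($oIE) returns the HTML of the body; the pattern would then need to account for the surrounding tags.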
jchd, posted April 22, 2010

littlebigman wrote: "Damn, PCRE is single-line by default. The CHM file doesn't say: is it possible to configure PCRE in AutoIt (to activate PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and PCRE_EXTENDED) so that I don't have to include settings in the pattern every time?"

These are build-time options!

PCRE is, oh well, PCRE! If we used non-standard options like these, we would completely lose compatibility with Perl, most other PCRE engines and, most importantly, our own AutoIt code base.

Sacrificing all that just for your convenience, so that you don't have to write a 4-line function to circumvent the "problem" you see, is a little too much to ask.

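The small wrapper function mentioned above could look roughly like this. It is a sketch, not an AutoIt built-in: the name _StringRegExpOpts and its parameters are hypothetical. It prepends inline PCRE option letters to the pattern, where (?i) is caseless, (?m) is multiline, (?s) is dotall and (?x) is extended, and passes the @error and @extended values of StringRegExp() back to the caller.

```autoit
#include <Array.au3>

; Hypothetical helper (not part of AutoIt): prepends inline PCRE options
; to the pattern so they do not have to be typed in every call.
Func _StringRegExpOpts($sText, $sPattern, $iFlag = 3, $sOpts = "(?m)")
    Local $aResult = StringRegExp($sText, $sOpts & $sPattern, $iFlag)
    ; Hand StringRegExp's @error/@extended through to the caller.
    Return SetError(@error, @extended, $aResult)
EndFunc

; Test string from the thread.
Local $sText = "Dummy" & @CRLF & "123 companies" & @CRLF & "Dummy dummy"

Local $aFound = _StringRegExpOpts($sText, "^(\d+) companies")
If @error Then
    MsgBox(0, "", "Error: " & @error & " Extended: " & @extended)
Else
    _ArrayDisplay($aFound)
EndIf
```

Several options can be combined in the last parameter, for example "(?mi)" for multiline plus caseless matching.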
littlebigman, posted April 27, 2010

enaiman wrote: "I wonder: why do you use the clipboard? There is no need to involve any copy-paste. There are a couple of _IE functions to help: _IEBodyReadHTML, _IEDocReadHTML."

Because I'm only getting started with AutoIt and didn't know about those functions.

Thanks, the script is done and I happily downloaded the web pages I needed.