JohnOne Posted December 18, 2009

I really have no clue how it works, even after several hours trying to study the helpfile examples. I've whittled the relevant part of the webpage source code down to:

```
17/12/2009</span> <a href="/news/archive/40698/europa-league-draw.html">EUROPA LEAGUE DRAW
```

I'm using _INetGetSource() and _StringBetween(), but my head is battered trying to extract the text I want, which is "17/12/2009", "/news/archive/40698/europa-league-draw.html" and "EUROPA LEAGUE DRAW".

If someone has the time I would really appreciate a pattern, with a quick explanation of what it does and why. Even knowing I will feel quite the fool after understanding this, I had to post. Any help appreciated.

AutoIt Absolute Beginners Require a serial Pause Script Video Tutorials by Morthawt ipify Monkey's are, like, natures humans.
Richard Robertson Posted December 18, 2009

A regular expression such as '>([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})</span>\s*<a href="([^"]*)">([^<]*)<' should do it.

I made the assumption that there is a > before the date and a < after the word DRAW.

Apart from that it's pretty simple. Start a group, then match any character from 0 to 9, either once or twice. Then a slash, then another one- or two-digit number, a slash, and a four-digit year. End the group; this first group is the date. Then the closing span tag, any amount of whitespace, then the link code. Inside the quotes, start a second group that reads characters up until the next " and then end that group. Finally, a third group reads everything up to where the next HTML tag starts.

Edited December 18, 2009 by Richard Robertson
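To spell out the "what and why", here is the same pattern split into commented pieces. It is a sketch in Python rather than AutoIt; Python's `re` syntax behaves like the PCRE engine behind StringRegExp() for the constructs used here. The `<span class="date">` and `</a>` wrapped around the sample are assumptions standing in for the > and < the pattern expects.

```python
import re

# Richard Robertson's pattern, one piece per line. re.VERBOSE ignores layout
# whitespace and the comments, so the one real space must be escaped.
PATTERN = re.compile(r'''
    >                                   # a closing angle bracket right before the date
    ([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})    # group 1: the date, d/m/yyyy up to dd/mm/yyyy
    </span>                             # the closing span tag
    \s*                                 # any amount of whitespace, newlines included
    <a\ href="                          # start of the link; the escaped space is literal
    ([^"]*)                             # group 2: everything up to the next double quote
    ">                                  # end of the opening a tag
    ([^<]*)                             # group 3: the link text, up to the next tag
    <                                   # the start of that next tag
''', re.VERBOSE)

# Assumed surrounding markup that supplies the > and < the pattern needs.
html = ('<span class="date">17/12/2009</span>\r\n'
        '<a href="/news/archive/40698/europa-league-draw.html">EUROPA LEAGUE DRAW</a>')

date, link, title = PATTERN.search(html).groups()
print(date)   # 17/12/2009
print(link)   # /news/archive/40698/europa-league-draw.html
print(title)  # EUROPA LEAGUE DRAW
```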
Malkey Posted December 18, 2009

This is just carrying the ball over the line, driving the nail all the way home, or adding the finishing touches. It uses a very slightly modified version of Richard Robertson's regular expression pattern.

```autoit
#include <Array.au3>

Local $sStr = "17/12/2009</span>" & @CRLF & _
        '<a href="/news/archive/40698/europa-league-draw.html">EUROPA LEAGUE DRAW'

; Replace the whole match with its three captured groups, one per line
ConsoleWrite(StringRegExpReplace($sStr, _
        '(\d{1,2}/\d{1,2}/\d{4})</span>\s*<a href="([^"]*)">([^<]*)', _
        "\1" & @CRLF & "\2" & @CRLF & "\3") & @CRLF)

; Flag 3 returns an array of the captured groups from all matches
Local $aArr = StringRegExp($sStr, _
        '(\d{1,2}/\d{1,2}/\d{4})</span>\s*<a href="([^"]*)">([^<]*)', 3)
_ArrayDisplay($aArr)
```
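Malkey's two calls can be mirrored in a short Python sketch, assuming `re.sub` as the counterpart of StringRegExpReplace() and a flattened `re.findall` as the counterpart of StringRegExp() with flag 3, which returns the captured groups of every match in one flat array. The flattening is an added step, because Python's `findall` returns one tuple per match.

```python
import re

# Same test string as Malkey's script; '\r\n' stands in for @CRLF.
s = ('17/12/2009</span>\r\n'
     '<a href="/news/archive/40698/europa-league-draw.html">EUROPA LEAGUE DRAW')
pattern = r'(\d{1,2}/\d{1,2}/\d{4})</span>\s*<a href="([^"]*)">([^<]*)'

# Like StringRegExpReplace(): rebuild the match from its groups, one per line.
replaced = re.sub(pattern, r'\1\n\2\n\3', s)
print(replaced)

# Like StringRegExp(..., 3): flatten the per-match tuples into one list.
flat = [group for match in re.findall(pattern, s) for group in match]
print(flat)
```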
JohnOne (Author) Posted December 18, 2009

Many thanks for your time, gents. I'm still having problems, though. Although Malkey's example works a treat on the string in it, neither pattern works on the string returned by _StringBetween(). It's not as it appears in my quote, so that could be a problem. Here's how it appears in a MsgBox (I changed the _StringBetween() code to add > before the date and < at the end of the string).

I noted that \d{1,2} is the same as [0-9]{1,2}.

Not fully grasping the last part yet, but am I correct in thinking the red is ignored and the green is matched?

'>([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})</span>\s*<a href="([^"]*)">([^<]*)<'

EDIT: just noticed the string changed on account of the source changing.
EDIT2: using flag 3 the error is 1, "Array is invalid. No matches."

Edited December 18, 2009 by JohnOne
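The observation that \d is shorthand for [0-9] holds in PCRE's default mode, which StringRegExp() is built on. Checking the same idea in Python needs `re.ASCII`, because Python 3's \d on text also accepts other Unicode decimal digits. A small sketch:

```python
import re

s = '17/12/2009</span>'

# With re.ASCII, \d and [0-9] find exactly the same date.
with_class = re.findall(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}', s)
with_escape = re.findall(r'\d{1,2}/\d{1,2}/\d{4}', s, re.ASCII)
print(with_class, with_escape)

# Without re.ASCII, Python's \d also matches U+0661 (Arabic-Indic digit one),
# which [0-9] never does.
arabic_one = '\u0661'
print(bool(re.fullmatch(r'\d', arabic_one)))            # True
print(bool(re.fullmatch(r'[0-9]', arabic_one)))         # False
print(bool(re.fullmatch(r'\d', arabic_one, re.ASCII)))  # False
```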
PsaltyDS Posted December 18, 2009

On 12/18/2009 at 2:31 PM, JohnOne said:
Not fully grasping the last part yet, but am I correct in thinking the red is ignored and the green is matched?
'>([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})</span>\s*<a href="([^"]*)">([^<]*)<'

No, a match requires something to match each part:
'>'
...followed by something that matches the group '([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})'
...followed by something that matches '</span>\s*<a href="'
...followed by something that matches the group '([^"]*)'
...followed by '">'
...followed by something that matches the group '([^<]*)'
...followed by '<'

The leading '>' and trailing '<' are not in your original string. This works with your original string:

```autoit
#include <Array.au3>

Global $sString = '17/12/2009</span>' & @CRLF & '<a href="/news/archive/40698/europa-league-draw.html">EUROPA LEAGUE DRAW'
Global $sRegExp = '([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})</span>\s*<a href="([^"]*)">([^<]*)'
Global $RET = StringRegExp($sString, $sRegExp, 3)
If @error Then
    ConsoleWrite("@error = " & @error & @LF)
Else
    _ArrayDisplay($RET, "$RET")
EndIf
```

Edited December 18, 2009 by PsaltyDS

Valuater's AutoIt 1-2-3, Class... Is now in Session!For those who want somebody to write the script for them: RentACoder"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law
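A quick way to see that every piece must match is to run both versions of the pattern against the original string. This Python sketch stands in for StringRegExp(), with '\r\n' in place of @CRLF. It shows that the leading '>' and trailing '<' alone are enough to make the whole match fail:

```python
import re

s = ('17/12/2009</span>\r\n'
     '<a href="/news/archive/40698/europa-league-draw.html">EUROPA LEAGUE DRAW')

with_delims = r'>([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})</span>\s*<a href="([^"]*)">([^<]*)<'
without_delims = r'([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})</span>\s*<a href="([^"]*)">([^<]*)'

# No '>' sits right before the date and nothing follows DRAW, so this fails.
print(re.search(with_delims, s))  # None

# Dropping the two delimiters lets every remaining piece find its text.
print(re.search(without_delims, s).groups())
```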
Richard Robertson Posted December 18, 2009

It depends on how you have the results returned. Groups (the things in parentheses) are returned as array elements. The entire matched string is the match itself.
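The difference between the whole match and the groups can be seen in a Python sketch. Here `group(0)` is the entire matched text and `groups()` holds only the parenthesised captures; how AutoIt exposes each depends on the flag passed to StringRegExp().

```python
import re

s = ('17/12/2009</span>\r\n'
     '<a href="/news/archive/40698/europa-league-draw.html">EUROPA LEAGUE DRAW')
pattern = r'([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})</span>\s*<a href="([^"]*)">([^<]*)'

m = re.search(pattern, s)
whole = m.group(0)   # the entire matched text, markup and whitespace included
groups = m.groups()  # only the three parenthesised captures
print(repr(whole))
print(groups)
```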
JohnOne (Author) Posted December 18, 2009

I am starting to understand the pattern now, but I still can't grasp why the extracted string won't comply. They look different in the MsgBox.

This works:

```autoit
#include <Inet.au3>
#include <String.au3>
#include <Array.au3>
;Global $Url = "http://www.evertonfc.com/news/news-archive.html"
;Global $sFile = _INetGetSource($Url)
Global $sString = '17/12/2009</span>' & @CRLF & '<a href="/news/archive/40698/europa-league-draw.html">EUROPA LEAGUE DRAW'
MsgBox(0,"String",$sString)
Global $sRegExp = '([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})</span>\s*<a href="([^"]*)">([^<]*)'
Global $RET = StringRegExp($sString, $sRegExp, 3)
If @error Then
    ConsoleWrite("@error = " & @error & @LF)
Else
    _ArrayDisplay($RET, "$RET")
EndIf
```

This does not:

```autoit
#include <Inet.au3>
#include <String.au3>
#include <Array.au3>
;#cs
Global $Url = "http://www.evertonfc.com/news/news-archive.html"
Global $sFile = _INetGetSource($Url)
Global $pattern = '(\d{1,2}/\d{1,2}/\d{4})</span> \s* <a href="([^"]*)">([^<]*)'
Global $sString = _StringBetween($sFile,'<span class="date"','/a>',-1)
MsgBox(0,"",$sString[0])
ConsoleWrite(@CRLF & $sString[0])
Global $sString1 = StringRegExp($sString,$pattern,3)
If @error Then
    MsgBox(0,"",@error)
Else
    _ArrayDisplay($sString1)
EndIf
```

My head's hurting now.
Edit: This doesn't work either:

```autoit
#include <Inet.au3>
#include <String.au3>
#include <Array.au3>
;Global $Url = "http://www.evertonfc.com/news/news-archive.html"
;Global $sFile = _INetGetSource($Url)
Global $sString = '17/12/2009</span>' & @CRLF & '<a' & @CRLF & 'href="/news/archive/40698/europa-league-draw.html">EUROPA LEAGUE DRAW'
MsgBox(0,"String",$sString)
Global $sRegExp = '([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})</span>\s*<a href="([^"]*)">([^<]*)'
Global $RET = StringRegExp($sString, $sRegExp, 3)
If @error Then
    ConsoleWrite("@error = " & @error & @LF)
Else
    _ArrayDisplay($RET, "$RET")
EndIf
```

Edited December 18, 2009 by JohnOne
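Apart from how $sString is passed, the pattern in the failing second script differs from the working one: it has a space on each side of \s*, and a space in a pattern is a literal character. This Python sketch approximates StringRegExp() using the working script's test string, and shows that the spaced pattern cannot match when @CRLF (here '\r\n') follows </span> directly:

```python
import re

# Test string from the working script; '\r\n' stands in for @CRLF.
s = ('17/12/2009</span>\r\n'
     '<a href="/news/archive/40698/europa-league-draw.html">EUROPA LEAGUE DRAW')

# Pattern from the failing script: note the spaces around \s*.
spaced = r'(\d{1,2}/\d{1,2}/\d{4})</span> \s* <a href="([^"]*)">([^<]*)'
# Pattern from the working script.
tight = r'([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})</span>\s*<a href="([^"]*)">([^<]*)'

# The spaced version needs a real space straight after </span>, but CR is there.
print(re.search(spaced, s))  # None
print(re.search(tight, s).groups())
```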
PsaltyDS Posted December 18, 2009

In the first of your failing examples, we don't get to see what $sString contains before StringRegExp() is run. Post the failing content.

In the second example, you have a clearly non-matching string because of the @CRLF between '<a' and 'href="...'. The pattern is looking for a literal space there, not just any whitespace.
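The literal-space problem in the second example can be reproduced in a small Python sketch. The relaxed `<a\s+href` variant is only an illustration, not something proposed in the thread: it accepts any run of whitespace, including @CRLF (written '\r\n' here), between the tag name and the attribute.

```python
import re

# The non-matching string: @CRLF sits between '<a' and 'href'.
s = ('17/12/2009</span>\r\n<a\r\n'
     'href="/news/archive/40698/europa-league-draw.html">EUROPA LEAGUE DRAW')

strict = r'([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})</span>\s*<a href="([^"]*)">([^<]*)'
# Illustrative variant: one or more whitespace characters instead of one space.
relaxed = r'([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})</span>\s*<a\s+href="([^"]*)">([^<]*)'

print(re.search(strict, s))  # None
print(re.search(relaxed, s).groups())
```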
JohnOne (Author) Posted December 18, 2009

This is what $sString outputs in the console, but it looks completely different in the MsgBox:

">18/12/2009</span> <a href="/news/archive/ossie-eyes-prem-return.html">OSSIE EYES PREM RETURN<"

Without the double quotes, of course. If I am correct, the apparent whitespace contains a {RETURN} of some sort and a {TAB}, along with what seem to be spaces. It doesn't show here the way it shows in the console or MsgBox.
PsaltyDS Posted December 18, 2009

On 12/18/2009 at 5:50 PM, JohnOne said:
This is what $sString outputs in the console, but it looks completely different in the MsgBox:
">18/12/2009</span> <a href="/news/archive/ossie-eyes-prem-return.html">OSSIE EYES PREM RETURN<"
Without the double quotes, of course. If I am correct, the apparent whitespace contains a {RETURN} of some sort and a {TAB}, along with what seem to be spaces. It doesn't show here the way it shows in the console or MsgBox.

Still works for me with lots of miscellaneous whitespace inserted:

```autoit
#include <Array.au3>

Global $sString = '>18/12/2009</span> ' & @TAB & ' ' & @CR & ' ' & @LF & ' ' & @CRLF & _
        ' <a href="/news/archive/ossie-eyes-prem-return.html">OSSIE EYES PREM RETURN<'
Global $sRegExp = '([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})</span>\s*<a href="([^"]*)">([^<]*)'
Global $RET = StringRegExp($sString, $sRegExp, 3)
If @error Then
    ConsoleWrite("@error = " & @error & @LF)
Else
    _ArrayDisplay($RET, "$RET")
EndIf
```
JohnOne (Author) Posted December 18, 2009

That works for me too, but it doesn't work on the string from the source. There must be some sort of mad invisible character in the whitespace throwing StringRegExp() out of whack. Thanks for your time.
PsaltyDS Posted December 18, 2009

Use $bString = StringToBinary($sString) and check the result to see whether there is something odd in there.
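Python offers the same kind of byte-level inspection as StringToBinary(). In this sketch the sample string is a made-up stand-in for the scraped text, and the small lookup table naming the whitespace bytes is an added convenience:

```python
# Dump each byte as two hex digits, roughly what StringToBinary() displays
# for plain ASCII text, then name the invisible bytes.
sample = '</span>\r\n  \t<a'  # made-up stand-in for the scraped text
hex_dump = sample.encode('ascii').hex().upper()
print(hex_dump)  # 3C2F7370616E3E0D0A2020093C61

names = {0x09: 'TAB', 0x0A: 'LF', 0x0D: 'CR', 0x20: 'SPACE'}
labels = [names.get(b, chr(b)) for b in sample.encode('ascii')]
print(labels)
```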
JohnOne (Author) Posted December 18, 2009

No joy: no output from $bString in the console. Nothing ever goes smoothly for me, ever.

```autoit
#include <Inet.au3>
#include <String.au3>
#include <Array.au3>
Global $Url = "http://www.evertonfc.com/news/news-archive.html"
Global $sFile = _INetGetSource($Url)
Global $sString = _StringBetween($sFile,'<span class="date">','</a>',-1)
Global $sRegExp = '([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})</span>\s*<a href="([^"]*)">([^<]*)'
ConsoleWrite($sString[0])
$bString = StringToBinary($sString[0])
ConsoleWrite($bString)
```

Console output:

```
>Running:(3.3.0.0):F:\Program Files\AutoIt3\autoit3.exe "F:\Test1\test.au3"
18/12/2009</span>
<a href="/news/archive/ossie-eyes-prem-return.html">OSSIE EYES PREM RETURN
18/12/2009</span>
<a href="/news/archive/ossie-eyes-prem-return.html">OSSIE EYES PREM RETURN+>21:14:22 AutoIT3.exe ended.rc:0
+>21:14:23 AutoIt3Wrapper Finished
>Exit code: 0 Time: 11.668
```

EDIT for clarity: fixed the code, and the output has changed.

Edited December 18, 2009 by JohnOne
PsaltyDS Posted December 18, 2009

Doh! $sString is an array from _StringBetween(). You'll have to use $sString[0], not just $sString. Did you make the same mistake passing it into StringRegExp()? Hint: Yes, you did, in the second script in post #7.

Edited December 18, 2009 by PsaltyDS
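The array-versus-string mix-up can be sketched in Python with a hypothetical `string_between()` helper. It is a rough approximation of _StringBetween() built on a non-greedy regex, not its real implementation. Like _StringBetween(), it returns a list of results, so the pattern has to be applied to an element such as `found[0]`. The page fragment is assumed, with the CRLF, 20 spaces and tab modelled on the bytes JohnOne dumped in this thread.

```python
import re

def string_between(text, start, end):
    """Hypothetical, rough stand-in for _StringBetween(): return every substring
    found between `start` and `end` as a list, never as a single string."""
    return re.findall(re.escape(start) + '(.*?)' + re.escape(end), text, re.DOTALL)

# Assumed page fragment: CRLF, 20 spaces and a tab between </span> and <a.
page = ('<span class="date">18/12/2009</span>\r\n'
        + ' ' * 20 + '\t'
        + '<a href="/news/archive/ossie-eyes-prem-return.html">OSSIE EYES PREM RETURN</a>')
pattern = r'([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})</span>\s*<a href="([^"]*)">([^<]*)'

found = string_between(page, '<span class="date">', '</a>')
print(type(found).__name__)  # list

# Search an element, not the list itself; in Python, passing the list to
# re.search() would raise TypeError.
date, link, title = re.search(pattern, found[0]).groups()
print(date, link, title)
```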
JohnOne (Author) Posted December 18, 2009

Oops, I changed the wrong line in the code (I was trying different flags). It is now:

```autoit
$bString = StringToBinary($sString[0])
```

Output:

```
>Running:(3.3.0.0):F:\Program Files\AutoIt3\autoit3.exe "F:\Test1\test.au3"
18/12/2009</span>
<a href="/news/archive/ossie-eyes-prem-return.html">OSSIE EYES PREM RETURN
18/12/2009</span>
<a href="/news/archive/ossie-eyes-prem-return.html">OSSIE EYES PREM RETURN+>21:14:22 AutoIT3.exe ended.rc:0
+>21:14:23 AutoIt3Wrapper Finished
>Exit code: 0 Time: 11.668
```

Well, that's certainly not right; it's exactly the same.

EDIT2: edited post 13 to correct the code and output.

Edited December 18, 2009 by JohnOne
PsaltyDS Posted December 18, 2009

Post exactly what you're running now. Because this:

```autoit
Global $sString[1] = ['18/12/2009</span>' & @CRLF & _
        '<a href="/news/archive/ossie-eyes-prem-return.html">OSSIE EYES PREM RETURN18/12/2009</span>']
Global $bString = StringToBinary($sString[0])
ConsoleWrite("$bString = " & $bString & @LF)
```

Outputs this:

```
>Running:(3.3.2.0):C:\Program Files\AutoIt3\autoit3.exe "C:\Program Files\AutoIt3\Testing\Test1.au3"
$bString = 0x31382F31322F323030393C2F7370616E3E0D0A3C6120687265663D222F6E6577732F617263686976652F6F737369652D657965732D7072656D2D72657475726E2E68746D6C223E4F535349452045594553205052454D2052455455524E31382F31322F323030393C2F7370616E3E
```
JohnOne (Author) Posted December 18, 2009

Just keeps getting weirder.

```autoit
#include <Inet.au3>
#include <String.au3>
#include <Array.au3>
Global $Url = "http://www.evertonfc.com/news/news-archive.html"
Global $sFile = _INetGetSource($Url)
Global $sString = _StringBetween($sFile,'<span class="date">','</a>',-1)
Global $sRegExp = '([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})</span>\s*<a href="([^"]*)">([^<]*)'
ConsoleWrite($sString[0] & @CRLF)
$bString = StringToBinary($sString[0])
ConsoleWrite($bString)
$blah = "abcdefg"
$blahblah = StringToBinary($blah)
ConsoleWrite($blahblah)
```

```
>"F:\Program Files\AutoIt3\SciTE\AutoIt3Wrapper\AutoIt3Wrapper.exe" /run /prod /ErrorStdOut /in "F:\Test1\test.au3" /autoit3dir "F:\Program Files\AutoIt3" /UserParams
+>21:21:37 Starting AutoIt3Wrapper v.2.0.0.1 Environment(Language:0409 Keyboard:00000809 OS:WIN_VISTA/ CPU:X86 OS:X86)
>Running AU3Check (1.54.14.0) from:F:\Program Files\AutoIt3
+>21:21:37 AU3Check ended.rc:0
>Running:(3.3.0.0):F:\Program Files\AutoIt3\autoit3.exe "F:\Test1\testp.au3"
18/12/2009</span>
<a href="/news/archive/ossie-eyes-prem-return.html">OSSIE EYES PREM RETURN
18/12/2009</span>
<a href="/news/archive/ossie-eyes-prem-return.html">OSSIE EYES PREM RETURNabcdefg+>21:21:42 AutoIT3.exe ended.rc:0
+>21:21:43 AutoIt3Wrapper Finished
>Exit code: 0 Time: 6.446
```
JohnOne (Author) Posted December 18, 2009

Changed ConsoleWrite($bString) to ConsoleWrite($bString & @CRLF). Now outputting:

```
>Running:(3.3.0.0):F:\Program Files\AutoIt3\autoit3.exe "F:\Test1\test.au3"
18/12/2009</span>
<a href="/news/archive/ossie-eyes-prem-return.html">OSSIE EYES PREM RETURN
0x31382F31322F323030393C2F7370616E3E0D0A2020202020202020202020202020202020202020093C6120687265663D222F6E6577732F617263686976652F6F737369652D657965732D7072656D2D72657475726E2E68746D6C223E4F535349452045594553205052454D2052455455524E
abcdefg
+>21:32:07 AutoIT3.exe ended.rc:0
+>21:32:08 AutoIt3Wrapper Finished
>Exit code: 0 Time: 6.744
```
PsaltyDS Posted December 18, 2009

That section of whitespace is:

0x3E0D0A2020202020202020202020202020202020202020093C

That's just @CRLF, some spaces, and one @TAB between the > and <. Nothing strange there.

Edited December 18, 2009 by PsaltyDS
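The bytes PsaltyDS picked out can be decoded and tested directly. In this Python sketch `bytes.fromhex` plays the part of reading the StringToBinary() output, and the final check confirms that `\s*` covers the CR, LF, spaces and tab in one go:

```python
import re

# The whitespace run quoted above, from the '>' of </span> to the '<' of <a.
raw = bytes.fromhex('3E0D0A2020202020202020202020202020202020202020093C')
text = raw.decode('ascii')
print(repr(text))

inner = text[3:-2]  # everything between the CRLF and the tab
print(len(inner), set(inner))

# \s matches CR, LF, space and tab alike, so \s* consumes the whole run.
print(bool(re.fullmatch(r'>\s*<', text)))  # True
```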
JohnOne (Author) Posted December 18, 2009

Yup, I've just been looking myself. I got it to 0D0A202020202020202020202020202020202020202009 = @CRLF, 20 spaces, and one TAB. That just makes it worse; I really am cracking up now.

EDIT: Going to try a different OS. Thanks again mate, much obliged.

Edited December 18, 2009 by JohnOne