leuce
Posted May 13, 2007

G'day everyone

Can someone please tell me what's wrong with my StringRegExp query here? I'm querying a file downloaded from Wikipedia. I want the script to find this:

    nl.wikipedia.org/wiki/Lounge">Nederlands

and add this to the array:

    Lounge

Here's my non-working code:

    $result = StringRegExp($wikipage2, '(?:nl\.wikipedia\.org/wiki/)(.+)(?:">Nederlands)', 1)

Please tell me what's wrong with it.

Thanks
Samuel

Xenobiologist
Posted May 13, 2007

Hi,

try this pattern:

    (?<=nl\.wikipedia\.org/wiki/).*(?=">Nederlands)

So long,
Mega

Scripts & functions
Organize Includes: Let Scite organize the include files
Yahtzee: The game "Yahtzee" (Kniffel, DiceLion)
LoginWrapper: Secure scripts by adding a query (authentication)
_RunOnlyOnThis UDF: Make sure that a script can only be executed on ... (Windows / HD / ...)
Internet-Café Server/Client Application: Open CD, Start Browser, Lock remote client, etc.
MultipleFuncsWithOneHotkey: Start different funcs by hitting one hotkey different times

leuce
Posted May 14, 2007

Quote:
    Try this pattern (?<=nl\.wikipedia\.org/wiki/).*(?=">Nederlands)

Thanks. I really don't understand it, though. The AutoIt beta help file does not mention "?<=" or "?=" at all. Where did you get those, and what do they mean?

Xenobiologist
Posted May 14, 2007

Quote:
    Thanks. I really don't understand it, though. The AutoIt beta help file does not mention "?<=" or "?=" at all. Where did you get those, and what do they mean?

Hi,

look at the wiki or any Perl explanation. It is very simple.

    (?<=nl\.wikipedia\.org/wiki/).*(?=">Nederlands)

This means:

(?<=...) = check that this text comes immediately before the search pattern
.* = the search pattern, in this case anything
(?=...) = check that this text comes immediately after the search pattern

The keywords are lookbehind (the first one) and lookahead (the last one). These are the positive forms; negative ones, (?<!...) and (?!...), are also possible.

So long,
Mega
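
To make the explanation above concrete, here is a minimal sketch of the lookaround pattern in use. It assumes a current AutoIt release, where StringRegExp is PCRE-based and flag 1 returns the whole match at index 0 when the pattern has no capturing groups. The variable names and the sample string are only for illustration.

```autoit
; Sample input, modelled on the interwiki link discussed in this thread.
Local $sHtml = '<li class="interwiki-nl"><a href="http://nl.wikipedia.org/wiki/Lounge">Nederlands</a></li>'

; (?<=...) must match immediately before the result and
; (?=...) immediately after it; neither becomes part of the result.
Local $aMatch = StringRegExp($sHtml, '(?<=nl\.wikipedia\.org/wiki/).*(?=">Nederlands)', 1)

If @error Then
    ; StringRegExp sets @error when nothing matches or the pattern is invalid.
    ConsoleWrite("No match" & @CRLF)
Else
    ; No capturing groups, so index 0 holds the matched text (the page title).
    ConsoleWrite($aMatch[0] & @CRLF)
EndIf
```

Because the two assertions consume no characters, the array holds only the text between them.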

leuce
Posted May 14, 2007

Quote:
    (?<=nl\.wikipedia\.org/wiki/).*(?=">Nederlands)
    This means:
    (?<=...) = Look that this is ahead of the search pattern
    .* = the search pattern in this case anything
    (?=...) = Look that this comes after the search pattern.

Thanks. Well, AutoIt doesn't seem to recognise it. But I finally figured out the correct regex:

    '(nl\.wikipedia\.org/wiki/)(.*)(">Nederlands)'

Don't ask me why, but it works. According to the AutoIt help file I have to specifically exclude groups by using (?: ... ), and in the example above I did not exclude any groups, and yet AutoIt only adds the middle group to the array. Weird, but as long as it works:

    $result = StringRegExp('<li class="interwiki-nl"><a href="http://nl.wikipedia.org/wiki/Wetenschap">Nederlands</a></li>', '(nl\.wikipedia\.org/wiki/)(.+)(">Nederlands)', 1)
    MsgBox (0, "", $result[1], 10)

The above MsgBox yields "Wetenschap", which was what I was hoping for.

Thanks for your help.
Samuel
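
A likely explanation for the "weird" result, assuming StringRegExp behaves as it does in current AutoIt releases (the 2007 beta may have differed): with flag 1, every capturing group gets its own array element, so the three-group pattern probably fills indexes 0, 1 and 2, and $result[1] simply happens to be the middle group. The sketch below lists all elements to check this, then shows a single-group variant where the title sits at index 0. The `[^"]+` class is a substitution of mine for `.+`, not part of the original posts.

```autoit
Local $sHtml = '<li class="interwiki-nl"><a href="http://nl.wikipedia.org/wiki/Wetenschap">Nederlands</a></li>'

; Three capturing groups: list everything StringRegExp returned.
Local $aGroups = StringRegExp($sHtml, '(nl\.wikipedia\.org/wiki/)(.+)(">Nederlands)', 1)
If Not @error Then
    For $i = 0 To UBound($aGroups) - 1
        ConsoleWrite("[" & $i & "] " & $aGroups[$i] & @CRLF)
    Next
EndIf

; One capturing group: only the title is captured, so it sits at index 0.
; [^"]+ stops at the first double quote instead of running to the end of the line.
Local $aTitle = StringRegExp($sHtml, 'nl\.wikipedia\.org/wiki/([^"]+)">Nederlands', 1)
If Not @error Then ConsoleWrite($aTitle[0] & @CRLF)
```

Wrapping the outer parts in (?: ... ), or dropping their parentheses altogether as in the second pattern, is what keeps them out of the array.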

leuce
Posted May 14, 2007

Now for my next problem (or rather, the original one). Here is the script which should work, but gives me an array error:

    $pageurl = FileReadLine("URLs.txt", $i)
    $wikipage = InetGet($pageurl, "foo.txt")
    $wikipage2 = FileOpen ("foo.txt", 0)
    $wikipage3 = FileRead ("foo.txt", 0)
    Sleep ("2000")
    $result = StringRegExp($wikipage3, '(nl\.wikipedia\.org/wiki/)(.*)(">Nederlands)', 1)
    MsgBox (0, "", $result[1], 10)

In the above example, the file URLs.txt contains a URL of a page that definitely matches the regexp. I've checked it, and it does match it. So the problem seems to be that I can't get the script to recognise $wikipage3.

You'll notice that I had added $wikipage2 and $wikipage3 later, because I initially thought I could pipe the InetGet contents directly into the StringRegExp... but that doesn't seem to be the case.

How can I tell AutoIt to do the StringRegExp on the page that was downloaded using InetGet?

Thanks
Samuel

Xenobiologist
Posted May 14, 2007

Hi,

    ;Extract link
    Global $str = '<li class="interwiki-nl"><a href="http://nl.wikipedia.org/wiki/Wetenschap">Nederlands</a></li>'
    $link = StringRegExp($str, '(?<=nl\.wikipedia\.org/wiki/).*(?=">Nederlands)', 1)
    MsgBox(64, 'Link', $link[0])

    $result = StringRegExp($str, '(nl\.wikipedia\.org/wiki/)(.+)(">Nederlands)', 1)
    MsgBox (64, "Link", $result[1], 10)

So long,
Mega

leuce
Posted May 14, 2007

Quote:
    Global $str = '<li class="interwiki-nl"><a href="http://nl.wikipedia.org/wiki/Wetenschap">Nederlands</a></li>'
    $link = StringRegExp($str, '(?<=nl\.wikipedia\.org/wiki/).*(?=">Nederlands)', 1)
    MsgBox(64, 'Link', $link[0])

[...] instead?

My main hiccup now is to get AutoIt to do the regex search on the entire opened file. I guess I could do FileReadLine one line at a time and run the regex search on that, but it would slow down the script tremendously.

Thanks again
Samuel

Xenobiologist
Posted May 14, 2007

Hi,

something like this:

    $link = StringRegExp(FileRead(FileOpen('c:\test.txt', 0)), '(?<=nl\.wikipedia\.org/wiki/).*(?=">Nederlands)', 1)
    If IsArray($link) Then MsgBox(64, "link", $link[0])

So long,
Mega
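
The one-liner above can be spelled out as a slightly more defensive sketch. Two assumptions of mine, not confirmed in the thread: the count argument 0 in the earlier FileRead("foo.txt", 0) call may be why $wikipage3 came back empty, and the "array error" is what AutoIt reports when $result[1] is indexed after a failed match returned no array. The temporary file name and sample content are hypothetical stand-ins for the page saved by InetGet.

```autoit
; Hypothetical sample file standing in for the downloaded page.
Local $sFile = @TempDir & "\interwiki_sample.txt"
FileDelete($sFile)
FileWrite($sFile, '<li class="interwiki-de"><a href="http://de.wikipedia.org/wiki/Wissenschaft">Deutsch</a></li>' & @CRLF & _
        '<li class="interwiki-nl"><a href="http://nl.wikipedia.org/wiki/Wetenschap">Nederlands</a></li>' & @CRLF)

; Given a file name and no count, FileRead returns the whole file as one string,
; so no FileOpen call and no line-by-line loop are needed.
Local $sPage = FileRead($sFile)

If $sPage = "" Then
    ConsoleWrite("Nothing could be read from " & $sFile & @CRLF)
Else
    Local $aTitle = StringRegExp($sPage, 'nl\.wikipedia\.org/wiki/([^"]+)">Nederlands', 1)
    If @error Then
        ; No array is returned on failure, so never index the result here.
        ConsoleWrite("No Dutch interwiki link found" & @CRLF)
    Else
        ConsoleWrite($aTitle[0] & @CRLF)
    EndIf
EndIf

FileDelete($sFile) ; remove the sample file again
```

Checking @error (or IsArray, as in the post above) before indexing is what prevents the array error when a page has no Dutch link.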