My StringRegExp doesn't get it

leuce

G'day everyone

Please can someone just tell me what's wrong with my StringRegExp query here?

I'm querying a file downloaded from Wikipedia. I want the script to find this:

nl.wikipedia.org/wiki/Lounge">Nederlands

and add this to the array:

Lounge

Here's my non-working code:

$result = StringRegExp($wikipage2, '(?:nl\.wikipedia\.org/wiki/)(.+)(?:">Nederlands)', 1)

Please tell me what's wrong with it.

Thanks

Samuel

Xenobiologist

Hi,

try this pattern (?<=nl\.wikipedia\.org/wiki/).*(?=">Nederlands)

So long,

Mega


Scripts & functions Organize Includes Let Scite organize the include files

Yahtzee The game "Yahtzee" (Kniffel, DiceLion)

LoginWrapper Secure scripts by adding a query (authentication)

_RunOnlyOnThis UDF Make sure that a script can only be executed on ... (Windows / HD / ...)

Internet-Café Server/Client Application Open CD, Start Browser, Lock remote client, etc.

MultipleFuncsWithOneHotkey Start different funcs by hitting one hotkey different times

leuce

Thanks. I really don't understand it, though. The AutoIt beta help file does not mention "?<=" or "?=" at all. Where did you get those, and what do they mean?

Xenobiologist

Hi,

Look at the wiki or any Perl regular expression tutorial.

It is very simple.

(?<=nl\.wikipedia\.org/wiki/).*(?=">Nederlands)

This means:

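(?<=...) = lookbehind: check that this text comes immediately before the search pattern, without including it in the match

.* = the search pattern itself, in this case anything

(?=...) = lookahead: check that this text comes immediately after the search pattern, without including it in the match

The keywords are lookahead and lookbehind. These two are the positive forms; negative forms also exist, written (?<!...) and (?!...).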
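
To make the explanation above concrete, here is a minimal sketch that runs the lookaround pattern against the sample string from the first post. The variable names are made up for illustration, and it assumes an AutoIt build whose regex engine supports lookarounds.

```autoit
; Sample input taken from the first post in this thread
Local $sHtml = 'nl.wikipedia.org/wiki/Lounge">Nederlands'

; The lookbehind and lookahead are checked but are not part of the match,
; and the pattern has no capturing groups, so element 0 is the whole match.
Local $aMatch = StringRegExp($sHtml, '(?<=nl\.wikipedia\.org/wiki/).*(?=">Nederlands)', 1)

If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    ConsoleWrite($aMatch[0] & @CRLF) ; prints "Lounge"
EndIf
```

Checking @error before indexing the result avoids the array error you get when nothing matches.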

So long,

Mega

leuce

Thanks. Well, AutoIt doesn't seem to recognise it. But I finally figured out the correct regex:

'(nl\.wikipedia\.org/wiki/)(.*)(">Nederlands)'

Don't ask me why, but it works. According to the AutoIt help file I have to specifically exclude groups by using (?: ... ), and in the example above I did not exclude any groups, and yet AutoIt only adds the middle group to the array. Weird, but as long as it works:

$result = StringRegExp('<li class="interwiki-nl"><a href="http://nl.wikipedia.org/wiki/Wetenschap">Nederlands</a></li>', '(nl\.wikipedia\.org/wiki/)(.+)(">Nederlands)', 1)
MsgBox (0, "", $result[1], 10)

The above MsgBox yields "Wetenschap", which was what I was hoping for.
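
A likely explanation for the "weird" behaviour, sketched below: with flag 1, StringRegExp returns every captured group in pattern order, so all three groups land in the array and $result[1] simply happens to be the middle one. With (?: ... ) around the outer parts, only one group is captured and it sits at index 0. If the first attempt was also read with $result[1], that could be why it appeared not to work, though that is a guess. The variable names here are illustrative only.

```autoit
Local $sHtml = '<li class="interwiki-nl"><a href="http://nl.wikipedia.org/wiki/Wetenschap">Nederlands</a></li>'

; Three capturing groups: the array holds all three, in order.
Local $aAll = StringRegExp($sHtml, '(nl\.wikipedia\.org/wiki/)(.+)(">Nederlands)', 1)
If Not @error Then
    For $i = 0 To UBound($aAll) - 1
        ConsoleWrite($i & ": " & $aAll[$i] & @CRLF)
    Next
EndIf

; Non-capturing outer groups: only the middle group is stored, at index 0.
Local $aOne = StringRegExp($sHtml, '(?:nl\.wikipedia\.org/wiki/)(.+)(?:">Nederlands)', 1)
If Not @error Then ConsoleWrite("only group: " & $aOne[0] & @CRLF)
```

The first loop should list the URL prefix at index 0, "Wetenschap" at index 1, and the trailing '">Nederlands' at index 2.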

Thanks for your help.

Samuel

leuce

Now for my next problem (or rather, the original one):

Here is the script which should work, but gives me an array error:

$pageurl = FileReadLine("URLs.txt", $i)
$wikipage = InetGet($pageurl, "foo.txt")
$wikipage2 = FileOpen ("foo.txt", 0)
$wikipage3 = FileRead ("foo.txt", 0)

Sleep ("2000")

$result = StringRegExp($wikipage3, '(nl\.wikipedia\.org/wiki/)(.*)(">Nederlands)', 1)
MsgBox (0, "", $result[1], 10)

In the above example, the file URLs.txt contains a URL of a page that definitely matches the regexp. I've checked it, and it does match it. So the problem seems to be that I can't get the script to recognise $wikipage3. You'll notice that I had added $wikipage2 and $wikipage3 later, because I initially thought I could pipe the InetGet contents directly into the StringRegExp... but that doesn't seem to be the case.

How can I tell AutoIt to do the StringRegExp on the page that was downloaded using InetGet?

Thanks

Samuel

Xenobiologist

Hi,

;Extract link

Global $str = '<li class="interwiki-nl"><a href="http://nl.wikipedia.org/wiki/Wetenschap">Nederlands</a></li>'

$link = StringRegExp($str, '(?<=nl\.wikipedia\.org/wiki/).*(?=">Nederlands)', 1)
MsgBox(64, 'Link', $link[0])

$result = StringRegExp($str, '(nl\.wikipedia\.org/wiki/)(.+)(">Nederlands)', 1)
MsgBox (64, "Link", $result[1], 10)

So long,

Mega

leuce

Global $str = '<li class="interwiki-nl"><a href="http://nl.wikipedia.org/wiki/Wetenschap">Nederlands</a></li>'

$link = StringRegExp($str, '(?<=nl\.wikipedia\.org/wiki/).*(?=">Nederlands)', 1)
MsgBox(64, 'Link', $link[0])

[...] instead? My main hiccup now is to get AutoIt to do the regex search on the entire opened file.

I guess I could do FileReadLine one line at a time and run the regex search on each line, but that would slow the script down tremendously.

Thanks again

Samuel

Xenobiologist

Hi,

Something like this:

; FileRead accepts a file name directly, so no file handle is left open
$text = FileRead('c:\test.txt')
$link = StringRegExp($text, '(?<=nl\.wikipedia\.org/wiki/).*(?=">Nederlands)', 1)
If IsArray($link) Then MsgBox(64, "link", $link[0])
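
Putting the download and the search together, a sketch of the whole flow might look like the following. The URL and file name are hypothetical, the page markup is assumed to match the interwiki link shown earlier in the thread, and treating a return value of 0 from InetGet as failure is a simplification. The 0 passed as the count argument to FileRead in the earlier script is a possible reason $wikipage3 came back empty, but that is an assumption; here the count is simply left out so the whole file is read.

```autoit
; Hypothetical URL and file name, for illustration only
Local $sUrl = "http://en.wikipedia.org/wiki/Lounge"
Local $sFile = @TempDir & "\foo.txt"

; Download in the foreground; a return value of 0 is treated as failure here
If InetGet($sUrl, $sFile) = 0 Then
    ConsoleWrite("Download failed" & @CRLF)
    Exit 1
EndIf

; Read the whole file by name: no FileOpen and no count argument needed
Local $sPage = FileRead($sFile)

; [^"]+ stops at the closing quote of the href, so the capture cannot
; run past the end of the link even on a long line
Local $aName = StringRegExp($sPage, 'nl\.wikipedia\.org/wiki/([^"]+)">Nederlands', 1)
If @error Then
    ConsoleWrite("No Dutch interwiki link found" & @CRLF)
Else
    ConsoleWrite($aName[0] & @CRLF)
EndIf
```

Because the download runs in the foreground, the script does not continue until the file is complete, so no Sleep is needed before reading it.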

So long,

Mega
