phew

Google RegExp


Hello,

I'm trying to write my own Google script that searches for a pattern and returns the results from the first Google results page. It's working so far; I just have one problem, and I guess it's my regexp (I'm not good at regexps, confusing things!).

The line $srcv = TCPRecv($sock, 10000) receives the data from Google: the page source of the first results page. For example, when I search for "testpattern", the page source of http://www.google.de/search?hl=de&q=te...Suche&meta= is saved as a string in $srcv.

Now I want the script to find ALL URLs in this page source, using:

$www = StringRegExp($srcv, '<a href="(.*?)" class=l>', 1)
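A note on the call above, with a minimal sketch: with flag 1, AutoIt's StringRegExp returns the capture group(s) of the first match only and sets @extended to the position just after that match. One way to collect every URL is to loop and pass @extended back in as the fourth (offset) parameter; flag 3, which returns all matches in one array, is the other option. $sSrc below is a hypothetical stand-in for $srcv.

```autoit
; Sketch: collect every match of the pattern using flag 1 plus an offset loop.
; $sSrc is a hypothetical stand-in for the page source held in $srcv.
Local $sSrc = '<a href="http://www.testpattern.de/" class=l> some text ' & _
        '<a href="http://www.testpattern.org/" class=l>'
Local $aMatch, $iOffset = 1, $iCount = 0

While 1
    ; Flag 1: return the capture group(s) of the next match at or after $iOffset
    $aMatch = StringRegExp($sSrc, '<a href="(.*?)" class=l>', 1, $iOffset)
    If @error Then ExitLoop ; no further match
    $iOffset = @extended ; resume the search right after this match
    $iCount += 1
    ConsoleWrite($aMatch[0] & @CRLF) ; the captured URL
WEnd
```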

That part works: for example, I can write a urls.txt containing the strings that match the regexp. But there I also run into a problem.

In my page source there is:

<a href="http://testpattern.msnbc.msn.com/" class=l>
<a href="http://testpattern.msnbc.msn.com/archive/2007/08/06/299549.aspx" class=l>
<a href="http://www.msnbc.msn.com/id/4326967/" class=l>
<a href="http://www.testpattern.de/" class=l>
<a href="http://forum.de.selfhtml.org/archiv/2007/5/t152535/" class=l>     <---------- this one is not matched
<a href="http://www.testpattern.org/" class=l>
<a href="http://ivs.cs.uni-magdeburg.de/~dumke/ST1/GBeleg.html" class=l>
<a href="http://forum.de.selfhtml.org/archiv/2007/5/t152535/" class=l>
<a href="http://www.prunejuice.net/testpattern/" class=l>
[...] and much more unneeded stuff and a few other <a href="......" class=l>

My script then writes all the filtered URLs to urls.txt. Here is my result for searching for "testpattern" at google.com:

http://testpattern.msnbc.msn.com/
http://testpattern.msnbc.msn.com/archive/2007/08/06/299549.aspx
http://www.msnbc.msn.com/id/4326967/
http://www.testpattern.de/
http://www.testpattern.org/
http://ivs.cs.uni-magdeburg.de/~dumke/ST1/GBeleg.html
http://www.prunejuice.net/testpattern/

In $srcv (the page source of the Google results page), "forum.de.selfhtml.org/archiv/2007/5/t152535/" appears in this form:

[...] <a href="http://forum.de.selfhtml.org/archiv/2007/5/t152535/" class=l> [...]

Why is this link not matched by my regexp? It never gets written to my urls.txt. I guess there must be something wrong with my regexp, but I have no clue what!
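A minimal diagnostic sketch, assuming the real $srcv may differ from the tidy lines quoted above. If the result page arrives as long lines, one thing worth checking is whether another <a href="..."> without class=l sits in front of the missing link on the same line: the lazy (.*?) does not stop at the closing quote, so the match can start at that earlier link and swallow the result URL into a junk string. The sample input is hypothetical, and the stricter [^"]* pattern is a suggested alternative, not a confirmed fix for this page.

```autoit
; Diagnostic sketch. $sHtml is hypothetical test input: the middle line puts an
; ordinary link (no class=l) in front of a result link on the same line.
Local $sHtml = '<a href="http://www.testpattern.de/" class=l>' & @CRLF & _
        '<a href="/hypothetical-other-link">other</a> ' & _
        '<a href="http://forum.de.selfhtml.org/archiv/2007/5/t152535/" class=l>' & @CRLF & _
        '<a href="http://www.testpattern.org/" class=l>'

; Original pattern, flag 3 = array of all matches (capture group 1 of each).
; The lazy (.*?) starts at the first <a href=" on the middle line and runs on
; to the next '" class=l>', so element [1] is a junk string, not the forum URL.
Local $aUrls = StringRegExp($sHtml, '<a href="(.*?)" class=l>', 3)

; Stricter capture: [^"]* cannot cross a closing quote, and [^>]* allows other
; attributes before class=l without leaving the tag.
Local $aStrict = StringRegExp($sHtml, '<a href="([^"]*)"[^>]*class=l[\s>]', 3)

; Compare both results and write the stricter ones to urls.txt (mode 2 = overwrite).
Local $hFile = FileOpen("urls.txt", 2)
For $i = 0 To UBound($aStrict) - 1
    ConsoleWrite("original: " & $aUrls[$i] & @CRLF & "stricter: " & $aStrict[$i] & @CRLF)
    FileWriteLine($hFile, $aStrict[$i])
Next
FileClose($hFile)
```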

Please help. Greetings




If you use IE.au3 you can just use _IELinkGetCollection.
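A minimal sketch of that approach, assuming Internet Explorer and AutoIt's standard IE.au3 UDF are available. The search URL is a hypothetical example, and filtering on className "l" assumes the result links still carry class=l as in the page source quoted in the question.

```autoit
#include <IE.au3>

; Open the (hypothetical) search URL in a hidden IE window:
; $iTryAttach = 0 (new window), $iVisible = 0 (hidden); waits for the page to load.
Local $oIE = _IECreate("http://www.google.com/search?q=testpattern", 0, 0)

; Collection of all <a> elements on the loaded page.
Local $oLinks = _IELinkGetCollection($oIE)

Local $hFile = FileOpen("urls.txt", 2) ; 2 = erase previous contents
For $oLink In $oLinks
    ; Keep only links whose class attribute is "l" (the result links in the question).
    If $oLink.className = "l" Then FileWriteLine($hFile, $oLink.href)
Next
FileClose($hFile)

_IEQuit($oIE)
```

Because the browser parses the HTML, this avoids writing a regexp for the markup at all; the trade-off is a dependency on Internet Explorer.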


Thank you very much =)
