phew Posted October 18, 2007

Hello,

I'm trying to write my own Google script: it searches for a pattern and returns the results from the first Google results page. It works so far, but I have one problem, and I guess it's my regexp (I'm not good at regexps, they confuse me).

My $srcv = TCPRecv($sock, 10000) receives the data from Google, i.e. the page source of the first results page when I search for "testpattern". The source of http://www.google.de/search?hl=de&q=te...Suche&meta= is saved as a string in $srcv.

Now I want the script to find ALL URLs in this source using:

$www = StringRegExp($srcv, '<a href="(.*?)" class=l>', 1)

This works so far: I can, for example, write a urls.txt with the strings matching the regexp. But there is a problem. The page source contains:

<a href="http://testpattern.msnbc.msn.com/" class=l>
<a href="http://testpattern.msnbc.msn.com/archive/2007/08/06/299549.aspx" class=l>
<a href="http://www.msnbc.msn.com/id/4326967/" class=l>
<a href="http://www.testpattern.de/" class=l>
<a href="http://forum.de.selfhtml.org/archiv/2007/5/t152535/" class=l> <---------- this one is not matched
<a href="http://www.testpattern.org/" class=l>
<a href="http://ivs.cs.uni-magdeburg.de/~dumke/ST1/GBeleg.html" class=l>
<a href="http://forum.de.selfhtml.org/archiv/2007/5/t152535/" class=l>
<a href="http://www.prunejuice.net/testpattern/" class=l>
[...] plus a lot of unneeded stuff and a few other <a href="......" class=l>

My script writes all filtered URLs to urls.txt. Here is my result for searching for "testpattern" at google.com:

http://testpattern.msnbc.msn.com/
http://testpattern.msnbc.msn.com/archive/2007/08/06/299549.aspx
http://www.msnbc.msn.com/id/4326967/
http://www.testpattern.de/
http://www.testpattern.org/
http://ivs.cs.uni-magdeburg.de/~dumke/ST1/GBeleg.html
http://www.prunejuice.net/testpattern/

In $srcv (the source of the Google results page), "forum.de.selfhtml.org/archiv/2007/5/t152535/" is written in this form:

[...] <a href="http://forum.de.selfhtml.org/archiv/2007/5/t152535/" class=l> [...]

Why doesn't my regexp match this link? It is not written to urls.txt. I guess there must be something wrong with my regexp, but I have no clue what.

Please help. Greets
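One way to narrow this down is to run the same pattern against a small fixed string instead of live data. The sketch below is only an illustration: $sSample is a hypothetical stand-in for $srcv, filled with three lines copied from the post, and it uses flag 3 (array of global matches) rather than the flag 1 from the question. If the forum.de.selfhtml.org URL shows up here, the pattern is probably fine and the string actually received over the socket is worth checking instead.

```autoit
; $sSample is a hypothetical stand-in for $srcv, built from lines quoted in the post.
Local $sSample = '<a href="http://www.testpattern.de/" class=l>' & @CRLF & _
        '<a href="http://forum.de.selfhtml.org/archiv/2007/5/t152535/" class=l>' & @CRLF & _
        '<a href="http://www.testpattern.org/" class=l>'

; Flag 3 asks StringRegExp for an array holding every match of the capture group.
Local $aUrls = StringRegExp($sSample, '<a href="(.*?)" class=l>', 3)

If @error Then
    ; @error is set when nothing matched or the pattern is invalid.
    ConsoleWrite("no match" & @CRLF)
Else
    ; Print each captured URL on its own line.
    For $i = 0 To UBound($aUrls) - 1
        ConsoleWrite($aUrls[$i] & @CRLF)
    Next
EndIf
```

Writing $srcv itself to a file and searching it for the missing URL would show whether the link really arrived in one piece.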
Thatsgreat2345 Posted October 18, 2007

If you use IE.au3 you can just use _IELinkGetCollection.
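A minimal sketch of that suggestion, assuming the IE.au3 UDF is available and Internet Explorer is installed. The search URL in $sUrl is a made-up example (the URL in the question is truncated), and the output file name urls.txt is taken from the question.

```autoit
#include <IE.au3>

; Hypothetical search URL for illustration only.
Local $sUrl = "http://www.google.de/search?hl=de&q=testpattern"

; Open the page in an Internet Explorer window and wait for it to load.
Local $oIE = _IECreate($sUrl)

; Collect all link elements on the page; @extended holds the number of links.
Local $oLinks = _IELinkGetCollection($oIE)
Local $iCount = @extended

; Append every link target to urls.txt.
Local $hFile = FileOpen("urls.txt", 1) ; 1 = write mode, append to end of file
For $oLink In $oLinks
    FileWriteLine($hFile, $oLink.href)
Next
FileClose($hFile)

ConsoleWrite($iCount & " links written" & @CRLF)
_IEQuit($oIE)
```

This returns every link on the page, not only the result links. Checking $oLink.className against "l" inside the loop may be one way to keep just the class=l anchors, though that is an assumption about how the page is marked up.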
phew Posted October 18, 2007

Thatsgreat2345 wrote: "If you use IE.au3 you can just use _IELinkGetCollection."

Thank you very much =)