get some links


Hi everyone. I'm trying to get the links from a Google SERP (the results page of a query).

Ultimately, what I want is to get the green links that the Google results page shows after you run a query.

I can't understand why I get no result when I use _StringBetween with the tags I want. Here is my code:

    query_final = "www.google.es/search?&gl= XX&proba=YY&q=sushi+barcelona"
    $WinHttpReq = ObjCreate("winhttp.winhttprequest.5.1")   ; Make the HTTP request
    $URL = "http://" & $query_final[0]
    $WinHttpReq.open("GET",$URL)
    $WinHttpReq.send()
    $HTML = $WinHttpReq.ResponseText                        ; Retrieve the HTML of the page

    $center = _StringBetween($HTML, '<div id="center_col">','</div>')
    ;$links = _StringBetween($HTML, '<cite>','</cite>') ; this is the tag that holds the link I want, but I can't get a result
    ConsoleWrite(@error & @CRLF)
    ConsoleWrite("central column: " & $centro[0])

I tried different tags in _StringBetween without success. In this code I only try to grab one part of the results page (the center column) so that I can extract the links from it afterwards, but it doesn't work. Does anyone know what I'm doing wrong or how to make it work? The result is always that no matching string is found :s

I don't want to use the IE functions because I want the query to be independent of the client, so I need to use WinHttpReq.open.

Thanks a lot :blink:


Sorry, I changed it too quickly. This is the code:

    query_final = "www.google.es/search?&gl= XX&proba=YY&q=sushi+barcelona"
    $WinHttpReq = ObjCreate("winhttp.winhttprequest.5.1")   ; Make the HTTP request
    $URL = "http://" & $query_final[0]
    $WinHttpReq.open("GET",$URL)
    $WinHttpReq.send()
    $HTML = $WinHttpReq.ResponseText                        ; Retrieve the HTML of the page

    $center = _StringBetween($HTML, '<div id="center_col">','</div>')
    ;$links = _StringBetween($HTML, '<cite>','</cite>') ; this is the tag that holds the link I want, but I can't get a result
    ConsoleWrite(@error & @CRLF)
    ConsoleWrite("central column: " & $center[0])

I put that in because I wanted to see how many results the function returned.


There are still errors in your script: $query_final[0]?

Even if I correct that error, the script still refuses to work on my computer: it exits on $WinHttpReq.send()

What you can do is check whether you get anything back in $HTML at all. If you don't, then _StringBetween has nothing to search and will obviously return nothing.

If you do get something in $HTML, look in it for the tags you pass to _StringBetween ...
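A minimal sketch of that check, under some assumptions: the URL is only a placeholder, the variable names are made up for illustration, and the script simply stops if the server does not answer with status 200. It looks at the HTTP status and the size of the response first, and only then calls _StringBetween and tests @error.

```autoit
#include <String.au3> ; provides _StringBetween

; Placeholder URL (assumption); substitute the query you actually need.
Local $sURL = "http://www.google.es/search?q=sushi+barcelona"

Local $oHTTP = ObjCreate("WinHttp.WinHttpRequest.5.1")
If Not IsObj($oHTTP) Then
    ConsoleWrite("Could not create the WinHttpRequest object" & @CRLF)
    Exit 1
EndIf

$oHTTP.Open("GET", $sURL, False) ; False = synchronous request
$oHTTP.Send()

; Inspect the response before trying to parse anything
Local $iStatus = $oHTTP.Status
Local $sHTML = $oHTTP.ResponseText
ConsoleWrite("HTTP status: " & $iStatus & ", length: " & StringLen($sHTML) & @CRLF)

If $iStatus <> 200 Then
    ; For example a 403 error page: the tags you expect are not in it
    ConsoleWrite("Request was not successful; nothing to parse" & @CRLF)
    Exit 1
EndIf

; _StringBetween returns a 0-based array of matches and sets @error when nothing is found
Local $aLinks = _StringBetween($sHTML, "<cite>", "</cite>")
If @error Then
    ConsoleWrite("No <cite> tags found in the response" & @CRLF)
Else
    ConsoleWrite("Matches: " & UBound($aLinks) & ", first: " & $aLinks[0] & @CRLF)
EndIf
```

Note that element [0] of the returned array is the first match, not a count; UBound gives the number of matches.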

SNMP_UDF ... for SNMPv1 and v2c so far, GetBulk and a new example script

wannabe "Unbeatable" Tic-Tac-Toe

Paper-Scissor-Rock ... try to beat it anyway :)


Thanks enaiman. The point is that I checked that I receive the HTML correctly with $WinHttpReq.ResponseText, and the tag I used is in the HTML, so it should work, but it doesn't.

$query_final has the index because in the main script this variable is a multidimensional array, but for this example I turned it into a plain string. Sorry for the mistake, the [0] has to be removed.

Can anyone see what's wrong or why it's not working??

Thanks to everyone :blink:


The content of the variable $HTML is:

<html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><title>403 Forbidden</title><style><!--body {font-family: arial,sans-serif}div.nav {margin-top: 1ex}div.nav A {font-size: 10pt; font-family: arial,sans-serif}span.nav {font-size: 10pt; font-family: arial,sans-serif; font-weight: bold}div.nav A,span.big {font-size: 12pt; color: #0000cc}div.nav A {font-size: 10pt; color: black}A.l:link {color: #6f6f6f}A.u:link {color: green}//--></style><script><!--var rc=403;//--></script></head><body text=#000000 bgcolor=#ffffff><table border=0 cellpadding=2 cellspacing=0 width=100%><tr><td rowspan=3 width=1% nowrap><b><font face=times color=#0039b6 size=10>G</font><font face=times color=#c41200 size=10>o</font><font face=times color=#f3c518 size=10>o</font><font face=times color=#0039b6 size=10>g</font><font face=times color=#30a72f size=10>l</font><font face=times color=#c41200 size=10>e</font>&nbsp;&nbsp;</b><td>&nbsp;</td></tr><tr><td bgcolor="#3366cc"><font face=arial,sans-serif color="#ffffff"><b>Error</b></td></tr><tr><td>&nbsp;</td></tr></table><blockquote><H1>Forbidden</H1>Your client does not have permission to get URL <code>/search?&amp;gl=%20XX&amp;proba=YY&amp;q=sushi+barcelona</code> from this server. (Client IP address: 79.145.194.168)<br><br>

Please see Google's Terms of Service posted at http://www.google.com/terms_of_service.html

<BR><BR><P>If you believe that you have received this response in error, please <A HREF="http://www.google.com/support/bin/request.py?contact_type=user&hl=en">report</A> your problem. However, please make sure to take a look at our Terms of Service (http://www.google.com/terms_of_service.html). In your email, please send us the <b>entire</b> code displayed below. Please also send us any information you may know about how you are performing your Google searches-- for example, "I'm using the Opera browser on Linux to do searches from home. My Internet access is through a dial-up account I have with the FooCorp ISP." or "I'm using the Konqueror browser on Linux to search from my job at myFoo.com. My machine's IP address is 10.20.30.40, but all of myFoo's web traffic goes through some kind of proxy server whose IP address is 10.11.12.13." (If you don't know any information like this, that's OK. But this kind of information can help us track down problems, so please tell us what you can.)</P><P>We will use all this information to diagnose the problem, and we'll hopefully have you back up and searching with Google again quickly!</P>

<P>Please note that although we read all the email we receive, we are not always able to send a personal response to each and every email. So don't despair if you don't hear back from us!</P>

<P>Also note that if you do not send us the <b>entire</b> code below, <i>we will not be able to help you</i>.</P><P>Best wishes,<BR>The Google Team</BR></P><BLOCKQUOTE>/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/<BR>

UtgHViGXEtCTb_uE5-4Sc5Ayr4l5zinN_jl_TaQFwu97VYtzJ<BR>

b9cktSKu1f0B_Jl5SX5UPwRH1EICyQWPPrkkQtZ_6Dkq3vMDV<BR>

SgJHcByYqJF3KceNpueBuwKHkDa1fF3YhsbGaAjxH9ByQ8zvL<BR>

ZOtVoCyqxKqc8Iar5s4HdsJZ96NmfNtYCwrhPpyUBh6p-wjr9<BR>

w1LceEAeKVa--OPcudOnst7neDJo1veqG5FFed_78E5YQguhL<BR>

f4Q3hgqR_8fxqaNz5Bv6m5sdwM5SyewpquafrEK5NCkqRk1Ws<BR>

VwCDQd1BHyOqSysQTWrl9RIagwH542NTyMgWdbGfFjQPK0R-_<BR>

8DtBE5k7_MwgPR5O-sT0wf-ZH0vyEHa-tPkdaZg0lsvFLdKMF<BR>

kd4va8Ss3qMMV2lMU3us8pTf6Gc_7fN69gwhLsNNd5JrxFukl<BR>

-vv92bTt3OJ-vALhGAZuyopMwxQRELtBi-8Hp7kwJPsYEOwNE<BR>

MSj5bzZlNcI7pN6g3opJSKx4Bw2PcZaK9AWXwrHLO1gkb0Ow9<BR>

AWJk2XCPrLSIfOBG-HJ3QXy4r1iN0SGOjdyFmCeWZMY18n8S4<BR>

n_BolOaL0EgzwgnJVJOrbz4tjJVTRAdibuD_7u8ChqjCma4RR<BR>

LSGCuwUROgEVZ8azyLZELxJHBu38sHqts19cy2rXOyYVoXcT-<BR>

gngelxu-y2gBs8i8JYcURBDsV4vL5eX2vsY4JmrqUHYZAErCF<BR>

ZLejk0bO3r3LlU-NxB8BOO_i-QOQszn6uvKIvkY1kYQ2pFSQq<BR>

Ba9GSNz6ojDEM7po2lolAQ0CPd62YHje<BR>

+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+<BR></BLOCKQUOTE>

<p></blockquote><table width=100% cellpadding=0 cellspacing=0><tr><td bgcolor="#3366cc"><img alt="" width=1 height=4></td></tr></table></body></html>1

But why do you say you can't try it? You can try this code:

    $WinHttpReq = ObjCreate("winhttp.winhttprequest.5.1")
    $URL = "http://www.google.es/search?&gl= XX&proba=YY&q=sushi+barcelona"
    $WinHttpReq.open("GET",$URL)
    $WinHttpReq.send()
    $HTML = $WinHttpReq.ResponseText                        ; Retrieve the HTML of the page
    ConsoleWrite($HTML) ; this prints the response
    $answer = _StringBetween($HTML, '<div id="center_col">','</div>')
    ConsoleWrite("number of results: " & $answer[0])

I agree that the problem may be that I'm not using the correct tags in _StringBetween, but how can I know what the right tags are if the response is not complete?? Aren't the tags supposed to be the same as when I run the same query in a client (like Firefox) and inspect the tags with a tool like Firebug??

Thanks!!


I'll eat my hat if you can find any <div id="center_col"> or even a <div in the html code. Obviously _StringBetween fails when the string you're looking for isn't found.

I wonder, how can you attempt to use _StringBetween blindly? Have you read the documentation for that function? Have you checked the html code returned to see where the info you want is located? Sorry, but you've done a very poor job and wasted my time and other users' time (those who read your post).

Please, in the future, do your job first, then ask for help.

It is obvious from the response you posted that you don't get any results back from Google: "Your client does not have permission to get URL"

As for why I can't test your code, I'll quote myself:

Even if I correct that error, the script still refuses to work on my computer: it exits on $WinHttpReq.send()


Sorry for bothering you enaiman, but I only asked why it doesn't work because I used the same function in the same way in other scripts and it worked, so that was my doubt. Obviously I've read all the documentation for the function, and I don't agree that I'm using the functions blindly because, as I told you, I used very similar code in other scripts and it worked. I know there's a problem with the tags, but I thought the tags were the same as the ones you see when viewing the page source.

Of course I checked the HTML of the page and the tags I'm expecting, but it doesn't work.

I did my job and couldn't find the answer, and that's why I asked the other people in the forum. I'm sorry if you think I'm making you lose your time; that was not my intention, enaiman.

Thanks for your help anyway. I'll try to do better in the future (I'm trying to learn, I'm new to AutoIt), but please don't be that rude in the future either.


I don't think I was rude at all; consider it more like a "cold shower".

A simple look at the html code returned could have told you that the information you seek is simply not there; that's why I said that you have used _StringBetween blindly ...

As a lesson for the future: whenever something doesn't work in a script, do some debugging. MsgBox, _ArrayDisplay, ConsoleWrite and many other tools are there to help with that. I'm not expecting you to be an expert in debugging so soon, but you can start doing it; it will make your scripting life much easier.
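As a rough illustration of that kind of debugging, here is a small sketch. The sample string is made up and stands in for a real response, and the variable names are only illustrative. It prints @error and the match count to the console and then shows the matches with _ArrayDisplay.

```autoit
#include <Array.au3>  ; provides _ArrayDisplay
#include <String.au3> ; provides _StringBetween

; Made-up sample standing in for a real response (assumption for illustration)
Local $sHTML = '<div><cite>www.example.com/a</cite><cite>www.example.com/b</cite></div>'

Local $aCites = _StringBetween($sHTML, "<cite>", "</cite>")
Local $iErr = @error ; save @error right away, before calling anything else

; First look at what came back, then decide what to do with it
ConsoleWrite("@error = " & $iErr & ", IsArray = " & IsArray($aCites) & @CRLF)

If $iErr = 0 Then
    ConsoleWrite("Number of matches: " & UBound($aCites) & @CRLF)
    _ArrayDisplay($aCites, "Text found between <cite> and </cite>")
EndIf
```

Swapping the sample string for the real $HTML shows right away whether the tags you are searching for are in the response at all.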
