
Download from a repeating pattern



Hi all!

In Google Image search, every search result is represented by a link like this:

<a class="irc_fsl irc_but" href="/url?sa=i&amp;rct=j&amp;q=&amp;esrc=s&amp;source=images&amp;cd=&amp;cad=rja&amp;uact=8&amp;docid=52J1DT9EdL3YlM&amp;tbnid=HWENGOZPg550qM:&amp;ved=0CAIQjBw&amp;url=http%3A%2F%2Fwww.autoitscript.com%2Fw%2Fimages%2Fc%2Fc7%2FAutoit-1-2-3.jpg&amp;ei=NlbPU9fOCanQ4QSR9IH4Bg&amp;bvm=bv.71667212,d.bGE&amp;psig=AFQjCNGuYULJGRB-DrgwyzLTK9QFXy__yA&amp;ust=1406182924754493" data-ved="0CAIQjBw" data-href="http://www.autoitscript.com/w/images/c/c7/Autoit-1-2-3.jpg"><span class="irc_but_t">Visa bild</span></a>

Would it be possible in AutoIt to search for every occurrence of, say, class="irc_fsl irc_but" and put each one in an array? Maybe even output them much like _ArrayFindAll does. It would also be nice if this worked through the default browser rather than IE, because then I can deal with the anti-spam quota!


What should your array look like? Split by class= ... ?


To be totally blunt, I'd settle for each row in the array looking like the snippet above, because I can simply use string manipulation to extract the part I want. But if you look at it again, you will see the data-href attribute at the end, which holds the direct link to the image. That is my goal.


If you are looking for an image each time, you could do something like this (there are probably far better ways; sadly I don't know regular expressions, which would probably be a better fit). This works on the example you provided. I don't know whether it would work on every result:

#include <Array.au3>
$string = '<a class="irc_fsl irc_but" href="/url?sa=i&amp;rct=j&amp;q=&amp;esrc=s&amp;source=images&amp;cd=&amp;cad=rja&amp;uact=8&amp;docid=52J1DT9EdL3YlM&amp;tbnid=HWENGOZPg550qM:&amp;ved=0CAIQjBw&amp;url=http%3A%2F%2Fwww.autoitscript.com%2Fw%2Fimages%2Fc%2Fc7%2FAutoit-1-2-3.jpg&amp;ei=NlbPU9fOCanQ4QSR9IH4Bg&amp;bvm=bv.71667212,d.bGE&amp;psig=AFQjCNGuYULJGRB-DrgwyzLTK9QFXy__yA&amp;ust=1406182924754493" data-ved="0CAIQjBw" data-href="http://www.autoitscript.com/w/images/c/c7/Autoit-1-2-3.jpg"><span class="irc_but_t">Visa bild</span></a>'
$delim = 'href='
$valueArray = StringSplit($string, $delim, 1) ; flag 1: treat the whole delimiter string as one separator
;_ArrayDisplay($valueArray)

; StringSplit stores the element count in [0], so start at index 1
For $a = 1 To $valueArray[0]
    $containsHTTP = StringInStr($valueArray[$a], "http:")
    $containsJpg = StringInStr($valueArray[$a], "jpg")
    If $containsJpg <> 0 And $containsHTTP <> 0 Then
        ;MsgBox(0, "", "This index contains your http with JPG: " & $a)
        $endofURL = StringInStr($valueArray[$a], '">') ; position of the closing quote
        ; Take the text between the opening and closing quotes
        $modifiedResult = StringMid($valueArray[$a], 2, $endofURL - 2)
        ConsoleWrite(@CRLF & $modifiedResult & @CRLF)
    EndIf
Next


Since each link is tagged <a ... data-href="link">, you can try it the regex way:

$string = '<a class="irc_fsl irc_but" href="/url?sa=i&amp;rct=j&amp;q=&amp;esrc=s&amp;source=images&amp;cd=&amp;cad=rja&amp;uact=8&amp;docid=52J1DT9EdL3YlM&amp;tbnid=HWENGOZPg550qM:&amp;ved=0CAIQjBw&amp;url=http%3A%2F%2Fwww.autoitscript.com%2Fw%2Fimages%2Fc%2Fc7%2FAutoit-1-2-3.jpg&amp;ei=NlbPU9fOCanQ4QSR9IH4Bg&amp;bvm=bv.71667212,d.bGE&amp;psig=AFQjCNGuYULJGRB-DrgwyzLTK9QFXy__yA&amp;ust=1406182924754493" data-ved="0CAIQjBw" data-href="http://www.autoitscript.com/w/images/c/c7/Autoit-1-2-3.jpg"><span class="irc_but_t">Visa bild</span></a>'

Msgbox(0,"", StringRegExp($string, 'data-href="([^"]+)', 3)[0] )

and (maybe) on the source code of the page

#include <Array.au3>

$txt = ... ; source code
$aLinks = StringRegExp($txt, 'data-href="([^"]+)', 3)
_ArrayDisplay($aLinks)
Edited by mikell

I greatly appreciate the effort people are giving in helping me with this. However, it does appear my explanation of the situation is flawed. So I will try to explain a bit deeper.

This is the website (just an example for now):

https://www.google.se/search?q=autoit+download+google+search&source=lnms&tbm=isch&sa=X&ei=fQ_SU_6FJKrMygP66IDwCw&ved=0CAgQ_AUoAQ&biw=1366&bih=683

On that page, every result is represented by:

<a class="irc_fsl irc_but" href="/url?sa=i&amp;rct=j&amp;q=&amp;esrc=s&amp;source=images&amp;cd=&amp;cad=rja&amp;uact=8&amp;docid=52J1DT9EdL3YlM&amp;tbnid=HWENGOZPg550qM:&amp;ved=0CAIQjBw&amp;url=http%3A%2F%2Fwww.autoitscript.com%2Fw%2Fimages%2Fc%2Fc7%2FAutoit-1-2-3.jpg&amp;ei=NlbPU9fOCanQ4QSR9IH4Bg&amp;bvm=bv.71667212,d.bGE&amp;psig=AFQjCNGuYULJGRB-DrgwyzLTK9QFXy__yA&amp;ust=1406182924754493" data-ved="0CAIQjBw" data-href="http://www.autoitscript.com/w/images/c/c7/Autoit-1-2-3.jpg"><span class="irc_but_t">Visa bild</span></a>

So if I can extract every one of those, preferably into an array, that would amount to Google image scraping for me, since extracting the direct link from each won't be much of a hassle!

As a side note, this is how I managed to clean it up (disclaimer: this is NOT what I originally need help with, as it is already done):

;© Original editor and creator Sodori 2014-07-25

Local $imageRaw = '<a class="irc_fsl irc_but" href="/url?sa=i&amp;rct=j&amp;q=&amp;esrc=s&amp;source=images&amp;cd=&amp;cad=rja&amp;uact=8&amp;docid=52J1DT9EdL3YlM&amp;tbnid=HWENGOZPg550qM:&amp;ved=0CAIQjBw&amp;url=http%3A%2F%2Fwww.autoitscript.com%2Fw%2Fimages%2Fc%2Fc7%2FAutoit-1-2-3.jpg&amp;ei=NlbPU9fOCanQ4QSR9IH4Bg&amp;bvm=bv.71667212,d.bGE&amp;psig=AFQjCNGuYULJGRB-DrgwyzLTK9QFXy__yA&amp;ust=1406182924754493" data-ved="0CAIQjBw" data-href="http://www.autoitscript.com/w/images/c/c7/Autoit-1-2-3.jpg"><span class="irc_but_t">Visa bild</span></a>'


Local $index = StringInStr($imageRaw, 'data-href="', 0, -1) ; Start of the last data-href attribute
Local $imageProcess = StringRight($imageRaw, StringLen($imageRaw) - $index) ; Cut off everything ahead of the link
$imageProcess = StringReplace($imageProcess, 'ata-href="', "") ; The link is now clean at the start

; Clean the end by checking where the link ends. A While loop is used so it can exit as soon as the end is found.
Local $imageProcessed = ""
Local $i = 0
While $i <> -1
   If StringInStr(StringRight($imageProcess, $i), '"><') > 0 Then
      $imageProcessed = StringLeft($imageProcess, StringLen($imageProcess) - $i)
      $i = -1
   ElseIf $i >= StringLen($imageProcess) Then
      $i = -1 ; No '"><' found: stop instead of looping forever
   Else
      $i += 1
   EndIf
WEnd

ConsoleWrite($imageProcessed & @LF)

So, as you can see, I just need that array with a bunch of $imageRaw that I can convert into direct links.
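
Building that array from a page source that is already held in a string might look roughly like the sketch below. It is not a tested solution for the live Google page: the two-anchor `$sHtml` sample and its example.com URLs are made up for illustration, and it assumes every result anchor starts with exactly `<a class="irc_fsl irc_but"`.

```autoit
#include <Array.au3>

; Hypothetical sample: two shortened result anchors standing in for the real page source.
Local $sHtml = '<div><a class="irc_fsl irc_but" href="/url?sa=i" data-href="http://example.com/a.jpg"><span class="irc_but_t">View image</span></a>' & _
        '<a class="irc_fsl irc_but" href="/url?sa=i" data-href="http://example.com/b.png"><span class="irc_but_t">View image</span></a></div>'

; Flag 3 returns all matches in a 0-based array; (?s) lets "." span line breaks.
; The capturing group holds one complete anchor per match.
Local $aRaw = StringRegExp($sHtml, '(?s)(<a class="irc_fsl irc_but".*?</a>)', 3)
If @error Then
    ConsoleWrite("No matching anchors found" & @CRLF)
    Exit
EndIf

; Pull the data-href value out of each raw anchor.
Local $aMatch
For $i = 0 To UBound($aRaw) - 1
    $aMatch = StringRegExp($aRaw[$i], 'data-href="([^"]+)', 1)
    If Not @error Then ConsoleWrite($aMatch[0] & @CRLF)
Next

_ArrayDisplay($aRaw, "Raw anchors")
```

Each element of `$aRaw` then plays the role of one `$imageRaw`, so the cleanup code above could be run on each row in turn.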


Do you see the link provided in my post? It returns a good number of images. I want to end up downloading each image, that is, each search result. But to download them I need to extract the direct link for each one, and to do THAT I first need to fetch the raw code for each image and preferably store it in an array, with each result on its own row. I guess my only reason for posting such a big chunk of code, and not simply the direct link, was to give the person with the know-how a bit of leeway, if you understand me. Was that any better?


I need help to convert https://www.google.se/search?q=autoit+download+google+search&source=lnms&tbm=isch&sa=X&ei=fQ_SU_6FJKrMygP66IDwCw&ved=0CAgQ_AUoAQ&biw=1366&bih=683 into an array of each search result it contains.

The code I posted that prints to the console was entirely unrelated to my issue. I just guessed it would be helpful material for anyone stumbling over this issue in the future, as well as solid proof that I have already cracked that nut.
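
One possible way to go from the search URL to an array is to download the page source with InetRead and run the data-href regex from earlier in the thread over it. This is a minimal sketch under assumptions that may not hold: the HTML Google serves to a script may not contain the same anchors a browser shows (the results may be injected by JavaScript), and InetRead does not go through the default browser.

```autoit
#include <Array.au3>

; Search URL taken from the post above.
Local $sUrl = "https://www.google.se/search?q=autoit+download+google+search&source=lnms&tbm=isch&sa=X&ei=fQ_SU_6FJKrMygP66IDwCw&ved=0CAgQ_AUoAQ&biw=1366&bih=683"

; InetRead returns the response body as binary data; option 1 forces a reload from the remote site.
Local $dData = InetRead($sUrl, 1)
If @error Then
    ConsoleWrite("Download failed" & @CRLF)
    Exit
EndIf

; Convert the binary data to text, assuming the page is UTF-8 (flag 4).
Local $sSource = BinaryToString($dData, 4)

; Flag 3 returns every captured data-href value in a 0-based array.
Local $aLinks = StringRegExp($sSource, 'data-href="([^"]+)', 3)
If @error Then
    ConsoleWrite("No data-href attributes in the downloaded source" & @CRLF)
Else
    _ArrayDisplay($aLinks, "Direct image links")
EndIf
```

If the second message appears, the anchors are probably not in the raw source at all, and the page would have to be read from a running browser instead.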


I am using AutoIt at work and have quite a lot of projects going at the same time. My other thread was about a different task from this one, so I felt it was better to keep them separate, as they are two different issues, even if I would eventually have had to address InetGet for this one as well. But not anymore, thanks to a good community!

Getting back to the matter at hand: eventually I will get into that. But that will be for a bigger project than this one, and for when I feel more ready to learn a new programming language, as that seems to be what it is. I was hoping I could get away cheaply for now, since speed is not THAT much of a worry for the program I am making. It more or less runs at whatever pace Google allows.
