
Best way to use google image search? [SOLVED]


rootx

I would like to download the first 5 images from a Google image search into a folder. Thanks.

#include <INet.au3>
#include <String.au3>
#include <Array.au3>


Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"

$sSource = _INetGetSource("http://www.google.com/search?q=" & $sKeyWord & "&tbm=isch")

$aImgURL = _StringBetween($sSource, 'src="', '"')


For $x = 0 To UBound($aImgURL) - 1 ; _StringBetween returns a 0-based array, so start at 0
    ConsoleWrite($aImgURL[$x]&@CRLF)
Next

 


1 hour ago, j0kky said:

You can download them using InetGet. They don't have a standard name, but to find out the extension you can check each file's magic number.

Thanks, but the question is how to get the URL of the source image rather than the thumbnail. Does anyone have any idea?

#include <INet.au3>
#include <String.au3>
#include <Array.au3>


Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

$sSource = _INetGetSource("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt")

$aImgURL = _StringBetween($sSource, 'src="', '"')


For $x = 0 To UBound($aImgURL) - 1 ; _StringBetween returns a 0-based array, so start at 0
    ConsoleWrite($aImgURL[$x]&@CRLF)
    InetGet($aImgURL[$x],@ScriptDir&"\"&$x&".jpg")
Next
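
On the magic-number suggestion quoted above: a minimal sketch of how the extension could be guessed from the first bytes of a downloaded file. The helper name `_GuessImageExt` is hypothetical (it is not part of any AutoIt UDF), and only four common signatures are checked.

```autoit
#include <FileConstants.au3>

; Hypothetical helper: guess an image extension from the file's first bytes.
; Returns "" and sets @error if the file cannot be opened or is not recognised.
Func _GuessImageExt($sPath)
    Local $hFile = FileOpen($sPath, $FO_READ + $FO_BINARY)
    If $hFile = -1 Then Return SetError(1, 0, "")
    Local $sHex = Hex(FileRead($hFile, 8)) ; first 8 bytes as a hex string
    FileClose($hFile)

    If StringLeft($sHex, 6) = "FFD8FF" Then Return "jpg"
    If StringLeft($sHex, 16) = "89504E470D0A1A0A" Then Return "png"
    If StringLeft($sHex, 8) = "47494638" Then Return "gif" ; ASCII "GIF8"
    If StringLeft($sHex, 4) = "424D" Then Return "bmp"     ; ASCII "BM"
    Return SetError(2, 0, "")
EndFunc   ;==>_GuessImageExt
```

One way to use it would be to download each image under a temporary name with InetGet, call the helper on that file, and then rename it with FileMove once the extension is known.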

 


Try saving $sSource to an .html file and opening it. You will see that it differs from the page you get when visiting the same URL in a browser:

https://www.google.ch/search?q=pug&as_st=y&hl=it&tbs=ift:jpg,isz:ex,iszw:800,iszh:600&tbm=isch&source=lnt&gws_rd=ssl

In my opinion you should play with:

_IEDocReadHTML
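
A small sketch of the comparison suggested above: fetch the page with _INetGetSource, save it, and open the saved copy in the default browser next to the live page. The file name is an arbitrary choice, and the shortened URL is only an example.

```autoit
#include <INet.au3>

Local $sUrl = "http://www.google.ch/search?q=pug&tbm=isch"
Local $sSource = _INetGetSource($sUrl)
If @error Then
    ConsoleWrite("Download failed" & @CRLF)
    Exit 1
EndIf

Local $sFile = @TempDir & "\google_source.html" ; hypothetical file name
FileDelete($sFile)          ; FileWrite appends, so start from a clean file
FileWrite($sFile, $sSource)
ShellExecute($sFile)        ; open the saved copy in the default browser
```

If the saved copy lacks the markers you are searching for, the string search is not at fault: the server simply returned different markup to the script than to the browser.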

 


22 hours ago, j0kky said:

In my opinion you should play with:

_IEDocReadHTML

 

_IEDocReadHTML doesn't work, but...

#include <IE.au3>
#include <MsgBoxConstants.au3>
#include <Inet.au3>
#include <Array.au3>
#include <File.au3>
#include <String.au3>



$x = _INetGetSource("http://www.google.ch/search?as_st=y&tbm=isch&hl=it&as_q=pug&as_epq=&as_oq=&as_eq=&cr=&as_sitesearch=&safe=images&tbs=ift:jpg")

FileWrite(@ScriptDir&"\9.html",$x)
Local $aRetArray
_FileReadToArray(@ScriptDir&"\9.html", $aRetArray)

;_ArrayDisplay($aRetArray, "Default Search")
 Local $aArray = _StringBetween($x, 'href="', '"')

 ; _ArrayDisplay($aArray, "Default Search")

    For $xs = 0 To UBound($aArray) - 1 ; _StringBetween returns a 0-based array, so start at 0
        ConsoleWrite($aArray[$xs]&@CRLF)
    Next

The source code isn't the same, because if you read it from the browser you can easily find this:

/imgres?imgurl=http%3A%2F%2Fcdn3-www.dogtime.com%2Fassets%2Fuploads%2F2011%2F01%2Ffile_23124_pug-460x290.jpg&imgrefurl=http%3A%2F%2Fdogtime.com%2Fdog-breeds%2Fpug&docid=BTPG4yF8_O0fQM&tbnid=8FbyFFzHno3BCM%3A&vet=1&w=460&h=290&hl=it&safe=images&bih=715&biw=1156&ved=0ahUKEwif1eWAys7QAhUDzxQKHc39AREQMwgdKAAwAA&iact=mrc&uact=8

But AutoIt extracts this:

http://dogtime.com/dog-breeds/pug&amp;sa=U&amp;ved=0ahUKEwiU-sLNzc7QAhUBfhoKHYuWAP4QwW4IGDAA&amp;usg=AFQjCNFtqNOflzABBIVCR79FpfulvDD6Pw

Why? Any idea? I need to read the raw HTML source. Thanks.


 


15 hours ago, rootx said:

_IEDocReadHTML doesn't work.

What does that mean, exactly?

#include <String.au3>
#include <ie.au3>


Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt")
$sSource = _IEDocReadHTML($obj)
FileWrite("log.html", $sSource)

$aImgURL = _StringBetween($sSource, '"ou":"', '"')


For $x = 0 To UBound($aImgURL) - 1 ; _StringBetween returns a 0-based array, so start at 0
    ConsoleWrite($aImgURL[$x]&@CRLF)
    ;InetGet($aImgURL[$x],@ScriptDir&"\"&$x&".jpg")
Next

 


1 hour ago, j0kky said:

What does that mean, exactly?


 

OK, but is there a way to use a regexp that matches from [http://] to [.jpg]? Some URLs have a strange path, for example:

"http://vignette1.wikia.nocookie.net/dogs/images/4/47/Gadget_the_pug_expressive_eyes.jpg/revision/latest?cb\u003d20110813111020"

I added a regex to save the file with the original name.

#include <String.au3>
#include <ie.au3>


Global $sSource, $aImgURL, $sKeyWord

DirCreate(@ScriptDir&"\img")

$folder = (@ScriptDir&"\img\")

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt")
$sSource = _IEDocReadHTML($obj)
FileWrite("log.html", $sSource)

$aImgURL = _StringBetween($sSource, '"ou":"', '"')


    For $x = 0 To UBound($aImgURL) - 1 ; _StringBetween returns a 0-based array, so start at 0
        ConsoleWrite($aImgURL[$x]&@CRLF)
        InetGet($aImgURL[$x],$folder&StringRegExpReplace($aImgURL[$x], '.*/([^-]+).*', "$1"))
    Next

_IEQuit($obj)

 


StringRegExp($aImgURL[$x], '(?i)(https?://.*\.(jpg|bmp|cms|jpeg))', 1)

The limitation is that you have to list every known image extension between the parentheses. In any case, adding an error-checking line is a good idea, because the script will fail if it meets an extension you didn't expect.
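
As a rough illustration of the error checking mentioned above, here is a sketch that pulls only the file name out of a URL and reports failure instead of crashing when nothing matches. The helper name `_ImageNameFromUrl` and the extension list are assumptions. The pattern still has to enumerate the known extensions, so that limitation remains.

```autoit
; Hypothetical helper: return "name.ext" from an image URL, ignoring anything
; that follows the file name (e.g. "/revision/latest?cb=...").
; Returns "" and sets @error when no known extension is found.
Func _ImageNameFromUrl($sUrl)
    Local $sPattern = '(?i)([^/?#]+\.(?:jpe?g|png|gif|bmp))(?:[/?#]|$)'
    Local $aMatch = StringRegExp($sUrl, $sPattern, 1) ; 1 = array of captures from the first match
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc   ;==>_ImageNameFromUrl

Local $sTest = "http://vignette1.wikia.nocookie.net/dogs/images/4/47/Gadget_the_pug_expressive_eyes.jpg/revision/latest?cb\u003d20110813111020"
ConsoleWrite(_ImageNameFromUrl($sTest) & @CRLF) ; Gadget_the_pug_expressive_eyes.jpg
```

Checking @error right after StringRegExp matters because indexing the return value when there is no match would stop the script with an array error.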

now it catches https too

An alternative way without using IE.

 

#include <Array.au3>
#include <String.au3>
Global Const $HTTP_STATUS_OK = 200

Local $sKeyWord = "house"
Local $sURL = "http://www.google.com/search?q=" & $sKeyWord & "&tbm=isch"
Local $sData = HttpGet($sURL)
;~ ConsoleWrite($sData & @CRLF)

Local $aMetas = _StringBetween($sData, '"rg_meta">', '</div>')
;~ _ArrayDisplay($aMetas)

Local $sUrlImage = ""
Local $sImageName = ""
Local $sExtension = ""

If IsArray($aMetas) Then
    If UBound($aMetas) >= 5 Then
        For $i = 0 To 4
            ConsoleWrite(">Image Number: " & $i + 1 & @CRLF)
            $sUrlImage = _GetImageUrl($aMetas[$i])
            $sImageName = _GetImageName($aMetas[$i]) ;maybe you want to get the name from image url instead of metadata
            $sExtension = _GetImageExtension($aMetas[$i])
            ConsoleWrite($sUrlImage & @CRLF)
            ConsoleWrite($sImageName & @CRLF)
            ConsoleWrite($sExtension & @CRLF)
            ConsoleWrite(@CRLF)
        Next
    EndIf
EndIf

Func _GetImageName($sData)
    Local $aData = _StringBetween($sData, '"s":"', '"')
    If IsArray($aData) Then Return $aData[0]
EndFunc   ;==>_GetImageName

Func _GetImageUrl($sData)
    Local $aData = _StringBetween($sData, '"ou":"', '"')
    If IsArray($aData) Then Return $aData[0]
EndFunc   ;==>_GetImageUrl

Func _GetImageExtension($sData)
    Local $aData = _StringBetween($sData, '"ity":"', '"')
    If IsArray($aData) Then Return $aData[0]
EndFunc   ;==>_GetImageExtension


Func HttpGet($sURL)
    Local $oHTTP = ObjCreate("WinHttp.WinHttpRequest.5.1")
    $oHTTP.Open("GET", $sURL, False)
    $oHTTP.SetRequestHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:48.0) Gecko/20100101 Firefox/48.0")
    $oHTTP.SetRequestHeader("Content-Type", "text/plain; charset=utf-8")
    If (@error) Then Return SetError(1, 0, 0)
    $oHTTP.Send()
    If (@error) Then Return SetError(2, 0, 0)
    If ($oHTTP.Status <> $HTTP_STATUS_OK) Then Return SetError(3, 0, 0)
    Return SetError(0, 0, $oHTTP.ResponseText)
EndFunc   ;==>HttpGet

Make sure to clean up the file name.
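
A minimal sketch of what cleaning up the file name could look like, assuming the goal is only to replace characters that Windows does not allow in file names. The helper name `_CleanFileName` is hypothetical.

```autoit
; Hypothetical helper: replace characters that are invalid in Windows file names.
Func _CleanFileName($sName)
    $sName = StringRegExpReplace($sName, '[\\/:*?"<>|]', "_")
    Return StringStripWS($sName, 3) ; 3 = strip leading and trailing whitespace
EndFunc   ;==>_CleanFileName

ConsoleWrite(_CleanFileName('Pug: "cute" dog?') & @CRLF) ; Pug_ _cute_ dog_
```

The value returned by _GetImageName comes from page metadata rather than from a file path, so it may contain any of these characters.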

Regards


#include <String.au3>
#include <ie.au3>
#include <WinAPIFiles.au3>
#include <InetConstants.au3>
#include <Array.au3>
Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"


$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt")
$sSource = _IEDocReadHTML($obj)
FileWrite("log.html", $sSource)

$aImgURL = _StringBetween($sSource,'imgurl=', '&amp;')

;_ArrayDisplay($aImgURL)

For $x = 0 To UBound($aImgURL) - 1 ; _StringBetween returns a 0-based array, so start at 0
    FileWrite(@ScriptDir&"\1.txt",StringReplace(StringReplace($aImgURL[$x],"%3A",":"),"%2F","/")&@CRLF)
    $url = StringReplace(StringReplace($aImgURL[$x],"%3A",":"),"%2F","/")
Next

$file = FileReadToArray(@ScriptDir&"\1.txt")


For $s = 0 To UBound($file) - 1 ; FileReadToArray returns a 0-based array, so start at 0

    $last = StringSplit($file[$s], '/')
    $ls = UBound($last)-1
    ConsoleWrite(StringSplit($file[$s], '/', $STR_ENTIRESPLIT)[$ls]&@CRLF)

    If StringLeft($file[$s],5) = "https" Then
        ConsoleWrite(StringRegExp($file[$s],'(?i)(https://.*\.(jpg|bmp|cms|jpeg))', 1)[0]&@CRLF)
        InetGet($file[$s],@ScriptDir&"\x\"&StringSplit($file[$s], '/', $STR_ENTIRESPLIT)[$ls])
    Else
        ConsoleWrite(StringRegExp($file[$s],'(?i)(http://.*\.(jpg|bmp|cms|jpeg))', 1)[0]&@CRLF)
        InetGet($file[$s],@ScriptDir&"\x\"&StringSplit($file[$s], '/', $STR_ENTIRESPLIT)[$ls])
    EndIf
Next
_IEQuit($obj)

Only one error: ueRSGNo.jpg%3F1. I changed the path where the files are saved and handled the https case, and now 88 files download correctly. Any suggestions to improve it? Thanks.

PS: How can I run IE hidden? I only need to grab the images. Thanks.
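
The ueRSGNo.jpg%3F1 case looks like a percent-encoded "?" followed by a query string. A sketch of a general decoder that could replace the chained StringReplace calls is below. The helper name `_UrlDecode` is hypothetical, and it decodes one byte at a time, so multi-byte UTF-8 sequences are not handled.

```autoit
; Hypothetical helper: decode %XX escapes one byte at a time (no UTF-8 handling).
Func _UrlDecode($sUrl)
    Local $sOut = "", $i = 1, $sChar, $sHex
    While $i <= StringLen($sUrl)
        $sChar = StringMid($sUrl, $i, 1)
        $sHex = StringMid($sUrl, $i + 1, 2)
        If $sChar = "%" And StringRegExp($sHex, "^[0-9A-Fa-f]{2}$") Then
            $sOut &= Chr(Dec($sHex)) ; e.g. "3F" -> 63 -> "?"
            $i += 3
        Else
            $sOut &= $sChar
            $i += 1
        EndIf
    WEnd
    Return $sOut
EndFunc   ;==>_UrlDecode

Local $sName = _UrlDecode("ueRSGNo.jpg%3F1")        ; "ueRSGNo.jpg?1"
$sName = StringRegExpReplace($sName, "[?#].*$", "") ; drop the query string or fragment
ConsoleWrite($sName & @CRLF)                        ; ueRSGNo.jpg
```

Stripping everything from the first "?" or "#" is applied only to the local file name here. The full decoded URL should still be passed to InetGet unchanged.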


This is my version without all those StringReplace:

#include <String.au3>
#include <ie.au3>

Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt", 0, 0)
$sSource = _IEDocReadHTML($obj)

$aImgURL = _StringBetween($sSource, '"ou":"', '"')

For $x = 0 To UBound($aImgURL) - 1 ; _StringBetween returns a 0-based array, so start at 0
    ;$sPattern = '(?i)(http.?://.*\.(jpg|bmp|cms|jpeg))' ; http?://.../name.ext
    $sPattern = '(?i).*/(.*\.(jpg|bmp|cms|jpeg))' ; name.ext
    $aRegEx = StringRegExp($aImgURL[$x], $sPattern, 1)
    If @error Then ContinueLoop
    ConsoleWrite($aRegEx[0] & @CRLF)
    InetGet($aImgURL[$x], @ScriptDir & "\" & $aRegEx[0])
Next

_IEQuit($obj)
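
Two notes on the script above, as a hedged sketch rather than a definitive version. The trailing `0, 0` in the _IECreate call means "do not attach to an existing window" and "not visible", which is what runs IE hidden. The original question asked for only the first 5 images, so the variant below stops after five successful downloads. The `"ou":"` marker is taken from the thread and depends on the markup Google served at the time. `$iMaxImages` and `$iSaved` are illustrative names.

```autoit
#include <IE.au3>
#include <String.au3>

Local Const $iMaxImages = 5 ; the original question asked for the first 5 images

; 0, 0 = do not attach to an existing IE window, and keep the new one hidden
Local $oIE = _IECreate("http://www.google.ch/search?q=pug&tbm=isch", 0, 0)
Local $sSource = _IEDocReadHTML($oIE)
_IEQuit($oIE)

Local $aImgURL = _StringBetween($sSource, '"ou":"', '"')
If @error Then Exit 1 ; marker not found in the page source

Local $iSaved = 0, $aName
For $i = 0 To UBound($aImgURL) - 1
    If $iSaved >= $iMaxImages Then ExitLoop
    $aName = StringRegExp($aImgURL[$i], '(?i).*/(.*\.(jpg|bmp|cms|jpeg))', 1)
    If @error Then ContinueLoop ; no recognised file name in this URL
    ; InetGet returns the number of bytes downloaded, or 0 on failure
    If InetGet($aImgURL[$i], @ScriptDir & "\" & $aName[0]) > 0 Then $iSaved += 1
Next
```

Counting successful downloads, not loop iterations, means a broken link does not reduce the number of images that end up in the folder.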

 


2 hours ago, j0kky said:

This is my version without all those StringReplace:


Nice, it downloaded 94 JPGs. You're the winner. Thanks!
