rootx

Best way to use google image search? [SOLVED]

16 posts in this topic

#1 ·  Posted (edited)

I would like to download the first 5 images into a folder. Thanks.

#include <INet.au3>
#include <String.au3>
#include <Array.au3>


Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"

$sSource = _INetGetSource("http://www.google.com/search?q=" & $sKeyWord & "&tbm=isch")

$aImgURL = _StringBetween($sSource, 'src="', '"')


For $x = 0 To UBound($aImgURL) - 1 ; _StringBetween returns a 0-based array
    ConsoleWrite($aImgURL[$x]&@CRLF)
Next

 

Edited by rootx

28 minutes ago, Danyfirex said:

Hello. and your issue is...?

 

Saludos

I want to get the name of each image and save it with the correct type and name.

1 hour ago, j0kky said:

You can download them with InetGet. They don't have a standard name, but to find the extension you can check each file's magic number.

Thanks, but the question is how to get the URL of the source image rather than the thumbnail. Does anyone have any idea?

#include <INet.au3>
#include <String.au3>
#include <Array.au3>


Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

$sSource = _INetGetSource("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt")

$aImgURL = _StringBetween($sSource, 'src="', '"')


For $x = 0 To UBound($aImgURL) - 1 ; _StringBetween returns a 0-based array
    ConsoleWrite($aImgURL[$x]&@CRLF)
    InetGet($aImgURL[$x],@ScriptDir&"\"&$x&".jpg")
Next
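
For the magic-number suggestion quoted above, here is a minimal sketch of how the extension could be guessed from the first bytes of a file that has already been downloaded. _GuessImageExt is a hypothetical helper name (not part of any UDF), and only four common signatures are checked, so treat it as a starting point rather than a complete detector.

```autoit
; Sketch: guess an image extension from a file's leading bytes (its "magic number").
; _GuessImageExt is a hypothetical helper, not part of the standard UDFs.
Func _GuessImageExt($sPath)
    Local $hFile = FileOpen($sPath, 16) ; 16 = open for binary reading
    If $hFile = -1 Then Return SetError(1, 0, "")
    Local $dHead = FileRead($hFile, 8) ; the longest signature checked here is 8 bytes
    FileClose($hFile)

    Local $sHex = Hex($dHead) ; bytes as a hex string, e.g. "FFD8FFE0..."
    If StringLeft($sHex, 6) = "FFD8FF" Then Return "jpg"
    If StringLeft($sHex, 16) = "89504E470D0A1A0A" Then Return "png"
    If StringLeft($sHex, 8) = "47494638" Then Return "gif" ; "GIF8"
    If StringLeft($sHex, 4) = "424D" Then Return "bmp" ; "BM"
    Return SetError(2, 0, "") ; unknown signature
EndFunc   ;==>_GuessImageExt
```

One way to use it would be to download to a temporary name first, call _GuessImageExt on that file, and then rename it with FileMove once the extension is known.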

 


#6 ·  Posted (edited)

Try saving $sSource to an .html file and opening it; you will see that it differs from the page you get when visiting the same URL in a browser:

https://www.google.ch/search?q=pug&as_st=y&hl=it&tbs=ift:jpg,isz:ex,iszw:800,iszh:600&tbm=isch&source=lnt&gws_rd=ssl

In my opinion you should experiment with:

_IEDocReadHTML

 

Edited by j0kky

_IEDocReadHTML doesn't work, but...

#include <IE.au3>
#include <MsgBoxConstants.au3>
#include <Inet.au3>
#include <Array.au3>
#include <File.au3>
#include <String.au3>



$x = _INetGetSource("http://www.google.ch/search?as_st=y&tbm=isch&hl=it&as_q=pug&as_epq=&as_oq=&as_eq=&cr=&as_sitesearch=&safe=images&tbs=ift:jpg")

FileWrite(@ScriptDir&"\9.html",$x)
Local $aRetArray
_FileReadToArray(@ScriptDir&"\9.html", $aRetArray)

;_ArrayDisplay($aRetArray, "Default Search")
 Local $aArray = _StringBetween($x, 'href="', '"')

 ; _ArrayDisplay($aArray, "Default Search")

    For $xs = 0 To UBound($aArray) - 1 ; _StringBetween returns a 0-based array
        ConsoleWrite($aArray[$xs]&@CRLF)
    Next

The source isn't what I expect, because if you view it in the browser you can easily find this:

/imgres?imgurl=http%3A%2F%2Fcdn3-www.dogtime.com%2Fassets%2Fuploads%2F2011%2F01%2Ffile_23124_pug-460x290.jpg&imgrefurl=http%3A%2F%2Fdogtime.com%2Fdog-breeds%2Fpug&docid=BTPG4yF8_O0fQM&tbnid=8FbyFFzHno3BCM%3A&vet=1&w=460&h=290&hl=it&safe=images&bih=715&biw=1156&ved=0ahUKEwif1eWAys7QAhUDzxQKHc39AREQMwgdKAAwAA&iact=mrc&uact=8

But AutoIt extracts this:

http://dogtime.com/dog-breeds/pug&amp;sa=U&amp;ved=0ahUKEwiU-sLNzc7QAhUBfhoKHYuWAP4QwW4IGDAA&amp;usg=AFQjCNFtqNOflzABBIVCR79FpfulvDD6Pw

Why? Any idea? I need to read the raw HTML source. Thanks.


 


#8 ·  Posted (edited)

15 hours ago, rootx said:

_IEDocReadHTML doesn't work.

What does that mean, exactly?

#include <String.au3>
#include <ie.au3>


Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt")
$sSource = _IEDocReadHTML($obj)
FileWrite("log.html", $sSource)

$aImgURL = _StringBetween($sSource, '"ou":"', '"')


For $x = 0 To UBound($aImgURL) - 1 ; _StringBetween returns a 0-based array
    ConsoleWrite($aImgURL[$x]&@CRLF)
    ;InetGet($aImgURL[$x],@ScriptDir&"\"&$x&".jpg")
Next

 

Edited by j0kky
OK, but is there a way to use a regular expression that matches from [http://] to [.jpg]? Some URLs have a strange path, for example:

"http://vignette1.wikia.nocookie.net/dogs/images/4/47/Gadget_the_pug_expressive_eyes.jpg/revision/latest?cb\u003d20110813111020"

I added a regex to save the file with the original name.

#include <String.au3>
#include <ie.au3>


Global $sSource, $aImgURL, $sKeyWord

DirCreate(@ScriptDir&"\img")

$folder = (@ScriptDir&"\img\")

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt")
$sSource = _IEDocReadHTML($obj)
FileWrite("log.html", $sSource)

$aImgURL = _StringBetween($sSource, '"ou":"', '"')


    For $x = 0 To UBound($aImgURL) - 1 ; _StringBetween returns a 0-based array
        ConsoleWrite($aImgURL[$x]&@CRLF)
        InetGet($aImgURL[$x],$folder&StringRegExpReplace($aImgURL[$x], '.*/([^-]+).*', "$1"))
    Next

_IEQuit($obj)

 


#10 ·  Posted (edited)

StringRegExp($aImgURL[$x], '(?i)(http.?://.*\.(jpg|bmp|cms|jpeg))', 1)

The limitation is that you have to list every known image extension between the parentheses. In any case, adding an error check is a good idea, because the script will fail on an extension you didn't expect.

Edited by j0kky
now it catches https too


An alternative way without using IE.

 

#include <Array.au3>
#include <String.au3>
Global Const $HTTP_STATUS_OK = 200

Local $sKeyWord = "house"
Local $sURL = "http://www.google.com/search?q=" & $sKeyWord & "&tbm=isch"
Local $sData = HttpGet($sURL)
;~ ConsoleWrite($sData & @CRLF)

Local $aMetas = _StringBetween($sData, '"rg_meta">', '</div>')
;~ _ArrayDisplay($aMetas)

Local $sUrlImage = ""
Local $sImageName = ""
Local $sExtension = ""

If IsArray($aMetas) Then
    If UBound($aMetas) >= 5 Then
        For $i = 0 To 4
            ConsoleWrite(">Image Number: " & $i + 1 & @CRLF)
            $sUrlImage = _GetImageUrl($aMetas[$i])
            $sImageName = _GetImageName($aMetas[$i]) ;maybe you want to get the name from image url instead of metadata
            $sExtension = _GetImageExtension($aMetas[$i])
            ConsoleWrite($sUrlImage & @CRLF)
            ConsoleWrite($sImageName & @CRLF)
            ConsoleWrite($sExtension & @CRLF)
            ConsoleWrite(@CRLF)
        Next
    EndIf
EndIf

Func _GetImageName($sData)
    Local $aData = _StringBetween($sData, '"s":"', '"')
    If IsArray($aData) Then Return $aData[0]
EndFunc   ;==>_GetImageName

Func _GetImageUrl($sData)
    Local $aData = _StringBetween($sData, '"ou":"', '"')
    If IsArray($aData) Then Return $aData[0]
EndFunc   ;==>_GetImageUrl

Func _GetImageExtension($sData)
    Local $aData = _StringBetween($sData, '"ity":"', '"')
    If IsArray($aData) Then Return $aData[0]
EndFunc   ;==>_GetImageExtension


Func HttpGet($sURL)
    Local $oHTTP = ObjCreate("WinHttp.WinHttpRequest.5.1")
    $oHTTP.Open("GET", $sURL, False)
    $oHTTP.SetRequestHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:48.0) Gecko/20100101 Firefox/48.0")
    $oHTTP.SetRequestHeader("Content-Type", "text/plain; charset=utf-8")
    If (@error) Then Return SetError(1, 0, 0)
    $oHTTP.Send()
    If (@error) Then Return SetError(2, 0, 0)
    If ($oHTTP.Status <> $HTTP_STATUS_OK) Then Return SetError(3, 0, 0)
    Return SetError(0, 0, $oHTTP.ResponseText)
EndFunc   ;==>HttpGet

Make sure to clean up the file name.
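
As a rough idea of what that clean-up could look like, here is a small sketch. _CleanFileName is a hypothetical helper name, and the fallback name "image" is an arbitrary choice for illustration; adjust the rules to whatever your target folder needs.

```autoit
; Sketch: make a string safe to use as a Windows file name.
; _CleanFileName is a hypothetical helper, not part of the standard UDFs.
Func _CleanFileName($sName)
    ; Drop a query string or fragment if the name was cut from a URL.
    $sName = StringRegExpReplace($sName, '[?#].*$', '')
    ; Replace the characters Windows does not allow in file names.
    $sName = StringRegExpReplace($sName, '[\\/:*?"<>|]', '_')
    ; Trim leading and trailing whitespace, then any trailing dots.
    $sName = StringStripWS($sName, 3)
    $sName = StringRegExpReplace($sName, '\.+$', '')
    If $sName = "" Then $sName = "image" ; arbitrary fallback name
    Return $sName
EndFunc   ;==>_CleanFileName

ConsoleWrite(_CleanFileName('Gadget_the_pug_expressive_eyes.jpg?cb=20110813111020') & @CRLF)
```

The query string is removed before the illegal characters are replaced, so a name taken from a URL keeps its real extension at the end.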

Saludos 

#include <String.au3>
#include <ie.au3>
#include <WinAPIFiles.au3>
#include <InetConstants.au3>
#include <Array.au3>
Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"


$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt")
$sSource = _IEDocReadHTML($obj)
FileWrite("log.html", $sSource)

$aImgURL = _StringBetween($sSource,'imgurl=', '&amp;')

;_ArrayDisplay($aImgURL)

For $x = 0 To UBound($aImgURL) - 1 ; _StringBetween returns a 0-based array
    FileWrite(@ScriptDir&"\1.txt",StringReplace(StringReplace($aImgURL[$x],"%3A",":"),"%2F","/")&@CRLF)
    $url = StringReplace(StringReplace($aImgURL[$x],"%3A",":"),"%2F","/")
Next

$file = FileReadToArray(@ScriptDir&"\1.txt")


For $s = 0 To UBound($file) - 1 ; FileReadToArray returns a 0-based array

    $last = StringSplit($file[$s], '/')
    $ls = UBound($last)-1
    ConsoleWrite(StringSplit($file[$s], '/', $STR_ENTIRESPLIT)[$ls]&@CRLF)

    If StringLeft($file[$s],5) = "https" Then
        ConsoleWrite(StringRegExp($file[$s],'(?i)(https://.*\.(jpg|bmp|cms|jpeg))', 1)[0]&@CRLF)
        InetGet($file[$s],@ScriptDir&"\x\"&StringSplit($file[$s], '/', $STR_ENTIRESPLIT)[$ls])
    Else
        ConsoleWrite(StringRegExp($file[$s],'(?i)(http://.*\.(jpg|bmp|cms|jpeg))', 1)[0]&@CRLF)
        InetGet($file[$s],@ScriptDir&"\x\"&StringSplit($file[$s], '/', $STR_ENTIRESPLIT)[$ls])
    EndIf
Next
_IEQuit($obj)

Only one error: ueRSGNo.jpg%3F1. I changed the save path and added the https case, and now 88 files download correctly. Any suggestions to improve it? Thanks.

PS: How can I run IE hidden? I only need to grab the images. Thanks.
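
The ueRSGNo.jpg%3F1 case is a percent-encoded "?" (%3F), so more escapes exist than the %3A and %2F that the StringReplace calls handle. A minimal sketch of a general decoder follows. _UrlDecode is a hypothetical helper name, and it assumes single-byte escapes only: multi-byte UTF-8 sequences would not be decoded correctly this way.

```autoit
; Sketch: decode %XX escapes in a URL fragment, one escape at a time.
; _UrlDecode is a hypothetical helper, not part of the standard UDFs.
; Assumption: single-byte escapes only (no multi-byte UTF-8 handling).
Func _UrlDecode($sText)
    Local $aParts = StringSplit($sText, '%') ; [0] = count, [1] = text before the first %
    Local $sOut = $aParts[1]
    For $i = 2 To $aParts[0]
        If StringRegExp($aParts[$i], '^[0-9A-Fa-f]{2}') Then
            ; Two hex digits follow the %: convert them, keep the rest as-is.
            $sOut &= Chr(Dec(StringLeft($aParts[$i], 2))) & StringTrimLeft($aParts[$i], 2)
        Else
            ; Not a valid escape: put the % back unchanged.
            $sOut &= '%' & $aParts[$i]
        EndIf
    Next
    Return $sOut
EndFunc   ;==>_UrlDecode

ConsoleWrite(_UrlDecode('ueRSGNo.jpg%3F1') & @CRLF)
```

After decoding, the name still carries the "?1" query part, so it needs to be cut off before the string is used as a file name.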


#14 ·  Posted (edited)

This is my version without all those StringReplace:

#include <String.au3>
#include <ie.au3>

Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt", 0, 0)
$sSource = _IEDocReadHTML($obj)

$aImgURL = _StringBetween($sSource, '"ou":"', '"')

For $x = 0 To UBound($aImgURL) - 1 ; _StringBetween returns a 0-based array
    ;$sPattern = '(?i)(http.?://.*\.(jpg|bmp|cms|jpeg))' ; http?://.../name.ext
    $sPattern = '(?i).*/(.*\.(jpg|bmp|cms|jpeg))' ; name.ext
    $aRegEx = StringRegExp($aImgURL[$x], $sPattern, 1)
    If @error Then ContinueLoop
    ConsoleWrite($aRegEx[0] & @CRLF)
    InetGet($aImgURL[$x], @ScriptDir & "\" & $aRegEx[0])
Next

_IEQuit($obj)

 

Edited by j0kky
Nice, it downloaded 94 JPGs. You are the winner. Thanks.

       
      <!DOCTYPE html> <html lang=en> <meta charset=utf-8> <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width"> <title>Error 411 (Length Required)!!1</title> <style> *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px} </style> <a href=//www.google.com/><span id=logo aria-label=Google></span></a> <p><b>411.</b> <ins>That’s an error.</ins> <p>POST requests require a <code>Content-length</code> header. <ins>That’s all we know.</ins>