
Best way to use Google image search? [SOLVED]

rootx

I would like to download the first 5 images into a folder. Thanks.

#include <INet.au3>
#include <String.au3>
#include <Array.au3>


Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"

$sSource = _INetGetSource("http://www.google.com/search?q=" & $sKeyWord & "&tbm=isch")

$aImgURL = _StringBetween($sSource, 'src="', '"')


For $x = 1 to UBound($aImgURL)-1
    ConsoleWrite($aImgURL[$x]&@CRLF)
Next

 

Edited by rootx

Danyfirex

Hello. And what is your issue?

Regards

rootx
28 minutes ago, Danyfirex said:

Hello. And what is your issue?

Regards

I want to get the name of each image and save it with the correct type and name.

rootx
1 hour ago, j0kky said:

You can download them using InetGet. They don't have a standard name, but to determine the extension you should check their magic number.
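
The magic-number check mentioned in the quote could look roughly like the sketch below. It is only an illustration, not code from this thread: the helper names `_ExtFromHeader` and `_GuessImageExtension` are hypothetical, and only four common signatures (JPEG, PNG, GIF, BMP) are covered.

```autoit
#include <FileConstants.au3>

; Hypothetical helper: map the first bytes of a file to an image extension.
; Returns "" and sets @error = 1 when the signature is not recognised.
Func _ExtFromHeader($bHeader)
    Local $sHex = Hex($bHeader) ; hex string of the bytes, e.g. "FFD8FFE0..." for a JPEG
    If StringLeft($sHex, 6) = "FFD8FF" Then Return "jpg"
    If StringLeft($sHex, 8) = "89504E47" Then Return "png"
    If StringLeft($sHex, 8) = "47494638" Then Return "gif"
    If StringLeft($sHex, 4) = "424D" Then Return "bmp"
    Return SetError(1, 0, "")
EndFunc   ;==>_ExtFromHeader

; Hypothetical helper: read the first 8 bytes of a downloaded file and classify them.
; Sets @error = 2 when the file cannot be opened.
Func _GuessImageExtension($sPath)
    Local $hFile = FileOpen($sPath, $FO_READ + $FO_BINARY)
    If $hFile = -1 Then Return SetError(2, 0, "")
    Local $bHeader = FileRead($hFile, 8)
    FileClose($hFile)
    Local $sExt = _ExtFromHeader($bHeader)
    Return SetError(@error, 0, $sExt)
EndFunc   ;==>_GuessImageExtension
```

One way to use it would be to download each image to a temporary name with InetGet, call `_GuessImageExtension` on that file, and then rename it with FileMove once the extension is known.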

Thanks, but the question is how to get the URL of the source image rather than the thumbnail. Does anyone have any idea? Thanks.

#include <INet.au3>
#include <String.au3>
#include <Array.au3>


Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

$sSource = _INetGetSource("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt")

$aImgURL = _StringBetween($sSource, 'src="', '"')


For $x = 1 to UBound($aImgURL)-1
    ConsoleWrite($aImgURL[$x]&@CRLF)
    InetGet($aImgURL[$x],@ScriptDir&"\"&$x&".jpg")
Next

 

j0kky

Try saving $sSource to an .html file and opening it. You will see that it differs from the page you get when visiting the same URL in a browser:

https://www.google.ch/search?q=pug&as_st=y&hl=it&tbs=ift:jpg,isz:ex,iszw:800,iszh:600&tbm=isch&source=lnt&gws_rd=ssl

In my opinion you should play with:

_IEDocReadHTML

 

Edited by j0kky

rootx
22 hours ago, j0kky said:

Try saving $sSource to an .html file and opening it. You will see that it differs from the page you get when visiting the same URL in a browser:

https://www.google.ch/search?q=pug&as_st=y&hl=it&tbs=ift:jpg,isz:ex,iszw:800,iszh:600&tbm=isch&source=lnt&gws_rd=ssl

In my opinion you should play with:

_IEDocReadHTML

 

_IEDocReadHTML doesn't work, but...

#include <IE.au3>
#include <MsgBoxConstants.au3>
#include <Inet.au3>
#include <Array.au3>
#include <File.au3>
#include <String.au3>



$x = _INetGetSource("http://www.google.ch/search?as_st=y&tbm=isch&hl=it&as_q=pug&as_epq=&as_oq=&as_eq=&cr=&as_sitesearch=&safe=images&tbs=ift:jpg")

FileWrite(@ScriptDir&"\9.html",$x)
Local $aRetArray
_FileReadToArray(@ScriptDir&"\9.html", $aRetArray)

;_ArrayDisplay($aRetArray, "Default Search")
 Local $aArray = _StringBetween($x, 'href="', '"')

 ; _ArrayDisplay($aArray, "Default Search")

    For $xs = 1 to UBound($aArray)-1
        ConsoleWrite($aArray[$xs]&@CRLF)
    Next

The source code returned isn't the same, because if you view the page source in the browser you easily find this:

/imgres?imgurl=http%3A%2F%2Fcdn3-www.dogtime.com%2Fassets%2Fuploads%2F2011%2F01%2Ffile_23124_pug-460x290.jpg&imgrefurl=http%3A%2F%2Fdogtime.com%2Fdog-breeds%2Fpug&docid=BTPG4yF8_O0fQM&tbnid=8FbyFFzHno3BCM%3A&vet=1&w=460&h=290&hl=it&safe=images&bih=715&biw=1156&ved=0ahUKEwif1eWAys7QAhUDzxQKHc39AREQMwgdKAAwAA&iact=mrc&uact=8

But AutoIt extracts this:

http://dogtime.com/dog-breeds/pug&amp;sa=U&amp;ved=0ahUKEwiU-sLNzc7QAhUBfhoKHYuWAP4QwW4IGDAA&amp;usg=AFQjCNFtqNOflzABBIVCR79FpfulvDD6Pw

Why? Any ideas? I need to read the raw HTML source. Thanks.


 

j0kky
15 hours ago, rootx said:

_IEDocReadHTML doesn't work.

What does that mean, exactly?

#include <String.au3>
#include <ie.au3>


Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

; Load the results page in IE and read the HTML of the loaded document
$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt")
$sSource = _IEDocReadHTML($obj)
FileWrite("log.html", $sSource) ; keep a copy for inspection

; Extract every value stored between '"ou":"' and the next double quote
$aImgURL = _StringBetween($sSource, '"ou":"', '"')


; _StringBetween returns a 0-based array, so start at index 0
For $x = 0 to UBound($aImgURL)-1
    ConsoleWrite($aImgURL[$x]&@CRLF)
    ;InetGet($aImgURL[$x],@ScriptDir&"\"&$x&".jpg")
Next

 

Edited by j0kky
rootx
1 hour ago, j0kky said:

What does that mean, exactly?


OK, but is there a way to use a regular expression that matches from [http://] to [.jpg]? Some URLs have a strange path, for example:

"http://vignette1.wikia.nocookie.net/dogs/images/4/47/Gadget_the_pug_expressive_eyes.jpg/revision/latest?cb\u003d20110813111020"

I added a regex to save the file with the original name.

#include <String.au3>
#include <ie.au3>


Global $sSource, $aImgURL, $sKeyWord

DirCreate(@ScriptDir&"\img")

$folder = (@ScriptDir&"\img\")

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt")
$sSource = _IEDocReadHTML($obj)
FileWrite("log.html", $sSource)

$aImgURL = _StringBetween($sSource, '"ou":"', '"')


    For $x = 1 to UBound($aImgURL)-1
        ConsoleWrite($aImgURL[$x]&@CRLF)
        InetGet($aImgURL[$x],$folder&StringRegExpReplace($aImgURL[$x], '.*/([^-]+).*', "$1"))
    Next

_IEQuit($obj)

 

j0kky
StringRegExp($aImgURL[$x], '(?i)(http.?://.*\.(jpg|bmp|cms|jpeg))', 1)

The limitation is that you have to list every known image extension between the parentheses. In any case, adding an error check is a good idea, because if a URL has an extension you didn't expect, your script will fail.
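
As a rough sketch of that error check, the pattern can be wrapped in a small function that reports failure instead of indexing a missing array. The function name `_TrimToImageUrl` is hypothetical and the extension list is only illustrative; extend it to whatever formats you expect.

```autoit
; Hypothetical helper: return the URL up to and including a known image
; extension. Sets @error = 1 and returns "" when no listed extension is found.
Func _TrimToImageUrl($sUrl)
    ; Flag 1 returns an array of captured groups; @error is set when nothing matches
    Local $aMatch = StringRegExp($sUrl, '(?i)(https?://.*\.(?:jpg|jpeg|bmp|png|gif))', 1)
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc   ;==>_TrimToImageUrl

; Usage: skip anything that does not look like an image URL
Local $sClean = _TrimToImageUrl("http://example.com/images/pug.jpg/revision/latest?cb=1")
If @error Then
    ConsoleWrite("No known image extension found" & @CRLF)
Else
    ConsoleWrite($sClean & @CRLF)
EndIf
```

Because `.*` is greedy, the match runs to the last listed extension in the URL, so trailing path segments such as `/revision/latest` are dropped.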

Edited by j0kky
now it catches https too

Danyfirex

An alternative way without using IE.

 

#include <Array.au3>
#include <String.au3>
Global Const $HTTP_STATUS_OK = 200

Local $sKeyWord = "house"
Local $sURL = "http://www.google.com/search?q=" & $sKeyWord & "&tbm=isch"
Local $sData = HttpGet($sURL)
;~ ConsoleWrite($sData & @CRLF)

Local $aMetas = _StringBetween($sData, '"rg_meta">', '</div>')
;~ _ArrayDisplay($aMetas)

Local $sUrlImage = ""
Local $sImageName = ""
Local $sExtension = ""

If IsArray($aMetas) Then
    If UBound($aMetas) >= 5 Then
        For $i = 0 To 4
            ConsoleWrite(">Image Number: " & $i + 1 & @CRLF)
            $sUrlImage = _GetImageUrl($aMetas[$i])
            $sImageName = _GetImageName($aMetas[$i]) ;maybe you want to get the name from image url instead of metadata
            $sExtension = _GetImageExtension($aMetas[$i])
            ConsoleWrite($sUrlImage & @CRLF)
            ConsoleWrite($sImageName & @CRLF)
            ConsoleWrite($sExtension & @CRLF)
            ConsoleWrite(@CRLF)
        Next
    EndIf
EndIf

Func _GetImageName($sData)
    Local $aData = _StringBetween($sData, '"s":"', '"')
    If IsArray($aData) Then Return $aData[0]
EndFunc   ;==>_GetImageName

Func _GetImageUrl($sData)
    Local $aData = _StringBetween($sData, '"ou":"', '"')
    If IsArray($aData) Then Return $aData[0]
EndFunc   ;==>_GetImageUrl

Func _GetImageExtension($sData)
    Local $aData = _StringBetween($sData, '"ity":"', '"')
    If IsArray($aData) Then Return $aData[0]
EndFunc   ;==>_GetImageExtension


Func HttpGet($sURL)
    Local $oHTTP = ObjCreate("WinHttp.WinHttpRequest.5.1")
    $oHTTP.Open("GET", $sURL, False)
    $oHTTP.SetRequestHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:48.0) Gecko/20100101 Firefox/48.0")
    $oHTTP.SetRequestHeader("Content-Type", "text/plain; charset=utf-8")
    If (@error) Then Return SetError(1, 0, 0)
    $oHTTP.Send()
    If (@error) Then Return SetError(2, 0, 0)
    If ($oHTTP.Status <> $HTTP_STATUS_OK) Then Return SetError(3, 0, 0)
    Return SetError(0, 0, $oHTTP.ResponseText)
EndFunc   ;==>HttpGet

Make sure to clean up the file name.
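
A minimal sketch of such a clean-up, assuming the name will be used on Windows. The function name `_SanitizeFileName` and the fallback name "image" are hypothetical choices, not part of the script above.

```autoit
; Hypothetical helper: make a string safe to use as a Windows file name.
Func _SanitizeFileName($sName)
    ; Replace the characters Windows does not allow in file names: \ / : * ? " < > |
    $sName = StringRegExpReplace($sName, '[\\/:*?"<>|]', "_")
    ; 3 = strip leading (1) and trailing (2) whitespace
    $sName = StringStripWS($sName, 3)
    ; Fall back to a placeholder if nothing usable is left
    If $sName = "" Then $sName = "image"
    Return $sName
EndFunc   ;==>_SanitizeFileName
```

The value returned by _GetImageName could be passed through this before it is combined with the extension and handed to InetGet.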

Regards

rootx
#include <String.au3>
#include <ie.au3>
#include <WinAPIFiles.au3>
#include <InetConstants.au3>
#include <Array.au3>
Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"


$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt")
$sSource = _IEDocReadHTML($obj)
FileWrite("log.html", $sSource)

$aImgURL = _StringBetween($sSource,'imgurl=', '&amp;')

;_ArrayDisplay($aImgURL)

For $x = 1 to UBound($aImgURL)-1
    FileWrite(@ScriptDir&"\1.txt",StringReplace(StringReplace($aImgURL[$x],"%3A",":"),"%2F","/")&@CRLF)
    $url = StringReplace(StringReplace($aImgURL[$x],"%3A",":"),"%2F","/")
Next

$file = FileReadToArray(@ScriptDir&"\1.txt")


For $s = 1 to UBound($file)-1

    $last = StringSplit($file[$s], '/')
    $ls = UBound($last)-1
    ConsoleWrite(StringSplit($file[$s], '/', $STR_ENTIRESPLIT)[$ls]&@CRLF)

    If StringLeft($file[$s],5) = "https" Then
        ConsoleWrite(StringRegExp($file[$s],'(?i)(https://.*\.(jpg|bmp|cms|jpeg))', 1)[0]&@CRLF)
        InetGet($file[$s],@ScriptDir&"\x\"&StringSplit($file[$s], '/', $STR_ENTIRESPLIT)[$ls])
    Else
        ConsoleWrite(StringRegExp($file[$s],'(?i)(http://.*\.(jpg|bmp|cms|jpeg))', 1)[0]&@CRLF)
        InetGet($file[$s],@ScriptDir&"\x\"&StringSplit($file[$s], '/', $STR_ENTIRESPLIT)[$ls])
    EndIf
Next
_IEQuit($obj)

Only one error: ueRSGNo.jpg%3F1. I changed the save path and added the https case, and now 88 files download correctly. Any suggestions to improve it? Thanks.

PS: How can I run IE hidden? I only need to grab the images. Thanks.
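
On the `ueRSGNo.jpg%3F1` case: `%3F` is the percent-encoding of `?`, so that name still carries a query string. One possible approach, sketched here with hypothetical helpers `_UrlDecode` and `_FileNameFromUrl`, is to decode all `%XX` escapes in one pass and then drop everything from the `?` or `#` onward. The decoder handles single-byte escapes only, not multi-byte UTF-8 sequences.

```autoit
; Hypothetical helper: decode %XX escapes (single-byte characters only).
Func _UrlDecode($sUrl)
    Local $aParts = StringSplit($sUrl, "%", 2) ; 2 = no count in element 0
    Local $sOut = $aParts[0]
    For $i = 1 To UBound($aParts) - 1
        If StringRegExp($aParts[$i], '^[0-9A-Fa-f]{2}') Then
            ; The first two characters are the hex code, the rest is plain text
            $sOut &= Chr(Dec(StringLeft($aParts[$i], 2))) & StringTrimLeft($aParts[$i], 2)
        Else
            ; Not a valid escape: keep the percent sign as it was
            $sOut &= "%" & $aParts[$i]
        EndIf
    Next
    Return $sOut
EndFunc   ;==>_UrlDecode

; Hypothetical helper: derive a file name from an (optionally encoded) image URL.
Func _FileNameFromUrl($sUrl)
    $sUrl = _UrlDecode($sUrl)
    $sUrl = StringRegExpReplace($sUrl, '[?#].*$', "") ; drop query string and fragment
    Return StringRegExpReplace($sUrl, '^.*/', "")     ; keep what follows the last slash
EndFunc   ;==>_FileNameFromUrl
```

This would replace the chained StringReplace calls for `%3A` and `%2F` and the StringSplit on '/', and it also covers escapes those two replacements miss.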

j0kky

This is my version without all those StringReplace:

#include <String.au3>
#include <ie.au3>

Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

; The last two arguments (0, 0) mean: do not attach to an existing window, and keep IE hidden
$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt", 0, 0)
$sSource = _IEDocReadHTML($obj)

; Extract every value stored between '"ou":"' and the next double quote
$aImgURL = _StringBetween($sSource, '"ou":"', '"')

; _StringBetween returns a 0-based array, so start at index 0
For $x = 0 to UBound($aImgURL) - 1
    ;$sPattern = '(?i)(http.?://.*\.(jpg|bmp|cms|jpeg))' ; http?://.../name.ext
    $sPattern = '(?i).*/(.*\.(jpg|bmp|cms|jpeg))' ; name.ext
    $aRegEx = StringRegExp($aImgURL[$x], $sPattern, 1)
    If @error Then ContinueLoop ; skip URLs without one of the listed extensions
    ConsoleWrite($aRegEx[0] & @CRLF)
    InetGet($aImgURL[$x], @ScriptDir & "\" & $aRegEx[0])
Next

_IEQuit($obj)

 

Edited by j0kky
rootx
2 hours ago, j0kky said:

This is my version without all those StringReplace:

Nice, it downloaded 94 JPGs. You are the winner. Thanks.
