rootx

Best way to use google image search? [SOLVED]

16 posts in this topic

#1 ·  Posted (edited)

I would like to download the first 5 images in a folder. THX.

#include <INet.au3>
#include <String.au3>
#include <Array.au3>


Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"

$sSource = _INetGetSource("http://www.google.com/search?q=" & $sKeyWord & "&tbm=isch")

$aImgURL = _StringBetween($sSource, 'src="', '"')


For $x = 1 to UBound($aImgURL)-1
    ConsoleWrite($aImgURL[$x]&@CRLF)
Next

 

Edited by rootx

Share this post


Link to post
Share on other sites



Share this post


Link to post
Share on other sites
28 minutes ago, Danyfirex said:

Hello. and your issue is...?

 

Saludos

get the name of the img and save it whit the correct type and name.

Share this post


Link to post
Share on other sites
1 hour ago, j0kky said:

You can download 'em using InetGet, they don't have a standard name, but to know the extension you should search for their magic number.

thx, but the question is how to intercept the url of the source and not the thumbnail, does anyone have any idea ?? THX

#include <INet.au3>
#include <String.au3>
#include <Array.au3>


Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

$sSource = _INetGetSource("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt")

$aImgURL = _StringBetween($sSource, 'src="', '"')


For $x = 1 to UBound($aImgURL)-1
    ConsoleWrite($aImgURL[$x]&@CRLF)
    InetGet($aImgURL[$x],@ScriptDir&"\"&$x&".jpg")
Next

 

Share this post


Link to post
Share on other sites

#6 ·  Posted (edited)

Try to save $sSource to an .html file and open it, you will see it differs from the page you're seeing while visiting the same url with browser:

https://www.google.ch/search?q=pug&as_st=y&hl=it&tbs=ift:jpg,isz:ex,iszw:800,iszh:600&tbm=isch&source=lnt&gws_rd=ssl

In my opinion you should play with:

_IEDocReadHTML

 

Edited by j0kky

Share this post


Link to post
Share on other sites
22 hours ago, j0kky said:

Try to save $sSource to an .html file and open it, you will see it differs from the page you're seeing while visiting the same url with browser:

https://www.google.ch/search?q=pug&as_st=y&hl=it&tbs=ift:jpg,isz:ex,iszw:800,iszh:600&tbm=isch&source=lnt&gws_rd=ssl

In my opinion you should play with:

_IEDocReadHTML

 

_IEDocReadHTML doesn't work. but....

#include <IE.au3>
#include <MsgBoxConstants.au3>
#include <Inet.au3>
#include <Array.au3>
#include <File.au3>
#include <String.au3>



$x = _INetGetSource("http://www.google.ch/search?as_st=y&tbm=isch&hl=it&as_q=pug&as_epq=&as_oq=&as_eq=&cr=&as_sitesearch=&safe=images&tbs=ift:jpg")

FileWrite(@ScriptDir&"\9.html",$x)
Local $aRetArray
_FileReadToArray(@ScriptDir&"\9.html", $aRetArray)

;_ArrayDisplay($aRetArray, "Default Search")
 Local $aArray = _StringBetween($x, 'href="', '"')

 ; _ArrayDisplay($aArray, "Default Search")

    For $xs = 1 to UBound($aArray)-1
        ConsoleWrite($aArray[$xs]&@CRLF)
    Next

the source code isn't correct... beacuse if you read from the browser you find easly... this

/imgres?imgurl=http%3A%2F%2Fcdn3-www.dogtime.com%2Fassets%2Fuploads%2F2011%2F01%2Ffile_23124_pug-460x290.jpg&imgrefurl=http%3A%2F%2Fdogtime.com%2Fdog-breeds%2Fpug&docid=BTPG4yF8_O0fQM&tbnid=8FbyFFzHno3BCM%3A&vet=1&w=460&h=290&hl=it&safe=images&bih=715&biw=1156&ved=0ahUKEwif1eWAys7QAhUDzxQKHc39AREQMwgdKAAwAA&iact=mrc&uact=8

But Autoit extract... this

http://dogtime.com/dog-breeds/pug&amp;sa=U&amp;ved=0ahUKEwiU-sLNzc7QAhUBfhoKHYuWAP4QwW4IGDAA&amp;usg=AFQjCNFtqNOflzABBIVCR79FpfulvDD6Pw

Why??? Any Idea? I need to read raw source html. THX


 

Share this post


Link to post
Share on other sites

#8 ·  Posted (edited)

15 hours ago, rootx said:

_IEDocReadHTML doesn't work.

What does it mean, exatly?

#include <String.au3>
#include <ie.au3>


Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt")
$sSource = _IEDocReadHTML($obj)
FileWrite("log.html", $sSource)

$aImgURL = _StringBetween($sSource, '"ou":"', '"')


For $x = 1 to UBound($aImgURL)-1
    ConsoleWrite($aImgURL[$x]&@CRLF)
    ;InetGet($aImgURL[$x],@ScriptDir&"\"&$x&".jpg")
Next

 

Edited by j0kky
1 person likes this

Share this post


Link to post
Share on other sites
1 hour ago, j0kky said:

What does it mean, exatly?

#include <String.au3>
#include <ie.au3>


Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt")
$sSource = _IEDocReadHTML($obj)
FileWrite("log.html", $sSource)

$aImgURL = _StringBetween($sSource, '"ou":"', '"')


For $x = 1 to UBound($aImgURL)-1
    ConsoleWrite($aImgURL[$x]&@CRLF)
    ;InetGet($aImgURL[$x],@ScriptDir&"\"&$x&".jpg")
Next

 

 

Ok but there is a way to have a regExp to intercept  start with [http://]   end with [.jpg] that because some url have a strange path.... 4 example....

"http://vignette1.wikia.nocookie.net/dogs/images/4/47/Gadget_the_pug_expressive_eyes.jpg/revision/latest?cb\u003d20110813111020"

I added a regex to save the file with the original name.

#include <String.au3>
#include <ie.au3>


Global $sSource, $aImgURL, $sKeyWord

DirCreate(@ScriptDir&"\img")

$folder = (@ScriptDir&"\img\")

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt")
$sSource = _IEDocReadHTML($obj)
FileWrite("log.html", $sSource)

$aImgURL = _StringBetween($sSource, '"ou":"', '"')


    For $x = 1 to UBound($aImgURL)-1
        ConsoleWrite($aImgURL[$x]&@CRLF)
        InetGet($aImgURL[$x],$folder&StringRegExpReplace($aImgURL[$x], '.*/([^-]+).*', "$1"))
    Next

_IEQuit($obj)

 

Share this post


Link to post
Share on other sites

#10 ·  Posted (edited)

StringRegExp($aImgURL[$x], '(?i)(http.?://.*\.(jpg|bmp|cms|jpeg))', 1)

You have the limitation to insert between parentesis each known image extension. Anyhow implementing an error checking line is a good idea, because if there is an extension you haven't expected, your script will fail.

Edited by j0kky
now it catches https too

Share this post


Link to post
Share on other sites

#11 ·  Posted

An alternative way without using IE.

 

#include <Array.au3>
#include <String.au3>
Global Const $HTTP_STATUS_OK = 200

Local $sKeyWord = "house"
Local $sURL = "http://www.google.com/search?q=" & $sKeyWord & "&tbm=isch"
Local $sData = HttpGet($sURL)
;~ ConsoleWrite($sData & @CRLF)

Local $aMetas = _StringBetween($sData, '"rg_meta">', '</div>')
;~ _ArrayDisplay($aMetas)

Local $sUrlImage = ""
Local $sImageName = ""
Local $sExtension = ""

If IsArray($aMetas) Then
    If UBound($aMetas) >= 5 Then
        For $i = 0 To 4
            ConsoleWrite(">Image Number: " & $i + 1 & @CRLF)
            $sUrlImage = _GetImageUrl($aMetas[$i])
            $sImageName = _GetImageName($aMetas[$i]) ;maybe you want to get the name from image url instead of metadata
            $sExtension = _GetImageExtension($aMetas[$i])
            ConsoleWrite($sUrlImage & @CRLF)
            ConsoleWrite($sImageName & @CRLF)
            ConsoleWrite($sExtension & @CRLF)
            ConsoleWrite(@CRLF)
        Next
    EndIf
EndIf

Func _GetImageName($sData)
    Local $aData = _StringBetween($sData, '"s":"', '"')
    If IsArray($aData) Then Return $aData[0]
EndFunc   ;==>_GetImageName

Func _GetImageUrl($sData)
    Local $aData = _StringBetween($sData, '"ou":"', '"')
    If IsArray($aData) Then Return $aData[0]
EndFunc   ;==>_GetImageUrl

Func _GetImageExtension($sData)
    Local $aData = _StringBetween($sData, '"ity":"', '"')
    If IsArray($aData) Then Return $aData[0]
EndFunc   ;==>_GetImageExtension


Func HttpGet($sURL)
    Local $oHTTP = ObjCreate("WinHttp.WinHttpRequest.5.1")
    $oHTTP.Open("GET", $sURL, False)
    $oHTTP.SetRequestHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:48.0) Gecko/20100101 Firefox/48.0")
    $oHTTP.SetRequestHeader("Content-Type", "text/plain; charset=utf-8")
    If (@error) Then Return SetError(1, 0, 0)
    $oHTTP.Send()
    If (@error) Then Return SetError(2, 0, 0)
    If ($oHTTP.Status <> $HTTP_STATUS_OK) Then Return SetError(3, 0, 0)
    Return SetError(0, 0, $oHTTP.ResponseText)
EndFunc   ;==>HttpGet

Make sure to clean up the file name.

Saludos 

1 person likes this

Share this post


Link to post
Share on other sites

#13 ·  Posted

#include <String.au3>
#include <ie.au3>
#include <WinAPIFiles.au3>
#include <InetConstants.au3>
#include <Array.au3>
Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"


$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt")
$sSource = _IEDocReadHTML($obj)
FileWrite("log.html", $sSource)

$aImgURL = _StringBetween($sSource,'imgurl=', '&amp;')

;_ArrayDisplay($aImgURL)

For $x = 1 to UBound($aImgURL)-1
    FileWrite(@ScriptDir&"\1.txt",StringReplace(StringReplace($aImgURL[$x],"%3A",":"),"%2F","/")&@CRLF)
    $url = StringReplace(StringReplace($aImgURL[$x],"%3A",":"),"%2F","/")
Next

$file = FileReadToArray(@ScriptDir&"\1.txt")


For $s = 1 to UBound($file)-1

    $last = StringSplit($file[$s], '/')
    $ls = UBound($last)-1
    ConsoleWrite(StringSplit($file[$s], '/', $STR_ENTIRESPLIT)[$ls]&@CRLF)

    If StringLeft($file[$s],5) = "https" Then
        ConsoleWrite(StringRegExp($file[$s],'(?i)(https://.*\.(jpg|bmp|cms|jpeg))', 1)[0]&@CRLF)
        InetGet($file[$s],@ScriptDir&"\x\"&StringSplit($file[$s], '/', $STR_ENTIRESPLIT)[$ls])
    Else
        ConsoleWrite(StringRegExp($file[$s],'(?i)(http://.*\.(jpg|bmp|cms|jpeg))', 1)[0]&@CRLF)
        InetGet($file[$s],@ScriptDir&"\x\"&StringSplit($file[$s], '/', $STR_ENTIRESPLIT)[$ls])
    EndIf
Next
_IEQuit($obj)

!!! only one error.... ueRSGNo.jpg%3F1 I changed the save file path name and the https... case.... now I downloaded 88 file correctly... Any suggestion to improve it? THX

PS: how can run ie hidden? I need to grab only the images Thx

Share this post


Link to post
Share on other sites

#14 ·  Posted (edited)

This is my version without all those StringReplace:

#include <String.au3>
#include <ie.au3>

Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt", 0, 0)
$sSource = _IEDocReadHTML($obj)

$aImgURL = _StringBetween($sSource, '"ou":"', '"')

For $x = 1 to UBound($aImgURL) - 1
    ;$sPattern = '(?i)(http.?://.*\.(jpg|bmp|cms|jpeg))' ; http?://.../name.ext
    $sPattern = '(?i).*/(.*\.(jpg|bmp|cms|jpeg))' ; name.ext
    $aRegEx = StringRegExp($aImgURL[$x], $sPattern, 1)
    If @error Then ContinueLoop
    ConsoleWrite($aRegEx[0] & @CRLF)
    InetGet($aImgURL[$x], @ScriptDir & "\" & $aRegEx[0])
Next

_IEQuit($obj)

 

Edited by j0kky
1 person likes this

Share this post


Link to post
Share on other sites

#15 ·  Posted

2 hours ago, j0kky said:

This is my version without all those StringReplace:

#include <String.au3>
#include <ie.au3>

Global $sSource, $aImgURL, $sKeyWord

$sKeyWord = "pug"
$type = "jpg"
$width = "800"
$height = "600"

$obj = _IECreate("http://www.google.ch/search?q="& $sKeyWord &"&as_st=y&hl=it&tbs=ift:"&$type&",isz:ex,iszw:"&$width&",iszh:"&$height&"&tbm=isch&source=lnt", 0, 0)
$sSource = _IEDocReadHTML($obj)

$aImgURL = _StringBetween($sSource, '"ou":"', '"')

For $x = 1 to UBound($aImgURL) - 1
    ;$sPattern = '(?i)(http.?://.*\.(jpg|bmp|cms|jpeg))' ; http?://.../name.ext
    $sPattern = '(?i).*/(.*\.(jpg|bmp|cms|jpeg))' ; name.ext
    $aRegEx = StringRegExp($aImgURL[$x], $sPattern, 1)
    If @error Then ContinueLoop
    ConsoleWrite($aRegEx[0] & @CRLF)
    InetGet($aImgURL[$x], @ScriptDir & "\" & $aRegEx[0])
Next

_IEQuit($obj)

 

Nice,  downloaded 94 jpg, the winer is you. THX

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now

  • Similar Content

    • ZeroByDevide
      Open Google chrome incongito
      By ZeroByDevide
      I got some code from internet and i wanted to open a incongito browser and search my website but i cant seem to open google chrome in the incongito mode.
      whenever i click to run my code it opens a tab in normal google chrome.
      heres my code now:
       ShellExecute("chrome.exe", "http://www.erikzandstra.nl", "--incongito")
    • nbg15
      tesseract doesnt detect the easiest image *image to text*
      By nbg15
      Hello everybody..
       
      i have this picture here *attached* and this script here: 
       
      $ImageToReadPath = @MyDocumentsDir & "\GDIPlus_Image2.jpg" $ResultTextPath = @MyDocumentsDir & "\Result" $OutPutPath = $ResultTextPath & ".txt" $TesseractExePath = @MyDocumentsDir & "\Tesseract.exe" ShellExecuteWait($TesseractExePath, '"' & $ImageToReadPath & '" "' & $ResultTextPath & '"', "", "", @SW_HIDE) If @error Then Exit MsgBox(0, "Error", @error) EndIf MsgBox(0, "Result", FileRead($OutPutPath)) FileDelete($OutPutPath)  
      but tesseract doesnt recognized the correct word... and gives me trash back...

      this is the image >> 
      and the result was >> "samm" 

      the image was an normal jpg and generated with this code here:
       
      _ScreenCapture_Capture(@MyDocumentsDir & "\GDIPlus_Image2.jpg", 712,268,853,284)
      Could anybody give me a hint what i can do better to get this easy image to text?
       
      thank u very much!!!
       
       
      Edit: i also tried to capture the screen as bmp with a higher resolution... nothing changed... 
       
       
      _ScreenCapture_SetBMPFormat(4) _ScreenCapture_Capture(@MyDocumentsDir & "\GDIPlus_Image.bmp", 712,279,853,295)  
    • RyukShini
      Image quality is bad compared to paint
      By RyukShini
      #Region ;**** Directives created by AutoIt3Wrapper_GUI **** #AutoIt3Wrapper_Icon=car.ico #EndRegion ;**** Directives created by AutoIt3Wrapper_GUI **** #include <GDIPlus.au3> #include <File.au3> #include <Array.au3> #include <ColorConstants.au3> #include <GUIConstantsEx.au3> #include <WindowsConstants.au3> #include <ProgressConstants.au3> ; Declare array Dim $Images[1] ; Gets all JPG files in the current directory (@ScriptDir). Local $search = FileFindFirstFile("*.jpg") ; Check if the search was successful If $search = -1 Then MsgBox(0, "Error", "No JPG files could be found.") Exit EndIf ; Resize array While 1 If IsArray($Images) Then Local $Bound = UBound($Images) ReDim $Images[$Bound+1] EndIf $Images[$Bound] = FileFindNextFile($search) If @error Then ExitLoop WEnd ; Close the search handle FileClose($search) ; Create directory "resized" if not there yet $nymappe = InputBox("Mappe / Bil Navn", "Mappe / Bil Navn") If NOT FileExists(@ScriptDir & "\" & $nymappe & "\") Then DirCreate(@ScriptDir & "\" & $nymappe & "\") EndIf ; Loop for JPGs - gets dimension of JPG and calls resize function to resize to 50% width and 50% height For $i = 1 to Ubound($Images)-1 If $Images[$i] <> "" AND FileExists(@ScriptDir & "\" & $Images[$i]) Then Local $ImagePath = @ScriptDir & "\" & $Images[$i] _GDIPlus_Startup() Local $hImage = _GDIPlus_ImageLoadFromFile($ImagePath) Local $ImageWidth = _GDIPlus_ImageGetWidth($hImage) Local $ImageHeight = _GDIPlus_ImageGetHeight($hImage) _GDIPlus_ImageDispose($hImage) _GDIPlus_Shutdown() ;MsgBox(0,"DEBUG", $ImageWidth & " x " & $ImageHeight) Local $NewImageWidth = ($ImageWidth / 100) * 15 Local $NewImageHeight = ($ImageHeight / 100) * 15 ;MsgBox(0,"DEBUG: " & $i,$Images[$i]) _ImageResize(@ScriptDir & "\" & $Images[$i], @ScriptDir & "\" & $nymappe & "\" & $Images[$i], $NewImageWidth, $NewImageHeight) EndIf Next ; Resize function Func _ImageResize($sInImage, $sOutImage, $iW, $iH) Local $hWnd, $hDC, $hBMP, $hImage1, $hImage2, $hGraphic, $CLSID, $i = 0 ;OutFile path, to use later on. Local $sOP = StringLeft($sOutImage, StringInStr($sOutImage, "\", 0, -1)) ;OutFile name, to use later on. Local $sOF = StringMid($sOutImage, StringInStr($sOutImage, "\", 0, -1) + 1) ;OutFile extension , to use for the encoder later on. Local $Ext = StringUpper(StringMid($sOutImage, StringInStr($sOutImage, ".", 0, -1) + 1)) ; Win api to create blank bitmap at the width and height to put your resized image on. $hWnd = _WinAPI_GetDesktopWindow() $hDC = _WinAPI_GetDC($hWnd) $hBMP = _WinAPI_CreateCompatibleBitmap($hDC, $iW, $iH) _WinAPI_ReleaseDC($hWnd, $hDC) ;Start GDIPlus _GDIPlus_Startup() ;Get the handle of blank bitmap you created above as an image $hImage1 = _GDIPlus_BitmapCreateFromHBITMAP ($hBMP) ;Load the image you want to resize. $hImage2 = _GDIPlus_ImageLoadFromFile($sInImage) ;Get the graphic context of the blank bitmap $hGraphic = _GDIPlus_ImageGetGraphicsContext ($hImage1) ;Draw the loaded image onto the blank bitmap at the size you want _GDIPLus_GraphicsDrawImageRect($hGraphic, $hImage2, 0, 0, $iW, $iH) ;Get the encoder of to save the resized image in the format you want. $CLSID = _GDIPlus_EncodersGetCLSID($Ext) ;Generate a number for out file that doesn't already exist, so you don't overwrite an existing image. Do $i += 1 Until (Not FileExists($sOP & $i & "_" & $sOF)) ;Prefix the number to the begining of the output filename $sOutImage = $sOP & $i & "_" & $sOF ;Save the new resized image. _GDIPlus_ImageSaveToFileEx($hImage1, $sOutImage, $CLSID) ;Clean up and shutdown GDIPlus. _GDIPlus_ImageDispose($hImage1) _GDIPlus_ImageDispose($hImage2) _GDIPlus_GraphicsDispose ($hGraphic) _WinAPI_DeleteObject($hBMP) _GDIPlus_Shutdown() EndFunc Quality gets quite bad compared to using Paint / Photoshop when resizing with GDIPlus
      Any idea how to make the quality better?
      Thanks in advance
    • UEZ
      _GDIPlus_BitmapApplyFilter UDF beta
      By UEZ
      A collection of image filter effects usable with AutoIt!
       
      IMPORTANT: You are not allowed to sell this code or just parts of it in a commercial project or modify it and distribute it with a different name!
      Distributing copies of this UDF incl. _GDIPlus_BitmapApplyFilter.dll in compiled format (exe) must be free of any fee!
       
      More information can be found in the forum thread!
       
    • Reizvoller
      WinGetCaretPost and Google Chrome
      By Reizvoller
      Greetings!
      Func caretPlay () activateWindow () WinActivate ($workSpace) Local $myCaret = WinGetCaretPos () _ArrayDisplay ($myCaret) EndFunc Func activateWindow () $workSpace = WinGetHandle ("Login - Google Chrome") WinActivate ($workSpace) $dimensions = WinGetClientSize ($workSpace) $midPoint = $dimensions[1]/2 EndFunc Those are two little functions in a larger app I am building.  When I run the "caretPlay" function it returns the same values of 0,0 regardless of where I position the caret. I did read on the function reference page for WinGetCaretPos that applications with Multiple Document Interfaces (MDIs) may not return accurate values. I tried all three options for the CaretCoordMode option with no change in results.
      Either I am doing something wrong or Google Chrome isn't meant to work with this function. Any advice as to which it is?
      Thanks!