Azevedo

Parsing URLs from HTML

4 posts in this topic

Hey,

I'm building a script to download images from a Google Images search.

Basically, I have this part:

$url = "https://www.google.com/search?hl=en&tbm=isch&q=flinstones"
$sData = InetRead( $url )
$stream = BinaryToString( $sData )

This stores the HTML content in the variable $stream. The images to download appear in the pattern:

"imgurl=http://........jpg"

My question is: how do I use a RegEx to extract those image URLs from $stream into an array like:

image[1]="http://...."

image[2]="http://...."

Thanks

#2 ·  Posted (edited)

I'd use an XML DOM object, or load the page into a hidden IE instance and call:

_IEImgGetCollection

Then loop through the collection and read each image's .src.
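
A minimal sketch of the hidden-IE approach, assuming Internet Explorer is installed and can render the results page. The variable names and the 1-based array layout (count in element 0, matching the image[1], image[2] style asked for) are illustrative choices, not something from the posts above:

```autoit
#include <IE.au3>
#include <Array.au3>

; Assumption: Internet Explorer is available and the page loads in it.
Local $sURL = "https://www.google.com/search?hl=en&tbm=isch&q=flinstones"

; _IECreate(url, tryAttach, visible): visible = 0 keeps the browser window hidden.
Local $oIE = _IECreate($sURL, 0, 0)
If @error Then
    ConsoleWrite("Could not create the IE instance" & @CRLF)
    Exit 1
EndIf

; On success, @extended holds the number of images in the collection.
Local $oImgs = _IEImgGetCollection($oIE)
Local $iErr = @error, $iCount = @extended
If $iErr Then
    _IEQuit($oIE)
    ConsoleWrite("Could not read the image collection" & @CRLF)
    Exit 1
EndIf

; Element 0 stores the count; the URLs start at index 1.
Local $aSrc[$iCount + 1]
$aSrc[0] = $iCount
Local $i = 1
For $oImg In $oImgs
    $aSrc[$i] = $oImg.src
    $i += 1
Next

_IEQuit($oIE)
_ArrayDisplay($aSrc)
```

Note that .src is whatever the page puts in each <img> tag, so on a search results page this will likely list thumbnails rather than the original files.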

You will have to wait for the RegExp sample :) ... here it is:

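#include <Array.au3>
; Sample HTML: two <img> tags separated by a line break
$html = '<img class="ipsUserPhoto ipsUserPhoto_mini" alt="Are my AutoIt EXEs really infected? - last post by JLogan3o13" src="http://src1g?_r=1342793095"/>' & @CRLF & _
'<img class="ipsUserPhoto ipsUserPhoto_mini" alt="Are my AutoIt EXEs really infected? - last post by JLogan3o13" src="http://src1g?_r=1342793095"/>'
; [^>]*? stays inside one tag and [^"]* stops at the closing quote, so each src is captured separately
$array = StringRegExp($html, '<img[^>]*?\ssrc="([^"]*)"', 3) ; flag 3 = return an array of all matches
_ArrayDisplay($array)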
Edited by jdelaney


#3 ·  Posted (edited)

That may only get the thumbnails. To get the original images, this may serve you well:

#include <Inet.au3>
#include <Array.au3>
$sURL = "https://www.google.com/search?hl=en&tbm=isch&q=flinstones"
$sData = _InetGetSource($sURL)
$aImages = StringRegExp($sData, 'imgurl=(.*?)&amp;', 3)
_ArrayDisplay($aImages)
Edited by GMK
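
Since the goal is to download the images, here is a rough sketch of how the array returned by the imgurl pattern above could be fed to InetGet. The output folder, the image1.jpg naming scheme, and the assumption that every match is a directly downloadable JPEG URL are hypothetical choices for illustration only:

```autoit
#include <Inet.au3>

Local $sURL = "https://www.google.com/search?hl=en&tbm=isch&q=flinstones"
Local $sData = _InetGetSource($sURL)

; Flag 3 returns every captured group as a 0-based array; @error is set when nothing matches.
Local $aImages = StringRegExp($sData, 'imgurl=(.*?)&amp;', 3)
If @error Then
    ConsoleWrite("No imgurl matches found" & @CRLF)
    Exit 1
EndIf

; Hypothetical output folder next to the script.
Local $sDir = @ScriptDir & "\images"
DirCreate($sDir)

For $i = 0 To UBound($aImages) - 1
    ; Hypothetical naming scheme; assumes each URL points at a JPEG.
    Local $sFile = $sDir & "\image" & ($i + 1) & ".jpg"
    ; Option 1 forces a reload from the remote site instead of using the cache.
    InetGet($aImages[$i], $sFile, 1)
    If @error Then ConsoleWrite("Failed: " & $aImages[$i] & @CRLF)
Next
```

The captured URLs may be percent-encoded in the page source, in which case they would need decoding before the download; that step is not shown here.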


Thanks Delaney, GMK

Wonderful, that's what I was looking for!
