
Getting images from a webpage with no image tag



I want to create a bot that can go through, say, a profile on Instagram and download all the images on it. After searching the forum for example scripts, I found that they all rely on the image tag. But I looked through the source code of Instagram and there is no image tag for displaying images. In fact, they don't seem to use any tag to display them; the page uses JavaScript functions to display its images instead. So I'm wondering if anyone has a possible solution for that.


Please forgive me for posting again so soon. I wanted to edit my post but didn't know how; once someone tells me, I won't repeat the same mistake.

EDIT: I found the edit option after posting this. Sorry, it won't happen again.

Well, here's the additional info.

This is the link to the user profile: https://instagram.com/ishotgirls/ (don't ban me for the link, I'm only trying to make scripting fun).

And here's the link to one of the images on the profile page: https://instagram.com/p/3ajgNEHztx/

I used the inspect element option on both pages, since viewing the page source directly didn't help: all the tags and links to the images were hidden behind something. With inspect element I found the link to the image, and here are my findings.

On the image page, the link to the image was under a <meta> tag, something like this:

<meta property="og:image" content="https://igcdn-photos-e-a.akamaihd.net/hphotos-ak-xaf1/t51.2885-15/11379951_1740330396193516_322014751_n.jpg" />

Now I have no idea how to extract the image from this. Any help on scripting this?

I'm sure this can be done with a few lines of code, but I have no idea how.
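One possible way to pull the URL out of that meta tag, as a rough sketch: download the post page, match the `content` attribute with `StringRegExp`, and pass the result to `InetGet`. This assumes the raw page source still contains the `og:image` tag exactly as shown above; the output file name `image.jpg` is an arbitrary choice.

```autoit
; Sketch: read the og:image URL from a post page and save the image.
; Assumes the raw source still contains <meta property="og:image" content="...">.
Local $sPostUrl = "https://instagram.com/p/3ajgNEHztx/"
Local $sHtml = BinaryToString(InetRead($sPostUrl))

; Flag 1 returns an array holding the captured group of the first match
Local $aMatch = StringRegExp($sHtml, '<meta property="og:image" content="([^"]+)"', 1)
If @error Then
    MsgBox(16, "Error", "og:image tag not found")
    Exit
EndIf

; $aMatch[0] is the image URL; "image.jpg" is just an example file name
InetGet($aMatch[0], @ScriptDir & "\image.jpg")
```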

As for the profile page, it has links to the various images, but they are hidden too. Using inspect element I found the links to all the images on the user's profile; they are under tags like this:

<a class="pgmiImageLink" href="/p/3ajgNEHztx/?taken-by=ishotgirls" data-reactid=".0.0.2.0.0:1.0.$0.0:$998266412094929777_1323756997.0.1"><div src="https://scontent-lga1-1.cdninstagram.com/hphotos-xaf1/t51.2885-15/e15/11379951_1740330396193516_322014751_n.jpg" class="pgmiThumb tThumbImage Image" data-reactid=".0.0.2.0.0:1.0.$0.0:$998266412094929777_1323756997.0.1.0"><div data-reactid=".0.0.2.0.0:1.0.$0.0:$998266412094929777_1323756997.0.1.0.0"><div class="iImage" id="iImage_14" style="background-image:url(https://scontent-lga1-1.cdninstagram.com/hphotos-xaf1/t51.2885-15/e15/11379951_1740330396193516_322014751_n.jpg);" data-reactid=".0.0.2.0.0:1.0.$0.0:$998266412094929777_1323756997.0.1.0.0.$=1$iImage_14"></div></div></div></a>

Something like this. Here, only the href right after <a class="pgmiImageLink" is the link to the page that has the image.

So I think the script first has to extract this link, then go to it, then extract the image from there.

But I have no idea how to extract that link, or the image on the image page. Any help with that?
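For the first step, collecting the post links, a minimal sketch could look like the following. It runs on a sample string shortened from the markup above rather than on a live download, because that markup is what inspect element shows after JavaScript has run, and the raw source returned by `InetRead` may not contain it.

```autoit
#include <Array.au3>

; Sample markup shortened from the profile page (as seen in inspect element)
Local $sHtml = '<a class="pgmiImageLink" href="/p/3ajgNEHztx/?taken-by=ishotgirls">'

; Flag 3 returns every match of the capture group as an array
Local $aLinks = StringRegExp($sHtml, '<a class="pgmiImageLink" href="([^"]+)"', 3)
If @error Then Exit MsgBox(16, "Error", "No links found")

; The hrefs are relative, so prepend the site address
For $i = 0 To UBound($aLinks) - 1
    $aLinks[$i] = "https://instagram.com" & $aLinks[$i]
Next
_ArrayDisplay($aLinks)
```

Each entry in `$aLinks` is then a post page that can be fetched to look for its image URL.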

One more thing: if we visit the profile page, there is a show more option at the bottom, and more images load when we scroll down. Is it possible to somehow load that show more part and get the links from there too?

Edited by zreo15
needed to organise info a bit

Oh thanks, it works, and I understood the code too and got to learn a few things.

Well, I took another look at the profile page source code, and it actually has the direct URL to the image in it:

 id="iImage_14" style="background-image:url(https://scontent-lga1-1.cdninstagram.com/hphotos-xaf1/t51.2885-15/e15/11379951_1740330396193516_322014751_n.jpg);"

There's even an image id for each image. So now I guess all I have to do is use the StringRegExpReplace function to search for that image id and copy the URL following it. But I would need an array variable to hold each image, so how do I use an array with that search function?

And when using the InetGet function, I would need an array for the names the images are saved under too. Is it possible to use an array with this function as well, and how?

And what about the show more option: how do I trigger it to display more images?

Edited by zreo15
found better info

OK let's try this then  ;)

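#include <Array.au3>

; Profile page to read
$url = "https://instagram.com/ishotgirls/"
; Download the raw page source and convert it from binary to text
$page = BinaryToString(InetRead($url))
; Flag 3 returns all matches of the capture group as an array
$imgs = StringRegExp($page, 'standard_resolution":{"url":"([^"]+)', 3)
If @error Then Exit MsgBox(16, "Error", "No image URLs found")
; Remove the backslashes that escape the slashes in the URLs
For $i = 0 To UBound($imgs) - 1
  $imgs[$i] = StringReplace($imgs[$i], "\", "")
Next
_ArrayDisplay($imgs)

; Save each image as 1.jpg, 2.jpg, ... in a folder next to the script
$dir = @ScriptDir & "\my_beautiful_images\"
If Not FileExists($dir) Then DirCreate($dir)
For $i = 0 To UBound($imgs) - 1
   ; $dir already ends with a backslash
   InetGet($imgs[$i], $dir & ($i + 1) & ".jpg")
Next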

 


Nah, the aim was to build a script for manga sites, but I thought I'd implement it on Instagram first since it seemed harder. The manga sites have image tags, so it would be really easy to get images from them; that's why I thought I'd try a harder example. And as I said, I chose that link just to kill boredom.

OK, thanks man, that worked amazingly. But can you explain the code a bit?

 

$imgs = StringRegExp($page, 'standard_resolution":{"url":"([^"]+)', 3)
For $i = 0 to UBound($imgs)-1
  $imgs[$i] = StringReplace($imgs[$i], "\", "")

Where is standard_resolution in the source code? How does the {"url":"([^"]+) part work, and why is the last parameter 3?
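A small offline sketch may help show what the pattern does. Since the script searches the output of `InetRead`, the `standard_resolution` text must sit in the raw page source (apparently in data embedded in the page) rather than in the rendered tags that inspect element shows. The pattern matches the literal text `standard_resolution":{"url":"` and then `([^"]+)` captures every character up to the next double quote, which is the URL. Flag 3 makes `StringRegExp` return all matches as an array instead of only the first. The sample string below is made up to resemble that data, and the example.com URLs are placeholders.

```autoit
; Made-up sample resembling the data embedded in the page source
Local $sSample = '{"standard_resolution":{"url":"https:\/\/example.com\/one.jpg","width":640}},' & _
        '{"standard_resolution":{"url":"https:\/\/example.com\/two.jpg","width":640}}'

; ([^"]+) captures everything up to the next double quote;
; flag 3 returns every match in an array
Local $aUrls = StringRegExp($sSample, 'standard_resolution":{"url":"([^"]+)', 3)
If @error Then Exit MsgBox(16, "Error", "No matches")

For $i = 0 To UBound($aUrls) - 1
    ; The slashes are escaped as \/ in the data, so strip the backslashes
    $aUrls[$i] = StringReplace($aUrls[$i], "\", "")
    ConsoleWrite($aUrls[$i] & @CRLF)
Next
```

With this sample the loop writes the two cleaned URLs, `https://example.com/one.jpg` and `https://example.com/two.jpg`, to the console.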

Try this: I was just having fun with your code, and something strange happens at the beginning when you run it.

$url = "https://instagram.com/ishotgirls/"
$page = BinaryToString(InetRead($url))
Run("notepad.exe")
WinWaitActive("Untitled - Notepad")
Send($page)

Try running it. Any idea why that strange behaviour happens when the script first starts?
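A likely cause is that `Send` treats some characters as special: `!`, `+`, `^` and `#` stand for Alt, Shift, Ctrl and the Windows key, and text inside `{}` is read as a key name. HTML source is full of these characters, so the script ends up pressing key combinations instead of typing text. A sketch of one way around it is to pass flag 1 so the text is sent raw:

```autoit
Local $sUrl = "https://instagram.com/ishotgirls/"
Local $sPage = BinaryToString(InetRead($sUrl))

Run("notepad.exe")
WinWaitActive("Untitled - Notepad")

; Flag 1 sends the text raw, so !, +, ^, # and {} are typed literally
; instead of being treated as Alt, Shift, Ctrl, Win and special-key codes
Send($sPage, 1)
```

Typing a whole page this way is slow; putting the text on the clipboard with `ClipPut($sPage)` and pasting it with `Send("^v")` is another option.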

Edited by zreo15

OK, got it, this works too. But what about the show more option: is there any way to trigger it, other than manually scrolling over it with the mouse to make it show more pics and then getting the URLs of all the pics at once?


This 'show more' thing is JavaScript behavior and is much more complicated to handle.
Maybe the _IE* funcs would be a better way for that; I don't know how to do it though.
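As an untested sketch of the _IE* idea: open the profile in Internet Explorer, scroll down a few times so the page's own JavaScript can load more thumbnails, then read the rendered HTML and search it. This assumes the page works in Internet Explorer and that scrolling alone triggers loading; if a show more button has to be clicked first, that step is not covered here. The scroll count and delay are guesses.

```autoit
#include <IE.au3>
#include <Array.au3>

; Sketch only: assumes the page loads more thumbnails when scrolled in IE
Local $oIE = _IECreate("https://instagram.com/ishotgirls/")

; Scroll down a few times so the page's JavaScript can load more images.
; The count (5) and the delay (2000 ms) are guesses to tune.
For $i = 1 To 5
    $oIE.document.parentWindow.scrollBy(0, 10000)
    Sleep(2000)
Next

; Read the rendered HTML (after JavaScript has run), not the raw source
Local $sHtml = _IEDocReadHTML($oIE)

; Collect the thumbnail URLs from the background-image styles
Local $aImgs = StringRegExp($sHtml, '(?i)background-image:\s*url\("?([^")]+)', 3)
If @error Then Exit MsgBox(16, "Error", "No image URLs found")
_ArrayDisplay($aImgs)
```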

BTW grabbing images from manga sites is much easier and doesn't require handling JavaScript.


Yeah, the manga sites are easier since they have proper indexing for each image, and the image URLs are proper CDN URLs like this:

<img src="http://a.mhcdn.net/store/manga/7439/01-001.0/compressed/n3am_dangerous_zone_v01_c01_000.jpg?v=1268127542"

And the credits page is marked with a proper credit URL:

<img src="http://a.mhcdn.net/store/manga/7439/01-001.0/compressed/ncredit.jpg?v=1268127542" onerror="javascript:rerender(this);" width="600" id="image" alt="3 AM Dangerous Zone 1 Page 37">

So it's easy to run the loop over chapters too.
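A small sketch of that check, run on the credits tag above as a sample string: capture the `src` of the image tag and test whether it points at the credits image. Treating `ncredit.jpg` as the sign that a chapter has ended is an assumption based on the two examples above.

```autoit
; Sample tag copied from a manga page (the credits page in this case)
Local $sTag = '<img src="http://a.mhcdn.net/store/manga/7439/01-001.0/compressed/ncredit.jpg?v=1268127542" ' & _
        'onerror="javascript:rerender(this);" width="600" id="image" alt="3 AM Dangerous Zone 1 Page 37">'

; Flag 1 returns an array with the captured src of the first <img> tag
Local $aSrc = StringRegExp($sTag, '<img src="([^"]+)"', 1)
If @error Then Exit MsgBox(16, "Error", "No image tag found")

; Assumption: the credits image marks the last page of a chapter
If StringInStr($aSrc[0], "ncredit.jpg") Then
    ConsoleWrite("Credits page reached, chapter finished" & @CRLF)
Else
    ConsoleWrite("Image URL: " & $aSrc[0] & @CRLF)
EndIf
```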

Anyway, thanks for the help man. I will try to find some way to handle those JavaScript functions.
