andrewz

Extracting Data from the source of a website - wont work :(

25 posts in this topic

#1 ·  Posted (edited)

Hello!

I'm currently facing a problem that I can't seem to solve.

What do I want the script to do?
Extract all the hotel links from this page: http://www.yelp.de/search?cflt=hotels&find_loc=Berlin%2C+Germany
For example, the link to the first hotel would be http://www.yelp.de/biz/novum-hotel-city-b-berlin-zentrum-berlin (the order changes sometimes, so the first link may be different).

To start off, I tried to extract just one hotel. I use this code to read the page source and then get the content between two HTML tags:

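#NoTrayIcon
#include <Inet.au3>
#include <Array.au3>
#include <String.au3>
#include <AutoItConstants.au3>
#include <MsgBoxConstants.au3>
Global $url = "http://www.yelp.de/search?find_desc=Hotels&find_loc=Berlin&ns=1"

; Download the raw HTML of the search results page
Global $content = _INetGetSource($url)
If @error Then Exit MsgBox($MB_ICONERROR, "Error", "Could not download the page source.")

; Return every piece of text between the opening avatar <div> and the next </div>
Global $string_A = _StringBetween($content, '<div class="media-avatar">', '</div>')
If @error Then Exit MsgBox($MB_ICONERROR, "Error", "Start/end markers not found in the page source.")

MsgBox($MB_OK, "", $string_A[0])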

It's part of an older project that did almost the same thing, except that this site is not as easy :(

The link is stored differently here, and I can't find a way to extract it. Once the links are saved into an array, I'm going to store them in a variable with a Do...Until loop. But first I need this step to work.

Please, if anyone has an idea how to solve this, even the smallest help is appreciated!

Edited by andrewz
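Since `_StringBetween` only returns the text inside each avatar `<div>`, one alternative is to pull the hotel URLs out of the source directly with a regular expression. A minimal sketch, assuming the hotel links appear in the raw HTML as relative `href="/biz/..."` attributes (inferred from the example URL above, not verified against the live page); `_ExtractBizLinks` is a hypothetical helper name. Results that the page adds later with JavaScript will not be in the `_INetGetSource` output at all.

```autoit
#include <StringConstants.au3>

; Hypothetical helper: returns a 0-based array with every /biz/ link in $sHtml,
; turned into an absolute URL. Query strings and fragments are dropped.
; Returns "" and sets @error = 1 when no link matches.
Func _ExtractBizLinks($sHtml, $sBase = "http://www.yelp.de")
    ; Group 1 captures the path from /biz/ up to the closing quote, ? or #
    Local $aMatch = StringRegExp($sHtml, 'href="(/biz/[^"?#]+)', $STR_REGEXPARRAYGLOBALMATCH)
    If @error Then Return SetError(1, 0, "")
    For $i = 0 To UBound($aMatch) - 1
        $aMatch[$i] = $sBase & $aMatch[$i]
    Next
    Return $aMatch
EndFunc
```

Usage with the script above would be `Local $aLinks = _ExtractBizLinks($content)`, after which `_ArrayDisplay($aLinks)` (from Array.au3) shows the result.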




If you use Microsoft's Internet Explorer, you could use the IE UDF to extract the links.


My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 


If you use Microsoft's Internet Explorer, you could use the IE UDF to extract the links.

 

It doesn't matter which browser; I can use any. But thanks, I'll look into it ;)


#4 ·  Posted (edited)

A simple example using the IE UDF to get all links:

#include <IE.au3>
#include <MsgBoxConstants.au3>

Global $url = "http://www.yelp.de/search?find_desc=Hotels&find_loc=Berlin&ns=1"

Local $oIE = _IECreate($url)                 ; open IE and wait for the page to load
Local $oLinks = _IELinkGetCollection($oIE)   ; collection of all <a> elements
Local $iNumLinks = @extended                 ; number of links found

Local $sTxt = $iNumLinks & " links found" & @CRLF & @CRLF
For $oLink In $oLinks
    $sTxt &= $oLink.href & @CRLF
Next
MsgBox($MB_SYSTEMMODAL, "Link Info", $sTxt)
_IEQuit($oIE) ; close the browser window again

The only difference between this and the help file example for _IELinkGetCollection() is the _IECreate() call with $url, so it should be pretty easy to start using the IE UDF. :D

Edited by MikahS
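`_IELinkGetCollection` returns every link on the page (navigation, footer and so on), not only the hotels. A minimal sketch that keeps only the hrefs containing a filter string; `_IEGetFilteredLinks` is a hypothetical name, and treating `/biz/` as the marker of a hotel page is an assumption based on the example URL in the first post.

```autoit
#include <IE.au3>
#include <StringConstants.au3>

; Hypothetical helper: returns a 0-based array with the href of every link on
; the page in $oIE whose href contains $sFilter.
; @error = 1: not a valid browser object, @error = 2: no matching links.
Func _IEGetFilteredLinks($oIE, $sFilter = "/biz/")
    Local $oLinks = _IELinkGetCollection($oIE)
    If @error Then Return SetError(1, 0, 0)

    Local $sFound = ""
    For $oLink In $oLinks
        If StringInStr($oLink.href, $sFilter) Then $sFound &= $oLink.href & @LF
    Next
    If $sFound = "" Then Return SetError(2, 0, 0)

    ; Drop the trailing @LF, then split into a 0-based array (no count element)
    Return StringSplit(StringTrimRight($sFound, 1), @LF, $STR_NOCOUNT)
EndFunc
```

In the example above, `Local $aHotels = _IEGetFilteredLinks($oIE)` could replace the For...In loop.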

Snips & Scripts


My Snips: graphCPUTemp ~ getENVvars
My Scripts: Short-Order Encrypter - message and file encryption V1.6.1 ~ AuPad - Notepad written entirely in AutoIt V1.9.4

Feel free to use any of my code for your own use. Forum FAQ

 


@mikahs and water, you both are brilliant!


@mikahs and water, you both are brilliant!

 

I'd say that's just Water. ;)

Nonetheless, it is my pleasure. :)



MikahS,

you are brilliant too :)

I just pointed him in the right direction, but you showed him working code.


#8 ·  Posted (edited)

MikahS,

you are brilliant too :)

I just pointed him in the right direction, but you showed him working code.

 

Thank you Water, I appreciate it. :)

Edited by MikahS


Give credit where credit is due :)


#10 ·  Posted (edited)

So I used it like this:

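#include <IE.au3>

Global $timesran = 10 ; result offset: 0 = page 1, 10 = page 2, ...
Global $url = "http://www.yelp.de/search?find_desc=Hotels&find_loc=Berlin&ns=1#start=" & $timesran & "&cflt=hotels"
_IECreate($url)                    ; open a new IE window with the results page
Sleep(5000)                        ; give the page time to load the delayed results
Local $oIE = _IECreate($url, 1, 1) ; attach to the already open window ($iTryAttach = 1)
Local $oLinks = _IELinkGetCollection($oIE)
Local $iNumLinks = @extended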

I first call _IECreate to open the window, because yelp.de doesn't load the items immediately: it first shows hotels 1-10 and only after a few seconds displays the ones from the requested page. So if $timesran is 10 (that means page 2), the script first loads the page completely, then attaches to the already opened IE window, stores it in a variable, and finally collects all the links.

But as of today this doesn't seem to work :( Is there a workaround so that the program waits not only for the page to load completely, but also for the items that usually appear 2-3 seconds later?

Try: http://www.yelp.de/search?find_desc=Hotels&find_loc=Berlin&ns=1#start=10&cflt=hotels

Thanks in advance

Edited by andrewz
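One way to wait for results that appear after the page has finished loading is to poll the link collection until it stops changing. A minimal sketch, assuming the late results replace the `/biz/` links already on the page; the helper names and the 4-second quiet period are hypothetical choices, picked to be longer than the 2-3 seconds described above.

```autoit
#include <IE.au3>

; Hypothetical helper: joins the hrefs of all links containing $sFilter into
; one string, so two snapshots of the page can be compared.
Func _LinkSignature($oIE, $sFilter)
    If Not IsObj($oIE) Then Return ""
    Local $oLinks = _IELinkGetCollection($oIE)
    If @error Then Return ""
    Local $sSig = ""
    For $oLink In $oLinks
        If StringInStr($oLink.href, $sFilter) Then $sSig &= $oLink.href & @LF
    Next
    Return $sSig
EndFunc

; Hypothetical helper: after the normal load wait, polls until the filtered
; links have stayed the same for $iQuietMs, or gives up after $iTimeoutMs.
; Returns 1 when settled; @error = 1 on timeout, 2 for an invalid object.
Func _WaitForLinksToSettle($oIE, $sFilter = "/biz/", $iQuietMs = 4000, $iTimeoutMs = 20000)
    If Not IsObj($oIE) Then Return SetError(2, 0, 0)
    _IELoadWait($oIE)
    Local $hTotal = TimerInit(), $hQuiet = TimerInit()
    Local $sLast = _LinkSignature($oIE, $sFilter), $sNow
    While TimerDiff($hTotal) < $iTimeoutMs
        Sleep(250)
        $sNow = _LinkSignature($oIE, $sFilter)
        If $sNow <> $sLast Then
            $sLast = $sNow
            $hQuiet = TimerInit() ; links changed: restart the quiet period
        ElseIf $sNow <> "" And TimerDiff($hQuiet) >= $iQuietMs Then
            Return 1 ; unchanged for long enough
        EndIf
    WEnd
    Return SetError(1, 0, 0)
EndFunc
```

Usage: call `_WaitForLinksToSettle($oIE)` right after `Local $oIE = _IECreate($url)` and before `_IELinkGetCollection($oIE)`, instead of the second `_IECreate` and the fixed `Sleep(5000)`. It is a heuristic: if the first batch of results stays unchanged for longer than `$iQuietMs` before being replaced, it still returns early.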


worked fine for me:

#include <IE.au3>
Global $timesran = "10"
Global $url = "http://www.yelp.de/search?find_desc=Hotels&find_loc=Berlin&ns=1#start="&$timesran&"&cflt=hotels"
_IECreate($url)
Sleep(5000)
Local $oIE = _IECreate($url,1,1)
Local $oLinks = _IELinkGetCollection($oIE)
Local $iNumLinks = @extended
MsgBox(0, "", $iNumLinks)


#12 ·  Posted (edited)

 


 

Did it display hotels 1-10 or 11-20?

Maybe I should have posted more code:

#NoTrayIcon
#include <IE.au3>
#include <MsgBoxConstants.au3>
#include <File.au3>
#include <Inet.au3>

FileDelete("Links.txt")
_FileCreate("Edited.txt")

$timesran = 0 ; result offset: 0 = page 1, 10 = page 2, ...

Do
    _FileCreate("Links.txt")
    Global $url = "http://www.yelp.de/search?find_desc=&find_loc=Frankfurt+am+Main%2C+Hessen&ns=1#find_desc=Hotel&start=" & $timesran & "&cflt=hotels"
    _IECreate($url)                    ; open the results page in a new window
    Sleep(5000)                        ; wait for the delayed results to appear
    Local $oIE = _IECreate($url, 1, 1) ; attach to the window opened above
    Local $oLinks = _IELinkGetCollection($oIE)
    Local $iNumLinks = @extended

    ; Dump every link on the page into Links.txt
    Local $sTxt = $iNumLinks & " links found" & @CRLF & @CRLF
    For $oLink In $oLinks
        $sTxt &= $oLink.href & @CRLF
    Next
    FileWrite("Links.txt", $sTxt)

    ; Copy hotel links (/biz/) into Edited.txt, skipping ones already there
    $file = "Links.txt"
    For $i = 1 To _FileCountLines($file)
        $line = FileReadLine($file, $i)
        If StringInStr($line, "http://www.yelp.de/biz/") Then
            $content = FileRead("Edited.txt")
            If Not StringInStr($content, $line) Then
                FileWrite("Edited.txt", $line & @CRLF)
            EndIf
        EndIf
    Next
    FileDelete("Links.txt")
    Sleep(1000)
    $timesran += 10

Until $timesran >= 1000

$msg = _FileCountLines("Edited.txt")
MsgBox($MB_OK, "", $msg)
Edited by andrewz


Hotels 11-20.



Hotels 11-20.

 

Hmm, I edited my last post to include the full code; I don't know why it worked yesterday. :/

Am I missing something? Thanks in advance


What is the expected order of hotels?



#16 ·  Posted (edited)

What is the expected order of hotels?

 

Oh, the loop still stops at 1000 from the Berlin version; I switched to Frankfurt, though.

Frankfurt has 704 listed hotels on the site: http://www.yelp.de/search?find_desc=&find_loc=Frankfurt+am+Main%2C+Hessen&ns=1#find_desc=Hotel&start=0&cflt=hotels

The order doesn't matter, as long as all the links from pages 1-71 are saved into a txt file. Yelp.de, however, loads its results differently, which is why the usual link-collecting code didn't work until I changed it to first open IE and load the page, then attach to the existing window and grab all the links.

Somehow this worked yesterday, and today it suddenly stopped working.

&start=0 means page 1

&start=10 means page 2

and so on... you probably already understood that.

Edited by andrewz
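With 704 results and 10 per page, the pages run from `start=0` (page 1) to `start=700` (page 71), so a loop over Frankfurt can stop after offset 700 instead of counting up to 1000. A minimal sketch of that arithmetic, assuming exactly 10 results per page; `_PageCount` and `_StartOffset` are hypothetical helper names.

```autoit
; Hypothetical helper: number of result pages for $iTotal results.
Func _PageCount($iTotal, $iPerPage = 10)
    Return Ceiling($iTotal / $iPerPage)
EndFunc

; Hypothetical helper: value of start= for a 1-based page number.
Func _StartOffset($iPage, $iPerPage = 10)
    Return ($iPage - 1) * $iPerPage
EndFunc

; Walk every page of the 704 Frankfurt results
Local $iPages = _PageCount(704)
For $iPage = 1 To $iPages
    ConsoleWrite("page " & $iPage & " -> start=" & _StartOffset($iPage) & @CRLF)
Next
```

In the Do...Until loop this corresponds to `Until $timesran > _StartOffset(_PageCount(704))`, i.e. stopping once the offset passes 700.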


Maybe because you are using a different URL?

; one used before
"http://www.yelp.de/search?find_desc=Hotels&find_loc=Berlin&ns=1"

; one used now
"http://www.yelp.de/search?find_desc=&find_loc=Frankfurt+am+Main%2C+Hessen&ns=1#find_desc=Hotel&start="&$timesran&"&cflt=hotels"


#18 ·  Posted (edited)

 


 

It doesn't make a difference; the displayed results are the same. I also tried it with: http://www.yelp.de/search?find_desc=Hotel&find_loc=Frankfurt+am+Main%2C+Hessen&cflt=hotels#start=10

That showed the same results and still didn't work. It might also be the PC here, which uses an older IE version and is rather slow.

The program also opens all the URLs perfectly fine, but the collected links are always hotels 1-10; I don't know why it doesn't grab the ones that load after a few seconds.

EDIT: Got it to work by removing the FileDelete and _FileCreate calls that ran on every iteration! They were screwing everything up :o

The intention of deleting and recreating the file was to reduce the search time in the links file. Now I just empty the file instead.

Thanks for the ideas/help though :)

Just added this:

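Local $hFile = FileOpen($file, 2) ; mode 2 ($FO_OVERWRITE) empties the file
FileClose($hFile)                 ; FileClose expects the handle, not the file name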
Edited by andrewz
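A possible explanation for why the script kept collecting hotels 1-10 (an assumption, not confirmed in the thread): everything after `#` in a URL is a fragment, which the browser does not send to the server. With `...#start=10`, the server returns the first page and the page's own JavaScript swaps in results 11-20 afterwards, so links read before that swap (or fetched with `_INetGetSource`) only show hotels 1-10. The sketch below shows which part of the URL actually reaches the server, plus a hypothetical builder that moves `start` into the query string; whether yelp.de honors `start` there is untested.

```autoit
; Returns the part of $sUrl that is sent to the server (everything before "#").
Func _UrlWithoutFragment($sUrl)
    Local $iHash = StringInStr($sUrl, "#")
    If $iHash = 0 Then Return $sUrl
    Return StringLeft($sUrl, $iHash - 1)
EndFunc

; Hypothetical URL builder: puts start= in the query string instead of the
; fragment. Assumption: the site accepts it there; not verified.
Func _BuildSearchUrl($sLocation, $iStart)
    Return "http://www.yelp.de/search?find_desc=Hotel&find_loc=" & $sLocation & _
            "&cflt=hotels&start=" & $iStart
EndFunc

Local $sOld = "http://www.yelp.de/search?find_desc=Hotel&find_loc=Frankfurt+am+Main%2C+Hessen&cflt=hotels#start=10"
ConsoleWrite("Server sees: " & _UrlWithoutFragment($sOld) & @CRLF)
ConsoleWrite("Proposed:    " & _BuildSearchUrl("Frankfurt+am+Main%2C+Hessen", 10) & @CRLF)
```

If the query-string form works, each page could be requested independently, even with `_INetGetSource`, without waiting for the script-driven swap.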


Thanks for the ideas/help tho :)

 

My pleasure. ;)



I'm so confused...

1.  You're not assigning the "first" _IECreate() to a variable

2.  You're using Sleep() instead of _IELoadWait()

3.  You then go to the same URL "again" right after the sleep, this time assigning the _IECreate() object to a variable

4.  You're constantly declaring variables and creating (overwriting older) files within a loop


[center]Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.[/center]
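The four points above can be folded into one restructured loop. A sketch under assumptions: the helper names are hypothetical, `/biz/` as the hotel-link marker and the 3-second extra pause are guesses based on this thread, and it has not been run against the live site. Note that `_IENavigate` already calls `_IELoadWait`, but that only covers the document load, not results inserted by script afterwards, and a navigation that only changes the `#` part may not trigger a full reload.

```autoit
#include <IE.au3>
#include <FileConstants.au3>

; Hypothetical helper: strips any ?query or #fragment so the same hotel page
; linked with different parameters is stored only once.
Func _NormalizeHref($sHref)
    Return StringRegExpReplace($sHref, "[?#].*$", "")
EndFunc

; Hypothetical rewrite of the Do...Until loop from post #12:
; - one IE object, created once and reused via _IENavigate
; - variables declared once, outside the loop
; - a Scripting.Dictionary for duplicate checks instead of re-reading a file
; - the output file written once, at the end
; $sUrlPrefix must end right before the start offset value.
Func _CollectHotelLinks($sUrlPrefix, $iLastStart, $sOutFile)
    Local $oSeen = ObjCreate("Scripting.Dictionary")
    Local $oIE = _IECreate("about:blank")
    If @error Then Return SetError(1, 0, 0)
    Local $oLinks, $sHref

    For $iStart = 0 To $iLastStart Step 10
        _IENavigate($oIE, $sUrlPrefix & $iStart) ; waits for the document load
        Sleep(3000) ; assumption: extra time for results inserted after the load
        $oLinks = _IELinkGetCollection($oIE)
        If @error Then ContinueLoop
        For $oLink In $oLinks
            $sHref = _NormalizeHref($oLink.href)
            If StringInStr($sHref, "/biz/") And Not $oSeen.Exists($sHref) Then $oSeen.Add($sHref, True)
        Next
    Next
    _IEQuit($oIE)

    Local $hFile = FileOpen($sOutFile, $FO_OVERWRITE) ; creates or empties the file
    If $hFile = -1 Then Return SetError(2, 0, 0)
    Local $aKeys = $oSeen.Keys()
    For $i = 0 To UBound($aKeys) - 1
        FileWriteLine($hFile, $aKeys[$i])
    Next
    FileClose($hFile)
    Return $oSeen.Count ; number of unique hotel links written
EndFunc
```

Usage, with the URL from post #18 and the last offset for 704 results: `_CollectHotelLinks("http://www.yelp.de/search?find_desc=Hotel&find_loc=Frankfurt+am+Main%2C+Hessen&cflt=hotels#start=", 700, "Edited.txt")`.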
