andrewz

Extracting Data from the source of a website - wont work :(

25 posts in this topic

#1 ·  Posted (edited)

Hello!

I am currently facing a problem which I can't seem to be able to solve.

What do I want to do with the script ?
Extract all the links of the hotels on this website: http://www.yelp.de/search?cflt=hotels&find_loc=Berlin%2C+Germany
For example the first link to the first hotel would be: http://www.yelp.de/biz/novum-hotel-city-b-berlin-zentrum-berlin  - changes sometimes, so the link will be different.

To start off, I tried to export only one hotel at first. I am using this code to read the content from the source

and then get the content between two "functions" or whatever these are called:

#NoTrayIcon
#include <Inet.au3>
#include <Array.au3>
#include <String.au3>
#include <AutoItConstants.au3>
#include <MsgBoxConstants.au3>
Global $url = "http://www.yelp.de/search?find_desc=Hotels&find_loc=Berlin&ns=1"

Global $content = _INetGetSource($url)
Global $string_A = _StringBetween($content, '<div class="media-avatar">', '</div>')

MsgBox(0,"",$string_A[0])

It's part of an older project, which did almost the same thing, with the exeption that this one is not as easy :(

The link is saved differently, and I can't find a way to export it. After it's saved into an array, I am going to

save the links into a variable with a do - until function. But first I need this step working.

Please, if anyone has an idea how to solve this, even the smallest help is appreciated!

Edited by andrewz

Share this post


Link to post
Share on other sites



If you run Microsofts Internet Explorer you could use the IE UDF to extract the links.


My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

Share this post


Link to post
Share on other sites

If you run Microsofts Internet Explorer you could use the IE UDF to extract the links.

 

Doesnt matter which browser, I could use any. But thanks, will look into it ;)

Share this post


Link to post
Share on other sites

#4 ·  Posted (edited)

Simple example with the IE UDF functionality. Get all links:

#include <IE.au3>
#include <MsgBoxConstants.au3>

Global $url = "http://www.yelp.de/search?find_desc=Hotels&find_loc=Berlin&ns=1"

Local $oIE = _IECreate($url)
Local $oLinks = _IELinkGetCollection($oIE)
Local $iNumLinks = @extended

Local $sTxt = $iNumLinks & " links found" & @CRLF & @CRLF
For $oLink In $oLinks
    $sTxt &= $oLink.href & @CRLF
Next
MsgBox($MB_SYSTEMMODAL, "Link Info", $sTxt)

Only difference between this and helpfile example (_IELinkGetCollection()) is *_IECreate() & $url, so it should be pretty easy to start using the IE UDF. :D

Edited by MikahS
1 person likes this

Snips & Scripts


My Snips: graphCPUTemp ~ getENVvars
My Scripts: Short-Order Encrypter - message and file encryption V1.6.1 ~ AuPad - Notepad written entirely in AutoIt V1.9.4

Feel free to use any of my code for your own use.                                                                                                                                                           Forum FAQ

 

Share this post


Link to post
Share on other sites

@mikahs and water, you both are brilliant!

Share this post


Link to post
Share on other sites

@mikahs and water, you both are brilliant!

 

I'd say that's just Water. ;)

Nonetheless, it is my pleasure. :)


Snips & Scripts


My Snips: graphCPUTemp ~ getENVvars
My Scripts: Short-Order Encrypter - message and file encryption V1.6.1 ~ AuPad - Notepad written entirely in AutoIt V1.9.4

Feel free to use any of my code for your own use.                                                                                                                                                           Forum FAQ

 

Share this post


Link to post
Share on other sites

MikahS,

you are brilliant too :)

I just pointed him into the right direction, but you showed him working code.

1 person likes this

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

Share this post


Link to post
Share on other sites

#8 ·  Posted (edited)

MikahS,

you are brilliant too :)

I just pointed him into the right direction, but you showed him working code.

 

Thank you Water, I appreciate it. :)

Edited by MikahS

Snips & Scripts


My Snips: graphCPUTemp ~ getENVvars
My Scripts: Short-Order Encrypter - message and file encryption V1.6.1 ~ AuPad - Notepad written entirely in AutoIt V1.9.4

Feel free to use any of my code for your own use.                                                                                                                                                           Forum FAQ

 

Share this post


Link to post
Share on other sites

Give credit where credit is due :)

1 person likes this

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

Share this post


Link to post
Share on other sites

#10 ·  Posted (edited)

So I used it like this:

Global $url = "http://www.yelp.de/search?find_desc=Hotels&find_loc=Berlin&ns=1#start="&$timesran&"&cflt=hotels"
_IECreate($url)
Sleep(5000)
Local $oIE = _IECreate($url,1,1)
Local $oLinks = _IELinkGetCollection($oIE)
Local $iNumLinks = @extended

I first use _IECreate to open the window cuz "yelp.de" doesnt load the items immediately, it first shows

hotel 1-10 and then after a few seconds displays the ones from the next page. So if the variable

$timesran would be "10" (That means page 2) , it would first completly load the page, then

take the current, already opened IE window and store it inside the variable, and finally collect

all the links.

But, as of today this doesnt seem to work :( Is there any workaround to this, so that the programm would

first wait for the page to not only load completly, but also load the items completly which are loaded

usually 2-3 seconds afterwards.

Try :  http://www.yelp.de/search?find_desc=Hotels&find_loc=Berlin&ns=1#start=10&cflt=hotels

Thanks in advance

Edited by andrewz

Share this post


Link to post
Share on other sites

worked fine for me:

#include <IE.au3>
Global $timesran = "10"
Global $url = "http://www.yelp.de/search?find_desc=Hotels&find_loc=Berlin&ns=1#start="&$timesran&"&cflt=hotels"
_IECreate($url)
Sleep(5000)
Local $oIE = _IECreate($url,1,1)
Local $oLinks = _IELinkGetCollection($oIE)
Local $iNumLinks = @extended
MsgBox(0, "", $iNumLinks)

Snips & Scripts


My Snips: graphCPUTemp ~ getENVvars
My Scripts: Short-Order Encrypter - message and file encryption V1.6.1 ~ AuPad - Notepad written entirely in AutoIt V1.9.4

Feel free to use any of my code for your own use.                                                                                                                                                           Forum FAQ

 

Share this post


Link to post
Share on other sites

#12 ·  Posted (edited)

 

worked fine for me:

#include <IE.au3>
Global $timesran = "10"
Global $url = "http://www.yelp.de/search?find_desc=Hotels&find_loc=Berlin&ns=1#start="&$timesran&"&cflt=hotels"
_IECreate($url)
Sleep(5000)
Local $oIE = _IECreate($url,1,1)
Local $oLinks = _IELinkGetCollection($oIE)
Local $iNumLinks = @extended
MsgBox(0, "", $iNumLinks)

 

Did it display you the hotels 1-10 or 11-20 ?

Maybe I should have posted more code:

#NoTrayIcon
#include <IE.au3>
#include <MsgBoxConstants.au3>
#include <File.au3>
#include <Inet.au3>

FileDelete("Links.txt")
_FileCreate("Edited.txt")

$timesran = 0

Do
_FileCreate("Links.txt")
Global $url = "http://www.yelp.de/search?find_desc=&find_loc=Frankfurt+am+Main%2C+Hessen&ns=1#find_desc=Hotel&start="&$timesran&"&cflt=hotels"
_IECreate($url)
Sleep(5000)
Local $oIE = _IECreate($url,1,1)
Local $oLinks = _IELinkGetCollection($oIE)
Local $iNumLinks = @extended

Local $sTxt = $iNumLinks & " links found" & @CRLF & @CRLF
For $oLink In $oLinks
    $sTxt &= $oLink.href & @CRLF
Next

FileWrite("Links.txt",$sTxt)


$file = "Links.txt"
FileOpen($file, 0)
For $i = 1 to _FileCountLines($file)
    $line = FileReadLine($file, $i)
If StringInStr($line,"http://www.yelp.de/biz/") = true Then
    $content = FileRead("Edited.txt")
    If StringInStr($content,$line) = false Then
    FileWrite("Edited.txt",$line & @CRLF)
    EndIf
    EndIf
Next
FileClose($file)
FileDelete("Links.txt")
sleep(1000)
$timesran += 10

Until $timesran = "1000"

$msg=_FileCountLines ("Edited.txt")
MsgBox(0,"",$msg)
Edited by andrewz

Share this post


Link to post
Share on other sites

Hotels 11-20.


Snips & Scripts


My Snips: graphCPUTemp ~ getENVvars
My Scripts: Short-Order Encrypter - message and file encryption V1.6.1 ~ AuPad - Notepad written entirely in AutoIt V1.9.4

Feel free to use any of my code for your own use.                                                                                                                                                           Forum FAQ

 

Share this post


Link to post
Share on other sites

Hotels 11-20.

 

Hmm I edited my last post with the full code, dunno why it worked yesterday. :/

Am I missing something ? Thanks in advance

Share this post


Link to post
Share on other sites

What is the expected order of hotels?


Snips & Scripts


My Snips: graphCPUTemp ~ getENVvars
My Scripts: Short-Order Encrypter - message and file encryption V1.6.1 ~ AuPad - Notepad written entirely in AutoIt V1.9.4

Feel free to use any of my code for your own use.                                                                                                                                                           Forum FAQ

 

Share this post


Link to post
Share on other sites

#16 ·  Posted (edited)

What is the expected order of hotels?

 

Oh  I still got "until timeran = 1000" from berlin, I switched to frankfurt tho.

Frankfurth has 704 listed hotels on the site. http://www.yelp.de/search?find_desc=&find_loc=Frankfurt+am+Main%2C+Hessen&ns=1#find_desc=Hotel&start=0&cflt=hotels

The order doesnt matter, aslong as all the links from page 1-71 are saved

into a txt file. Yelp.de however has this different way of loading the results,

why the usual collect link function didnt work untill I changed it to first

run IE and load it, then take it if it already exists and grab all the links.

Somehow this worked yesterday, and today suddently stopped working .

&start=0 means page 1

&start=10 means page 2

and so on...you probably already understood that.

Edited by andrewz

Share this post


Link to post
Share on other sites

Maybe because you are using a different URL?

; one used before

"http://www.yelp.de/search?find_desc=Hotels&find_loc=Berlin&ns=1"
;one used now
 

"http://www.yelp.de/search?find_desc=&find_loc=Frankfurt+am+Main%2C+Hessen&ns=1#find_desc=Hotel&start="&$timesran&"&cflt=hotels"

Snips & Scripts


My Snips: graphCPUTemp ~ getENVvars
My Scripts: Short-Order Encrypter - message and file encryption V1.6.1 ~ AuPad - Notepad written entirely in AutoIt V1.9.4

Feel free to use any of my code for your own use.                                                                                                                                                           Forum FAQ

 

Share this post


Link to post
Share on other sites

#18 ·  Posted (edited)

 

Maybe because you are using a different URL?

; one used before

"http://www.yelp.de/search?find_desc=Hotels&find_loc=Berlin&ns=1"
;one used now
 

"http://www.yelp.de/search?find_desc=&find_loc=Frankfurt+am+Main%2C+Hessen&ns=1#find_desc=Hotel&start="&$timesran&"&cflt=hotels"

 

Doesnt make a difference, the displayed results are the same, I also tried it with : http://www.yelp.de/search?find_desc=Hotel&find_loc=Frankfurt+am+Main%2C+Hessen&cflt=hotels#start=10

which showed the same results and still didnt work. Might also be the PC here which uses some older IE version, and is kinda slow.

The programm also opens all the links perfectly fine, but the collected links are always hotels 1 - 10 , dunno why it doesnt grab the

ones which load after a few seconds.

EDIT: Got it to work by removing the filedelete and filecreate function each time it repeats! It's screwing everything up :o

The intention of deleting and recreating it was to lower the search time in the links file. Now I will just make it empty the file.

Thanks for the ideas/help tho :)

Just added this:

FileOpen($file, 2)
FileClose($file)
Edited by andrewz

Share this post


Link to post
Share on other sites

Thanks for the ideas/help tho :)

 

My pleasure. ;)


Snips & Scripts


My Snips: graphCPUTemp ~ getENVvars
My Scripts: Short-Order Encrypter - message and file encryption V1.6.1 ~ AuPad - Notepad written entirely in AutoIt V1.9.4

Feel free to use any of my code for your own use.                                                                                                                                                           Forum FAQ

 

Share this post


Link to post
Share on other sites

I'm so confused...

1.  You're not declaring the "first" _IECreate() with a variable

2.  You're using Sleep() instead of _IELoadWait()

3.  You then go to the same URL "again" right after the sleep, this time with a declared _IECreate() variable object

4.  You're constantly declaring variable and creating (over writing older ones) files within a loop


Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now

  • Similar Content

    • islandspapand
      By islandspapand
      Hi all
      i am currently trying to click on an element in a HTML Table, but just can get it to work.
      i am able to click the top of the table so it changes to sort  but just can't click on the element in the table.
      an i need to click on element to continue in the site.
      i have attached the code so far and pictures of the table  element want to click plus the source of the table.
      i am able to get data in the table with $oTable = _IETableGetCollection($oIE, 2) but not able to click on them.
       
      Help is very much appreciated
       
      #cs ---------------------------------------------------------------------------- AutoIt Version: 3.3.14.2 Author: myName Script Function: Template AutoIt script. #ce ---------------------------------------------------------------------------- ; Script Start - Add your code below here #include <IE.au3> #include "DOM.au3" #include <Array.au3> #include <MsgBoxConstants.au3> Global $oIE = _IECreate("*") _IELoadWait($oIE) Sleep(2000) _PageLogin($oIE) _PageLoadWait() _PageNewReq($oIE) _PageLoadWait() _InputModelInf($oIE) _PageLoadWait() Sleep(1000) $aTableLink = BGe_IEGetDOMObjByXPathWithAttributes($oIE, "//table/tbody/tr/td[.='Name Of user']", 2000) ;~ $aTableLink = BGe_IEGetDOMObjByXPathWithAttributes($oIE, "//table/tbody/tr", 2000) ;~ _ArrayDisplay($aTableLink,"$aTableLink") If IsArray($aTableLink) Then ConsoleWrite("Able to BGe_IEGetDOMObjByXPathWithAttributes($oIE, //table/tbody/tr/td[.='Name Of user'])" & @CRLF) For $i = 0 To UBound($aTableLink)-1 ConsoleWrite(" OuterHTML : " & $aTableLink[$i].outerHTML & @CRLF) ConsoleWrite(" Parentnode : " & $aTableLink[$i].parentnode & @CRLF) ConsoleWrite(" Parentnode.click : " & $aTableLink[$i].parentnode.fireEvent("onclick","click") & @CRLF) $objClick = $aTableLink[$i].parentnode ;~ _IEAction($aTableLink[$i] , "focus") _IEAction($objClick , "focus") ;~ If _IEAction($aTableLink[$i], "click") Then If _IEAction($objClick, "click") Then ConsoleWrite("Able to _IEAction($aForumLink[0], 'click')" & @CRLF) _IELoadWait($oIE) Else ConsoleWrite("UNable to _IEAction($aForumLink[0], 'click')" & @CRLF) Exit 3 EndIf Next Else ConsoleWrite("Unable to BGe_IEGetDOMObjByXPathWithAttributes($oIE, //table/tbody/tr/td[.='Name Of user'])" & @CRLF) Exit 2 EndIf _PageLoadWait() Func _InputModelInf($oTmpIE) ; Add Var for Model & Serial in Func $oModelInput = _IEGetObjById($oTmpIE,"model") _IEAction($oModelInput,"focus") _IEDocInsertText($oModelInput, "*") $oSerialInput = _IEGetObjById($oTmpIE,"serial") _IEAction($oModelInput,"focus") _IEDocInsertText($oSerialInput, "*") $links = $oTmpIE.document.getElementsByClassName("btn btn-primary ng-scope") For $link In $links If $link.innertext = "Søg" Or $link.innertext = "Search" Then $link.click() ExitLoop EndIf Next Return True EndFunc Func _PageNewReq($oTmpIE) $links = $oTmpIE.document.getElementsByClassName("ng-scope k-link") For $link In $links If $link.innertext = "Send ny fejlmelding" Or $link.innertext = "Submit a New Service Request" Then $link.click() ExitLoop EndIf Next Return True EndFunc Func _PageLogin($oTmpIE) $oUserInput = _IEGetObjById($oTmpIE,"loginid") _IEDocInsertText($oUserInput, "*") $oPasswordInput = _IEGetObjById($oTmpIE,"password") _IEDocInsertText($oPasswordInput, "*") $links = $oTmpIE.document.getElementsByClassName("btn btn-primary login ng-scope") For $link In $links If $link.innertext = "Sign in" Then $link.click() ExitLoop EndIf Next Return True EndFunc Func _PageLoadWait() Local $PageLoadWait = False ;~ nav navbar-nav navbar-right ng-hide ;~ nav navbar-nav navbar-right $tags = $oIE.document.GetElementsByTagName("ul") For $tag in $tags $class_value = $tag.GetAttribute("class") If $class_value = "nav navbar-nav navbar-right" Then ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : Webpage loading :) ' & @CRLF) ;### Debug Console $PageLoadWait = True ExitLoop EndIf Next Do sleep(250) For $tag in $tags $class_value = $tag.GetAttribute("class") If $class_value = "nav navbar-nav navbar-right ng-hide" Then ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : Webpage load finished :)'& @CRLF) ;### Debug Console $PageLoadWait = False ExitLoop EndIf Next Until $PageLoadWait = False EndFunc  
      Thanks in advance
       
       


    • Eli_jahbot
      By Eli_jahbot
      My esteemed Autoits I need your help once again.
      I'm trying to figure out how to create a loop that gets 1 value from an array and repeats until each value from the array has been used. I have never used arrays before and I know once I learn more things should get easier for me. 
      Here is what Im trying to do:
      -Have an array of values that determine what application i log into. ex: app1, app2, app3, app4 etc.
      -Have a loop that repeats a process sequentially using each value in the array to finish the process for each app1, app2, app 3 and so forth. I have 30 apps that I need to update on a regular basis and getting this sorted out is what I perceive to be the best way to do it.
      Here is my feeble attempt that obviously fails:
      #include <msgboxconstants.au3>
      #include <Constants.au3>
      #include <array.au3>
      Login()
      Func Login()
          local $array[30] = ["10", "11", "12",etc etc]
          ;;Local $site = InputBox("ERx Site","What site do you want to login as?","","")
          Local $userid = InputBox("ERx Login", "What is your username?", "", "")
          Local $Passwd = InputBox("Security Check", "Enter your UAT password.", "", "*")
      for $1 = 1 to 30(I need to do the same steps in 30 different apps)
      run("Z:launch.exe")
      WinWaitActive("Input")
      Send (Sequential ARRAY VALUE HERE)
      Send("{ENTER}")
      WinWaitActive("window")
      Send($userid)
      Send("{TAB}")
      send($Passwd)
      Send("{ENTER}")
      WinWaitActive("[CLASS:SunAwtDialog]")
      Sleep(500)
      WinClose("Home Page")
      Next
      EndFunc
       
      your help is greatly appreciated.
      Thanks for your time
    • Atoxis
      By Atoxis
      Howdy, this is my first post, massive fan of autoit. 
      I've searched and tried and I would just like people who are better at this than me to let me know if this is even a thing.

      I'd like to perform just a variable. For example, it would be. *see inserted code*
      So what i'm wanting is, create the constant $test, and that variable would be what is followed after the = . Then perform the _FileCreate. Then perform the variable.  Logically or in my head rather.. That variable is declared and is equal to what it is set to above, therefore just placing the variable plainly in the script, it should be equal to what it was declared as.  So what am I doing wrong, and or how can I have autoit just perform the variable.  

       
      #include <File.au3> Const $test = FileWriteLine(@DesktopDir & "\Log.txt", @CRLF ) _FileCreate(@DesktopDir & "\Log.txt") $test  
    • Spask
      By Spask
      Hi, I'm trying to find a text value inside of a html.
      This is what the line looks like normally:
      <p id="line1" class> <span class="bot">TEXT HERE</span> </p> The text then changes to a non breaking space:
      <p id="line1" class> <span class="bot">&nbsp;</span> </p> And then it changes back to normal text but it's different every time.
      Can I code this so that it grabs the text every time it changes and has a variable that represents it?
      I currently have this inside of my loop:
      $span = .document.getElementsByTagName("span") For $text In $span If $text.value = "&nbsp;" Then Sleep(50) MsgBox(0,0,0) ;messagebox to test if it can be found, but I don't know how to grab the text EndIf Next The problem is that there are many other lines in the html that have the same span but are called "line3", "line5", etc and the one I need is from "line1".
      I will appreciate if anyone can help with this!
    • ur
      By ur
      I have a script , during compilation and test execution, it worked perfectly but sometimes I am getting error as "Variable used without being declared."
       
      I understood somewhere in the branching logic this is happening.
       
      But not able to find it exactly.
       
      As I am using multiple include statements.the line number is also not giving accurately.
       

       
      Can anyone suggest what is the approach to resolve this?