andrewz

Exporting Data from a Website

15 posts in this topic

#1 ·  Posted (edited)

Hey ;P

Dunno how to start off, so here is an explanation of what I want to be able to automate:

Export data from a website called immobilienscout24.de (a website where people offer properties), for example the name of the owner, where the property is located and how much it costs.

The data is ALWAYS saved at the same location. For instance:

<div class="margin-bottom font-line-l">
    <span data-qa="contactName" class="font-bold">Herr Thomas und Uschi Westhoff</span>

Is there any function available in AutoIt to export this kind of data? In this case it would be the name "Herr Thomas und Uschi Westhoff" (yeah, German names haha). I can't seem to find one :( By exporting I just mean saving this into a variable or the clipboard.

Here is the link I used for the example:

http://www.immobilienscout24.de/expose/78279770

 

I would be sooooo thankful if anyone could give me an idea on how to start off, as it takes ages to copy and paste all the included data into Excel by hand.

Thanks in advance & best regards

Andrewz




#2 ·  Posted (edited)

Have you had a chance to look at the _IE functions reference?

EDIT: By the way, welcome to the AutoIt forum! :D


Thank you for the welcome :P

And thanks as well for the _IE functions tip. I had only heard a bit about those but hadn't really looked into them yet. I will for sure try my best to use them, and if I can't, I'll ask ^^

best regards,

Andrewz


Example of an XPath to use with the IEbyXPATH function in my sig:

$xpath = "//span[@data-qa='contactName']"

IEbyXPATH - Grab IE DOM objects by XPATH
IEscriptRecord - Makings of an IE script recorder
ExcelFromXML - Create Excel docs without excel installed
GetAllWindowControls - Output all control data on a given window.


#6 ·  Posted (edited)

 


 

Sort of ..

$txt = BinaryToString(InetRead("http://www.immobilienscout24.de/expose/78279770", 1))
$name = StringRegExpReplace($txt, '(?is).*contactname.*?>([^<]+).*', "$1")
Msgbox(0,"", $name)

:)
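
A side note on the regex approach: as far as I know, StringRegExpReplace returns its input unchanged when the pattern does not match, so $name would end up holding the whole page if the contactName span is missing. Below is a minimal sketch of a variant that reports a miss instead, using StringRegExp in array-match mode and checking @error. The helper name _ExtractFirst and the inline sample HTML are made up for illustration; on the real page you would pass in the downloaded source.

```autoit
#include <StringConstants.au3>

; Hypothetical helper (not part of any UDF): returns the first capture group,
; or $sDefault when the pattern does not match.
Func _ExtractFirst($sHtml, $sPattern, $sDefault = "not found")
    Local $aMatch = StringRegExp($sHtml, $sPattern, $STR_REGEXPARRAYMATCH)
    If @error Then Return $sDefault ; @error is set when there is no match
    Return $aMatch[0]
EndFunc

; Sample input copied from the first post; replace with the real page source.
Local $sHtml = '<span data-qa="contactName" class="font-bold">Herr Thomas und Uschi Westhoff</span>'
Local $sName = _ExtractFirst($sHtml, '(?is)data-qa="contactName"[^>]*>([^<]+)')
ConsoleWrite($sName & @CRLF)
```

Anchoring the pattern on the data-qa attribute rather than on the bare word "contactname" is only a suggestion; it should be a little less likely to match unrelated text elsewhere in the page.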

 

Thank you both so much! Guess I couldn't have figured that out myself since I am still a beginner :P

@mikell, that works perfectly :P I'm gonna make a full application out of it to grab all the required data from these properties and then save it into a CSV table, which should be easy.

It's because I'm currently doing an internship at an estate agency for school (I didn't have a choice, I would have gone for any IT company straight away lol) and they always export hundreds of properties into Excel by copy and paste, which of course takes ages to complete.

best regards


#7 ·  Posted (edited)

I've almost got it now, but there is an error that I don't know how to bypass or fix.

Currently, the script only works if all the data is present on the website. If one item is missing because the owner didn't include it, the script doesn't write anything and exits.

So here it is:

If FileExists("Immobilien.csv") =false Then
FileWrite("Immobilien.csv","Name;Adresse;Tel;Objekt;Ort;Baujahr;Zi;frei/vermie.;Wfl./ qm;Kaltmiete;Warmmietpreis;Scout- ID"& @CRLF)
EndIf

#include <Inet.au3>
#include <Array.au3>
#include <String.au3>
Global $mobil_A= "0"
Global $telefon_A = "0"
Global $url = InputBox("ScoutID","Enter the Scout-ID")
Global $content = _INetGetSource($url)
Global $name_A = _StringBetween($content, '<span data-qa="contactName" class="font-bold">', '</span>')
Global $preis_A = _StringBetween($content, ' "offerPrice": "', '",')
Global $strase_A = _StringBetween($content, '<strong class="font-standard">' , '</strong><br/>')
Global $telefon_A = _StringBetween($content, '<div class="is24-phone-number hide">' ,'</div>')
Global $objekttyp_A = _StringBetween($content, '<dd class="is24qa-wohnungstyp">' ,'</dd>')
Global $ort_A = _StringBetween($content, '</strong><br/>' , '<br/>')
Global $baujahr_A = _StringBetween($content, '<dd class="is24qa-baujahr">','</dd>')
Global $zimmer_A = _StringBetween($content, '<dd class="is24qa-zimmer">','</dd>')
Global $bezugsfrei_A = _StringBetween($content, '<dd class="is24qa-bezugsfrei-ab">' ,'</dd>')
Global $wohnflache_A = _StringBetween($content, '<dd class="is24qa-wohnflaeche-ca">' ,'</dd>')
Global $preiswarm_A =_StringBetween($content, '<strong class="is24qa-gesamtmiete">','</strong>')




$aio= $name_A[0]&";"&$strase_A[0]&";"&$telefon_A[0]&";"&$objekttyp_A[0]&";"&$ort_A[0]&";"&$baujahr_A[0]&";"&$zimmer_A[0]&";"&$bezugsfrei_A[0]&";"&$wohnflache_A[0]&";"&$preis_A[0]&",00"&";"&$preiswarm_A[0]&";"&$url

$sString1 = StringReplace($aio, " ", "") ;removing spaces -to format it later to csv
$sString2 = StringReplace($sString1, "<p>", "") ;removing <p> -useless
$sString3 = StringReplace($sString2, "<span>Mobil:</span>", "") ;removing <span>Mobil:</span> -useless
$sString4 = StringReplace($sString3, "</p>", "") ;removing </p> - useless
$sString5 = StringReplace($sString4, "Â", "") ;removing the stray "Â" (encoding artifact) in front of m²
$sString6 = StringReplace($sString5, '<spanclass="is24-operator">=</span>', "") ;removing <spanclass="is24-operator">=</span> -useless
$sString7 = StringReplace($sString6, "EUR", "") ;removing EUR -useless cuz we will format it later in excel
$sString8 = StringReplace($sString7, "<span>Telefon:</span>","") ;removing <span>Telefon:</span> -useless
$sStringfinal = StringReplace($sString8, @CRLF, "") ;finally removing @CRLF to get a csv format


FileWrite ( "Immobilien.csv", $sStringfinal & @CRLF )

I did it a bit differently because it was easier for me this way.

Now if you try it with this link: http://www.immobilienscout24.de/expose/78294144 it works perfectly.

BUT with this link: http://www.immobilienscout24.de/expose/78295011 it exits, because of course it can't find the address, for example, which is given in the first link as "Grasserstr. 5" but is not given in the second link.

Is there any way to skip that variable or set it to 0 if it can't be found?

Thanks in advance!


#8 ·  Posted (edited)

This is a bit of code that I use in a script of mine. If any or all of the fields from the fourth one onward are blank, the script continues. It doesn't exit. Note that I am using some of the _IE* functions.

$oForm = _IEFormGetObjByName($oIE, "cpform")

$Spammer[0] = _IEFormElementGetObjByName($oForm, "user[username]")
$Spammer[0] = _IEFormElementGetValue($Spammer[0])
$Spammer[1] = _IEFormElementGetObjByName($oForm, "user[email]")
$Spammer[1] = _IEFormElementGetValue($Spammer[1])
$Spammer[2] = _IEFormElementGetObjByName($oForm, "user[ipaddress]")
$Spammer[2] = _IEFormElementGetValue($Spammer[2])
$Spammer[3] = _IEFormElementGetObjByName($oForm, "user[homepage]")
$Spammer[3] = _IEFormElementGetValue($Spammer[3])
$Spammer[4] = _IEFormElementGetObjByName($oForm, "profile[field1]") ;Biography
$Spammer[4] = _IEFormElementGetValue($Spammer[4])
$Spammer[5] = _IEFormElementGetObjByName($oForm, "profile[field2]") ;Location
$Spammer[5] = _IEFormElementGetValue($Spammer[5])
$Spammer[6] = _IEFormElementGetObjByName($oForm, "profile[field3]") ;Interests
$Spammer[6] = _IEFormElementGetValue($Spammer[6])
$Spammer[7] = _IEFormElementGetObjByName($oForm, "profile[field4]") ;Occupation
$Spammer[7] = _IEFormElementGetValue($Spammer[7])
This is a bit of code from another script that I have written. Note that it uses the native Inet function '_INetGetSource'. If any of the array elements don't exist, the script does not quit. I don't know if either of these 'code bits' will help you, but good luck with your project!

 

Global $Banyan_Calico[5] = ["Registering", "Activating", "Modifying", "Viewing User Profile", "Viewing User Control Panel"], $Quatrain

While 1
Local $Source = _INetGetSource("http://forum.powweb.com/online.php?who=members")
If StringInStr($Source, "The server is too busy at the moment.") <> 0 Then MsgBox(48 + 4096, "Oh No!!", "Busy server.", 3) ;If text does exist
For $a = 0 To UBound($Banyan_Calico) - 1
If StringInStr($Source, $Banyan_Calico[$a], 1) <> 0 Then ;If text does exist
SoundPlay(@ScriptDir & "\foghorn.mp3")
MsgBox(48 + 4096, @ScriptName, $Banyan_Calico[$a], 3)
Whoson()
EndIf
Next
TraySetIcon("hourglass.ico")
Timer()
WEnd

- Bruce /*somdcomputerguy */  If you change the way you look at things, the things you look at change.


Spammer :P, anyway thanks a lot! Let's see if I can get this working now...

best regards,

Andrewz


Ya, $Spammer[] :) I chose that variable name since I use the script to get info from another forum that I moderate. That way I don't have to clip/paste all the necessary info individually, which takes quite a long time. BTW, you don't need to quote any or all of my post(s), I know what I have written. Although a partial quote may help someone else know what you are replying about, but again it's not really necessary.



Hmm I can't find out how to solve this error ...the main question is solved anyway.


One way to solve the error problem is error checking - obviously  :)

$preis_A = _StringBetween($content, ' "offerPrice": "', '",')
$preis = (IsArray($preis_A) = 1) ? $preis_A[0] : "not found"

Using this small example, if the _StringBetween fails then the returned result is "not found" instead of nothing
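
Since the script above repeats the same lookup for a dozen fields, the check could be wrapped once in a small function. This is only a sketch: the helper name _FirstBetween and the inline sample string are made up for illustration, and it assumes an AutoIt version that supports the ternary operator.

```autoit
#include <String.au3>

; Hypothetical helper (not part of String.au3): first _StringBetween hit,
; or $sDefault when the start/end markers are not found.
Func _FirstBetween($sContent, $sStart, $sEnd, $sDefault = "not found")
    Local $aHits = _StringBetween($sContent, $sStart, $sEnd)
    ; _StringBetween only returns an array on success
    Return IsArray($aHits) ? $aHits[0] : $sDefault
EndFunc

; Sample input with one field present and one missing.
Local $sContent = '<dd class="is24qa-zimmer">3</dd>'
Local $sZimmer = _FirstBetween($sContent, '<dd class="is24qa-zimmer">', '</dd>')
Local $sBaujahr = _FirstBetween($sContent, '<dd class="is24qa-baujahr">', '</dd>')
ConsoleWrite($sZimmer & ";" & $sBaujahr & @CRLF)
```

With this sample the missing Baujahr field falls back to "not found" while the Zimmer value is kept, so the CSV line still has the right number of columns.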


#13 ·  Posted (edited)


Hey, thanks, that will work too :P

I did it this way:

If  IsArray($preis_A) Then
$preis_B = $preis_A[0]
Else
$preis_B = "not found"
EndIf

Later on I use $preis_B to display the data.

Which way do you think is better (maybe in terms of resource consumption)? The one I use or the one you provided? Yours looks shorter so maybe it is better, but I dunno anything about this ...


The 2 ways are exactly the same - have a look in the helpfile at 'ternary operator' for details

But one way uses 1 line of code, the other one uses 5 lines  :P


Hi All,

I've been struggling with something similar for the last couple of days (I've been browsing the forums for a possible solution; arrays and such are still kinda new to me).

The solution above seemed like a great one for me too... but it doesn't do exactly what it's supposed to do.

It does create a .csv, and every time I run the script it adds another line, but it doesn't seem to find the info in the HTML/website (all it gives are 0's). So I suspect that the script either doesn't read the site or doesn't find the info that I want. I've been racking my brain over it all weekend, but can't seem to find where I've gone wrong. :ermm:

Here is the script I use to test it and the HTML where I test it with

HotKeySet("{ESC}", "Terminate")

Opt("WinTextMatchMode", 2)      ;1=complete, 2=quick
Opt("WinTitleMatchMode", 1)     ;1=start, 2=subStr, 3=exact, 4=advanced, -1 to -4=Nocase
AutoItSetOption("MouseCoordMode", 0)
opt("SendKeyDelay",90)
opt("WinWaitDelay",35)
opt("TrayIconDebug",1)
#include <IE.au3>
#include <Inet.au3>
#include <Array.au3>
#include <String.au3>
#include <MsgBoxConstants.au3>
If FileExists("C:\Data\Auto ITs\check\check.csv") =false Then
FileWrite("C:\Data\Auto ITs\check\check.csv","Actief;Lidstaat;nummer;Tijdstip waarop de aanvraag werd ontvangen;Naam;Adres;Cnummer"& @CRLF)
EndIf

$content = _INetGetSource("C:\Data\Auto ITs\check\Test.htm")
$Status = _StringBetween($content, '<span class="validStyle">', "</span></b></td>")
$Lidstaat = _StringBetween($content, '<td class="labelStyle">Lidstaat</td> <td>' , '</td>')
$nr = _StringBetween($content, '<td class="labelStyle">nummer</td> <td>' , '</td>')
$Tijd = _StringBetween($content, '<td class="labelStyle">Tijdstip waarop de aanvraag werd ontvangen</td> <td>' , '</td>')
$Naam = _StringBetween($content, '<td class="labelStyle">Naam</td> <td>' , '</td>')
$Adres= _StringBetween($content, '<td class="labelStyle">Adres</td> <td>' , '</td>')
$Cnummer = _StringBetween($content, '<td class="labelStyle">Cnummer</td> <td>' , '</td>')

$aio= $Status&";"&$Lidstaat&";"&$nr&";"&$Tijd&";"&$Naam&";"&$Adres&";"&$Cnummer

$sString1 = StringReplace($aio, " ", "") ;removing spaces -to format it later to csv
$sString2 = StringReplace($sString1, "<p>", "") ;removing <p> -useless
$sString3 = StringReplace($sString2, "<span>Mobil:</span>", "") ;removing <span>Mobil:</span> -useless
$sString4 = StringReplace($sString3, "</p>", "") ;removing </p> - useless
$sString5 = StringReplace($sString4, "Â", "") ;removing the stray "Â" (encoding artifact) in front of m²
$sString6 = StringReplace($sString5, '<spanclass="is24-operator">=</span>', "") ;removing <spanclass="is24-operator">=</span> -useless
$sString7 = StringReplace($sString6, "EUR", "") ;removing EUR -useless cuz we will format it later in excel
$sString8 = StringReplace($sString7, "<span>Telefon:</span>","") ;removing <span>Telefon:</span> -useless
$sStringfinal = StringReplace($sString8, @CRLF, "") ;finally removing @CRLF to get a csv format

FileWrite ( "check.csv", $sStringfinal & @CRLF )

Func Terminate()
    Exit 0
EndFunc

The HTML test page

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    
    
    <title>Test</title>
</head>
<body>
<a id="top-page" name="top-page"></a>
<div id="layout" class="layout">








<div id="header">
    

    <h2>Info</h2>
    <fieldset>
        <table id="vatResponseFormTable">
            <tr>
                <td class="labelLeft" colspan="3"><b><span class="validStyle">Ja, correct</span></b></td> 
           </tr>
           <tr>
                <td><br /></td>
            </tr>
            <tr>
                <td class="labelStyle">Lidstaat</td> 
                <td>NL</td>
                <td class="errorFormStyle"></td>
            </tr>
            <tr>
                <td class="labelStyle">nummer</td> 
                <td>820471616gdwsg01</td>
            </tr>
            <tr>
                <td class="labelStyle">Tijdstip waarop de aanvraag werd ontvangen</td> 
                <td>2015/01/12 12:28:03</td>
            </tr>
            
                <tr>
                    <td class="labelStyle">Naam</td> 
                    <td>T. Est
</td>
                    
                </tr>
             
             
             
            
                <tr>
                    <td class="labelStyle">Adres</td> 
                    <td><br />Straat 00189<br />1234AA Stad<br />
</td>
                </tr>
            
             
            
                <tr>
                    <td class="labelStyle">Cnummer</td> 
                    <td></td>
                </tr>
            
        </table>
        <br />
        <p><a href="backtest.html">Back</a></p>
    </fieldset>

                </div>
            </div>
        </div>
    </div>
</div>



</div>

</body>
</html>

If somebody could point out where I've gone wrong or send me in the right direction, it would be greatly appreciated :)

Thanks in advance!

-Kap
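
A few things look likely to be involved here, going only by the script and test page as posted (untested guesses): _INetGetSource is meant for URLs, so a local file is probably better loaded with FileRead; in the test page there is a line break and indentation between the label cell and the value cell, so a start marker with a single space between `</td>` and `<td>` will not be found; _StringBetween returns an array on success, so its result cannot be concatenated directly; and the final FileWrite targets "check.csv" rather than the full path used for the header. A minimal sketch under those assumptions, using a regular expression that tolerates whitespace between the cells. The helper name _CellAfterLabel is made up for illustration.

```autoit
#include <StringConstants.au3>

; Hypothetical helper: text of the <td> that follows a given label cell,
; or $sDefault when the label is not present.
Func _CellAfterLabel($sHtml, $sLabel, $sDefault = "0")
    ; \Q...\E takes the label literally; \s* absorbs the line break between the two cells.
    Local $sPattern = '(?s)<td class="labelStyle">\Q' & $sLabel & '\E</td>\s*<td>(.*?)</td>'
    Local $aMatch = StringRegExp($sHtml, $sPattern, $STR_REGEXPARRAYMATCH)
    If @error Then Return $sDefault
    ; Trim the whitespace and line breaks around the value
    Return StringStripWS($aMatch[0], $STR_STRIPLEADING + $STR_STRIPTRAILING)
EndFunc

; FileRead loads the local test page; the path is the one from the post above.
Local $sHtml = FileRead("C:\Data\Auto ITs\check\Test.htm")
Local $sLine = _CellAfterLabel($sHtml, "Lidstaat") & ";" & _CellAfterLabel($sHtml, "nummer") & ";" & _CellAfterLabel($sHtml, "Naam")
ConsoleWrite($sLine & @CRLF)
```

The Adres cell contains `<br />` tags, so it would still need a StringReplace pass before being written to the CSV.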
