Read website data


I am trying to read the data from this website:

https://allegro.pl/moje-allegro/moje-konto/nadchodzace-oplaty (you need to log in, but don't worry, I will show a screenshot below)

I am interested in this section to read the data (screenshot from google chrome web browser):

section.thumb.jpg.0cde6670a3c2c97be585331f78ee23d3.jpg

Normally I would use IE.au3 and, for example, _IEDocReadHTML() to get the data from that section, but the returned HTML contains only an empty div where the data should be:

<div id="opbox-quote-billing"></div>

I think the data is loaded by JavaScript after the page itself has loaded.

Is there any other way to read the data? Any ideas?

If the content is loaded by JavaScript, waiting a little longer after the page load event could help.

You could also use a loop (Do...Until) to keep polling the page with _IEDocReadHTML() until the desired content is returned.
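
The polling idea could be sketched roughly as follows. This is only an illustration: _WaitForElementText() is a hypothetical helper name (not part of IE.au3), the timeout and poll interval are arbitrary defaults, and it reads the element's inner text rather than the whole document. It can only help if the browser eventually renders the content.

```autoit
#include <IE.au3>

; Hypothetical helper (not part of IE.au3): polls an element until it
; contains some text or the timeout expires.
; Returns the text on success; sets @error = 1 on timeout.
Func _WaitForElementText($oIE, $sElementId, $iTimeoutMs = 15000, $iPollMs = 250)
    Local $hTimer = TimerInit()
    Local $oElement, $sText
    Do
        $oElement = _IEGetObjById($oIE, $sElementId)
        If Not @error Then
            $sText = _IEPropertyGet($oElement, "innertext")
            ; Stop as soon as the element holds something other than whitespace
            If Not @error And StringStripWS($sText, 3) <> "" Then Return $sText
        EndIf
        Sleep($iPollMs)
    Until TimerDiff($hTimer) > $iTimeoutMs
    Return SetError(1, 0, "")
EndFunc   ;==>_WaitForElementText
```

It would be called after navigating, for example `$sData = _WaitForElementText($oIE, "opbox-quote-billing")`, followed by a check of @error to detect a timeout.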

Using the Chrome debugger, you may be able to find the URL the page draws its data from by tracing the function that loads it. If you do, you might be able to use WinHTTP requests to fetch that URL directly and parse the data from the response. This is an entirely different approach, though, and it only works for some sites.
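
A minimal sketch of that approach, assuming you have already found the data request in the browser's Network tab. The URL below is a placeholder, not a real Allegro endpoint, and a page behind a login would most likely also need the session cookies sent with the request, which this sketch does not handle.

```autoit
; Placeholder URL: replace it with the request URL found in the Network tab.
Global Const $sDataUrl = "https://example.com/path/to/data"

Global $oHttp = ObjCreate("WinHttp.WinHttpRequest.5.1")
If Not IsObj($oHttp) Then
    ConsoleWrite("Could not create the WinHTTP object" & @CRLF)
    Exit 1
EndIf

$oHttp.Open("GET", $sDataUrl, False) ; False = synchronous request
$oHttp.Send()

If $oHttp.Status = 200 Then
    ; The body may be HTML or JSON, depending on what the site returns
    ConsoleWrite($oHttp.ResponseText & @CRLF)
Else
    ConsoleWrite("Request failed, HTTP status " & $oHttp.Status & @CRLF)
EndIf
```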

Well, the website I want to read data from has a testing environment (sandbox), where I created an account for testing purposes (you can try to log in if you want):

Login = "TestujemyTylko"
Password = "AllegroTest123"

(please don't change it)

Url = https://allegro.pl.allegrosandbox.pl/

Click "Moje Allegro" and then "Zaloguj" to log in.

2.png.8659acd65e11c9467085e21d1d24cd6b.png

Page I want to read data from = https://allegro.pl.allegrosandbox.pl/moje-allegro/moje-konto/nadchodzace-oplaty

screen.thumb.JPG.c21c558a00090bce28a77e868a954c66.JPG

The screenshot above is from the Google Chrome web browser.

When I try to load that page in IE (on my Windows 10 machine), something goes wrong and the page is not fully loaded (the main data is missing).

3.thumb.JPG.aa2e48eab67961dc9f9a65411d1f9579.JPG

I reported this to Allegro directly, but they won't fix it because they no longer support IE.

Anyway, I wrote some code that logs in to the website automatically and reads the HTML source of the destination page. Unfortunately, the returned HTML does not include the data I want to read. I believe the script must be rewritten to use Google Chrome instead. Or do you have any other ideas?

#include <IE.au3>
#include <MsgBoxConstants.au3> ; defines $MB_ICONINFORMATION and $MB_ICONERROR

Global $sResult = __IE_AllegroRead() ; Allegro is the name of the website I want to read data from
ClipPut($sResult) ; put the resulting HTML on the clipboard so it can be pasted into an editor and analysed
MsgBox($MB_ICONINFORMATION, @ScriptName, "Script finished! Check your clipboard")

Func __IE_AllegroRead()
    Local Const $sReadDataPageUrl = "https://allegro.pl.allegrosandbox.pl/moje-allegro/moje-konto/nadchodzace-oplaty" ;destination url where I want to read data from
    Local $oIE = _IECreate()
    if @error then Exit MsgBox($MB_ICONERROR, @ScriptName, '_IECreate() error = ' & @error)
    __IE_AllegroLogIn($oIE, $sReadDataPageUrl)

    _IENavigate($oIE, $sReadDataPageUrl) ; the big problem: some IE versions do not support this URL, and the table with the result data is visible for only about 1 second before it disappears
    If @error Then Exit MsgBox($MB_ICONERROR, @ScriptName, '_IENavigate($oIE, $sReadDataPageUrl) error = ' & @error)

    _IELoadWait($oIE)
    Sleep(5000)
    Local $sHTML = _IEDocReadHTML($oIE)
    Return $sHTML

EndFunc

Func __IE_AllegroLogIn($oIE, $sDestinationUrl)
    Local Const $sAllegroLogoutLink = "https://allegro.pl.allegrosandbox.pl/logout.php"
    Local Const $sAuthLogin = "TestujemyTylko"
    Local Const $sAuthPassword = "AllegroTest123"

    _IENavigate($oIE, $sAllegroLogoutLink) ; start by logging out of any existing session
    If @error Then Exit MsgBox($MB_ICONERROR, @ScriptName, '_IENavigate($oIE, $sAllegroLogoutLink) error = ' & @error)

    _IENavigate($oIE, $sDestinationUrl) ; a logged-out user is expected to be redirected to the login page
    If @error Then Exit MsgBox($MB_ICONERROR, @ScriptName, '_IENavigate($oIE, $sDestinationUrl) error = ' & @error)

    Local $sLocationUrl = _IEPropertyGet($oIE, "locationurl")
    if @error then Exit MsgBox($MB_ICONERROR, @ScriptName, '_IEPropertyGet($oIE, "locationurl") error = ' & @error)

    If StringInStr($sLocationUrl, "/login/") Then

        Local $oUserName = _IEGetObjByName($oIE, "username")
        if @error then Exit MsgBox($MB_ICONERROR, @ScriptName, '_IEGetObjByName($oIE, "username") error = ' & @error)
        __IESetValue_EndSpaceBackspace($oIE, $oUserName, $sAuthLogin)

        Local $oPassword = _IEGetObjByName($oIE, "password")
        if @error then Exit MsgBox($MB_ICONERROR, @ScriptName, '_IEGetObjByName($oIE, "password") error = ' & @error)
        __IESetValue_EndSpaceBackspace($oIE, $oPassword, $sAuthPassword)

        Local $oLogin = _IEGetObjById($oIE, "login-button")
        if @error then Exit MsgBox($MB_ICONERROR, @ScriptName, '_IEGetObjById($oIE, "login-button") error = ' & @error)
        _IEAction($oLogin, "click")
        if @error then Exit MsgBox($MB_ICONERROR, @ScriptName, '_IEAction($oLogin, "click") error = ' & @error)
        _IELoadWait($oIE)

    EndIf
EndFunc   ;==>__IE_AllegroLogIn

Func __IESetValue_EndSpaceBackspace($oIE, $object, $value) ; workaround for entering text in an IE input element (e.g. on a login form)
    Local $counter = 0
    Local $counter_limit = 10
    Local $hIE = _IEPropertyGet($oIE, "hwnd")
    While 1
        WinActivate($hIE)
        _IEAction($object, "focus")
        _IEAction($object, "selectall")
        _IEFormElementSetValue($object, $value)
        ; type a space and then delete it, presumably so that the page's scripts see real keyboard input
        ControlSend($hIE, "", "", "{END}{SPACE}")
        If $object.value == $value & " " Then
            ControlSend($hIE, "", "", "{END}{BACKSPACE}")
            If $object.value == $value Then
                ExitLoop
            EndIf
        EndIf
        If $counter > $counter_limit Then Exit MsgBox($MB_ICONERROR, @ScriptName, "__IESetValue_EndSpaceBackspace() failed (too many attempts). Exiting.")
        $counter += 1
    WEnd
EndFunc   ;==>__IESetValue_EndSpaceBackspace

PS: I tried to retrieve the data I need via the Web API, but it does not offer a function for this.

14 minutes ago, maniootek said:

I believe script must be re-written to use Google chrome instead

Either that or Firefox, since they don't support IE and the data isn't visible in the browser when you tried

15 minutes ago, maniootek said:

I tried to retrieve the data I need via WEB API but they do not have this function to do it

That's hard to believe. Maybe this is what you're looking for?

https://allegro.pl/webapi/documentation.php/theme/id,66

21 minutes ago, Danp2 said:

Either that or Firefox, since they don't support IE and the data isn't visible in the browser when you tried

I have never tried to automate Firefox or Google Chrome. Any tips?

21 minutes ago, Danp2 said:

That's hard to believe. Maybe this is what you're looking for?

https://allegro.pl/webapi/documentation.php/theme/id,66

Believe me, I have done enough research on this. They do not have it.

@Danp2 Thank you for your advice. I tried WebDriver for the first time and was able to put together code that reads the data:

#include "wd_core.au3"
#include "wd_helper.au3"

Global $sDesiredCapabilities

Example()

Func Example()

    SetupChrome()

    _WD_Startup()
    If @error <> $_WD_ERROR_Success Then Exit -1

    Local $sSession = _WD_CreateSession($sDesiredCapabilities)

    Local Const $sAuthLogin = "TestujemyTylko"
    Local Const $sAuthPassword = "AllegroTest123"

    _WD_Navigate($sSession, "https://allegro.pl.allegrosandbox.pl/moje-allegro/moje-konto/nadchodzace-oplaty")

    Local $sElement

    ;accept cookie policy
    $sElement = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, "//button[text()='przejdź dalej']")
    _WD_ElementAction($sSession, $sElement, 'click')

    ;enter login
    $sElement = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, "//input[@name='username']")
    _WD_ElementAction($sSession, $sElement, 'value', $sAuthLogin)

    ;enter password
    $sElement = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, "//input[@name='password']")
    _WD_ElementAction($sSession, $sElement, 'value', $sAuthPassword)

    ;click "log in"
    ;$sElement = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, "//span[text()='Zaloguj się']")
    $sElement = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, "//button[@id='login-button']") ;--this works too
    _WD_ElementAction($sSession, $sElement, 'click')

    ;wait for the page to load fully
    Sleep(1000)
    _WD_LoadWait($sSession)
    ;_WD_WaitElement($sSession, $_WD_LOCATOR_ByXPath, "//div[@id='opbox-quote-billing']", 1*1000, 30*1000, True) ; --this works too

    ;find the main div which contains all required data
    $sElement = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, "//div[@id='opbox-quote-billing']")

    ;read data
    Local $sText = _WD_ElementAction($sSession, $sElement, 'text')

    _WD_DeleteSession($sSession)
    _WD_Shutdown()

    ;show result
    MsgBox(0, @ScriptName, $sText)
EndFunc


Func SetupChrome()
    _WD_Option('Driver', 'chromedriver.exe')
    _WD_Option('Port', 9515)
    _WD_Option('DriverParams', '--log-path="' & @ScriptDir & '\chrome.log"')

    $sDesiredCapabilities = '{"capabilities": {"alwaysMatch": {"goog:chromeOptions": {"w3c": true }}}}'
EndFunc
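
Since the target div exists in the page before its data arrives, a fixed Sleep() may sometimes return too early. One possible alternative is to poll the element's text until it is non-empty. The sketch below uses only the WebDriver UDF calls that already appear in the script above; _WaitForNonEmptyText() is a hypothetical helper, not part of the UDF, and the timeout values are arbitrary.

```autoit
#include "wd_core.au3"

; Hypothetical helper (not part of the WebDriver UDF): polls the element
; matched by $sXPath until its text is non-empty or the timeout expires.
; Returns the text on success; sets @error = 1 on timeout.
Func _WaitForNonEmptyText($sSession, $sXPath, $iTimeoutMs = 30000, $iPollMs = 500)
    Local $hTimer = TimerInit()
    Local $sElement, $sText
    Do
        $sElement = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, $sXPath)
        If @error = $_WD_ERROR_Success Then
            $sText = _WD_ElementAction($sSession, $sElement, 'text')
            If @error = $_WD_ERROR_Success And StringStripWS($sText, 3) <> "" Then Return $sText
        EndIf
        Sleep($iPollMs)
    Until TimerDiff($hTimer) > $iTimeoutMs
    Return SetError(1, 0, "")
EndFunc   ;==>_WaitForNonEmptyText
```

In Example(), the call `_WaitForNonEmptyText($sSession, "//div[@id='opbox-quote-billing']")` could then stand in for the Sleep(), the find, and the read steps.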

 
