Get HTML Page


Solved by mikell

I'm probably being real dense here but I'm trying to either get the HTML source or the page display of this URL:

https:// username:password@ website. com/x/admin/file.php?startDate=2023-01-04

The start date changes and file.php returns a display of multiple lines of comma-separated data. I've tried several approaches and I keep getting nothing returned. I've resorted to loading a webpage with that URL and then scraping the data. Not very elegant, and a bit of a pain.

I'm guessing file.php is my problem. If I use InetRead() or _INetGetSource() I get an empty string. I've tried using the WinHTTP UDF but I'm not sure what I'm doing there.

#include <GUIConstantsEx.au3> ; defines $GUI_EVENT_CLOSE used below
#include <INet.au3>

    Local $sAddress = "https://username:password@website.com/x/admin/file.php?startDate=2023-01-04"
    Local $sHTMLSource = _InetGetSource($sAddress,1)
    
    ; The rest is simply for display the results
    
    MsgBox (262144, '', @extended)
    ; Create a simple GUI for output
    GUICreate("Event Test", 640, 480)
    Local $idGUIEdit = GUICtrlCreateEdit("The HTML source is:" & @CRLF & @CRLF & $sHTMLSource, 10, 10, 600, 400)
    GUISetState() ; Show GUI

    ; Waiting for user to close the window
    Local $iMsg
    While 1
        $iMsg = GUIGetMsg()
        If $iMsg = $GUI_EVENT_CLOSE Then ExitLoop
    WEnd

    GUIDelete()

I just need the right approach.

Tia,

John

Edited by Jos

Hi @major4579,

first of all, why did you paste the link "https://username:password@website.com/x/admin/file.php?startDate=2023-01-04", which actually leads to
"https:// instanthousecall . com"?
Is this some kind of phishing or advertising attempt, or what's going on?

Best regards
Sven

Edited by Jos

Stay innovative!

Spoiler

🌍 Au3Forums

📊 AutoIt limits/defaults

💎 Code Katas: [...] (comming soon)

🎭 Collection of GitHub users with AutoIt projects

🐞 False-Positives

🔍 Forum search

🔮 Me on GitHub

💬 Opinion about new forum sub category

 📑 UDF wiki list

✂ VSCode-AutoItSnippets

📑 WebDriver FAQs

👨‍🏫 WebDriver Tutorial (coming soon)


@All ... What @SOLVE-SMART stated was correct, but I have removed the links and am waiting for the OP to reply.

Jos

Live for the present,
Dream of the future,
Learn from the past.
  :)


@SOLVE-SMART and @Jos

Yes, I was trying to obscure the website, but instanthousecall.com is the actual website. They sell a remote access program that I use to support my clients. The purpose of the program I am trying to write is to get the time I've spent connected to each of my clients and transfer it into my billing program.

I had not realized that the original website was still there, my apologies. website.com and username:password are simply placeholders.

@Danp2,

My understanding is that InetGet() downloads a file. I'm trying to receive the output of a PHP process, which is very different.
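For comparison, InetRead() keeps the response in memory instead of writing a file, so in principle it can capture whatever a PHP script sends back. Below is a minimal sketch, assuming the placeholder URL from the first post and assuming the server accepts credentials embedded in the URL (it may not). Checking @error at least distinguishes a failed request from an empty response body.

```autoit
#include <InetConstants.au3>
#include <StringConstants.au3>

; Placeholder address from the first post; substitute the real one.
Local $sUrl = "https://username:password@website.com/x/admin/file.php?startDate=2023-01-04"

; InetRead() returns the response body as binary data held in memory.
Local $dData = InetRead($sUrl, $INET_FORCERELOAD)
Local $iError = @error, $iBytes = @extended ; capture both before another call resets them

If $iError Then
    ConsoleWrite("InetRead failed, @error = " & $iError & @CRLF)
Else
    ; Assumption: the script emits UTF-8 text.
    Local $sBody = BinaryToString($dData, $SB_UTF8)
    ConsoleWrite("Received " & $iBytes & " bytes:" & @CRLF & $sBody & @CRLF)
EndIf
```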

 

Edited by major4579

Alright @major4579,

thanks for the clarification 😀 . I don't understand what you mean by this:

On 1/11/2023 at 11:54 PM, major4579 said:

[...] I've resorted to loading a webpage with that url and then scraping the data. Not very elegant, and a bit of a pain. [...]

What exactly are you doing when you scrape the data? When you load the URL, what kind of information does file.php return: JSON data or an HTML structure?
With the help of WebDriver, you would do a similar thing (if I understand you correctly). The advantage of WebDriver is that you can run it in the background (headless mode) and just get your data.

But how are you doing your "scraping"? And what does your data look like? Please provide more information and an example 🤝 .

Best regards
Sven


@SOLVE-SMART

What the program I wrote does is open a browser with the link I showed above. The PHP file then displays the following:

[Attached image: display.png]

The program pauses for me to click in the display. It then selects all (^a), copies it (^c), and processes the data as I want. As I said, not very elegant. And it requires my input.

OK, so you're saying I can do this with WebDriver? Where do I find more info about WebDriver and AutoIt? Is there a UDF for WebDriver or do I need to use the API interface?

Thanks,

John


I started looking at WebDriver and it seems to be a lot more complex than should be necessary for my needs. But if this is the only way to get the output of a PHP file then I will study it and try to implement it. There seem to be a lot of resources available.

So my question: is this the simplest and best way to go about doing what I'm doing as described in my previous message? I only need to load the entire page (HTML or screen copy) into a text variable. I have already written the code to process that.

Thanks again,

John


Okay @major4579,

before I try to guide you through WebDriver references and basics (if you can wait just a few more weeks, I will be done with my tutorial about au3WebDriver), I want to know which browser you use and whether it has DevTools. [F12] should open the DevTools, at least in the most common browsers like Chrome, Firefox etc.

I am trying to understand whether a single cURL GET request would give you your expected data, with no need for WebDriver. Here is a small example of what I mean:

[Attached animation: DevTools Network tab, response view]

Do you receive your CSV output or pure HTML/php code?

Best regards
Sven


I'm currently using Vivaldi based on Chromium, but I can use (and have) Chrome. F12 does open DevTools. 

Currently, with my screen scraping, I basically get CSV (because that's what I copy off the screen). I had already written the code to parse pure HTML, and it was pretty easy to reuse parts of that code to parse the CSV data I am now getting.

Now I'm going to make things more complicated: I used to be able to do this with the WinHTTP UDF, but they upgraded the website security and that broke the way I was receiving the info. I was getting pure HTML code, but now I get nothing, i.e., no data returned. For your info, here's the code that used to work. So I'm looking for an alternative way to get the data.

; $sAddress is the base URL
; $sDate is the starting date to retrieve the data
; $sUserName and $sPassword are login credentials
; $sHTML is a global text variable and is used to return the entire page

#include <WinHttp.au3>
#include <WinHttpConstants.au3>

$sForm = _
            '<form action="' & $sAddress & '" method="get">' & _
            '    <input name="startDate"/>' & _ ;
            '</form>'

    ; Initialize and get session handle
    $hOpen = _WinHttpOpen()

    $hConnect = $sForm ; will pass form as string so this is for coding correctness because $hConnect goes in byref

    ; Fill form
    $sHTML = _WinHttpSimpleFormFill($hConnect, $hOpen, _
            Default, _
            "name:startDate", $sDate, _
            "[CRED:" & $sUserName & "," & $sPassword & "]")

    If @error Then
        $nError = @error
        _WinHttpCloseHandle($hOpen) ; release the session handle before bailing out
        Return 0
    EndIf

    ; Close handles
    _WinHttpCloseHandle($hConnect)
    _WinHttpCloseHandle($hOpen)

I will need to study your GIF to see if it's something I can use. Thank you.

John


It was only an example showing the DevTools of Chrome and the Network tab.

One of the simplest (but not robust) ways is the "scraping" approach you already use. You open your URL in your default browser, wait for the page to load, click in the browser with MouseClick(), do your select all ('^a') and copy ('^c') actions, and proceed with your data. This already works well, or am I wrong? What am I missing? I guess I still don't understand your requirements, sorry 😔 .

Of course this could (and probably should) be done in a more robust way, such as with WebDriver, but if your approach works, why change it?

Best regards
Sven


Because previously (using WINHTTP) I was able to select the date and then the info was displayed in a format that I coded for me to use.

Now I select the date then I have to wait for the browser to open, and display the results, then I click on the webpage and the program does the rest. Really, (1) I'm just lazy and impatient (2) this is what coding is supposed to do - program the menial tasks. And (3) I was used to the data being downloaded in the background, processed and displayed.

I use this program multiple times a week so yes I'd like to code the part that downloads the data.


One thing I've noticed about WebDriver is that it's used to control a browser (unless my brief look misled me). Since the PHP file is executed on the web server, I would think there would be a simpler way to do this. But then again I haven't found it, so that's why I'm here with my request.


@major4579 I would suggest revisiting the WinHTTP method. In the code you posted you have this --

$sForm = _
            '<form action="' & $sAddress & '" method="get">' & _
            '    <input name="startDate"/>' & _ ;
            '</form>'

Forms are often submitted via POST rather than GET. Have you tried using method="post"?


"GET" had been working for years. But I did try "POST" and the results were the same. I looked a little deeper and _WinHttpSimpleFormFill returned @error = 4 (connection problems), the same error I've been getting all along. My guess is that WinHTTP isn't handling HTTPS or the updated certificate.

I should have included this info before: WinHTTP is ver 1.6.4.1 and the file is dated 5/19/17.
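One way to test the TLS guess without the UDF is the built-in WinHttp.WinHttpRequest.5.1 COM object, which lets the script request TLS 1.2 explicitly. This is only a sketch under assumptions: the URL and credentials are the thread's placeholders, the site is assumed to use HTTP authentication (not a login form), and nothing here confirms that the protocol version is the actual cause of the error.

```autoit
; Placeholders; substitute the real address and credentials.
Local $sUrl = "https://website.com/x/admin/file.php?startDate=2023-01-04"
Local $sUserName = "username", $sPassword = "password"

; Route COM errors (e.g. a failed TLS handshake) to a handler instead of aborting the script.
Local $oComError = ObjEvent("AutoIt.Error", "_OnComError")

Local $oHttp = ObjCreate("WinHttp.WinHttpRequest.5.1")
If Not IsObj($oHttp) Then Exit ConsoleWrite("Could not create WinHttpRequest object" & @CRLF)

$oHttp.Open("GET", $sUrl, False) ; False = synchronous request
$oHttp.Option(9) = 0x0800 ; option 9 = secure protocols, 0x0800 = TLS 1.2
$oHttp.SetCredentials($sUserName, $sPassword, 0) ; 0 = credentials are for the server, not a proxy
$oHttp.Send()
If @error Then Exit ConsoleWrite("Request failed; see the COM error above" & @CRLF)

ConsoleWrite("HTTP status: " & $oHttp.Status & @CRLF)
Local $sHTML = $oHttp.ResponseText ; the whole response body as text
ConsoleWrite($sHTML & @CRLF)

Func _OnComError($oError)
    ConsoleWrite("COM error 0x" & Hex($oError.number) & ": " & $oError.windescription & @CRLF)
EndFunc
```

If the status comes back as 401, the credentials were rejected; if the COM handler fires on Send(), the connection itself failed, which would point back at TLS or the certificate.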


@Danp2

I grabbed that newer version of WinHTTP and saved it in my Includes folder. But it didn't make a difference, still cannot retrieve any data.

@mikell

I'll grab a copy of curl and give that a try. In your $cmd line you have:

...$sUserName & ":" & $sPassword & " " & $url

Shouldn't this be:

...$sUserName & ":" & $sPassword & "@" & $url

with an "@" symbol before the $url?

Thanks!
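On the "@" question: with curl's -u option the credentials are a separate argument, so a space separates them from the URL. The "@" form applies only when the credentials are embedded in the URL itself. The following is a hypothetical sketch, not mikell's original command; it assumes curl.exe is on the PATH, uses the thread's placeholder values, and assumes the site accepts HTTP authentication.

```autoit
#include <AutoItConstants.au3>

; Placeholders; substitute the real address and credentials.
Local $sUserName = "username", $sPassword = "password"
Local $sUrl = "https://website.com/x/admin/file.php?startDate=2023-01-04"

; -s hides the progress meter; -u takes "user:password" as its own argument.
Local $sCmd = 'curl.exe -s -u "' & $sUserName & ':' & $sPassword & '" "' & $sUrl & '"'

; Start curl hidden and capture its standard output.
Local $iPid = Run($sCmd, "", @SW_HIDE, $STDOUT_CHILD)
If @error Then Exit ConsoleWrite("Could not start curl.exe" & @CRLF)

; Read until the stream closes; StdoutRead() sets @error once curl has exited.
Local $sOutput = ""
While 1
    $sOutput &= StdoutRead($iPid)
    If @error Then ExitLoop
    Sleep(10)
WEnd

ConsoleWrite($sOutput & @CRLF)
```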


@mikell

I got curl to return a web page, but it's not the web page I want. It doesn't have any of the info I need in it. The two things I can think of are: the credentials are not being passed properly, or nccs.php isn't finishing before curl grabs the page.

I will play with it when I have time.

Thanks,

John
