
InetGet was working previously, but not extracting full html



I have an AutoIt script that monitors two websites for content that applies to me and the services I provide. One site is www.Freelancer.com; the other is www.PeoplePerHour.com. Both sites publish new jobs roughly every hour. My AutoIt app checks those sites and presents new jobs to me in a grid that pops up on my screen. Lately, the app has stopped showing me any jobs from PeoplePerHour.

 

For freelancer.com, InetGet returns the full HTML, but for peopleperhour.com it no longer does.

Func _CheckPPH()
    Local Static $hTimer = 0
    Local Static $hDownload = 0
    Local $aTitlesandUrls = 0
    Local Static $sTempFile = ""
    If $hTimer = 0 Then $hTimer = TimerInit()
    If $hDownload = 0 Then
        $sTempFile = _WinAPI_GetTempFileName(@TempDir)
        ConsoleWrite("Checking PPH..." & @CRLF)
        ConsoleWrite(">Downloading..." & @CRLF)
;~         $hDownload = InetGet("http://www.peopleperhour.com/freelance-jobs", $sTempFile, $INET_FORCERELOAD, $INET_DOWNLOADBACKGROUND)
        $hDownload = InetGet("http://www.peopleperhour.com/freelance-jobs", $sTempFile, $INET_FORCERELOAD)
;~         Return 0
    EndIf
;~     Sleep(30)
;~     Local $isCompleted = InetGetInfo($hDownload, $INET_DOWNLOADCOMPLETE)
;~     Local $isError = InetGetInfo($hDownload, $INET_DOWNLOADERROR)
;~     Sleep(30)
;~     If TimerDiff($hTimer) > 3000 And $isError Then
;~         ConsoleWrite("!PPH Fail" & @CRLF)
;~         InetClose($hDownload)
;~         $hDownload = 0
;~         Return 0
;~     EndIf
;~     Sleep(30)
    Local $Show = 0
;~     If TimerDiff($hTimer) > 3000 And $isCompleted Then
    If $hDownload > 0 Then
        ConsoleWrite("+Downloaded..." & @CRLF)
        Local $sPPHHtml = FileRead($sTempFile)
        $aTitlesandUrls = _StringBetween($sPPHHtml, '"title">' & @LF, 'time>')
;~         _ArrayDisplay($aTitlesandUrls)
        Local $aPPH[0][4]
        Local $sTitle = ""
        Local $sUrl = ""
        Local $sID = ""
        Local $sDate = ""
        Local $iRet=0
        Sleep(30)
        For $i = 0 To UBound($aTitlesandUrls) - 1
            $sTitle = _StringBetween($aTitlesandUrls[$i], '<a title="', '" class')
            $sUrl = _StringBetween($aTitlesandUrls[$i], 'href="', '">')
            $sDate = _GetDate($aTitlesandUrls[$i])
            If IsArray($sTitle) And IsArray($sUrl) Then
                $sID = _GetID($sUrl[0])
;~                 _ArrayAdd($aPPH, $sDate & "|" & $sTitle[0] & "|" & $sUrl[0] & "|" & $sID)
                $iRet = _BuildPopupsPPH($sID, $sDate, "PPH: " & $sTitle[0], $sUrl[0])
                If $iRet Then $Show+=1
            EndIf
        Next

        Sleep(30)
;~         If $Show > 0 Then ShowLatestJobs()
;~         _ArrayDisplay($aPPH)
        FileDelete($sTempFile)
        InetClose($hDownload)
        $hDownload = 0
        $hTimer = 0
        Return $Show
    EndIf
    Sleep(30)
EndFunc   ;==>_CheckPPH


Is this topic related to your previous topic?  If so, why did you start another topic?  Also, why didn't you answer my question in the previous topic?  Is it because you knew that harvesting data from the sites that you referred to above is prohibited by their terms of use which would also mean that helping you to do so here would be prohibited?

 

Edited by TheXman
Typo
9 minutes ago, Jahar said:

For previous one, you have asked me to go thru scripts given as examples.

No, I asked you why you were asking for help to access a non-existent domain.

 

 

  • Moderators

@Jahar As stated above (and in the other thread) both sites you specify have verbiage in their TOS that states scraping or crawling of their site pages is not permitted. Case closed, please do not open another thread on this topic.

Edited by JLogan3o13


This topic is now closed to further replies.