_INetGetSource stops working after 300 calls?


I know this sounds odd, but I couldn't think of anything else.

I wrote a script that visits a site and extracts some information from each page. There are 92,000 pages in total. After getting the data from exactly 300 pages, the script stops returning any data.

Here is the code:

#include <Excel.au3>
#include <String.au3>
#include <Array.au3>
#include <INet.au3>

$excel_read = _ExcelBookOpen("C:\Users\Macedo\Desktop\correct_links.xlsx")
$excel_write = _ExcelBookOpen("C:\Users\Macedo\Desktop\info.xlsx")

$line = 1
$name = ""

While $line < 92480
    $HTML = _INetGetSource(_ExcelReadCell($excel_read,$line,1))
    $name = _StringBetween($HTML,'<title>',',')
    If @error == 1 Then
        $name = "Unavailable"
        SetError(0)
    Else
        $name = StringStripWS($name[0],4)
    EndIf

    _ExcelWriteCell($excel_write,$name,$line+1,1)

    $line = $line + 1
WEnd

After 300 pages, $name is always set to "Unavailable", which means @error == 1.

At first I thought there was something special about page number 300, but that's not it. If I start from, say, page 250, the script works until page 550. :)

Any ideas guys? Thanks in advance!

Edited by pedmacedo

The @error test is for _StringBetween(), not _INetGetSource().

So have you checked what _INetGetSource() returns when it goes wrong, and what its @error value is?
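A minimal sketch of that check, in case it helps. The URL is a placeholder and the variable names are hypothetical; the idea is only to capture @error right after _INetGetSource() returns, before any other function call overwrites it, so that a failed download can be told apart from a page that downloaded but did not contain the expected title.

```autoit
#include <INet.au3>
#include <String.au3>

; Placeholder URL, not one of the pages from the original script.
Local $sURL = "http://www.example.com/"

Local $sHTML = _INetGetSource($sURL)
; Save @error/@extended immediately: the next function call resets them.
Local $iErr = @error, $iExt = @extended

If $iErr Or $sHTML = "" Then
    ; The download itself failed or returned nothing.
    ConsoleWrite("_INetGetSource failed: @error=" & $iErr & " @extended=" & $iExt & @CRLF)
Else
    Local $aName = _StringBetween($sHTML, '<title>', ',')
    If @error Then
        ; The page arrived but has no matching title: show its beginning,
        ; which may reveal a block or error page sent by the server.
        ConsoleWrite("Got " & StringLen($sHTML) & " chars, no title match:" & @CRLF)
        ConsoleWrite(StringLeft($sHTML, 200) & @CRLF)
    Else
        ConsoleWrite(StringStripWS($aName[0], 4) & @CRLF)
    EndIf
EndIf
```

Running this against a URL that currently comes back as "Unavailable" should show which of the two cases applies.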

Edited by Jos


I will try to do that, thanks for the suggestion.

I'd hazard a guess that the website is putting the blockers on you after that many downloads; after all, you begin to look like a bot.

I hope that's not the case :) I will try using the IE UDF just to confirm this hypothesis.

That won't fix the problem. If you "ping" a website 300 times in 10 seconds, of course it will look suspicious!

Edited by guinness


It could be that AutoIt or IE is running out of file descriptors or some other kind of handle. To rule out an issue with the _Excel functions, could you first load the URLs from Excel into an array, then change your _INetGetSource line to take the URL from the array? If it still chokes after 300 calls, we can be sure the issue is with _INet* and not with _Excel* supplying wrong data (unlikely, but not 100% excluded).
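A rough sketch of that isolation test, assuming the same pre-3.3.10 Excel UDF and the same workbook path as the original script. The batch size of 400 is a hypothetical value chosen only so that it exceeds 300, and the variable names are made up for illustration.

```autoit
#include <Excel.au3>
#include <INet.au3>

; Hypothetical test range: 400 rows starting at row 1.
Local Const $iFirstRow = 1, $iCount = 400

; Step 1: read every URL into an array, then close Excel so it is
; out of the picture while the downloads run.
Local $oExcelRead = _ExcelBookOpen("C:\Users\Macedo\Desktop\correct_links.xlsx")
Local $aURL[$iCount]
For $i = 0 To $iCount - 1
    $aURL[$i] = _ExcelReadCell($oExcelRead, $iFirstRow + $i, 1)
Next
_ExcelBookClose($oExcelRead, 0) ; 0 = close without saving

; Step 2: download from the array only and log where failures begin.
Local $sHTML
For $i = 0 To $iCount - 1
    $sHTML = _INetGetSource($aURL[$i])
    If @error Or $sHTML = "" Then
        ConsoleWrite("Download failed at index " & $i & ": " & $aURL[$i] & @CRLF)
    EndIf
Next
```

If failures still start around index 300, the Excel calls are not the cause.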


pedmacedo,

_INetGetSource just calls InetRead. I've used InetRead to iterate over 1000+ pages on occasion. I don't know what the Excel function is doing, but I would hazard a guess that the server is throttling traffic.
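If throttling is the cause, one thing to try is spacing the requests out and backing off after a failure. The sketch below is only an illustration: _FetchPolitely is a hypothetical helper, not part of any UDF, and the delay values are guesses that may well be too short or too long for the site in question.

```autoit
; Hypothetical helper: download a page with InetRead, pausing longer
; after each failed attempt before trying again.
Func _FetchPolitely($sURL, $iRetries = 3, $iDelayMs = 2000)
    Local $dData
    For $iTry = 1 To $iRetries
        $dData = InetRead($sURL, 1) ; 1 = force a reload from the remote site
        If Not @error Then Return BinaryToString($dData)
        Sleep($iDelayMs * $iTry) ; wait 2 s, then 4 s, then 6 s by default
    Next
    Return SetError(1, 0, "") ; every attempt failed
EndFunc   ;==>_FetchPolitely

; Usage with a placeholder URL: fetch, report, then pause before the next page.
Local $sHTML = _FetchPolitely("http://www.example.com/")
If @error Then
    ConsoleWrite("Giving up on this page" & @CRLF)
Else
    ConsoleWrite("Got " & StringLen($sHTML) & " chars" & @CRLF)
EndIf
Sleep(1000) ; fixed gap between consecutive pages
```

A fixed gap between pages slows a 92,000-page run considerably, so it is worth confirming first that the server really is refusing requests.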

kylomas


That has been suggested before and is most likely what's happening. I use InetRead and friends routinely, but I'm unsure whether I ever reach 300 downloads in a single run; that's why I was trying to simplify things a bit more.


jhcd,

Yes, I had an invalid premise that led to bad conclusions a couple of hours ago. Guinness corrected the code and reminded me that "assumptions make an ass out of you and me".

kylomas


Something similar happened to me a while back while trying to automate a free web service.

I tried to run a task too many times and it failed after a while. When I physically navigated to the site, I was met with a blank page, so I figured it was IP related.

It may well be something similar; I think they call it hammering.
