Jump to content

How to Read Web page table data


MaoMao
 Share

Recommended Posts

Performing statistic for data from internet. The sources are format in tables.

Need help to read some spacific columns and made summation calculation.

Example: http://www.bts.gov/publications/national...rtation_statistics/html/table_

Used to save html pages then read to excel then back to Autoit array for processing.

However for large amount of data excel is too slow.

Try to find direct way to read web page table data to array for calculation.

How to read column 2009 to an array and calculate the sum. (i.e. the 9,671,776 vertically)

;$oIE = _IECreate($Url, 0, 0) ; Working but very slow
;$oIE = InetRead($Url) ; not getting the table??
;$oIE = _INetGetSource($Url)  ; fast but not getting the table
$oIE = _INetGetSource($Url, 'True')  ; fast but not getting the table
Edited by MaoMao
Link to comment
Share on other sites

Please check functions _IETableGetCollection and _IETableWriteToArray found in the IE UDF which comes with AutoIt.

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2022-02-19 - Version 1.6.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (NEW 2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 

Link to comment
Share on other sites

Thanks.

How come it is not displaying the table in array.

Any where to verify and work on specific column and rows?

#include <IE.au3>
#include <Array.au3>
$oIE = _IE_Example ("http://www.bts.gov/publications/national_transportation_statistics/html/table_01_37.html")
$oTable = _IETableGetCollection ($oIE, 0)
$aTableData = _IETableWriteToArray ($oTable)
$iNumTables = @extended
MsgBox(0, "Table Info", "There are " & $iNumTables & " tables on the page")
_ArrayDisplay($aTableData)
Link to comment
Share on other sites

Replace _IE_Example with _IECreate and give it a try.

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2022-02-19 - Version 1.6.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (NEW 2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 

Link to comment
Share on other sites

Something like that:

#include <IE.au3>
#include <Array.au3>
$oIE = _IECreate("http://www.bts.gov/publications/national_transportation_statistics/html/table_01_37.html", 0, 0)
$oTable = _IETableGetCollection ($oIE) ; Get number of tables
$iNumTables = @extended
MsgBox(0, "Table Info", "There are " & $iNumTables & " tables on the page")
$oTable = _IETableGetCollection ($oIE, $iNumTables-1) ; Get last table
$aTableData = _IETableWriteToArray ($oTable)
_ArrayDisplay($aTableData)

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2022-02-19 - Version 1.6.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (NEW 2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 

Link to comment
Share on other sites

Thank you.

For large site data it seems the IETableGetCollection able to array display 230 columns.

Are there maximum rows and columns the IETableGetCollection able to read adn display? (Help file seems didn't tell the limit)

Need to read around 1600 rows.

Are there also way to swap the rows and columns?

Link to comment
Share on other sites

_IETableWriteToArray

--------------------------------------------------------------------------------

Reads the contents of a Table into an array.

$o_object Object variable of an InternetExplorer.Application, Table object

$f_transpose [optional] Boolean value specifying whether to swap the rows and columns in the output array

Link to comment
Share on other sites

Yes.

The _IETableWriteToArray is able to write the table data to array. Seems there is 230 column limit when display.

Trying to write to Excel using the following to verify and output result but seems not writing any into Excel.

_ExcelWriteSheetFromArray($oExcel, $aTableData, 1, 1, 0, 0) (error writing to first row to 250 cell)

$oExcel = $aTableData ( didn't write any cell )

$oExcel.Range("A1") = $aTableData ( write A1 cell Only)

How to write the whole array to excel?

Edited by MaoMao
Link to comment
Share on other sites

An early post by Water seem solve the problem.

_ExcelWriteSheetFromArray($oExcel, $aTableData, 1, 1, 0, 0) and array display seems have difficulity writing cell with large columns.

$aArray2=$oExcel.transpose($aArray)

_ExcelWriteSheetFromArray($oExcel, $aTableData, 1, 1, 0, 0) and array display fine.

Now trying to find way to calculate the sum of a column.

Link to comment
Share on other sites

If you want to write it to the end of each column calculate the last used row, add 1 and then write a formula into the cell using _ExcelWriteFormula.

How to calculate the last row can be found

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2022-02-19 - Version 1.6.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (NEW 2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 

Link to comment
Share on other sites

Working with array may faster. But got problem in calculation.

Got the Web value in Array :

108,326,000

95,585,808

90,883,640

90,880,000

86,986,500

Sum is only calculating the first few digits before the "," ( comma )

For $i= 1 to 10;UBound($aArray2) -1

$Sum = $aArray2[$i][3]+ $Sum

Next

MsgBox(0, "Table Info", "There are Sub Total " & $Sum & " sum on the page")

Result = 108 + 95 + 90 +90 +86 = 469

Are there function to correct this?

Edited by MaoMao
Link to comment
Share on other sites

Yes, remove the "," and then change the string to an integer.

$a = Int(StringReplace("108,326,042", "," , ""))
ConsoleWrite($a & @CRLF)
Edited by water

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2022-02-19 - Version 1.6.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (NEW 2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 

Link to comment
Share on other sites

Good. Sum is calcualted.

The _IECreate is very slow. Not sure why open with IE for the url. But FireFox open very fast on the computer.

Tried alternative but not working.

;$oIE = _IECreate($Url, 0, 0) ; Working but very slow

;$oIE = InetRead($Url) ; not getting the table??

;$oIE = _INetGetSource($Url) ; fast but not getting the table

$oIE = _INetGetSource($Url, 'True') ; fast but not getting the table

Any advice?

Link to comment
Share on other sites

  • 8 months later...

I guys,

I need help for something related to your problems, as described in this post.

My problem is that the array was fullfilled with informations in the table cells, but what it's in thata array is not what is in the html code but what i see in the page in IE.

the difference is that in the IE Page the tabel cell contain "my pc is a..." and the html code contain "my pc is able to fly".

What was in the array is the first one, "my pc is a..." but i need the second one, "my pc is able to fly"...

Some help?

I think this is caused by a javascript code in html page, limiting the lenght of what to display in a cell... but how can this be bypassed?

thanks a lot...

Link to comment
Share on other sites

Stick to your own topic please.

UDF List:

 
_AdapterConnections()_AlwaysRun()_AppMon()_AppMonEx()_ArrayFilter/_ArrayReduce_BinaryBin()_CheckMsgBox()_CmdLineRaw()_ContextMenu()_ConvertLHWebColor()/_ConvertSHWebColor()_DesktopDimensions()_DisplayPassword()_DotNet_Load()/_DotNet_Unload()_Fibonacci()_FileCompare()_FileCompareContents()_FileNameByHandle()_FilePrefix/SRE()_FindInFile()_GetBackgroundColor()/_SetBackgroundColor()_GetConrolID()_GetCtrlClass()_GetDirectoryFormat()_GetDriveMediaType()_GetFilename()/_GetFilenameExt()_GetHardwareID()_GetIP()_GetIP_Country()_GetOSLanguage()_GetSavedSource()_GetStringSize()_GetSystemPaths()_GetURLImage()_GIFImage()_GoogleWeather()_GUICtrlCreateGroup()_GUICtrlListBox_CreateArray()_GUICtrlListView_CreateArray()_GUICtrlListView_SaveCSV()_GUICtrlListView_SaveHTML()_GUICtrlListView_SaveTxt()_GUICtrlListView_SaveXML()_GUICtrlMenu_Recent()_GUICtrlMenu_SetItemImage()_GUICtrlTreeView_CreateArray()_GUIDisable()_GUIImageList_SetIconFromHandle()_GUIRegisterMsg()_GUISetIcon()_Icon_Clear()/_Icon_Set()_IdleTime()_InetGet()_InetGetGUI()_InetGetProgress()_IPDetails()_IsFileOlder()_IsGUID()_IsHex()_IsPalindrome()_IsRegKey()_IsStringRegExp()_IsSystemDrive()_IsUPX()_IsValidType()_IsWebColor()_Language()_Log()_MicrosoftInternetConnectivity()_MSDNDataType()_PathFull/GetRelative/Split()_PathSplitEx()_PrintFromArray()_ProgressSetMarquee()_ReDim()_RockPaperScissors()/_RockPaperScissorsLizardSpock()_ScrollingCredits_SelfDelete()_SelfRename()_SelfUpdate()_SendTo()_ShellAll()_ShellFile()_ShellFolder()_SingletonHWID()_SingletonPID()_Startup()_StringCompact()_StringIsValid()_StringRegExpMetaCharacters()_StringReplaceWholeWord()_StringStripChars()_Temperature()_TrialPeriod()_UKToUSDate()/_USToUKDate()_WinAPI_Create_CTL_CODE()_WinAPI_CreateGUID()_WMIDateStringToDate()/_DateToWMIDateString()Au3 script parsingAutoIt SearchAutoIt3 PortableAutoIt3WrapperToPragmaAutoItWinGetTitle()/AutoItWinSetTitle()CodingDirToHTML5FileInstallrFileReadLastChars()GeoIP databaseGUI - Only Close ButtonGUI ExamplesGUICtrlDeleteImage()GUICtrlGetBkColor()GUICtrlGetStyle()GUIEventsGUIGetBkColor()Int_Parse() & Int_TryParse()IsISBN()LockFile()Mapping CtrlIDsOOP in AutoItParseHeadersToSciTE()PasswordValidPasteBinPosts Per DayPreExpandProtect GlobalsQueue()Resource UpdateResourcesExSciTE JumpSettings INISHELLHOOKShunting-YardSignature CreatorStack()Stopwatch()StringAddLF()/StringStripLF()StringEOLToCRLF()VSCROLLWM_COPYDATAMore Examples...

Updated: 22/04/2018

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...