Sign in to follow this  
Followers 0
MaoMao

How to Read Web page table data

17 posts in this topic

#1 ·  Posted (edited)

Performing statistic for data from internet. The sources are format in tables.

Need help to read some spacific columns and made summation calculation.

Example: http://www.bts.gov/publications/national...rtation_statistics/html/table_

Used to save html pages then read to excel then back to Autoit array for processing.

However for large amount of data excel is too slow.

Try to find direct way to read web page table data to array for calculation.

How to read column 2009 to an array and calculate the sum. (i.e. the 9,671,776 vertically)

;$oIE = _IECreate($Url, 0, 0) ; Working but very slow
;$oIE = InetRead($Url) ; not getting the table??
;$oIE = _INetGetSource($Url)  ; fast but not getting the table
$oIE = _INetGetSource($Url, 'True')  ; fast but not getting the table
Edited by MaoMao

Share this post


Link to post
Share on other sites



Please check functions _IETableGetCollection and _IETableWriteToArray found in the IE UDF which comes with AutoIt.


My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

Share this post


Link to post
Share on other sites

Thanks.

How come it is not displaying the table in array.

Any where to verify and work on specific column and rows?

#include <IE.au3>
#include <Array.au3>
$oIE = _IE_Example ("http://www.bts.gov/publications/national_transportation_statistics/html/table_01_37.html")
$oTable = _IETableGetCollection ($oIE, 0)
$aTableData = _IETableWriteToArray ($oTable)
$iNumTables = @extended
MsgBox(0, "Table Info", "There are " & $iNumTables & " tables on the page")
_ArrayDisplay($aTableData)

Share this post


Link to post
Share on other sites

Replace _IE_Example with _IECreate and give it a try.


My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

Share this post


Link to post
Share on other sites

OK. Get the table.

For web site with multiple tables, it is only getting the top table.

Any trick to make it collect the bottom table?

Share this post


Link to post
Share on other sites

Something like that:

#include <IE.au3>
#include <Array.au3>
$oIE = _IECreate("http://www.bts.gov/publications/national_transportation_statistics/html/table_01_37.html", 0, 0)
$oTable = _IETableGetCollection ($oIE) ; Get number of tables
$iNumTables = @extended
MsgBox(0, "Table Info", "There are " & $iNumTables & " tables on the page")
$oTable = _IETableGetCollection ($oIE, $iNumTables-1) ; Get last table
$aTableData = _IETableWriteToArray ($oTable)
_ArrayDisplay($aTableData)

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

Share this post


Link to post
Share on other sites

Thank you.

For large site data it seems the IETableGetCollection able to array display 230 columns.

Are there maximum rows and columns the IETableGetCollection able to read adn display? (Help file seems didn't tell the limit)

Need to read around 1600 rows.

Are there also way to swap the rows and columns?

Share this post


Link to post
Share on other sites

_IETableWriteToArray

--------------------------------------------------------------------------------

Reads the contents of a Table into an array.

$o_object Object variable of an InternetExplorer.Application, Table object

$f_transpose [optional] Boolean value specifying whether to swap the rows and columns in the output array

Share this post


Link to post
Share on other sites

#9 ·  Posted (edited)

Yes.

The _IETableWriteToArray is able to write the table data to array. Seems there is 230 column limit when display.

Trying to write to Excel using the following to verify and output result but seems not writing any into Excel.

_ExcelWriteSheetFromArray($oExcel, $aTableData, 1, 1, 0, 0) (error writing to first row to 250 cell)

$oExcel = $aTableData ( didn't write any cell )

$oExcel.Range("A1") = $aTableData ( write A1 cell Only)

How to write the whole array to excel?

Edited by MaoMao

Share this post


Link to post
Share on other sites

An early post by Water seem solve the problem.

_ExcelWriteSheetFromArray($oExcel, $aTableData, 1, 1, 0, 0) and array display seems have difficulity writing cell with large columns.

$aArray2=$oExcel.transpose($aArray)

_ExcelWriteSheetFromArray($oExcel, $aTableData, 1, 1, 0, 0) and array display fine.

Now trying to find way to calculate the sum of a column.

Share this post


Link to post
Share on other sites

If you want to write it to the end of each column calculate the last used row, add 1 and then write a formula into the cell using _ExcelWriteFormula.

How to calculate the last row can be found


My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

Share this post


Link to post
Share on other sites

#12 ·  Posted (edited)

Working with array may faster. But got problem in calculation.

Got the Web value in Array :

108,326,000

95,585,808

90,883,640

90,880,000

86,986,500

Sum is only calculating the first few digits before the "," ( comma )

For $i= 1 to 10;UBound($aArray2) -1

$Sum = $aArray2[$i][3]+ $Sum

Next

MsgBox(0, "Table Info", "There are Sub Total " & $Sum & " sum on the page")

Result = 108 + 95 + 90 +90 +86 = 469

Are there function to correct this?

Edited by MaoMao

Share this post


Link to post
Share on other sites

#13 ·  Posted (edited)

Yes, remove the "," and then change the string to an integer.

$a = Int(StringReplace("108,326,042", "," , ""))
ConsoleWrite($a & @CRLF)
Edited by water

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

Share this post


Link to post
Share on other sites

Good. Sum is calcualted.

The _IECreate is very slow. Not sure why open with IE for the url. But FireFox open very fast on the computer.

Tried alternative but not working.

;$oIE = _IECreate($Url, 0, 0) ; Working but very slow

;$oIE = InetRead($Url) ; not getting the table??

;$oIE = _INetGetSource($Url) ; fast but not getting the table

$oIE = _INetGetSource($Url, 'True') ; fast but not getting the table

Any advice?

Share this post


Link to post
Share on other sites

I guys,

I need help for something related to your problems, as described in this post.

My problem is that the array was fullfilled with informations in the table cells, but what it's in thata array is not what is in the html code but what i see in the page in IE.

the difference is that in the IE Page the tabel cell contain "my pc is a..." and the html code contain "my pc is able to fly".

What was in the array is the first one, "my pc is a..." but i need the second one, "my pc is able to fly"...

Some help?

I think this is caused by a javascript code in html page, limiting the lenght of what to display in a cell... but how can this be bypassed?

thanks a lot...

Share this post


Link to post
Share on other sites

Stick to your own topic please.


_AdapterConnections()_AlwaysRun()_AppMon()_AppMonEx()_BinaryBin()_CheckMsgBox()_CmdLineRaw()_ContextMenu()_ConvertLHWebColor()/_ConvertSHWebColor()_DesktopDimensions()_DisplayPassword()_DotNet_Load()/_DotNet_Unload()_Fibonacci()_FileCompare()_FileCompareContents()_FileNameByHandle()_FilePrefix/SRE()_FindInFile()_GetBackgroundColor()/_SetBackgroundColor()_GetConrolID()_GetCtrlClass()_GetDirectoryFormat()_GetDriveMediaType()_GetFilename()/_GetFilenameExt()_GetHardwareID()_GetIP()_GetIP_Country()_GetOSLanguage()_GetSavedSource()_GetStringSize()_GetSystemPaths()_GetURLImage()_GIFImage()_GoogleWeather()_GUICtrlCreateGroup()_GUICtrlListBox_CreateArray()_GUICtrlListView_CreateArray()_GUICtrlListView_SaveCSV()_GUICtrlListView_SaveHTML()_GUICtrlListView_SaveTxt()_GUICtrlListView_SaveXML()_GUICtrlMenu_Recent()_GUICtrlMenu_SetItemImage()_GUICtrlTreeView_CreateArray()_GUIDisable()_GUIImageList_SetIconFromHandle()_GUIRegisterMsg()_GUISetIcon()_Icon_Clear()/_Icon_Set()_IdleTime()_InetGet()_InetGetGUI()_InetGetProgress()_IPDetails()_IsFileOlder()_IsGUID()_IsHex()_IsPalindrome()_IsRegKey()_IsStringRegExp()_IsSystemDrive()_IsUPX()_IsValidType()_IsWebColor()_Language()_Log()_MicrosoftInternetConnectivity()_MSDNDataType()_PathFull/GetRelative/Split()_PathSplitEx()_PrintFromArray()_ProgressSetMarquee()_ReDim()_RockPaperScissors()/_RockPaperScissorsLizardSpock()_ScrollingCredits_SelfDelete()_SelfRename()_SelfUpdate()_SendTo()_ShellAll()_ShellFile()_ShellFolder()_SingletonHWID()_SingletonPID()_Startup()_StringCompact()_StringIsValid()_StringRegExpMetaCharacters()_StringReplaceWholeWord()_StringStripChars()_Temperature()_TrialPeriod()_UKToUSDate()/_USToUKDate()_WinAPI_Create_CTL_CODE()_WinAPI_CreateGUID()_WMIDateStringToDate()/_DateToWMIDateString()Au3 script parsingAutoIt SearchAutoIt3 PortableAutoIt3WrapperToPragmaAutoItWinGetTitle()/AutoItWinSetTitle()CodingDirToHTML5FileInstallrFileReadLastChars()GeoIP databaseGUI - Only Close ButtonGUI ExamplesGUICtrlDeleteImage()GUICtrlGetBkColor()GUICtrlGetStyle()GUIEventsGUIGetBkColor()Int_Parse() & Int_TryParse()IsISBN()LockFile()Mapping CtrlIDsOOP in AutoItParseHeadersToSciTE()PasswordValidPasteBinPosts Per DayPreExpandProtect GlobalsQueue()Resource UpdateResourcesExSciTE JumpSettings INISHELLHOOKShunting-YardSignature CreatorStack()Stopwatch()StringAddLF()/StringStripLF()StringEOLToCRLF()VSCROLLWM_COPYDATAMore Examples...

Updated: 04/09/2015

Share this post


Link to post
Share on other sites

Have you tried requesting the excel file from the statistics office? If this is your country they ought to be able to help you...

Just an optimistic thought ;)

regards

Dan

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0