Jump to content

Read data from html Tables from raw HTML source


Recommended Posts

  • 4 months later...

After seeing the function __HTML_Filter() in this topic by Stilgar (https://www.autoitscript.com/forum/topic/124330-_htmlau3-v101/) I thought I'd include that function also in this script.
the purpose of that function is to clean the extracted data from the table by those codes that are not visible in the browser but are visible as code "dirty" in the data when they are picked up from the table.

Updated the udf and the example script in first post.

To see the difference in the extracted data with or without the use of the HTML_Filter() function, just extract the table data from the example page by clicking on the "Preview array" button with the filter CheckBox "tags to entities" one time unchecked and then checked instead.

small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....

Link to post
Share on other sites
  • 9 months later...

Hi Chimp!
Thanks a lot for your example - it saved me a lot work!
I had to parse a table with almost 1400 rows (and lots of rowspans) in an 1.5MB HTML file, and got some performance issues. Here is how I solved them:
First, I adapted the HTML tag position search in _ParseTags to search starting on the last tag found position, so StringInStr doesn't need to count thousands of "<tr" tags every iteration. Then, _ArraySort failed (too many rows...). So, to get the tag list pre-sorted, I search for the first opening and first closing tag. If the opening is before the closing, write to $aThisTagsPositions and find the next opening; if the closing is before the next opening, write to $aThisTagsPositions and find the next closing.

This made it possible to read that huge HTML file in less than 90 seconds.

Just replace the code on lines 208-216 with this:

Local $iNextOpenPosition = StringInStr($sHtml, $sOpening, 0, 1)
        Local $iNextClosePosition = StringInStr($sHtml, $sClosing, 0, 1)
        Local $iOpenCount = 1

        ; 2) find in the HTML the positions of the $sOpening <tag and $sClosing </tag> tags
        For $i = 1 To $iNrOfThisTag * 2 ;search all the opening and closing tags
            If ($iNextOpenPosition < $iNextClosePosition) And $iNextOpenPosition <> 0 Then
                $aThisTagsPositions[$i][0] = $iNextOpenPosition
                $aThisTagsPositions[$i][1] = $sOpening ; it marks which kind of tag is this
                $aThisTagsPositions[$i][2] = $iOpenCount; nr of this tag
                $iOpenCount += 1
                $iNextOpenPosition = StringInStr($sHtml, $sOpening, 0, 1, $aThisTagsPositions[$i][0] + 1)
            Else
                $aThisTagsPositions[$i][0] = $iNextClosePosition + StringLen($sClosing) - 1
                $aThisTagsPositions[$i][1] = $sClosing ; it marks which kind of tag is this
                $iNextClosePosition = StringInStr($sHtml, $sClosing, 0, 1, $aThisTagsPositions[$i][0] + 1)
            EndIf
        Next

 

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
  • Recently Browsing   0 members

    No registered users viewing this page.

  • Similar Content

    • By mLipok
      Usually when I collect data from DataBase I need to give EndUser a possibility to select rows which should be taken in the processing loop.
      I was searching on the forum and I'm not able to find any UDF or even example of how to select data from array.
      I have my own solutions but I think they are not worth posting on the forum as it is very old code and I am looking for a better solution.

      Could anybody point me to some examples/solutions ?

      Thank you in advance.
      @mLipok
    • By jmp
      Hello.
      I have IETable,
      Get by this code
      $oTable = _IETableGetCollection ($oIE, 1) $aTableData = _IETableWriteToArray ($oTable) Local Const $iArrayNumberOfCols = UBound($aTableData, $UBOUND_COLUMNS) Local Const $iArrayNumberOfRows = UBound($aTableData, $UBOUND_ROWS) Local $aArraySubstringsRow[$iArrayNumberOfCols] ;~ Local $aExtract = _ArrayExtract ($aTableData, 1, 1, 1, -1) ;~ MsgBox(0, "", $iArrayNumberOfCols) ;~ _ArrayDisplay($aExtract) Local Const $iArrayRowIndex = 1 Local $sSubstring For $i = 0 To $iArrayNumberOfCols - 1     $sSubstring = StringLeft($aTableData[$iArrayRowIndex][$i], 2)     $aArraySubstringsRow[$i] = $sSubstring Next _ArrayDisplay($aArraySubstringsRow, "This is a row") and i want to use cell (52, 82, 18, 9,...10) one by one for selecting dropdown box in internet explorer.

      So, How to show/get/extract cell one by one (in msgbox)? 
    • By EmilyLove
      I have a string containing the full path of an executable and an array of executables without their paths. I am trying to compare the string to the list in the array and if a match is found, remove it from the array. The entry get removed from the array successfully, and after checking its return result, uses it to update the ubound if it succeeded, but it doesn't want to update to the new value. Any ideas what I am doing wrong? It acts like it is read-only.
      #include <Array.au3> #include <File.au3> Local $sApp_Exe = "F:\App\Nextcloud\nextcloud.exe" Local $aWaitForEXEX = [3, "Nextcloud.exe", "nextcloudcmd.exe", "QtWebEngineProcess.exe"] For $h = 1 To $aWaitForEXEX[0] If StringInStr($sApp_Exe, $aWaitForEXEX[$h]) <> 0 Then $iRet = _ArrayDelete($aWaitForEXEX, $h) If $iRet <> -1 Then $aWaitForEXEX[0] = $iRet ;this line doesn't work. $aWaitForEXEX[0] doesn't update and shortly gives Error: Array variable has incorrect number of subscripts or subscript dimension range exceeded.: _ArrayDisplay($aWaitForEXEX) EndIf Next  
    • By vinnyMS
      #Include <Array.au3> #include <Constants.au3> $s = FileRead("2.txt") Local $w = StringRegExp($s, '(?is)(\b\w+\b)(?!.*\b\1\b)', 3) _ArrayColInsert($w, 1) For $i = 0 to UBound($w)-1 StringRegExpReplace($s, '(?i)\b' & $w[$i][0] & '\b', $w[$i][0]) $w[$i][1] = @extended Next _ArraySort($w, 1, 0, 0, 1) _ArrayDisplay($w) i have this script that returns 3 columns  
       
      i need to copy the  Col 0 and Col 1 as text to paste on notepad or excel
      you will have to create a "copy" button if possible
      array.au3 2.txt
    • By Hermes
      Hi, I have a site that has the following elements below: 
      <div>More element here</div> <div>More element here</div> <div>More element here</div> When I do this in Auto It:
      Local $oSelectDiv = _WD_FindElement($sSession, $_WD_LOCATOR_ByCSSSelector, "div") _WD_HighlightElement($sSession, $oSelectDiv, 1) I also tried to add [3], but it doesnt seems to work:
      Local $oSelectDiv = _WD_FindElement($sSession, $_WD_LOCATOR_ByCSSSelector, "div[3]") _WD_HighlightElement($sSession, $oSelectDiv, 1) It always highlight the first one, but I am trying to highlight the 3rd in the list. Is there anyway to select the 3rd div without having to add any class/id in the divs, and without using XPATH? The structure of the elements in that site were built that way.
×
×
  • Create New...