Jump to content
Chimp

Read data from html Tables from raw HTML source

Recommended Posts

Chimp

After seeing the function __HTML_Filter() in this topic by Stilgar (https://www.autoitscript.com/forum/topic/124330-_htmlau3-v101/) I thought I'd include that function also in this script.
the purpose of that function is to clean the extracted data from the table by those codes that are not visible in the browser but are visible as code "dirty" in the data when they are picked up from the table.

Updated the udf and the example script in first post.

To see the difference in the extracted data with or without the use of the HTML_Filter() function, just extract the table data from the example page by clicking on the "Preview array" button with the filter CheckBox "tags to entities" one time unchecked and then checked instead.


small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....

Share this post


Link to post
Share on other sites
barbossa

Hi Chimp!
Thanks a lot for your example - it saved me a lot work!
I had to parse a table with almost 1400 rows (and lots of rowspans) in an 1.5MB HTML file, and got some performance issues. Here is how I solved them:
First, I adapted the HTML tag position search in _ParseTags to search starting on the last tag found position, so StringInStr doesn't need to count thousands of "<tr" tags every iteration. Then, _ArraySort failed (too many rows...). So, to get the tag list pre-sorted, I search for the first opening and first closing tag. If the opening is before the closing, write to $aThisTagsPositions and find the next opening; if the closing is before the next opening, write to $aThisTagsPositions and find the next closing.

This made it possible to read that huge HTML file in less than 90 seconds.

Just replace the code on lines 208-216 with this:

Local $iNextOpenPosition = StringInStr($sHtml, $sOpening, 0, 1)
        Local $iNextClosePosition = StringInStr($sHtml, $sClosing, 0, 1)
        Local $iOpenCount = 1

        ; 2) find in the HTML the positions of the $sOpening <tag and $sClosing </tag> tags
        For $i = 1 To $iNrOfThisTag * 2 ;search all the opening and closing tags
            If ($iNextOpenPosition < $iNextClosePosition) And $iNextOpenPosition <> 0 Then
                $aThisTagsPositions[$i][0] = $iNextOpenPosition
                $aThisTagsPositions[$i][1] = $sOpening ; it marks which kind of tag is this
                $aThisTagsPositions[$i][2] = $iOpenCount; nr of this tag
                $iOpenCount += 1
                $iNextOpenPosition = StringInStr($sHtml, $sOpening, 0, 1, $aThisTagsPositions[$i][0] + 1)
            Else
                $aThisTagsPositions[$i][0] = $iNextClosePosition + StringLen($sClosing) - 1
                $aThisTagsPositions[$i][1] = $sClosing ; it marks which kind of tag is this
                $iNextClosePosition = StringInStr($sHtml, $sClosing, 0, 1, $aThisTagsPositions[$i][0] + 1)
            EndIf
        Next

 

  • Like 1

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Similar Content

    • Skeletor
      By Skeletor
      Hi Virtual People,
      My array works perfectly fine. However, what is the best practice if the line in the array doesn't have the correct amount of columns and if I can add a placeholder?

       
      For $count = 1 To _FileCountLines($FileRead1) Step 1 $string = FileReadLine($FileRead1, $count) $input = StringSplit($string, ",", 1) $value1 = $input[1] $value2 = $input[2] $value3 = $input[3] _Excel_RangeWrite($oWorkbook, $oWorkbook.Activesheet, $value2, "A1") _Excel_RangeWrite($oWorkbook, $oWorkbook.Activesheet, $value1, "B1") _Excel_RangeWrite($oWorkbook, $oWorkbook.Activesheet, $value3, "C1") Next  
    • MrCheese
      By MrCheese
      hi all,
      reviewing the forum, this thread is applicable: 
       
       
      I wanted to know if there is now a better way to do this?
      In essence, I load a tab delimited txt file into an array (works well). I used tab, as some fields in the original csv contains commas.
      However, I needed autoit to manipulate this array, and output it as a csv.
      IF my array contains items with a comma, without double quotes around the field, then how best do I get a csv out of this?
      My current workaround is to filewritefromarray tab delimited, then open it in excel and save as a csv. I will need to check this to see how the address fields behave that contain a comma.
       
      Any thoughts would be appreciated.
       
    • Skeletor
      By Skeletor
      Hi All,

      I would like to know how you would take a FileLineRead and insert it into an array which then inserts it into Excel?
      One thing to know is the files content is broken up, so I only use half of the content within $FileRead1.
      So its imperative that the $value1, $value2, etc variables be used. 
      Code below:
      $FileRead1 = FileReadLine("C:\temp\sample.txt",1) For $count = 1 To _FileCountLines($FileRead1) Step 1 $string = FileReadLine($FileRead1, $count) $input = StringSplit($string, ",", 1) $value1 = $input[1] $value2 = $input[2] $value3 = $input[3] $value4 = $input[4] _Excel_RangeWrite($oWorkbook, $oWorkbook.Activesheet, $value1, "A1") _Excel_RangeWrite($oWorkbook, $oWorkbook.Activesheet, $value2, "B1") _Excel_RangeWrite($oWorkbook, $oWorkbook.Activesheet, $value3, "C1") _Excel_RangeWrite($oWorkbook, $oWorkbook.Activesheet, $value4, "D1") Next  
    • AnonymousX
      By AnonymousX
      Hello,
      I'm trying to write a script that moves copies excel cells into an array. I'll than manipulate the values and send array into another program. 
      I don't want range to be specific to a workbook, or sheet, or set of cells.
      I want user to be able to highlight desired cells and to copy either normally ("Ctrl+C") or by a hotkey ("Alt+C"). 
      Could someone help me with this?
      Thank you,
      I've tried to write the framework: (edited)
      #include <MsgBoxConstants.au3> #include <Array.au3> #include <Excel.au3> HotKeySet("!v", "Pastedata") While True Sleep(1000) WEnd func Makearray() local $bArray ;User has cells already copied ;Convert clipboard into an array ;I don;t know how excel stores data to clipboard so don;t know how to bring it into array _Arraydisplay($bArray) MsgBox(0,0,$bArray) return $bArray endfunc func Pastedata() Local $aArray MsgBox(0,0,"wait",1) ;make array based on assumption user has already copied a range to clipboard $aArray = Makearray() ;paste code ;don;t worry about this I got the rest endfunc  
    • Dzenan03
      By Dzenan03
      I want to make a while loop, that creates variables based on a array. For thist I created the array $iDsO with the number and the name of folders in an other folder. Every folder has a different name an I want to create variables(arrays) for each folder that show me all the files in that folder. For example: I have the Folder \Folder1. In it there are the Folders \1, \2, \3. In 1, 2 and 3 there are some files(.png). The array for Folder1 is $iDsO and now I want to crate the arrays $iDsO1, $iDsO2 and $iDsO3 with the files in them can I make something like this:
      While $iDs > 0 ;$iDs is the number of files in Folder1>> $iDsO[0] $iDs#here should come the Foldername for example '1'# = _FileListtoArray(@ProgramFilesDir&"\Folder1\"&$iDsO[$iDs]) $iDs = $iDs - 1 Wend So that in the End I have three variabels ($iDs1, $iDs2 and $iDs3)
       
      Is this posible or if not what could I do instead ( I don´t know the number of folders in Folder1 in the begining).
×