Jump to content

Read data from html Tables from raw HTML source

Recommended Posts


After seeing the function __HTML_Filter() in this topic by Stilgar (https://www.autoitscript.com/forum/topic/124330-_htmlau3-v101/) I thought I'd include that function also in this script.
the purpose of that function is to clean the extracted data from the table by those codes that are not visible in the browser but are visible as code "dirty" in the data when they are picked up from the table.

Updated the udf and the example script in first post.

To see the difference in the extracted data with or without the use of the HTML_Filter() function, just extract the table data from the example page by clicking on the "Preview array" button with the filter CheckBox "tags to entities" one time unchecked and then checked instead.

small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....

Share this post

Link to post
Share on other sites

Hi Chimp!
Thanks a lot for your example - it saved me a lot work!
I had to parse a table with almost 1400 rows (and lots of rowspans) in an 1.5MB HTML file, and got some performance issues. Here is how I solved them:
First, I adapted the HTML tag position search in _ParseTags to search starting on the last tag found position, so StringInStr doesn't need to count thousands of "<tr" tags every iteration. Then, _ArraySort failed (too many rows...). So, to get the tag list pre-sorted, I search for the first opening and first closing tag. If the opening is before the closing, write to $aThisTagsPositions and find the next opening; if the closing is before the next opening, write to $aThisTagsPositions and find the next closing.

This made it possible to read that huge HTML file in less than 90 seconds.

Just replace the code on lines 208-216 with this:

Local $iNextOpenPosition = StringInStr($sHtml, $sOpening, 0, 1)
        Local $iNextClosePosition = StringInStr($sHtml, $sClosing, 0, 1)
        Local $iOpenCount = 1

        ; 2) find in the HTML the positions of the $sOpening <tag and $sClosing </tag> tags
        For $i = 1 To $iNrOfThisTag * 2 ;search all the opening and closing tags
            If ($iNextOpenPosition < $iNextClosePosition) And $iNextOpenPosition <> 0 Then
                $aThisTagsPositions[$i][0] = $iNextOpenPosition
                $aThisTagsPositions[$i][1] = $sOpening ; it marks which kind of tag is this
                $aThisTagsPositions[$i][2] = $iOpenCount; nr of this tag
                $iOpenCount += 1
                $iNextOpenPosition = StringInStr($sHtml, $sOpening, 0, 1, $aThisTagsPositions[$i][0] + 1)
                $aThisTagsPositions[$i][0] = $iNextClosePosition + StringLen($sClosing) - 1
                $aThisTagsPositions[$i][1] = $sClosing ; it marks which kind of tag is this
                $iNextClosePosition = StringInStr($sHtml, $sClosing, 0, 1, $aThisTagsPositions[$i][0] + 1)


  • Like 1

Share this post

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Similar Content

    • VollachR
      By VollachR
      I'm looking for a way to take a number value from a Row2 of a 2D array and according to this check if files that appear in rows 3-11 in the array exists.
      For example, if the number in Row2 is 5 I need to check for the files in Row 3-6 only, if it is 6 than rows 3-7 and so on.
      I thought on using a FOR loop but I have very little experience with those.
      Can you suggest the best way to do what I need?
      BTW, the files in Rows 3-11 will usually have blank value for any row above the number in Row2 (e.g. Row2 = 5 so Rows3-6 will have values but 8-11 be empty), The values I need are in Column 1 of the array, the name of the key from the INI file that the array was created from is in Column 0.
      Full Example:
      Row2 of Array:
      Col0 = Games# - Col1 = 5
      Col0 = Exe2 - Col1 = Path To File
      Col0 = Exe3 - Col1 = Path To File
      Col0 = Exe4 - Col1 = Path To File
      Col0 = Exe5 - Col1 = Path To File
      I need that if Row2 is 5 to check these above for rows if the file exists, if it was 6 then the next row as well and so on up until number 10 in Row2 as it can't go above 10.
      So basically for whatever number in Row2 from 2-10 need to check 1-9 rows from 3-11 to see if the files in Col1 exists and if any of them don't exist it should call a function that shows an error message.
      I'm pretty sure I have the first line of the for look correct:
      For $i = 1 To $aAIO[2][1] Just not sure how to continue from there, also not sure if $i should be equal 1 or 2.
      Help will be appreciated.
    • FMS
      By FMS
      I'm trying to get data from twitter to an array and so far I found an Twitter UDF whish lookes very intresting but couldn't get it to work.
      It lookes not supported any more(2010) and buggy when i read all te replies.
      More around this subject (autoit and twitter) i couldn't find on this forum.
      Is there sombody who know's a good way to get live data from twitter to an array inside autoit?
      (I kinda doubt that this isn't tackled before)
      In the end I was hoping to get all tweets from date to date from an specific subject inside a 2D array to work whit.
    • AndreasNWWWWW
      By AndreasNWWWWW
      I got a question:  i am trying to run different functions based upon what i select in these radio buttons.(code below)
      it needs to check server 1. then run function 1 or function 2 after what i selected in the checkbox.
      once that function is done it moves to the next one, until it has been trough all 5 
      iv'e tried using while loops with different while $i equals to something but then i manualy need to go in and edit the script every time.
      #include <ButtonConstants.au3> #include <GUIConstantsEx.au3> #include <StaticConstants.au3> #include <WindowsConstants.au3> #Region ### START Koda GUI section ### Form= $Form1 = GUICreate("Form1", 615, 437, 192, 124) $Server2 = GUICtrlCreateLabel("Server2", 216, 95, 41, 17) $server1 = GUICtrlCreateLabel("Server1", 216, 72, 41, 17) $server4 = GUICtrlCreateLabel("Server4", 216, 144, 41, 17) $server3 = GUICtrlCreateLabel("Server3", 216, 119, 41, 17) $server5 = GUICtrlCreateLabel("Server5", 216, 170, 41, 17) $Start = GUICtrlCreateButton("Start", 240, 248, 147, 25) $Checkbox1 = GUICtrlCreateCheckbox("function1", 288, 72, 97, 17) $Checkbox2 = GUICtrlCreateCheckbox("function2", 392, 72, 97, 17) $Checkbox3 = GUICtrlCreateCheckbox("function1", 288, 96, 97, 17) $Checkbox4 = GUICtrlCreateCheckbox("function2", 392, 96, 97, 17) $Checkbox5 = GUICtrlCreateCheckbox("function1", 288, 120, 97, 17) $Checkbox6 = GUICtrlCreateCheckbox("function2", 392, 120, 97, 17) $Checkbox7 = GUICtrlCreateCheckbox("function1", 288, 144, 97, 17) $Checkbox8 = GUICtrlCreateCheckbox("function2", 392, 144, 97, 17) $Checkbox9 = GUICtrlCreateCheckbox("function1", 288, 170, 97, 17) $Checkbox10 = GUICtrlCreateCheckbox("function2", 392, 170, 97, 17) GUISetState(@SW_SHOW) #EndRegion ### END Koda GUI section ### While 1 $nMsg = GUIGetMsg() Switch $nMsg Case $GUI_EVENT_CLOSE Exit EndSwitch WEnd  
    • 31290
      By 31290
      Hi everyone, 
      I'm currently writing a script that allow me to list all currently installed software on a computer but some of the are listed in the HKLM64 hive of the registry whereas 95% of all others are in the HKLM "normal" one.
      Thing is, I'd like to combine these two reg key into one single ListView item.
      Here's my code so far, knowing that it's working on both cases (changing to HKLM64 or HKLM short)
      Thanks in advance for the help
    • SkysLastChance
      By SkysLastChance
      WinActivate("MEDITECH - Internet Explorer") Sleep (500) $oIE = _IEAttach("MEDITECH") $oDiv1 = _IEGetObjById($oIE, "sysmenu-searchbarbutton") _IEAction($oDiv1, "click") I am just trying to click the little magnifying glass, next to the gear button with no luck. I was hoping someone might have an idea why this is not working?