Jump to content
Chimp

Read data from html Tables from raw HTML source

Recommended Posts

Chimp

After seeing the function __HTML_Filter() in this topic by Stilgar (https://www.autoitscript.com/forum/topic/124330-_htmlau3-v101/) I thought I'd include that function also in this script.
the purpose of that function is to clean the extracted data from the table by those codes that are not visible in the browser but are visible as code "dirty" in the data when they are picked up from the table.

Updated the udf and the example script in first post.

To see the difference in the extracted data with or without the use of the HTML_Filter() function, just extract the table data from the example page by clicking on the "Preview array" button with the filter CheckBox "tags to entities" one time unchecked and then checked instead.


small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....

Share this post


Link to post
Share on other sites
barbossa

Hi Chimp!
Thanks a lot for your example - it saved me a lot work!
I had to parse a table with almost 1400 rows (and lots of rowspans) in an 1.5MB HTML file, and got some performance issues. Here is how I solved them:
First, I adapted the HTML tag position search in _ParseTags to search starting on the last tag found position, so StringInStr doesn't need to count thousands of "<tr" tags every iteration. Then, _ArraySort failed (too many rows...). So, to get the tag list pre-sorted, I search for the first opening and first closing tag. If the opening is before the closing, write to $aThisTagsPositions and find the next opening; if the closing is before the next opening, write to $aThisTagsPositions and find the next closing.

This made it possible to read that huge HTML file in less than 90 seconds.

Just replace the code on lines 208-216 with this:

Local $iNextOpenPosition = StringInStr($sHtml, $sOpening, 0, 1)
        Local $iNextClosePosition = StringInStr($sHtml, $sClosing, 0, 1)
        Local $iOpenCount = 1

        ; 2) find in the HTML the positions of the $sOpening <tag and $sClosing </tag> tags
        For $i = 1 To $iNrOfThisTag * 2 ;search all the opening and closing tags
            If ($iNextOpenPosition < $iNextClosePosition) And $iNextOpenPosition <> 0 Then
                $aThisTagsPositions[$i][0] = $iNextOpenPosition
                $aThisTagsPositions[$i][1] = $sOpening ; it marks which kind of tag is this
                $aThisTagsPositions[$i][2] = $iOpenCount; nr of this tag
                $iOpenCount += 1
                $iNextOpenPosition = StringInStr($sHtml, $sOpening, 0, 1, $aThisTagsPositions[$i][0] + 1)
            Else
                $aThisTagsPositions[$i][0] = $iNextClosePosition + StringLen($sClosing) - 1
                $aThisTagsPositions[$i][1] = $sClosing ; it marks which kind of tag is this
                $iNextClosePosition = StringInStr($sHtml, $sClosing, 0, 1, $aThisTagsPositions[$i][0] + 1)
            EndIf
        Next

 

  • Like 1

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Similar Content

    • lattey
      By lattey
      hi,
      i have checkboxes and each checkbox that checked, i put in array. 
      now, im stuck on how to loop the checked array and store in in one variable. what i can do now, is only write the result into a text file. 
      below is the code:
      #include <GUIConstantsEx.au3> ;~ #include <MsgBoxConstants.au3> #include <ButtonConstants.au3> #include <Array.au3> Global $Count = 3 Global $CheckBoxP[$Count] Global $step[$Count] global $array1[1] Global $ExitResult $hGUI = GUICreate("Summary Steps", 500, 400) GUISetFont(12, 400, "Tahoma") GUICtrlCreateLabel( "Please Select the Summary Steps for Script Check", 70, 20) GUISetFont(10, 400, "Tahoma") Global $array_Pstep[3] = ["fix2","fix1","fix3"] global $step[3] = ["2","3","4"] $Spacing = 50 For $i = 0 To UBound($array_Pstep) - 1 $CheckBoxP[$i] = GUICtrlCreateCheckbox($array_Pstep[$i], 80, $Spacing + (20 * $i), 65, 17) Next $submit = GUICtrlCreateButton("Submit",180, 280, 80, 30) $exit = GUICtrlCreateButton("Exit",180, 320, 80, 30) GUISetState() While 1 $Msg = GUIGetMsg() Select case $Msg=$submit For $i = 0 To $Count - 1 If GUICtrlRead($CheckBoxP[$i]) = $GUI_CHECKED Then _ArrayAdd($array1, $step[$i]) EndIf Next Global $logfilerray = @WorkingDir & "\checkedlist.txt" FileDelete ($logfilerray) Global $readlogfile = FileOpen($logfilerray,1) for $a = 1 to UBound($array1) - 1 ;~ $var=$array1[$a] FileWriteLine($readlogfile,$array1[$a]) Next FileClose($readlogfile) Exit case $Msg=$exit $ExitResult = MsgBox(1,"Summary Step", "Continue to Exit ?") if $ExitResult = 1 Then ;ok Exit EndIf Exit EndSelect WEnd  
    • omicron
      By omicron
      How do you perform a nested loop function with a multidimensional array from 2 lists.
      for i in list1
      (open file) extract variable
          while open for i in list 2
          (open file2) extract variable
       
      var1 + var2 = (search term)

      The list sizes will more than likely consist of different lengths.
       
      What is the best approach to accomplishing this method?
             
    • omicron
      By omicron
      Hello!

      I am working on a function that I am just getting lost on. The goal is a multiple nested loop.

      Here are the steps:
      Contents of file1.txt::
      [topic] var1=Name var2=OtherName var3=SomeotheName Contents of file2.txt::
      [subTopic] top=sub1 top2=sub2 top3=sub3 The Shell I am working from::
      #include <file.au3> $file = "c:\yourfile.txt" FileOpen($file, 0) For $i = 1 to _FileCountLines($file) $line = FileReadLine($file, $i) msgbox(0,'','the line ' & $i & ' is ' & $line) Next FileClose($file) Understanding however that the "msgbox" needs to then become a variable. in example the following::
      $file = "c:\yourfile.txt" FileOpen($file, 0) While true( prog.exe is running && "WinName" is open) do For $i = 1 to _FileCountLines($file) $line = FileReadLine($file, $i) ;Open File to log "current location of file 1" FileWriteLine ("filename", $i & ' is ' & $line) var = $line Next $file2 = "c:\yourfile.txt" FileOpen($file, 0) For $i = 1 to _FileCountLines($file) $line = FileReadLine($file, $i) ; OpenFile to log "Current location of file 2" FileWriteLine ("filename", $i & ' is ' & $line) Next FileClose($file2) FileClose($file) The goal in written form is the following ::

      While in "OpenWindow"
          read from file 1 starting at line 1 until end of file.
         file 1 is a list of names to be searched.
         With $line selected, add this element to the element in file 2.
       
      The search of a variables in list 1 and list 2 differ on the amount of posts that day. (This is not a web based platform, it is a game) I need to search 2 names and take a screenshot of the out put. The sizes of the names list depend on the activity of names at the time of search.
      This loop continues until all the names from both lists have been searched. Mostly in the format of::
      File1= item
      File2= Vendor
       
      Item + Vendor  ( Capture screen, scroll) -- Not sure how to detect if I need to scroll)
       
      Thank you for your help and support!
    • Skeletor
      By Skeletor
      Hi Virtual People,
      My array works perfectly fine. However, what is the best practice if the line in the array doesn't have the correct amount of columns and if I can add a placeholder?

       
      For $count = 1 To _FileCountLines($FileRead1) Step 1 $string = FileReadLine($FileRead1, $count) $input = StringSplit($string, ",", 1) $value1 = $input[1] $value2 = $input[2] $value3 = $input[3] _Excel_RangeWrite($oWorkbook, $oWorkbook.Activesheet, $value2, "A1") _Excel_RangeWrite($oWorkbook, $oWorkbook.Activesheet, $value1, "B1") _Excel_RangeWrite($oWorkbook, $oWorkbook.Activesheet, $value3, "C1") Next  
    • MrCheese
      By MrCheese
      hi all,
      reviewing the forum, this thread is applicable: 
       
       
      I wanted to know if there is now a better way to do this?
      In essence, I load a tab delimited txt file into an array (works well). I used tab, as some fields in the original csv contains commas.
      However, I needed autoit to manipulate this array, and output it as a csv.
      IF my array contains items with a comma, without double quotes around the field, then how best do I get a csv out of this?
      My current workaround is to filewritefromarray tab delimited, then open it in excel and save as a csv. I will need to check this to see how the address fields behave that contain a comma.
       
      Any thoughts would be appreciated.
       
×