
Stumped by StringInStr and StringMid


Gianni

Solved by JohnOne

The little snippet below should:
1) open an HTML page containing 2 tables
2) extract from the HTML only the portions between <table and </table> (inclusive)
3) print the extracted portions to the console

But as you can see from the output generated on the console, the first extracted table
doesn't end with the </table> tag; it includes an extra portion of the string that follows the </table> tag.

#include <IE.au3>
#include <String.au3>
#include <Array.au3>
;
; 1) open an html page containing 2 tables
Local $oie = _IE_Example("table")
Do
    Sleep(250)
Until IsObj($oie)
Local $sHtml = _IEBodyReadHTML($oie) ; extract whole HTML
;
; finds how many tables are on the HTML page
StringReplace($sHtml, "<table", "<table") ; @extended now holds the number of occurrences
Local $iNrOfTables = @extended
; ClipPut($sHtml)
If $iNrOfTables Then ; if at least one table exists
    ; $aTablesPositions array will contain the position of the
    ; starting <table and ending </table> tags within the HTML
    Local $aTablesPositions[$iNrOfTables + 1][2] ; 1 based
    ; 2) extract from the HTML only the portions between <table and </table> (inclusive)
    For $i = 1 To $iNrOfTables
        $aTablesPositions[$i][0] = StringInStr($sHtml, "<table", 0, $i) ; start position of $i occurrence of <table
        $aTablesPositions[$i][1] = StringInStr($sHtml, "</table>", 0, $i) + 7 ; end position of the $i occurrence of </table>
        ; 3) print on console the extracted portion
        ConsoleWrite("Table " & $i & @CRLF & "--------" & @CRLF)
        ConsoleWrite(StringMid($sHtml, $aTablesPositions[$i][0], $aTablesPositions[$i][1]) & @CRLF & "--------" & @CRLF)
    Next
    ; _ArrayDisplay($aTablesPositions)
Else
    ConsoleWrite("No tables in HTML" & @CRLF)
EndIf

Here is a reduced portion of the output:

Table 1
--------
<TABLE id=tableOne border=1>
........
<TD>aid</TD>
<TD>of</TD></TR></TBODY></TABLE><BR>$oTableTwo = _IETableGetObjByName($oIE, "tableTwo")<BR>&lt;table border="1" id="tableTwo"&gt;
--------
Table 2
--------
<TABLE id=tableTwo border=1>
<TBODY>
........
<TD>Ten</TD>
<TD>Eleven</TD></TR></TBODY></TABLE>
--------

Where am I going wrong?
Thanks for the help.

Edited by Chimp

 

Chimp

small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....


Want to fix that code or just use _StringBetween?

#include <IE.au3>
#include <String.au3>
#include <Array.au3>

Local $oie = _IE_Example("table")
Do
    Sleep(250)
Until IsObj($oie)
Local $sHtml = _IEBodyReadHTML($oie) ; extract whole HTML

; _StringBetween returns an array of matches and sets @error when nothing is found
Local $aTables = _StringBetween($sHtml, "<table", "</table")
If Not @error Then ; at least one table exists
    _ArrayDisplay($aTables)
Else
    ConsoleWrite("No tables in HTML" & @CRLF & $sHtml & @CRLF)
EndIf

AutoIt Absolute Beginners    Require a serial    Pause Script    Video Tutorials by Morthawt   ipify 

Monkey's are, like, natures humans.


Off the top of my head, I'm thinking it is a @CRLF issue.

On second thought, I just realized you are passing a position as the final StringMid parameter, when it needs to be a calculated count, measured from the first element. Subtract first from last, I'm thinking.
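To illustrate the difference between a position and a count, here is a minimal sketch. The sample string and the variable names are made up for the demonstration and are not from the original script.

```autoit
; Hypothetical sample string, not taken from the thread
Local $sText = "abcdefghij"
Local $iFirst = 3 ; position of the first wanted character ("c")
Local $iLast = 5  ; position of the last wanted character ("e")

; End position passed as the count: 5 characters starting at position 3
ConsoleWrite(StringMid($sText, $iFirst, $iLast) & @CRLF) ; prints "cdefg"

; Count calculated as last - first + 1: only positions 3 to 5
ConsoleWrite(StringMid($sText, $iFirst, $iLast - $iFirst + 1) & @CRLF) ; prints "cde"
```

The further the start position is from the beginning of the string, the more extra characters the first call returns, which would match the overshoot seen on the first table.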

Edited by TheSaint

Make sure brain is in gear before opening mouth!
Remember, what is not said, can be just as important as what is said.


What is the Secret Key? Life is like a Donut

If I put effort into communication, I expect you to read properly & fully, or just not comment.
Ignoring those who try to divert conversation with irrelevancies.
If I'm intent on insulting you or being rude, I will be obvious, not ambiguous about it.
I'm only big and bad, to those who have an over-active imagination.

I may have the Artistic Liesense ;) to disagree with you. TheSaint's Toolbox (be advised many downloads are not working due to ISP screwup with my storage)


@JohnOne
Thanks, JohnOne, for the simplification, but I need to know the positions of the tags within the HTML.

In my snippet that data should be in the $aTablesPositions array.

I need that data for the next step, which is to parse nested tables.
For example, normal tables like these can be parsed with _StringBetween:
<table>
</table>
<table>
</table>

but with nested tables _StringBetween fails to parse them **
<table>
    <table>
    </table>
</table>

@TheSaint
Thanks for the answer, but it is not clear to me what you mean by
"count from the first element. Subtract first from last".

Anyway, I do not understand why my first snippet fails.

edit:

** or even worse with mixed tables, simple and nested:

<table>
</table>

<table>
    <table>
    </table>
</table>

<table>
</table>
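For the mixed case above, one possible approach is to track the nesting depth while scanning, so that each opening tag is paired with its own closing tag. This is only a sketch under assumptions: _FindMatchingClose is a hypothetical helper name, the sample HTML is invented, and it assumes well-formed markup in which "<table" only ever appears as a real tag.

```autoit
; Hypothetical sample: a simple table, a nested pair, then another simple table
Local $sHtml = "<table id=a></table><table id=b><table id=c></table></table><table id=d></table>"
Local $iOpen = StringInStr($sHtml, "<table"), $iClose
While $iOpen > 0
    $iClose = _FindMatchingClose($sHtml, $iOpen)
    If $iClose = 0 Then ExitLoop ; unbalanced tags
    ; "</table>" is 8 characters long, so the table ends at $iClose + 7
    ConsoleWrite($iOpen & " to " & ($iClose + 7) & ": " & StringMid($sHtml, $iOpen, $iClose - $iOpen + 8) & @CRLF)
    ; continue with the next top-level table after this one
    $iOpen = StringInStr($sHtml, "<table", 0, 1, $iClose + 8)
WEnd

; Returns the position of the "</table>" that closes the "<table" at $iOpen,
; or 0 if no matching closing tag exists.
Func _FindMatchingClose($sHtml, $iOpen)
    Local $iDepth = 1, $iPos = $iOpen + 1, $iNextOpen, $iNextClose
    While 1
        $iNextOpen = StringInStr($sHtml, "<table", 0, 1, $iPos)
        $iNextClose = StringInStr($sHtml, "</table>", 0, 1, $iPos)
        If $iNextClose = 0 Then Return 0 ; no closing tag left
        If $iNextOpen > 0 And $iNextOpen < $iNextClose Then
            $iDepth += 1 ; a nested table opens before the next close
            $iPos = $iNextOpen + 1
        Else
            $iDepth -= 1 ; a table closes first
            If $iDepth = 0 Then Return $iNextClose
            $iPos = $iNextClose + 1
        EndIf
    WEnd
EndFunc
```

This prints only the top-level tables. Inner tables could be found by running the same scan on the text between an outer pair of tags.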

Edited by Chimp

 

Chimp

Well, the StringInStr function has a parameter for the start position.

So your search for the end tag should begin at the position returned for the start tag, to save time.

Then the difference between the start tag and end tag positions is the count parameter you use in StringMid.

Make sense?
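A minimal sketch of that idea, assuming flat (non-nested) tables only. The sample HTML and the variable names are invented for illustration. The fifth parameter of StringInStr is the start position, and the returned position still counts from the beginning of the whole string.

```autoit
; Hypothetical sample HTML with two flat tables
Local $sHtml = "<p>intro</p><table id=one><tr><td>a</td></tr></table><br>text<table id=two><tr><td>b</td></tr></table><p>end</p>"
Local $sCloseTag = "</table>"
Local $iEnd, $iCount
Local $iStart = StringInStr($sHtml, "<table")
While $iStart > 0
    ; search for the closing tag only from the opening tag onwards
    $iEnd = StringInStr($sHtml, $sCloseTag, 0, 1, $iStart)
    If $iEnd = 0 Then ExitLoop ; opening tag without a closing tag
    ; count = difference between the two positions plus the length of the closing tag
    $iCount = $iEnd - $iStart + StringLen($sCloseTag)
    ConsoleWrite(StringMid($sHtml, $iStart, $iCount) & @CRLF)
    ; look for the next opening tag after the table just extracted
    $iStart = StringInStr($sHtml, "<table", 0, 1, $iStart + $iCount)
WEnd
```

Here $iEnd points at the "<" of the closing tag, which is why the tag length is added to the difference to keep the extraction inclusive.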

AutoIt Absolute Beginners    Require a serial    Pause Script    Video Tutorials by Morthawt   ipify 

Monkey's are, like, natures humans.


Speed doesn't matter here.

Also, this approach would fail with mixed (normal and nested) tables.

I still do not understand why my first snippet fails...

 

Chimp

  • Solution

Here is your StringMid call

StringMid($sHtml, $aTablesPositions[$i][0], $aTablesPositions[$i][1])

The start parameter is fine.

The count is not. Read again what I wrote above: the count has to be the difference between the two StringInStr results.
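Applied to the original loop, the change could look like the sketch below. The HTML string is a made-up stand-in for what _IEBodyReadHTML returns, and $iStart and $iEnd are illustrative names. Because the end position in the original already includes the + 7 and so points at the final ">", this sketch adds 1 to the plain difference to keep that character.

```autoit
; Hypothetical stand-in for the HTML returned by _IEBodyReadHTML
Local $sHtml = "<TABLE id=tableOne><TR><TD>of</TD></TR></TABLE><BR>text<TABLE id=tableTwo><TR><TD>Eleven</TD></TR></TABLE>"
StringReplace($sHtml, "<table", "<table") ; @extended = number of occurrences
Local $iNrOfTables = @extended
Local $iStart, $iEnd
For $i = 1 To $iNrOfTables
    $iStart = StringInStr($sHtml, "<table", 0, $i)     ; first character of the $i-th <table
    $iEnd = StringInStr($sHtml, "</table>", 0, $i) + 7 ; last character of the $i-th </table>
    ; the third parameter is a length, so pass end - start + 1 instead of the end position
    ConsoleWrite("Table " & $i & @CRLF & "--------" & @CRLF)
    ConsoleWrite(StringMid($sHtml, $iStart, $iEnd - $iStart + 1) & @CRLF & "--------" & @CRLF)
Next
```

Pairing the $i-th opening tag with the $i-th closing tag still only works for flat tables, as in the original script.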

AutoIt Absolute Beginners    Require a serial    Pause Script    Video Tutorials by Morthawt   ipify 

Monkey's are, like, natures humans.
