litlmike

IE.au3, separating link and title from Anchor

8 posts in this topic

I have a table on a private webpage that contains data. And of course: The <a> tag is used to create an anchor to link from, the href attribute is used to address the document to link to, and the words between the open and close of the anchor tag will be displayed as a hyperlink.

When I use _IETableGetCollection & _IETableWriteToArray the result is only the 'Text to be displayed'; however, I also need the a href="url". How do I capture that data?

How the HTML appears:

<a href="fake.cfm?id=377477" target="_blank">Name Phony </a>

TIA




Check this out

I just modified the example in the help file.

#include <IE.au3>
$oIE = _IE_Example ("basic")
$oLinks = _IELinkGetCollection ($oIE)
$iNumLinks = @extended
MsgBox(0, "Link Info", $iNumLinks & " links found")
For $oLink In $oLinks
    MsgBox(0, "Link Info", $oLink.href & @CRLF & _IEPropertyGet($oLink,'innertext'))
Next


If I understand what you are asking, your trouble is that _IETableWriteToArray() gives you the innerText of each of the table cells rather than the innerHTML (everything inside <> in the HTML source gets stripped out).

You could make a private copy of _IETableWriteToArray and change the innerText to innerHTML and you might get what you need. You can also scope a link collection to a table if you choose, but it would get tricky unless you can count on the table structure being consistent:

$oLinks = _IETagNameGetCollection($oTable, "a")
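
A minimal sketch of that scoped approach, for illustration only. It assumes a browser window whose title contains "My Page" is already open and that the data sits in the first table on the page (index 0); both are placeholders, not details from the original question.

```autoit
#include <IE.au3>

; Assumption: an IE window with "My Page" in its title is already open
Local $oIE = _IEAttach("My Page")

; Assumption: the table of interest is the first one on the page (index 0)
Local $oTable = _IETableGetCollection($oIE, 0)
If Not IsObj($oTable) Then Exit MsgBox(0, "Error", "Table not found")

; Collect only the <a> elements that live inside that table
Local $oLinks = _IETagNameGetCollection($oTable, "a")
For $oLink In $oLinks
    ; .href is the link target, .innerText is the visible link text
    ConsoleWrite($oLink.href & " | " & $oLink.innerText & @CRLF)
Next
```

This gives a flat list of links, so matching each link back to its row and column only works if every row has the same layout.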

Dale


Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y Doesn't work needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble


#5 ·  Posted (edited)


That is almost exactly what I want. Sadly, I am not too familiar with .href, innerText and innerHTML, but I think what I want is .href: just the address the link points to. When I modded the function to use innerHTML, it gave me <A href="fake.cfm?id=368777" target=_blank>Fake Name</A>. I just want fake.cfm?id=368777. I thought this might be a cool inclusion for your UDF, so I did the work for you below.

***EDIT***

I forgot to mention that changing innerText to .href in the UDF just errors out, but I am guessing you would already know that.

Function Reference
_IETableWriteToArray 
--------------------------------------------------------------------------------

Reads the contents of a Table into an array.


#include <IE.au3> 
_IETableWriteToArray ( ByRef $o_object [, $f_transpose [, $f_href]] )


Parameters

$o_object Object variable of an InternetExplorer.Application, Table object 
$f_transpose Boolean value specifying whether to swap the rows and columns in the output array 
$f_href Boolean value.  If True, gives innerHTML instead of innerText
 

Return Value

Success: Returns a 2-dimensional array containing the contents of the Table 
Failure: Returns 0 and sets @ERROR 
@Error: 0 ($_IEStatus_Success) = No Error 
 3 ($_IEStatus_InvalidDataType) = Invalid Data Type 
 4 ($_IEStatus_InvalidObjectType) = Invalid Object Type 
@Extended: Contains invalid parameter number 

 

Remarks

When table cells span multiple columns or rows, blank array elements are added to properly align the results. Data in spanning cells will be in the left or uppermost array elements.

Tables are often nested in HTML documents. If all of your data is unexpectedly returned in a single array element, you may need to reference a more deeply nested table to this function.


;===============================================================================
;
; Function Name:    _IETableWriteToArray()
; Description:      Reads the contents of a Table into an array
; Parameter(s):     $o_object   - Object variable of an InternetExplorer.Application, Table object
;                   $f_transpose- Boolean value.  If True, swap rows and columns in output array
;                   $f_href- Boolean value.  If True, gives innerHTML instead of innerText
; Requirement(s):   AutoIt3 V3.2 or higher
; Return Value(s):  On Success  - Returns a 2-dimensional array containing the contents of the Table
;                   On Failure  - Returns 0 and sets @ERROR
;                   @ERROR      - 0 ($_IEStatus_Success) = No Error
;                               - 3 ($_IEStatus_InvalidDataType) = Invalid Data Type
;                               - 4 ($_IEStatus_InvalidObjectType) = Invalid Object Type
;                   @Extended   - Contains invalid parameter number
; Author(s):        Dale Hohm
;
;===============================================================================
;
Func _IETableWriteToArray(ByRef $o_object, $f_transpose = False, $f_href = False)
    If Not IsObj($o_object) Then
        __IEErrorNotify("Error", "_IETableWriteToArray", "$_IEStatus_InvalidDataType")
        SetError($_IEStatus_InvalidDataType, 1)
        Return 0
    EndIf
    ;
    If Not __IEIsObjType($o_object, "table") Then
        __IEErrorNotify("Error", "_IETableWriteToArray", "$_IEStatus_InvalidObjectType")
        SetError($_IEStatus_InvalidObjectType, 1)
        Return 0
    EndIf
    ;
    Local $i_cols = 0, $trs, $tr, $tds, $i_col, $i_rows, $col, $row
    $trs = $o_object.rows
    For $tr In $trs
        $tds = $tr.cells
        $i_col = 0
        For $td In $tds
            $i_col = $i_col + $td.colSpan
        Next
        If $i_col > $i_cols Then $i_cols = $i_col
    Next
    $i_rows = $trs.length
    Local $a_TableCells[$i_cols][$i_rows]
    $row = 0
    For $tr In $trs
        $tds = $tr.cells
        $col = 0
        For $td In $tds
            If $f_href Then
                $a_TableCells[$col][$row] = $td.href
                $col = $col + $td.colSpan
            Else
                $a_TableCells[$col][$row] = $td.innerText
                $col = $col + $td.colSpan
            EndIf
                
        Next
        $row = $row + 1
    Next
    If $f_transpose Then
        Local $i_d1 = UBound($a_TableCells, 1), $i_d2 = UBound($a_TableCells, 2), $aTmp[$i_d2][$i_d1]
        For $i = 0 To $i_d2 - 1
            For $j = 0 To $i_d1 - 1
                $aTmp[$i][$j] = $a_TableCells[$j][$i]
            Next
        Next
        $a_TableCells = $aTmp
    EndIf
    SetError($_IEStatus_Success)
    Return $a_TableCells
EndFunc   ;==>_IETableWriteToArray
Edited by litlmike


The reason that .href will error out is that the elements you are working with are table cells rather than the links that they contain. Links have a .href property, TD's do not. Both have a .innerhtml property however.
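
Since the cells do expose innerHTML, one workaround is to pull the href value out of the cell's markup as plain text. The following is a rough sketch, not part of IE.au3: _ExtractFirstHref is a hypothetical helper name, and the pattern only handles double-quoted or unquoted href values.

```autoit
; Hypothetical helper (not in IE.au3): returns the first href value found
; in a fragment of HTML, or "" when the fragment contains no link.
Func _ExtractFirstHref($sHtml)
    ; Flag 1 returns an array holding the captured group of the first match
    Local $aMatch = StringRegExp($sHtml, '(?i)<a\s[^>]*?href\s*=\s*"?([^"\s>]+)', 1)
    If @error Then Return ""
    Return $aMatch[0]
EndFunc   ;==>_ExtractFirstHref

; Sample markup taken from the innerHTML result quoted earlier in the thread
Local $sCell = '<A href="fake.cfm?id=368777" target=_blank>Fake Name</A>'
ConsoleWrite(_ExtractFirstHref($sCell) & @CRLF)
```

Because it reads the markup rather than the link object's .href property, this returns the value as it is written in the HTML.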

Thanks for posting the mod - perhaps it will be useful to others. It is too special-purpose for the core UDF, however. Besides, if I had already done it in the UDF you wouldn't have learned all this cool stuff ;-)

Dale




You are right, I did learn a lot of cool things and I am always thankful for the lesson.

Ahhhh, so the object is a table cell, not a link; ergo, .href cannot be used on a table cell object, only on a link object. Well then, this seems to imply that there is no solution for my problem, only possible workarounds. Is this correct?

I think I may be able to separate out the links if they follow some pattern; I will check. However, that seems less intelligent to me. Can you verify that the solution cannot be reached with the previous method?

Thanks


#8 ·  Posted (edited)

Well, you can make your specialized solution do what you want, it just takes more work...

This checks to see if there are any "a" links in the cell, then stores the href (of only the first one if there are multiple); otherwise it stores an empty string.

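If $f_href Then
                ; Look for the first <a> element inside the current cell
                $oLink = _IETagNameGetCollection($td, "a", 0)
                If IsObj($oLink) Then
                    $a_TableCells[$col][$row] = $oLink.href
                Else
                    $a_TableCells[$col][$row] = ""
                EndIf
                ; Advance the column for every cell, whether or not it held a link
                $col = $col + $td.colSpan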

Dale

Edit: fixed major logic flaw

Edited by DaleHohm
