andomatic

Read Internet Explorer table by rows with link text

5 posts in this topic

Hello,

I'm working on pulling some data from a table in Internet Explorer and writing the contents, row by row, into a CSV file. I am able to loop through the table to an extent but have the following questions:

1. How do I determine the end of a row?

2. Sometimes the table data will have a hyperlink attached. I'd very much like to capture that URL in my CSV file.

If I use the _IETableWriteToArray function, I am able to do everything I need except for getting the URL. Does anyone know a way to pull the URL for certain cells in the table? I have tried a For...Next loop using _IETagNameGetCollection but have not been successful. Thanks in advance for any assistance! Code below:

$oTable = _IETableGetCollection($oIE, 0)
$oTRs = _IETagNameGetCollection($oTable, "TR")

For $oTR In $oTRs
    $oTDs = _IETagNameGetCollection($oTR, "TD")
    For $oTD In $oTDs
        $sRowCont = _IEPropertyGet($oTD, "innertext")
        MsgBox(0, "Cell Text:", $sRowCont)
        $oTD = _IETagNameGetCollection($oTR, "TD", 0)
        $oLink = _IETagNameGetCollection($oTD, "a", 0)
        MsgBox(0, "Link Text", $oLink)
        ;ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $blnFound = ' & $blnFound & @crlf & '>Error code: ' & @error & @crlf) ;### Debug Console
    Next
Next

#2 ·  Posted (edited)

I would personally just modify _IETableWriteToArray() to return .innerHTML instead of .innerText.

Try the function below, which is the _IETableWriteToArray function with "$a_TableCells[$col][$row] = $td.innerText" changed to "$a_TableCells[$col][$row] = $td.innerHTML".

Keep in mind that this returns everything inside each <td>, markup included, so it may return more than you want, but it's a start.

#include <IE.au3> ; provides the $_IEStatus_* constants and the __IE* helpers used below

; #FUNCTION# ====================================================================================================================
; Name...........: _IETableWriteToArrayHTML
; Description ...: Reads the contents of a Table into an array, storing each cell's innerHTML instead of its innerText
; Parameters ....: $o_object    - Object variable of an InternetExplorer.Application, Table object
;                  $f_transpose - Boolean value.  If True, swap rows and columns in output array
; Return values .: On Success   - Returns a 2-dimensional array containing the contents of the Table
;                  On Failure   - Returns 0 and sets @ERROR
;                  @ERROR       - 0 ($_IEStatus_Success) = No Error
;                               - 3 ($_IEStatus_InvalidDataType) = Invalid Data Type
;                               - 4 ($_IEStatus_InvalidObjectType) = Invalid Object Type
;                  @Extended    - Contains invalid parameter number
; Author ........: Dale Hohm
; ===============================================================================================================================
Func _IETableWriteToArrayHTML(ByRef $o_object, $f_transpose = False)
    If Not IsObj($o_object) Then
        __IEErrorNotify("Error", "_IETableWriteToArrayHTML", "$_IEStatus_InvalidDataType")
        Return SetError($_IEStatus_InvalidDataType, 1, 0)
    EndIf
    ;
    If Not __IEIsObjType($o_object, "table") Then
        __IEErrorNotify("Error", "_IETableWriteToArrayHTML", "$_IEStatus_InvalidObjectType")
        Return SetError($_IEStatus_InvalidObjectType, 1, 0)
    EndIf
    ;
    Local $i_cols = 0, $tds, $i_col
    Local $trs = $o_object.rows
    For $tr In $trs
        $tds = $tr.cells
        $i_col = 0
        For $td In $tds
            $i_col = $i_col + $td.colSpan
        Next
        If $i_col > $i_cols Then $i_cols = $i_col
    Next
    Local $i_rows = $trs.length
    Local $a_TableCells[$i_cols][$i_rows]
    Local $col, $row = 0
    For $tr In $trs
        $tds = $tr.cells
        $col = 0
        For $td In $tds
            $a_TableCells[$col][$row] = $td.innerHTML
            $col = $col + $td.colSpan
        Next
        $row = $row + 1
    Next
    If $f_transpose Then
        Local $i_d1 = UBound($a_TableCells, 1), $i_d2 = UBound($a_TableCells, 2), $aTmp[$i_d2][$i_d1]
        For $i = 0 To $i_d2 - 1
            For $j = 0 To $i_d1 - 1
                $aTmp[$i][$j] = $a_TableCells[$j][$i]
            Next
        Next
        $a_TableCells = $aTmp
    EndIf
    Return SetError($_IEStatus_Success, 0, $a_TableCells)
EndFunc   ;==>_IETableWriteToArrayHTML
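
As a rough sketch of how the returned array might feed the CSV export from the first post: each cell's innerHTML is scanned for href values with StringRegExp, the tags are stripped to recover the visible text, and one line is written per table row. The helper names _HrefsFromHTML, _CellText, _CsvField and _TableToCsvWithLinks are made up for this example. The regular expressions are assumptions that only cope with double-quoted href attributes and simple markup, so treat this as a starting point rather than a finished parser.

```autoit
#include <IE.au3>

; Hypothetical helper: collect every double-quoted href found in a cell's
; innerHTML and join them with " | ". Returns "" when the cell has no link.
Func _HrefsFromHTML($sHTML)
    Local $aHits = StringRegExp($sHTML, '(?i)<a\s[^>]*?href\s*=\s*"([^"]*)"', 3) ; 3 = array of all matches
    If @error Then Return ""
    Local $sOut = ""
    For $i = 0 To UBound($aHits) - 1
        If $i > 0 Then $sOut &= " | "
        $sOut &= $aHits[$i]
    Next
    Return $sOut
EndFunc   ;==>_HrefsFromHTML

; Hypothetical helper: crude tag stripper to get the visible text back.
; Entities such as &nbsp; are left untouched.
Func _CellText($sHTML)
    Return StringStripWS(StringRegExpReplace($sHTML, '<[^>]*>', ''), 3)
EndFunc   ;==>_CellText

; Hypothetical helper: quote one CSV field, doubling any embedded quotes.
Func _CsvField($sText)
    Return '"' & StringReplace($sText, '"', '""') & '"'
EndFunc   ;==>_CsvField

; Hypothetical driver: assumes _IETableWriteToArrayHTML() from the post above
; is in the same script. Writes "text,url" pairs for every cell of every row.
Func _TableToCsvWithLinks(ByRef $oTable, $sCsvPath)
    Local $aCells = _IETableWriteToArrayHTML($oTable)
    If @error Then Return SetError(1, 0, False)
    Local $hFile = FileOpen($sCsvPath, 2) ; 2 = write, erasing previous contents
    If $hFile = -1 Then Return SetError(2, 0, False)
    Local $sLine
    For $iRow = 0 To UBound($aCells, 2) - 1 ; the array is indexed [column][row]
        $sLine = ""
        For $iCol = 0 To UBound($aCells, 1) - 1
            If $iCol > 0 Then $sLine &= ","
            $sLine &= _CsvField(_CellText($aCells[$iCol][$iRow])) & "," & _CsvField(_HrefsFromHTML($aCells[$iCol][$iRow]))
        Next
        FileWriteLine($hFile, $sLine) ; every column visited, so the row is complete
    Next
    FileClose($hFile)
    Return True
EndFunc   ;==>_TableToCsvWithLinks
```

Usage would be along the lines of getting the table with _IETableGetCollection($oIE, 0) and passing it to _TableToCsvWithLinks($oTable, @ScriptDir & "\table.csv"). Note that the URLs come back exactly as they appear in the markup, so relative links stay relative.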
Edited by danwilli


Great Idea, and worked like a champ. Thanks very very much!


Thanks DW1! 

Thank you, your tip helped me a lot!


#5 ·  Posted (edited)

One of my signature links is good for this kind of thing.  You write an xpath, and it will return an array of matching objects.

edit: oops, posted on someone digging up a necro

Edited by jdelaney

IEbyXPATH - Grab IE DOM objects by XPATH
IEscriptRecord - Makings of an IE script recorder
ExcelFromXML - Create Excel docs without Excel installed
GetAllWindowControls - Output all control data on a given window
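
For anyone landing here later, both original questions can probably also be handled by walking the DOM directly, with no extra UDF. The sketch below assumes the table object came from _IETableGetCollection; the function names _CsvQuote and _TableRowsToCsv are invented for illustration. A row ends when the inner loop over its cells finishes, and the URL is read from the href property of the first anchor inside each cell.

```autoit
#include <IE.au3>

; Hypothetical helper: quote one CSV field, doubling any embedded quotes.
Func _CsvQuote($sText)
    Return '"' & StringReplace($sText, '"', '""') & '"'
EndFunc   ;==>_CsvQuote

; Hypothetical helper: write one CSV line per table row, with a
; "text,url" pair for every cell. The url is "" when the cell has no link.
Func _TableRowsToCsv(ByRef $oTable, $sCsvPath)
    If Not IsObj($oTable) Then Return SetError(1, 0, False)
    Local $hFile = FileOpen($sCsvPath, 2) ; 2 = write, erasing previous contents
    If $hFile = -1 Then Return SetError(2, 0, False)
    Local $oRows = $oTable.rows
    Local $oCells, $oLinks, $sLine, $sUrl, $iCell
    For $oTR In $oRows
        $sLine = ""
        $iCell = 0
        $oCells = $oTR.cells
        For $oTD In $oCells
            $sUrl = ""
            $oLinks = $oTD.getElementsByTagName("a") ; anchors inside this cell only
            If IsObj($oLinks) And $oLinks.length > 0 Then $sUrl = $oLinks.item(0).href
            If $iCell > 0 Then $sLine &= ","
            $sLine &= _CsvQuote(StringStripWS($oTD.innerText, 3)) & "," & _CsvQuote($sUrl)
            $iCell += 1
        Next
        ; The inner loop has run out of cells, so this is the end of the row.
        FileWriteLine($hFile, $sLine)
    Next
    FileClose($hFile)
    Return True
EndFunc   ;==>_TableRowsToCsv
```

Compared with parsing innerHTML, reading href from the anchor object leaves URL resolution to the browser, at the cost of one more DOM call per cell. Only the first link in each cell is captured here; cells with several links would need a loop over $oLinks.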
