zorphnog Posted May 20, 2010
I've been using the _IE functions to attempt to convert some HTML tables into a database. Specifically, I have been using the _IETableWriteToArray function. I have no problems with the implementation of the _IE functions or the DOM interface. My question is more of a general IE DOM question. Is the DOM that is exposed by an IE object representative of the rendered document or of the document before rendering? I am led to think that it is the latter because I am running into some issues with badly formed HTML pages. The pages have <span> elements that are incorrectly wrapped around <td> elements, resulting in certain cells not being available through the DOM.
DaleHohm Posted May 20, 2010
It absolutely exposes the elements in their rendered state.
Dale
Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl
MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model
Automate input type=file (Related)
Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded Better Better?
IE.au3 issues with Vista - Workarounds
SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y
"Doesn't work" needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?
Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble
zorphnog Posted May 20, 2010
OK, so the rendered document should ignore the illegal <span> elements, correct? I can't post the exact HTML, but here is an example I derived from the page.
test.htm
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:p="urn:schemas-microsoft-com:office:powerpoint" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml"> <head> <meta http-equiv="Content-Language" content="en-us" /> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>Management Page</title> <style type="text/css"> .style11 { font-size: xx-small; text-align: left; } .style1 { font-size: xx-small; } .style12 { color: #800000; } .style13 { background-color: #C0C0C0; } .style14 { font-size: x-small; } .style6 { font-size: x-small; text-decoration: none; line-height: 20px; font-style: normal; color: #000000; font-weight: normal; } .style15 { text-align: center; } .style2 { font-size: x-small; } .style17 { font-size: x-small; font-weight: bold; } .style5 { font-size: x-small; } .style18 { font-size: x-small; text-align: left; } .style20 { font-size: x-small; font-weight: normal; } .style21 { font-size: x-small; font-family: Tahoma; } .style23 { font-family: Tahoma; } .style4 { font-weight: normal; } .style9 { font-size: x-small; } * { /* IE5-6 font declaration */ _font-size: inherit; _font-family: inherit; _font-color: inherit; _font-weight: inherit; } .style24 { font-size: x-small; text-align: left; font-family: Tahoma; } .style3 { font-size: x-small; } .style25 { font-size: x-small; text-align: center; } </style> <base target="_blank" /> </head> <body> <table style="font-family: Tahoma; font-size: 7.5pt; text-align: center;"> <tr> <td colspan="10"> <p><b><span style="FONT-SIZE: 10pt; 
COLOR: maroon; FONT-FAMILY: Tahoma"> 2010 Alerts</span></b></p> </td> </tr> <tr style="background-color: silver; font-weight: bold;"> <td width="94"> <p><span>Notice<br /> Number</span></p> </td> <td width="62"> <p><span>Release<br /> Date</span></p> </td> <td width="65"> <p><span>Major.Minor<br /> Revision</span></p> </td> <td width="60"> <p><span>Revision<br /> Date</span></p> </td> <td style="width: 88px"> <p><span>CVE</span></p> </td> <td style="width: 295px"> <p><span>Title</span></p> </td> <td width="93"> <p><span>Status</span></p> </td> <td style="width: 94px"> <p><span>A</span></p> </td> <td style="width: 65px"> <p><span>B</span></p> </td> <td width="78" style="width: 97px"> <p><span>C</span></p> </td> </tr> <tr> <td valign="top" width="94" class="style14"> <font class="inplacedisplayid1siteid0"><span> <a target="_blank" href=""> 201041</a></span></font></td> <td valign="top" width="62">11 Mar 10</td> <td valign="top" width="65"> </td> <td valign="top" width="62"> </td> <td valign="top" class="style18" style="width: 88px"> <a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0408"> <span class="style5">CVE-2010-0408</span></a> <br /> <a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0425"> CVE-2010-0425</a><br /> <a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0434"> CVE-2010-0434</a><font size="2"><br /> </font></td> <td valign="top" class="style14" style="width: 295px"> Multiple Vulnerabilities in Apache httpd</td> <span class="style14"> <td valign="top" width="93" class="style14">Active</td> <td valign="top" class="style14" style="width: 94px"> <font class="inplacedisplayid1siteid0"><span> <a target="_blank" href=""> 201041</a></span></font></td> </span> <td valign="top" style="width: 65px"> <span class="style14"> <span> <a href=""> 201007</a></span></span></td> <span class="style14"> <td valign="top" width="94" class="style14" style="width: 97px"> <font class="inplacedisplayid1siteid0"><span> <a target="_blank" 
href=""> 201041</a></span></font></td> </span> </tr> <tr> <td valign="top" width="94" class="style14"> <font class="inplacedisplayid1siteid0"><span> <a target="_blank" href=""> 201036</a></span></font></td> <td valign="top" width="62">25 Feb 10</td> <td valign="top" width="65">1.1</td> <td valign="top" width="62">10 Mar 10</td> <td valign="top" class="style18" style="width: 88px"> <font size="2" class="style14"> <a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0106"> CVE-2010-0106</a><br /> <a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0107"> CVE-2010-0107</a><br /> <a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0108"> CVE-2010-0108</a><br /> </font></td> <td valign="top" class="style14" style="width: 295px"> Multiple Vulnerabilities in Symantec Products</td> <span class="style14"> <td valign="top" width="93" class="style14">Active</td> <td valign="top" class="style14" style="width: 94px"> <font class="inplacedisplayid1siteid0"><span> <a target="_blank" href=""> 201036</a></span></font></td> </span> <td valign="top" style="width: 65px"> <span class="style14"> <span> <a href=""> 201002</a></span></span></td> <span class="style14"> <td valign="top" width="94" class="style14" style="width: 97px"> <font class="inplacedisplayid1siteid0"><span> <a target="_blank" href=""> 201036</a></span></font></td> </span> </tr> <tr> <td valign="top" width="94" class="style14"> <font class="inplacedisplayid1siteid0"><span> <a target="_blank" href=""> 201031</a></span></font></td> <td valign="top" width="62"><span class="style14">18 Feb 10</td> <span class="style14"> <td valign="top" width="65">0.1</td> <td valign="top" width="62">22 Feb 10</td> </span> <td valign="top" class="style18" style="width: 88px"> <font size="2"> <span style="mso-fareast-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA" class="style21"> <a 
href="http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0020"> CVE-2010-0020</a><br /> <a href="http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0021"> CVE-2010-0021</a><br /> <a href="http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0022"> CVE-2010-0022</a><br /> <a href="http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0231"> CVE-2010-0231</a></span><br /> </font></td> <td valign="top" class="style14" style="width: 295px"> Multiple Vulnerabilities in Microsoft SMB Server <span style="mso-bidi-font-weight: normal"> <span style="mso-fareast-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA"> (MS10-012)</span></span></td> <td valign="top" width="93">Active</td> <span class="style14"> <td valign="top" class="style14" style="width: 94px"> <font class="inplacedisplayid1siteid0"><span> <a target="_blank" href=""> 201031</a></span></font></td> <td valign="top" style="width: 65px"> <span> <a href=""> 201010</a></span></td> <td valign="top" width="94" class="style14" style="width: 97px"> <font class="inplacedisplayid1siteid0"><span> <a target="_blank" href=""> 201031</a></span></font></td> </span> </tr> <tr> <td valign="top" width="94" class="style14"> <font class="inplacedisplayid1siteid0"><span> <a target="_blank" href=""> 201030</a></span></font></td> <td valign="top" width="62"><span class="style14">18 Feb 10</td> <span class="style14"> <td valign="top" width="65">0.1</td> <td valign="top" width="62">22 Feb 10</td> </span> <td valign="top" class="style18" style="width: 88px"> <span style="mso-fareast-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA" class="style23"> <a href="http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0239"> <span style="COLOR: blue"> CVE-2010-0239</span></a><br /> <a href="http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0240"> CVE-2010-0240</a><br /> <a 
href="http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0241"> CVE-2010-0241</a><br /> <a href="http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0242"> CVE-2010-0242</a></span></td> <td valign="top" class="style14" style="width: 295px"> Multiple Vulnerabilities in Microsoft Windows TCP/IP <span style="mso-bidi-font-weight: normal"> <span style="mso-fareast-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA"> (MS10-009)</span></span></td> <td valign="top" width="93">Active</td> <span class="style14"> <td valign="top" class="style14" style="width: 94px"> <font class="inplacedisplayid1siteid0"><span> <a target="_blank" href=""> 201030</a></span></font></td> <td valign="top" style="width: 65px"> <span> <a href=""> 201007</a></span></td> <td valign="top" width="94" class="style14" style="width: 97px"> <font class="inplacedisplayid1siteid0"><span> <a target="_blank" href=""> 201030</a></span></font></td> </span> </tr> </table> </body> </html>

#include <Array.au3>
#include <IE.au3>

$sHtml = FileRead(@ScriptDir & "\test.htm")
$oIE = _IECreate()
_IEDocWriteHTML($oIE, $sHtml)
$oTable = _IETableGetCollection($oIE, 0)
$aTable = _IETableWriteToArray($oTable, True)
_ArrayDisplay($aTable)

; Now with <span> tags removed
$sHtml = StringRegExpReplace($sHtml, "(?i)(?U)(</{0,1}span.*>)", "")
_IEDocWriteHTML($oIE, $sHtml)
$oTable = _IETableGetCollection($oIE, 0)
$aTable = _IETableWriteToArray($oTable, True)
_ArrayDisplay($aTable)
DaleHohm Posted May 21, 2010
If the DOM is confused by malformed HTML, it's a crapshoot what you will get. Use DebugBar or _IEDocReadHTML to see what the DOM sees.
Dale
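A minimal sketch of the _IEDocReadHTML suggestion above, assuming the sample page from the earlier post is saved as test.htm next to the script. It prints the markup as IE's DOM reports it and then counts the cells in each row, which should make it easier to spot rows where <td> elements went missing. The per-row count goes through the table element's rows and cells collections; the variable names are only illustrative.

```autoit
#include <IE.au3>

; Load the sample page from the earlier post (assumed to be test.htm beside this script).
Local $sHtml = FileRead(@ScriptDir & "\test.htm")
Local $oIE = _IECreate()
_IEDocWriteHTML($oIE, $sHtml)

; Read back the markup as the DOM currently represents it and print it for inspection.
Local $sDomHtml = _IEDocReadHTML($oIE)
ConsoleWrite($sDomHtml & @CRLF)

; Count the cells in each row of the first table to find rows with missing <td> elements.
Local $oTable = _IETableGetCollection($oIE, 0)
Local $iRow = 0
For $oRow In $oTable.rows
    ConsoleWrite("Row " & $iRow & ": " & $oRow.cells.length & " cells" & @CRLF)
    $iRow += 1
Next
```

Comparing the printed markup with the original file shows how IE restructured the misplaced <span> tags, and rows with fewer cells than the header row are the ones to look at first.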
zorphnog Posted May 24, 2010
The DOM sees the malformed HTML. It isn't until rendering time that the incorrect <span> tags are ignored. I don't mess around with the DOM that often, so this was more of a learning question for me. I was under the impression that the DOM was an exact representation of what is seen in the browser (rendered), but it seems there are still some validation steps that take place before the DOM is drawn in the browser window. I'm reading the page with _IEDocReadHTML, applying my SRE (StringRegExpReplace) to strip the <span> tags, and writing the result back with _IEDocWriteHTML. This works fine. The DOM just didn't represent what I thought it did. Thanks for the replies though.
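The read, strip, and write-back workflow described above could look roughly like this. It is a sketch under assumptions, not the poster's exact script: the URL and the helper name _StripSpanTags are hypothetical, and it relies on the report above that the markup returned by _IEDocReadHTML still contains the affected cells. The pattern uses [^>]* instead of an ungreedy .*, so it also matches <span> tags whose attributes wrap onto a second line.

```autoit
#include <Array.au3>
#include <IE.au3>

; Hypothetical helper: remove every opening and closing <span> tag,
; keeping whatever text or markup sits between them.
Func _StripSpanTags($sHtml)
    Return StringRegExpReplace($sHtml, "(?i)</?span\b[^>]*>", "")
EndFunc

; Placeholder address (assumption); replace with the page that holds the table.
Local $sUrl = "http://example.com/alerts.htm"
Local $oIE = _IECreate($sUrl)

; Read the document's markup, strip the <span> tags, and write the result back.
Local $sClean = _StripSpanTags(_IEDocReadHTML($oIE))
_IEDocWriteHTML($oIE, $sClean)

; The first table can now be read into an array as before.
Local $oTable = _IETableGetCollection($oIE, 0)
Local $aTable = _IETableWriteToArray($oTable, True)
_ArrayDisplay($aTable)
```

Stripping every <span> also discards inline styling, which does not matter when the goal is only to extract cell text.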
DaleHohm Posted May 24, 2010
The raw HTML gets rendered into a DOM document that is hosted and displayed in the browser. There is no DOM until the document is rendered.
Dale