crashdemons

HTML DOM Viewer

15 posts in this topic

#1 ·  Posted (edited)

I decided to make this because I was bored.

Later, I wish I hadn't started because it was giving me a headache.

But now, I have it working halfway decent.


This script contains functions for building or processing parts of an HTML document: elements and the content contained within them (including lower-hierarchy elements and text).

It also has its own referencing expression for elements.

Example: HTML.Body.Table.TR,2.TD,6

Where: Parent,Occurrence.Child,Occurrence

(Note: Occurrence defaults to 1 when it is not included)
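
As a rough illustration of the expression syntax (a sketch only, not code taken from HTML_DOM.au3; the function name _Sketch_ParseExpression is made up for this example), an expression can be broken into tag/occurrence pairs like this:

```autoit
; Hypothetical helper, not part of HTML_DOM.au3.
; Splits "Parent,Occurrence.Child,Occurrence" into rows of [tag, occurrence].
Func _Sketch_ParseExpression($sExpr)
    Local $aParts = StringSplit($sExpr, ".")   ; $aParts[0] holds the element count
    Local $iCount = $aParts[0]
    Local $aSteps[$iCount][2]
    For $i = 1 To $iCount
        Local $aPair = StringSplit($aParts[$i], ",")
        $aSteps[$i - 1][0] = $aPair[1]         ; tag name
        $aSteps[$i - 1][1] = 1                 ; occurrence is 1 when not included
        If $aPair[0] >= 2 Then $aSteps[$i - 1][1] = Number($aPair[2])
    Next
    Return $aSteps
EndFunc

; Usage: list the steps of the example expression.
Local $aSteps = _Sketch_ParseExpression("HTML.Body.Table.TR,2.TD,6")
For $i = 0 To UBound($aSteps) - 1
    ConsoleWrite($aSteps[$i][0] & " -> occurrence " & $aSteps[$i][1] & @CRLF)
Next
```

Each row is one level of the hierarchy, read left to right.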

Changes:

-v4 fixed a processing bug in _HTML_GetContent and _HTML_GetByExpression that returned the content of the wrong tag when a side-by-side occurrence followed a nested occurrence.

-v3 small changes

-Zipped both files

-Updated the test script (GUI Resizing/size fixes + Open file + Open Remote file + some quick-options for expressions)

-v2 - Scripts can now check whether the selected TreeViewItem has been recently changed

-Added the files as attachments instead of code boxes (some of the code was being malformed when inside the code boxes)

The Zip below contains both HTML_DOM.au3 and DOM_test.au3

HTML_DOM.zip

Edited by crashdemons

My Projects - WindowDarken (Darken except the active window) Yahsmosis Chat Client (Discontinued) StarShooter Game (Red alert! All hands to battlestations!) YMSG Protocol Support (Discontinued) Circular Keyboard and OSK example. (aka Iris KB) Target Screensaver Drive Toolbar Thingy Rollup Pro (Minimize-to-Titlebar & More!) 2D Launcher physics example Ascii Screenshot AutoIt3 Quine Example ("Is a Quine" is a Quine.) USB Lock (Another system keydrive - with a toast.)




@crashdemons

It is not running as it should on my side.

Lots of errors are coming up:

>Running AU3Check (1.54.9.0) params: from:C:\Program Files\AutoIt3

C:\_\Apps\AutoIT3\UDF's\HTML_DOM.au3(311,41) : ERROR: syntax error (illegal character)

$string=StringReplace($string,"'",'''

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^

C:\_\Apps\AutoIT3\UDF's\HTML_DOM.au3(319,41) : ERROR: syntax error (illegal character)

$string=StringReplace($string,''',"'"

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^

C:\_\Apps\AutoIT3\UDF's\HTML_DOM.au3(319,42) : ERROR: StringReplace() [built-in] called with wrong number of args.

$string=StringReplace($string,''',"'")

regards

ptrex

Only one error here:

testHTML-DOM.au3 (36) : ==> Unknown function name.:

_HTML_TreeAdd_Deep($_HTML_document,$TreeView1)

^ ERROR

>Exit code: 1 Time: 13.128

Maybe I need to include some files?

In HTML_DOM.au3 Line 101, the function is already defined:

Func _HTML_TreeAdd_Deep($content,$treeview,$firstentry=True)

In the DOM Test.au3 Line 8, HTML_DOM.au3 is included already:

#include <HTML_DOM.au3>

Make sure that HTML_DOM.au3 is correctly named and in the right folder (the Include folder OR the same folder as the test script), and make sure the #include line mentioned above is there.



UPDATE:

See the first post for a new copy of HTML_DOM.au3

The AutoIt tags were changing some of the double-quotes in the script to single-quotes - creating problems.

The file has been attached now instead of placed in a Highlighted codebox.



Impressive work - especially considering you're doing it by brute force (string parsing rather than using a DOM parser).

I assume the "Tag Expression" is your own invention, right? What rules do you use to construct it?

I find the example interface you constructed maddeningly tiny, but I like the output.

Dale




#7 ·  Posted (edited)

It was a pain.

The hierarchy used in the "tag expression" is based solely on _HTML_GetContent, which took all but insanity to finish. It gets the content between a beginning and ending tag, counting each identical opening tag until they are all matched with an ending tag, or until the ending-tag search returns 0 (no ending tag found). If there are no ending tags to begin with, then the tag isn't encapsulating and therefore has no content.
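
The tag-counting idea can be sketched roughly as follows. This is not the actual _HTML_GetContent code: the names _Sketch_FindOpen and _Sketch_GetContent are hypothetical, only the first occurrence of the tag is handled, and comments, scripts, and malformed markup are ignored.

```autoit
; Hypothetical helper: position of the next "<tag" whose name really ends there,
; so that searching for "tag" does not stop at "<tag1" or "<tagb". Returns 0 if none.
Func _Sketch_FindOpen($sHtml, $sTag, $iFrom)
    Local $sOpen = "<" & $sTag
    While 1
        Local $iPos = StringInStr($sHtml, $sOpen, 0, 1, $iFrom)
        If $iPos = 0 Then Return 0
        Local $sNext = StringMid($sHtml, $iPos + StringLen($sOpen), 1)
        If $sNext = ">" Or $sNext = "/" Or StringIsSpace($sNext) Then Return $iPos
        $iFrom = $iPos + 1
    WEnd
EndFunc

; Hypothetical sketch: content of the first $sTag element, matching nested
; identical tags by keeping a depth counter.
Func _Sketch_GetContent($sHtml, $sTag)
    Local $sClose = "</" & $sTag & ">"
    Local $iStart = _Sketch_FindOpen($sHtml, $sTag, 1)
    If $iStart = 0 Then Return SetError(1, 0, "")        ; no opening tag
    Local $iTagEnd = StringInStr($sHtml, ">", 0, 1, $iStart)
    If $iTagEnd = 0 Then Return SetError(1, 0, "")       ; opening tag never ends
    Local $iBody = $iTagEnd + 1
    Local $iPos = $iBody
    Local $iDepth = 1
    While 1
        Local $iClose = StringInStr($sHtml, $sClose, 0, 1, $iPos)
        If $iClose = 0 Then Return SetError(2, 0, "")    ; no ending tag: no content
        Local $iOpen = _Sketch_FindOpen($sHtml, $sTag, $iPos)
        If $iOpen > 0 And $iOpen < $iClose Then
            $iDepth += 1                                 ; another identical opening tag
            $iPos = $iOpen + 1
        Else
            $iDepth -= 1                                 ; one opening tag is now matched
            If $iDepth = 0 Then Return StringMid($sHtml, $iBody, $iClose - $iBody)
            $iPos = $iClose + 1
        EndIf
    WEnd
EndFunc

; Usage: the outer <div> contains another <div>, so its content runs to the second </div>.
ConsoleWrite(_Sketch_GetContent("<div><div>inner</div>outer</div>", "div") & @CRLF)
```

The depth counter is what keeps the first closing tag from ending the outer element too early.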

The other main instrument was GetTLTags, which uses the GetContent processor to find all of the "Top-Level" tags in a string (tags within the specified content that aren't already surrounded by another tag's content area).

...

Using the two in conjunction, I was able to split a period-delimited string into an array. The first element of the array is used to get the first Top-Level tag, and the occurrence determines which one (if there is more than one). GetContent is used to get the content of this tag, and then the process is looped using the found content and the next element of the array.

In other words, the nth element is applied to the (n-1)th element's content, looping until n is the last element, at which point the newest content is returned.

A similar process is used in GetByExpressionA except that the Attributes of the final tag are returned, not the content.
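
The loop over the expression can be sketched like this. It is only an illustration: _Sketch_Walk is a made-up name, and the content lookup is a simple non-greedy regular expression standing in for the real GetContent, so it does not cope with a tag nested inside a tag of the same name and does not restrict itself to Top-Level tags.

```autoit
; Hypothetical sketch of the expression loop, not the UDF's _HTML_GetByExpression.
; Assumes tag names contain only letters and digits (they are placed in a regex).
Func _Sketch_Walk($sHtml, $sExpr)
    Local $aParts = StringSplit($sExpr, ".")
    Local $sContent = $sHtml
    For $i = 1 To $aParts[0]
        Local $aPair = StringSplit($aParts[$i], ",")
        Local $sTag = $aPair[1]
        Local $iOcc = 1                                   ; occurrence defaults to 1
        If $aPair[0] >= 2 Then $iOcc = Number($aPair[2])
        ; Stand-in content getter: capture the body of every <tag ...>...</tag> found.
        Local $sPattern = "(?is)<" & $sTag & "\b[^>]*>(.*?)</" & $sTag & ">"
        Local $aHits = StringRegExp($sContent, $sPattern, 3)
        If @error Then Return SetError(1, $i, "")         ; tag not found at this step
        If $iOcc < 1 Or $iOcc > UBound($aHits) Then Return SetError(2, $i, "")
        $sContent = $aHits[$iOcc - 1]                     ; next step searches inside this
    Next
    Return $sContent
EndFunc

; Usage: second <tag>, then the <tagb> inside it.
Local $sDoc = "<tag><tag1>Yipee</tag1></tag><tag><tagb>Woohoo</tagb></tag>"
ConsoleWrite(_Sketch_Walk($sDoc, "tag,2.tagb") & @CRLF)
```

On failure the sketch sets @error and stores the failing step number in @extended.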

In simpler terms, seeing similar formats, I constructed this Idea:

Using tag,2.tagb

You could get information about 'tagb' which resides inside the content of the 2nd 'tag'

(which would be "Woohoo" in the lower example)

<tag>
  <tag1>
    Yipee
   </tag1>
</tag>
<tag>
  <tagb>
     Woohoo
   </tagb>
</tag>

I guess you could write HTML.Body.Table as pseudo:

GetContent(GetContent(GetContent($html_file_data, 'HTML'),'Body'),'Table')

However, this seems to take a bit of time if the HTML document is large.

Edited by crashdemons


@crashdemons

I finally got it to work after your second update.

But still had to add #include <StaticConstants.au3> to get rid of all the errors.

Anyhow. This is a good job !!

Same remark as DaleHohm made: the GUI is a bit tiny and should be made resizable.

But that is easy to fix.

Regards

Ptrex


I just updated the Test Script (Viewer).

"I find the example interface you constructed maddeningly tiny, but I like the output."

- The GUI has been changed a bit and now allows resizing.

"But still had to add #include <StaticConstants.au3> to get rid of all the errors."

- I didn't need it for some odd reason, but I added it and it still worked, so I left it in this time.

"it would be nice if you could view the DOM for a remote page..."

- Added.

Full Added List:

-GUI resizing and size changes

-File>Open and File>Exit menu options

-File>Open Remote File option

-Edit>Copy Content from expression (use the DOM-Like tag expression to point to a content to copy)

-Edit>Copy Attributes from expression (use the DOM-Like tag expression to point to a tag to copy its attributes)

-both files are now zipped as the Test script is an eyesore in the codebox and the board won't let me upload more than one file.



Awesome! Just going to try it now! :D Line 198 really wasn't necessary... :D Same with 151...

huh?

DOM_test.au3

151;    given the byte value, Unit #, long/short setting, and Unit Set
198     #ce

or did you mean HTML_DOM.au3 ?

151 $tlarr[$i]=StringReplace($tlarr[$i],'>','')
198;$content=$tagname    <---- Okay this one isn't really useful

Eh, Maybe you were just looking at the old copy :)



Great stuff!

Any chance you could allow resizing of the tree panels? When you get deep into the tree, you end up having to do a lot of scrolling.

Thanks for posting it, hope the headache passes soon :D


#14 ·  Posted (edited)

Updated HTML_DOM.au3 to version 4

A problem in version 3's _HTML_GetContent caused _HTML_GetByExpression, _HTML_GetByExpressionA and possibly (but not likely) _HTML_GetTLTags to return the content of the wrong element.

The problem was that the program was not using the right method to locate the element using the occurrence parameter.

This has been solved, but forces the script to do a little more work because it must find the content of each previous element of the same name (on the same level) before continuing.

Edited by crashdemons


Hi Crashdemons,

I've been using your UDF for one of my web scripting projects this winter, and it's excellent! It's saved me a bunch of time and hassle, thank you!!!

I found a weird little anomaly with it that took me a long time to understand, so I figured I'd share it here in case anyone else comes up against it.

Here is my test script + 3 example input files to compare.

test script

#include <HTML_DOM.au3>

$f = FileRead("example-c.html")
$iReadError = @error ; FileRead signals failure through @error, not a -1 return value
If $iReadError Then
    MsgBox(0, "FileRead Failed", $iReadError)
EndIf

MsgBox(0, "Whole File", $f)
; example-a.html has 5 TDs in the second TR.
; example-b.html is identical, except the contents of the 3rd and 4th
; TD have been modified so they are not the same.
; Note what happens when you run this script on the two.
; The loop runs to 7, past the 5 TDs that exist, to show what later lookups return.
For $j = 1 To 7
    $expr = "tr,2" & ".td," & $j
    $sContent = _HTML_GetByExpression($f, $expr)
    If @error Then
        MsgBox(0, "_HTML_GetByExpression", "Error: " & @error)
    EndIf
    MsgBox(0, $expr, $sContent)
Next

example input a

<TABLE>
<TBODY>
<TR>
<TD>Job ID</TD>
<TD>Employee</TD>
<TD>Position</TD>
<TD>Site</TD>
<TD>Description</TD>
</TR>
<TR>
<TD>8394</TD>
<TD>HOWELL, BETTY J.</TD>
<TD>ITINERANT</TD>
<TD>ITINERANT</TD>
<TD>ALL DAY</TD>
</TR></TBODY></TABLE>

example input b

<TABLE>
<TBODY>
<TR>
<TD>Job ID</TD>
<TD>Employee</TD>
<TD>Position</TD>
<TD>Site</TD>
<TD>Description</TD>
</TR>
<TR>
<TD>8394</TD>
<TD>HOWELL, BETTY J.</TD>
<TD>ITINERANT</TD>
<TD>ITINERANT1</TD>
<TD>ALL DAY</TD>
</TR></TBODY></TABLE>

example input c

<TABLE>
<TBODY>
<TR>
<TD>Job ID</TD>
<TD>Employee</TD>
<TD>Position</TD>
<TD>Site</TD>
<TD>Description</TD>
</TR>
<TR>
<TD>8394</TD>
<TD>HOWELL, BETTY J.</TD>
<TD>ITINERANT1</TD>
<TD>ITINERANT</TD>
<TD>ALL DAY</TD>
</TR></TBODY></TABLE>

If you run domtest.au3 against all three of those example HTML snippets, you will see what I mean. The TDs in question happen to be numbers 3 and 4 of the second TR (tr,2.td,3 and tr,2.td,4). For some reason, on examples A and C the script gets 'stuck' reading tr,2.td,4 and mistakenly thinks the value of all following TDs (even ones that don't exist) is also 'ITINERANT'.

However in example B I've added a '1' onto tr,2.td,4 and it processes the entire file as you would expect.

I hope this helps someone in the future, it was driving me nuts =) I will just be dealing with it by pre-processing the values and modifying them so they are unique.

example-a.html

example-b.html

example-c.html
