Dealing with non standard DOM (WAP)


Hi everyone - I'm new here and I'm stuck. I am trying to build a script that will simply log me into a WAP site, scrape some screens, reformat them, and dump the reformatted HTML into a directory so it can be served by a local web server. I read through the examples, did some Google searches, attempted about 10 combinations of things, but I still can't get it to work. I tried using the _IE functions and the HTTP functions and both of them give me the same error.

Based on what I can see, the issue is that the WAP site breaks the DOM model, so I can't do anything with the elements. No matter what I try, I keep getting InvalidDataType. When I look at the site in a DOM viewer, it doesn't show me the elements (missing all the tags, etc). I can navigate the site in IE and Firefox and on a phone.

Can anyone help me figure out how to navigate this thing?

The site:

<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD WML 2.0//EN" "http://www.wapforum.org/dtd/wml20.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xmlns:wml="http://www.wapforum.org/2001/wml">

<head>

<title>login</title>

<meta http-equiv="pragma" content="no-cache"/>

<meta http-equiv="cache-control" content="max-age=0"/>

</head>

<body><br/>

<form method="post" action="1.asp?1----">User name: <input name="un" type="text" size="8"/><br/>Password: <input name="p" type="password" size="8"/><br/>

<input type="submit" value="login"/>

</form>

<a href="1.asp?c=------12">

Francais

</a><br/>

</body>

</html>

Some sample code:

; *******************************************************
; Example 1 - Open a browser with the "form" example, get a reference
; to the submit button by name and "click" it. This technique
; of submitting forms is useful because many forms rely on Javascript
; code and "onclick" events on their submit button making _IEFormSubmit()
; not perform as expected
; *******************************************************
;
#include <IE.au3>
$oIE = ("wap.somesite.com")
$oSubmit = _IEGetObjByName ($oIE, "login")
_IEAction ($oSubmit, "click")
_IELoadWait ($oIE)

The result:

>Running:(3.3.6.1):C:\Program Files\AutoIt3\autoit3.exe "C:\Users\mark\Documents\AutoIt3\old\_IEAction.au3"
--> IE.au3 V2.4-0 Error from function _IEGetObjByName, $_IEStatus_InvalidDataType
--> IE.au3 V2.4-0 Error from function _IEAction, $_IEStatus_InvalidDataType
--> IE.au3 V2.4-0 Error from function _IELoadWait, $_IEStatus_InvalidDataType
+>21:45:31 AutoIT3.exe ended.rc:0
>Exit code: 0 Time: 1.231


Where, in that HTML you posted, do you see any objects with name or id = 'login'? Title and value don't count.

You just haven't correctly identified the object you want. Try experimenting with the _IE* function example scripts from the help file some more to see how they work, and how to ID the elements you want to work with.

The DOM, by the way, is defined by the browser engine (IE in this case) rendering the HTML. So there isn't any "non standard DOM" being used.
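
As posted, the first script assigns `$oIE = ("wap.somesite.com")`, which is a plain string rather than the browser object that `_IECreate` returns. Passing a non-object to the `_IE*` functions would be consistent with the `$_IEStatus_InvalidDataType` errors shown. A minimal sketch of looking up an element that does carry a name, assuming the page matches the HTML in the first post (the URL is the placeholder from that post, with an `http://` prefix added):

```autoit
#include <IE.au3>

; Assumption: placeholder URL taken from the first post.
Local $oIE = _IECreate("http://wap.somesite.com")
If Not IsObj($oIE) Then
    ConsoleWrite("No browser object; check the URL and the IE install." & @LF)
    Exit 1
EndIf

; In the posted HTML only the inputs "un" and "p" have a name attribute.
Local $oUser = _IEGetObjByName($oIE, "un")
If IsObj($oUser) Then
    ConsoleWrite("Found input: " & $oUser.name & " (type " & $oUser.type & ")" & @LF)
Else
    ConsoleWrite("No element named 'un' on this page." & @LF)
EndIf
```

The submit button has only `type` and `value` attributes, so a lookup by the name "login" would not be expected to match anything.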

:mellow:

Valuater's AutoIt 1-2-3, Class... Is now in Session!For those who want somebody to write the script for them: RentACoder"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law

Thanks for the info. I understand what you are saying about the name or id. I have experimented with some of the _IE* examples and I substituted my site in the appropriate places. No matter what I try, I always get the same error: InvalidDataType. Is there something wrong with the code, or with the info that is coming back from the site? The HTML is missing tags, names, forms, etc... so I'm not sure how I can modify it.

I tried:

; *******************************************************
; Example 3 - Get a reference to the collection of forms on a page,
; and then loop through them displaying information for each
; demonstrating use of form index
; *******************************************************
;
;#include <IE.au3>
;$oIE = _IECreate ("http://wap.1800gotjunk.com")
;$oForms = _IEFormGetCollection ($oIE)
;$iNumForms = @extended
;MsgBox(0, "Forms Info", "There are " & $iNumForms & " forms on this page")
;For $i = 0 to $iNumForms - 1
; $oForm = _IEFormGetCollection ($oIE, $i)
; MsgBox(0, "Form Info", $oForm.name)
;Next

; *******************************************************
; Example 1 - Get a reference to a specific form by 0-based index,
; in this case the first form on the page
; *******************************************************
;
;#include <IE.au3>
;$oIE = _IECreate ("http://wap.1800gotjunk.com")
;$oForm = _IEFormGetCollection ($oIE, 0)
;$oQuery = _IEFormElementGetCollection ($oForm, 1)
;_IEFormElementSetValue ($oQuery, "AutoIt IE.au3")
;_IEFormSubmit ($oForm)

; *******************************************************
; Example 1 - Open a browser with the basic example, read the body HTML,
; append new HTML to the original and write it back to the browser
; *******************************************************
;
#include <IE.au3>
$oIE = _IECreate ("http://wap.1800gotjunk.com")
$sHTML = _IEBodyReadHTML ($oIE)
$sHTML = $sHTML & "<p><font color=red size=+5>Big RED text!</font>"
_IEBodyWriteHTML ($oIE, $sHTML)


The HTML is missing tags, names, forms, etc... so I'm not sure how I can modify it.

What does that mean? Are you getting WML from a WAP site, then just swapping the WML tags for HTML?

:mellow:


Right. I want to hit a WAP site that is serving WML, login, parse the WML into prettier HTML, and serve the new HTML to people who are hitting a different site. The issue is that I can't seem to do anything with the WML. I can login with IE, but I can't seem to script it. The site is here: wap.1800gotjunk.com. Maybe it isn't possible?


Well, if you've already translated the WML to HTML, then the only thing left is the validity of your HTML. This works fine:

#include <IE.au3>

_IEErrorHandlerRegister()

$sHTML = '<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD WML 2.0//EN" "http://www.wapforum.org/dtd/wml20.dtd">' & @CRLF & _
    '<html xmlns="http://www.w3.org/1999/xhtml" xmlns:wml="http://www.wapforum.org/2001/wml">' & @CRLF & _
    '   <head>' & @CRLF & _
    '       <title>login</title>' & @CRLF & _
    '       <meta http-equiv="pragma" content="no-cache"/>' & @CRLF & _
    '       <meta http-equiv="cache-control" content="max-age=0"/>' & @CRLF & _
    '   </head>' & @CRLF & _
    '   <body>' & @CRLF & _
    '       <br/>' & @CRLF & _
    '       <form method="post" action="1.asp?1----">' & @CRLF & _
    '           User name: <input name="un" type="text" size="8"/><br/>Password: <input name="p" type="password" size="8"/><br/>' & @CRLF & _
    '           <input type="submit" value="login"/>' & @CRLF & _
    '       </form>' & @CRLF & _
    '       <a href="1.asp?c=------12">Francais</a><br/>' & @CRLF & _
    '   </body>' & @CRLF & _
    '</html>'

$oIE = _IECreate()
_IEDocWriteHTML($oIE, $sHTML)
$oForm = _IEFormGetCollection($oIE, 0)
ConsoleWrite(ObjName($oForm) & @LF & @LF)

$colInputs = _IETagNameGetCollection($oForm, "input", -1)
$n = 0
For $oInput In $colInputs
    ConsoleWrite($n & ":  " & $oInput.type & "; " & $oInput.name & @LF & @LF)
    $n += 1
Next

I haven't figured out what else you are trying to get it to do.

:mellow:


Hey - first of all, thanks so much for helping me. I really want to learn AutoIt, but I feel like I'm getting off to a slow start. Let me clarify what I'm trying to do:

1. load this WML: wap.1800gotjunk.com

2. add my "un" and "password" to the form (pulling this data from an environment variable or a file)

3. submit the form to login

4. once I'm logged in, I want to crawl about 10 pages, parse out the data, format some new HTML pages, and serve them to another web server

I'm stuck on step 2 because I can't seem to interact with the forms using names or id. There doesn't seem to be any way to pass data to it or get data from it without getting an invalid datatype error.

How can I submit the form if I can't access the form elements?
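
Steps 1 to 3 above, plus saving the first page from step 4, could be sketched roughly as follows. This is not tested against the live site. It assumes the login page matches the HTML in the first post (fields named "un" and "p", one form at index 0). The environment variable names `WAP_USER` and `WAP_PASS` and the output path are hypothetical choices made for illustration.

```autoit
#include <IE.au3>

_IEErrorHandlerRegister()

; Assumption: WAP_USER and WAP_PASS are hypothetical environment variable names.
Local $sUser = EnvGet("WAP_USER")
Local $sPass = EnvGet("WAP_PASS")
If $sUser = "" Or $sPass = "" Then
    ConsoleWrite("Set WAP_USER and WAP_PASS before running." & @LF)
    Exit 1
EndIf

; Step 1: load the page, using a full URL with the http:// prefix.
Local $oIE = _IECreate("http://wap.1800gotjunk.com")
If Not IsObj($oIE) Then
    ConsoleWrite("_IECreate did not return a browser object." & @LF)
    Exit 1
EndIf

; Step 2: the form has no name or id, so address it by its 0-based index,
; then find the two inputs by the names used in the posted HTML.
Local $oForm = _IEFormGetCollection($oIE, 0)
If Not IsObj($oForm) Then
    ConsoleWrite("No form found on the page." & @LF)
    Exit 1
EndIf
Local $oUser = _IEFormElementGetObjByName($oForm, "un")
Local $oPass = _IEFormElementGetObjByName($oForm, "p")
_IEFormElementSetValue($oUser, $sUser)
_IEFormElementSetValue($oPass, $sPass)

; Step 3: submit the form; by default this waits for the next page to load.
_IEFormSubmit($oForm)

; Step 4 (first page only): read the resulting body and save it to disk.
; Assumption: the "out" folder next to the script is a hypothetical target.
Local $sHTML = _IEBodyReadHTML($oIE)
Local $hFile = FileOpen(@ScriptDir & "\out\page1.html", 2 + 8) ; 2 = overwrite, 8 = create folders
If $hFile <> -1 Then
    FileWrite($hFile, $sHTML)
    FileClose($hFile)
EndIf
```

Addressing the form by index avoids depending on a name or id that the page does not provide.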


Are you working with the HTML from post #1 or not?

If so, the code I posted shows it works with proper syntax.

If not, post what you ARE working with in that step.

:mellow:


Yes, the HTML is what shows up in IE when I view source after going to http://wap.1800gotjunk.com. The problem is that I can't interact with the site using AutoIt. For example, I can't get any of the _IE functions to work on the site except for _IECreate.

All of these things fail with InvalidObjectType.

_IEBodyReadHTML

_IEBodyWriteHTML

_IEFormGetCollection

_IEGetObjByName

_IEBodyReadText

So, I can load the page, but I can't do anything with it.


I took a look at your code, ran it, and everything worked as expected. If I substitute the wap site for the HTML I showed you, it doesn't work. It feels like the WML on the site is somehow different than the HTML that shows up in IE.

$oIE = _IECreate("wap.1800gotjunk.com")
$sHTML = _IEDocReadHTML ($oIE)
_IEDocWriteHTML($oIE, $sHTML)
$oForm = _IEFormGetCollection($oIE, 0)
ConsoleWrite(ObjName($oForm) & @LF & @LF)

$colInputs = _IETagNameGetCollection($oForm, "input", -1)
$n = 0
For $oInput In $colInputs
    ConsoleWrite($n & ": " & $oInput.type & "; " & $oInput.name & @LF & @LF)
    $n += 1
Next

--> IE.au3 V2.4-0 Error from function _IELoadWait, $_IEStatus_InvalidObjectType
--> IE.au3 V2.4-0 Error from function _IEDocReadHTML, $_IEStatus_InvalidObjectType (Expected document element)
--> IE.au3 V2.4-0 Error from function _IEDocWriteHTML, $_IEStatus_InvalidObjectType (Expected document element)
--> COM Error Encountered in _IEAction.au3
----> $IEComErrorScriptline = 2214


You didn't use a full URL; it has no protocol prefix. What do you get from this:

#include <IE.au3>

_IEErrorHandlerRegister()

$oIE = _IECreate("http://wap.1800gotjunk.com")
ConsoleWrite("$oIE = " & ObjName($oIE) & @LF)

$sHTML = _IEDocReadHTML($oIE)
ConsoleWrite("$sHTML = " & $sHTML & @LF)

:mellow:


I got this... same error. A mystery... Oh, thanks again for the troubleshooting. If I can just get in, I'm sure I can do the rest.

#include <IE.au3>

_IEErrorHandlerRegister()

$oIE = _IECreate("http://wap.1800gotjunk.com")
ConsoleWrite("$oIE = " & ObjName($oIE) & @LF)

$sHTML = _IEDocReadHTML($oIE)
ConsoleWrite("$sHTML = " & $sHTML & @LF)

>Running:(3.3.6.1):C:\Program Files\AutoIt3\autoit3.exe "C:\Users\mark\Documents\AutoIt3\old\_IEAction.au3"
--> IE.au3 V2.4-0 Error from function _IELoadWait, $_IEStatus_InvalidObjectType
$oIE =
--> IE.au3 V2.4-0 Error from function _IEDocReadHTML, $_IEStatus_InvalidObjectType (Expected document element)
$sHTML = 0
+>18:30:24 AutoIT3.exe ended.rc:0
>Exit code: 0 Time: 2.334


Works for me:

>Running:(3.3.6.1):C:\Program Files\AutoIt3\autoit3.exe "C:\Temp\Test.au3"    
$oIE = IWebBrowser2
$sHTML = <HTML xmlns="http://www.w3.org/1999/xhtml" xmlns:wml = "http://www.wapforum.org/2001/wml"><HEAD><TITLE>JunkNet login</TITLE>
<META content=no-cache http-equiv=pragma>
<META content=max-age=0 http-equiv=cache-control></HEAD>
<BODY><BR>
<FORM method=post action=1.asp?1---->User name: <INPUT size=8 type=text name=un><BR>Password: <INPUT value="" size=8 type=password name=p><BR><INPUT value=login type=submit> </FORM><A href="1.asp?c=------12">Francais </A><BR></BODY></HTML>
+>23:27:38 AutoIT3.exe ended.rc:0

Starting to wonder about the status of your Windows/IE install...

:mellow:


Wow - I didn't think of that! I'll give it a shot on a different PC and I'll jump back and let you know how it shakes out. To be continued... Thanks again.


OK, here's what I found out so far... I ran Wireshark and grabbed all of my traffic. It turns out that the script is running properly, but I simply cannot get the data back to the _IE functions. In other words, Wireshark sees the correct traffic, but the traffic never makes it back to the _IE functions (or the browser). I turned off my firewall and it didn't help. My laptop is a pristine Windows 7 install with no customization at all. Before I reinstall everything, do you have any ideas on what could be blocking the traffic from coming back to AutoIt? No wonder I kept getting InvalidDataType - there was no data.
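
One way to narrow this down might be to fetch the same page without going through the IE automation object at all. The sketch below uses the built-in `InetRead` function for that. It is only a diagnostic idea under the assumption that the URL is the login page discussed in this thread, and it does not explain the cause of the problem.

```autoit
; Assumption: this URL is the login page discussed in this thread.
Local $sUrl = "http://wap.1800gotjunk.com"

; Option 1 asks for a fresh copy from the remote site rather than the cache.
Local $bData = InetRead($sUrl, 1)
If @error Then
    ConsoleWrite("InetRead failed for " & $sUrl & @LF)
    Exit 1
EndIf

; InetRead returns binary data; convert it to text before printing.
ConsoleWrite("Received " & BinaryLen($bData) & " bytes" & @LF)
ConsoleWrite(BinaryToString($bData) & @LF)
```

If this prints the page source while the `_IE*` calls still fail, the problem would more likely sit in the IE automation layer than in the network path.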


And the final answer is... something is wrong with my IE installation. I reinstalled IE and ran the script, and got the same error. I did a clean install of AutoIt on a different PC and the script ran fine. The problem on my laptop is that no data is coming back from the web, so the function either hangs or throws an InvalidDataType error.

Does anyone have any suggestions besides reformatting the laptop and reinstalling everything? Maybe running AutoIt in a virtual PC?


I just wanted to circle back here to close this out... You were right - my IE/Windows setup is messed up in a way that prevents data from coming back to AutoIt from the _IE calls. I installed a VM with a fresh copy of XP and everything ran fine. I don't feel like blowing away my laptop config, since it is brand new, so I'm developing on another machine.

I wanted to add that AutoIt is amazing and I can't believe it took me this long to jump in! Thanks again for your help.
