Deltron0

_IEBodyReadHTML() carriage return

5 posts in this topic

Analyzing this HTML

<TH vAlign=top align=left width="15%">Address</TH>
          <TD width="35%">123 CHERRY RD
NEW YORK CITY, NY 19001</TD>
          <TH align=left width="15%">EID</TH>

I run:

$pos = StringInStr($bodyHTML,">Address")
$start = StringInStr($bodyHTML,"%"">",0,1,$pos) + 3
$end = StringInStr($bodyHTML,"",0,1,$start)
$len = $end - $start
$addy = StringMid($bodyHTML,$start,$len)
ConsoleWrite($addy)

That dumps:

123 CHERRY RD NEW YORK CITY, NY 19001

The carriage return (I do not mean the <br> tag) seems to disappear. Any idea why this might happen?


I still don't get why people go through sooooo much trouble to parse out HTML.

You can use the _IETableWriteToArray() function, and then navigate through the array to get your text.

Looks like you missed a character in the $end definition ("<"). Anyway, try taking the final string and running it through:

StringToASCIIArray()

See if any of the LF, CRLF, etc are present in the array.
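
A minimal sketch of that check. The sample string is made up for illustration and deliberately contains @CRLF; in the real script the variable would be $addy.

```autoit
; Hypothetical sample text; replace with the string extracted from the page.
Local $sSample = "123 CHERRY RD" & @CRLF & "NEW YORK CITY, NY 19001"

; Convert every character to its numeric code.
Local $aCodes = StringToASCIIArray($sSample)

; Report any CR (13) or LF (10) together with its 0-based index in the array.
For $i = 0 To UBound($aCodes) - 1
    If $aCodes[$i] = 13 Or $aCodes[$i] = 10 Then
        ConsoleWrite("Index " & $i & ": code " & $aCodes[$i] & @CRLF)
    EndIf
Next
```

If nothing is reported and a 32 sits where the break should be, the line break was already gone before the string functions ran.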


IEbyXPATH-Grab IE DOM objects by XPATH IEscriptRecord-Makings of an IE script recorder ExcelFromXML-Create Excel docs without excel installed GetAllWindowControls-Output all control data on a given window.


I normally use _IETableWriteToArray, but this table is an unpredictable size, so I want to find the right cell using the surrounding text. StringToASCIIArray() came back with 32 where the carriage return should be; for whatever reason it's seen as a space.
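
Finding a cell by its neighbouring text can also be done on the array itself, whatever the table size. A rough sketch follows. _CellAfterLabel is a hypothetical helper, not part of IE.au3, and it assumes the array is laid out as [row][column]. The demo data is invented. Whether the line break survives in the cell text is not something this thread establishes.

```autoit
; Returns the text of the cell immediately to the right of the first cell
; whose trimmed text equals $sLabel, or "" with @error = 1 if none is found.
; Assumes $aTable is a 2D array laid out as [row][column].
Func _CellAfterLabel(ByRef $aTable, $sLabel)
    For $iRow = 0 To UBound($aTable, 1) - 1
        ; Stop one column early so that [$iCol + 1] is always valid.
        For $iCol = 0 To UBound($aTable, 2) - 2
            If StringStripWS($aTable[$iRow][$iCol], 3) = $sLabel Then
                Return $aTable[$iRow][$iCol + 1]
            EndIf
        Next
    Next
    Return SetError(1, 0, "")
EndFunc

; Made-up data standing in for a table read from the page.
Local $aDemo[2][4] = [["Name", "J DOE", "Case", "42"], ["Address", "123 CHERRY RD", "EID", "7"]]
ConsoleWrite(_CellAfterLabel($aDemo, "Address") & @CRLF)
```

The search does not depend on how many rows or columns the table has, only on the label cell being directly followed by its value in the same row.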


Where are you reading the HTML...is the reader just out of room?



#5 ·  Posted (edited)

#include <IE.au3>

; Open the page without waiting, then wait explicitly (500 ms delay, 10 s timeout).
Local $ActiveIEobj = _IECreate("http://intranetsitewithtableonit.org/casenumber",0,1,0)
_IELoadWait($ActiveIEobj,500,10000)
Local $bodyHTML = _IEBodyReadHTML($ActiveIEobj)

; Locate the "Address" header, then take the text between the next %"> and </TD>.
$pos = StringInStr($bodyHTML,">Address")
$start = StringInStr($bodyHTML,"%"">",0,1,$pos) + 3
$end = StringInStr($bodyHTML,"</TD>",0,1,$start)
$len = $end - $start
$addy = StringMid($bodyHTML,$start,$len)
ConsoleWrite($addy)

As you can see, we are in the IE world. The captured $addy is 100% correct apart from the missing carriage return. If I increase $len, I get the rest of the HTML, including the carriage returns.

It looks like:

123 CHERRY RD NEW YORK CITY, NY 19001</TD>
<TH align=left width="15%">EID</TH>

Notice that after "RD" there is a space (32) instead of a carriage return. Could "view source" in IE7 be displaying the CR while _IEBodyReadHTML sees the source differently for some reason?
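
One way to test that idea is to fetch the page without going through the IE DOM and inspect the same spot. _IEBodyReadHTML returns the body's innerHTML, which the browser regenerates from the parsed document, so it can differ from the file that "view source" shows. The sketch below is only a rough check. It assumes the page can be fetched without the IE session, which may not hold for an intranet site behind a login, and the URL is the placeholder from the script above.

```autoit
; Placeholder URL from the script above.
Local $sUrl = "http://intranetsitewithtableonit.org/casenumber"

; InetRead returns the raw bytes as sent by the server.
Local $dRaw = InetRead($sUrl)
If @error Then
    ConsoleWrite("Download failed" & @CRLF)
Else
    Local $sRaw = BinaryToString($dRaw)
    Local $iPos = StringInStr($sRaw, "CHERRY RD")
    If $iPos = 0 Then
        ConsoleWrite("Marker text not found in raw source" & @CRLF)
    Else
        ; "CHERRY RD" is 9 characters long, so $iPos + 9 is the character after it.
        ConsoleWrite("Code after RD: " & Asc(StringMid($sRaw, $iPos + 9, 1)) & @CRLF)
    EndIf
EndIf
```

A 13 or 10 here, next to a 32 from _IEBodyReadHTML, would suggest the break is lost when IE serializes the DOM rather than in the string functions.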

Edited by Deltron0
