Deltron0 Posted August 29, 2012

Analyzing this HTML

    <TH vAlign=top align=left width="15%">Address</TH>
    <TD width="35%">123 CHERRY RD
    NEW YORK CITY, NY 19001</TD>
    <TH align=left width="15%">EID</TH>

I run:

    $pos = StringInStr($bodyHTML,">Address")
    $start = StringInStr($bodyHTML,"%"">",0,1,$pos) + 3
    $end = StringInStr($bodyHTML,"",0,1,$start)
    $len = $end - $start
    $addy = StringMid($bodyHTML,$start,$len)
    ConsoleWrite($addy)

That dumps:

    123 CHERRY RD NEW YORK CITY, NY 19001

The carriage return (I do not mean the <br> tag) seems to disappear, any idea why this might happen?
jdelaney Posted August 29, 2012

I still don't get why people go through sooooo much trouble to parse out HTML. You can use the _IETableWriteToArray() function, and then navigate through the array to get your text.

Looks like you missed a char in the $end definition ("<"). Anyway, try taking the final string and running it through StringToASCIIArray(). See if any of the LF, CRLF, etc. are present in the array.

IEbyXPATH-Grab IE DOM objects by XPATH
IEscriptRecord-Makings of an IE script recorder
ExcelFromXML-Create Excel docs without excel installed
GetAllWindowControls-Output all control data on a given window.
Deltron0 Posted August 29, 2012

I normally use _IETableWriteToArray, however this table is an unpredictable size and I want to try to find the right cell using surrounding text.

StringToASCIIArray() came back with "32" where the carriage return should be. For whatever reason, it's seen as a space.
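A minimal sketch of the check discussed above, for anyone who wants to repeat it. The helper name _ReportLineBreaks and both sample strings are made up for illustration and are not from the thread; in the original script the string to test would be $addy.

```autoit
; Hypothetical helper (not from the thread): lists every CR (13) and LF (10)
; found in a string, using StringToASCIIArray() as suggested above.
Func _ReportLineBreaks($sText)
    Local $aCodes = StringToASCIIArray($sText) ; zero-based array of character codes
    Local $iFound = 0
    For $i = 0 To UBound($aCodes) - 1
        Switch $aCodes[$i]
            Case 13
                ConsoleWrite("CR (13) at offset " & $i & @CRLF)
                $iFound += 1
            Case 10
                ConsoleWrite("LF (10) at offset " & $i & @CRLF)
                $iFound += 1
        EndSwitch
    Next
    If $iFound = 0 Then ConsoleWrite("No CR or LF found" & @CRLF)
    Return $iFound
EndFunc

; Assumed sample data: the first string contains a real CRLF, the second has
; only a space (32) after "RD", like the string reported in this thread.
_ReportLineBreaks("123 CHERRY RD" & @CRLF & "NEW YORK CITY, NY 19001")
_ReportLineBreaks("123 CHERRY RD NEW YORK CITY, NY 19001")
```

If the second call finds nothing while the page source shows a line break, the break was already gone before the string parsing started, which points at how the HTML was read rather than at StringMid().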
jdelaney Posted August 29, 2012

Where are you reading the HTML...is the reader just out of room?
Deltron0 Posted August 30, 2012

    $ActiveIEobj = _IECreate("http://intranetsitewithtableonit.org/casenumber",0,1,0)
    _IELoadWait($ActiveIEobj,500,10000)
    Local $bodyHTML = _IEBodyReadHTML($ActiveIEobj)
    $pos = StringInStr($bodyHTML,">Address")
    $start = StringInStr($bodyHTML,"%"">",0,1,$pos) + 3
    $end = StringInStr($bodyHTML,"</TD>",0,1,$start)
    $len = $end - $start
    $addy = StringMid($bodyHTML,$start,$len)
    ConsoleWrite($addy)

As you can see we are in the IE world. The captured $addy is 100% correct other than the missing carriage return. If I increase the size of $len I get the rest of the HTML, including the carriage returns. It looks like:

    123 CHERRY RD NEW YORK CITY, NY 19001</TD>
    <TH align=left width="15%">EID</TH>

Notice after "RD" is a space (32) instead of a carriage return. Could "view source" in IE7 be displaying the CR, but for some reason _IEBodyReadHTML sees the source differently?

Edited August 30, 2012 by Deltron0