
capturing specific area of IE html source code


sunburn

Hi

I'm trying to capture only a specific area of a web page's HTML. I was attempting to use _StringBetween, as I thought the area would be identifiable that way, but I'm starting to think that I'm way off. A little direction for research would be greatly appreciated.


Post some code so we can see. In case you did not know, _StringBetween returns an array. There are also other COM properties that Dale did not include in IE. So post some code so we can possibly help you.


Thanks for the reply. I didn't catch that _StringBetween only worked with arrays. I'll reread, rephrase and repost. If you can point me to areas for further research, that would help with my education.

sunburn

NOO, it doesn't only work with arrays. It returns an array, as in the example below.

BUT!!! It would be best if you posted some code so I could stop pulling stuff out of my a$$ trying to help you. Seriously, when you don't post code it's like asking a blind and deaf guy to walk through a booby-trapped maze and expecting him not to die.

#include <String.au3>
#include <Array.au3>
$string = '|HERE IS sOME STUFF||OHH MY GOODNESS MORE STUFF||BUT THE SAME DELIMITER GOODNESS||WHAT ARE WE TO DO|'
$ar_String = _StringBetween($string,'|','|')
_ArrayDisplay($ar_String)
Edited by Thatsgreat2345

OK, I think I understand that, and it's a really good explanation (let's include it in the help file!). I've spent a couple of hours trying to educate myself, but am still not getting anywhere. So if you'll let me digress, I'm going to get even more basic to find out where I'm going wrong. I've spent a lot of time trying to do this simple task. I think I've boiled it down to a misunderstanding of string functions, but maybe someone can enhance my knowledge and get me past this stumbling block.

In this little part of the project, I have a web page and I've read its HTML source. I'm trying to isolate a patch of information for future manipulation. Here's a sample of the source code, with a very simple snippet of my code to follow.

<PRE CLASS=fixedfont> <FONT COLOR="#191970">

Trip Pay Open Positions

Trip Rv Type Date Departure Arrival Hours XXX XXX XXX XXX XXX St Commts

</FONT>

<A HREF="/vips-bin/vipscgi?webtr?298?MEM?11?26JAN08"> 298</A> 02 26JAN 26JAN 0936 30JAN 0551 2500 XXX M10

<A HREF="/vips-bin/vipscgi?webtr?98?MEM?11?27JAN08"> 98</A> 27JAN 27JAN 2135 02FEB 0548 3435 XXX M10

<A HREF="/vips-bin/vipscgi?webtr?398?MEM?11?29JAN08"> 398</A> 29JAN 29JAN 0143 30JAN 0549 808 XXX M10

</PRE>

</PRE><FORM METHOD="POST" ACTION="/vips-bin/vipscgi?webdd?MEM?11?F/O?26JAN08?01FEB08?2">

I've been able to capture the HTML with _IEBodyReadHTML (this was successful, as I've written it to a file to check), but everything I try with the StringInStr function returns 0 (which, as you all know, means nothing was found).

$htmltext = _IEBodyReadHTML ($oIE)

$stringposition = stringinstr("</FONT>", $htmltext)

msgbox (0, "the location number of <font> in $htmltext", $stringposition)

Once I have this character position, I can then manipulate the text as needed. In particular, I want only the section of code between

<A HREF="/vips-bin/vips..... down to the last line ending with

</PRE>

</PRE><FORM METHOD="POST" ACTION=

(which is where I thought _StringBetween would be an easy solution). I've written $htmltext to a file and tried various string functions on it, but none of them find anything. I know this is very basic, but I would be very appreciative of someone explaining this to me and offering a reason why I'm being unsuccessful.
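For reference, a minimal sketch of how _StringBetween could pull out that block, assuming the markup really does sit between "</FONT>" and the first "</PRE>" as in the sample above. The $sHtml value is a hypothetical, shortened stand-in for what _IEBodyReadHTML returns; IE may rewrite the markup it hands back, so the real delimiters should be checked against the captured text.

```autoit
#include <String.au3>

; Hypothetical stand-in for the text returned by _IEBodyReadHTML($oIE)
Local $sHtml = '<PRE CLASS=fixedfont> <FONT COLOR="#191970">Trip Pay Open Positions</FONT>' & @CRLF & _
        '<A HREF="/vips-bin/vipscgi?webtr?298?MEM?11?26JAN08"> 298</A> 02 26JAN 26JAN 0936 30JAN 0551 2500 XXX M10' & @CRLF & _
        '</PRE>' & @CRLF & '</PRE><FORM METHOD="POST" ACTION="...">'

; _StringBetween returns a 0-based array of every match and sets @error if none is found
Local $aBlock = _StringBetween($sHtml, "</FONT>", "</PRE>")
If @error Then
    ConsoleWrite("Delimiters not found" & @CRLF)
Else
    ; The first match is the text between </FONT> and the first </PRE>
    ConsoleWrite($aBlock[0] & @CRLF)
EndIf
```

Checking @error before indexing the result avoids a subscript error when the delimiters are missing.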

Thanks all.


If you could describe exactly what you want to do, we could be of more help. Are you trying to alter the values? The links? What's the goal? It may be a lot simpler than you're trying to make it.

If you just want to split it up with _StringBetween, you could break it into individual lines by using the newline as the delimiter. Personally, I often use StringSplit when slicing and dicing source code (i.e. split it at each '<A HREF="' and again at the '">' to get the link location).
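A rough sketch of that two-step StringSplit idea, using one row from the sample above as a hypothetical input. Flag 1 tells StringSplit to treat the whole delimiter string as a single separator instead of splitting on each of its characters.

```autoit
; One row copied from the sample source, used here as a hypothetical input
Local $sLine = '<A HREF="/vips-bin/vipscgi?webtr?298?MEM?11?26JAN08"> 298</A> 02 26JAN 26JAN 0936 30JAN 0551 2500 XXX M10'

; Element [0] holds the number of pieces; [1] is the text before the delimiter, [2] the text after it
Local $aParts = StringSplit($sLine, '<A HREF="', 1)
If $aParts[0] >= 2 Then
    ; Split the remainder at the end of the opening tag; [1] is then the link location
    Local $aLink = StringSplit($aParts[2], '">', 1)
    ConsoleWrite("Link: " & $aLink[1] & @CRLF)
Else
    ConsoleWrite("No link found in this line" & @CRLF)
EndIf
```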

Brick

Edited by Brickoneer

I'll describe the intent below, but first let me address where the problem lies.

In the above code

$htmltext = _IEBodyReadHTML ($oIE)

$stringposition = stringinstr("</FONT>", $htmltext)

msgbox (0, "the location number of <font> in $htmltext", $stringposition)

I'm thinking that the page's source code is being captured to $htmltext.

I should then be able to search that text for "</FONT>", which would give me a character position.

I always get 0, which means it didn't find it.

Now, I've displayed the $htmltext variable immediately after the _IEBodyReadHTML call and it shows everything in the source, so I'm at a loss as to why the search is not coming up with a position number. Once I solve that problem, I can locate the info I need and start putting it into the appropriate array. Until the string searching functions return some value, I can't isolate the info.
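One thing worth checking here: StringInStr takes the string to search as its first parameter and the substring to look for as its second. The call above passes them the other way around, so it looks for the whole page inside "</FONT>" and returns 0. A minimal sketch of the difference, with a hypothetical, shortened $sHtml standing in for the real page text:

```autoit
; Hypothetical stand-in for the text returned by _IEBodyReadHTML($oIE)
Local $sHtml = '<PRE CLASS=fixedfont> <FONT COLOR="#191970">Trip Pay Open Positions</FONT>' & @CRLF & _
        '<A HREF="/vips-bin/vipscgi?webtr?298?MEM?11?26JAN08"> 298</A> 02 26JAN 26JAN 0936'

; Arguments in the order used in the post: searches "</FONT>" for the whole page, so the result is 0
Local $iReversed = StringInStr("</FONT>", $sHtml)

; String to search first, substring second: returns the 1-based position of "</FONT>"
Local $iPos = StringInStr($sHtml, "</FONT>")

ConsoleWrite("Reversed order:  " & $iReversed & @CRLF)
ConsoleWrite("Corrected order: " & $iPos & @CRLF)
```

By default the search is not case sensitive, so a difference in tag case between the page source and the captured text should not matter for this check.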

Once I've isolated the info, this is what I'm going to do

each line of html code contains scheduling information.

For example the first line

<A HREF="/vips-bin/vipscgi?webtr?298?MEM?11?26JAN08"> 298</A> 02 26JAN 26JAN 0936 30JAN 0551 2500 XXX M10

tells me that

schedule 298 (which has more information located on page "vips-bin/vipscgi?webtr?298?MEM?11?26JAN08")

is under revision 02

starts on 26Jan

Time: 26JAN 0936

Ends: 30JAN 0551

Pay hours total: 2500

Positions required: XXX

Location required or additional comments: M10

I intend to break this information up into individual array segments so that they can be searched and sorted.

So the first element of the array would have these subelements

element 1) 298

element 2) vips-bin/vipscgi?webtr?298?MEM?11?26JAN08

element 3) 26JAN

element 4) 26JAN 0936

element 5) 30JAN 0551

element 6) 2500

element 7) XXX

element 8) M10

the second element would be

element 1) 98

element 2) vips-bin/vipscgi?webtr?98?MEM?11?27JAN08

element 3) 27JAN

element 4) 27JAN 2135

element 5) 02FEB 0548

element 6) 3435

element 7) XXX

element 8) M10

with a third element to follow in this case, but the total number will vary
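A sketch of one way such a layout could be built, assuming every row follows the pattern of the samples above (a link, a trip number, an optional two-digit revision, then the date and time columns). The regular expression and the column order are illustrative assumptions, not something confirmed in this thread, and $sHtml is a hypothetical stand-in for the isolated block.

```autoit
; Hypothetical sample rows, copied from the source shown earlier
Local $sHtml = '<A HREF="/vips-bin/vipscgi?webtr?298?MEM?11?26JAN08"> 298</A> 02 26JAN 26JAN 0936 30JAN 0551 2500 XXX M10' & @CRLF & _
        '<A HREF="/vips-bin/vipscgi?webtr?98?MEM?11?27JAN08"> 98</A> 27JAN 27JAN 2135 02FEB 0548 3435 XXX M10'

; Captures per row: link, trip, start date, start date/time, end date/time, hours, positions, comments
; The optional revision column is matched but not captured
Local $sPattern = '(?i)<A HREF="/?([^"]+)">\s*(\d+)</A>\s+(?:\d{2}\s+)?(\d{2}[A-Z]{3})\s+' & _
        '(\d{2}[A-Z]{3}\s+\d{4})\s+(\d{2}[A-Z]{3}\s+\d{4})\s+(\d+)\s+(\S+)\s+(\S+)'

; Flag 3 returns the captures of every match in one flat 0-based array
Local $aFlat = StringRegExp($sHtml, $sPattern, 3)
If @error Then
    ConsoleWrite("No rows matched" & @CRLF)
    Exit
EndIf

; Regroup the flat list into one row per trip, eight columns each
Local $iCols = 8
Local $iRows = Int(UBound($aFlat) / $iCols)
Local $aTrips[$iRows][$iCols]
For $i = 0 To $iRows - 1
    For $j = 0 To $iCols - 1
        $aTrips[$i][$j] = $aFlat[$i * $iCols + $j]
    Next
Next

; Column 0 is the link, column 1 the trip number, column 5 the pay hours
For $i = 0 To $iRows - 1
    ConsoleWrite("Trip " & $aTrips[$i][1] & " -> " & $aTrips[$i][0] & ", hours " & $aTrips[$i][5] & @CRLF)
Next
```

Because the row count comes from the number of matches, the array grows or shrinks with however many trips the page lists.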

Like I said, that's the end goal, but until I can start identifying where specific strings are in the HTML source code, I cannot capture the information I need.

I really think there is a disconnect in my thinking about how string functions work, as I cannot get them to find anything in what $htmltext = _IEBodyReadHTML ($oIE) reads.

Hopefully I'm not boring too many people.

sunburn
