zzzBrett

Read Html, Search For Characters...

10 posts in this topic

Hi all:

I want to browse to a URL with IE, then read all of the HTML from the page. Then I want to search for a number that sits between a certain combination of characters. Any suggestions to get me started? Maybe Dale Hohm's IE functions would help. I've also looked in the function reference and couldn't find a function that fits my needs. Thanks for the help!

Brett
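For the basic case in the question (one number between two known strings), AutoIt's built-in StringRegExp can do the search in a single call. This is an illustrative sketch, not something posted in the thread: `_NumberBetween` is a hypothetical helper name, it assumes a current AutoIt 3.3.x release (whose StringRegExp is PCRE-based), and the sample HTML stands in for the text returned by `_IEDocReadHTML`.

```autoit
; _NumberBetween is a hypothetical helper, not part of IE.au3 or any standard UDF.
; Returns the first run of digits found directly between $sBefore and $sAfter,
; or "" with @error = 1 when there is no such match.
Func _NumberBetween($sHtml, $sBefore, $sAfter)
    ; \Q...\E makes PCRE treat the markers literally (assumes neither marker contains "\E")
    Local $sPattern = "\Q" & $sBefore & "\E(\d+)\Q" & $sAfter & "\E"
    Local $aMatch = StringRegExp($sHtml, $sPattern, 1) ; 1 = array of groups from the first match
    If @error Then Return SetError(1, 0, "")           ; no match (or bad pattern)
    Return $aMatch[0]
EndFunc

; Made-up sample text standing in for the page source
Local $sSample = '<td>Price:</td><td>42</td>'
ConsoleWrite("Number: " & _NumberBetween($sSample, '<td>Price:</td><td>', '</td>') & @CRLF)
```

The `(\d+)` group only accepts digits, so text between the markers that is not a plain number simply does not match.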


One more thing..

I would like to retrieve a number that is between this combination of characters, but there may be MULTIPLE of these throughout the HTML. In that case I would want a way to put them all in an array and loop through them to decide which one I really want.

Sorry if this is confusing...

Thanks!

Brett
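To get every occurrence into an array, StringRegExp with flag 3 returns all matches at once, and a For loop can then decide which one you want. A hedged sketch: `_AllNumbersBetween` is a hypothetical name, the sample string is made up, and picking the largest value only stands in for whatever selection rule you actually need. It assumes a current AutoIt 3.3.x release.

```autoit
; _AllNumbersBetween is a hypothetical helper. Returns an array with every run of digits
; found between $sBefore and $sAfter, or 0 with @error = 1 when nothing matches.
Func _AllNumbersBetween($sHtml, $sBefore, $sAfter)
    Local $sPattern = "\Q" & $sBefore & "\E(\d+)\Q" & $sAfter & "\E"
    Local $aAll = StringRegExp($sHtml, $sPattern, 3) ; 3 = every match, one element each
    If @error Then Return SetError(1, 0, 0)
    Return $aAll
EndFunc

; Made-up sample standing in for the page source
Local $sSample = "<b>7</b> text <b>15</b> more <b>3</b>"
Local $aNumbers = _AllNumbersBetween($sSample, "<b>", "</b>")
If Not @error Then
    Local $iBest = 0
    For $i = 0 To UBound($aNumbers) - 1
        ConsoleWrite("match #" & $i & ": " & $aNumbers[$i] & @CRLF)
        ; example selection rule: keep the largest number
        If Number($aNumbers[$i]) > $iBest Then $iBest = Number($aNumbers[$i])
    Next
    ConsoleWrite("largest: " & $iBest & @CRLF)
EndIf
```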


#3 ·  Posted (edited)

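#include <IE.au3>
$url = "http://www.example.com/" ; placeholder: put the address of the page you want to read here
$IE=_IECreate()
_IENavigate($IE,$url)
_IELoadWait($IE) ; wait until the page has finished loading before reading it
$src=_IEDocReadHTML($IE)
$stringyouwant=_StringBetween($src,'what is before the text','what is after the text')

Func _StringBetween($s, $from, $to)
    Local $start = StringInStr($s, $from)
    If $start = 0 Then Return 0 ; $from not found: report failure instead of reading from the top of $s
    Local $x = $start + StringLen($from) ; first character after $from
    Local $y = StringInStr(StringTrimLeft($s, $x - 1), $to) ; position of $to, counted from $x
    If $y = 0 Then Return 0 ; $to not found after $from
    Return StringMid($s, $x, $y - 1)
EndFunc;==>_StringBetween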

;===============================================================================
;
; Function Name:    _IEDocReadHTML()
; Description:      Retrieves the full HTML source of a document
; Parameter(s):  $o_object   - InternetExplorer.Application, Window or Frame object
; Requirement(s):   AutoIt3 Beta with COM support (post 3.1.1)
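; Return Value(s):  Success - HTML included in the <HTML> of the document, including the <HTML> and </HTML> tags
;                   Failure - 0 and sets @ERROR to 1
; Author(s):        Dale Hohm
; Note:             Omit this copy if your IE.au3 already defines _IEDocReadHTML;
;                   declaring the same function twice is an error.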
;
;===============================================================================
;
Func _IEDocReadHTML($o_object)
    If IsObj($o_object) Then
        SetError(0)
        Return $o_object.document.getElementsByTagName("HTML").item(0).outerHTML
    Else
        SetError(1)
        Return 0
    EndIf
EndFunc

:)

EDIT: Just saw your second post. Look through some of the functions in IE.au3 that use the index. Use a method like that to get the index and then assign them in an array....

EDIT2: Typo... Urghhh

Edited by Andrew Sparkes

---Sparkes.


#4 ·  Posted (edited)

Wow!

Thanks for the quick reply. This looks great. I guess it only finds the first instance of the string I want, though. How would I make it search multiple times?

Maybe take out the string I found from the source, and search again until there is no result? Would that work?

Thanks,

Brett

EDIT: Just saw your edit :). I'll look for that and post back. Thanks!

Edited by zzzBrett
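Besides removing each match and searching again, another way to search multiple times is to move the starting position forward after each match, so `$src` is never modified and nothing has to be written back to the browser. A sketch, assuming a current AutoIt 3.3.x release where StringInStr takes a start position as its fifth parameter; `_StringBetweenAll` is a hypothetical name. Element 0 of the returned array holds the count, the same convention StringSplit uses.

```autoit
; _StringBetweenAll is a hypothetical helper. Collects the text between every
; $sBefore ... $sAfter pair, scanning left to right without editing $s.
Func _StringBetweenAll($s, $sBefore, $sAfter)
    Local $aFound[1] = [0]   ; element 0 = number of results
    Local $iPos = 1          ; where the next search starts
    Local $iStart, $iEnd
    While 1
        $iStart = StringInStr($s, $sBefore, 0, 1, $iPos)
        If $iStart = 0 Then ExitLoop                 ; no more opening markers
        $iStart += StringLen($sBefore)               ; first character after the opening marker
        $iEnd = StringInStr($s, $sAfter, 0, 1, $iStart)
        If $iEnd = 0 Then ExitLoop                   ; opening marker without a closing marker
        ReDim $aFound[$aFound[0] + 2]
        $aFound[0] += 1
        $aFound[$aFound[0]] = StringMid($s, $iStart, $iEnd - $iStart)
        $iPos = $iEnd + StringLen($sAfter)           ; resume after the closing marker
    WEnd
    Return $aFound
EndFunc

; Made-up sample standing in for the HTML read from IE
Local $aDemo = _StringBetweenAll("<i>a</i> and <i>b</i>", "<i>", "</i>")
For $i = 1 To $aDemo[0]
    ConsoleWrite("result #" & $i & ": " & $aDemo[$i] & @CRLF)
Next
```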


Yes, that would work. If you want to do it that way, you'll need this:

;===============================================================================
;
; Function Name:    _IEDocWriteHTML()
; Description:      Replaces the HTML for the entire document
; Parameter(s):  $o_object   - InternetExplorer.Application, Window or Frame object
;                   $s_html     - the HTML string to write to the document
; Requirement(s):   AutoIt3 Beta with COM support (post 3.1.1)
; Return Value(s):  Success - 1
;                   Failure - 0 and sets @ERROR to 1
; Author(s):        Dale Hohm
;
;===============================================================================
;
Func _IEDocWriteHTML($o_object, $s_html)
    If IsObj($o_object) Then
        $o_object.document.Write($s_html)
        $o_object.document.close()
        SetError(0)
        Return 1
    Else
        SetError(1)
        Return 0
    EndIf
EndFunc

---Sparkes.


#6 ·  Posted (edited)

How does this look?

#include <IE.au3>
$IE=_IECreate()
_IENavigate($IE,$url)
$src=_IEDocReadHTML($IE)
$stringyouwant[0] = "initial value"
$c = 0

$beforetext = "start characters"
$aftertext = "end characters"

While($stringyouwant[$c] <> "")
    $c = $c + 1
    $stringyouwant[$c]=_StringBetween($src,$beforetext,$aftertext)
    $wantToDelete = $beforetext & $stringyouwant[$c] & $aftertext
    _IEDocWriteHTML($wantToDelete, $src)
Wend

$total = $c
;$stringyouwant[1] through $stringyouwant[$total] are results







;INCLUDED FUNCTIONS

Func _StringBetween($s, $from, $to)
    $x = StringInStr($s, $from) + StringLen($from)
    $y = StringInStr(StringTrimLeft($s, $x), $to)
    If $x And $y Then
        Return StringMid($s, $x, $y)
    Else
        Return 0
    EndIf
EndFunc;==>_StringBetween

;===============================================================================
;
; Function Name:    _IEDocReadHTML()
; Description:      Retrieves the full HTML source of a document
; Parameter(s):  $o_object   - InternetExplorer.Application, Window or Frame object
; Requirement(s):   AutoIt3 Beta with COM support (post 3.1.1)
; Return Value(s):  Success - HTML included in the <HTML> of the document, including the <HTML> and </HTML> tags
;                   Failure - 0 and sets @ERROR to 1
; Author(s):        Dale Hohm
;
;===============================================================================
;
Func _IEDocReadHTML($o_object)
    If IsObj($o_object) Then
        SetError(0)
        Return $o_object.document.getElementsByTagName("HTML").item(0).outerHTML
    Else
        SetError(1)
        Return 0
    EndIf
EndFunc

;===============================================================================
;
; Function Name:    _IEDocWriteHTML()
; Description:      Replaces the HTML for the entire document
; Parameter(s):  $o_object   - InternetExplorer.Application, Window or Frame object
;                   $s_html     - the HTML string to write to the document
; Requirement(s):   AutoIt3 Beta with COM support (post 3.1.1)
; Return Value(s):  Success - 1
;                   Failure - 0 and sets @ERROR to 1
; Author(s):        Dale Hohm
;
;===============================================================================
;
Func _IEDocWriteHTML($o_object, $s_html)
    If IsObj($o_object) Then
        $o_object.document.Write($s_html)
        $o_object.document.close()
        SetError(0)
        Return 1
    Else
        SetError(1)
        Return 0
    EndIf
EndFunc

EDIT: It looks like I might be using the _IEDocWriteHTML func wrong. How would I properly use this?

Edited by zzzBrett


Almost.

Add this instead of _IEDocWriteHTML($wantToDelete, $src):

$src=StringReplace($src,$wantToDelete,'')
_IEDocWriteHTML($IE,$src)

I think you may run into problems with the array though. Try it and see what happens. If you need some array funcs, the Array.au3 include file has some handy ones...

Sparkes.


---Sparkes.


Look through the _Array...() functions in the helpfile. They have some nice uses. You might want to declare the array and then add to it with _ArrayAdd().


---Sparkes.
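Following that suggestion, here is a sketch of collecting results with _ArrayAdd from Array.au3 instead of a fixed-size `Dim $stringyouwant[50]`. It assumes a current AutoIt 3.3.x release, where a zero-element array can be declared and _ArrayAdd treats a string value as "|"-delimited by default (so a value containing "|" would be split into several elements). The comma-separated sample values are made up.

```autoit
#include <Array.au3>

; Start empty and let _ArrayAdd grow the array, so there is no fixed upper limit.
Local $aResults[0]

; Made-up values standing in for what the search returns on each pass
Local $aSample = StringSplit("12,7,30", ",", 2) ; 2 = no count in element 0 ($STR_NOCOUNT)

For $i = 0 To UBound($aSample) - 1
    _ArrayAdd($aResults, $aSample[$i])
Next

For $i = 0 To UBound($aResults) - 1
    ConsoleWrite("result #" & ($i + 1) & ": " & $aResults[$i] & @CRLF)
Next
```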


#9 ·  Posted (edited)

I made a few changes, including adding _IELoadWait and declaring the variables properly. Now it opens the IE window, goes to the correct page, and then gets stuck in an endless loop, navigating to what looks like "http:///" over and over. I'm not sure where the loop is going wrong.

Also, sorry about deleting that last post. I did it so quickly that I didn't think anyone would have a chance to respond.

Here is the code:

#include <IE.au3>

dim $stringyouwant[50]
$IE=_IECreate()
_IENavigate($IE,$url)
_IELoadWait($IE)
$src=_IEDocReadHTML($IE)
$stringyouwant[0] = "initial value"
$c = 0

$beforetext = "before"
$aftertext = "after"

While($stringyouwant[$c] <> "")
    $c = $c + 1
    $stringyouwant[$c]=_StringBetween($src,$beforetext,$aftertext)
    $wantToDelete = $beforetext & $stringyouwant[$c] & $aftertext
    $src=StringReplace($src,$wantToDelete,'')
;   _IEDocWriteHTML($IE,$src) SHOULD I COMMENT THIS LINE? WHEN I DO, IT SEEMS TO DO BETTER, BUT I DON'T GET THE CORRECT RESULT
Wend

$total = $c

;$stringyouwant[1] through $stringyouwant[$total] are results
for $i = 1 to $total
    MsgBox(0, "Test", "result #"& $i & ': ' & $stringyouwant[$i])
Next

When I comment out that line, I always get a zero as the result. I'm not sure if it's getting the HTML incorrectly or what.

Thanks,

Brett

Edited by zzzBrett
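A likely cause of the endless loop, based on the _StringBetween posted earlier in the thread: once $beforetext no longer appears in $src, StringInStr returns 0, but $x is still StringLen($from), which is not zero, so the function returns a slice from near the top of the document instead of 0. $wantToDelete then starts with $beforetext, which is no longer in $src, so StringReplace removes nothing and every pass gets the same non-empty result. The loop also tests for "", while the function signals failure with 0. Writing the page back with _IEDocWriteHTML is not needed, because all of the searching happens in the $src string. The sketch below keeps the delete-and-search idea but works only on the string and stops on @error; `_FirstBetween` and `_CollectByDeleting` are hypothetical names, and it assumes a current AutoIt 3.3.x release.

```autoit
; _FirstBetween and _CollectByDeleting are hypothetical helpers (not from IE.au3).
; _FirstBetween returns the text between the first $from and the next $to,
; or "" with @error set when either marker is missing.
Func _FirstBetween($s, $from, $to)
    Local $iFrom = StringInStr($s, $from)
    If $iFrom = 0 Then Return SetError(1, 0, "")      ; $from not in the text
    Local $x = $iFrom + StringLen($from)               ; first character after $from
    Local $y = StringInStr(StringTrimLeft($s, $x - 1), $to)
    If $y = 0 Then Return SetError(2, 0, "")           ; no $to after $from
    Return StringMid($s, $x, $y - 1)
EndFunc

; Delete-and-search on the string only; nothing is written back to the browser.
; Returns an array whose element 0 holds the number of results.
Func _CollectByDeleting($src, $beforetext, $aftertext)
    Local $aResults[1] = [0]
    Local $sFound
    While 1
        $sFound = _FirstBetween($src, $beforetext, $aftertext)
        If @error Then ExitLoop                        ; no more markers: stop
        ReDim $aResults[$aResults[0] + 2]
        $aResults[0] += 1
        $aResults[$aResults[0]] = $sFound
        ; remove only the first occurrence (occurrence = 1) so repeated values are all kept
        $src = StringReplace($src, $beforetext & $sFound & $aftertext, "", 1)
    WEnd
    Return $aResults
EndFunc

; Made-up sample standing in for the HTML read from IE
Local $aFound = _CollectByDeleting("before1after x before22after y before1after", "before", "after")
For $i = 1 To $aFound[0]
    ConsoleWrite("result #" & $i & ": " & $aFound[$i] & @CRLF)
Next
```

Each pass removes at least the two markers from the string, so the loop always ends. If the very first call already fails, the markers are probably not in the HTML exactly as typed.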


OK

I am so close to getting this fully working. But I am having a problem with the StringBetween Function.

Is this what it should look like? Are there any errors?

Func _StringBetween($s, $from, $to)
    $x = StringInStr($s, $from) + StringLen($from)
    $y = StringInStr(StringTrimLeft($s, $x), $to)
    If $x And $y Then
        Return StringMid($s, $x, $y)
    Else
        Return 0
    EndIf
EndFunc;==>_StringBetween

It doesn't seem to be getting the right text; the result comes from somewhere else entirely in the HTML. I have no idea where it gets that from.
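On the question above: the slicing logic in that _StringBetween works when both markers are present and something sits between them, but it never checks whether $from was found. If StringInStr($s, $from) returns 0, $x becomes StringLen($from) and the function returns a slice that starts near the top of the HTML, which matches the "way off somewhere else" symptom. So it is worth confirming that both markers really occur in what _IEDocReadHTML returns; the HTML that IE rebuilds from the live page (outerHTML) is not guaranteed to match View Source character for character (attribute quoting, for example, can differ). A small check, as a sketch with the hypothetical name `_CheckMarkers`, assuming a current AutoIt 3.3.x release:

```autoit
; _CheckMarkers is a hypothetical helper. Returns the position of $from when $from is
; followed somewhere later by $to; otherwise sets @error (1 = $from missing,
; 2 = $to missing after $from). StringInStr is case-insensitive by default.
Func _CheckMarkers($src, $from, $to)
    Local $iFrom = StringInStr($src, $from)
    If $iFrom = 0 Then Return SetError(1, 0, 0)
    Local $iTo = StringInStr($src, $to, 0, 1, $iFrom + StringLen($from))
    If $iTo = 0 Then Return SetError(2, 0, 0)
    Return $iFrom
EndFunc

; Sample text standing in for $src; the marker "count=" is deliberately absent
Local $sHtml = "<p>count: 5;</p>"
_CheckMarkers($sHtml, "count=", ";")
Switch @error
    Case 1
        ConsoleWrite("Opening marker not found in the HTML" & @CRLF)
    Case 2
        ConsoleWrite("Closing marker not found after the opening marker" & @CRLF)
    Case Else
        ConsoleWrite("Both markers found" & @CRLF)
EndSwitch
```

To see exactly what IE returned, ClipPut($src) copies the whole string to the clipboard so it can be pasted into an editor and searched by hand.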
