
Finding text inside HTML


Spask

Hi, I'm trying to find a text value inside an HTML page.

This is what the line looks like normally:

<p id="line1" class>
    <span class="bot">TEXT HERE</span>
</p>

The text then changes to a non-breaking space:

<p id="line1" class>
    <span class="bot">&nbsp;</span>
</p>

Then it changes back to normal text, but the text is different every time.

Can I code this so that it grabs the text every time it changes and stores it in a variable?

I currently have this inside my loop:

$span = .document.getElementsByTagName("span")
For $text In $span
    If $text.value = "&nbsp;" Then
        Sleep(50)
        MsgBox(0,0,0) ;messagebox to test if it can be found, but I don't know how to grab the text
    EndIf
Next

The problem is that the HTML has many other lines with the same span, whose ids are "line3", "line5", etc., and the one I need is "line1".

I would appreciate any help with this!

Edited by Spask

Here is how I'd do it, assuming the HTML file is temp.html in the @ScriptDir directory. To keep checking, run the regular expression periodically in a loop:

#include <FileConstants.au3>
#include <MsgBoxConstants.au3>
#include <StringConstants.au3>

; Open the file for reading and store the handle to a variable.
Local $hFileOpen = FileOpen(@ScriptDir & "\temp.html", $FO_READ)
If $hFileOpen = -1 Then
    MsgBox($MB_SYSTEMMODAL, "", "An error occurred when reading the file.")
    Exit
EndIf

; Read the contents of the file using the handle returned by FileOpen, then close it.
Local $sFileRead = FileRead($hFileOpen)
FileClose($hFileOpen)

; Capture the text of the "bot" span inside <p id="line1" class>.
; $STR_REGEXPARRAYGLOBALMATCH (3) returns an array with every capture.
Local $aArray = StringRegExp($sFileRead, '(?i)(?-s)<p id="line1" class>\r?\n.*?bot">(.*?)[<;]+.*?\r?\n</P>', $STR_REGEXPARRAYGLOBALMATCH)

; Show each captured value (the loop is skipped when nothing matched).
For $i = 0 To UBound($aArray) - 1
    MsgBox($MB_SYSTEMMODAL, "RegExp result " & $i, $aArray[$i])
Next

Edited by Jury

Sure, here it is:

#include <ButtonConstants.au3>
#include <EditConstants.au3>
#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>

; Enable OnEvent mode before registering the event handlers.
Opt('GUIOnEventMode', 1)

Global $ie = ObjCreate("InternetExplorer.Application")

Global $bot = GUICreate("Bot", 226, 139, -1, -1)
Global $startie = GUICtrlCreateButton("Start IE", 32, 24, 155, 25)
Global $startloop = GUICtrlCreateButton("Start Loop", 32, 56, 75, 25)
Global $pauseloop = GUICtrlCreateButton("Pause Loop", 112, 56, 75, 25)

GUICtrlSetOnEvent($startie, 'StartIE')
GUICtrlSetOnEvent($startloop, 'startloop')
GUICtrlSetOnEvent($pauseloop, 'Pauseloop')
GUISetOnEvent($GUI_EVENT_CLOSE, 'ExitApp')

GUISetState(@SW_SHOW)

Global $var = 0

;function to start internet explorer and get website ready
Func StartIE()
   With $ie
      .visible = true
      .navigate("http://cleverbot.com")
      While($ie.busy)
         Sleep(500)
      WEnd
   EndWith
EndFunc

While 1
   With $ie
      ;while loop to interact with the website
      While $var = 1
         While($ie.busy)
            Sleep(500)
         WEnd
         Sleep(5000)
         $sayitbutton = .document.getElementsByTagName("input")
         For $b in $sayitbutton
            if $b.value = "think for me" Then
               $b.click()
            EndIf
         Next
         ;finds the text in html
         $span = .document.getElementsByTagName("span")
         For $text In $span
           If $text.value = " " Then
              Sleep(50)
              MsgBox(0,0,0) ;messagebox to test if it can be found, but I don't know how to grab the text
           EndIf
         Next
      WEnd
   EndWith
   Sleep(100) ; keeps the idle outer loop from using 100% CPU while paused
WEnd

;starts the loop
Func startloop()
   $var = 1
EndFunc

;pauses the loop
Func Pauseloop()
   $var = 2
EndFunc

;exits the app
Func ExitApp()
   Opt('GUIOnEventMode', 0)
   GUIDelete($bot)
   If IsObj($ie) Then $ie.Quit() ; close the Internet Explorer instance created at startup
   Exit
EndFunc

Something like this. The more information you supply, including an example of what you are trying, the better the responses to your questions will be.

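#include <IE.au3>

; Open the page (replace the placeholder with the real URL) and read the body's HTML.
Local $oIE = _IECreate("Your url here")
Local $sHTML = _IEBodyReadHTML($oIE)

; Capture the text of the "bot" span inside <p id="line1" class>; flag 3 returns an array of all captures.
Local $aArray = StringRegExp($sHTML, '(?i)(?-s)<p id="line1" class>\r?\n.*?bot">(.*?)[<;]+.*?\r?\n</P>', 3)

; Print each captured value to the console.
For $i = 0 To UBound($aArray) - 1
    ConsoleWrite($aArray[$i] & @CRLF)
Next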
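If the aim is to keep a variable that always holds the latest line1 text, the extraction can be wrapped in a helper and paired with a change check. This is a minimal sketch, assuming the HTML string comes from `_IEBodyReadHTML` (or from a file) as in the examples above; `_GetLine1Text` and `_Line1Changed` are hypothetical helper names, not part of IE.au3. The pattern is deliberately looser than the one above: it ignores letter case and accepts a quoted or unquoted `id`, because older IE document modes may serialize `innerHTML` with uppercase tags and unquoted attributes. The `\b` after `line1` keeps it from matching `line10`, `line11` and so on.

```autoit
#include <StringConstants.au3>

; Hypothetical helper: returns the trimmed text of the first <span> inside <p id="line1">.
; Returns "" with @error = 1 when line1 is not found, and "" when the span
; only holds a non-breaking space.
Func _GetLine1Text($sHTML)
    ; (?is): case-insensitive, "." also matches newlines.
    ; "?line1\b accepts quoted or unquoted ids but rejects line10, line11, ...
    Local $aMatch = StringRegExp($sHTML, '(?is)<p\b[^>]*\bid="?line1\b[^>]*>\s*<span[^>]*>(.*?)</span>', $STR_REGEXPARRAYMATCH)
    If @error Then Return SetError(1, 0, "")
    Local $sText = StringStripWS($aMatch[0], $STR_STRIPLEADING + $STR_STRIPTRAILING)
    ; Treat the placeholder (the &nbsp; entity or a raw U+00A0) as "no text yet".
    If $sText = "&nbsp;" Or $sText = ChrW(160) Then Return ""
    Return $sText
EndFunc

; Hypothetical helper: returns True only when line1 holds real text that differs
; (case-sensitively) from $sLast, and stores that new text in $sLast.
Func _Line1Changed($sHTML, ByRef $sLast)
    Local $sText = _GetLine1Text($sHTML)
    If $sText = "" Or $sText == $sLast Then Return False
    $sLast = $sText
    Return True
EndFunc
```

In the loop from the full script, you could declare `Global $sLine1 = ""` once, re-read the HTML on each pass with `_IEBodyReadHTML($ie)`, and run your code only when `_Line1Changed($sHTML, $sLine1)` returns True; `$sLine1` then holds the most recent text. The `&nbsp;` phase is reported as no text, so it never overwrites the previous value.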

Edited by Jury
