Spask

Finding a text inside HTML


Spask

Hi, I'm trying to find a text value inside an HTML page.

This is what the line looks like normally:

<p id="line1" class>
    <span class="bot">TEXT HERE</span>
</p>

The text then changes to a non-breaking space:

<p id="line1" class>
    <span class="bot">&nbsp;</span>
</p>

Then it changes back to normal text, but the text is different every time.

Can I code this so that it grabs the text every time it changes and stores it in a variable?
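One way to keep each new reply in a variable is to poll the span, remember the previous reading, and only count a reading as a new reply when it differs from that previous reading and is not the &nbsp; placeholder. This is a minimal sketch, assuming the span text has already been read (for example via innerText, where &nbsp; should come back as ChrW(160)). The function name _IsNewBotText and the sample values are hypothetical:

```autoit
; Hypothetical helper: returns True when $sCurrent is a new reply.
; $sPrevious holds the last raw reading (placeholder included) and is updated on every call,
; so the same reply read twice in a row is reported once, while a reply that appears
; again after a &nbsp; placeholder is reported again.
Func _IsNewBotText($sCurrent, ByRef $sPrevious)
    Local $bChanged = Not ($sCurrent == $sPrevious) ; == is a case-sensitive comparison
    $sPrevious = $sCurrent
    If Not $bChanged Then Return False
    ; Ignore the blank placeholder shown while the next reply is pending:
    ; ChrW(160) when read via innerText, or the entity text if the raw HTML was parsed instead.
    If $sCurrent = "" Or $sCurrent = ChrW(160) Or $sCurrent = "&nbsp;" Or $sCurrent = "&nbsp" Then Return False
    Return True
EndFunc

; Demo: a simulated sequence of readings taken from the span over time.
Local $aSamples[6] = ["Hello there", "Hello there", ChrW(160), "Hello there", ChrW(160), "How are you?"]
Local $sPrev = "", $sBotText = "", $iCount = 0
For $i = 0 To UBound($aSamples) - 1
    If _IsNewBotText($aSamples[$i], $sPrev) Then
        $sBotText = $aSamples[$i] ; always holds the latest reply
        $iCount += 1
        ConsoleWrite("New reply: " & $sBotText & @CRLF)
    EndIf
Next
```

In a real loop you would read the span once per pass and pass a variable that persists between passes as $sPrevious.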

I currently have this inside of my loop:

$span = .document.getElementsByTagName("span")
For $text In $span
    ; span elements have no .value; innerText returns &nbsp; as ChrW(160)
    If $text.innerText = ChrW(160) Then
        Sleep(50)
        MsgBox(0,0,0) ;messagebox to test if it can be found, but I don't know how to grab the text
    EndIf
Next

The problem is that many other lines in the HTML have the same span but are called "line3", "line5", etc., and the one I need is in "line1".
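Since the span you want is the only one inside the element with id="line1", you can look that element up by id instead of scanning every span on the page. This is a minimal sketch: _GetLineText is a hypothetical helper, and the demo assumes the "htmlfile" COM object (the standalone MSHTML document available on Windows) as a stand-in for the live page. With the IE object from ObjCreate("InternetExplorer.Application") you would pass $ie.document instead:

```autoit
; Hypothetical helper: returns the innerText of the first <span> inside the element
; whose id is $sId, or "" with @error = 1 if either is missing.
; $oDoc is assumed to be an MSHTML document, e.g. $ie.document.
Func _GetLineText($oDoc, $sId = "line1")
    Local $oLine = $oDoc.getElementById($sId)
    If Not IsObj($oLine) Then Return SetError(1, 0, "")
    Local $oSpans = $oLine.getElementsByTagName("span")
    If $oSpans.length = 0 Then Return SetError(1, 0, "")
    ; A span that only holds &nbsp; should come back as ChrW(160), not "&nbsp;".
    Return $oSpans.item(0).innerText
EndFunc

; Demo: load a small HTML snippet into a standalone MSHTML document (Windows only).
Local $oDoc = ObjCreate("htmlfile")
$oDoc.write('<p id="line3"><span class="bot">not this one</span></p>' & _
        '<p id="line1"><span class="bot">TEXT HERE</span></p>')
$oDoc.close()
Local $sLine1 = _GetLineText($oDoc)
ConsoleWrite("line1: " & $sLine1 & @CRLF)
```

Because the lookup is by id, the other "lineN" paragraphs never enter the comparison.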

I would appreciate any help with this!

Edited by Spask

Jury

Here is how I'd do it, assuming the HTML is saved as temp.html in the @ScriptDir directory. The code below reads the file once; to keep checking, re-run the regular expression periodically (for example, inside your loop):

#include <MsgBoxConstants.au3>

; Open the file for reading (mode 0) and store the handle to a variable.
$hFileOpen = FileOpen(@ScriptDir & "\temp.html", 0)
If $hFileOpen = -1 Then
    MsgBox($MB_SYSTEMMODAL, "", "An error occurred when reading the file.")
    Exit ; nothing to search without the file
EndIf

; Read the contents of the file using the handle returned by FileOpen, then release the handle.
$sFileRead = FileRead($hFileOpen)
FileClose($hFileOpen)

; Flag 3 returns an array of every captured group: the text of the "bot" span inside <p id="line1">.
$aArray = StringRegExp($sFileRead, '(?i)(?-s)<p id="line1" class>\r?\n.*?bot">(.*?)[<;]+.*?\r?\n</P>', 3)

For $i = 0 To UBound($aArray) - 1
    MsgBox($MB_SYSTEMMODAL, "RegExp Test with Option 3 - " & $i, $aArray[$i])
Next
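If you stay with the regular expression, it can be wrapped in a function that returns just the text and skips the placeholder. This is a minimal sketch with a hypothetical function name, _ExtractLine1Text. Because [<;]+ ends the capture at the first < or ;, the &nbsp; placeholder is captured as "&nbsp" (without the semicolon), and a reply that itself contains a ; would be cut short at that point:

```autoit
#include <StringConstants.au3>

; Hypothetical wrapper around the pattern above: returns the text of the bot span in
; <p id="line1">, or "" with @error = 1 when there is no match or only the &nbsp; placeholder.
Func _ExtractLine1Text($sHTML)
    Local $aMatch = StringRegExp($sHTML, '(?i)(?-s)<p id="line1" class>\r?\n.*?bot">(.*?)[<;]+.*?\r?\n</P>', $STR_REGEXPARRAYMATCH)
    If @error Then Return SetError(1, 0, "")
    ; [<;]+ stops at the entity's ";", so a placeholder span is captured as "&nbsp"
    If $aMatch[0] = "&nbsp" Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc

; Demo with the two states of the line shown in the question.
Local $sSample = '<p id="line1" class>' & @CRLF & '    <span class="bot">TEXT HERE</span>' & @CRLF & '</p>'
Local $sText = _ExtractLine1Text($sSample)
ConsoleWrite("line1: " & $sText & @CRLF)
```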

 

Edited by Jury

Spask

Does this work if I'm trying to do it in an IE window? I've created the IE object with $ie = ObjCreate("InternetExplorer.Application").

kylomas

Spask,

Can you post the runnable code you are trying?

kylomas

Edited by kylomas


"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

Spask

Sure, here it is:

#include <ButtonConstants.au3>
#include <EditConstants.au3>
#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>

Opt('GUIOnEventMode', 1) ; switch to OnEvent mode before registering the event handlers

Global $ie = ObjCreate("InternetExplorer.Application")
Global $var = 0 ; 1 = loop running, anything else = paused

Global $bot = GUICreate("Bot", 226, 139, -1, -1)
Global $startie = GUICtrlCreateButton("Start IE", 32, 24, 155, 25)
Global $startloop = GUICtrlCreateButton("Start Loop", 32, 56, 75, 25)
Global $pauseloop = GUICtrlCreateButton("Pause Loop", 112, 56, 75, 25)

GUICtrlSetOnEvent($startie, 'StartIE')
GUICtrlSetOnEvent($startloop, 'startloop')
GUICtrlSetOnEvent($pauseloop, 'Pauseloop')
GUISetOnEvent($GUI_EVENT_CLOSE, 'ExitApp')

GUISetState(@SW_SHOW)

;function to start internet explorer and get website ready
Func StartIE()
   With $ie
      .visible = true
      .navigate("http://cleverbot.com")
      While($ie.busy)
         Sleep(500)
      WEnd
   EndWith
EndFunc

While 1
   Sleep(100) ; idle while the loop is paused instead of spinning the CPU
   With $ie
      ;while loop to interact with the website
      While $var = 1
         While($ie.busy)
            Sleep(500)
         WEnd
         Sleep(5000)
         $sayitbutton = .document.getElementsByTagName("input")
         For $b in $sayitbutton
            if $b.value = "think for me" Then
               $b.click()
            EndIf
         Next
         ;finds the text in html
         $span = .document.getElementsByTagName("span")
         For $text In $span
           ; span elements have no .value; innerText returns &nbsp; as ChrW(160)
           If $text.innerText = ChrW(160) Then
              Sleep(50)
              MsgBox(0,0,0) ;messagebox to test if it can be found, but I don't know how to grab the text
           EndIf
         Next
      WEnd
   EndWith
WEnd

;starts the loop
Func startloop()
   $var = 1
EndFunc

;pauses the loop
Func Pauseloop()
   $var = 2
EndFunc

;exits the app
Func ExitApp()
   Opt('GUIOnEventMode', 0)
   GUIDelete($bot)
   Exit
EndFunc

 

Spask

Whoops, my bad; I didn't realize I had made another post.

Edited by Spask

Jury

Something like this. The more information you supply, including an example of what you are trying, the better the responses to your questions will be.

#include <IE.au3>


$oIE = _IECreate("Your url here")
$sHTML = _IEBodyReadHTML($oIE)


$aArray = StringRegExp($sHTML, '(?i)(?-s)<p id="line1" class>\r?\n.*?bot">(.*?)[<;]+.*?\r?\n</P>', 3)

For $i = 0 To UBound($aArray) - 1
    ConsoleWrite($aArray[$i] & @CRLF)
Next

 

Edited by Jury
