Jump to content
Spask

Finding a text inside HTML

Recommended Posts

Spask

Hi, I'm trying to find a text value inside of a html.

This is what the line looks like normally:

<p id="line1" class>
    <span class="bot">TEXT HERE</span>
</p>

The text then changes to a non breaking space:

<p id="line1" class>
    <span class="bot">&nbsp;</span>
</p>

And then it changes back to normal text but it's different every time.

Can I code this so that it grabs the text every time it changes and has a variable that represents it?

I currently have this inside of my loop:

$span = .document.getElementsByTagName("span")
    For $text In $span
        If $text.value = "&nbsp;" Then
            Sleep(50)
            MsgBox(0,0,0) ;messagebox to test if it can be found, but I don't know how to grab the text
        EndIf
    Next

The problem is that there are many other lines in the html that have the same span but are called "line3", "line5", etc and the one I need is from "line1".

I will appreciate if anyone can help with this!

Edited by Spask

Share this post


Link to post
Share on other sites
Jury

Here  is how I'd do it given the html file is temp.html in the  @ScriptDir directory and looping through the regular expression periodically to keep checking:

#include <MsgBoxConstants.au3>

; Open the file for reading and store the handle to a variable.
$hFileOpen = FileOpen(@ScriptDir & "\temp.html", 0)
If $hFileOpen = -1 Then
    MsgBox($MB_SYSTEMMODAL, "", "An error occurred when reading the file.")
EndIf

; Read the contents of the file using the handle returned by FileOpen.
$sFileRead = FileRead($hFileOpen)

; Check if a string fits a given regular expression pattern.
$aArray = StringRegExp($sFileRead, '(?i)(?-s)<p id="line1" class>\r?\n.*?bot">(.*?)[<;]+.*?\r?\n</P>', 3)


For $i = 0 To UBound($aArray) - 1
    MsgBox($MB_SYSTEMMODAL, "RegExp Test with Option 2 - " & $i, $aArray[$i])
Next

 

Edited by Jury

Share this post


Link to post
Share on other sites
Spask

Does this work if I'm trying to do it in an IE window? I've created a variable called $ie = ObjCreate("InternetExplorer.Application")

Share this post


Link to post
Share on other sites
kylomas

Spask,

Can you post the runnable code you are trying?

kylomas

Edited by kylomas

Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

Share this post


Link to post
Share on other sites
Spask

Sure, here it is:

$ie = ObjCreate("InternetExplorer.Application")

#include <ButtonConstants.au3>
#include <EditConstants.au3>
#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>

Global $bot = GUICreate("Bot", 226, 139, -1, -1)
Global $startie = GUICtrlCreateButton("Start IE", 32, 24, 155, 25)
Global $startloop = GUICtrlCreateButton("Start Loop", 32, 56, 75, 25)
Global $pauseloop = GUICtrlCreateButton("Pause Loop", 112, 56, 75, 25)

GUICtrlSetOnEvent($startie, 'StartIE')
GUICtrlSetOnEvent($startloop, 'startloop')
GUICtrlSetOnEvent($pauseloop, 'Pauseloop')
GUISetOnEvent($GUI_EVENT_CLOSE, 'ExitApp')
Opt('GUIOnEventMode', 1)

GUISetState(@SW_SHOW)

Global $var = 0

;function to start internet explorer and get website ready
Func StartIE()
   With $ie
      .visible = true
      .navigate("http://cleverbot.com")
      While($ie.busy)
         Sleep(500)
      WEnd
   EndWith
EndFunc

While 1
   With $ie
      ;while loop to interact with the website
      While $var = 1
         While($ie.busy)
            Sleep(500)
         WEnd
         Sleep(5000)
         $sayitbutton = .document.getElementsByTagName("input")
         For $b in $sayitbutton
            if $b.value = "think for me" Then
               $b.click()
            EndIf
         Next
         ;finds the text in html
         $span = .document.getElementsByTagName("span")
         For $text In $span
           If $text.value = " " Then
              Sleep(50)
              MsgBox(0,0,0) ;messagebox to test if it can be found, but I don't know how to grab the text
           EndIf
         Next
      WEnd
   EndWith
WEnd

;starts the loop
Func startloop()
   $var = 1
EndFunc

;pauses the loop
Func Pauseloop()
   $var = 2
EndFunc

;exits the app
Func ExitApp()
   Opt('GUIOnEventMode', 0)
   GUIDelete($bot)
   Exit
EndFunc

 

Share this post


Link to post
Share on other sites
Spask

Woops my bad, didn't realize I made another post.

Edited by Spask

Share this post


Link to post
Share on other sites
Jury

Something like this - the more information you supply and an example of what you are trying will result in better responses to you questions.

#include <IE.au3>


$oIE = _IECreate("Your url here")
$sHTML = _IEBodyReadHTML($oIE)


$aArray = StringRegExp($sHTML, '(?i)(?-s)<p id="line1" class>\r?\n.*?bot">(.*?)[<;]+.*?\r?\n</P>', 3)

For $i = 0 To UBound($aArray) - 1
    ConsoleWrite($aArray[$i] & @CRLF)
Next

 

Edited by Jury

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Similar Content

    • rm4453
      By rm4453
      I have a table I am parsing, to find specific vehicle information. I am unable to get _ArrayFindAll to return the only valid result with my test data.
      Below is a sample of the table's HTML:
      <td class="textCenter">2010</td> <td>TOYOTA</td> <td>TACOMA 4X4 DB</td> <td></td> <td>BLACK</td> <td class="textCenter">C</td> <td class="textCenter">6</td> <td>GAS</td> <td class="textCenter">A</td> <td class="textCenter">4X4</td> <td class="textCenter">Y</td> <td>CD</td> <td class="textCenter">HT</td> <td class="textCenter">C</td> <td class="textRight" nowrap="nowrap">80,975</td> <td class="textRight" nowrap="nowrap">$16,800</td> </tr> <tr><!-- 308 --> <td class="textCenter">2010</td> <td>TOYOTA</td> <td>TACOMA 4X4 RG</td> <td></td> <td>BLACK</td> <td class="textCenter">R</td> <td class="textCenter">4</td> <td>GAS</td> <td class="textCenter">5</td> <td class="textCenter">4X4</td> <td class="textCenter">A</td> <td>CD</td> <td class="textCenter">HT</td> <td class="textCenter">C</td> <td class="textRight" nowrap="nowrap">95,224</td> <td class="textRight" nowrap="nowrap">$9,500</td> </tr> <tr><!-- 309 --> <td class="textCenter">2011</td> <td>BUICK</td> <td>REGAL</td> <td>CXL RL4</td> <td>BLACK</td> <td class="textCenter">4</td> <td class="textCenter">4</td> <td>GAS</td> <td class="textCenter">A</td> <td class="textCenter"></td> <td class="textCenter">A</td> <td>CD</td> <td class="textCenter">SR</td> <td class="textCenter">L</td> <td class="textRight" nowrap="nowrap">102,694</td> <td class="textRight" nowrap="nowrap">$5,000</td> </tr> <tr><!-- 310 --> <td class="textCenter">2011</td> <td>CHEVROLET</td> <td>AVALANCH 4X4 CR</td> <td>LS</td> <td>GRAY</td> <td class="textCenter">C</td> <td class="textCenter">8</td> <td>E</td> <td class="textCenter">A</td> <td class="textCenter">4X4</td> <td class="textCenter">A</td> <td>CD</td> <td class="textCenter">HT</td> <td class="textCenter">C</td> <td class="textRight" nowrap="nowrap">64,759</td> <td class="textRight" nowrap="nowrap">$16,300</td> </tr> <tr><!-- 311 --> <td class="textCenter">2011</td> <td>CHEVROLET</td> <td>EQUINOX AWD 4C</td> <td>LT W/2LT</td> <td>BLACK</td> <td class="textCenter">S</td> <td class="textCenter">4</td> <td>GAS</td> <td class="textCenter">A</td> <td class="textCenter">AWD</td> <td class="textCenter">Y</td> <td>CD</td> <td class="textCenter">SR</td> <td class="textCenter">C</td> <td class="textRight" nowrap="nowrap">91,896</td> <td class="textRight" nowrap="nowrap">$4,400</td> </tr> <tr><!-- 312 --> <td class="textCenter">2011</td> <td>CHEVROLET</td> <td>TAHOE 4X4 V8</td> <td>LTZ</td> <td>WHITE</td> <td class="textCenter">S</td> <td class="textCenter">8</td> <td>C</td> <td class="textCenter">A</td> <td class="textCenter">4X4</td> <td class="textCenter">A</td> <td>N</td> <td class="textCenter">MR</td> <td class="textCenter">L</td> <td class="textRight" nowrap="nowrap">126,982</td> <td class="textRight" nowrap="nowrap">$17,800</td> </tr> <tr><!-- 313 --> <td class="textCenter">2011</td> <td>CHEVROLET</td> <td>1500 SLV 4X4 EX</td> <td>LT</td> <td>GRAY</td> <td class="textCenter">X</td> <td class="textCenter">8</td> <td>GAS</td> <td class="textCenter">O</td> <td class="textCenter">4X4</td> <td class="textCenter">A</td> <td>CD</td> <td class="textCenter">HT</td> <td class="textCenter">C</td> <td class="textRight" nowrap="nowrap">60,303</td> <td class="textRight" nowrap="nowrap">$18,100</td> </tr> <tr><!-- 314 --> <td class="textCenter">2011</td> <td>CHEVROLET</td> <td>1500 SLV 4X4 EX</td> <td>LT</td> <td>SILVER</td> <td class="textCenter">X</td> <td class="textCenter">8</td> <td>E</td> <td class="textCenter">O</td> <td class="textCenter">4X4</td> <td class="textCenter">A</td> <td>CD</td> <td class="textCenter">HT</td> <td class="textCenter">C</td> <td class="textRight" nowrap="nowrap">89,403</td> <td class="textRight" nowrap="nowrap">$15,900</td> </tr> <tr><!-- 315 --> <td class="textCenter">2011</td> <td>CHEVROLET</td> <td>1500 SLV 4X4 EX</td> <td>LTZ</td> <td>BLUE</td> <td class="textCenter">X</td> <td class="textCenter">8</td> <td>E</td> <td class="textCenter">A</td> <td class="textCenter">4X4</td> <td class="textCenter">A</td> <td>CD</td> <td class="textCenter">HT</td> <td class="textCenter">L</td> <td class="textRight" nowrap="nowrap">53,087</td> <td class="textRight" nowrap="nowrap">$17,700</td> </tr> <tr><!-- 316 --> <td class="textCenter">2011</td> <td>CHEVROLET</td> <td>3500 CUTAWAY</td> <td>WORK VAN</td> <td>WHITE</td> <td class="textCenter">S</td> <td class="textCenter"></td> <td></td> <td class="textCenter">A</td> <td class="textCenter">4X2</td> <td class="textCenter"></td> <td>N</td> <td class="textCenter">HT</td> <td class="textCenter">C</td> <td class="textRight" nowrap="nowrap">202,477</td> <td class="textRight" nowrap="nowrap">$2,700</td> </tr> <tr><!-- 317 --> <td class="textCenter">2011</td> <td>CHRYSLER</td> <td>TOWN &amp; COUNTRY</td> <td>TOURING</td> <td>BLACK</td> <td class="textCenter">4</td> <td class="textCenter">6</td> <td>E</td> <td class="textCenter">A</td> <td class="textCenter">4X2</td> <td class="textCenter">A</td> <td>CD</td> <td class="textCenter">HT</td> <td class="textCenter"></td> <td class="textRight" nowrap="nowrap">198,541</td> <td class="textRight" nowrap="nowrap">$1,900</td> </tr> <tr><!-- 318 --> <td class="textCenter">2011</td> <td>DODGE</td> <td>DURANGO AWD V6</td> <td>CREW</td> <td>BLUE</td> <td class="textCenter">S</td> <td class="textCenter">6</td> <td>GAS</td> <td class="textCenter">A</td> <td class="textCenter">AWD</td> <td class="textCenter">A</td> <td>CD</td> <td class="textCenter">SR</td> <td class="textCenter">C</td> <td class="textRight" nowrap="nowrap">176,036</td> <td class="textRight" nowrap="nowrap">$2,800</td> </tr> <tr><!-- 319 --> <td class="textCenter">2011</td> <td>FORD</td> <td>FOCUS</td> <td>SE</td> <td>SILVER</td> <td class="textCenter">4</td> <td class="textCenter">4</td> <td>GAS</td> <td class="textCenter">A</td> <td class="textCenter"></td> <td class="textCenter">Y</td> <td>CD</td> <td class="textCenter">HT</td> <td class="textCenter">C</td> <td class="textRight" nowrap="nowrap">101,929</td> <td class="textRight" nowrap="nowrap">$3,100</td> </tr> <tr><!-- 320 --> <td class="textCenter">2011</td> <td>FORD</td> <td>FUSION FWD 4C</td> <td>SEL</td> <td>WHITE</td> <td class="textCenter">4</td> <td class="textCenter">4</td> <td>GAS</td> <td class="textCenter">A</td> <td class="textCenter"></td> <td class="textCenter">Y</td> <td>CD</td> <td class="textCenter">SR</td> <td class="textCenter">L</td> <td class="textRight" nowrap="nowrap">78,290</td> <td class="textRight" nowrap="nowrap">$5,500</td> </tr> <tr><!-- 321 --> <td class="textCenter">2011</td> <td>FORD</td> <td>F150 4X4 CR</td> <td>XLT</td> <td>BLACK</td> <td class="textCenter">C</td> <td class="textCenter">8</td> <td>GAS</td> <td class="textCenter">A</td> <td class="textCenter">4X4</td> <td class="textCenter">Y</td> <td>CD</td> <td class="textCenter">HT</td> <td class="textCenter">C</td> <td class="textRight" nowrap="nowrap">70,909</td> <td class="textRight" nowrap="nowrap">$16,000</td> </tr> <tr><!-- 322 --> <td class="textCenter">2011</td> <td>FORD</td> <td>MUSTANG V6 CPE</td> <td>V6 PREMIUM</td> <td>BLACK</td> <td class="textCenter">2</td> <td class="textCenter">6</td> <td>GAS</td> <td class="textCenter">A</td> <td class="textCenter">4X2</td> <td class="textCenter">A</td> <td>CD</td> <td class="textCenter">HT</td> <td class="textCenter">L</td> <td class="textRight" nowrap="nowrap">92,531</td> <td class="textRight" nowrap="nowrap">$2,700</td> </tr> <tr><!-- 323 --> <td class="textCenter">2011</td> <td>GMC</td> <td>ACADIA FWD</td> <td>SLE</td> <td>RED</td> <td class="textCenter">4</td> <td class="textCenter">6</td> <td>GAS</td> <td class="textCenter">A</td> <td class="textCenter"></td> <td class="textCenter">A</td> <td>CD</td> <td class="textCenter">HT</td> <td class="textCenter">C</td> <td class="textRight" nowrap="nowrap">79,199</td> <td class="textRight" nowrap="nowrap">$10,700</td> A picture of the table test data is attached here:

      Here is the _query function, and all other relevant code that I can share.
       
      Func _query($aSel, $aUrls) $oIE = _login() If $oIE = "Return" Then Return EndIf $j = 0 While $j < UBound($aSel) - 1 $i = 1 $aucID ;Unable To Share What This Is Other Than Var Name. _IENavigate($oIE, "Something" & $aucID[0] & "Something") _IELoadWait($oIE, 100, 2000) $oObj = _IETableGetCollection($oIE, 3) $cars = _IETableWriteToArray_ProgressBar($oObj, True, "Processing Requested Information!") ;<---- Modified Version See Post For It: https://www.autoitscript.com/forum/topic/195335-solved-how-to-add-a-progress-bar-to-_ietablewritetoarray/?tab=comments#comment-1400699 $carsYear = _filter($cars, 0, 0, GUICtrlRead($year)) $carsMake = _filter($carsYear, 0, 1, GUICtrlRead($make)) Global $carsModel = _filter($carsMake, 0, 2, GUICtrlRead($model)) $i = 0 $engine = GUICtrlRead($engine) If $engine <> "" Then While $i < StringLen($engine) $carsEngine = _filter($carsModel, 0, 6 + $i, StringLeft($engine, 1)) ;_ArrayDisplay($carsModel, "Cars Model Before Array Delete") $x = 1 While $x <= UBound($carsModel) _ArrayDelete($carsModel, $x) $x += 1 WEnd ;_ArrayDisplay($carsModel, "Cars Model After Array Delete") ;_ArrayConcatenate($carsModel, $carsEngine) ;_ArrayDisplay($carsModel, "Cars Model After Concatenate") $engine = StringTrimLeft($engine, 1) $i += 1 WEnd Else Dim $carsEngine[1][16] EndIf _ArrayConcatenate($carsEngine, $carsModel) _ArrayDisplay($carsEngine, "Cars Engine") Dim $carsDriveTrain[1][16] Dim $carsDriveTrain2[1][16] $driveTrainVal = GUICtrlRead($driveTrain) If $driveTrainVal = "4x4" Or $driveTrainVal = "awd" Then $carsDriveTrain = _filter($carsEngine, 0, 9, "4") $carsDriveTrain2 = _filter($carsEngine, 0, 9, "a") ;~ _ArrayDisplay($carsDriveTrain, "Drive Train Before") ;~ If @error Then ;~ MsgBox("", "", "Cars Drive Train Error: " & @error) ;~ EndIf ;~ _ArrayDisplay($carsDriveTrain2, "Drive Train2 Before") ;~ If @error Then ;~ MsgBox("", "", "Cars Drive Train 2 Error: " & @error) ;~ EndIf _ArrayConcatenate($carsDriveTrain, $carsDriveTrain2) _ArrayDisplay($carsDriveTrain, "Drive Train After Concat") ElseIf $driveTrainVal = "" Then _ArrayConcatenate($carsDriveTrain, $carsEngine) Else $carsDriveTrain = _filter($carsEngine, 0, 9, $driveTrain) EndIf Dim $carsOdom[1][16] $min = GUICtrlRead($odomMin) $max = GUICtrlRead($odomMax) For $i = 0 To UBound($carsDriveTrain) - 1 If $carsDriveTrain[$i][14] > $min And $carsDriveTrain[$i][14] < $max Then _ArrayAdd($carsOdom, $carsDriveTrain[$i]) MsgBox("", "", "ADDED!") EndIf Next _ArrayDisplay($carsOdom, "Cars Odom") $j += 1 WEnd _IEQuit($oIE) EndFunc ;==>_query Func _filter($tofilter, $xpos1, $ypos1, $str) ;~ If UBound($tofilter, 1) <= 1 Then ;~ $endx = 0 ;~ Else ;~ $endx = UBound($tofilter, 1) - 1 ;~ EndIf ;~ $cars = _ArrayFindAll($tofilter, $str, $tofilter[$xpos1][$ypos1], $tofilter[$endx][$ypos1], 0, 1, $ypos1, False) $cars = _ArrayFindAll($tofilter, $str, Default, Default, 0, 1, $ypos1) Dim $carsFiltered[1][16] = [["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p"]] ;_ArrayDisplay($carsFiltered) $i = 0 $uBound = UBound($cars) - 1 While $i < $uBound $filtered = _ArrayExtract($tofilter, $cars[$i], $cars[$i], 0, 15) ;_ArrayDisplay($filtered, "_filter Array of Filtered") _ArrayConcatenate($carsFiltered, $filtered) $i += 1 WEnd _ArrayDelete($carsFiltered, 0) Return $carsFiltered EndFunc ;==>_filter  
      If you know of a more efficient way of doing this please let me know would be more than happy to chew down my inefficiency while learning! (It's like my grandpa used to say, "The only criticism I can't use is that which is not given to me.")
       
      The Item an I am using to test the filter is:

       
    • rm4453
      By rm4453
      Hello,
       
      I am currently writing a program that parses a massive table from a website, and need a way to add a progress bar while parsing.
      I am currently using the function _IETableWriteToArray($oObj, True) to parse the array. I need the progress bar to update as the table is parsed, not just at the end of the parsing.
      Any help at all would be very much appreciated!
       
      *EDIT --> The array I am left with after parsing is $array[0-50000][16]
    • SkysLastChance
      By SkysLastChance
      I have a goofy problem. I am hoping someone could shed some light. The example is not going around the text box. It is way off. 
      I have seen some post blaming IE 11, however I have IE11 on my desktop and it works fine.
      Is there anything I can do that might fix this? 
       
      ; Open a browser with the form example and get a reference to the form ; textarea element. Get the coordinates and dimensions of the text area, ; outline its shape with the mouse and come to rest in the center #include <IE.au3> Local $oIE = _IE_Example("form") Local $oForm = _IEFormGetObjByName($oIE, "ExampleForm") Local $oTextArea = _IEFormElementGetObjByName($oForm, "textareaExample") ; Get coordinates and dimensions of the textarea Local $iScreenX = _IEPropertyGet($oTextArea, "screenx") Local $iScreenY = _IEPropertyGet($oTextArea, "screeny") Local $iWidth = _IEPropertyGet($oTextArea, "width") Local $iHeight = _IEPropertyGet($oTextArea, "height") ; Outline the textarea with the mouse, come to rest in the center Local $iMousespeed = 50 MouseMove($iScreenX, $iScreenY, $iMousespeed) MouseMove($iScreenX + $iWidth, $iScreenY, $iMousespeed) MouseMove($iScreenX + $iWidth, $iScreenY + $iHeight, $iMousespeed) MouseMove($iScreenX, $iScreenY + $iHeight, $iMousespeed) MouseMove($iScreenX, $iScreenY, $iMousespeed) MouseMove($iScreenX + $iWidth / 2, $iScreenY + $iHeight / 2, $iMousespeed)  
       
    • ur
      By ur
      Is there any UDF to remove all anchor tags <a> with a particular class (and also its sub elements completely) in a html document.
      Here the classes are browse and breadcrumbs
      Like in the below image.


       
      I am not able to find that option in IE.au3
       
      Please suggest.
    • milkmoron
      By milkmoron
      I am trying to automate something in a web browser but i need some help with finding the html code to a web applet. How do I access the code.
×