Extract string in HTML


Hi all,

Sorry for my English.

I have been stuck on a problem with my code for quite some time.

I have text like this:

 

<div class="dataCard sc">

        <div class="localisationBlock">
 
            <p><strong>Agence Creusot (Le)</strong><br>45&nbsp;r Maréchal Foch<br>71200&nbsp;Creusot (Le)</p>
            <!-- info ouvert maintenant -->
            <ul>
                        <li>
                    <a href="#null" data-pjlb="{&quot;url&quot;:&quot;#aHR0cDovL3d3dy5wYWdlc2phdW5lcy5mci9mZC1tZWRpYS9jb250ZW51RkRNZWRpYT9pZEJsb2NBbm5vbmNldXI9MDAxNjgyNDAwMDAwMTFDMDAwMSZpbmRleEluc2NyaXB0aW9uU2VsZWN0aW9ubmVlPTE0JnR5cGVPbmdsZXRTZWxlY3Rpb25uZT1QTEFO&quot;}" data-pjstats="{&quot;idRequete&quot;:&quot;140670850297018226&quot;,&quot;typeBloc&quot;:&quot;CV_LVS&quot;,&quot;genreBloc&quot;:&quot;1&quot;,&quot;idTag&quot;:&quot;PLAN&quot;,&quot;pos&quot;:8}">Plan</a></li>
            <li>
                    <a href="#null" data-pjlb="{&quot;url&quot;:&quot;#aHR0cDovL3d3dy5wYWdlc2phdW5lcy5mci9mZC1tZWRpYS9jb250ZW51RkRNZWRpYT9pZEJsb2NBbm5vbmNldXI9MDAxNjgyNDAwMDAwMTFDMDAwMSZpbmRleEluc2NyaXB0aW9uU2VsZWN0aW9ubmVlPTE0JnR5cGVPbmdsZXRTZWxlY3Rpb25uZT1JVElORVJBSVJF&quot;}" data-pjstats="{&quot;idRequete&quot;:&quot;140670850297018226&quot;,&quot;typeBloc&quot;:&quot;CV_LVS&quot;,&quot;genreBloc&quot;:&quot;1&quot;,&quot;idTag&quot;:&quot;ITI&quot;,&quot;pos&quot;:8}">Itinéraire</a></li>
            </ul>
            </div>
 
        <div class="contactBlock " id="contactBlock-00168240000011C0001-8">
 
            <p id="userContactTel00168240000011C0001-8" class="userContactTel btnShowNumber">
 
 
                            <a href="#contactList00168240000011C0001-8" class="linkButtonS1 withPicto picTel JS_PJ" data-pjstats="{&quot;idRequete&quot;:&quot;140670850297018226&quot;,&quot;typeBloc&quot;:&quot;CV_LVS&quot;,&quot;genreBloc&quot;:&quot;1&quot;,&quot;idProduit&quot;:&quot;1&quot;,&quot;idTag&quot;:&quot;CONTACTER-PAR-TELEPHONE&quot;,&quot;pos&quot;:8}" data-pjmyhistosearch_proclic="{&quot;id&quot;:&quot;05778040&quot;}" data-pjtoggleclass="{klass:'displayedNumber',sel:'#contactBlock-00168240000011C0001-8'}" data-pjcookie="{name:'idBlocUnmasked',value:'00168240000011C0001',domain:'pagesjaunes.fr',action:'plus'}"><span class="picto"></span>
                               Afficher<br>le numéro</a></p>
                    <ul id="contactList00168240000011C0001-8" class="blocPhoneNumber">
                        <li class="hideTel">
<strong>.09 74 50 3227</strong>
 
    </li>
</ul>
 
            <br>
</div>
 
        <div class="userContrib sc">
            <ul>
        <li>
                <a href="#null" class="opinion" data-pjlb="{&quot;url&quot;:&quot;#aHR0cHM6Ly93d3cucGFnZXNqYXVuZXMuZnIvdHJvdXZlcmxlc3Byb2Zlc3Npb25uZWxzL2ZpY2hlRGV0YWlsbGVlL2F2aXMvZGVwb3QuZG8/aWRCbG9jQW5ub25jZXVyPTAwMTY4MjQwMDAwMDExQzAwMDEmaW5kZXhJbnNjcmlwdGlvblNlbGVjdGlvbm5lZT0xNA==&quot;}" data-pjstats="{&quot;idRequete&quot;:&quot;140670850297018226&quot;,&quot;typeBloc&quot;:&quot;CV_LVS&quot;,&quot;genreBloc&quot;:&quot;1&quot;,&quot;idTag&quot;:&quot;BA-ECRIRE-AVIS-1ER&quot;,&quot;pjscript&quot;:&quot;xt_click({},'C','{%xtn2}','Liens_edito_BI_LR_1::Ecrire_1er_avis','N');&quot;,&quot;pos&quot;:8}" title="Soyez le 1er à écrire un avis">Soyez le 1er à écrire un avis</a></li>
        </ul>
</div>
        </div>
        <div class="moreDataCard sc">
            <div class="mentionsOblogatoiresContainer sc">
       <p class="JS_PJ mentionsOblogatoires withTooltip" data-pjtooltip="{dtanchor:'popMentions-0'}" id="aMentions-0">
                   Avertissement crédit</p>
               <dl class="js_hide">
                   <dt class="tooltip mogmoiCard sc"><a name="popMentions-0"></a></dt>
                   <dd>
                       <div class="blockTooltip hg">
                           <div class="boxRoundTopL">
                               <div class="boxRoundTopR">
                                   <p><strong>Avertissement crédit</strong></p>
                                   <p>Aucun versement, de quelque nature que ce soit, ne peut être exigé d'un particulier, avant l'obtention d'un ou plusieurs prêts d'argent.</p>
                               </div>
                           </div>
                           <div class="boxRoundBottomL">
                               <div class="boxRoundBottomR">&nbsp;</div>
                           </div>
                           <span class="arrow">&nbsp;</span>
                       </div>
                   </dd>
               </dl>
           </div>
    <div class="moreCoordBlock">
                    <div class="sc">
                                <div class="localisationBlock">
                                    <p class="itemCoord"><strong>Appels Particuliers</strong><br>206&nbsp;chem des 4 Pilles<br>71000&nbsp;Mâcon</p>
                                    <a href="http://www.pagesjaunes.fr/pros/00322525#onglet-infos" data-pjstats="{&quot;idRequete&quot;:&quot;140670850297018226&quot;,&quot;typeBloc&quot;:&quot;CV_LVS&quot;,&quot;genreBloc&quot;:&quot;1&quot;,&quot;idTag&quot;:&quot;PLUS-D-INFOS&quot;,&quot;pjscript&quot;:&quot;xt_click({},'C','{%xtn2}','Liens_edito_BI_LR_1::Plus_informations_autre_adresse','N');&quot;,&quot;pos&quot;:8}">+ d’infos</a></div>
                                <div class="contactBlock " id="contactBlock-00168240000011C0001-8-2">
                                    <p id="userContactTel00168240000011C0001-8-2" class="userContactTel btnShowNumber">
 
                                            <a href="#contactList00168240000011C0001-8-2" class="linkButtonS1 withPicto picTel JS_PJ" data-pjstats="{&quot;idRequete&quot;:&quot;140670850297018226&quot;,&quot;typeBloc&quot;:&quot;CV_LVS&quot;,&quot;genreBloc&quot;:&quot;1&quot;,&quot;idProduit&quot;:2,&quot;idTag&quot;:&quot;CONTACTER-PAR-TELEPHONE&quot;,&quot;pos&quot;:8}" data-pjmyhistosearch_proclic="{&quot;id&quot;:&quot;00322525&quot;}" data-pjtoggleclass="{klass:'displayedNumber',sel:'#contactBlock-00168240000011C0001-8-2'}" data-pjcookie="{name:'idBlocUnmasked',value:'00168240000011C0001',domain:'pagesjaunes.fr',action:'plus'}"><span class="picto"></span>
                                               Afficher<br>le numéro</a></p>
                                   <br>
<ul id="contactList00168240000011C0001-8-2" class="blocPhoneNumber">
                                     <li class="hideTel">
<strong>.09 74 75 0272</strong>
 
    </li>
</ul>
                                </div>
                            </div>
                        <div class="sc">
                                <div class="localisationBlock">
                                    <p class="itemCoord"><strong>Appels Agriculteurs</strong><br>206&nbsp;chem des 4 Pilles<br>71000&nbsp;Mâcon</p>
                                    <a href="#null" data-pjlb="{&quot;url&quot;:&quot;#aHR0cDovL3d3dy5wYWdlc2phdW5lcy5mci9mZC1tZWRpYS9jb250ZW51RkRNZWRpYT9pZEJsb2NBbm5vbmNldXI9MDAxNjgyNDAwMDAwMTFDMDAwMSZpbmRleEluc2NyaXB0aW9uU2VsZWN0aW9ubmVlPTImdHlwZU9uZ2xldFNlbGVjdGlvbm5lPUlORk8=&quot;}" data-pjstats="{&quot;idRequete&quot;:&quot;140670850297018226&quot;,&quot;typeBloc&quot;:&quot;CV_LVS&quot;,&quot;genreBloc&quot;:&quot;1&quot;,&quot;idTag&quot;:&quot;PLUS-D-INFOS&quot;,&quot;pjscript&quot;:&quot;xt_click({},'C','{%xtn2}','Liens_edito_BI_LR_1::Plus_informations_autre_adresse','N');&quot;,&quot;pos&quot;:8}">+ d’infos</a></div>
                                <div class="contactBlock " id="contactBlock-00168240000011C0001-8-3">
                                    <p id="userContactTel00168240000011C0001-8-3" class="userContactTel btnShowNumber">
 
                                            <a href="#contactList00168240000011C0001-8-3" class="linkButtonS1 withPicto picTel JS_PJ" data-pjstats="{&quot;idRequete&quot;:&quot;140670850297018226&quot;,&quot;typeBloc&quot;:&quot;CV_LVS&quot;,&quot;genreBloc&quot;:&quot;1&quot;,&quot;idProduit&quot;:3,&quot;idTag&quot;:&quot;CONTACTER-PAR-TELEPHONE&quot;,&quot;pos&quot;:8}" data-pjmyhistosearch_proclic="{&quot;id&quot;:&quot;00322525&quot;}" data-pjtoggleclass="{klass:'displayedNumber',sel:'#contactBlock-00168240000011C0001-8-3'}" data-pjcookie="{name:'idBlocUnmasked',value:'00168240000011C0001',domain:'pagesjaunes.fr',action:'plus'}"><span class="picto"></span>
                                               Afficher<br>le numéro</a></p>
                                   <br>
<ul id="contactList00168240000011C0001-8-3" class="blocPhoneNumber">
                                     <li class="hideTel">
<strong>.09 74 75 0273</strong>
 
    </li>
</ul>
                                </div>
                            </div>
                        </div>

 

And I have this code (it doesn't work):

$Text = FileRead(@DesktopDir & "\lrvisitcard.html")
$ScClass = StringInStr($Text, 'class="sc"')
$NextBlocTexte = StringMid($Text, $ScClass)
Local $Between = "", _
    $Div = 0, _
    $StartSDiv = 1, _
    $BetweenStart
While 1
    If $BetweenStart Then
        $CountSDiv = StringInStr($BetweenStart, "</div")
    Else
        $CountSDiv = StringInStr($NextBlocTexte, "</div")
    EndIf
    If $CountSDiv <> 0 Then
        If $BetweenStart Then
            $Between = StringLeft($BetweenStart, $CountSDiv)
        Else
            $Between = StringLeft($NextBlocTexte, $CountSDiv)
        EndIf
        $StartDiv = 1
        While 1
            $CountDiv = StringInStr($Between, "<div", 0, 1, $StartDiv)
            If $CountDiv <> 0 Then
                $StartDiv += $CountDiv
                $Div += 1
            Else
                ExitLoop
            EndIf
        WEnd
        $StartSDiv2 = 1
        While 1
            $CountSDiv2 = StringInStr($Between, "</div", 0, 1, $StartSDiv2)
            If $CountSDiv2 <> 0 Then
                $StartSDiv2 += $CountSDiv2
                $Div -=1
            Else
                ExitLoop
            EndIf
        WEnd
        If $Div = 0 Then
            $Between = StringLeft($NextBlocTexte, $StartSDiv)
            ExitLoop
        Else
            $BetweenStart = StringMid($NextBlocTexte, $CountSDiv + 5)
        EndIf
    Else
        ExitLoop
    EndIf
WEnd

I am trying to extract the text between class="sc" and the closing tag of that same div. My text file contains several such groups (class="sc" ... </div>) to extract.

Ask me if my explanation isn't clear!  :blink:

Thanks

Qui ose gagne / Who Dares Win / CyberExploit


#include <Array.au3>
#include <String.au3>

; Read the saved page source
Global $temp = FileRead(@ScriptDir & '\Irvisitcard.htm')
; Collect every substring between 'class="sc">' and the next '</div>'
Global $aArray = _StringBetween($temp, 'class="sc">', '</div>', 1)
If @error Then
    MsgBox(0, "Error", "No match found")
Else
    _ArrayDisplay($aArray)
EndIf

Get Scite to add a popup when you use a 3rd party UDF -> http://www.autoitscript.com/autoit3/scite/docs/SciTE4AutoIt3/user-calltip-manager.html


#include <String.au3>

; Read the saved page source
Global $temp = FileRead(@ScriptDir & '\Irvisitcard.htm')
; Collect every substring between 'class="sc">' and the next '</div>'
Global $aArray = _StringBetween($temp, 'class="sc">', '</div>')
; Show the first match, if there is one
If Not @error Then MsgBox(0, "123", $aArray[0])

This one shows the first match ($aArray[0]) in a message box.


I don't use the _IE functions because I run into many bugs with them.

To retrieve the page source I use:

$URL = "http://www.google.fr"
$HTTP = ObjCreate("winhttp.winhttprequest.5.1")
$HTTP.Open("GET", $URL)
$HTTP.Send()
$HTTP.WaitForResponse()
$Source = $HTTP.ResponseText
$Statut = $HTTP.Status
If $Statut = 200 Then
    ; Request succeeded: parse $Source here
EndIf

Then I parse the source with regular expressions and other functions such as StringMid, _StringBetween, StringLeft, ...

It's faster, and I don't get the bugs I had with the _IE functions.
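
As an illustration of the regex approach mentioned above, here is a minimal sketch that pulls the phone numbers out of markup shaped like the sample in the first post. The variable names and the pattern are assumptions made for this example, not code from the thread; $sSource stands in for the text returned by ResponseText.

```autoit
; Sample input modelled on the <li class="hideTel"> items from the first post
Local $sSource = '<li class="hideTel">' & @CRLF & '<strong>.09 74 50 3227</strong>' & @CRLF & '</li>' & @CRLF & _
        '<li class="hideTel">' & @CRLF & '<strong>.09 74 75 0272</strong>' & @CRLF & '</li>'

; Flag 3 returns an array holding the captured group of every match
Local $aTel = StringRegExp($sSource, '<li class="hideTel">\s*<strong>([^<]+)</strong>', 3)
If @error Then
    ConsoleWrite("No phone number found" & @CRLF)
Else
    For $i = 0 To UBound($aTel) - 1
        ConsoleWrite($aTel[$i] & @CRLF)
    Next
EndIf
```

The pattern depends on the exact class name and tag order, so it would need adjusting if the page layout changes.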


Quote: "I don't use the _IE functions because I run into many bugs with them."

What bugs have you run into? Have you reported them in the bug tracker?

Is it possible that it's not a bug but the way you wrote it? What version of AutoIt were/are you using?


#include <MsgBoxConstants.au3>

; Read the saved page source
$File = FileRead(@ScriptDir & '\Irvisitcard.htm')
; Position of the first <div class="sc"
$LeftCount = StringInStr($File, '<div class="sc"')
; Trim everything up to and including the "<" of that first tag
$TrimmedFile = StringTrimLeft($File, $LeftCount)
; Split on the whole delimiter string (flag 1); $aArray[0] holds the piece count
$aArray = StringSplit($TrimmedFile, '<div class="sc"', 1)
; Restore the "<" that was trimmed from the first piece
$aArray[1] = "<" & $aArray[1]
MsgBox($MB_OK, "Array1", $aArray[1])
For $i = 2 To $aArray[0]
    ; Restore the delimiter that StringSplit removed
    $aArray[$i] = '<div class="sc"' & $aArray[$i]
    MsgBox($MB_OK, "Array1", $aArray[$i])
Next

This only works if the <div class="sc"> blocks directly follow one another in the source, because each piece runs up to the next <div class="sc"> tag.
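
When the blocks contain nested divs, one possible approach is to count opening and closing div tags until the depth returns to zero. The sketch below is only an illustration of that idea. The helper name _ExtractDivBlock and the inline sample string are made up for this example, and it assumes reasonably well-formed markup (no "<div" text inside comments or attribute values).

```autoit
; Inline sample instead of the real file, so the sketch is self-contained
Local $sHtml = '<div class="sc"><div class="a">x</div><p>y</p></div><div class="sc">z</div>'
Local $iFrom = 1, $iStart, $sBlock

While 1
    ; Find the next top-level <div class="sc"
    $iStart = StringInStr($sHtml, '<div class="sc"', 0, 1, $iFrom)
    If $iStart = 0 Then ExitLoop
    $sBlock = _ExtractDivBlock($sHtml, $iStart)
    If $sBlock = "" Then ExitLoop ; unbalanced markup, stop here
    ConsoleWrite($sBlock & @CRLF)
    ; Continue after this block (nested class="sc" divs are not reported separately)
    $iFrom = $iStart + StringLen($sBlock)
WEnd

; Hypothetical helper: returns the whole <div ...>...</div> block that starts
; at position $iStart, or "" if $iStart is not a "<div" or no matching close exists.
Func _ExtractDivBlock($sHtml, $iStart)
    If StringMid($sHtml, $iStart, 4) <> "<div" Then Return ""
    Local $iDepth = 0, $iPos = $iStart, $iOpen, $iClose, $iEnd
    While 1
        $iOpen = StringInStr($sHtml, "<div", 0, 1, $iPos)
        $iClose = StringInStr($sHtml, "</div", 0, 1, $iPos)
        If $iClose = 0 Then Return "" ; ran out of closing tags
        If $iOpen <> 0 And $iOpen < $iClose Then
            ; An opening tag comes first: go one level deeper
            $iDepth += 1
            $iPos = $iOpen + 4
        Else
            ; A closing tag comes first: go one level up
            $iDepth -= 1
            $iPos = $iClose + 5
            If $iDepth = 0 Then
                ; This closing tag matches the starting div: include its ">"
                $iEnd = StringInStr($sHtml, ">", 0, 1, $iPos)
                If $iEnd = 0 Then Return ""
                Return StringMid($sHtml, $iStart, $iEnd - $iStart + 1)
            EndIf
        EndIf
    WEnd
EndFunc
```

With the sample string, the loop should write the two top-level blocks to the console, the first one including its nested div. To use it on the real page, replace the sample with the result of FileRead or ResponseText.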

Edited by computergroove
