jerem488

Recover text in a web page

9 posts in this topic

Hi everybody,

For several days I have been trying to extract text from a web page.

The elements I need in the body of the page have no "id" attribute, so I don't know how to get at them :(

To make this clear, here is the body of the web page, followed by the text I want as output.

THE BODY :

<DIV id=header>
<DIV id=pub_header>
<script type=text/javascript charset=ISO-8859-1 src="http://adnext.fr/richmedia.adv?id=96154&amp;tag=1&amp;s=all"></SCRIPT>

<script language=javascript type=text/javascript src="http://s0b.bluestreak.com/ix.e?jss&amp;wmode=default&amp;s=8044128&amp;u=&amp;n=1074842607&amp;cltk=http://adnext.fr/click/64325922?"></SCRIPT>

<script language=VBScript>  on error resume next 
  ShockMode = (IsObject(CreateObject("ShockwaveFlash.ShockwaveFlash.8")))</SCRIPT>

<script type=text/javascript src="http://ak.bluestreak.com//adv/conduit.js"></SCRIPT>

<script type=text/javascript src="http://ak.bluestreak.com//adv/rmInteractionTrackingConduit.js"></SCRIPT>

<script type=text/javascript>var tracking_object_registerer_80441287526201 = window.setInterval(function() {    if(typeof(RMInteractionTrackingConduit) != "undefined") {       clearInterval(tracking_object_registerer_80441287526201);       tracking_object_8044128 = new RMInteractionTrackingConduit(8044128,7526201,""); }}, 100);</SCRIPT>
<B>
<OBJECT id=BSFlashAd class=BSFlashAd classid=clsid:D27CDB6E-AE6D-11cf-96B8-444553540000 width=728 height=90 wmode="opaque"><PARAM NAME="_cx" VALUE="19261"><PARAM NAME="_cy" VALUE="2381"><PARAM NAME="FlashVars" VALUE=""><PARAM NAME="Movie" VALUE="http://ak.bluestreak.com//adv/sfr-fr/^17334/^7526201/simplicite728x90.swf"><PARAM NAME="Src" VALUE="http://ak.bluestreak.com//adv/sfr-fr/^17334/^7526201/simplicite728x90.swf"><PARAM NAME="WMode" VALUE="Window"><PARAM NAME="Play" VALUE="-1"><PARAM NAME="Loop" VALUE="-1"><PARAM NAME="Quality" VALUE="High"><PARAM NAME="SAlign" VALUE=""><PARAM NAME="Menu" VALUE="-1"><PARAM NAME="Base" VALUE="http://ak.bluestreak.com//adv/sfr-fr/^17334/^7526201/"><PARAM NAME="AllowScriptAccess" VALUE="always"><PARAM NAME="Scale" VALUE="ShowAll"><PARAM NAME="DeviceFont" VALUE="0"><PARAM NAME="EmbedMovie" VALUE="0"><PARAM NAME="BGColor" VALUE=""><PARAM NAME="SWRemote" VALUE=""><PARAM NAME="MovieData" VALUE=""><PARAM NAME="SeamlessTabbing" VALUE="1"><PARAM NAME="Profile" VALUE="0"><PARAM NAME="ProfileAddress" VALUE=""><PARAM NAME="ProfilePort" VALUE="0"><PARAM NAME="AllowNetworking" VALUE="all"><PARAM NAME="AllowFullScreen" VALUE="false">
     
<embed class='BSFlashAd' wmode='opaque' src='http://ak.bluestreak.com//adv/sfr-fr/^17334/^7526201/simplicite728x90.swf' FlashVars='siteid=8044128&adid=7526201&tracking_object=tracking_object_8044128&cltk=http%3A%2F%2Fadnext%2Efr%2Fclick%2F64325922%3F&xtrack=xxxtrackxx&mb=http://ak.bluestreak.com//adv/sfr-fr/^17334/^7526201/&cp=http://ak.bluestreak.com//flashtracking/tracking_&url=http%3A%2F%2Fadnext%2Efr%2Fclick%2F64325922%3Fhttp%3A%2F%2Fs0b%2Ebluestreak%2Ecom%2Fix%2Ee%3Ftr%26s%3D8044128%26a%3D7526201%26u%3Dhttp%3A%2F%2Fwww%2Esfrbusinessteam%2Efr%2Ftelephone%2Dinternet%2Foffres%2Dmobile%2Dfixe%2Foffre%2Dtout%2Den%2Dun%2Fpack%2Dbusiness%2Ejsp%3Fsfrcpid%3DBOLBT%5Fpub%5Fmarsavril%5Fpackbusiness%5Fbanniere%26sfrcpid%3Dt2%5Fmobr%5Fpackbusiness%5Fb4&clickTag=http%3A%2F%2Fadnext%2Efr%2Fclick%2F64325922%3Fhttp%3A%2F%2Fs0b%2Ebluestreak%2Ecom%2Fix%2Ee%3Ftr%26s%3D8044128%26a%3D7526201%26u%3Dhttp%3A%2F%2Fwww%2Esfrbusinessteam%2Efr%2Ftelephone%2Dinternet%2Foffres%2Dmobile%2Dfixe%2Foffre%2Dtout%2Den%2Dun%2Fpack%2Dbusiness%2Ejsp%3Fsfrcpid%3DBOLBT%5Fpub%5Fmarsavril%5Fpackbusiness%5Fbanniere%26sfrcpid%3Dt2%5Fmobr%5Fpackbusiness%5Fb4' quality='high' base='http://ak.bluestreak.com//adv/sfr-fr/^17334/^7526201/' AllowScriptAccess='always' width='728' height='90' 
 type='application/x-shockwave-flash'> </embed></OBJECT></B></DIV>
<DIV id=logo_cci_header onclick="javascript:window.location='http://www.aef.cci.fr'"></DIV></DIV>
<DIV id=header2 class=text_white><STRONG>L'Annuaire des Entreprises de France</STRONG>, le fichier national de 2 millions d'entreprises des Chambres de Commerce et d'Industrie </DIV>
<DIV id=header3><IMG alt=CCI src="http://www.aef.cci.fr/images/logo_cci_grand.gif"> </DIV>
<DIV id=login_header class=text_little_white>
<FORM id=connexion method=post action=http://www.aef.cci.fr/achatFichiers//login.php?action=process>
<DIV id=compte_perso class=text_bold_little_white>Mon compte personnel</DIV>
<DIV id=login>Identifiant (E-mail) <INPUT id=email_address class=text_little_black size=15 type=text name=email_address></DIV>
<DIV id=mot_passe>Mot de passe <INPUT id=password class=text_little_black value="" size=15 type=password name=password></DIV>
<DIV id=btn_validez><INPUT id=validez value=OK src="/images/ok.gif" type=image name=validez></DIV>
<DIV id=mot_passe_oublie><A href="http://www.aef.cci.fr/pageConnexion">Mot de passe oublié ?</A></DIV>
<DIV id=devenez_membre><A class=text_little_white href="http://www.aef.cci.fr/statiques/avantages-membre"><SPAN>Devenir membre</SPAN></A> </DIV></FORM></DIV>
<DIV id=retour_accueil><A href="http://www.aef.cci.fr/">accueil</A> </DIV>
<DIV id=container>
<DIV id=bloc_contenu>
<DIV id=bloc_contenu1>
<DIV id=AEF_fichier>
<DIV id=AEF_fichier_header>
<H2>AEF, le fichier B to B des CCI françaises</H2></DIV>
<DIV id=AEF_fichier_contenu><BR>
<UL id=menu_fichier>
<LI><A href="http://www.aef.cci.fr/statiques/recherche-entreprises">Rechercher des entreprises</A></LI>
<LI><A href="http://www.aef.cci.fr/statiques/achat-fichier">Acheter un fichier en ligne</A></LI>
<LI><A href="http://www.aef.cci.fr/statiques/contact">Etre conseillé par un expert</A></LI>
<LI><A href="http://www.aef.cci.fr/statiques/aide">Aide en ligne</A></LI>
<LI><A href="http://www.aef.cci.fr/statiques/conseils-utilisation">Conseils d'utilisation</A></LI>
<LI><A href="http://www.aef.cci.fr/statiques/avantages-membre">Devenir membre AEF<BR>Découvrez les avantages</A></LI></UL></DIV></DIV>
<DIV id=AEF_actualite>
<DIV id=AEF_actualite_header>
<H2><BR>Atouts de nos fichiers</H2></DIV>
<DIV id=AEF_actualite_contenu><BR>
<UL id=menu_actualite>
<LI>La fiabilité de la source CCI</LI>
<LI>Mises à jour régulières par les Conseillers fichiers CCI</LI>
<LI>La force du réseau des CCI avec plus de 2 millions d'entreprises enregistrées</LI>
<LI>Pas de montant minimum et paiement sécurisé <BR>
<CENTER><IMG alt="Moyens paiement" src="http://www.aef.cci.fr/images/moyen_paiement.gif"></CENTER></LI></UL></DIV></DIV></DIV>
<DIV id=bloc_contenu2>
<DIV id=principale>
<H1 id=titre>Bernard Roland</H1>
<DIV id=contenu class=statique><A name=fiche></A>
<P class=float_right><SPAN class=boutonSuivant><A id=infolegales href="http://www.aef.cci.fr/listeEntreprises/ficheEntreprise?siret=65714078600010#informations">Informations légales </A></SPAN></P>
<P class=ficheCCI>Sa chambre de commerce et d'industrie est: <A href="http://www.macon.cci.fr" target=_blank>Saône-et-Loire</A> </P>
<H3>Identité</H3>
<DL class=fiche>
<DT>SIRET</DT>
<DD>657 140 786 00010</DD>
<DT>Statut</DT>
<DD>Siège social ou établissement principal</DD>
<DT>Catégorie</DT>
<DD>Industrie</DD></DL>
<H3 class=alterne>Coordonnées</H3>
<DL class=fiche>
<DT>Voie </DT>
<DD>Le Montceau</DD>
<DT>Code postal </DT>
<DD>71580</DD>
<DT>Ville</DT>
<DD>Le Fay</DD>
<DT>Pays </DT>
<DD>FRANCE</DD>
<DT>Téléphone </DT>
<DD>+33 3 85 74 10 79 </DD></DL>
<H3 class=alterne>Informations économiques</H3>
<DL class=fiche>
<DT>Date de début d'activité </DT>
<DD>01/01/1971</DD></DL>
<H4>Activité de l'établissement</H4>
<DL class=fiche>
<DT>Code APET </DT>
<DD>011A</DD>
<DT>Libellé code APET </DT>
<DD>Culture de céréales ; cultures industrielles</DD>
<DT>Code NAF 2008 </DT>
<DD>0111Z</DD>
<DT>Libellé code NAF 2008 </DT>
<DD>Culture de céréales (à l'exception du riz), de légumineuses et de graines oléagineuses</DD>
<DT>Activité en clair </DT>
<DD>Cultures Generales Polyculture,Trav Agric-Loueur De Fonds </DD></DL>
<H3>Informations légales</H3>
<H4>Identité</H4>
<DL class=fiche>
<DT>SIREN</DT>
<DD>657 140 786</DD>
<DT>Raison sociale</DT>
<DD>Bernard Roland</DD>
<DT>Dénomination </DT>
<DD>Bernard Roland</DD></DL>
<H4>Renseignements juridiques</H4>
<DL class=fiche>
<DT>Forme juridique </DT>
<DD>EI (Entreprise Individuelle)</DD></DL>
<DIV>
<H4>Dirigeants</H4>
<DL class=fiche>
<DT>Responsable légal</DT>
<DD>M Bernard Roland</DD></DL></DIV><BR>
<P class=float_right><SPAN class=boutonSuivant><A id=retourHaut class=boutonRetour href="http://www.aef.cci.fr/listeEntreprises/ficheEntreprise?siret=65714078600010#fiche">Retour haut </A></SPAN></P>
<P class=retour><SPAN class=boutonRetour></SPAN></P><A name=informations></A>
<UL>
<LI id=modifier><SPAN class=boutonRetour><A class=boutonRetour href="http://www.aef.cci.fr/accueil/redirigerVersRecherche">Modifier les critères </A></SPAN></LI>
<LI id=afficher><SPAN class=boutonRetour><A href="..">Retour à la liste entreprises </A></SPAN></LI></UL>
<P class=notes>NB: Ces informations sont actuellement disponibles à titre gratuit. Prochainement, certaines d'entre elles deviendront payantes au fur et à mesure de l'enrichissement des champs de données.</P></DIV>
<DIV id=piedContenu>&nbsp;</DIV><!-- Cette balise n'est présente que pour corriger un problème d'IE --><A style="VISIBILITY: hidden" href="http://www.aef.cci.fr/listeEntreprises/ficheEntreprise?siret=65714078600010#informations">Voir les informations légales</A> </DIV></DIV>
<DIV id=thematique_complementaire>
<DIV id=thematique_complementaire_header>
<H2><BR>Services complémentaires</H2></DIV>
<DIV id=thematique_complementaire_contenu><BR>
<UL id=menu_thematique_complementaire1>
<LI><A href="http://www.cci.fr/Groups/cfenet/site_reference_fr/LABEL_HomePage_view_front" target=_blank>Formalités en ligne</A></LI>
<LI><A href="http://www.cci.fr/Groups/thematiques/Creation/thematique_front_view" target=_blank>Création d'entreprises</A></LI>
<LI><A href="http://www.cci.fr/Groups/thematiques/Transmission/thematique_front_view" target=_blank>Transmission, <BR>reprise d'entreprises</A></LI>
<LI><A href="http://www.cci.fr/Groups/semaphore/site_reference_fr/LABEL_HomePage_view_front" target=_blank>Aides et financements</A></LI></UL>
<UL id=menu_thematique_complementaire2>
<LI><A href="http://www.urbanicom.org" target=_blank>Grande distribution</A></LI>
<LI><A href="http://www.urbanicom.org" target=_blank>Urbanisme commercial,<BR>CDEC</A></LI>
<LI><A href="http://www.cci.fr/agenda-international" target=_blank>Evénements export</A></LI>
<LI><A href="http://www.cci.fr/Groups/union_des_chambres_de_commerce_et_dindustrie_francaises_a_letranger/site_reference_fr/LABEL_HomePage_view_front" target=_blank>CCI à l'étranger</A></LI></UL>
<UL id=menu_thematique_complementaire3>
<LI><A href="http://www.cci.fr/Groups/le_portail_de_lenvironnement/site_reference_fr/LABEL_HomePage_view_front" target=_blank>Développement durable</A></LI>
<LI><A href="http://www.cci.fr/Groups/formation/site_reference_fr/LABEL_HomePage_view_front" target=_blank>Formation dans les CCI</A></LI>
<LI><A href="http://www.marches.cci.fr/XMain/?AdminX=327a0e1117288c4199764baff2b87d" target=_blank>Marchés publics</A></LI>
<LI><A href="http://www.cci.fr/Groups/cotraitel/site_reference_fr/LABEL_HomePage_view_front" target=_blank>Annuaire sous-traitance</A></LI></UL>
<UL id=menu_thematique_complementaire4>
<LI><A href="http://www.cci.fr" target=_blank>Portail des CCI</A></LI>
<LI><A href="http://www.cci.fr/recherche_cci" target=_blank>Annuaire des CCI</A></LI>
<LI><A href="http://www.acfci.cci.fr" target=_blank>ACFCI (Assemblée des<BR>Chambres Françaises de<BR>Commerce et d'Industrie)</A></LI></UL></DIV></DIV></DIV>
<DIV id=bloc_pub>
<script type=text/javascript charset=ISO-8859-1 src="http://adnext.fr/richmedia.adv?popunderid=96154&amp;tag=3&amp;s=all"></SCRIPT>

<script language=javascript type=text/javascript src="http://ad-emea.doubleclick.net/adj/N5295.152106.9750800178521/B4387586.3;sz=300x250;click=http://adnext.fr/click/64473699?;ord=1074894057?"></SCRIPT>
<!-- Copyright 2008 DoubleClick, a division of Google Inc. All rights reserved. --><!-- Code auto-generated on Tue Nov 24 06:51:12 EST 2009 -->
<script src="http://s0.2mdn.net/879366/flashwrite_1_2.js"></SCRIPT>

<OBJECT id=DCF223477725 classid=clsid:d27cdb6e-ae6d-11cf-96b8-444553540000 width=300 height=250><PARAM NAME="_cx" VALUE="7937"><PARAM NAME="_cy" VALUE="6614"><PARAM NAME="FlashVars" VALUE=""><PARAM NAME="Movie" VALUE="http://s0.2mdn.net/2534702/lbc_r1_google_standalone_france_300.swf"><PARAM NAME="Src" VALUE="http://s0.2mdn.net/2534702/lbc_r1_google_standalone_france_300.swf"><PARAM NAME="WMode" VALUE="Opaque"><PARAM NAME="Play" VALUE="-1"><PARAM NAME="Loop" VALUE="-1"><PARAM NAME="Quality" VALUE="High"><PARAM NAME="SAlign" VALUE=""><PARAM NAME="Menu" VALUE="-1"><PARAM NAME="Base" VALUE="http://s0.2mdn.net/2534702"><PARAM NAME="AllowScriptAccess" VALUE="never"><PARAM NAME="Scale" VALUE="ShowAll"><PARAM NAME="DeviceFont" VALUE="0"><PARAM NAME="EmbedMovie" VALUE="0"><PARAM NAME="BGColor" VALUE=""><PARAM NAME="SWRemote" VALUE=""><PARAM NAME="MovieData" VALUE=""><PARAM NAME="SeamlessTabbing" VALUE="1"><PARAM NAME="Profile" VALUE="0"><PARAM NAME="ProfileAddress" VALUE=""><PARAM NAME="ProfilePort" VALUE="0"><PARAM NAME="AllowNetworking" VALUE="all"><PARAM NAME="AllowFullScreen" VALUE="false">
<embed src="http://s0.2mdn.net/2534702/lbc_r1_google_standalone_france_300.swf" flashvars="moviePath=http://s0.2mdn.net/2534702/&moviepath=http://s0.2mdn.net/2534702/&clickTAG=http%3A//ad-emea.doubleclick.net/click%253Bh%253Dv8/396f/f/20/%252a/a%253B223477725%253B0-0%253B0%253B47224017%253B4307-300/250%253B34333592/34351470/1%253B%253B%257Esscs%253D%253fhttp%3A//adnext.fr/click/64473699%3Fhttp%253a%252f%252fwww.google.com/local/add/splashPage%253Fgl%253Dfr%2526utm_source%253DFR-OffNetworkEssence-Q4%2526utm_medium%253Doa%2526utm_campaign%253Dfr%2526hl%253Dfr" width="300" height="250"  type="application/x-shockwave-flash" quality="high" swliveconnect="true" wmode="opaque" name="DCF223477725" base="http://s0.2mdn.net/2534702" AllowScriptAccess="never"></embed></OBJECT><NOSCRIPT></NOSCRIPT>
<DIV id=pub_autres></DIV></DIV>
<DIV id=footer>
<P>Firmnet © 2005-2009 - <A href="http://www.aef.cci.fr/statiques/mentions-legales">Mentions légales</A> - <A href="http://www.aef.cci.fr/statiques/qui-sommes-nous">En savoir plus sur AEF</A> - <A href="http://www.aef.cci.fr/statiques/contact">Contacts</A> - <A href="http://www.aef.cci.fr/statiques/conditions-generales-vente">CGV</A> - <A href="http://www.aef.cci.fr/statiques/plan">Plan du site</A> - <A href="http://www.aef.cci.fr/statiques/aide">Aide en ligne</A> - <A href="mailto:serviceclients@aef.cci.fr?subject=Devenir Annonceur sur l'Annuaire des Entreprises de France">Devenir annonceur</A> <BR><A href="http://www.aef.cci.fr/statiques/information-entreprise">Information entreprises</A> - <A href="http://www.aef.cci.fr/statiques/achat-fichier">Achat fichiers</A> - <A href="http://www.aef.cci.fr/statiques/recherche-entreprises">Recherche entreprises</A> - <A href="http://www.aef.cci.fr/statiques/vente-adresse">Vente adresses</A> - <A href="http://www.aef.cci.fr/statiques/vente-fichiers">Vente fichiers</A> - <A href="http://www.aef.cci.fr/statiques/liste-des-entreprises">Liste des entreprises</A> - <A href="http://www.aef.cci.fr/statiques/annuaire-entreprises">Annuaire des entreprises</A> <BR><A href="http://www.cci.fr/recherche_cci" target=_blank>Annuaire des CCI</A> - <A href="http://www.acfci.cci.fr" target=_blank>ACFCI</A> - <A href="http://www.cci.fr" target=_blank>Portail consulaire : www.cci.fr</A></P>
<P></P></DIV></DIV>
<script type=text/javascript>
        var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
        document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
    </SCRIPT>

<script type=text/javascript src="http://www.google-analytics.com/ga.js"></SCRIPT>

<script type=text/javascript>
        var pageTracker = _gat._getTracker("UA-4621836-1");
        pageTracker._initData();
        pageTracker._trackPageview();
    </SCRIPT>

And I want to extract the text like this:

SIRET 657 140 786 00010

Voie Le Montceau

Code postal 71580

Ville Le Fay

Pays FRANCE

Téléphone +33 3 85 74 10 79

Forme juridique EI (Entreprise Individuelle)

Thanks in advance


Qui ose gagne - Who Dares Win - CyberExploit




You could make this much easier if you just posted a link to the page. I'm going cross-eyed just trying to read it in the code block.


George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"


I can't post a direct link because the site is protected.

The site is http://www.aef.cci.fr/

To reach the actual page, enter "65714078600010" in the "SIREN/SIRET" edit control.



I can get the page source but it doesn't even resemble what you posted so I'll have to work on the source you provided. I'll get back to it soon.

George


If you want to get the page source, you can run this code:

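#include <IE.au3>
#include <Array.au3>
#include <File.au3>
#include <String.au3>

; Open the site in a visible browser window and wait for it to load
$Page_internet = _IECreate("http://www.aef.cci.fr/", 0, 1, 1)

; Fill in the "SIREN/SIRET" edit control
$Case_SIREN_SIRET = _IEGetObjById($Page_internet, "identifiant")
_IEPropertySet($Case_SIREN_SIRET, "innertext", "65714078600010")

; Click the search button
$Bouton_Rechercher = _IEGetObjById($Page_internet, "valider")
_IEAction($Bouton_Rechercher, "click")

; Fixed pause to let the result list load
Sleep(6000)

; Follow the link that leads to the company record
$oLinks = _IELinkGetCollection($Page_internet)
For $oLink In $oLinks
    ; Compare against the link's URL rather than the link object itself
    If StringInStr($oLink.href, "http://www.aef.cci.fr/accueil/listeEntreprises/ficheEntreprise?siret=65714078600010") Then
        ConsoleWrite("YES" & @CRLF)
        _IEAction($oLink, "click")
        ExitLoop ; the link collection is stale once the page navigates
    EndIf
Next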


Like I said, I CAN get the page code but it is not the same as you posted. No matter, I'm working out a solution based on what you posted in post #1.


George


#include <IE.au3>
#include <String.au3>
#include <array.au3>
$oIE = _IEAttach ("Fich")
$test = _IEPropertyGet ($oIE, "outertext")
$1 = _StringBetween($test, "SIRET","Statut")
$2 = _StringBetween($test, "Voie ","Code postal")
$3 = _StringBetween($test, "Code postal ","Ville")
$4 = _StringBetween($test, "Ville","Pays")
$5 = _StringBetween($test, "Pays ","Téléphone")
$6 = _StringBetween($test, "Téléphone ","Informations")
$7 = _StringBetween($test, "Forme juridique ","Dirigeants")
MsgBox(0,"",$1[0]&@CRLF&$2[0]&@CRLF&$3[0]&@CRLF&$4[0]&@CRLF&$5[0]&@CRLF&$6[0]&@CRLF&$7[0])

I don't know whether the text on the page changes, but here is one attempt using _IEAttach.
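One caveat with the snippet above: _StringBetween sets @error when a delimiter is not found, and indexing the result with [0] then stops the script. A minimal sketch of a more defensive variant follows. It assumes the same already-open browser window whose title contains "Fich", and _Field is a hypothetical helper name, not part of any UDF.

```autoit
#include <IE.au3>
#include <String.au3>

; Assumption: the record page is already open in a window whose title contains "Fich"
Local $oIE = _IEAttach("Fich")
If @error Then Exit MsgBox(16, "Error", "Could not attach to the browser window.")

; Visible text of the page, as in the snippet above
Local $sText = _IEPropertyGet($oIE, "outertext")

MsgBox(0, "Result", _
        "SIRET " & _Field($sText, "SIRET", "Statut") & @CRLF & _
        "Voie " & _Field($sText, "Voie", "Code postal") & @CRLF & _
        "Ville " & _Field($sText, "Ville", "Pays"))

; Hypothetical helper: returns the trimmed text between two labels,
; or a placeholder when either label is missing from the page text.
Func _Field($sSource, $sStart, $sEnd)
    Local $aFound = _StringBetween($sSource, $sStart, $sEnd)
    If @error Then Return "(not found)"
    Return StringStripWS($aFound[0], 3) ; 3 = strip leading and trailing whitespace
EndFunc   ;==>_Field
```

The remaining fields can be added to the MsgBox call in the same way.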


TCP server and client - Learning about TCP servers and clients connection
Au3 oIrrlicht - Irrlicht project
Au3impact - Another 3D DLL game engine for autoit. (3impact 3Drad related)





Take everything that you have in the code block in post #1 and paste it into a text file named source.txt in the same folder as the script I am about to give you.

Create an au3 file with this code and run it.

$sSrc = FileRead(@ScriptDir & "\source.txt")

$aHold = StringRegExp($sSrc, "(?i)(?s).+?Identité.+?(<dl.+/dl>)", 1)
If NOT @Error Then
    $aHold = StringRegExp($aHold[0], "(?i)<dt>(Siret|voie|Code postal|Ville|Pays|Téléphone|Forme juridique)\s*</dt>\s*<dd>(.+)</dd>", 3)
    If NOT @Error Then
        $sRtn = ""
        For $i = 0 To UBound($aHold) -2 Step 2
            $sRtn &= $aHold[$i] & "  " & $aHold[$i +1] & @CRLF
        Next
        MsgBox(0, "Result", $sRtn)
    EndIf
EndIf

It should be returning what you want.

If the source code for that page changes, there will be issues: the source I retrieved did not include some of the information you wanted. I think this code will still run in that case, but it won't return what you wanted.

EDIT: I guess I could have been a nice guy and turned it into a function for you. Use this instead:

$sSrc = FileRead(@ScriptDir & "\source.txt")
$sInfo = _Info($sSrc)
If NOT @Error Then
    MsgBox(0, "Result", $sInfo)
Else
    MsgBox(0, "Error", "Error " & @Error & " was returned")
EndIf

Func _Info($s_Str)
    If NOT $s_Str Then Return SetError(1) ;; No source code
    Local $s_Rtn = ""
    $a_Hold = StringRegExp($s_Str, "(?i)(?s).+?Identité.+?(<dl.+/dl>)", 1)
    If NOT @Error Then
        $a_Hold = StringRegExp($a_Hold[0], "(?i)<dt>(Siret|voie|Code postal|Ville|Pays|Téléphone|Forme juridique)\s*</dt>\s*<dd>(.+)</dd>", 3)
        If NOT @Error Then
            For $i = 0 To UBound($a_Hold) - 2 Step 2
                $s_Rtn &= $a_Hold[$i] & "  " & $a_Hold[$i + 1] & @CRLF
            Next
            Return $s_Rtn
        Else
            Return SetError(3) ;; Unable to create the information array
        EndIf
    EndIf
    Return SetError(2) ;; Unable to locate the proper section of the source
EndFunc   ;==>_Info
Edited by GEOSoft
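To see how the second pattern pairs labels with values, here is a small sketch run on an inline sample instead of source.txt. The sample is copied from the markup in post #1. The variable names $sSample and $aPairs, the shortened label list, and the lazy (.+?) are illustrative choices, not part of the script above.

```autoit
; Two label/value pairs taken from the "Coordonnées" block in post #1.
; Note the trailing space inside each <DT>, which is why the pattern has \s* before </dt>.
Local $sSample = '<DL class=fiche>' & @CRLF & _
        '<DT>Voie </DT>' & @CRLF & _
        '<DD>Le Montceau</DD>' & @CRLF & _
        '<DT>Code postal </DT>' & @CRLF & _
        '<DD>71580</DD></DL>'

; Flag 3 returns the captured groups of every match in one flat array:
; [label0, value0, label1, value1, ...]
Local $aPairs = StringRegExp($sSample, "(?i)<dt>(Voie|Code postal)\s*</dt>\s*<dd>(.+?)</dd>", 3)
If Not @error Then
    ; Step through the array two entries at a time: label, then value
    For $i = 0 To UBound($aPairs) - 2 Step 2
        ConsoleWrite($aPairs[$i] & "  " & $aPairs[$i + 1] & @CRLF)
    Next
EndIf
```

Because (?s) is not set in this pattern, the dot does not cross line breaks, so each value stays on its own line of the source.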

George


I can't tell you how happy I am!! It works great :) I spent many days searching for a solution to this and couldn't find one.

If I can ever help you someday, I'd be really happy to.

Many thanks

