jerem488 Posted April 1, 2010

Hi everybody,

For several days I have been trying to extract the text from a web page, but in the body of the page there is no "ID" element to target, so I don't know how to do it.

For clarity, here is the body of the web page, followed by the text I want as output.

THE BODY:

<DIV id=header> <DIV id=pub_header> <script type=text/javascript charset=ISO-8859-1 src=""></SCRIPT> <script language=javascript type=text/javascript src=""></SCRIPT> <script language=VBScript> on error resume next ShockMode = (IsObject(CreateObject("ShockwaveFlash.ShockwaveFlash.8")))</SCRIPT> <script type=text/javascript src=""></SCRIPT> <script type=text/javascript src=""></SCRIPT> <script type=text/javascript>var tracking_object_registerer_80441287526201 = window.setInterval(function() { if(typeof(RMInteractionTrackingConduit) != "undefined") { clearInterval(tracking_object_registerer_80441287526201); tracking_object_8044128 = new RMInteractionTrackingConduit(8044128,7526201,""); }}, 100);</SCRIPT> <B> <OBJECT id=BSFlashAd class=BSFlashAd classid=clsid:D27CDB6E-AE6D-11cf-96B8-444553540000 width=728 height=90 wmode="opaque"><PARAM NAME="_cx" VALUE="19261"><PARAM NAME="_cy" VALUE="2381"><PARAM NAME="FlashVars" VALUE=""><PARAM NAME="Movie" VALUE="^17334/^7526201/simplicite728x90.swf"><PARAM NAME="Src" VALUE="^17334/^7526201/simplicite728x90.swf"><PARAM NAME="WMode" VALUE="Window"><PARAM NAME="Play" VALUE="-1"><PARAM NAME="Loop" VALUE="-1"><PARAM NAME="Quality" VALUE="High"><PARAM NAME="SAlign" VALUE=""><PARAM NAME="Menu" VALUE="-1"><PARAM NAME="Base" VALUE="^17334/^7526201/"><PARAM NAME="AllowScriptAccess" VALUE="always"><PARAM NAME="Scale" VALUE="ShowAll"><PARAM NAME="DeviceFont" VALUE="0"><PARAM NAME="EmbedMovie" VALUE="0"><PARAM NAME="BGColor" VALUE=""><PARAM NAME="SWRemote" VALUE=""><PARAM NAME="MovieData" VALUE=""><PARAM NAME="SeamlessTabbing" VALUE="1"><PARAM NAME="Profile" VALUE="0"><PARAM NAME="ProfileAddress" VALUE=""><PARAM
NAME="ProfilePort" VALUE="0"><PARAM NAME="AllowNetworking" VALUE="all"><PARAM NAME="AllowFullScreen" VALUE="false"> <embed class='BSFlashAd' wmode='opaque' src='^17334/^7526201/simplicite728x90.swf' FlashVars='siteid=8044128&adid=7526201&tracking_object=tracking_object_8044128&cltk=http%3A%2F%2Fadnext%2Efr%2Fclick%2F64325922%3F&xtrack=xxxtrackxx&mb=^17334/^7526201/&cp=' quality='high' base='^17334/^7526201/' AllowScriptAccess='always' width='728' height='90' type='application/x-shockwave-flash'> </embed></OBJECT></B></DIV> <DIV id=logo_cci_header onclick="javascript:window.location=''"></DIV></DIV> <DIV id=header2 class=text_white><STRONG>L'Annuaire des Entreprises de France</STRONG>, le fichier national de 2 millions d'entreprises des Chambres de Commerce et d'Industrie </DIV> <DIV id=header3><IMG alt=CCI src=""> </DIV> <DIV id=login_header class=text_little_white> <FORM id=connexion method=post action=> <DIV id=compte_perso class=text_bold_little_white>Mon compte personnel</DIV> <DIV id=login>Identifiant (E-mail) <INPUT id=email_address class=text_little_black size=15 type=text name=email_address></DIV> <DIV id=mot_passe>Mot de passe <INPUT id=password class=text_little_black value="" size=15 type=password name=password></DIV> <DIV id=btn_validez><INPUT id=validez value=OK src="/images/ok.gif" type=image name=validez></DIV> <DIV id=mot_passe_oublie><A href="">Mot de passe oublié ?</A></DIV> <DIV id=devenez_membre><A class=text_little_white href=""><SPAN>Devenir membre</SPAN></A> </DIV></FORM></DIV> <DIV id=retour_accueil><A href="">accueil</A> </DIV> <DIV id=container> <DIV id=bloc_contenu> <DIV id=bloc_contenu1> <DIV id=AEF_fichier> <DIV id=AEF_fichier_header> <H2>AEF, le fichier B to B des CCI françaises</H2></DIV> <DIV id=AEF_fichier_contenu><BR> <UL id=menu_fichier> <LI><A href="">Rechercher des entreprises</A></LI> <LI><A href="">Acheter un fichier en ligne</A></LI> <LI><A href="">Etre conseillé par un expert</A></LI> <LI><A href="">Aide en ligne</A></LI> 
<LI><A href="">Conseils d'utilisation</A></LI> <LI><A href="">Devenir membre AEF<BR>Découvrez les avantages</A></LI></UL></DIV></DIV> <DIV id=AEF_actualite> <DIV id=AEF_actualite_header> <H2><BR>Atouts de nos fichiers</H2></DIV> <DIV id=AEF_actualite_contenu><BR> <UL id=menu_actualite> <LI>La fiabilité de la source CCI</LI> <LI>Mises à jour régulières par les Conseillers fichiers CCI</LI> <LI>La force du réseau des CCI avec plus de 2 millions d'entreprises enregistrées</LI> <LI>Pas de montant minimum et paiement sécurisé <BR> <CENTER><IMG alt="Moyens paiement" src=""></CENTER></LI></UL></DIV></DIV></DIV> <DIV id=bloc_contenu2> <DIV id=principale> <H1 id=titre>Bernard Roland</H1> <DIV id=contenu class=statique><A name=fiche></A> <P class=float_right><SPAN class=boutonSuivant><A id=infolegales href="">Informations légales </A></SPAN></P> <P class=ficheCCI>Sa chambre de commerce et d'industrie est: <A href="" target=_blank>Saône-et-Loire</A> </P> <H3>Identité</H3> <DL class=fiche> <DT>SIRET</DT> <DD>657 140 786 00010</DD> <DT>Statut</DT> <DD>Siège social ou établissement principal</DD> <DT>Catégorie</DT> <DD>Industrie</DD></DL> <H3 class=alterne>Coordonnées</H3> <DL class=fiche> <DT>Voie </DT> <DD>Le Montceau</DD> <DT>Code postal </DT> <DD>71580</DD> <DT>Ville</DT> <DD>Le Fay</DD> <DT>Pays </DT> <DD>FRANCE</DD> <DT>Téléphone </DT> <DD>+33 3 85 74 10 79 </DD></DL> <H3 class=alterne>Informations économiques</H3> <DL class=fiche> <DT>Date de début d'activité </DT> <DD>01/01/1971</DD></DL> <H4>Activité de l'établissement</H4> <DL class=fiche> <DT>Code APET </DT> <DD>011A</DD> <DT>Libellé code APET </DT> <DD>Culture de céréales ; cultures industrielles</DD> <DT>Code NAF 2008 </DT> <DD>0111Z</DD> <DT>Libellé code NAF 2008 </DT> <DD>Culture de céréales (à l'exception du riz), de légumineuses et de graines oléagineuses</DD> <DT>Activité en clair </DT> <DD>Cultures Generales Polyculture,Trav Agric-Loueur De Fonds </DD></DL> <H3>Informations légales</H3> <H4>Identité</H4> <DL 
class=fiche> <DT>SIREN</DT> <DD>657 140 786</DD> <DT>Raison sociale</DT> <DD>Bernard Roland</DD> <DT>Dénomination </DT> <DD>Bernard Roland</DD></DL> <H4>Renseignements juridiques</H4> <DL class=fiche> <DT>Forme juridique </DT> <DD>EI (Entreprise Individuelle)</DD></DL> <DIV> <H4>Dirigeants</H4> <DL class=fiche> <DT>Responsable légal</DT> <DD>M Bernard Roland</DD></DL></DIV><BR> <P class=float_right><SPAN class=boutonSuivant><A id=retourHaut class=boutonRetour href="">Retour haut </A></SPAN></P> <P class=retour><SPAN class=boutonRetour></SPAN></P><A name=informations></A> <UL> <LI id=modifier><SPAN class=boutonRetour><A class=boutonRetour href="">Modifier les critères </A></SPAN></LI> <LI id=afficher><SPAN class=boutonRetour><A href="..">Retour à la liste entreprises </A></SPAN></LI></UL> <P class=notes>NB: Ces informations sont actuellement disponibles à titre gratuit. Prochainement, certaines d'entre elles deviendront payantes au fur et à mesure de l'enrichissement des champs de données.</P></DIV> <DIV id=piedContenu> </DIV><!-- Cette balise n'est présente que pour corriger un problème d'IE --><A style="VISIBILITY: hidden" href="">Voir les informations légales</A> </DIV></DIV> <DIV id=thematique_complementaire> <DIV id=thematique_complementaire_header> <H2><BR>Services complémentaires</H2></DIV> <DIV id=thematique_complementaire_contenu><BR> <UL id=menu_thematique_complementaire1> <LI><A href="" target=_blank>Formalités en ligne</A></LI> <LI><A href="" target=_blank>Création d'entreprises</A></LI> <LI><A href="" target=_blank>Transmission, <BR>reprise d'entreprises</A></LI> <LI><A href="" target=_blank>Aides et financements</A></LI></UL> <UL id=menu_thematique_complementaire2> <LI><A href="" target=_blank>Grande distribution</A></LI> <LI><A href="" target=_blank>Urbanisme commercial,<BR>CDEC</A></LI> <LI><A href="" target=_blank>Evénements export</A></LI> <LI><A href="" target=_blank>CCI à l'étranger</A></LI></UL> <UL id=menu_thematique_complementaire3> <LI><A 
href="" target=_blank>Développement durable</A></LI> <LI><A href="" target=_blank>Formation dans les CCI</A></LI> <LI><A href="" target=_blank>Marchés publics</A></LI> <LI><A href="" target=_blank>Annuaire sous-traitance</A></LI></UL> <UL id=menu_thematique_complementaire4> <LI><A href="" target=_blank>Portail des CCI</A></LI> <LI><A href="" target=_blank>Annuaire des CCI</A></LI> <LI><A href="" target=_blank>ACFCI (Assemblée des<BR>Chambres Françaises de<BR>Commerce et d'Industrie)</A></LI></UL></DIV></DIV></DIV> <DIV id=bloc_pub> <script type=text/javascript charset=ISO-8859-1 src=""></SCRIPT> <script language=javascript type=text/javascript src=";sz=300x250;click=;ord=1074894057?"></SCRIPT> <!-- Copyright 2008 DoubleClick, a division of Google Inc. All rights reserved. --><!-- Code auto-generated on Tue Nov 24 06:51:12 EST 2009 --> <script src=""></SCRIPT> <OBJECT id=DCF223477725 classid=clsid:d27cdb6e-ae6d-11cf-96b8-444553540000 width=300 height=250><PARAM NAME="_cx" VALUE="7937"><PARAM NAME="_cy" VALUE="6614"><PARAM NAME="FlashVars" VALUE=""><PARAM NAME="Movie" VALUE=""><PARAM NAME="Src" VALUE=""><PARAM NAME="WMode" VALUE="Opaque"><PARAM NAME="Play" VALUE="-1"><PARAM NAME="Loop" VALUE="-1"><PARAM NAME="Quality" VALUE="High"><PARAM NAME="SAlign" VALUE=""><PARAM NAME="Menu" VALUE="-1"><PARAM NAME="Base" VALUE=""><PARAM NAME="AllowScriptAccess" VALUE="never"><PARAM NAME="Scale" VALUE="ShowAll"><PARAM NAME="DeviceFont" VALUE="0"><PARAM NAME="EmbedMovie" VALUE="0"><PARAM NAME="BGColor" VALUE=""><PARAM NAME="SWRemote" VALUE=""><PARAM NAME="MovieData" VALUE=""><PARAM NAME="SeamlessTabbing" VALUE="1"><PARAM NAME="Profile" VALUE="0"><PARAM NAME="ProfileAddress" VALUE=""><PARAM NAME="ProfilePort" VALUE="0"><PARAM NAME="AllowNetworking" VALUE="all"><PARAM NAME="AllowFullScreen" VALUE="false"> <embed src="" flashvars="moviePath=" width="300" height="250" type="application/x-shockwave-flash" quality="high" swliveconnect="true" wmode="opaque" name="DCF223477725" base="" 
AllowScriptAccess="never"></embed></OBJECT><NOSCRIPT></NOSCRIPT> <DIV id=pub_autres></DIV></DIV> <DIV id=footer> <P>Firmnet © 2005-2009 - <A href="">Mentions légales</A> - <A href="">En savoir plus sur AEF</A> - <A href="">Contacts</A> - <A href="">CGV</A> - <A href="">Plan du site</A> - <A href="">Aide en ligne</A> - <A href=" Annonceur sur l'Annuaire des Entreprises de France">Devenir annonceur</A> <BR><A href="">Information entreprises</A> - <A href="">Achat fichiers</A> - <A href="">Recherche entreprises</A> - <A href="">Vente adresses</A> - <A href="">Vente fichiers</A> - <A href="">Liste des entreprises</A> - <A href="">Annuaire des entreprises</A> <BR><A href="" target=_blank>Annuaire des CCI</A> - <A href="" target=_blank>ACFCI</A> - <A href="" target=_blank>Portail consulaire :</A></P> <P></P></DIV></DIV> <script type=text/javascript> var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "' type='text/javascript'%3E%3C/script%3E")); </SCRIPT> <script type=text/javascript src=""></SCRIPT> <script type=text/javascript> var pageTracker = _gat._getTracker("UA-4621836-1"); pageTracker._initData(); pageTracker._trackPageview(); </SCRIPT>

And I want to extract the text like this:

SIRET 657 140 786 00010
Voie Le Montceau
Code postal 71580
Ville Le Fay
Pays FRANCE
Téléphone +33 3 85 74 10 79
Forme juridique EI (Entreprise Individuelle)

Thanks in advance

Qui ose gagne / Who Dares Win
CyberExploit
GEOSoft Posted April 1, 2010

You could make this much easier if you just posted a link to the page. I'm going cross-eyed just trying to read it in the code block.

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.
Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.
*** The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver: - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.
Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.
"Old age and treachery will always overcome youth and skill!"
jerem488 Posted April 1, 2010

I can't post the direct link, because the site is protected... But this is the site:

To get to the real page, enter "65714078600010" in the "SIREN/SIRET" edit control.
GEOSoft Posted April 1, 2010

jerem488 wrote: I can't put the direct link, because the site is protected... But this is this site : to have the real page, put this "65714078600010" in the "SIREN/SIRET" edit control

I can get the page source but it doesn't even resemble what you posted, so I'll have to work on the source you provided. I'll get back to it soon.
jerem488 Posted April 1, 2010

If you want to get the page source, you can run this code:

#include <IE.au3>
#include <Array.au3>
#include <File.au3>
#include <String.au3>

; Open the site in a visible IE window and wait for it to load
$Page_internet = _IECreate("", 0, 1, 1)

; Fill in the SIREN/SIRET field and submit the search
$Case_SIREN_SIRET = _IEGetObjById($Page_internet, "identifiant")
_IEPropertySet($Case_SIREN_SIRET, "innertext", "65714078600010")
$Bouton_Rechercher = _IEGetObjById($Page_internet, "valider")
_IEAction($Bouton_Rechercher, "click")
Sleep(6000) ; give the results page time to load

; Click the link in the results that leads to the company page
$oLinks = _IELinkGetCollection($Page_internet)
For $oLink In $oLinks
    If StringInStr($oLink, "") Then
        ConsoleWrite("YES" & @CRLF)
        _IEAction($oLink, "click")
    EndIf
Next
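A possible refinement of the fixed Sleep (6000) in the script above, offered as a sketch rather than something from the thread: IE.au3 provides _IELoadWait, which blocks until the browser reports that the page has loaded. The helper name _ClickAndWait and the 10-second timeout are assumptions made for illustration, and if the site fills in its results with script after the page has loaded, a short extra delay may still be needed.

```autoit
#include <IE.au3>

; Hypothetical helper (not from the thread): click an element, then block until
; the browser reports the page as loaded, or until $iTimeout ms have passed.
Func _ClickAndWait($oIE, $oElement, $iTimeout = 10000)
    _IEAction($oElement, "click")
    If @error Then Return SetError(1, 0, False) ; the click itself failed
    _IELoadWait($oIE, 0, $iTimeout)
    If @error Then Return SetError(2, 0, False) ; the page did not finish loading in time
    Return True
EndFunc   ;==>_ClickAndWait
```

With this in place, the pair _IEAction($Bouton_Rechercher, "click") and Sleep (6000) could be replaced by a single call such as _ClickAndWait($Page_internet, $Bouton_Rechercher), checking @error afterwards.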
GEOSoft Posted April 1, 2010

Like I said, I CAN get the page code but it is not the same as you posted. No matter, I'm working out a solution based on what you posted in post #1.
bogQ Posted April 1, 2010

#include <IE.au3>
#include <String.au3>
#include <array.au3>

; Attach to the IE window whose title contains "Fich" and read its visible text
$oIE = _IEAttach("Fich")
$test = _IEPropertyGet($oIE, "outertext")

; Each value sits between its own label and the label that follows it
$1 = _StringBetween($test, "SIRET", "Statut")
$2 = _StringBetween($test, "Voie ", "Code postal")
$3 = _StringBetween($test, "Code postal ", "Ville")
$4 = _StringBetween($test, "Ville", "Pays")
$5 = _StringBetween($test, "Pays ", "Téléphone")
$6 = _StringBetween($test, "Téléphone ", "Informations")
$7 = _StringBetween($test, "Forme juridique ", "Dirigeants")
MsgBox(0, "", $1[0] & @CRLF & $2[0] & @CRLF & $3[0] & @CRLF & $4[0] & @CRLF & $5[0] & @CRLF & $6[0] & @CRLF & $7[0])

I don't know whether the text on the page changes, but here's one attempt using _IEAttach.

TCP server and client - Learning about TCP servers and clients connection
Au3 oIrrlicht - Irrlicht project
Au3impact - Another 3D DLL game engine for autoit. (3impact 3Drad related)
There are those that believe that the perfect heist lies in the preparation. Some say that it’s all in the timing, seizing the right opportunity. Others even say it’s the ability to leave no trace behind, be a ghost.
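One caveat with the script above: it indexes each result with [0] without checking whether _StringBetween found anything, so a missing label would stop the script with an array error. A minimal sketch of a guard follows; the helper name _FieldBetween and the sample string are made up for illustration and are not from the thread.

```autoit
#include <String.au3>

; Hypothetical helper: return the trimmed text between two labels,
; or an empty string with @error set when either label is missing.
Func _FieldBetween($sText, $sStart, $sEnd)
    Local $aMatch = _StringBetween($sText, $sStart, $sEnd)
    If @error Then Return SetError(1, 0, "")
    Return StringStripWS($aMatch[0], 3) ; 3 = strip leading and trailing whitespace
EndFunc   ;==>_FieldBetween

; Sample text standing in for the page's outertext
Local $sSample = "SIRET 657 140 786 00010 Statut Industrie"
Local $sSiret = _FieldBetween($sSample, "SIRET", "Statut")
If @error Then
    ConsoleWrite("SIRET not found" & @CRLF)
Else
    ConsoleWrite("SIRET " & $sSiret & @CRLF)
EndIf
```

Trimming inside the helper also means the start label no longer needs a trailing space to keep the output tidy.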
GEOSoft Posted April 1, 2010 (edited)

Take everything that you have in the code block in post #1 and paste it into a text file named source.txt in the same folder as the script I am about to give you. Create an au3 file with this code and run it.

$sSrc = FileRead(@ScriptDir & "\source.txt")
; Capture from the first <dl> after "Identité" through the last </dl>
$aHold = StringRegExp($sSrc, "(?i)(?s).+?Identité.+?(<dl.+/dl>)", 1)
If NOT @Error Then
    ; Flag 3 returns a flat array: label, value, label, value, ...
    $aHold = StringRegExp($aHold[0], "(?i)<dt>(Siret|voie|Code postal|Ville|Pays|Téléphone|Forme juridique)\s*</dt>\s*<dd>(.+)</dd>", 3)
    If NOT @Error Then
        $sRtn = ""
        For $i = 0 To UBound($aHold) - 2 Step 2
            $sRtn &= $aHold[$i] & " " & $aHold[$i + 1] & @CRLF
        Next
        MsgBox(0, "Result", $sRtn)
    EndIf
EndIf

It should return what you want. If the source code for that page is changing then there will be issues, because the source that I got did not include some of the information you wanted. I think this code will still work, but it won't return what you wanted.

EDIT: I guess I could have been a nice guy and turned it into a function for you. Use this instead:

$sSrc = FileRead(@ScriptDir & "\source.txt")
$sInfo = _Info($sSrc)
If NOT @Error Then
    MsgBox(0, "Result", $sInfo)
Else
    MsgBox(0, "Error", "Error " & @Error & " was returned")
EndIf

Func _Info($s_Str)
    If NOT $s_Str Then Return SetError(1) ;; No source code
    Local $s_Rtn = ""
    Local $a_Hold = StringRegExp($s_Str, "(?i)(?s).+?Identité.+?(<dl.+/dl>)", 1)
    If NOT @Error Then
        $a_Hold = StringRegExp($a_Hold[0], "(?i)<dt>(Siret|voie|Code postal|Ville|Pays|Téléphone|Forme juridique)\s*</dt>\s*<dd>(.+)</dd>", 3)
        If NOT @Error Then
            For $i = 0 To UBound($a_Hold) - 2 Step 2
                $s_Rtn &= $a_Hold[$i] & " " & $a_Hold[$i + 1] & @CRLF
            Next
            Return $s_Rtn
        Else
            Return SetError(3) ;; Unable to create the information array
        EndIf
    EndIf
    Return SetError(2) ;; Unable to locate the proper section of the source
EndFunc ;==>_Info

Edited April 2, 2010 by GEOSoft
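As an alternative to running regular expressions over the saved source, the values could also be read straight from the live page through the DOM, which sidesteps the missing "ID" attributes by selecting elements by tag name. This is a different technique from the one posted above and only a sketch under stated assumptions: the result page is already open in an IE window whose title contains "Fich" (as in bogQ's post), every <dt> on the page has exactly one matching <dd> as in the source from post #1, and the script file is saved in an encoding that preserves the accented label.

```autoit
#include <IE.au3>

; Assumption: the result page is already open in an IE window titled "...Fich..."
Local $oIE = _IEAttach("Fich")
If @error Then Exit MsgBox(0, "Error", "Could not attach to the browser window")

; Labels to keep, delimited with "|" so that only whole labels match
Local $sWanted = "|SIRET|Voie|Code postal|Ville|Pays|Téléphone|Forme juridique|"

; Assumption: the n-th <dt> (label) pairs with the n-th <dd> (value)
Local $oTerms = _IETagNameGetCollection($oIE, "dt")
Local $oValues = _IETagNameGetCollection($oIE, "dd")

Local $sResult = "", $sLabel
For $i = 0 To $oTerms.length - 1
    $sLabel = StringStripWS($oTerms.item($i).innerText, 3) ; 3 = trim both ends
    If StringInStr($sWanted, "|" & $sLabel & "|") Then
        $sResult &= $sLabel & " " & StringStripWS($oValues.item($i).innerText, 3) & @CRLF
    EndIf
Next
MsgBox(0, "Result", $sResult)
```

Pairing by index keeps the sketch short, but it breaks if the page ever emits a <dt> without a <dd>; the regular expression version above does not depend on that.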
jerem488 Posted April 2, 2010

I don't know how to say how happy I am! It works great.

I spent many days searching for a solution to this and couldn't find one. If I can help you one day, I'd be really happy to.

Many thanks
