ATR

Regexp problem with the <strong> tag

7 posts in this topic

Hello,

I have a small regexp problem.

I have spent many hours looking for a solution, but I can't find one :(

I have a string:

localisationBlock....many text......<p><strong>Agence CFI Mâcon</strong><br>6&nbsp;grande rue de la Coupée<br>71850&nbsp;Charnay lès Mâcon</p>

The <strong> and </strong> tags are not always present.

My current regexp is: '(?s)localisationBlock.*?(?:<p><strong>|<p>)(.*?(?:</strong>)?)</p>.*?/div'

Thanks in advance




Maybe it would be easier to address the containing element in your webpage by name or id and then just read its innerhtml, instead of parsing the whole text with all the tags.
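A minimal sketch of that idea, under a few assumptions: $oIE is a browser object obtained earlier with _IECreate() or _IEAttach(), the target element is a div whose class contains "localisationBlock" (the class name comes from the sample in this thread), and the helper name _PrintLocalisationBlocks is made up for illustration. It reads innertext rather than innerhtml, because innertext is the property that comes back without the tags.

```autoit
#include <IE.au3>

; Hypothetical helper, not part of IE.au3.
; Assumes $oIE was obtained earlier via _IECreate() or _IEAttach().
Func _PrintLocalisationBlocks(ByRef $oIE)
    Local $oDivs = _IETagNameGetCollection($oIE, "div")
    If @error Then Return SetError(1, 0, 0)

    For $oDiv In $oDivs
        ; className holds all classes of the element as one string
        If StringInStr($oDiv.className, "localisationBlock") Then
            ; innerText is the rendered text of the element, without the markup
            ConsoleWrite($oDiv.innerText & @CRLF)
        EndIf
    Next
    Return 1
EndFunc
```

Whether the line breaks and non-breaking spaces come back exactly as wanted depends on the browser, so the result may still need a StringReplace() pass.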




#3 ·  Posted (edited)

In fact, I already use the _IETagNameGetCollection() function:

$Donnees = _IETagNameGetCollection($IE, "li")
For $Donnee In $Donnees
    If StringInStr($Donnee.classname, "visitCard withVisual sc") Then
        $Adresse = StringRegExp($Donnee.outerhtml, '(?s)localisationBlock.*?(?:<p><strong>|<p>)(.*?(?:</strong>)?)</p>.*?/div', 1)
        If @error = 0 Then
            ConsoleWrite($Adresse[0] & @CRLF)
        EndIf
    EndIf
Next

#4 ·  Posted (edited)

Hi,

You are not explaining what you want to get with your regular expression...

Edit: Please provide an example of the desired output in all cases. (with or without the strong tags as you said)

Br, FireFox.


        <div class="localisationBlock">
            <p><strong>Agence CFI Mâcon</strong><br>6&nbsp;grande rue de la Coupée<br>71850&nbsp;Charnay lès Mâcon</p>
            <ul>
            </div>
 
 
And I want:
 
Agence CFI Mâcon
6 grande rue de la Coupée
71850 Charnay lès Mâcon
 
Best regards


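#include <Array.au3>

; Read the HTML sample saved in t.txt
$s = FileRead("t.txt")

; Flag 3 returns every match: capture the text between a ">" and the next "<".
; [^<>\r\n]+ skips the empty matches between adjacent tags such as <p><strong>.
$a = StringRegExp($s, ">([^<>\r\n]+)<", 3)

_ArrayDisplay($a)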

 


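; Sample HTML from the post above
$txt = '  <div class="localisationBlock">' & @CRLF & _
        '           <p><strong>Agence CFI Mâcon</strong><br>6&nbsp;grande rue de la Coupée<br>71850&nbsp;Charnay lès Mâcon</p>' & @CRLF & _
        ' <ul>' & @CRLF & _
        '  </div>'

MsgBox(0, "", $txt)
; Replace every run of consecutive tags with a line break
$txt = StringRegExpReplace($txt, '(?s)(<.*?>)+', @CRLF)
; Turn &nbsp; into a plain space, then strip leading and trailing whitespace (flag 3)
$txt = StringStripWS(StringReplace($txt, '&nbsp;', " "), 3)
MsgBox(0, "", $txt)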

:P
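Since the tags are simply stripped, it should not matter whether <strong> is present or not. A minimal sketch of the same idea wrapped in a function, limited to the first <p>...</p> after "localisationBlock"; the name _ExtractAddress is made up for illustration and the pattern assumes the markup looks like the sample above:

```autoit
; Hypothetical helper, not a built-in function.
; Returns the address lines separated by @CRLF, or sets @error if nothing matches.
Func _ExtractAddress($sHtml)
    ; Capture the content of the first <p>...</p> that follows "localisationBlock"
    Local $aBlock = StringRegExp($sHtml, '(?s)localisationBlock.*?<p>(.*?)</p>', 1)
    If @error Then Return SetError(1, 0, "")

    ; Replace each run of tags (<strong>, </strong>, <br>, ...) with a line break
    Local $sText = StringRegExpReplace($aBlock[0], '(?:<[^>]*>)+', @CRLF)
    ; Turn &nbsp; into a plain space
    $sText = StringReplace($sText, '&nbsp;', ' ')
    ; Flag 3 strips leading and trailing whitespace
    Return StringStripWS($sText, 3)
EndFunc

; Usage with and without the <strong> tags
ConsoleWrite(_ExtractAddress('<div class="localisationBlock"><p><strong>Agence CFI Mâcon</strong><br>6&nbsp;grande rue de la Coupée</p></div>') & @CRLF)
ConsoleWrite(_ExtractAddress('<div class="localisationBlock"><p>Agence CFI Mâcon<br>6&nbsp;grande rue de la Coupée</p></div>') & @CRLF)
```

It can be called on $Donnee.outerhtml inside the loop from the earlier post, checking @error before using the result.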
