ATR

Help with regex between HTML tags

5 posts in this topic

#1 ·  Posted (edited)

Hi all,

I use regex often, but this time I can't find the right pattern!

<tr  class="bg">
                            <td class="tdhead">Gérant</td>
                                                            <td>
                                                                            Mme <a href="/dirigeants/fiche/q/HUBERT5-Angèle6-Gabrielle6-Mad6-19507-3779055008">
                                            HUBERT Angèle
                                        </a>
                                                                    </td>
                                                    </tr>
                                            
                        <tr  class="bg">
                            <td class="tdhead">Directeur des achats</td>
                                                            <td>
                                                                            M HUBERT Alain
                                                                    </td>
                                                    </tr>

I want to extract 'HUBERT Angèle' and 'M HUBERT Alain'.

My problem is that the <a> tag is not always present inside the last <td> ... </td> pair.

I tried this regex:

$HTML = StringRegExp($Data, '(?is)(?:<td>|<a.*>|)(.*?)(?=</(\w+)>)', 1)
If @error = 0 Then
    $HTML = StringStripWS($HTML[0], 7)
    ConsoleWrite($HTML & @LF)
EndIf

but it doesn't work :(

Edited by ATR


#2 ·  Posted (edited)

In that example, the names sit on the only lines that contain no tags, so:

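; Read the page and split it into lines (flag 2: no count in element 0)
$sFile = FileRead("test.html")
$aFile = StringSplit($sFile, @LF, 2)

$sOut = ""
For $i = 0 To UBound($aFile) - 1
    ; Trim leading and trailing whitespace (including a stray @CR)
    $sLine = StringStripWS($aFile[$i], 3)
    ; Keep only non-empty lines that contain no tag, one per output line
    If $sLine <> "" And Not StringInStr($sLine, "<") And Not StringInStr($sLine, ">") Then $sOut &= $sLine & @CRLF
Next

MsgBox(0, '', $sOut)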
Edited by boththose



#3 ·  Posted (edited)

One way:

#include <Array.au3>

$sData = '<tr  class="bg">' & @CRLF & _
'<td class="tdhead">Gérant</td>' & @CRLF & _
'<td>' & @CRLF & _
'Mme <a href="/dirigeants/fiche/q/HUBERT5-Angèle6-Gabrielle6-Mad6-19507-3779055008">' & @CRLF & _
'HUBERT Angèle' & @CRLF & _
'</a>' & @CRLF & _
'</td>' & @CRLF & _
'</tr>' & @CRLF & _
'' & @CRLF & _
'<tr  class="bg">' & @CRLF & _
'<td class="tdhead">Directeur des achats</td>' & @CRLF & _
'<td>' & @CRLF & _
'M HUBERT Alain' & @CRLF & _
'</td>' & @CRLF & _
'</tr>'

$aRes = StringRegExp($sData, "(?is)<a [^>]+>\s*(.+?)\s*</a>", 1)
If IsArray($aRes) Then MsgBox(0, "", $aRes[0])

Edit: incomplete code... Please use mikell's.

Edited by jguinch


#4 ·  Posted (edited)

Building on the wise remark from boththose:

#include <Array.au3>

$txt = Fileread("1.txt")
$r = StringRegExp($txt, '(?m)^\s*(\w[^<]+\w)\s*$', 3)
_ArrayDisplay($r)
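
One caveat with a pattern like this: in AutoIt's regex engine \w only matches ASCII letters, digits and underscore by default, so a name that starts or ends with an accented letter would be skipped. A minimal sketch of a workaround, assuming the (*UCP) option is acceptable for your data. The sample text and the variable names below are made up for illustration:

```autoit
#include <Array.au3>

; Hypothetical sample: a name that starts and ends with an accented letter.
; ChrW() is used so the script does not depend on how the .au3 file is encoded.
Local $sTxt = "<td>" & @CRLF & _
        "    " & ChrW(201) & "MILE Zo" & ChrW(233) & @CRLF & _
        "</td>"

; Same idea as the pattern above, prefixed with (*UCP) so that \w
; also matches accented letters. (*UCP) has to be the first thing in the pattern.
Local $aNames = StringRegExp($sTxt, '(*UCP)(?m)^\s*(\w[^<]+\w)\s*$', 3)
If @error Then
    MsgBox(0, "", "No match")
Else
    _ArrayDisplay($aNames)
EndIf
```

Without the (*UCP) prefix the same call should find nothing in this sample, because the first character of the name is not an ASCII word character.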

BTW jguinch, your $sData leaves out all the whitespace present in the original text.

Edit

And where is Alain?
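
For anyone who would rather anchor on the tags than on the line layout, here is a rough sketch that treats the <a ...> wrapper as optional, so it picks up both Angèle and Alain. It assumes every row has a <td class="tdhead"> cell directly followed by a plain <td>, which is true for the two sample rows but may not hold for the whole page. $sPattern and $aNames are just illustrative names:

```autoit
#include <Array.au3>

; Sample rows copied from the first post (indentation shortened)
Local $sData = '<tr  class="bg">' & @CRLF & _
        '    <td class="tdhead">Gérant</td>' & @CRLF & _
        '    <td>' & @CRLF & _
        '        Mme <a href="/dirigeants/fiche/q/HUBERT5-Angèle6-Gabrielle6-Mad6-19507-3779055008">' & @CRLF & _
        '            HUBERT Angèle' & @CRLF & _
        '        </a>' & @CRLF & _
        '    </td>' & @CRLF & _
        '</tr>' & @CRLF & _
        '<tr  class="bg">' & @CRLF & _
        '    <td class="tdhead">Directeur des achats</td>' & @CRLF & _
        '    <td>' & @CRLF & _
        '        M HUBERT Alain' & @CRLF & _
        '    </td>' & @CRLF & _
        '</tr>'

; 1. skip the label cell:            <td class="tdhead">...</td>
; 2. open the value cell:            <td>
; 3. optionally skip "Mme <a ...>":  (?:[^<]*<a [^>]*>)?
; 4. capture the text lazily, without the surrounding whitespace
; 5. optionally skip </a>, then require the closing </td>
Local $sPattern = '(?is)<td class="tdhead">[^<]*</td>\s*<td>\s*(?:[^<]*<a [^>]*>)?\s*(.*?)\s*(?:</a>\s*)?</td>'

; Flag 3 returns every match of the capturing group in one flat array
Local $aNames = StringRegExp($sData, $sPattern, 3)
If @error Then
    MsgBox(0, "", "No match")
Else
    _ArrayDisplay($aNames)
EndIf
```

On the two sample rows this should list 'HUBERT Angèle' and 'M HUBERT Alain'. When a link is present, the text before it (here "Mme") is dropped, which matches what was asked for in the first post.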

Edited by mikell

