Help with regex between HTML tags


ATR
Solved by mikell

Hi all,

I use regex often, but I can't find the right pattern for this case.

<tr  class="bg">
    <td class="tdhead">Gérant</td>
    <td>
        Mme <a href="/dirigeants/fiche/q/HUBERT5-Angèle6-Gabrielle6-Mad6-19507-3779055008">
            HUBERT Angèle
        </a>
    </td>
</tr>

<tr  class="bg">
    <td class="tdhead">Directeur des achats</td>
    <td>
        M HUBERT Alain
    </td>
</tr>

I want to extract 'HUBERT Angèle' and 'M HUBERT Alain'.

My problem is that the <a> tag is not always present between the last <td> ... </td> tags.

I tested with this regex:

$HTML = StringRegExp($Data, '(?is)(?:<td>|<a.*>|)(.*?)(?=</(\w+)>)', 1)
If @error = 0 Then
    $HTML = StringStripWS($HTML[0], 7)
    ConsoleWrite($HTML & @LF)
EndIf

but it doesn't work :(

Edited by ATR

In that example, the names are on the only lines that contain no tags, so:

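$sFile = FileRead("test.html")
$aFile = StringSplit($sFile, @LF, 2) ; flag 2: zero-based array without a count element

$sOut = ""
For $i = 0 To UBound($aFile) - 1
    ; Keep only the lines that contain no tag delimiters
    If StringInStr($aFile[$i], "<") = 0 And StringInStr($aFile[$i], ">") = 0 Then
        ; Flag 3 strips leading and trailing whitespace, including a stray @CR
        $sLine = StringStripWS($aFile[$i], 3)
        ; Skip blank lines and put each name on its own line
        If $sLine <> "" Then $sOut &= $sLine & @CRLF
    EndIf
Next

MsgBox(0, '', $sOut)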
Edited by boththose


One way:

#include <Array.au3>

$sData = '<tr  class="bg">' & @CRLF & _
'<td class="tdhead">Gérant</td>' & @CRLF & _
'<td>' & @CRLF & _
'Mme <a href="/dirigeants/fiche/q/HUBERT5-Angèle6-Gabrielle6-Mad6-19507-3779055008">' & @CRLF & _
'HUBERT Angèle' & @CRLF & _
'</a>' & @CRLF & _
'</td>' & @CRLF & _
'</tr>' & @CRLF & _
'' & @CRLF & _
'<tr  class="bg">' & @CRLF & _
'<td class="tdhead">Directeur des achats</td>' & @CRLF & _
'<td>' & @CRLF & _
'M HUBERT Alain' & @CRLF & _
'</td>' & @CRLF & _
'</tr>'

$aRes = StringRegExp($sData, "(?is)<a [^>]+>\s*(.+?)\s*</a>", 1)
If IsArray($aRes) Then MsgBox(0, "", $aRes[0])

Edit: incomplete code... Please use mikell's.
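
The pattern above only matches a name that is wrapped in an <a> tag, so it misses 'M HUBERT Alain'. As a minimal sketch of one way to cover both rows (not necessarily mikell's approach), the match can start at the bare <td> and treat the <a> tag as optional. It assumes the value cell is always written exactly as <td> with no attributes, as in the sample; $sPattern and $aNames are illustrative names only:

```autoit
; Sample rows from the question: the first name sits inside an <a> tag, the second does not.
Local $sData = '<tr  class="bg">' & @CRLF & _
        '<td class="tdhead">Gérant</td>' & @CRLF & _
        '<td>' & @CRLF & _
        'Mme <a href="/dirigeants/fiche/q/HUBERT5-Angèle6-Gabrielle6-Mad6-19507-3779055008">' & @CRLF & _
        'HUBERT Angèle' & @CRLF & _
        '</a>' & @CRLF & _
        '</td>' & @CRLF & _
        '</tr>' & @CRLF & _
        '<tr  class="bg">' & @CRLF & _
        '<td class="tdhead">Directeur des achats</td>' & @CRLF & _
        '<td>' & @CRLF & _
        'M HUBERT Alain' & @CRLF & _
        '</td>' & @CRLF & _
        '</tr>'

; <td>\s*                   bare value cell (the label cell has a class attribute, so it is skipped)
; (?:[^<]*<a\s[^>]*>\s*)?   optional: leading text such as "Mme " plus the opening <a ...> tag
; (.*?)                     the name, captured lazily
; \s*(?:</a>\s*)?</td>      optional closing </a>, then the end of the cell
Local $sPattern = '(?s)<td>\s*(?:[^<]*<a\s[^>]*>\s*)?(.*?)\s*(?:</a>\s*)?</td>'

; Flag 3 returns every match of the capturing group in a zero-based array.
Local $aNames = StringRegExp($sData, $sPattern, 3)
If @error = 0 Then
    For $i = 0 To UBound($aNames) - 1
        ConsoleWrite($aNames[$i] & @LF)
    Next
EndIf
```

On the sample rows this should write 'HUBERT Angèle' and 'M HUBERT Alain' to the console, one per line. If the value cell can carry attributes or nested tags other than <a>, the pattern would need to be adapted.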

Edited by jguinch