ATR
Posted April 24, 2015 (edited)

Hi all,

I often use regex, but this time I can't find the right pattern.

    <tr class="bg">
        <td class="tdhead">Gérant</td>
        <td>
            Mme <a href="/dirigeants/fiche/q/HUBERT5-Angèle6-Gabrielle6-Mad6-19507-3779055008">
                HUBERT Angèle
            </a>
        </td>
    </tr>

    <tr class="bg">
        <td class="tdhead">Directeur des achats</td>
        <td>
            M HUBERT Alain
        </td>
    </tr>

I want to extract 'HUBERT Angèle' and 'M HUBERT Alain'. My problem is that the <a> tags are not always present inside the last <td> ... </td> pair.

I tested with this regex:

    $HTML = StringRegExp($Data, '(?is)(?:<td>|<a.*>|)(.*?)(?=</(\w+)>)', 1)
    If @error = 0 Then
        $HTML = StringStripWS($HTML[0], 7)
        ConsoleWrite($HTML & @LF)
    EndIf

but it doesn't work.

iamtheky
Posted April 24, 2015 (edited)

In that example they are the only lines without tags, so:

    $sFile = FileRead("test.html")
    $aFile = StringSplit($sFile, @LF, 2)
    $sOut = ""

    For $i = 0 To UBound($aFile) - 1
        If StringInStr($aFile[$i], "<") = 0 And StringInStr($aFile[$i], ">") = 0 Then $sOut &= StringStripWS($aFile[$i], 1)
    Next

    MsgBox(0, '', $sOut)

Edited April 24, 2015 by boththose

jguinch
Posted April 24, 2015 (edited)

One way:

    #include <Array.au3>

    $sData = '<tr class="bg">' & @CRLF & _
            '<td class="tdhead">Gérant</td>' & @CRLF & _
            '<td>' & @CRLF & _
            'Mme <a href="/dirigeants/fiche/q/HUBERT5-Angèle6-Gabrielle6-Mad6-19507-3779055008">' & @CRLF & _
            'HUBERT Angèle' & @CRLF & _
            '</a>' & @CRLF & _
            '</td>' & @CRLF & _
            '</tr>' & @CRLF & _
            '' & @CRLF & _
            '<tr class="bg">' & @CRLF & _
            '<td class="tdhead">Directeur des achats</td>' & @CRLF & _
            '<td>' & @CRLF & _
            'M HUBERT Alain' & @CRLF & _
            '</td>' & @CRLF & _
            '</tr>'

    $aRes = StringRegExp($sData, "(?is)<a [^>]+>\s*(.+?)\s*</a>", 1)
    If IsArray($aRes) Then MsgBox(0, "", $aRes[0])

Edit: incomplete code... Please use mikell's.

mikell (Solution)
Posted April 24, 2015 (edited)

Using the wise remark from boththose:

    #include <Array.au3>

    $txt = FileRead("1.txt")
    $r = StringRegExp($txt, '(?m)^\s*(\w[^<]+\w)\s*$', 3)
    _ArrayDisplay($r)

BTW jguinch, your $sData is missing the bunch of whitespace present in the original text.

Edit: And where is Alain?

jguinch
Posted April 24, 2015

@mikell: Alain disappeared. He is probably reading the first post, and will return soon...

Well, sorry, I read too fast, as usual...
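
The accepted pattern relies on each name sitting on a line of its own, as in the sample. If the markup could arrive on a single line, one possible variation is to anchor on the plain <td> ... </td> cell and treat the <a> wrapper as optional. This is only a sketch under assumptions: the sample data is inlined, the variable names ($sData, $aNames) are illustrative, and it assumes the value cell is written exactly as <td> with no attributes and contains at most one <a> tag.

```autoit
; Sample input on very few lines, to show the pattern does not depend on line breaks.
Local $sData = '<tr class="bg"><td class="tdhead">Gérant</td><td> Mme ' & _
        '<a href="/dirigeants/fiche/q/HUBERT5-Angèle6-Gabrielle6-Mad6-19507-3779055008"> ' & _
        'HUBERT Angèle </a> </td></tr>' & @CRLF & _
        '<tr class="bg"><td class="tdhead">Directeur des achats</td>' & _
        '<td> M HUBERT Alain </td></tr>'

; <td>                      literal, so <td class="tdhead"> is skipped
; (?:[^<]*<a\b[^>]*>)?      optional text plus opening <a ...> tag
; ([^<]+?)                  the name, trimmed by the surrounding \s*
; (?:</a>\s*)?</td>         optional closing </a>, then the end of the cell
Local $sPattern = '(?i)<td>\s*(?:[^<]*<a\b[^>]*>)?\s*([^<]+?)\s*(?:</a>\s*)?</td>'

; Flag 3 returns every match as a flat array of the captured groups.
Local $aNames = StringRegExp($sData, $sPattern, 3)

If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    For $i = 0 To UBound($aNames) - 1
        ConsoleWrite($aNames[$i] & @CRLF)
    Next
EndIf
```

For the sample above the two captures are 'HUBERT Angèle' and 'M HUBERT Alain', which is what the first post asked for. For anything less regular than this snippet, a real HTML parser is likely a safer choice than a regex.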