ATR, posted April 24, 2015

Hi all,

I use regular expressions often, but here I can't find the right one! Given this HTML:

<tr class="bg">
<td class="tdhead">Gérant</td>
<td>
Mme <a href="/dirigeants/fiche/q/HUBERT5-Angèle6-Gabrielle6-Mad6-19507-3779055008">
HUBERT Angèle
</a>
</td>
</tr>

<tr class="bg">
<td class="tdhead">Directeur des achats</td>
<td>
M HUBERT Alain
</td>
</tr>

I want to extract 'HUBERT Angèle' and 'M HUBERT Alain'. My problem is that the last <td> ... </td> pair does not always contain an <a> tag.

I tested this regex:

$HTML = StringRegExp($Data, '(?is)(?:<td>|<a.*>|)(.*?)(?=</(\w+)>)', 1)
If @error = 0 Then
    $HTML = StringStripWS($HTML[0], 7)
    ConsoleWrite($HTML & @LF)
EndIf

but it doesn't work.

Edited April 24, 2015 by ATR
iamtheky, posted April 24, 2015

In that example, the names are the only lines without tags, so:

$sFile = FileRead("test.html")
$aFile = StringSplit($sFile, @LF, 2) ; 2 = no count element in $aFile[0]
$sOut = ""
For $i = 0 To UBound($aFile) - 1
    ; keep only lines that contain no tag characters at all
    If StringInStr($aFile[$i], "<") = 0 And StringInStr($aFile[$i], ">") = 0 Then
        $sLine = StringStripWS($aFile[$i], 3) ; 3 = strip leading and trailing whitespace, including @CR
        If $sLine <> "" Then $sOut &= $sLine & @CRLF ; skip blank lines
    EndIf
Next
MsgBox(0, '', $sOut)

Edited April 24, 2015 by boththose
jguinch, posted April 24, 2015

One way:

#include <Array.au3>

$sData = '<tr class="bg">' & @CRLF & _
        '<td class="tdhead">Gérant</td>' & @CRLF & _
        '<td>' & @CRLF & _
        'Mme <a href="/dirigeants/fiche/q/HUBERT5-Angèle6-Gabrielle6-Mad6-19507-3779055008">' & @CRLF & _
        'HUBERT Angèle' & @CRLF & _
        '</a>' & @CRLF & _
        '</td>' & @CRLF & _
        '</tr>' & @CRLF & _
        '' & @CRLF & _
        '<tr class="bg">' & @CRLF & _
        '<td class="tdhead">Directeur des achats</td>' & @CRLF & _
        '<td>' & @CRLF & _
        'M HUBERT Alain' & @CRLF & _
        '</td>' & @CRLF & _
        '</tr>'

$aRes = StringRegExp($sData, "(?is)<a [^>]+>\s*(.+?)\s*</a>", 1)
If IsArray($aRes) Then MsgBox(0, "", $aRes[0])

Edit: incomplete code (it only finds the name inside the <a> tag)... Please use mikell's solution.

Edited April 24, 2015 by jguinch

Network configuration UDF, _DirGetSizeByExtension, _UninstallList, Firefox Configuration, Array multi-dimensions, Printer Management UDF
Solution: mikell, posted April 24, 2015

Using the wise remark from boththose:

#include <Array.au3>

$txt = FileRead("1.txt")
$r = StringRegExp($txt, '(?m)^\s*(\w[^<]+\w)\s*$', 3)
_ArrayDisplay($r)

BTW jguinch, your $sData is missing the bunch of whitespace present in the original text.

Edit: And where is Alain?

Edited April 24, 2015 by mikell
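mikell's pattern relies on each name sitting alone on its own line. If the line layout might change, a single pattern can instead treat the <a> wrapper as optional inside a bare <td>. This is a minimal sketch, assuming every cell looks like the two in the first post (a bare <td> holding either plain text or at most one <a> link); the variable names are illustrative, and any text before the link ("Mme") is deliberately dropped to match the output the first post asks for. Regex-based HTML parsing stays fragile for anything less regular than this.

```autoit
; Sample data from the first post (the exact whitespace does not matter to the pattern).
Local $sData = '<tr class="bg">' & @CRLF & _
        '<td class="tdhead">Gérant</td>' & @CRLF & _
        '<td>' & @CRLF & _
        'Mme <a href="/dirigeants/fiche/q/HUBERT5-Angèle6-Gabrielle6-Mad6-19507-3779055008">' & @CRLF & _
        'HUBERT Angèle' & @CRLF & _
        '</a>' & @CRLF & _
        '</td>' & @CRLF & _
        '</tr>' & @CRLF & @CRLF & _
        '<tr class="bg">' & @CRLF & _
        '<td class="tdhead">Directeur des achats</td>' & @CRLF & _
        '<td>' & @CRLF & _
        'M HUBERT Alain' & @CRLF & _
        '</td>' & @CRLF & _
        '</tr>'

; <td>                 : a bare <td> only ("tdhead" cells have a class attribute, so they never match)
; (?:[^<]*<a [^>]*>)?  : optional leading text plus the opening <a ...> tag, skipped when absent
; \s*(.+?)\s*          : the wanted text, with surrounding whitespace left outside the capture
; (?:</a>\s*)?</td>    : optional closing </a>, then the closing </td>
; Flag 3 returns an array holding every match of capture group 1.
Local $aNames = StringRegExp($sData, '(?is)<td>\s*(?:[^<]*<a [^>]*>)?\s*(.+?)\s*(?:</a>\s*)?</td>', 3)
If Not @error Then
    For $i = 0 To UBound($aNames) - 1
        ConsoleWrite($aNames[$i] & @CRLF)
    Next
EndIf
```

Because the optional group is tried first, the link text wins when an <a> tag is present; when it is absent, the group fails at the closing </td> and the plain cell text is captured instead.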
jguinch, posted April 24, 2015

@mikell: Alain disappeared. He is probably reading the first post and will return soon... Well, sorry, I read too fast, as usual...