ATR

HTML regex

6 posts in this topic

Hi all,

I have a problem with a regex.

Here is part of the web page:

            <tr>
                <td class="leftElement large1">chiffre</td>
                <td>46 348 &euro;</td>
                <td>12 030 &euro;</td>
                <td class="colorGreen">+ 285.27%</td>
            </tr>
            <tr>
                <td class="leftElement large1">Resultat</td>
                <td>1 284 &euro;</td>
                <td>-20 262 &euro;</td>
                <td class="colorGreen">+ 106.34%</td>
            </tr>
 
I can extract "46348" and "1284", but I also want to extract "12030" and "-20262".
 
I have tested many regexes, but none of them work.
 
$Balises_tr = _StringBetween($texte, "<tr", "</tr>", -1)
If @error = 0 Then
    For $Balise = 0 To UBound($Balises_tr) - 1
        If StringInStr($Balises_tr[$Balise], "Chiffre d'affaires") Then
            $CA = StringRegExpReplace($Balises_tr[$Balise], "[^\w</>]", "")
            $CA = StringRegExp($CA, "(?is)affaire.*?(\d+K|\d+)", 1 )
            If @error = 0 Then
                ConsoleWrite("TEST : " & $CA[0] & @LF)
            EndIf
        EndIf
        If StringInStr($Balises_tr[$Balise], "Resultat net") Then
            $ResultatNet = StringRegExpReplace($Balises_tr[$Balise], "[^\w</>]", "")
            $ResultatNet = StringRegExp($ResultatNet, "(?is)net.*?(\d+K|\d+)", 1)
            If @error = 0 Then
                ConsoleWrite("TEST_2 : " & $ResultatNet[0] & @LF)
            EndIf
        EndIf
    Next
EndIf

 


#include <Array.au3>
$txt = FileRead("1.txt")
$Balises_tr = StringRegExp($txt, '(?s)<tr(.+?)</tr', 3) ; replaces _StringBetween($texte, "<tr", "</tr>", -1)
$title = "chiffre"
If @error = 0 Then
    For $Balise = 0 To UBound($Balises_tr) - 1
        $res = StringRegExp($Balises_tr[$Balise], '>([\d\h-]+)', 3)
        _ArrayDisplay($res, $title)
        $title = "résultat"
    Next
EndIf


#3 ·  Posted (edited)

mikell's example returns the first pair of amounts, then the second pair of amounts.

What I think ATR is after is the second amount from each pair.

#include <Array.au3>

Local $texte = StringRegExpReplace(FileRead(@ScriptFullPath), "(?s)^.*#cs\s*(.+)\s*#ce.*$", "\1") ; Get test string for this script.
;ConsoleWrite($texte & @LF)

Local $aArr = StringRegExp(StringStripWS($texte, 8), "(-?\d+)&euro;", 3)
_ArrayDisplay($aArr, "All") ; Returns 46348, 12030, 1284, -20262

Local $aArr1 = StringRegExp(StringStripWS($texte, 8), "(-?\d+)&euro;.+?-?\d+&euro;.+?", 3)
_ArrayDisplay($aArr1, "First Amounts from each pair"); Returns 46348, 1284

Local $aArr2 = StringRegExp(StringStripWS($texte, 8), "-?\d+&euro;.+?(-?\d+)&euro;.+?", 3)
_ArrayDisplay($aArr2, "Second Amounts from each pair"); Returns 12030, -20262

Local $aArr3 = StringRegExp(StringStripWS($texte, 8), "(?s)leftElement.+?(-?\d+)&euro;.+?(-?\d+)&euro;.+", 3)
_ArrayDisplay($aArr3, "First pair of amounts"); Returns 46348, 12030

Local $aArr4 = StringRegExp(StringStripWS($texte, 8), "(?s).+leftElement.+?(-?\d+)&euro;.+?(-?\d+)&euro;.+", 3)
_ArrayDisplay($aArr4, "Second pair of amounts"); Returns 1284, -20262

#cs
<tr>
                <td class="leftElement large1">chiffre</td>
                <td>46 348 &euro;</td>
                <td>12 030 &euro;</td>
                <td class="colorGreen">+ 285.27%</td>
            </tr>
            <tr>
                <td class="leftElement large1">Resultat</td>
                <td>1 284 &euro;</td>
                <td>-20 262 &euro;</td>
                <td class="colorGreen">+ 106.34%</td>
            </tr>
#ce
Edited by Malkey


Malkey, seriously, my code was more than easy to adapt  :)

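Local $txt = FileRead("1.txt"), $result = ""
$Balises_tr = StringRegExp($txt, '(?s)<tr(.+?)</tr', 3) ; one array element per table row
If @error = 0 Then
    For $Balise = 0 To UBound($Balises_tr) - 1
        ; Capture each run of digits, spaces and minus signs that directly follows a ">"
        $res = StringRegExp($Balises_tr[$Balise], '>([\d\h-]+)', 3)
        ; Skip rows with fewer than two amounts; keep the second amount without its spaces
        If UBound($res) > 1 Then $result &= StringStripWS($res[1], 8) & @CRLF
    Next
EndIf
MsgBox(0, "", $result)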

Share this post


Link to post
Share on other sites

#6 ·  Posted (edited)

Search in the helpfile for "Tutorial - Regular Expression" and at the bottom you will see a button called "StringRegExpGUI.au3"
Click it and it will load an example into SciTE.
Now RUN IT so a GUI launches.
It is a Regex tester that makes it easy to figure out what you want to do.
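
If you would rather check a pattern from a short script than from the GUI, something like the following may help. It is only a minimal sketch: the sample row is hard-coded from the first post, and the variable names ($sRow, $sPattern, $aMatches) and the pattern itself are my own assumptions, not anything taken from the helpfile example.

```autoit
; Sample row copied from the first post (hard-coded here purely for testing).
Local $sRow = '<tr><td class="leftElement large1">Resultat</td>' & _
        '<td>1 284 &euro;</td><td>-20 262 &euro;</td>' & _
        '<td class="colorGreen">+ 106.34%</td></tr>'

; Pattern under test (an assumption): an optional minus sign, then digits and
; spaces, immediately followed by optional spaces and the &euro; entity.
Local $sPattern = '(-?\d[\d\h]*?)\h*&euro;'

; Flag 3 returns an array holding every global match of the capture group.
Local $aMatches = StringRegExp($sRow, $sPattern, 3)
If @error Then
    ConsoleWrite("No match, @error = " & @error & @CRLF)
Else
    For $i = 0 To UBound($aMatches) - 1
        ; Flag 8 of StringStripWS removes all spaces, including thousands separators.
        ConsoleWrite($i & " : " & StringStripWS($aMatches[$i], 8) & @CRLF)
    Next
EndIf
```

Change $sPattern and run the script again to see in the SciTE console how each variation behaves on the same row.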

Edited by MBALZESHARI

