
Help with multiple HTML tables with random values



Hello everyone.

It's been a long time since I last ran into a problem this complex.

I'm here to ask for help with the logic for a problem I don't have the brains or the skills to solve, or maybe both :D

I have 1500 reports coming from GPO-Backup. They are in HTML format, but I can get them in XML format as well.

Currently, I'm working with the HTML format.

 

My problem is the following:

I need to format all this data so that I can compare all the reports easily.

The problem is that each report has multiple tables containing multiple values, sometimes laid out in 5 columns and sometimes in 10, and each table can have any number of rows.

 

As an example, here are some of the different tables I need to work with:


<div class="gposummary">
<div class="he0_expanded"><span class="sectionTitle" tabindex="0">Général</span><a class="expando" href="#"></a></div>
<div class="container"><div class="he1"><span class="sectionTitle" tabindex="0">Détails</span><a class="expando" href="#"></a></div>
<div class="container"><div class="he4i"><table class="info" cellpadding="0" cellspacing="0">
<tr><td scope="row">Domaine</td><td>aXXXXXXXXXXr</td></tr>
<tr><td scope="row">Propriétaire</td><td>AC\Admins du domaine</td></tr>
<tr><td scope="row">Créé le</td><td>02/04/2014 19:19:24</td></tr>
<tr><td scope="row">Modifié le</td><td>07/03/2022 17:44:04</td></tr>
<tr><td scope="row">Révisions utilisateur</td><td>2 (AD), 2 (SYSVOL)</td></tr>
<tr><td scope="row">Révisions ordinateur</td><td>0 (AD), 0 (SYSVOL)</td></tr>
<tr><td scope="row">ID unique</td><td>{546eXXXXXXXXXXXXXXXXXXXf48}</td></tr>
<tr><td scope="row">État GPO</td><td>Tous les paramètres désactivés</td></tr>
</table></div></div>


<div class="he0_expanded"><span class="sectionTitle" tabindex="0">Configuration utilisateur (désactivée)</span><a class="expando" href="#"></a></div>
<div class="container"><div class="he1h_expanded"><span class="sectionTitle" tabindex="0">Stratégies</span><a class="expando" href="#"></a></div>
<div class="container"><div class="he1_expanded"><span class="sectionTitle" tabindex="0">Paramètres Windows</span><a class="expando" href="#"></a></div>
<div class="container"><div class="he2"><span class="sectionTitle" tabindex="0">Scripts</span><a class="expando" href="#"></a></div>
        <div class="container"><div class="he4"><span class="sectionTitle" tabindex="0">Ouvrir la session</span><a class="expando" href="#"></a></div>
    <div class="container">
<div class="he4i"><b>For this GPO, Script order:</b> Non configuré</div><div class="he4i"><table class="info" cellpadding="0" cellspacing="0">
<tr><th scope="col">Nom</th><th scope="col">Paramètres</th></tr>
<tr><td>aXXXXXXXXX.bat</td><td></td></tr>
</table>


The goal is to have a final view of all 1500 reports so I can compare them.

Each time a new field name is met, a new column is created for it, and that column is left empty for any report where the field does not exist.

One line per report.
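The layout described above (one row per report, one column for every field name seen in any report, empty cells where a report lacks a field) can be sketched in Python with the standard `csv` module. This is only a sketch: it assumes each report has already been parsed into a dict of field name to value, and `build_overview` / `overview_to_csv` are hypothetical names.

```python
import csv
import io


def build_overview(reports):
    """Merge parsed reports into one wide table.

    reports: list of (report name, {field: value}) pairs (assumed shape).
    Returns (columns, rows): columns follow the order in which each field
    name is first met; rows are dicts, one per report.
    """
    columns = {"Report": None}  # dict keys: unique, kept in first-seen order
    rows = []
    for report_name, fields in reports:
        for name in fields:
            columns.setdefault(name, None)
        rows.append({"Report": report_name, **fields})
    return list(columns), rows


def overview_to_csv(columns, rows):
    """Serialise the table as CSV text; missing fields become empty cells."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

`restval=""` is what leaves a cell empty when a report has no value for that column. When saving the text to a file for Excel, the `utf-8-sig` encoding usually lets accented field names such as Propriétaire display correctly.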

 

I've been struggling with this for too long now. If any of you can help with the logic for something like this, I'll take it!

😭

 

Edited by caramen

My video tutorials : ( In construction )  || My Discord : https://discord.gg/S9AnwHw

How to Ask Help ||  UIAutomation From Junkew || WebDriver From Danp2 || And Water's UDFs in the Quote

Spoiler

 Water's UDFs:
Active Directory (NEW 2018-10-19 - Version 1.4.10.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (2018-10-31 - Version 1.3.4.1) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
PowerPoint (2017-06-06 - Version 0.0.5.0) - Download - General Help & Support
Excel - Example Scripts - Wiki
Word - Wiki
 
Tutorials:

ADO - Wiki

 


All parts xD 

The purpose is to compare the 1500 reports in order to analyze them and delete useless GPOs across 13 different domains.

This is why it's complicated: every report has a different structure, although they all use almost the same HTML or XML building blocks.

I can provide more screenshots and/or samples of different reports.

 

Edited by caramen


It's not hard to extract the table fields, and it's not hard to compare them either, as long as the field names are consistent across reports.

Local $sFirstReport = '<div class="gposummary">' & @CRLF
$sFirstReport &= '<div class="he0_expanded"><span class="sectionTitle" tabindex="0">Général</span><a class="expando" href="#"></a></div>' & @CRLF
$sFirstReport &= '<div class="container"><div class="he1"><span class="sectionTitle" tabindex="0">Détails</span><a class="expando" href="#"></a></div>' & @CRLF
$sFirstReport &= '<div class="container"><div class="he4i"><table class="info" cellpadding="0" cellspacing="0">' & @CRLF
$sFirstReport &= '<tr><td scope="row">Domaine</td><td>aXXXXXXXXXXr</td></tr>' & @CRLF
$sFirstReport &= '<tr><td scope="row">Propriétaire</td><td>AC\Admins du domaine</td></tr>' & @CRLF
$sFirstReport &= '<tr><td scope="row">Créé le</td><td>02/04/2014 19:19:24</td></tr>' & @CRLF
$sFirstReport &= '<tr><td scope="row">Modifié le</td><td>07/03/2022 17:44:04</td></tr>' & @CRLF
$sFirstReport &= '<tr><td scope="row">Révisions utilisateur</td><td>2 (AD), 2 (SYSVOL)</td></tr>' & @CRLF
$sFirstReport &= '<tr><td scope="row">Révisions ordinateur</td><td>0 (AD), 0 (SYSVOL)</td></tr>' & @CRLF
$sFirstReport &= '<tr><td scope="row">ID unique</td><td>{546eXXXXXXXXXXXXXXXXXXXf48}</td></tr>' & @CRLF
$sFirstReport &= '<tr><td scope="row">État GPO</td><td>Tous les paramètres désactivés</td></tr>' & @CRLF
$sFirstReport &= '</table></div></div>'

Local $sSecondReport = '<div class="he0_expanded"><span class="sectionTitle" tabindex="0">Configuration utilisateur (désactivée)</span><a class="expando" href="#"></a></div>' & @CRLF
$sSecondReport &= '<div class="container"><div class="he1h_expanded"><span class="sectionTitle" tabindex="0">Stratégies</span><a class="expando" href="#"></a></div>' & @CRLF
$sSecondReport &= '<div class="container"><div class="he1_expanded"><span class="sectionTitle" tabindex="0">Paramètres Windows</span><a class="expando" href="#"></a></div>' & @CRLF
$sSecondReport &= '<div class="container"><div class="he2"><span class="sectionTitle" tabindex="0">Scripts</span><a class="expando" href="#"></a></div>' & @CRLF
$sSecondReport &= '        <div class="container"><div class="he4"><span class="sectionTitle" tabindex="0">Ouvrir la session</span><a class="expando" href="#"></a></div>' & @CRLF
$sSecondReport &= '    <div class="container">' & @CRLF
$sSecondReport &= '<div class="he4i"><b>For this GPO, Script order:</b> Non configuré</div><div class="he4i"><table class="info" cellpadding="0" cellspacing="0">' & @CRLF
$sSecondReport &= '<tr><th scope="col">Nom</th><th scope="col">Paramètres</th></tr>' & @CRLF
$sSecondReport &= '<tr><td>aXXXXXXXXX.bat</td><td></td></tr>' & @CRLF
$sSecondReport &= '</table>'

ConsoleWrite('First report table fields' & @CRLF)
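; Flag 4 ($STR_REGEXPARRAYGLOBALFULLMATCH) returns one array per matched row:
; [0] is the whole row, [1] the field name (first cell), [2] the value (second cell)
$aMatch = StringRegExp($sFirstReport, '<tr><td.*?>(.*?)<\/td><td>(.*?)<\/td><\/tr>', 4)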
For $Index = 0 To UBound($aMatch) - 1
    ConsoleWrite(($aMatch[$Index])[1] & ': ' & ($aMatch[$Index])[2] & @CRLF)
Next

ConsoleWrite('Second report table fields' & @CRLF)
$aMatch = StringRegExp($sSecondReport, '<tr><td.*?>(.*?)<\/td><td>(.*?)<\/td><\/tr>', 4)
For $Index = 0 To UBound($aMatch) - 1
    ConsoleWrite(($aMatch[$Index])[1] & ': ' & ($aMatch[$Index])[2] & @CRLF)
Next

Once you have all the table fields for each report, you need to identify the unique field names across all reports (or maybe you already have that list) and then compare the values.
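A rough Python equivalent of these two steps, assuming each report has already been read into a string. The regex is the same one used in the AutoIt example above; `html.unescape` is added to decode entities such as `&amp;`, which the raw regex would leave in place. `report_fields` and `unique_fields` are hypothetical names.

```python
import re
from html import unescape

# Same regex as the AutoIt example: a row with two <td> cells,
# the first holding the field name and the second its value.
ROW_RE = re.compile(r"<tr><td.*?>(.*?)</td><td>(.*?)</td></tr>")


def report_fields(text):
    """Return {field name: value} for every matching row of one report."""
    return {unescape(name): unescape(value) for name, value in ROW_RE.findall(text)}


def unique_fields(reports):
    """All field names across the given reports, in first-seen order, no duplicates."""
    seen = {}  # dict keys keep insertion order and ignore repeats
    for fields in reports:
        for name in fields:
            seen.setdefault(name, None)
    return list(seen)
```

Rows whose cells are `<th>` headers (like Nom / Paramètres in the second sample) do not match the pattern, so only the data rows below them are picked up.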

When the words fail... music speaks.


I did some tests with regexp and Python, but it became too complex for me once I had to handle arbitrary section titles in my logic.
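One way to cope with arbitrary section titles, sketched in Python with the standard-library `html.parser`, is to not match the titles at all: just remember the text of the last `<span class="sectionTitle">` seen and use it as a prefix for every field that follows. The class name, the "Section/Field" key format and the "Nom[1]" naming for tables with a `<th>` header row are hypothetical choices based only on the two samples posted in this thread; real reports may need more cases.

```python
from html.parser import HTMLParser


class GpoReportParser(HTMLParser):
    """Collect table cells from one GPO HTML report into a flat dict.

    Assumptions drawn only from the two samples in this thread:
    * section names are the text of <span class="sectionTitle">;
    * a <tr> made of <th> cells is a header row (e.g. Nom / Paramètres);
    * any other row is "name, value" unless a header row preceded it.
    Keys are prefixed with the last section title seen, so identical
    field names under different titles do not collide.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities arrive decoded
        self.fields = {}
        self.section = ""
        self._in_title = False
        self._cell = None         # text of the open <td>/<th>, else None
        self._row = []
        self._row_is_header = False
        self._headers = None      # header texts of the current table
        self._data_rows = 0       # data rows seen under those headers

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "sectionTitle") in attrs:
            self._in_title, self.section = True, ""
        elif tag == "table":
            self._headers, self._data_rows = None, 0
        elif tag == "tr":
            self._row, self._row_is_header = [], False
        elif tag in ("td", "th"):
            self._cell = ""
            if tag == "th":
                self._row_is_header = True

    def handle_data(self, data):
        if self._in_title:
            self.section += data
        elif self._cell is not None:
            self._cell += data

    def handle_endtag(self, tag):
        if tag == "span" and self._in_title:
            self._in_title = False
            self.section = self.section.strip()
        elif tag in ("td", "th") and self._cell is not None:
            self._row.append(self._cell.strip())
            self._cell = None
        elif tag == "tr":
            self._store_row()

    def _store_row(self):
        row, prefix = self._row, self.section + "/"
        if self._row_is_header:
            self._headers = row
        elif self._headers:
            # Column table: one key per header and row number, e.g. "Nom[1]".
            self._data_rows += 1
            for header, value in zip(self._headers, row):
                self.fields[f"{prefix}{header}[{self._data_rows}]"] = value
        elif len(row) >= 2:
            # Name/value table; any extra cells are joined into the value.
            self.fields[prefix + row[0]] = " | ".join(row[1:])


def parse_report(text):
    """Return {"Section/Field": value} for one report's HTML."""
    parser = GpoReportParser()
    parser.feed(text)
    parser.close()
    return parser.fields
```

Because `parse_report` returns a plain dict, its output can go straight into a "one row per report, one column per field" table.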


What would you do for the final overview file, and how?


If there is some consistency in these reports, it's not very complicated. Here is a basic example. Let's say you have two reports: one with many table rows (fields) and another with just two fields, one of which has a different value (in this case ID unique). You parse the HTML to get your data, store it in a map, and then loop through the keys to see if there are differences. The key array can be created manually if you know all possible field names, or by looping through all maps and collecting all unique keys. In the final loop, where the maps are compared, I used a simple If statement since I only compare two maps, but you can add a second loop to compare all the maps that contain a particular key.

Local $sFirstReport = '<div class="gposummary">' & @CRLF
$sFirstReport &= '<div class="he0_expanded"><span class="sectionTitle" tabindex="0">Général</span><a class="expando" href="#"></a></div>' & @CRLF
$sFirstReport &= '<div class="container"><div class="he1"><span class="sectionTitle" tabindex="0">Détails</span><a class="expando" href="#"></a></div>' & @CRLF
$sFirstReport &= '<div class="container"><div class="he4i"><table class="info" cellpadding="0" cellspacing="0">' & @CRLF
$sFirstReport &= '<tr><td scope="row">Domaine</td><td>aXXXXXXXXXXr</td></tr>' & @CRLF
$sFirstReport &= '<tr><td scope="row">Propriétaire</td><td>AC\Admins du domaine</td></tr>' & @CRLF
$sFirstReport &= '<tr><td scope="row">Créé le</td><td>02/04/2014 19:19:24</td></tr>' & @CRLF
$sFirstReport &= '<tr><td scope="row">Modifié le</td><td>07/03/2022 17:44:04</td></tr>' & @CRLF
$sFirstReport &= '<tr><td scope="row">Révisions utilisateur</td><td>2 (AD), 2 (SYSVOL)</td></tr>' & @CRLF
$sFirstReport &= '<tr><td scope="row">Révisions ordinateur</td><td>0 (AD), 0 (SYSVOL)</td></tr>' & @CRLF
$sFirstReport &= '<tr><td scope="row">ID unique</td><td>{546eXXXXXXXXXXXXXXXXXXXf48}</td></tr>' & @CRLF
$sFirstReport &= '<tr><td scope="row">État GPO</td><td>Tous les paramètres désactivés</td></tr>' & @CRLF
$sFirstReport &= '</table></div></div>'

Local $sSecondReport = '<div class="gposummary">' & @CRLF
$sSecondReport &= '<div class="he0_expanded"><span class="sectionTitle" tabindex="0">Général</span><a class="expando" href="#"></a></div>' & @CRLF
$sSecondReport &= '<div class="container"><div class="he1"><span class="sectionTitle" tabindex="0">Détails</span><a class="expando" href="#"></a></div>' & @CRLF
$sSecondReport &= '<div class="container"><div class="he4i"><table class="info" cellpadding="0" cellspacing="0">' & @CRLF
$sSecondReport &= '<tr><td scope="row">Révisions utilisateur</td><td>2 (AD), 2 (SYSVOL)</td></tr>' & @CRLF
$sSecondReport &= '<tr><td scope="row">ID unique</td><td>{546eXXXXXXXXXXXXXXXXXXXf50}</td></tr>' & @CRLF
$sSecondReport &= '</table></div></div>'

; This list of unique fields can be created manually or from all reports
Global $aKeyFields[8] = ['Domaine', 'Propriétaire', 'Créé le', 'Modifié le', 'Révisions utilisateur', 'Révisions ordinateur', 'ID unique', 'État GPO']

$aMatch = StringRegExp($sFirstReport, '<tr><td.*?>(.*?)<\/td><td>(.*?)<\/td><\/tr>', 4)
Local $mReport1[]
For $Index = 0 To UBound($aMatch) - 1
    $mReport1[($aMatch[$Index])[1]] = ($aMatch[$Index])[2]
Next

$aMatch = StringRegExp($sSecondReport, '<tr><td.*?>(.*?)<\/td><td>(.*?)<\/td><\/tr>', 4)
Local $mReport2[]
For $Index = 0 To UBound($aMatch) - 1
    $mReport2[($aMatch[$Index])[1]] = ($aMatch[$Index])[2]
Next

For $vKey In $aKeyFields
    If MapExists($mReport1, $vKey) And MapExists($mReport2, $vKey) Then
        If $mReport1[$vKey] <> $mReport2[$vKey] Then
            ConsoleWrite($vKey & @CRLF & $mReport1[$vKey] & @CRLF & $mReport2[$vKey] & @CRLF & @CRLF)
        EndIf
    EndIf
Next
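Comparing more than two reports, with a second loop over all the maps that have a particular key as suggested above, could look like this in Python. It is a sketch that assumes each report is already a dict of field name to value, keyed by a report name of your choice; `differing_fields` is a hypothetical name.

```python
def differing_fields(reports):
    """Find fields whose value is not the same in every report that has them.

    reports: {report name: {field: value}} (assumed shape).
    Returns {field: {report name: value}}. Fields present in only one
    report, or with identical values everywhere, are left out.
    """
    by_field = {}
    for report_name, fields in reports.items():
        for field, value in fields.items():
            by_field.setdefault(field, {})[report_name] = value
    return {
        field: values
        for field, values in by_field.items()
        if len(values) > 1 and len(set(values.values())) > 1
    }
```

With the two sample reports from the AutoIt example, only ID unique is reported, since Révisions utilisateur has the same value in both and the other fields exist in the first report only.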

 

Edited by Andreik


I'll figure it out and reply.


Here is a better example, where the field names are collected automatically and the code is organized into dedicated functions.

Global $sUniqueFields
Global $aReports[2]

; Two sample reports
$aReports[0] = '<div class="gposummary">' & @CRLF
$aReports[0] &= '<div class="he0_expanded"><span class="sectionTitle" tabindex="0">Général</span><a class="expando" href="#"></a></div>' & @CRLF
$aReports[0] &= '<div class="container"><div class="he1"><span class="sectionTitle" tabindex="0">Détails</span><a class="expando" href="#"></a></div>' & @CRLF
$aReports[0] &= '<div class="container"><div class="he4i"><table class="info" cellpadding="0" cellspacing="0">' & @CRLF
$aReports[0] &= '<tr><td scope="row">Domaine</td><td>aXXXXXXXXXXr</td></tr>' & @CRLF
$aReports[0] &= '<tr><td scope="row">Propriétaire</td><td>AC\Admins du domaine</td></tr>' & @CRLF
$aReports[0] &= '<tr><td scope="row">Créé le</td><td>02/04/2014 19:19:24</td></tr>' & @CRLF
$aReports[0] &= '<tr><td scope="row">Modifié le</td><td>07/03/2022 17:44:04</td></tr>' & @CRLF
$aReports[0] &= '<tr><td scope="row">Révisions utilisateur</td><td>2 (AD), 2 (SYSVOL)</td></tr>' & @CRLF
$aReports[0] &= '<tr><td scope="row">Révisions ordinateur</td><td>0 (AD), 0 (SYSVOL)</td></tr>' & @CRLF
$aReports[0] &= '<tr><td scope="row">ID unique</td><td>{546eXXXXXXXXXXXXXXXXXXXf48}</td></tr>' & @CRLF
$aReports[0] &= '<tr><td scope="row">État GPO</td><td>Tous les paramètres désactivés</td></tr>' & @CRLF
$aReports[0] &= '</table></div></div>'

$aReports[1] = '<div class="gposummary">' & @CRLF
$aReports[1] &= '<div class="he0_expanded"><span class="sectionTitle" tabindex="0">Général</span><a class="expando" href="#"></a></div>' & @CRLF
$aReports[1] &= '<div class="container"><div class="he1"><span class="sectionTitle" tabindex="0">Détails</span><a class="expando" href="#"></a></div>' & @CRLF
$aReports[1] &= '<div class="container"><div class="he4i"><table class="info" cellpadding="0" cellspacing="0">' & @CRLF
$aReports[1] &= '<tr><td scope="row">Révisions utilisateur</td><td>2 (AD), 2 (SYSVOL)</td></tr>' & @CRLF
$aReports[1] &= '<tr><td scope="row">ID unique</td><td>{546eXXXXXXXXXXXXXXXXXXXf50}</td></tr>' & @CRLF
$aReports[1] &= '</table></div></div>'

CompareReports($aReports)

Func CompareReports($aReports)
    Local $iReports = UBound($aReports)
    Local $mReports[$iReports]
    For $Index = 0 To $iReports - 1
        $mReports[$Index] = ReportToMap($aReports[$Index])
    Next
    For $vKey In StringSplit(StringTrimRight($sUniqueFields, 1), '|', 2)
        For $Index = 0 To $iReports - 1
            If MapExists($mReports[$Index], $vKey) Then
                ; Here you can compare whatever you want
                ConsoleWrite('Report: ' & ($Index + 1) & @CRLF & 'Field: ' & $vKey & @CRLF & 'Value: ' & ($mReports[$Index])[$vKey] & @CRLF & @CRLF)
            EndIf
        Next
    Next
EndFunc

Func ReportToMap($sReport)
    Local $mReport[]
    Local $aMatch = StringRegExp($sReport, '<tr><td.*?>(.*?)<\/td><td>(.*?)<\/td><\/tr>', 4)
    For $Index = 0 To UBound($aMatch) - 1
        $mReport[($aMatch[$Index])[1]] = ($aMatch[$Index])[2]
    Next
    CheckForUniqueFields($mReport)  ; This will dynamically construct the list of unique fields
    Return $mReport
EndFunc

Func CheckForUniqueFields($mMap)
    Local $aMapKeys = MapKeys($mMap)
    For $vKey In $aMapKeys
        ; Search for "|name|" so that a field whose name is contained in another field name is not skipped
        If Not StringInStr('|' & $sUniqueFields, '|' & $vKey & '|') Then $sUniqueFields &= $vKey & '|'
    Next
EndFunc
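The examples above keep the reports in string variables. With 1500 exported files, a loader along these lines (Python; `load_reports` is a hypothetical name) would read them from a folder and hand each one to whatever parsing function you use, much like `ReportToMap` is called once per report. The `*.html` pattern and the UTF-8 default are assumptions; depending on how the reports were exported they may be UTF-16, so adjust `encoding` if needed.

```python
from pathlib import Path


def load_reports(folder, parse, pattern="*.html", encoding="utf-8"):
    """Read every report file in *folder* and parse it.

    parse: any function taking the report text and returning its fields
    (for example a regex- or html.parser-based one).
    Returns {file name: parse(text)}, in file-name order.
    """
    return {
        path.name: parse(path.read_text(encoding=encoding))
        for path in sorted(Path(folder).glob(pattern))
    }
```

Passing the parser in as a function keeps file handling separate from HTML handling, so switching to the XML exports later would only mean swapping that function and the pattern.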

 
