caramen Posted August 8, 2023 (edited)

Hello everyone. It has been a long time since I ran into a problem this complex. I'm here to ask for help with the logic, because I don't have the brain or the skills (or maybe both) to solve it.

I have 1500 reports coming from GPO-Backup. They are in HTML format, but they are available in XML format as well. For now, I'm working with the HTML format.

My problem is the following: I need to format all this data so that I can compare all the reports in an easy way. Each report has multiple tables that contain multiple values, sometimes arranged in 5 columns and sometimes in 10, and each table can have any number of rows.

As an example, here are some of the different tables I may need to work with:

    <div class="gposummary">
    <div class="he0_expanded"><span class="sectionTitle" tabindex="0">Général</span><a class="expando" href="#"></a></div>
    <div class="container"><div class="he1"><span class="sectionTitle" tabindex="0">Détails</span><a class="expando" href="#"></a></div>
    <div class="container"><div class="he4i"><table class="info" cellpadding="0" cellspacing="0">
    <tr><td scope="row">Domaine</td><td>aXXXXXXXXXXr</td></tr>
    <tr><td scope="row">Propriétaire</td><td>AC\Admins du domaine</td></tr>
    <tr><td scope="row">Créé le</td><td>02/04/2014 19:19:24</td></tr>
    <tr><td scope="row">Modifié le</td><td>07/03/2022 17:44:04</td></tr>
    <tr><td scope="row">Révisions utilisateur</td><td>2 (AD), 2 (SYSVOL)</td></tr>
    <tr><td scope="row">Révisions ordinateur</td><td>0 (AD), 0 (SYSVOL)</td></tr>
    <tr><td scope="row">ID unique</td><td>{546eXXXXXXXXXXXXXXXXXXXf48}</td></tr>
    <tr><td scope="row">État GPO</td><td>Tous les paramètres désactivés</td></tr>
    </table></div></div>
    <div class="he0_expanded"><span class="sectionTitle" tabindex="0">Configuration utilisateur (désactivée)</span><a class="expando" href="#"></a></div>
    <div class="container"><div class="he1h_expanded"><span class="sectionTitle" tabindex="0">Stratégies</span><a class="expando" href="#"></a></div>
    <div class="container"><div class="he1_expanded"><span class="sectionTitle" tabindex="0">Paramètres Windows</span><a class="expando" href="#"></a></div>
    <div class="container"><div class="he2"><span class="sectionTitle" tabindex="0">Scripts</span><a class="expando" href="#"></a></div>
    <div class="container"><div class="he4"><span class="sectionTitle" tabindex="0">Ouvrir la session</span><a class="expando" href="#"></a></div>
    <div class="container">
    <div class="he4i"><b>For this GPO, Script order:</b> Non configuré</div><div class="he4i"><table class="info" cellpadding="0" cellspacing="0">
    <tr><th scope="col">Nom</th><th scope="col">Paramètres</th></tr>
    <tr><td>aXXXXXXXXX.bat</td><td></td></tr>
    </table>

The goal is a final view of all 1500 reports so that they can be compared. Each time a new column name is met, we create a new column, and we leave it empty for any report that does not have it. One line per report.

I have been struggling with this for too long now. If any of you can help with the logic for something like this, I'll take it! 😭

Edited August 8, 2023 by caramen
Andreik Posted August 8, 2023

It's not very clear which part you want to extract and which data you want to compare.
caramen Posted August 8, 2023 (edited)

All parts xD

The purpose is to compare the 1500 reports in order to analyze and delete useless GPOs from 13 different domains. This is why it's complicated: the reports all have different contents, but they use almost the same HTML or XML structure.

I can provide more pictures and/or samples of different reports.

Edited August 8, 2023 by caramen
Andreik Posted August 8, 2023

It's not hard to get the table fields, and it's not hard to compare them either, as long as the field names are reasonably consistent.

    Local $sFirstReport = '<div class="gposummary">' & @CRLF
    $sFirstReport &= '<div class="he0_expanded"><span class="sectionTitle" tabindex="0">Général</span><a class="expando" href="#"></a></div>' & @CRLF
    $sFirstReport &= '<div class="container"><div class="he1"><span class="sectionTitle" tabindex="0">Détails</span><a class="expando" href="#"></a></div>' & @CRLF
    $sFirstReport &= '<div class="container"><div class="he4i"><table class="info" cellpadding="0" cellspacing="0">' & @CRLF
    $sFirstReport &= '<tr><td scope="row">Domaine</td><td>aXXXXXXXXXXr</td></tr>' & @CRLF
    $sFirstReport &= '<tr><td scope="row">Propriétaire</td><td>AC\Admins du domaine</td></tr>' & @CRLF
    $sFirstReport &= '<tr><td scope="row">Créé le</td><td>02/04/2014 19:19:24</td></tr>' & @CRLF
    $sFirstReport &= '<tr><td scope="row">Modifié le</td><td>07/03/2022 17:44:04</td></tr>' & @CRLF
    $sFirstReport &= '<tr><td scope="row">Révisions utilisateur</td><td>2 (AD), 2 (SYSVOL)</td></tr>' & @CRLF
    $sFirstReport &= '<tr><td scope="row">Révisions ordinateur</td><td>0 (AD), 0 (SYSVOL)</td></tr>' & @CRLF
    $sFirstReport &= '<tr><td scope="row">ID unique</td><td>{546eXXXXXXXXXXXXXXXXXXXf48}</td></tr>' & @CRLF
    $sFirstReport &= '<tr><td scope="row">État GPO</td><td>Tous les paramètres désactivés</td></tr>' & @CRLF
    $sFirstReport &= '</table></div></div>'

    Local $sSecondReport = '<div class="he0_expanded"><span class="sectionTitle" tabindex="0">Configuration utilisateur (désactivée)</span><a class="expando" href="#"></a></div>' & @CRLF
    $sSecondReport &= '<div class="container"><div class="he1h_expanded"><span class="sectionTitle" tabindex="0">Stratégies</span><a class="expando" href="#"></a></div>' & @CRLF
    $sSecondReport &= '<div class="container"><div class="he1_expanded"><span class="sectionTitle" tabindex="0">Paramètres Windows</span><a class="expando" href="#"></a></div>' & @CRLF
    $sSecondReport &= '<div class="container"><div class="he2"><span class="sectionTitle" tabindex="0">Scripts</span><a class="expando" href="#"></a></div>' & @CRLF
    $sSecondReport &= ' <div class="container"><div class="he4"><span class="sectionTitle" tabindex="0">Ouvrir la session</span><a class="expando" href="#"></a></div>' & @CRLF
    $sSecondReport &= ' <div class="container">' & @CRLF
    $sSecondReport &= '<div class="he4i"><b>For this GPO, Script order:</b> Non configuré</div><div class="he4i"><table class="info" cellpadding="0" cellspacing="0">' & @CRLF
    $sSecondReport &= '<tr><th scope="col">Nom</th><th scope="col">Paramètres</th></tr>' & @CRLF
    $sSecondReport &= '<tr><td>aXXXXXXXXX.bat</td><td></td></tr>' & @CRLF
    $sSecondReport &= '</table>'

    ; Flag 4 returns an array of arrays: [0] = whole match, [1] = first cell, [2] = second cell
    ConsoleWrite('First report table fields' & @CRLF)
    Local $aMatch = StringRegExp($sFirstReport, '<tr><td.*?>(.*?)<\/td><td>(.*?)<\/td><\/tr>', 4)
    For $Index = 0 To UBound($aMatch) - 1
        ConsoleWrite(($aMatch[$Index])[1] & ': ' & ($aMatch[$Index])[2] & @CRLF)
    Next

    ConsoleWrite('Second report table fields' & @CRLF)
    $aMatch = StringRegExp($sSecondReport, '<tr><td.*?>(.*?)<\/td><td>(.*?)<\/td><\/tr>', 4)
    For $Index = 0 To UBound($aMatch) - 1
        ConsoleWrite(($aMatch[$Index])[1] & ': ' & ($aMatch[$Index])[2] & @CRLF)
    Next

Basically, after you get all the table fields for each report, you have to identify all the unique fields across all reports (or maybe you already have them) and then compare the data.
caramen Posted August 8, 2023

I did some tests with regexp and Python, but it became too complex for me once I had to handle the random section titles in my logic.
caramen Posted August 8, 2023

What would you do for the final overview file, and how?
argumentum Posted August 8, 2023

(Not to solve/help, but to say it's good to see you around.)
Andreik Posted August 8, 2023 (edited)

If you have some consistency in these reports, it's not very complicated. Here is a basic example. Let's say you have two reports: one with many table rows (fields), and another with just two fields, one of which has a different value (in this case ID unique). Basically, you parse the HTML to get your data, create a map for each report, and then loop through the keys to see if there are differences. The keys array can be created manually if you know all the possible field names, or by looping through all the maps and collecting the unique keys. In the final loop, where the maps are compared, I used a simple If statement since I compare just two maps, but you can add a second loop to compare all the maps that have a particular key.

    Local $sFirstReport = '<div class="gposummary">' & @CRLF
    $sFirstReport &= '<div class="he0_expanded"><span class="sectionTitle" tabindex="0">Général</span><a class="expando" href="#"></a></div>' & @CRLF
    $sFirstReport &= '<div class="container"><div class="he1"><span class="sectionTitle" tabindex="0">Détails</span><a class="expando" href="#"></a></div>' & @CRLF
    $sFirstReport &= '<div class="container"><div class="he4i"><table class="info" cellpadding="0" cellspacing="0">' & @CRLF
    $sFirstReport &= '<tr><td scope="row">Domaine</td><td>aXXXXXXXXXXr</td></tr>' & @CRLF
    $sFirstReport &= '<tr><td scope="row">Propriétaire</td><td>AC\Admins du domaine</td></tr>' & @CRLF
    $sFirstReport &= '<tr><td scope="row">Créé le</td><td>02/04/2014 19:19:24</td></tr>' & @CRLF
    $sFirstReport &= '<tr><td scope="row">Modifié le</td><td>07/03/2022 17:44:04</td></tr>' & @CRLF
    $sFirstReport &= '<tr><td scope="row">Révisions utilisateur</td><td>2 (AD), 2 (SYSVOL)</td></tr>' & @CRLF
    $sFirstReport &= '<tr><td scope="row">Révisions ordinateur</td><td>0 (AD), 0 (SYSVOL)</td></tr>' & @CRLF
    $sFirstReport &= '<tr><td scope="row">ID unique</td><td>{546eXXXXXXXXXXXXXXXXXXXf48}</td></tr>' & @CRLF
    $sFirstReport &= '<tr><td scope="row">État GPO</td><td>Tous les paramètres désactivés</td></tr>' & @CRLF
    $sFirstReport &= '</table></div></div>'

    Local $sSecondReport = '<div class="gposummary">' & @CRLF
    $sSecondReport &= '<div class="he0_expanded"><span class="sectionTitle" tabindex="0">Général</span><a class="expando" href="#"></a></div>' & @CRLF
    $sSecondReport &= '<div class="container"><div class="he1"><span class="sectionTitle" tabindex="0">Détails</span><a class="expando" href="#"></a></div>' & @CRLF
    $sSecondReport &= '<div class="container"><div class="he4i"><table class="info" cellpadding="0" cellspacing="0">' & @CRLF
    $sSecondReport &= '<tr><td scope="row">Révisions utilisateur</td><td>2 (AD), 2 (SYSVOL)</td></tr>' & @CRLF
    $sSecondReport &= '<tr><td scope="row">ID unique</td><td>{546eXXXXXXXXXXXXXXXXXXXf50}</td></tr>' & @CRLF
    $sSecondReport &= '</table></div></div>'

    ; This list of unique fields can be created manually or from all reports
    Global $aKeyFields[7] = ['Domaine', 'Propriétaire', 'Modifié le', 'Révisions utilisateur', 'Révisions ordinateur', 'ID unique', 'État GPO']

    ; Parse each report into a map: field name => field value
    Local $aMatch = StringRegExp($sFirstReport, '<tr><td.*?>(.*?)<\/td><td>(.*?)<\/td><\/tr>', 4)
    Local $mReport1[]
    For $Index = 0 To UBound($aMatch) - 1
        $mReport1[($aMatch[$Index])[1]] = ($aMatch[$Index])[2]
    Next

    $aMatch = StringRegExp($sSecondReport, '<tr><td.*?>(.*?)<\/td><td>(.*?)<\/td><\/tr>', 4)
    Local $mReport2[]
    For $Index = 0 To UBound($aMatch) - 1
        $mReport2[($aMatch[$Index])[1]] = ($aMatch[$Index])[2]
    Next

    ; Print every field that exists in both reports but has a different value
    For $vKey In $aKeyFields
        If MapExists($mReport1, $vKey) And MapExists($mReport2, $vKey) Then
            If $mReport1[$vKey] <> $mReport2[$vKey] Then
                ConsoleWrite($vKey & @CRLF & $mReport1[$vKey] & @CRLF & $mReport2[$vKey] & @CRLF & @CRLF)
            EndIf
        EndIf
    Next

Edited August 8, 2023 by Andreik
caramen Posted August 8, 2023

I will figure it out and reply.
Andreik Posted August 8, 2023

Here is a better example, where the field names are obtained automatically and the code is better organized into specific functions.

    Global $sUniqueFields = '' ; '|'-terminated list of every field name seen so far
    Global $aReports[2]

    ; Two sample reports
    $aReports[0] = '<div class="gposummary">' & @CRLF
    $aReports[0] &= '<div class="he0_expanded"><span class="sectionTitle" tabindex="0">Général</span><a class="expando" href="#"></a></div>' & @CRLF
    $aReports[0] &= '<div class="container"><div class="he1"><span class="sectionTitle" tabindex="0">Détails</span><a class="expando" href="#"></a></div>' & @CRLF
    $aReports[0] &= '<div class="container"><div class="he4i"><table class="info" cellpadding="0" cellspacing="0">' & @CRLF
    $aReports[0] &= '<tr><td scope="row">Domaine</td><td>aXXXXXXXXXXr</td></tr>' & @CRLF
    $aReports[0] &= '<tr><td scope="row">Propriétaire</td><td>AC\Admins du domaine</td></tr>' & @CRLF
    $aReports[0] &= '<tr><td scope="row">Créé le</td><td>02/04/2014 19:19:24</td></tr>' & @CRLF
    $aReports[0] &= '<tr><td scope="row">Modifié le</td><td>07/03/2022 17:44:04</td></tr>' & @CRLF
    $aReports[0] &= '<tr><td scope="row">Révisions utilisateur</td><td>2 (AD), 2 (SYSVOL)</td></tr>' & @CRLF
    $aReports[0] &= '<tr><td scope="row">Révisions ordinateur</td><td>0 (AD), 0 (SYSVOL)</td></tr>' & @CRLF
    $aReports[0] &= '<tr><td scope="row">ID unique</td><td>{546eXXXXXXXXXXXXXXXXXXXf48}</td></tr>' & @CRLF
    $aReports[0] &= '<tr><td scope="row">État GPO</td><td>Tous les paramètres désactivés</td></tr>' & @CRLF
    $aReports[0] &= '</table></div></div>'

    $aReports[1] = '<div class="gposummary">' & @CRLF
    $aReports[1] &= '<div class="he0_expanded"><span class="sectionTitle" tabindex="0">Général</span><a class="expando" href="#"></a></div>' & @CRLF
    $aReports[1] &= '<div class="container"><div class="he1"><span class="sectionTitle" tabindex="0">Détails</span><a class="expando" href="#"></a></div>' & @CRLF
    $aReports[1] &= '<div class="container"><div class="he4i"><table class="info" cellpadding="0" cellspacing="0">' & @CRLF
    $aReports[1] &= '<tr><td scope="row">Révisions utilisateur</td><td>2 (AD), 2 (SYSVOL)</td></tr>' & @CRLF
    $aReports[1] &= '<tr><td scope="row">ID unique</td><td>{546eXXXXXXXXXXXXXXXXXXXf50}</td></tr>' & @CRLF
    $aReports[1] &= '</table></div></div>'

    CompareReports($aReports)

    Func CompareReports($aReports)
        Local $iReports = UBound($aReports)
        Local $mReports[$iReports]
        ; Parse every report into its own map
        For $Index = 0 To $iReports - 1
            $mReports[$Index] = ReportToMap($aReports[$Index])
        Next
        ; Walk through every known field name and every report that has it
        For $vKey In StringSplit(StringTrimRight($sUniqueFields, 1), '|', 2)
            For $Index = 0 To $iReports - 1
                If MapExists($mReports[$Index], $vKey) Then
                    ; Here you can compare whatever you want
                    ConsoleWrite('Report: ' & ($Index + 1) & @CRLF & 'Field: ' & $vKey & @CRLF & 'Value: ' & ($mReports[$Index])[$vKey] & @CRLF & @CRLF)
                EndIf
            Next
        Next
    EndFunc

    Func ReportToMap($sReport)
        Local $mReport[]
        Local $aMatch = StringRegExp($sReport, '<tr><td.*?>(.*?)<\/td><td>(.*?)<\/td><\/tr>', 4)
        For $Index = 0 To UBound($aMatch) - 1
            $mReport[($aMatch[$Index])[1]] = ($aMatch[$Index])[2]
        Next
        CheckForUniqueFields($mReport) ; This will dynamically construct the list of unique fields
        Return $mReport
    EndFunc

    Func CheckForUniqueFields($mMap)
        Local $aMapKeys = MapKeys($mMap)
        For $vKey In $aMapKeys
            ; Match the whole name between '|' delimiters, so that a short name
            ; that is a substring of a longer one is not mistaken for a duplicate
            If Not StringInStr('|' & $sUniqueFields, '|' & $vKey & '|') Then $sUniqueFields &= $vKey & '|'
        Next
    EndFunc
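For the final overview asked about earlier in the thread (one line per report, one column per field name, empty where a report lacks the field), a minimal sketch could look like the one below. It is not from the posts above. It assumes the reports have already been parsed into maps the way `ReportToMap()` does, and that the column list comes from the unique field names. The function names `WriteOverviewCsv` and `CsvQuote`, the output path, the `;` separator, and the sample values are all hypothetical choices made for illustration.

```autoit
#include <FileConstants.au3>

; Hypothetical output path (assumption, not from the thread)
Global Const $sCsvPath = @ScriptDir & '\gpo_overview.csv'

; Two small stand-in maps, shaped like the result of ReportToMap()
Local $mFirst[]
$mFirst['Domaine'] = 'example.local'
$mFirst['ID unique'] = '{1111}'
Local $mSecond[]
$mSecond['ID unique'] = '{2222}'
$mSecond['Nom'] = 'logon.bat'

Local $aMaps[2]
$aMaps[0] = $mFirst
$aMaps[1] = $mSecond

; Column list: in the thread's code this would come from the unique field names
Local $aColumns[3] = ['Domaine', 'ID unique', 'Nom']

WriteOverviewCsv($aMaps, $aColumns, $sCsvPath)

; Writes a header row, then one row per report; missing fields stay empty
Func WriteOverviewCsv($aMaps, $aColumns, $sPath)
    Local $hFile = FileOpen($sPath, $FO_OVERWRITE + $FO_UTF8)
    If $hFile = -1 Then Return SetError(1, 0, False)

    ; Header row with every column name
    Local $sLine = ''
    For $sColumn In $aColumns
        $sLine &= CsvQuote($sColumn) & ';'
    Next
    FileWriteLine($hFile, StringTrimRight($sLine, 1))

    ; One row per report
    For $i = 0 To UBound($aMaps) - 1
        $sLine = ''
        For $sColumn In $aColumns
            If MapExists($aMaps[$i], $sColumn) Then
                $sLine &= CsvQuote(($aMaps[$i])[$sColumn])
            EndIf
            $sLine &= ';' ; separator is written even when the cell is empty
        Next
        FileWriteLine($hFile, StringTrimRight($sLine, 1))
    Next

    FileClose($hFile)
    Return True
EndFunc

; Wraps a value in double quotes and doubles any embedded quotes
Func CsvQuote($sValue)
    Return '"' & StringReplace($sValue, '"', '""') & '"'
EndFunc
```

With the thread's code, the same call would take the array of maps built in `CompareReports()` and `StringSplit(StringTrimRight($sUniqueFields, 1), '|', 2)` as the column list. A delimited text file is only one option; it was chosen here because it opens directly in a spreadsheet, where 1500 rows can be sorted and filtered.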