Regexp help in XML data file



I have an XML file structured like this:

<Group name="Office1">
    <host>
        <name>O1Pc1</name>
        <ip>192.168.0.1</ip>
    </host>
    <host>
        <name>O1Pc2</name>
        <ip>192.168.0.2</ip>
    </host>
</group>
<Group name="Office2">
    <host>
        <name>O2Pc1</name>
        <ip>192.168.0.11</ip>
    </host>
    <host>
        <name>O2Pc2</name>
        <ip>192.168.0.12</ip>
    </host>
</group>

I've got it to find all of the group names, and now I'm trying to figure out a way to group the <name> and <ip> fields per group. This is what I've got so far.

#include <Array.au3>
Local $sFile = @DesktopDir & "\test.xml"
Local $sXml = FileRead($sFile)
ConsoleWrite($sXml & @CRLF & @CRLF)
; [^"]+ stops at the closing quote, so the capture cannot run past the end of the attribute.
Local $aGroups = StringRegExp($sXml, '(?i)<group name="([^"]+)">', 3)
ConsoleWrite(@error & @CRLF)
_ArrayDisplay($aGroups)

I then tested extracting just the <name> fields, and got this far on that part:

#include <Array.au3>
Local $sFile = "X:\acis\Projects\2012\ACIS_ACCESS\hostlist.xml"
Local $sXml = FileRead($sFile)
Local $x = 0
ConsoleWrite($sXml & @CRLF & @CRLF)
Local $aGroups = StringRegExp($sXml, '(?i)<group name="([^"]+)">', 3)
ConsoleWrite(@error & @CRLF)
_ArrayDisplay($aGroups)
Local $aHosts[UBound($aGroups)]
For $group In $aGroups
    ConsoleWrite($group & @CRLF)
    ; This searches the whole file ($sXml), not one group's block, so the result is the same for every group.
    $aHosts[$x] = StringRegExp($sXml, '(?i).+(?:<name>(.+)</name>)+.', 3)
    ConsoleWrite(@error & @CRLF & @extended & @CRLF)
    _ArrayDisplay($aHosts[$x])
    $x += 1
Next

However, I can't figure out how to separate them by group. I've tried structuring the regexp as

'(?i)<group name="' & $group & '">(?s).*(?:<name>(.+)</name>)+(?s).*</group>'

but it returns not found =(

Spoiler

“Hello, ladies, look at your man, now back to me, now back at your man, now back to me. Sadly, he isn’t me, but if he stopped using ladies scented body wash and switched to Old Spice, he could smell like he’s me. Look down, back up, where are you? You’re on a boat with the man your man could smell like. What’s in your hand, back at me. I have it, it’s an oyster with two tickets to that thing you love. Look again, the tickets are now diamonds. Anything is possible when your man smells like Old Spice and not a lady. I’m on a horse.”

 


Well, the file format hasn't been chosen yet, so at the moment I'm just experimenting. Either way, what I'm aiming for is learning how to retrieve information from a file with a similar structure (which may not be XML, or may even be something custom). Unfortunately I don't control which format will end up being used, but knowing how to grab information between specific markers is key to solving that problem.
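For the generic "text between two markers" case, a minimal sketch could use _StringBetween from the standard String.au3 UDF. The sample string below is made up for illustration, and the markers could be any start/end strings, not just XML tags.

```autoit
#include <String.au3>

; Made-up sample data; the markers do not have to be XML tags.
Local $sData = '<name>O1Pc1</name><ip>192.168.0.1</ip><name>O1Pc2</name><ip>192.168.0.2</ip>'

; Returns a 0-based array with every substring found between the two markers.
; @error is set when nothing is found.
Local $aNames = _StringBetween($sData, '<name>', '</name>')
If Not @error Then
    For $sName In $aNames
        ConsoleWrite($sName & @CRLF)
    Next
EndIf
```

This only finds flat matches; it knows nothing about nesting, so grouping hosts under their parent group still needs a second pass or a real parser.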

Edited by Mechaflash

I'm probably going to have to split it into two parts: first regexp everything between <group name="' & $group & '"> and </group>, then run another regexp on that result to grab the text between <name></name> and <ip></ip>.

I'll try it out tomorrow probably. Coming to the end of my day.
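A rough sketch of that two-step idea, for anyone following along. _HostsInGroup is a hypothetical helper name, and the sample string is a shortened, made-up version of the file above; it also assumes each <name> is immediately followed by its <ip>.

```autoit
; Hypothetical helper (not from the thread): returns an [n][2] array of name/ip pairs for one group.
Func _HostsInGroup($sXml, $sGroup)
    ; Step 1: isolate this group's block. (?s) lets "." span newlines, and the
    ; lazy (.*?) stops at the first closing tag rather than the last one.
    ; \Q...\E makes the group name match literally.
    Local $aBlock = StringRegExp($sXml, '(?is)<group name="\Q' & $sGroup & '\E">(.*?)</group>', 1)
    If @error Then Return SetError(1, 0, 0)

    ; Step 2: search only inside that block. Flag 3 returns all captures as a
    ; flat array: name, ip, name, ip, ...
    Local $aFlat = StringRegExp($aBlock[0], '(?is)<name>(.*?)</name>\s*<ip>(.*?)</ip>', 3)
    If @error Then Return SetError(2, 0, 0)

    Local $iCount = Int(UBound($aFlat) / 2)
    Local $aPairs[$iCount][2]
    For $i = 0 To $iCount - 1
        $aPairs[$i][0] = $aFlat[2 * $i]
        $aPairs[$i][1] = $aFlat[2 * $i + 1]
    Next
    Return $aPairs
EndFunc

; Made-up sample with one host per group.
Local $sSample = '<Group name="Office1"><host><name>O1Pc1</name><ip>192.168.0.1</ip></host></group>' & _
        '<Group name="Office2"><host><name>O2Pc1</name><ip>192.168.0.11</ip></host></group>'
Local $aOffice2 = _HostsInGroup($sSample, "Office2")
If Not @error Then ConsoleWrite($aOffice2[0][0] & " " & $aOffice2[0][1] & @CRLF)
```

Because the second regexp only ever sees one group's block, hosts from other groups cannot leak into the result.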


I understand. Well, _GetXML is a very nice generic approach for XML. It works great on simple queries but it can be a pita when you really want to deep probe. It's a good starter though.

Meh, took a different approach as the RegExp was becoming a real beasty. Works but not nice...

#include <Array.au3>
Local $sFile = 'hlist.xml'
Local $sXml = FileRead($sFile)
Local $sGroup
; Flag 3 = treat the delimiter as one whole string (1) + return a 0-based array without a count (2).
Local $aGroups = StringSplit($sXml, '</group>', 3)
_ArrayDisplay($aGroups)
Local $aHosts[UBound($aGroups)]
For $i = 0 To UBound($aGroups) - 1
    $sGroup = StringRegExp($aGroups[$i], '(?i)<group name="([^"]+)">', 1)
    If @error Then ContinueLoop ; Skip chunks without a group tag, e.g. whitespace after the last </group>.
    $aHosts[$i] = StringRegExp($aGroups[$i], '(?i)<name>(.+?)</name>', 3)
    $aGroups[$i] = $sGroup[0]
    _ArrayDisplay($aHosts[$i])
Next
Edited by dany


This works for me

$string = '<XML><Group name="Office1"><host><name>O1Pc1</name><ip>192.168.0.1</ip></host><host><name>O1Pc2</name><ip>192.168.0.2</ip></host></Group><Group name="Office2"><host><name>O2Pc1</name><ip>192.168.0.11</ip></host><host><name>O2Pc2</name><ip>192.168.0.12</ip></host></Group></XML>'
$oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.loadXML($string) ; load document

$result = $oXML.selectNodes('//Group')
For $Node In $result
    $GroupName = $Node.getAttribute('name')
    ; Query relative to the current group node so only its own hosts are returned.
    ; (Running '//host' against the whole document lists every host under every group.)
    $result2 = $Node.selectNodes('host')
    For $Node2 In $result2
        $name = $Node2.selectSingleNode('name')
        $ip = $Node2.selectSingleNode('ip')
        ConsoleWrite($GroupName & " " & $name.text & " " & $ip.text & @CRLF)
    Next
Next

output:

Office1 O1Pc1 192.168.0.1
Office1 O1Pc2 192.168.0.2
Office2 O2Pc1 192.168.0.11
Office2 O2Pc2 192.168.0.12
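
If the data lives in a file rather than a string, something along these lines might work for loading it. This is only a sketch: _LoadXmlFile is a hypothetical helper name. Note that a DOM parser needs well-formed XML, so the file as posted in the question would likely be rejected: XML tag names are case-sensitive (<Group> does not match </group>) and a document needs a single root element.

```autoit
; Hypothetical helper (not from the thread): load an XML file and report parser errors.
Func _LoadXmlFile($sPath)
    Local $oXML = ObjCreate("Microsoft.XMLDOM")
    If Not IsObj($oXML) Then Return SetError(1, 0, 0)
    $oXML.async = False ; make load() block until parsing has finished
    If Not $oXML.load($sPath) Then
        ; parseError describes why the document was rejected.
        ConsoleWrite("Parse error: " & $oXML.parseError.reason & @CRLF)
        Return SetError(2, $oXML.parseError.errorCode, 0)
    EndIf
    Return $oXML
EndFunc
```

On success the returned object can be queried with selectNodes in the same way as the string-based example.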

Edited by jdelaney