Extracting a URL from an XML file

JonF

I'm trying to enumerate the SharePoint sites on a server. I've gotten as far as running "stsadm -o enumsites", capturing the result, and storing it in an XML file that looks like this:

<?xml version="1.0"?>
<Sites Count="5">
<Site Url="https://bptc-server3" Owner="BPTC\sharepoint" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="0.2" StorageWarningMB="0" StorageMaxMB="0" />
<Site Url="https://bptc-server3/sites/Biocrescentia" Owner="BPTC\jfleming" SecondaryOwner="BPTC\hlevine" ContentDatabase="WSS_Content" StorageUsedMB="2178.2" StorageWarningMB="0" StorageMaxMB="0" />
<Site Url="https://bptc-server3/sites/DynPort" Owner="BPTC\jfleming" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="3544.9" StorageWarningMB="0" StorageMaxMB="0" />
<Site Url="https://bptc-server3/sites/Thermalin-QS" Owner="BPTC\bioadmin" SecondaryOwner="BPTC\jfleming" ContentDatabase="WSS_Content" StorageUsedMB="24.9" StorageWarningMB="0" StorageMaxMB="0" />
<Site Url="https://bptc-server3/sites/Typhoid" Owner="BPTC\jfleming" SecondaryOwner="aspnetmembershipprovider:bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="101.7" StorageWarningMB="0" StorageMaxMB="0" />
</Sites>

but nothing I try from the _XMLDomWrapper.au3 library returns any of the URL strings, or anything containing them. How can I extract those URLs? The closest I've come is this:

#include <_XMLDomWrapper.au3>
#include <Array.au3>
#include <Constants.au3>

dim $cmdOUT
dim $AttVal[1]

; Get an XML file with all the sites listed
$PID = Run("C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN\stsadm.exe -o enumsites -url https://bptc-server3","C:\",@SW_HIDE,$STDOUT_CHILD)
ProcessWait($PID)
While 1
    $line = StdoutRead($PID)
    If @error <> 0 Then ExitLoop
    $cmdOUT &= $line
Wend
$Handle = FileOpen("C:\Sites.xml",2)
FileWrite($Handle,'<?xml version="1.0"?>')
FileWrite($Handle,$cmdOUT)
FileClose($Handle)

; Get an array of the site URLs
$oOXml = _XMLFileOpen("C:\Sites.xml")
$AttVal = _XMLGetChildNodes("/Sites")
MsgBox(0,"Info",$AttVal)
_ArrayDisplay($AttVal,"Result")

somdcomputerguy

Try _StringBetween, with 'Url="' as the start string and '" Owner' as the end string.
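
A minimal sketch of that suggestion, assuming the attributes always appear in the order Url, Owner as they do in the sample output. The two-site sample string is made up for illustration; _StringBetween comes from String.au3 and returns a 0-based array of matches.

```autoit
#include <String.au3>

; Sample data (illustrative): two <Site> elements shaped like the stsadm output.
Local $sXml = '<Sites Count="2">' & @CRLF & _
        '<Site Url="https://bptc-server3" Owner="BPTC\sharepoint" />' & @CRLF & _
        '<Site Url="https://bptc-server3/sites/DynPort" Owner="BPTC\jfleming" />' & @CRLF & _
        '</Sites>'

; Collect every substring found between 'Url="' and '" Owner'.
Local $aUrls = _StringBetween($sXml, 'Url="', '" Owner')
If @error Then
    ConsoleWrite("No URLs found" & @CRLF)
Else
    For $i = 0 To UBound($aUrls) - 1
        ConsoleWrite($aUrls[$i] & @CRLF)
    Next
EndIf
```

This is plain string matching, so it depends on the attribute order and spacing staying exactly as shown.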

mrflibblehat

This should work

_XMLGetAttrib("Sites/Site", "Url")

jdelaney

Using pure xmldom:

$xml ='<?xml version="1.0"?>' & @CRLF & _
'<Sites Count="5">'& @CRLF & _
'<Site Url="https://bptc-server3" Owner="BPTC\sharepoint" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="0.2" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Biocrescentia" Owner="BPTC\jfleming" SecondaryOwner="BPTC\hlevine" ContentDatabase="WSS_Content" StorageUsedMB="2178.2" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/DynPort" Owner="BPTC\jfleming" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="3544.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Thermalin-QS" Owner="BPTC\bioadmin" SecondaryOwner="BPTC\jfleming" ContentDatabase="WSS_Content" StorageUsedMB="24.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Typhoid" Owner="BPTC\jfleming" SecondaryOwner="aspnetmembershipprovider:bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="101.7" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'</Sites>'

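; Load the XML string into an MSXML DOM document
$oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.loadxml($xml)
;~ ConsoleWrite($oXML.xml)
; Select every <Site> element and print its Url attribute
$oSites = $oXML.SelectNodes("//Site")
For $oSite In $oSites
    ConsoleWrite($oSite.getattribute("Url") & @CRLF)
Next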

Output:

https://bptc-server3
https://bptc-server3/sites/Biocrescentia
https://bptc-server3/sites/DynPort
https://bptc-server3/sites/Thermalin-QS
https://bptc-server3/sites/Typhoid

JonF

Thanks, all. I think jdelaney's solution is the best, since it doesn't need the XML written to a file at all.
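
A rough sketch of how the capture and the DOM lookup could be combined without the intermediate file. The helper names _CaptureStdout and _GetSiteUrls are hypothetical, and it assumes stsadm prints a well-formed <Sites> document to stdout. The demo at the bottom uses a made-up two-site sample rather than a live server.

```autoit
#include <Constants.au3>

; Hypothetical helper: run a command and return everything it wrote to stdout.
Func _CaptureStdout($sCommand, $sWorkDir = "")
    Local $iPID = Run($sCommand, $sWorkDir, @SW_HIDE, $STDOUT_CHILD)
    If @error Then Return SetError(1, 0, "")
    Local $sOut = ""
    While 1
        $sOut &= StdoutRead($iPID)
        If @error Then ExitLoop ; stdout closed: nothing more to read
        Sleep(10)
    WEnd
    Return $sOut
EndFunc

; Hypothetical helper: return a 0-based array holding the Url attribute of every <Site> element.
Func _GetSiteUrls($sXml)
    Local $oXML = ObjCreate("Microsoft.XMLDOM")
    If Not IsObj($oXML) Then Return SetError(1, 0, 0)
    If Not $oXML.loadXML($sXml) Then Return SetError(2, 0, 0) ; not well-formed XML
    Local $oSites = $oXML.selectNodes("//Site")
    Local $iCount = $oSites.length
    If $iCount = 0 Then Return SetError(3, 0, 0)
    Local $aUrls[$iCount]
    For $i = 0 To $iCount - 1
        $aUrls[$i] = $oSites.item($i).getAttribute("Url")
    Next
    Return $aUrls
EndFunc

; Demo with an inline sample (illustrative data) instead of a live server.
Local $sSample = '<Sites Count="2">' & _
        '<Site Url="https://bptc-server3" Owner="BPTC\sharepoint" />' & _
        '<Site Url="https://bptc-server3/sites/DynPort" Owner="BPTC\jfleming" />' & _
        '</Sites>'
Local $aSiteUrls = _GetSiteUrls($sSample)
If Not @error Then
    For $i = 0 To UBound($aSiteUrls) - 1
        ConsoleWrite($aSiteUrls[$i] & @CRLF)
    Next
EndIf
```

On the server itself, the sample string would be replaced by something like _CaptureStdout('"C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN\stsadm.exe" -o enumsites -url https://bptc-server3'), with the executable path quoted because it contains spaces.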

jdelaney

Working with nodes/attributes will always return exactly what you are expecting, rather than anything that happens to match. It's all about the what-if scenarios that can occur.

For example, the regexp would return 6 URLs from this:

$xml ='<?xml version="1.0"?>' & @CRLF & _
'<Sites Count="5">'& @CRLF & _
'<Site Url="https://bptc-server3" Owner="BPTC\sharepoint" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="0.2" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Biocrescentia" Owner="BPTC\jfleming" SecondaryOwner="BPTC\hlevine" ContentDatabase="WSS_Content" StorageUsedMB="2178.2" StorageWarningMB="0" StorageMaxMB="0"> Url="something"</Site>'& @CRLF & _
'<Site Url="https://bptc-server3/sites/DynPort" Owner="BPTC\jfleming" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="3544.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Thermalin-QS" Owner="BPTC\bioadmin" SecondaryOwner="BPTC\jfleming" ContentDatabase="WSS_Content" StorageUsedMB="24.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Typhoid" Owner="BPTC\jfleming" SecondaryOwner="aspnetmembershipprovider:bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="101.7" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'</Sites>'

Another example, this time with a commented-out node:

#include <Array.au3>
$xml ='<?xml version="1.0"?>' & @CRLF & _
'<Sites Count="5">'& @CRLF & _
'<Site Url="https://bptc-server3" Owner="BPTC\sharepoint" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="0.2" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Biocrescentia" Owner="BPTC\jfleming" SecondaryOwner="BPTC\hlevine" ContentDatabase="WSS_Content" StorageUsedMB="2178.2" StorageWarningMB="0" StorageMaxMB="0"> Url="something"</Site>'& @CRLF & _
'<!-- <Site Url="https://bptc-server3/sites/Biocrescentia" Owner="BPTC\jfleming" SecondaryOwner="BPTC\hlevine" ContentDatabase="WSS_Content" StorageUsedMB="2178.2" StorageWarningMB="0" StorageMaxMB="0"> Url="something"</Site> -->'& @CRLF & _
'<Site Url="https://bptc-server3/sites/DynPort" Owner="BPTC\jfleming" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="3544.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Thermalin-QS" Owner="BPTC\bioadmin" SecondaryOwner="BPTC\jfleming" ContentDatabase="WSS_Content" StorageUsedMB="24.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Typhoid" Owner="BPTC\jfleming" SecondaryOwner="aspnetmembershipprovider:bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="101.7" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'</Sites>'
$res = StringRegExp($xml, 'Url="(.+?)"', 3)
_ArrayDisplay($res)
$oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.loadxml($xml)
ConsoleWrite($oXML.xml)
$oSites = $oXML.SelectNodes("//Site")
For $oSite In $oSites
 ConsoleWrite( $oSite.getattribute("Url") & @CRLF)
Next

mikell

I agree, though this time it was not an SRE problem; it was my own error, a regex written too quickly :D

This one should work better:

$res = StringRegExp($xml, 'Url="(.+?)".+\s', 3)
assuming the XML is correctly formatted.

AnrDaemon

Assuming the URL is correctly formatted,

StringRegExp($txt, 'Url="(.+?)"', 3)

will catch exactly what it's supposed to.

jdelaney

Nope, I provided 2 examples where StringRegExp matches multiple things that are not expected to be returned... it's oversimplified.

And coming up with the exact regexp to pull ONLY what is needed would be overkill compared with using xmldom.
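
One practical difference behind that point can be sketched as follows: the DOM parser rejects malformed input, while a simple pattern still returns matches from it. The deliberately broken sample string is made up for illustration; loadXML and parseError are standard members of the MSXML DOM document.

```autoit
; Malformed sample (illustrative): the second <Site> element is never closed.
Local $sBadXml = '<Sites Count="2">' & _
        '<Site Url="https://bptc-server3" />' & _
        '<Site Url="https://bptc-server3/sites/DynPort" ' & _
        '</Sites>'

; The simple regexp still finds both Url values in the broken text.
Local $aRegex = StringRegExp($sBadXml, 'Url="(.+?)"', 3)
ConsoleWrite("Regexp matches: " & UBound($aRegex) & @CRLF)

; The DOM parser refuses the document and reports the reason.
Local $oDoc = ObjCreate("Microsoft.XMLDOM")
If $oDoc.loadXML($sBadXml) Then
    ConsoleWrite("Parsed OK" & @CRLF)
Else
    ConsoleWrite("Parse failed: " & $oDoc.parseError.reason & @CRLF)
EndIf
```

Whether rejecting the whole document is preferable to extracting what can be matched depends on how much the input is trusted.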

AnrDaemon

jdelaney, I was referring to the difference between

StringRegExp($txt, 'Url="(.+?)"', 3)

and

StringRegExp($xml, 'Url="(.+?)".+\s', 3)

Both will catch the same thing, but the latter is just more work for the parser.

mikell

jdelaney is right: you can never be 100% sure with a regex, and he explained why.

You can use one safely if you are certain your file is always formatted the same way, though obviously an XML-dedicated tool is still the best choice for XML files.

The choice is yours ;)

BTW, the first regex fails but the second one works with jdelaney's last tests above...

SmOke_N

I'm going to agree with jdelaney wholeheartedly on their intent to express something like: "Use the right tool for the task at hand".

Obviously XML functions are what should be used in this situation, but...

Anyone can work anything out for a specific task if they are willing/able.

I wrote this fairly quickly, only tested it against the xml string jdelaney provided, but it seems to get the correct results.

Mind you, I'm only showing this because it is possible to do it outside the XML functions if you need to (personally, I'd only do it if the structure of the string never changed).

#include <Array.au3>
$xml ='<?xml version="1.0"?>' & @CRLF & _
'<Sites Count="5">'& @CRLF & _
'<Site Url="https://bptc-server3" Owner="BPTC\sharepoint" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="0.2" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Biocrescentia" Owner="BPTC\jfleming" SecondaryOwner="BPTC\hlevine" ContentDatabase="WSS_Content" StorageUsedMB="2178.2" StorageWarningMB="0" StorageMaxMB="0"> Url="something"</Site>'& @CRLF & _
'<Site Url="https://bptc-server3/sites/DynPort" Owner="BPTC\jfleming" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="3544.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Thermalin-QS" Owner="BPTC\bioadmin" SecondaryOwner="BPTC\jfleming" ContentDatabase="WSS_Content" StorageUsedMB="24.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Typhoid" Owner="BPTC\jfleming" SecondaryOwner="aspnetmembershipprovider:bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="101.7" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'</Sites>'

Global $ga_TestCode = _sre_getXMLAttribute($xml, "site", "url")
_ArrayDisplay($ga_TestCode)

Func _sre_getXMLAttribute($s_xml, $s_node, $s_attr)

    If FileExists($s_xml) Then
        $s_xml = FileRead($s_xml)
    EndIf

    Local $s_npatt = "(?i)\W<\s*\Q" & $s_node & '\E\s+(.+?(?:\"|\s))/?\s*>'
    Local $a_node = StringRegExp($s_xml, $s_npatt, 3)
    If @error Then Return SetError(1, 0, 0)
    
    Local $a_attr, $a_ret[101] = [0], $i_add
    For $i = 0 To UBound($a_node) - 1
        $a_attr = StringRegExp($a_node[$i], "(?i)\Q" & $s_attr & '\E\="(.+?)"', 1)
        If @error Then ContinueLoop
        $i_add += 1
        If Mod($i_add, 100) = 0 Then
            ReDim $a_ret[$i_add + 100 + 1]
        EndIf
        $a_ret[$i_add] = $a_attr[0]
    Next

    If Not $i_add Then Return SetError(2, 0, 0)

    ReDim $a_ret[$i_add + 1]
    $a_ret[0] = $i_add

    Return $a_ret
EndFunc

mikell

With jdelaney's example that includes a commented-out node:

$xml ='<?xml version="1.0"?>' & @CRLF & _
'<Sites Count="5">'& @CRLF & _
'<Site Url="https://bptc-server3" Owner="BPTC\sharepoint" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="0.2" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & @CRLF& _
'<Site Url="https://bptc-server3/sites/Biocrescentia" Owner="BPTC\jfleming" SecondaryOwner="BPTC\hlevine" ContentDatabase="WSS_Content" StorageUsedMB="2178.2" StorageWarningMB="0" StorageMaxMB="0"> Url="something"</Site>'& @CRLF & _
'<!-- <Site Url="https://bptc-server3/sites/Biocrescentia" Owner="BPTC\jfleming" SecondaryOwner="BPTC\hlevine" ContentDatabase="WSS_Content" StorageUsedMB="2178.2" StorageWarningMB="0" StorageMaxMB="0"> Url="something"</Site> -->'& @CRLF & _
'<Site Url="https://bptc-server3/sites/DynPort" Owner="BPTC\jfleming" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="3544.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Thermalin-QS" Owner="BPTC\bioadmin" SecondaryOwner="BPTC\jfleming" ContentDatabase="WSS_Content" StorageUsedMB="24.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Typhoid" Owner="BPTC\jfleming" SecondaryOwner="aspnetmembershipprovider:bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="101.7" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'</Sites>'

Your func works if the first SRE is corrected (it then returns 5 matches instead of 6):

Local $s_npatt = "(?i)\n\s*<\s*\Q" & $s_node & '\E\s+(.+?(?:\"|\s))/?\s*>'

BTW, so does mine with the same correction:

$res = StringRegExp($xml, '\n\s*<\s*Site\s+Url="(.+?)".+\s', 3)
_ArrayDisplay($res)