Extracting a URL from an XML file


I'm trying to enumerate the SharePoint sites on a server. So far I can run "stsadm -o enumsites", capture the result, and store it in an XML file that looks like this:

<?xml version="1.0"?>
<Sites Count="5">
<Site Url="https://bptc-server3" Owner="BPTC\sharepoint" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="0.2" StorageWarningMB="0" StorageMaxMB="0" />
<Site Url="https://bptc-server3/sites/Biocrescentia" Owner="BPTC\jfleming" SecondaryOwner="BPTC\hlevine" ContentDatabase="WSS_Content" StorageUsedMB="2178.2" StorageWarningMB="0" StorageMaxMB="0" />
<Site Url="https://bptc-server3/sites/DynPort" Owner="BPTC\jfleming" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="3544.9" StorageWarningMB="0" StorageMaxMB="0" />
<Site Url="https://bptc-server3/sites/Thermalin-QS" Owner="BPTC\bioadmin" SecondaryOwner="BPTC\jfleming" ContentDatabase="WSS_Content" StorageUsedMB="24.9" StorageWarningMB="0" StorageMaxMB="0" />
<Site Url="https://bptc-server3/sites/Typhoid" Owner="BPTC\jfleming" SecondaryOwner="aspnetmembershipprovider:bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="101.7" StorageWarningMB="0" StorageMaxMB="0" />
</Sites>

but nothing I try from the _XMLDomWrapper.au3 library returns any of the URL strings, or anything containing them. How can I extract those URLs? The closest I've come is this:

#include <_XMLDomWrapper.au3>
#include <Array.au3>
#include <Constants.au3>

dim $cmdOUT
dim $AttVal[1]

; Get an XML file with all the sites listed
$PID = Run("C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN\stsadm.exe -o enumsites -url https://bptc-server3","C:\",@SW_HIDE,$STDOUT_CHILD)
ProcessWait($PID)
While 1
    $line = StdoutRead($PID)
    If @error <> 0 Then ExitLoop
    $cmdOUT &= $line
Wend
$Handle = FileOpen("C:\Sites.xml",2)
FileWrite($Handle,'<?xml version="1.0"?>')
FileWrite($Handle,$cmdOUT)
FileClose($Handle)

; Get an array of the site URLs
$oOXml = _XMLFileOpen("C:\Sites.xml")
$AttVal = _XMLGetChildNodes("/Sites")
MsgBox(0,"Info",$AttVal)
_ArrayDisplay($AttVal,"Result")

Using pure xmldom:

$xml ='<?xml version="1.0"?>' & @CRLF & _
'<Sites Count="5">'& @CRLF & _
'<Site Url="https://bptc-server3" Owner="BPTC\sharepoint" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="0.2" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Biocrescentia" Owner="BPTC\jfleming" SecondaryOwner="BPTC\hlevine" ContentDatabase="WSS_Content" StorageUsedMB="2178.2" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/DynPort" Owner="BPTC\jfleming" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="3544.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Thermalin-QS" Owner="BPTC\bioadmin" SecondaryOwner="BPTC\jfleming" ContentDatabase="WSS_Content" StorageUsedMB="24.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Typhoid" Owner="BPTC\jfleming" SecondaryOwner="aspnetmembershipprovider:bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="101.7" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'</Sites>'

$oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.loadxml($xml)
;~ ConsoleWrite($oXML.xml)
$oSites = $oXML.SelectNodes("//Site")
For $oSite In $oSites
ConsoleWrite( $oSite.getattribute("Url") & @CRLF)
Next

output:

https://bptc-server3
https://bptc-server3/sites/Biocrescentia
https://bptc-server3/sites/DynPort
https://bptc-server3/sites/Thermalin-QS
https://bptc-server3/sites/Typhoid
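
If the XML is already on disk, as with the C:\Sites.xml file written in the first post, the same DOM approach can probably read it directly instead of taking a string. This is only a minimal sketch: the helper name _GetAttrFromFile and its error codes are made up for illustration, and it assumes the file is well-formed with a single <Sites> root.

```autoit
#include <Array.au3>

; Hypothetical helper (not part of any UDF): returns a 0-based array holding
; the value of $sAttr for every node matched by $sXPath, or sets @error.
Func _GetAttrFromFile($sFile, $sXPath, $sAttr)
    Local $oXML = ObjCreate("Microsoft.XMLDOM")
    If Not IsObj($oXML) Then Return SetError(1, 0, 0) ; COM object unavailable

    $oXML.async = False ; load synchronously so the document is ready on return
    If Not $oXML.load($sFile) Then Return SetError(2, 0, 0) ; missing file or malformed XML

    Local $oNodes = $oXML.selectNodes($sXPath)
    Local $iCount = $oNodes.length
    If $iCount = 0 Then Return SetError(3, 0, 0) ; nothing matched

    Local $aValues[$iCount]
    Local $i = 0
    For $oNode In $oNodes
        $aValues[$i] = $oNode.getAttribute($sAttr)
        $i += 1
    Next
    Return $aValues
EndFunc

; Usage, assuming the file path from the first post
Local $aUrls = _GetAttrFromFile("C:\Sites.xml", "/Sites/Site", "Url")
If @error Then
    ConsoleWrite("Failed, @error = " & @error & @CRLF)
Else
    _ArrayDisplay($aUrls, "Site URLs")
EndIf
```

Returning an array rather than writing to the console keeps it close to what the original script was trying to feed into _ArrayDisplay.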

Edited by jdelaney

Working with nodes/attributes returns exactly what you are expecting, instead of outputting anything that happens to match. It's all about the what-if scenarios that can occur.

For example, the regexp would return 6 URLs for this input, because the second node also contains the text Url="something":

$xml ='<?xml version="1.0"?>' & @CRLF & _
'<Sites Count="5">'& @CRLF & _
'<Site Url="https://bptc-server3" Owner="BPTC\sharepoint" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="0.2" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Biocrescentia" Owner="BPTC\jfleming" SecondaryOwner="BPTC\hlevine" ContentDatabase="WSS_Content" StorageUsedMB="2178.2" StorageWarningMB="0" StorageMaxMB="0"> Url="something"</Site>'& @CRLF & _
'<Site Url="https://bptc-server3/sites/DynPort" Owner="BPTC\jfleming" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="3544.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Thermalin-QS" Owner="BPTC\bioadmin" SecondaryOwner="BPTC\jfleming" ContentDatabase="WSS_Content" StorageUsedMB="24.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Typhoid" Owner="BPTC\jfleming" SecondaryOwner="aspnetmembershipprovider:bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="101.7" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'</Sites>'

Another example, this time with a commented-out node:

#include <Array.au3>
$xml ='<?xml version="1.0"?>' & @CRLF & _
'<Sites Count="5">'& @CRLF & _
'<Site Url="https://bptc-server3" Owner="BPTC\sharepoint" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="0.2" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Biocrescentia" Owner="BPTC\jfleming" SecondaryOwner="BPTC\hlevine" ContentDatabase="WSS_Content" StorageUsedMB="2178.2" StorageWarningMB="0" StorageMaxMB="0"> Url="something"</Site>'& @CRLF & _
'<!-- <Site Url="https://bptc-server3/sites/Biocrescentia" Owner="BPTC\jfleming" SecondaryOwner="BPTC\hlevine" ContentDatabase="WSS_Content" StorageUsedMB="2178.2" StorageWarningMB="0" StorageMaxMB="0"> Url="something"</Site> -->'& @CRLF & _
'<Site Url="https://bptc-server3/sites/DynPort" Owner="BPTC\jfleming" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="3544.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Thermalin-QS" Owner="BPTC\bioadmin" SecondaryOwner="BPTC\jfleming" ContentDatabase="WSS_Content" StorageUsedMB="24.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Typhoid" Owner="BPTC\jfleming" SecondaryOwner="aspnetmembershipprovider:bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="101.7" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'</Sites>'
$res = StringRegExp($xml, 'Url="(.+?)"', 3)
_ArrayDisplay($res)
$oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.loadxml($xml)
ConsoleWrite($oXML.xml)
$oSites = $oXML.SelectNodes("//Site")
For $oSite In $oSites
 ConsoleWrite( $oSite.getattribute("Url") & @CRLF)
Next
Edited by jdelaney

Nope, I provided 2 examples where StringRegExp matches multiple things that are not expected to be returned... it's oversimplified.

And coming up with the exact regexp that pulls ONLY what is needed would be development overkill relative to xmldom.
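
To put numbers on the difference, here is a small sketch that counts what each approach finds in the same string. The shortened XML and the variable names are made up for the demo; the regexp is the simple Url="(.+?)" pattern from the example above.

```autoit
; Shortened, made-up sample: one node with Url="..." in its text,
; one commented-out node, one plain node.
Local $sXml = '<Sites>' & @CRLF & _
        '<Site Url="https://a">Url="text"</Site>' & @CRLF & _
        '<!-- <Site Url="https://commented" /> -->' & @CRLF & _
        '<Site Url="https://b" />' & @CRLF & _
        '</Sites>'

; Naive regexp: matches every Url="..." regardless of where it appears
Local $aRegex = StringRegExp($sXml, 'Url="(.+?)"', 3)

; DOM: only real <Site> elements are selected, comments and text are ignored
Local $oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.loadXML($sXml)
Local $oSites = $oXML.selectNodes("//Site")

ConsoleWrite("Regex matches: " & UBound($aRegex) & @CRLF) ; 4 for this sample
ConsoleWrite("DOM nodes:     " & $oSites.length & @CRLF)  ; 2 for this sample
```

The two extra regexp hits come from the text inside the first node and from the commented-out node, which are exactly the what-if cases described above.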

Edited by jdelaney

jdelaney is right: you can never be 100% sure with a regex, and he explained why.

You can use one safely if your file is always formatted the same way, without exception, though obviously an XML-dedicated tool is still the best choice for XML files.

The choice is yours ;)

BTW, the first regex fails, but the second one works with jdelaney's last tests above...

Edited by mikell

I'm going to agree with jdelaney wholeheartedly on their intent to express something like: "Use the right tool for the task at hand".

Obviously XML functions are what should be used in this situation, but...

Anyone can work anything out for a specific task if they are willing/able.

I wrote this fairly quickly and only tested it against the XML string jdelaney provided, but it seems to get the correct results.

Mind you, I'm only showing this because it is possible to do it outside the XML functions if you need to (personally, I'd only do it if the syntax of the string structure never changed).

#include <Array.au3>
$xml ='<?xml version="1.0"?>' & @CRLF & _
'<Sites Count="5">'& @CRLF & _
'<Site Url="https://bptc-server3" Owner="BPTC\sharepoint" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="0.2" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Biocrescentia" Owner="BPTC\jfleming" SecondaryOwner="BPTC\hlevine" ContentDatabase="WSS_Content" StorageUsedMB="2178.2" StorageWarningMB="0" StorageMaxMB="0"> Url="something"</Site>'& @CRLF & _
'<Site Url="https://bptc-server3/sites/DynPort" Owner="BPTC\jfleming" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="3544.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Thermalin-QS" Owner="BPTC\bioadmin" SecondaryOwner="BPTC\jfleming" ContentDatabase="WSS_Content" StorageUsedMB="24.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Typhoid" Owner="BPTC\jfleming" SecondaryOwner="aspnetmembershipprovider:bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="101.7" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'</Sites>'

Global $ga_TestCode = _sre_getXMLAttribute($xml, "site", "url")
_ArrayDisplay($ga_TestCode)

Func _sre_getXMLAttribute($s_xml, $s_node, $s_attr)

    If FileExists($s_xml) Then
        $s_xml = FileRead($s_xml)
    EndIf

    Local $s_npatt = "(?i)\W<\s*\Q" & $s_node & '\E\s+(.+?(?:\"|\s))/?\s*>'
    Local $a_node = StringRegExp($s_xml, $s_npatt, 3)
    If @error Then Return SetError(1, 0, 0)
    
    Local $a_attr, $a_ret[101] = [0], $i_add
    For $i = 0 To UBound($a_node) - 1
        $a_attr = StringRegExp($a_node[$i], "(?i)\Q" & $s_attr & '\E\="(.+?)"', 1)
        If @error Then ContinueLoop
        $i_add += 1
        If Mod($i_add, 100) = 0 Then
            ReDim $a_ret[$i_add + 100 + 1]
        EndIf
        $a_ret[$i_add] = $a_attr[0]
    Next

    If Not $i_add Then Return SetError(2, 0, 0)

    ReDim $a_ret[$i_add + 1]
    $a_ret[0] = $i_add

    Return $a_ret
EndFunc


With jdelaney's example that includes a commented-out node:

$xml ='<?xml version="1.0"?>' & @CRLF & _
'<Sites Count="5">'& @CRLF & _
'<Site Url="https://bptc-server3" Owner="BPTC\sharepoint" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="0.2" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & @CRLF& _
'<Site Url="https://bptc-server3/sites/Biocrescentia" Owner="BPTC\jfleming" SecondaryOwner="BPTC\hlevine" ContentDatabase="WSS_Content" StorageUsedMB="2178.2" StorageWarningMB="0" StorageMaxMB="0"> Url="something"</Site>'& @CRLF & _
'<!-- <Site Url="https://bptc-server3/sites/Biocrescentia" Owner="BPTC\jfleming" SecondaryOwner="BPTC\hlevine" ContentDatabase="WSS_Content" StorageUsedMB="2178.2" StorageWarningMB="0" StorageMaxMB="0"> Url="something"</Site> -->'& @CRLF & _
'<Site Url="https://bptc-server3/sites/DynPort" Owner="BPTC\jfleming" SecondaryOwner="BPTC\bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="3544.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Thermalin-QS" Owner="BPTC\bioadmin" SecondaryOwner="BPTC\jfleming" ContentDatabase="WSS_Content" StorageUsedMB="24.9" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'<Site Url="https://bptc-server3/sites/Typhoid" Owner="BPTC\jfleming" SecondaryOwner="aspnetmembershipprovider:bioadmin" ContentDatabase="WSS_Content" StorageUsedMB="101.7" StorageWarningMB="0" StorageMaxMB="0" />'& @CRLF & _
'</Sites>'

Your func works if the first SRE pattern is corrected so that it only matches a node at the start of a line (it then returns 5 matches instead of 6):

Local $s_npatt = "(?i)\n\s*<\s*\Q" & $s_node & '\E\s+(.+?(?:\"|\s))/?\s*>'

BTW so does mine with the same correction

$res = StringRegExp($xml, '\n\s*<\s*Site\s+Url="(.+?)".+\s', 3)
_ArrayDisplay($res)
Edited by mikell