Hi all,

I've made a script that scrapes an XML file off the web. A sample of the XML is below:

<availability>
  <members date="2015-07-18" daytag="Today" count="11" day="8" night="9" ooa="0" s44="" na="0">
    <qualification abbrev="2YR" name="2 Years Experience" category="Ability" count="4" day="3" night="3" ooa="0" s44="0" na="0"/>
    <qualification abbrev="BA" name="Breathing Apparatus Operator" category="Operator" count="4" day="3" night="4" ooa="0" s44="0" na="0"/>
  </members>
  <members date="2015-07-19" daytag="Tomorrow" count="11" day="8" night="11" ooa="0" s44="0" na="0">
    <qualification abbrev="2YR" name="2 Years Experience" category="Ability" count="4" day="4" night="4" ooa="0" s44="0" na="0"/>
    <qualification abbrev="BA" name="Breathing Apparatus Operator" category="Operator" count="6" day="6" night="4" ooa="0" s44="0" na="0"/>
  </members>
</availability>

 

My script is meant to scrape the "Today" section. The first part of my script works and picks up the correct "day" count, but when it comes to "Breathing Apparatus Operator" it collects the number from "Tomorrow". How can I fix this? My code is below:

 

 

$sXML = BinaryToString(InetRead($Site))

  $day = StringRegExpReplace($sXML, '(?is).*<availability.*?day="([^"]+).*</availability.*', '$1')

  $BA = StringRegExpReplace($sXML, '(?is).*<members.*? name="Breathing Apparatus Operator".*?day="([^"]+).*</members.*', '$1');this gets the info we need

 

Edited by shaggy89
coding from mobile SUCKS


Errr, ok? The question and code seemed clear to me. Basically, the output I want is the "day" number from "Breathing Apparatus Operator" for today.

So from my example I want 3, not 6.


In the regular expression for $BA, change the leading (?is).* to (?is).*? (edit: the greedy .* runs to the end of the string and backtracks, so it currently picks up the last <members> block instead of the first one.)
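To make the greedy/lazy difference concrete, here is a minimal sketch run against a trimmed, cleaned-up copy of the sample from the first post. The variable names ($sGreedy, $sLazy, $aToday) are only for illustration, and the third pattern is an assumption of mine rather than something from the thread. With this sample the greedy pattern should yield 6 and the lazy one 3, which matches what the question reports.

```autoit
; Trimmed copy of the sample from the question (only the BA lines are kept).
Local $sXML = '<availability>' & _
        '<members date="2015-07-18" daytag="Today" count="11" day="8" night="9">' & _
        '<qualification abbrev="BA" name="Breathing Apparatus Operator" count="4" day="3" night="4"/>' & _
        '</members>' & _
        '<members date="2015-07-19" daytag="Tomorrow" count="11" day="8" night="11">' & _
        '<qualification abbrev="BA" name="Breathing Apparatus Operator" count="6" day="6" night="4"/>' & _
        '</members>' & _
        '</availability>'

; Greedy leading .* : consumes everything, then backtracks to the LAST "<members".
Local $sGreedy = StringRegExpReplace($sXML, '(?is).*<members.*? name="Breathing Apparatus Operator".*?day="([^"]+).*</members.*', '$1')

; Lazy leading .*? : stops at the FIRST "<members".
Local $sLazy = StringRegExpReplace($sXML, '(?is).*?<members.*? name="Breathing Apparatus Operator".*?day="([^"]+).*</members.*', '$1')

ConsoleWrite("greedy: " & $sGreedy & @CRLF)
ConsoleWrite("lazy:   " & $sLazy & @CRLF)

; Optional variant (my assumption): anchor on daytag="Today" instead of relying on block order.
; Flag 1 returns an array of the captured groups from the first match.
Local $aToday = StringRegExp($sXML, '(?is)<members[^>]*daytag="Today".*?name="Breathing Apparatus Operator"[^>]*?\bday="([^"]+)', 1)
If Not @error Then ConsoleWrite("anchored: " & $aToday[0] & @CRLF)
```

Note that the lazy fix only works because "Today" is the first <members> block in the feed. The anchored variant does not depend on the order, but it can still run into the next block if "Today" has no BA entry at all.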

That's the quick fix. But I agree with using a real XML parser if you want this to be more robust. Parsing XML with regex is just shaky.
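For reference, a parser-based version could look roughly like the sketch below. It assumes MSXML 6 is available through COM (the "Msxml2.DOMDocument.6.0" ProgID) and that the real feed is well-formed XML; the sample as pasted in the first post would not parse as-is. The XPath expression is my own guess at what fits this feed.

```autoit
; Sketch only: assumes MSXML 6 is installed and the feed is well-formed XML.
; In the real script the string would come from the web instead:
; $sXML = BinaryToString(InetRead($Site))
Local $sXML = '<availability>' & _
        '<members date="2015-07-18" daytag="Today" count="11" day="8" night="9">' & _
        '<qualification abbrev="BA" name="Breathing Apparatus Operator" count="4" day="3" night="4"/>' & _
        '</members>' & _
        '<members date="2015-07-19" daytag="Tomorrow" count="11" day="8" night="11">' & _
        '<qualification abbrev="BA" name="Breathing Apparatus Operator" count="6" day="6" night="4"/>' & _
        '</members>' & _
        '</availability>'

Local $oXML = ObjCreate("Msxml2.DOMDocument.6.0")
If Not IsObj($oXML) Then
    ConsoleWrite("MSXML 6 is not available" & @CRLF)
    Exit 1
EndIf
$oXML.async = False ; parse synchronously

If Not $oXML.loadXML($sXML) Then
    ConsoleWrite("Parse error: " & $oXML.parseError.reason & @CRLF)
    Exit 1
EndIf

; Select the BA qualification inside the block tagged "Today".
Local $oNode = $oXML.selectSingleNode('/availability/members[@daytag="Today"]/qualification[@abbrev="BA"]')
If IsObj($oNode) Then
    ConsoleWrite("BA day count for Today: " & $oNode.getAttribute("day") & @CRLF)
Else
    ConsoleWrite("No matching qualification found" & @CRLF)
EndIf
```

The selection here depends on the daytag and abbrev attributes rather than on where the blocks happen to sit in the file, which is the main gain over the regex approach.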

Edited by SadBunny



I would do it like this. If anything, it's not "shaky":

; $sXML = BinaryToString(InetRead(...))

$sXML = '-<availability>' & _
        '-<members date="2015-07-18" daytag="Today" count="11" day="8" night="9" ooa="0" s44="" na="0">' & _
        '<qualification abbrev="2YR" name="2 Years Experience" category="Ability" count="4" day="3" night="3" ooa="0" s44="0"na="0"/>' & _
        '<qualification abbrev="BA" name="Breathing Apparatus Operator" category="Operator" count="4" day="3" night="4" ooa="0"s44="0" na="0"/>' & _
        '</members>' & _
        '-<members date="2015-07-19" daytag="Tomorrow" count="11" day="8" night="11" ooa="0" s44="0" na="0">' & _
        '<qualification abbrev="2YR" name="2 Years Experience" category="Ability" count="4" day="4" night="4" ooa="0" s44="0"na="0"/>' & _
        '<qualification abbrev="BA" name="Breathing Apparatus Operator" category="Operator" count="6" day="6" night="4" ooa="0"s44="0" na="0"/>' & _
        '</members>' & _
        '<availability>'


MsgBox(4096, "bzz...", "daytag = Today, abbrev = BA, day = " & ThatThingFromXML($sXML, "Today"))
MsgBox(4096, "bzz...", "daytag = Tomorrow, abbrev = BA, day = " & ThatThingFromXML($sXML, "Tomorrow"))


Func ThatThingFromXML($sXML, $sDayTag, $sAbbrev = "BA", $sAttrib = "day")
    ; Clean the XML
    $sXML = StringRegExpReplace($sXML, "(?s)<!--.*?-->", "") ; removing comments
    $sXML = StringRegExpReplace($sXML, "(?s)<!\[CDATA\[.*?\]\]>", "") ; removing CDATA

    ; Find all members
    Local $aMembers = StringRegExp($sXML, "(?si)<\s*members(?:[^\w])\s*(.*?)(?:(?:<\s*/members\s*>)|\Z)", 3)
    If @error Then Return SetError(1, 0, "") ; There are no members available

    Local $sMember, $sAttributes, $aDesc

    ; Loop through members
    For $iMemberOrdinal = 0 To UBound($aMembers) - 1
        $sMember = $aMembers[$iMemberOrdinal] ; currently examined member

        $sAttributes = StringRegExp($sMember, "(?s)(.*?)>", 3)
        If Not @error Then $sAttributes = $sAttributes[0]

        If AttribVal($sAttributes, "daytag") = $sDayTag Then
            $aDesc = StringRegExp($sMember, "(?si)<\h*(?:qualification|whatever)\h*(.*?)/*\h*>", 3)
            For $i = 0 To UBound($aDesc) - 1
                If AttribVal($aDesc[$i], "abbrev") = $sAbbrev Then
                    Return AttribVal($aDesc[$i], $sAttrib) ; found it, so leave the function here
                EndIf

            Next
            ExitLoop

        EndIf

    Next

    Return SetError(2, 0, "") ; Conditions not met
EndFunc



Func AttribVal($sIn, $sAttrib)
    Local $aArray = StringRegExp($sIn, '(?i).*?' & $sAttrib & '\h*=(\h*"(.*?)"|' & "\h*'(.*?)'|" & '\h*(.*?)(?: |\Z))', 3) ; e.g. id="abc" or id='abc' or id=abc

    If @error Then Return ""
    Return $aArray[UBound($aArray) - 1]
EndFunc

 

Edited by trancexx
