argumentum

msWord XML beautifier

So, I have Notepad++, but its XML printing plugin was not working, the online beautifiers did not work every time, and I wanted tab indentation like Tidy produces, so the code looks nice in SciTE and I can read it and build my own template for a script I'm working on. It was very frustrating and time-consuming, so I put this code together. Now I get the prettified XML in about a tenth of a second instead of minutes of frustration.

Anyway, here is how I use it: I drag and drop the Word document (saved in XML format) onto SciTE, copy it to the clipboard (Ctrl+A, Ctrl+C), switch to this script, press F5, and I'm happy.

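; Run only from the editor (SciTE adds /ErrorStdOut when you press F5) or as a compiled exe
If Not StringInStr($CmdLineRaw, "/ErrorStdOut") And Not @Compiled Then Exit MsgBox(262144, @ScriptName, "Please run from the editor", 10) ; 262144 = topmost

Local $s = ClipGet()
; only Word.Document XML has been tested, so leave the clipboard alone for anything else
If Not StringInStr($s, '<?mso-application progid="Word.Document"?>') Then Exit MsgBox(262144, StringTrimRight(@ScriptName, 4), "Tested only with Word.Document XML" & @CR & @CR & "No changes made to the clipboard", 20)
Local $sOut = msWordXML_Beautify($s, 2) ; 2 = return beautified string, 1 = ConsoleWrite beautified lines (returns array), 0 = return beautified array
ClipPut($sOut)
MsgBox(262144, StringTrimRight(@ScriptName, 4), "Clipboard content replaced by beautified XML", 2)


; Re-indents XML with tabs by splitting on "<" and ">" (no real XML parsing).
; $iEcho: 0 = return array ($b[0] = line count), 1 = also ConsoleWrite each line, 2 = return one @CRLF-joined string
Func msWordXML_Beautify($s, $iEcho = 0)
    Local $iTimer = TimerInit()
    ; throw away the old line breaks
    $s = StringReplace($s, @CR, '')
    $s = StringReplace($s, @LF, '')
    ; every piece now starts right after a "<", e.g. 'w:t>Hello' or '/w:t>'
    Local $a = StringSplit($s, "<")
    ; a piece yields at most 1 + (its number of ">") entries, so size $b for the worst case
    ; (flag 2 = no count element, so UBound = number of ">" + 1)
    Local $b[$a[0] + UBound(StringSplit($s, ">", 2))]
    Local $i = 0, $c = ""
    ; pass 1: split each piece into its tag "<...>" and the text that follows it (if any)
    For $x = 1 To $a[0]
        If StringReplace($a[$x], @TAB, "") = "" Then ContinueLoop ; empty or tabs-only piece
        If StringInStr($a[$x], ">") Then $a[$x] = StringReplace($a[$x], @TAB, '') ; drop old tab indentation
        $c = StringSplit($a[$x], ">")
        For $y = 1 To $c[0]
            If $y = 1 Then
                $i += 1
                $b[$i] = "<" & $c[$y] & ">" ; the tag
            Else
                If $c[$y] = "" Then ContinueLoop
                $i += 1
                $b[$i] = $c[$y] ; text after the tag
            EndIf
        Next
    Next
    ReDim $b[$i + 1]
    $b[0] = $i
    ; pass 2: glue tag + text + tag ("<w:t>", "Hello", "</w:t>") back into one line
    For $x = 3 To $b[0]
        If Not StringInStr($b[$x - 1], ">") Then
            $b[$x] = $b[$x - 2] & $b[$x - 1] & $b[$x]
            $b[$x - 2] = "<>" ; mark as merged
            $b[$x - 1] = "<>"
        EndIf
    Next
    ; pass 3: drop the "<>" markers
    Dim $c[$b[0] + 1]
    $i = 0
    For $x = 1 To $b[0]
        If $b[$x] = "<>" Then ContinueLoop
        $i += 1
        $c[$i] = $b[$x]
    Next
    $b = $c
    $c = ""
    ReDim $b[$i + 1]
    $b[0] = UBound($b) - 1
    ; pass 4: indent, one tab per open element
    Local $tabs = ""
    For $x = 1 To $b[0]
        $b[$x] = StringStripWS($b[$x], 3) ; 3 = strip leading and trailing white space
        If StringLeft($b[$x], 2) = "<!" Then ContinueLoop ; comment/DOCTYPE: not indented
        If StringLeft($b[$x], 2) = "<?" Then ContinueLoop ; processing instruction: not indented
        If StringLeft($b[$x], 1) = "<" And StringRight($b[$x], 2) = "/>" Then
            $b[$x] = $tabs & $b[$x] ; self-closing tag: current depth
            ContinueLoop
        EndIf
        If StringLeft($b[$x], 2) = "</" And StringRight($b[$x], 1) = ">" Then
            $tabs = StringTrimRight($tabs, 1) ; closing tag: one level back first
            $b[$x] = $tabs & $b[$x]
            ContinueLoop
        EndIf
        If StringLeft($b[$x], 1) = "<" And StringRight($b[$x], 1) = ">" And Not StringInStr($b[$x], '</') Then
            $b[$x] = $tabs & $b[$x] ; opening tag: print, then one level deeper
            $tabs &= @TAB
            ContinueLoop
        EndIf
        $b[$x] = $tabs & $b[$x] ; "<w:t>text</w:t>" lines and anything else: current depth
    Next
    ConsoleWrite('+ msWordXML_Beautify done in about ' & Round(TimerDiff($iTimer), 5) & ' mSec.' & @CRLF)
    Local $sOut = ""
    If $iEcho Then
        For $x = 1 To $b[0]
            If $iEcho = 1 Then
                ConsoleWrite($b[$x] & @CRLF)
            Else
                $sOut &= $b[$x] & @CRLF
            EndIf
        Next
    EndIf
    If $iEcho = 2 Then Return $sOut
    Return $b
EndFunc   ;==>msWordXML_Beautify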
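The layout comes from the last loop in msWordXML_Beautify, which keeps a string of tabs as the current depth. Below is a stripped-down, self-contained sketch of only that rule, run on a hard-coded token list so the four cases can be seen in isolation. _IndentTokens and the sample tokens are made up for illustration; this is a simplified restatement, not a replacement for the function above.

```autoit
; Hypothetical sketch of the depth rule from the last pass of msWordXML_Beautify.
; _IndentTokens and the sample tokens are made up for illustration only.
Local $aTok[6] = [5, '<?xml version="1.0"?>', '<root>', '<item a="1"/>', '<name>text</name>', '</root>']
_IndentTokens($aTok)
For $n = 1 To $aTok[0]
    ConsoleWrite($aTok[$n] & @CRLF)
Next

; $aTok[0] holds the token count; each token is one tag or one "<a>text</a>" run
Func _IndentTokens(ByRef $aTok)
    Local $sTabs = "", $t
    For $i = 1 To $aTok[0]
        $t = $aTok[$i]
        If StringLeft($t, 2) = "<?" Or StringLeft($t, 2) = "<!" Then
            ContinueLoop ; declarations and comments stay at column 0
        ElseIf StringLeft($t, 1) = "<" And StringRight($t, 2) = "/>" Then
            $aTok[$i] = $sTabs & $t ; self-closing tag: current depth
        ElseIf StringLeft($t, 2) = "</" Then
            $sTabs = StringTrimRight($sTabs, 1) ; closing tag: step back first
            $aTok[$i] = $sTabs & $t
        ElseIf StringLeft($t, 1) = "<" And Not StringInStr($t, "</") Then
            $aTok[$i] = $sTabs & $t ; opening tag: print, then go one level deeper
            $sTabs &= @TAB
        Else
            $aTok[$i] = $sTabs & $t ; "<a>text</a>" and anything else: current depth
        EndIf
    Next
EndFunc   ;==>_IndentTokens
```

With these tokens it prints the declaration and `<root>` at column 0, the two children one tab in, and `</root>` back at column 0. The depth only changes on a bare opening tag or a closing tag, which is why a merged `<w:t>text</w:t>` line stays at the level of its parent.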

Hope it saves someone some time.

Edit 1: it also works nicely with <?mso-application progid="Excel.Sheet"?>. It may work with any XML; I have no idea.
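The script itself still refuses any clipboard text that lacks the Word.Document processing instruction, so trying it on Excel XML means loosening that check. A minimal sketch, assuming you want to accept exactly these two progids; _IsMsoXml is a made-up helper name, not part of the original script:

```autoit
; Hypothetical helper: True when $sXml carries one of the mso-application progids tried so far.
Func _IsMsoXml($sXml)
    Local $aProgIds[2] = ["Word.Document", "Excel.Sheet"]
    For $sId In $aProgIds
        If StringInStr($sXml, '<?mso-application progid="' & $sId & '"?>') Then Return True
    Next
    Return False
EndFunc   ;==>_IsMsoXml

ConsoleWrite(_IsMsoXml('<?mso-application progid="Excel.Sheet"?><Workbook/>') & @CRLF)
ConsoleWrite(_IsMsoXml('<root/>') & @CRLF)
```

In the script, the clipboard check would then read `If Not _IsMsoXml($s) Then Exit MsgBox(...)`; the beautifier function itself does not need to change.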

Edit 2: fixed an error in the code

Edited by argumentum

  • Similar Content

    • Jury
      By Jury
      I've failed to find an example of _Word_DocFindReplace that searches for formatted text (I'm looking for standalone paragraph marks formatted other than normal, i.e. bold, italic, underlined).
      The reason is that when converting a Word document to HTML, one of the main problems in the result is that a standalone paragraph mark is converted to an HTML space that retains the formatting (...>&nbsp;<...), so it shows up as an underline _ in a browser when it should be blank. I've played around with the script and got it to at least un-bold the first paragraph mark, whether or not it was bold, but I'd like to clear all formatting from every standalone paragraph mark in the whole document. Below is what I've done so far (not much more than what's in the help file, I'm afraid). Way down at the bottom of the _Word_DocFindReplace help text is this parameter, but there are no examples of it to be found:
      $bFormat   [optional] True to have the find operation locate formatting in addition to or instead of the find text (default = False)

      #include <MsgBoxConstants.au3>
      #include <Word.au3>
      $processing = @MyDocumentsDir & '\AutoIt_code\getter\processing\'
      Global $oWord = _Word_Create()
      Global $sTestfile = $processing & "Testing.docx"
      ConsoleWrite($sTestfile & @CRLF)
      Global $oDoc = _Word_DocOpen($oWord, $sTestfile)
      If @error Then Exit MsgBox($MB_SYSTEMMODAL, "ERROR", "Error opening file = '" & $sTestfile & "'" & @CRLF & "@error = " & @error & ", @extended = " & @extended)
      $oRangeFound = _Word_DocFind($oDoc, "^p", Default, Default)
      If @error Then Exit MsgBox($MB_SYSTEMMODAL, "Word UDF: _Word_DocFind Example", _
              "Error locating paragraph control character in the document." & @CRLF & "@error = " & @error & ", @extended = " & @extended)
      $oRangeFound.Bold = False
      If @error Then Exit MsgBox($MB_SYSTEMMODAL, "Word UDF: _Word_DocFind Example", _
              "Error inserting text after the paragraph control character in the document." & @CRLF & "@error = " & @error & _
              ", @extended = " & @extended)
      MsgBox($MB_SYSTEMMODAL, "Word UDF: _Word_DocFind Example", "Paragraph control character successfully replaced." & @CRLF & _
              "Text inserted in paragraph 2.")
    • FrancescoDiMuro
      By FrancescoDiMuro
      Good morning everyone
      I am working on a little script that takes some data from a SQLite DB and should create a sort of report by inserting rows into a Word document... I've got as far as:
      _Word_DocTableWrite(), and I don't know how to set the range parameter. What does it specify?
      Thanks a lot for the help
      EDIT:
      I managed to write a table in the Word document, but now I get error 2 when I save the document with _Word_DocSaveAs().
      What are the possible causes? Thanks a lot, again.
      EDIT 2:
      ... And how can I set a border on the table? Maybe with some sort of auto-formatting for the text (the larger the text, the larger the height/width of the table's cells).
      Thanks  
      EDIT 3 (bug):
      When I include the parameter $WdSaveChanges in _Word_DocSaveAs(), a save dialog box appears, which should not happen according to the MSDN documentation:
      wdSaveChanges -1 Save pending changes automatically without prompting the user.
      Thanks again to everyone who answers.
    • anthonyjr2
      By anthonyjr2
      I'm using the Word UDF for the first time, and I'm having some trouble with _Word_DocFind(). There isn't much discussion of it on the forums, so it's hard to find support for the issue I'm having. Here's my code:

      #include <Word.au3>
      $listPath = @ScriptDir & "\AMCH OFFSET 042617.docx"
      $pWord = _Word_Create()
      $oWord = _Word_DocOpen($pWord, $listPath)
      Local $ctr = 0
      Local $searchRange = _Word_DocFind($oWord, "Claim Number")
      If Not @error Then
          $ctr += 1
      EndIf
      While ($searchRange <> 0)
          $searchRange = _Word_DocFind($oWord, "Claim Number", 0, $searchRange)
          If Not @error Then
              $ctr += 1
          EndIf
          $searchRange.Select
      WEnd

      My problem is that it doesn't seem to find a match for the string on any page after the second. When I run the script, it just loops indefinitely on the second page. I can't post an example of the Word document because it contains medical data, but every page is basically the same and every page has the string I am looking for. I also tried checking @error after each find, and it is never set, so I don't think that's the problem.
    • Simpel
      By Simpel
      Hi. I'm trying to write an XML file. Here is my code:
      #include <_XMLDomWrapper.au3>
      #include <Date.au3>
      Global $g_sXMLFileName
      Global $g_sDestPath = @DesktopDir & "\"
      Global $g_sReturnedBID = "A10829"
      _makeXML()
      _AddXML(1, "A10829_Thomas/wav/T001.wav")
      _AddXML(2, "A10829_Thomas/wav/T002.wav")
      Exit

      Func _makeXML()
          Local $sXMLtime = StringReplace(StringReplace(StringReplace(_NowCalc()," ","_"),":","-"),"/","-") ; in yyyy-mm-dd_hh-mm-ss
          $g_sXMLFileName = $g_sDestPath & $g_sReturnedBID & "_" & "EB-Ton-Upload" & "_" & $sXMLtime & ".xml"
          _XMLCreateFile($g_sXMLFileName, "gemagvl", 1,1)
          _XMLFileOpen($g_sXMLFileName)
      EndFunc

      Func _AddXML($iCount, $sDateiname)
          _XMLCreateRootNodeWAttr("row", "count", $iCount, "")
          _XMLCreateChildNode("//row", "picklistenname", $g_sReturnedBID & "_EB-Ton-Upload")
          _XMLCreateChildNode("//row", "picklisteninfo")
          _XMLCreateChildNode("//row", "bid", $g_sReturnedBID)
          _XMLCreateChildNode("//row", "audiodateiname", $sDateiname)
          _XMLCreateChildNode("//row", "titel", StringTrimRight(StringTrimLeft($sDateiname, 7), 4))
          _XMLCreateChildNode("//row", "interpret", "EB")
          _XMLCreateChildNode("//row", "quelle", "Ton")
      EndFunc

      It returns:

      <?xml version="1.0" encoding="UTF-8"?>
      <gemagvl>
        <row count="1">
          <picklistenname>A10829_EB-Ton-Upload</picklistenname>
          <picklisteninfo/>
          <bid>A10829</bid>
          <audiodateiname>A10829_Thomas/wav/T001.wav</audiodateiname>
          <titel>Thomas/wav/T002</titel>
          <interpret>EB</interpret>
          <quelle>Ton</quelle>
          <picklistenname>A10829_EB-Ton-Upload</picklistenname>
          <picklisteninfo/>
          <bid>A10829</bid>
          <audiodateiname>A10829_Thomas/wav/T002.wav</audiodateiname>
          <titel>Thomas/wav/T003</titel>
          <interpret>EB</interpret>
          <quelle>Ton</quelle>
        </row>
        <row count="2">
          <picklistenname>A10829_EB-Ton-Upload</picklistenname>
          <picklisteninfo/>
          <bid>A10829</bid>
          <audiodateiname>A10829_Thomas/wav/T002.wav</audiodateiname>
          <titel>Thomas/wav/T003</titel>
          <interpret>EB</interpret>
          <quelle>Ton</quelle>
        </row>
      </gemagvl>

      But it should return:
      <?xml version="1.0" encoding="UTF-8"?>
      <gemagvl>
        <row count="1">
          <picklistenname>A10829_EB-Ton-Upload</picklistenname>
          <picklisteninfo/>
          <bid>A10829</bid>
          <audiodateiname>A10829_Thomas/wav/T001.wav</audiodateiname>
          <titel>Thomas/wav/T002</titel>
          <interpret>EB</interpret>
          <quelle>Ton</quelle>
        </row>
        <row count="2">
          <picklistenname>A10829_EB-Ton-Upload</picklistenname>
          <picklisteninfo/>
          <bid>A10829</bid>
          <audiodateiname>A10829_Thomas/wav/T002.wav</audiodateiname>
          <titel>Thomas/wav/T003</titel>
          <interpret>EB</interpret>
          <quelle>Ton</quelle>
        </row>
      </gemagvl>

      The nodes inserted by the second call are duplicated in the first row. How do I get this right?
      Regards, Conrad
    • rootx
      By rootx
      I need help reading the DVD id child and its subchildren in a loop. Thx
      Example...
      DVD001 - PAL - EN,FR,DE,ES,IT, and filter the title & descri by the right language. I tried with $oXML.SelectSingleNode but without success.
      <?xml version="1.0" encoding="UTF-8"?>
      <datafile xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="mydvd.xsd">
        <dvd name="My dvd title">
          <id>DVD001</id>
          <region>PAL</region>
          <languages>EN,FR,DE,ES,IT</languages>
          <locale lang="EN">
            <title>title en</title>
            <descri>descri en</descri>
          </locale>
          <locale lang="FR">
            <title>title fr</title>
            <descri>descri fr </descri>
          </locale>
          <locale lang="DE">
            <title>title de</title>
            <descri>descri de </descri>
          </locale>
          <locale lang="ES">
            <title>title es</title>
            <descri>descri es</descri>
          </locale>
          <locale lang="IT">
            <title>title it</title>
            <descri>descri it</descri>
          </locale>
        </dvd>
        <dvd name="My dvd title 2">
          <id>DVD002</id>
          <region>USA</region>
          <languages>EN</languages>
          <locale lang="EN">
            <title>title en</title>
            <descri>descri en</descri>
          </locale>
        </dvd>
      </datafile>

      #include <File.au3>
      $xml = @ScriptDir&"\test.xml"
      Local $oXML = ObjCreate("Microsoft.XMLDOM")
      $oXML.load($xml)
      $id = $oXML.SelectNodes("//dvd")
      For $ids In $id
          ConsoleWrite($ids.text &@CRLF)
      Next