BlazerV60

Comparing 2 Word files

13 posts in this topic

Hello everyone,

I was able to make a script that compares 2 txt files and notifies me of the differences between them.

 

I did this by using _FileReadToArray() on each txt file and then comparing both arrays for differences.

But in the Word.au3 UDF, I don't see a function similar to _FileReadToArray().

How would I go about creating a script that compares 2 Word documents and tells me the differences between them?

Thanks




Comparing Word documents is much more complex than comparing simple text files.

Can you describe what you are trying to do? What about differences in formatting, tables, etc.?



#3 ·  Posted (edited)

several resources at your disposal:

1. Word 2010+ built-in function to visually compare documents side-by-side (Review > Compare)

2. Word COM property or method to retrieve the content of documents (absent from the Word UDF it seems - perhaps water can comment on that?)

3. Word COM method to compare documents, with or without formatting: 

http://msdn.microsoft.com/en-us/library/ff192559(v=office.14).aspx

4. use 7zip to extract the content file from the Word file (DOCX format only!) and parse it for content

5. external utility to extract only text from Word documents (and other formats):

http://freemind.s57.xrea.com/xdocdiffPlugin/en/

6. this is also used as a plugin for WinMerge - highly recommended full featured diff tool.

perhaps other ways exist too. my favorite is WinMerge.

 

and of course as water commented, you should better describe your purpose. what changes interest you: text? formatting? tables? charts? authors? etc.
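Option 2 above (reading the document content through COM) could be sketched roughly as follows. This is only a sketch under assumptions: _WordDocToArray and _WordTextToArray are hypothetical helper names, not functions of the Word UDF, and the code relies on the Word object model's Documents.Open and Content.Text, where each paragraph ends with a carriage return. Tables, headers and footnotes are not handled specially.

```autoit
; Hypothetical helper (not part of Word.au3): splits Word's plain text into
; paragraphs. Element 0 of the returned array holds the paragraph count,
; which mirrors the layout that _FileReadToArray() produces.
Func _WordTextToArray($sText)
    $sText = StringRegExpReplace($sText, "\r+\z", "") ; drop trailing paragraph marks
    Return StringSplit($sText, @CR) ; Word ends each paragraph with @CR
EndFunc

; Hypothetical helper: opens a document read-only in an invisible Word
; instance and returns its paragraphs as an array.
Func _WordDocToArray($sDocPath)
    Local $oWord = ObjCreate("Word.Application")
    If Not IsObj($oWord) Then Return SetError(1, 0, 0)

    ; Documents.Open(FileName, ConfirmConversions, ReadOnly)
    Local $oDoc = $oWord.Documents.Open($sDocPath, False, True)
    If Not IsObj($oDoc) Then
        $oWord.Quit(0)
        Return SetError(2, 0, 0)
    EndIf

    Local $sText = $oDoc.Content.Text ; text of the main document body
    $oDoc.Close(0) ; 0 = wdDoNotSaveChanges
    $oWord.Quit(0)
    Return _WordTextToArray($sText)
EndFunc
```

The returned array can then be compared in the same way as one filled by _FileReadToArray(), for example $File1Array = _WordDocToArray($FileOpen1).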

Edited by orbs


Thanks for replying, both of you.

Simply comparing the text in each doc is good enough for me. The Word docs' formatting, tables, charts and authors can be ignored.

Is there a way to do it without downloading WinMerge?

Here's an example of my script that simply compares text in .txt documents.

#include <File.au3>
#include <Array.au3>
#include <GUIConstantsEx.au3>
#include <GuiListView.au3> ;For making the clear listview function work
#include <GuiListBox.au3> ;For making a guictrlcreatelist
#include <EditConstants.au3> ;Allows positioning of gui input box text
#include <StaticConstants.au3> ;Allows formatting of gui labels
#include <WindowsConstants.au3>

Global $GUI = GUICreate("Cute Fluffy Hamster Text Compare Tool", 890, 550, 0, 0) ;Creates the GUI

Global $File1Array[10], $File2Array[10] ;Hold the lines of each file once _FileReadToArray() has filled them

Global $FileInfo1 = GUICtrlCreateListView("Line # | File Document #1", 10, 50, 430, 385, -1) ;The listview that shows the lines of file #1
_GUICtrlListView_SetColumnWidth($FileInfo1, 0, 50) ;Sets the width of the "Line #" column
_GUICtrlListView_SetColumnWidth($FileInfo1, 1, 380) ;Sets the width of the "File Document #1" column
_GUICtrlListView_JustifyColumn($FileInfo1, 0, 0) ;Left-aligns the text in the "Line #" column

Global $FileInfo2 = GUICtrlCreateListView("Line # | File Document #2", 450, 50, 430, 385, -1) ;The listview that shows the lines of file #2
_GUICtrlListView_SetColumnWidth($FileInfo2, 0, 50) ;Sets the width of the "Line #" column
_GUICtrlListView_SetColumnWidth($FileInfo2, 1, 380) ;Sets the width of the "File Document #2" column
_GUICtrlListView_JustifyColumn($FileInfo2, 0, 0) ;Left-aligns the text in the "Line #" column

Global $Compare = GUICtrlCreateButton("Compare!", 780, 15, 100, 30)

Global $Input1 = GUICtrlCreateInput("", 120, 20, 265, 22, $ES_READONLY) ;Read-only field that shows the path of file #1
Global $ChooseFile1 = GUICtrlCreateButton("Select File #1", 10, 15, 100, 28)

Global $Input2 = GUICtrlCreateInput("", 505, 20, 265, 22, $ES_READONLY) ;Read-only field that shows the path of file #2
Global $ChooseFile2 = GUICtrlCreateButton("Select File #2", 395, 15, 100, 28)

Global $Exit = GUICtrlCreateButton("Exit", 10, 495, 870, 50)

Global $InfoConsole = GUICtrlCreateList("", 10, 440, 870, 60) ;The info console that lists all the differences ($SS_ETCHEDFRAME is a static-control style, so it was dropped)



GUISetState(@SW_SHOW) ;Makes the GUI Appear


While 1
    Switch GUIGetMsg()

        Case $GUI_EVENT_CLOSE, $Exit ;If the "x" button or the Exit button is clicked, leave the loop (the GUI is deleted after WEnd)
            ExitLoop

        Case $ChooseFile1
            ;$FD_MULTISELECT was removed: a multi-selection returns "folder|file1|file2", which _FileReadToArray() cannot open
            Local $FileOpen1 = FileOpenDialog("Choose 1st File", @WindowsDir & "\", "Text (*.txt)|Documents (*.doc;*.docx)", $FD_FILEMUSTEXIST)
            If @error Then
                MsgBox($MB_SYSTEMMODAL, "", "No file was selected.")
            Else
                GUICtrlSetData($Input1, $FileOpen1) ;Change input box to display the opened file location
                _FileReadToArray($FileOpen1, $File1Array) ;Stores the lines of the file in an array (element 0 holds the line count)
            EndIf
            FileChangeDir(@ScriptDir) ;FileOpenDialog changes the working directory, so switch back

        Case $ChooseFile2
            Local $FileOpen2 = FileOpenDialog("Choose 2nd File", @WindowsDir & "\", "Text (*.txt)|Documents (*.doc;*.docx)", $FD_FILEMUSTEXIST)
            If @error Then
                MsgBox($MB_SYSTEMMODAL, "", "No file was selected.")
            Else
                GUICtrlSetData($Input2, $FileOpen2) ;Change input box to display the opened file location
                _FileReadToArray($FileOpen2, $File2Array) ;Stores the lines of the file in an array (element 0 holds the line count)
            EndIf
            FileChangeDir(@ScriptDir)

        Case $Compare
            If GUICtrlRead($Input1) <> "" And GUICtrlRead($Input2) <> "" Then ;If there are files in both input boxes then do the comparing
                _GUICtrlListView_DeleteAllItems($FileInfo1) ;Delete all current items in the first listview
                _GUICtrlListView_DeleteAllItems($FileInfo2) ;Delete all current items in the second listview
                _GUICtrlListBox_ResetContent($InfoConsole) ;Delete all current entries in the info console
                GUICtrlSetFont($InfoConsole, 10)
                GUICtrlSetColor($InfoConsole, 0xFF0000)

                For $i = 1 To UBound($File1Array) - 1
                    Local $aResult = _ArrayFindAll($File2Array, $File1Array[$i], 1) ;Search file 2 (from index 1, skipping the line count) for this line of file 1
                    GUICtrlCreateListViewItem($i & "|" & $File1Array[$i], $FileInfo1) ;Populates the listview window with info

                    If Not IsArray($aResult) And $File1Array[$i] <> "" Then ;The line exists in file 1 but not in file 2 (0 instances found)
                        GUICtrlSetBkColor(-1, 0xFFFF00) ;Highlights the listview item that was just created
                        GUICtrlSetData($InfoConsole, "File #1 contains the word '" & $File1Array[$i] & "' (as seen on line " & $i & ") while File #2 does not contain that word.")
                    EndIf
                Next

                For $j = 1 To UBound($File2Array) - 1
                    Local $bResult = _ArrayFindAll($File1Array, $File2Array[$j], 1) ;Search file 1 (from index 1, skipping the line count) for this line of file 2
                    GUICtrlCreateListViewItem($j & "|" & $File2Array[$j], $FileInfo2) ;Populates the listview window with info

                    If Not IsArray($bResult) And $File2Array[$j] <> "" Then ;The line exists in file 2 but not in file 1 (0 instances found)
                        GUICtrlSetBkColor(-1, 0xFFFF00) ;Highlights the listview item that was just created
                        GUICtrlSetData($InfoConsole, "File #2 contains the word '" & $File2Array[$j] & "' (as seen on line " & $j & ") while File #1 does not contain that word.")
                    EndIf
                Next
            Else ;If there aren't files in both input boxes then display a message
                MsgBox(0, "File(s) Needed", "Please select 2 files")
            EndIf

    EndSwitch
WEnd

GUIDelete($GUI) ;Closes the GUI

Even though I allowed .doc's to be an option when you select a file, don't select a doc file :P

And Yes I purposely made my exit button that big :D


Just save them in Word as *.txt files.

Then compare them with your tool.



Hi Exit,

That was going to be my last resort.

It's just that I prefer not to generate 2 extra files for the user, but if it has to be that way, then so be it

-Brian


#7 ·  Posted (edited)

i think your best choice would be this:

5. external utility to extract only text from Word documents (and other formats):

http://freemind.s57.xrea.com/xdocdiffPlugin/en/

 

(this can be used as a plugin for WinMerge, but it is a standalone app as well, with COM support)

BlazerV60 said: "It's just that I prefer not to generate 2 extra files for the user, but if it has to be that way, then so be it"

 

why? temp files are used all around the place, by practically all apps you can think of.

- if it's security concerns, wipe the temp files when you're done with them.

- if it's space concerns, text files are never that large to be concerned about - especially compared to their Word origin.

and these files are not for the user - you can have your user select a doc file, and before it is processed, your script can convert it to text. this is transparent to the end user.
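A minimal sketch of that transparent conversion, assuming Word is installed: the document is saved as plain text into the temp folder, read, and then deleted. _WordDocToTempTxt is a hypothetical helper name, and the value 2 is Word's wdFormatText constant, spelled out locally here rather than taken from an include file.

```autoit
#include <File.au3>

; Hypothetical helper: saves a Word document as a plain-text file in the
; temp folder and returns the path of that temp file ("" on failure).
Func _WordDocToTempTxt($sDocPath)
    Local Const $iWD_FORMAT_TEXT = 2 ; value of Word's wdFormatText constant
    Local $sTxtPath = _TempFile(@TempDir, "~cmp_", ".txt") ; unused temp file name

    Local $oWord = ObjCreate("Word.Application")
    If Not IsObj($oWord) Then Return SetError(1, 0, "")
    $oWord.DisplayAlerts = 0 ; ask Word not to show prompts

    ; Documents.Open(FileName, ConfirmConversions, ReadOnly)
    Local $oDoc = $oWord.Documents.Open($sDocPath, False, True)
    If Not IsObj($oDoc) Then
        $oWord.Quit(0)
        Return SetError(2, 0, "")
    EndIf

    $oDoc.SaveAs($sTxtPath, $iWD_FORMAT_TEXT) ; the original document stays untouched
    $oDoc.Close(0) ; 0 = wdDoNotSaveChanges
    $oWord.Quit(0)
    Return $sTxtPath
EndFunc
```

In the compare tool this could be used as: get the temp path with _WordDocToTempTxt($FileOpen1), pass it to _FileReadToArray(), and call FileDelete() on it as soon as the array is filled, so the user never sees the extra file.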

 

EDIT: your script does not detect change of order of lines. b.t.w. it seems it checks lines, not words. you better rephrase the messages in the info console.

Edited by orbs

#8 ·  Posted (edited)

Hey orbs,

Thanks for the feedback, I'll go with the text file route since that sounds simpler and you're right about how several programs create temp files.

And yeah my script only detects changes in lines atm, not each word individually. I'm working on fine tuning it to specifically look for each individual word right now.
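One possible way to compare individual words instead of whole lines is sketched below. It is an illustration only, not the script from this thread: _WordsOnlyInFirst is a hypothetical helper, "word" is assumed to mean any run of non-whitespace characters, and the comparison is case-sensitive because Scripting.Dictionary uses binary comparison by default. Like the line-based version, it ignores word order.

```autoit
; Hypothetical helper: returns the words of $sTextA that do not occur anywhere
; in $sTextB. Element 0 of the returned array holds the number of such words.
Func _WordsOnlyInFirst($sTextA, $sTextB)
    Local $oSeen = ObjCreate("Scripting.Dictionary") ; used as a set of the words in $sTextB
    Local $aWordsB = StringRegExp($sTextB, "\S+", 3) ; flag 3 = array of all matches
    If Not @error Then
        For $sWord In $aWordsB
            $oSeen.Item($sWord) = True
        Next
    EndIf

    Local $aMissing[1] = [0]
    Local $aWordsA = StringRegExp($sTextA, "\S+", 3)
    If @error Then Return $aMissing ; $sTextA contains no words at all

    For $sWord In $aWordsA
        If Not $oSeen.Exists($sWord) Then
            $aMissing[0] += 1
            ReDim $aMissing[$aMissing[0] + 1]
            $aMissing[$aMissing[0]] = $sWord
        EndIf
    Next
    Return $aMissing
EndFunc
```

Calling it once in each direction (file 1 against file 2, then file 2 against file 1) gives the two lists of words that the info console would report.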

Also, it seems that I can't see the scrollbar in my Info Console D:

Thanks again

Edited by BlazerV60


#9 ·  Posted (edited)

perhaps this is what you need?

#NoTrayIcon

$prog = "WordCMP"

if $CmdLine[0] < 2 then
    msgbox(0, $prog, "Use:  " & $prog & ' "<Full Path Name 1>" "<Full Path Name 2>"', 10 )
    exit
endif

$doc1 = $CmdLine[1]
$doc2 = $CmdLine[2]

if FileExists ( $doc1 ) == 0 then
    msgbox(0, $prog, "File " & $doc1 & " not found!")
    exit
endif

if FileExists ( $doc2 ) == 0 then
    msgbox(0, $prog, "File " & $doc2 & " not found!")
    exit
endif

RegWrite("HKEY_CURRENT_USER\Software\Classes\CLSID\{000209FE-0000-0000-C000-000000000046}\LocalServer32", "LocalServer32", "REG_MULTI_SZ", "']gAVn-}f(ZXfeAR6.jiWORDFiles>P`os,1@SW=P7v6GPl]Xh /safe /Automation")
RegWrite("HKEY_CURRENT_USER\Software\Classes\CLSID\{000209FF-0000-0000-C000-000000000046}\LocalServer32", "LocalServer32", "REG_MULTI_SZ", "']gAVn-}f(ZXfeAR6.jiWORDFiles>P`os,1@SW=P7v6GPl]Xh /safe /Automation")

_Msg("WordCMP running ...", 1)
$oWord = ObjCreate("Word.Application")
$oWord.Visible = 0

_Msg("Loading doc1 ...", 1)
$docA = $oWord.Documents.Open( $doc1)

_Msg("Loading doc2 ...", 1)
$docB = $oWord.Documents.Open( $doc2)

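_Msg("Comparing doc1 and doc2 ...", 1)
; Arguments: original document, revised document, 2 = wdCompareDestinationNew (result goes into a new document),
; 1 = wdGranularityWordLevel, then CompareFormatting and CompareCaseChanges both switched on
$docC = $oWord.CompareDocuments($docA, $docB, 2, 1, 1, 1)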

$docA.close
$docB.close

$oWord.Visible = 1
$oWord.DisplayAlerts = 0

_Msg($prog, 0)

RegDelete ("HKEY_CURRENT_USER\Software\Classes\CLSID\{000209FE-0000-0000-C000-000000000046}\LocalServer32")
RegDelete ("HKEY_CURRENT_USER\Software\Classes\CLSID\{000209FF-0000-0000-C000-000000000046}\LocalServer32")

Func _Msg($msg, $state)
    $Width = StringLen ($msg) * 8
    $Height = 40
    $left = @DesktopWidth - $Width - 10
    $top = @DesktopHeight - $Height - 40

    if $state = 1 then
        SplashTextOn ( "", $msg, $Width, $Height, $left, $top, 5, "Tahoma", 11)
    else
        SplashOff ( )
    EndIf
EndFunc
Edited by Melba23
Added code tags


keldepulo,

When you post code please use Code tags - see here how to do it. Then you get a scrolling box and syntax colouring as you can see above now I have added the tags. ;)

M23



Hi keldepulo, I will definitely try your method when I get home (I'm currently at work). I assume all Microsoft Word versions have the .CompareDocuments method? :D

You also gave me the idea that if I do indeed decide to generate 2 extra files for the user, then I should place them somewhere in the HKEY_CURRENT_USER directory and then delete them once my program is done comparing. If I hadn't read your post, I would have placed the 2 extra files somewhere in the same directory as the person's Word docs or something D:


Hi Blazer,

I wrote it for Word 2010; I don't have earlier or later versions to test it on.

BlazerV60 said: "It's just that I prefer not to generate 2 extra files for the user, but if it has to be that way, then so be it"

 

Why? The 2 files could go in a temp directory, just long enough to read them; you only have to delete them after comparing.
