orbs

SPDiff: Single-Pane Text Diff


#1 ·  Posted (edited)

 
Features
 
Single Pane Layout offers a more efficient use of screen area, often put to waste by the common side-by-side layout. The single pane displays a composition of the two compared texts, with the removed and added contents highlighted in distinctive colors.
 
Character-Level Comparison delivers better precision than the commonly used line-level comparison.
 
Minimalistic, Highly Functional Toolbar: drop-down list of diffs for quick navigation, step through diffs back and forth, and search the removed/added contents as well as the entire text. Keyboard shortcuts are supported - hover over a toolbar icon for info.
 
External Application Integration via command-line switches and exit codes - as well as a custom title and color set, to match the appearance of the calling application.
 
Fully Portable, self-contained single executable.
 
and many more - launch it and click the "Info" icon (right-most toolbar icon) for full details.
 
 
Under The Hood
 
SPDiff employs the GNU diff utility (ported to Windows by the GnuWin32 project), but extends its core functionality from line-based to character-based comparison. In practice, SPDiff is a pre-processor, post-processor and GUI wrapper for the GNU diff utility.
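
As a rough illustration of the single-pane idea (not SPDiff's actual implementation, which wraps GNU diff), a character-level comparison can be composed into one text with Python's standard `difflib`. The function name `single_pane` and the `[-...-]` / `{+...+}` markers are assumptions that stand in for the highlight colors:

```python
from difflib import SequenceMatcher


def single_pane(old: str, new: str) -> str:
    """Compose both texts into one string, marking removed and added runs."""
    # autojunk=False switches off difflib's heuristic that ignores very
    # frequent characters in long inputs
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(old[i1:i2])
        else:
            # a 'replace' opcode carries both a removed and an added run
            if i1 < i2:
                parts.append("[-" + old[i1:i2] + "-]")
            if j1 < j2:
                parts.append("{+" + new[j1:j2] + "+}")
    return "".join(parts)


print(single_pane("wordAwordB", "wordA123wordB"))  # wordA{+123+}wordB
```

The composed string is what a single pane would display, with the markers replaced by colors.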
 
 
Examples
 
Very Simple Example
 
files contents:

the older file "text_B_simple.txt" :
wordA
wordB
wordC
wordD
wordE
wordF

the newer file "text_A_simple.txt" :

wordA
1234567890
wordB
ABCDEFGHIJ
wordC
0987654321
wordD
QWERTYUIOP
wordE
ASDFGHJKL:
wordF

the diff display:

post-47848-0-22445200-1424338746_thumb.p

 
Slightly Less Simple Example
 
files contents:

the older file "text_B.txt" :
{
identical very long section
identical very long section
identical very long section
identical very long section
identical very long section
}
{remove entire line}
{similar string entire line}
{similar string entire line}
{remove}{similar string at end of line}
{similar string at beginning of line}{remove}
{remove}{similar string at middle of line}{remove}
{remove entire line}
moved to beginning
moved to beginning
moved to beginning
{line remains at end}

the newer file "text_A.txt" :

moved to beginning
moved to beginning
moved to beginning
{
identical very long section
identical very long section
identical very long section
identical very long section
identical very long section
}
{
add new section
add new section
add new section
add new section
add new section
}
{add new line}
{similar string entire line}
{similar string entire line}
{similar string at end of line}{add at end of line}
                    {20 spaces added at beginning}{similar string at beginning of line}
{add}{similar string at middle of line}{add}
{add new line}
{line remains at end}

the diff display:

post-47848-0-27123800-1424338863_thumb.p

 
 
Further Development
 
Tolerance:
 
first things first: although character-based comparison has a clear advantage over line-based comparison, it does have its cons: as-is, it is quite strict. in the 2nd example, a complete paragraph was added, so you'd expect to see something like this:
 
{remove entire line}
{
add new section
add new section
add new section
add new section
add new section
}
{add new line}
 
but because there are common individual characters, you get the mess you see in the screenshot.
 
so, the next big feature would be "tolerance": not quite defined yet, but basically it will try to detect when individual characters are messing up the display.
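
Since "tolerance" is not defined yet, the following is only one possible reading of it: treat common runs shorter than a threshold as coincidences and let them fall into the surrounding removed/added text. The name `apply_tolerance`, the `min_len` parameter and the sample match list are hypothetical.

```python
def apply_tolerance(matches, min_len):
    """Keep only common runs of at least min_len characters.

    matches: list of (start_in_old, start_in_new, length) tuples.
    Dropped runs simply become part of the surrounding removed/added text.
    """
    return [m for m in matches if m[2] >= min_len]


# Hypothetical match list: two single-character coincidences and two real runs
matches = [(0, 0, 1), (1, 2, 3), (5, 5, 1), (7, 7, 4)]
print(apply_tolerance(matches, 2))  # [(1, 2, 3), (7, 7, 4)]
```

A fixed threshold is the simplest choice; a real implementation might scale it with the size of the neighbouring differences.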
 
 
Community Help & Feedback
 
GNU diff is used simply because i was unable to implement the "longest common subsequence" algorithm in pure AutoIt. although it looks rather simple, i failed miserably in the attempt. perhaps it's because i'm no programmer; so i'm hoping an experienced programmer can pick that gauntlet up.
thanks to Melba23 for trying to implement the diff algorithm in pure AutoIt. the conclusion is that it is slower than the external diff utility, so no updates on this front for now.
 
 
other than that - use it, let me know what you think!
 
 
Download:
 
v1.0.0.1: bugfix (>post #22)
the archive includes: compiled exe, au3 script, icon, the GNU diff utility, a brief PDF documentation sheet, and the example files.
 
previous versions:

only the last version is available for download due to forum limitations. this is an extremely brief change log for reference.
 
v1.0: comprehensive color set (>post #19)
v0.9: small performance improvement (not released)
v0.8: small fixes (>post #18)
v0.7: "Word" icon to view diff result in your default RTF viewer (>post #17)
v0.6: added "Recently Compared" list (>post #16)
v0.5: minor improvements (>post #15)
v0.4: minor improvements (>post #14)
v0.3: shift highlight back to beginning of line (>post #11)
v0.2: switch from pane view to selection bar & info link (>post #10)
v0.1: initial release




orbs,

I have never used python, but the syntax did not seem that difficult to follow. Here is the diff code translated into AutoIt with a short example:

;""" Usage: python diff.py FILE1 FILE2
;A primitive `diff` in 50 lines of Python.
;Explained here: http://pynash.org/2013/02/26/diff-in-50-lines.html

; ###########################################

; Example

Local $aArray_1[] = [1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 9]
Local $aArray_2[] = [1, 2, 2, 3, 3, 4, 1, 6, 7, 8, 9]

$sRet = _Matching_Slices($aArray_1, 0, 10, $aArray_2, 0, 10)

; Remove final EOL
$sRet = StringTrimRight($sRet, 2)

; And display - Start point in Array_1 | Start point in Array_2 | Length of match
ConsoleWrite($sRet & @CRLF)

; ###########################################

;def longest_matching_slice(a, a0, a1, b, b0, b1):
Func _Longest_Matching_Slice($aArray_1, $iArray_1_Start, $iArray_1_End, $aArray_2, $iArray_2_Start, $iArray_2_End)

;    sa, sb, n = a0, b0, 0
    Local $iBreak_1 = $iArray_1_Start
    Local $iBreak_2 = $iArray_2_Start
    Local $n = 0

;    runs = {}
    Local $aRuns[$iArray_2_End + 1]

;    for i in range(a0, a1):
    For $i = $iArray_1_Start To $iArray_1_End

;        new_runs = {}
        Local $aNew_Runs[$iArray_2_End + 1]

;        for j in range(b0, b1):
        For $j = $iArray_2_Start To $iArray_2_End

;            if a[i] == b[j]:
            If $aArray_1[$i] = $aArray_2[$j] Then

;                k = new_runs[j] = runs.get(j-1, 0) + 1
                If $j Then
                    $aNew_Runs[$j] = $aRuns[$j - 1] + 1
                Else
                    $aNew_Runs[$j] = 1
                EndIf
                $k = $aNew_Runs[$j]

;                if k > n:
                If $k > $n Then

;                   sa, sb, n = i-k+1, j-k+1, k
                    $iBreak_1 = $i - $k + 1
                    $iBreak_2 = $j - $k + 1
                    $n = $k

                EndIf
            EndIf
        Next

;        runs = new_runs
        $aRuns = $aNew_Runs

    Next

;    assert a[sa:sa+n] == b[sb:sb+n]

;    return sa, sb, n
    Local $aRet[3] = [$iBreak_1, $iBreak_2, $n]
    Return $aRet

EndFunc

;def matching_slices(a, a0, a1, b, b0, b1):
Func _Matching_Slices($aArray_1, $iArray_1_Start, $iArray_1_End, $aArray_2, $iArray_2_Start, $iArray_2_End)

;   sa, sb, n = longest_matching_slice(a, a0, a1, b, b0, b1)
    Local $aRet = _Longest_Matching_Slice($aArray_1, $iArray_1_Start, $iArray_1_End, $aArray_2, $iArray_2_Start, $iArray_2_End)

    Local $iBreak_1 = $aRet[0]
    Local $iBreak_2 = $aRet[1]
    Local $n = $aRet[2]

    $sRet = $iBreak_1 & "|" & $iBreak_2 & "|" & $n & @CRLF

;    if n == 0:
    If $n < 2 Then

;       return []
        Return ""

    Else

;   return (matching_slices(a, a0, sa, b, b0, sb) +
;            [(sa, sb, n)] +
;            matching_slices(a, sa+n, a1, b, sb+n, b1))

        Return  _Matching_Slices($aArray_1, $iArray_1_Start, $iBreak_1, $aArray_2, $iArray_2_Start, $iBreak_2) & _
                $sRet & _
                _Matching_Slices($aArray_1, $iBreak_1 + $n, $iArray_1_End, $aArray_2, $iBreak_2 + $n, $iArray_2_End)

    EndIf

EndFunc
M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind._______My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 


thank you Melba23!

however i think i see an error in this (which may well originate from the source python script) - the results from the example are as follows:

0|0|2
1|2|3
7|7|4
 
that is, 3 chunks of identical characters:
 
the first begins in array1 at index0 (first char), in array2 at index0, and is 2 characters long - that's fine, as both arrays begin with 1,2, ...
the third chunk is also fine, as both arrays are identical there (... ,6,7,8,9).
but the second chunk - begins in array1 at index1, in array2 at index2, and is 3 characters long - that's also fine, as both arrays indeed are identical there (... ,2,3,3, ...). however there is an overlap between this chunk and the first chunk: array1 index1 is included in both chunks.
 
overlaps are not valid results for this algorithm - it should decide whether to report the 1st chunk or the 2nd chunk (the decision is obvious here - it should be the 2nd, as it's longer). and if this algorithm does consider overlaps to be valid results (handing the post-processing over to the calling script), why did it not also report 3|3|3, which is an identical chunk as well (... ,3,3,4, ...) ?
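
For comparison, here is a sketch of the Python original as it is quoted in the comments of the translation. Python's `range(a0, a1)` excludes the end index, whereas AutoIt's `For ... To` includes it. Assuming that difference (together with the changed stop condition) is what produces the overlap, the half-open version below returns non-overlapping chunks for the same two arrays:

```python
def longest_matching_slice(a, a0, a1, b, b0, b1):
    """Longest common run of a[a0:a1] and b[b0:b1] as (start_a, start_b, length)."""
    sa, sb, n = a0, b0, 0
    runs = {}  # runs[j] = length of the common run ending at a[i-1], b[j]
    for i in range(a0, a1):          # end index a1 is excluded
        new_runs = {}
        for j in range(b0, b1):      # end index b1 is excluded
            if a[i] == b[j]:
                k = new_runs[j] = runs.get(j - 1, 0) + 1
                if k > n:
                    sa, sb, n = i - k + 1, j - k + 1, k
        runs = new_runs
    return sa, sb, n


def matching_slices(a, a0, a1, b, b0, b1):
    sa, sb, n = longest_matching_slice(a, a0, a1, b, b0, b1)
    if n == 0:
        return []
    # recurse strictly left of the match (up to, not including, sa and sb)
    # and strictly right of it
    return (matching_slices(a, a0, sa, b, b0, sb)
            + [(sa, sb, n)]
            + matching_slices(a, sa + n, a1, b, sb + n, b1))


array_1 = [1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 9]
array_2 = [1, 2, 2, 3, 3, 4, 1, 6, 7, 8, 9]
print(matching_slices(array_1, 0, len(array_1), array_2, 0, len(array_2)))
# [(0, 0, 1), (1, 2, 3), (5, 5, 1), (7, 7, 4)]
```

Here the first chunk is only one element long, so index 1 of array 1 belongs to the second chunk alone. The chunk 3|3|3 is not reported because it overlaps the equally long 1|2|3, which is found first.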


orbs,

The results are what the algorithm to which you linked returned - I made no changes to the code other than a direct translation into AutoIt. I was as surprised as you when the results appeared and I had to examine the arrays carefully to confirm that they were indeed correct - and as you have pointed out, there is another section which should have been included (although looking at the algorithm I can see why it was not). ;)

I think you need a different algorithm if you want the results you appear to require. :)

M23



i appreciate your efforts Melba23, but for now i must relinquish this. in the past two weeks or so i saw many versions of this algorithm in many languages, yet i was unable to track its logic, and other issues now claim their share of my time.


orbs,

I hate being beaten by a simple algorithm. So as it was a very foggy morning (and therefore no golf or aviation) I have amended the code a bit and it now seems to work much better. :)

Here is an example script to show it working:

#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>
#include <GUIRichEdit.au3>
#include <Array.au3>
#include <Color.au3>

; Only needed for _StringToArray function
Global Const $STR_COUNT = 0, $STR_EXPAND = 4

$sRTF_Initiate = "RTF_Initiate"

$sText_1 = "Lorem ipsum new_text_in_1 dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua." & @CRLF & _
            "Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat." & @CRLF & _
            "New_line_ in_1" & @CRLF & _
            "Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur." & @CRLF & _
            "Excepteur sint occaecat new_text_in_1 cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum new_text_in_1."

$sText_2 = "Add_text_in_2 Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua." & @CRLF & _
            "Ut enim ad minim veniam, quis nostrud add_text_in_2 exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat." & @CRLF & _
            "Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla add_text_in_2 pariatur." & @CRLF & _
            "Add_line_in_2" & @CRLF & _
            "Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."

SplashTextOn("Diff", "Producing diff" & @CRLF & @CRLF & "Please be patient")

; Create diff text
$sDiffText = _Create_Diff($sText_1, $sText_2)

SplashOff()

$hGUI = GUICreate("Diff Test", 500, 500)

$hRichEdit = _GUICtrlRichEdit_Create($hGUI, $sRTF_Initiate, 10, 10, 400, 400)

; Make sure there is no selection
_GUICtrlRichEdit_Deselect($hRichEdit)
; Extract the RTF stream
$sContent = _GUICtrlRichEdit_StreamToVar($hRichEdit)
; Split header and footer
$aHeader = StringSplit($sContent, $sRTF_Initiate, $STR_ENTIRESPLIT)

; Define the highlight colour table
Local $aColours[2] = [0xFFCCCC, 0xCCFFCC]
$sColDef = "{\colortbl ;"
For $i = 0 To UBound($aColours) - 1
    $sColDef &= "\red" & _ColorGetRed($aColours[$i]) & "\green" & _ColorGetGreen($aColours[$i]) & "\blue" & _ColorGetBlue($aColours[$i]) & ";"
Next
$sColDef &= "}"
; And add to header as line 2
$aHeader[1] = StringReplace($aHeader[1], ";}}", ";}}" & @CRLF & $sColDef, 1)

; Recreate and load stream
_GUICtrlRichEdit_StreamFromVar($hRichEdit, $aHeader[1] & StringReplace($sDiffText, @CRLF, "\par" & @CRLF) & $aHeader[2])

GUISetState()

While 1
    Switch GUIGetMsg()
        Case $GUI_EVENT_CLOSE
            Exit
    EndSwitch
WEnd

Func _Create_Diff($sText_1, $sText_2)

    ; Convert strings to arrays
    $aText_1 = StringSplit($sText_1, "")
    $aText_2 = StringSplit($sText_2, "")

    ; Get array of matches
    $sRet = _Get_Matches($aText_1, 1, $aText_1[0], $aText_2, 1, $aText_2[0])
    ; Convert to array
    $aRet = _StringToArray(StringTrimRight($sRet, 2), $STR_ENTIRESPLIT, @CRLF, "|")

    ; Set start conditions
    $sDiffString = "\highlight0 "
    $iIndex_1 = 1
    $iIndex_2 = 1
    ; And loop through matches to create diff string
    For $iIndex_Ret = 0 To UBound($aRet) - 1
        ; Extract indices and count for next common stretch
        $iCommon_1 = $aRet[$iIndex_Ret][0]
        $iCommon_2 = $aRet[$iIndex_Ret][1]
        ; Add any unique characters from String 1
        If $iIndex_1 <> $iCommon_1 Then
            ; Additional chars from String 1
            $iCount = $iCommon_1 - $iIndex_1
            $sDiffString &= "\highlight1 " & StringMid($sText_1, $iIndex_1, $iCount) & "\highlight0 "
            ; Reset index
            $iIndex_1 = $iCommon_1
        EndIf
        ; And now from String 2
        If $iIndex_2 <> $iCommon_2 Then
            $iCount = $iCommon_2 - $iIndex_2
            $sDiffString &= "\highlight2 " & StringMid($sText_2, $iIndex_2, $iCount) & "\highlight0 "
            $iIndex_2 = $iCommon_2
        EndIf

        ; Now add next common character string
        $iCount = $aRet[$iIndex_Ret][2]
        $sDiffString &= StringMid($sText_1, $iCommon_1, $iCount)

        ; Increase string index counts
        $iIndex_1 += $iCount
        $iIndex_2 += $iCount

    Next

    ; Check at end of strings and add any remaining characters
    If $iIndex_1 <= StringLen($sText_1) Then
        $sDiffString &= "\highlight1 " & StringMid($sText_1, $iIndex_1) & "\highlight0 "
    EndIf
    If $iIndex_2 <= StringLen($sText_2) Then
        $sDiffString &= "\highlight2 " & StringMid($sText_2, $iIndex_2) & "\highlight0 "
    EndIf

    ; Return diff string
    Return $sDiffString

EndFunc

Func _Get_Matches($aArray_1, $iArray_1_Start, $iArray_1_End, $aArray_2, $iArray_2_Start, $iArray_2_End)

    Local $aRet = _Longest_Match($aArray_1, $iArray_1_Start, $iArray_1_End, $aArray_2, $iArray_2_Start, $iArray_2_End)

    Local $iBreak_1 = $aRet[0]
    Local $iBreak_2 = $aRet[1]
    Local $n = $aRet[2]

    $sRet = $iBreak_1 & "|" & $iBreak_2 & "|" & $n & @CRLF

    If $n = 0 Then ; < 2 Then
        Return ""
    Else
        Return  _Get_Matches($aArray_1, $iArray_1_Start, $iBreak_1 - 1, $aArray_2, $iArray_2_Start, $iBreak_2 - 1) & _
                $sRet & _
                _Get_Matches($aArray_1, $iBreak_1 + $n, $iArray_1_End, $aArray_2, $iBreak_2 + $n, $iArray_2_End)
    EndIf

EndFunc

Func _Longest_Match($aArray_1, $iArray_1_Start, $iArray_1_End, $aArray_2, $iArray_2_Start, $iArray_2_End)

    Local $iBreak_1 = $iArray_1_Start
    Local $iBreak_2 = $iArray_2_Start
    Local $n = 0
    Local $aRuns[$iArray_2_End + 1]

    For $i = $iArray_1_Start To $iArray_1_End

        Local $aNew_Runs[$iArray_2_End + 1]

        For $j = $iArray_2_Start To $iArray_2_End
            If $aArray_1[$i] = $aArray_2[$j] Then
                If $j Then
                    $aNew_Runs[$j] = $aRuns[$j - 1] + 1
                Else
                    $aNew_Runs[$j] = 1
                EndIf
                $k = $aNew_Runs[$j]
                If $k > $n Then
                    $iBreak_1 = $i - $k + 1
                    $iBreak_2 = $j - $k + 1
                    $n = $k
                EndIf
            EndIf
        Next

        $aRuns = $aNew_Runs

    Next

    Local $aRet[3] = [$iBreak_1, $iBreak_2, $n]
    Return $aRet

EndFunc

Func _StringToArray($sString, $iFlags = $STR_COUNT, $sRowDelimiter = "", $sColDelimiter = "")

    Local $aRetArray

    If $iFlags = Default Then $iFlags = $STR_COUNT
    If $sRowDelimiter = Default Then $sRowDelimiter = ""
    If $sColDelimiter = Default Then $sColDelimiter = ""

    ; Set delimiter flag
    Local $iEntire = $STR_CHRSPLIT
    If BitAND($iFlags, $STR_ENTIRESPLIT) Then
        $iEntire = $STR_ENTIRESPLIT
        $iFlags -= $STR_ENTIRESPLIT
    EndIf
    ; Set "expand" flag
    Local $bExpand = False
    If BitAND($iFlags, $STR_EXPAND) Then
        $bExpand = True
        $iFlags -= $STR_EXPAND
    EndIf
    ; Set row count flags
    Local $iNoCount = 0
    If $iFlags <> $STR_COUNT Then
        $iFlags = $STR_NOCOUNT
        $iNoCount = $STR_NOCOUNT
    EndIf

    ; Check for column delimiter
    If StringLen($sColDelimiter) Then
        ; Split string into an array
        Local $aLines = StringSplit($sString, $sRowDelimiter, $iEntire + $STR_NOCOUNT)
        If @error Then Return SetError(1, 0, 0)
        ; Get first dimension and add count if required
        Local $iDim_1 = UBound($aLines) + $iFlags
        ; All lines must have same number of fields
        ; Count fields in first line
        Local $iDim_2 = UBound(StringSplit($aLines[0], $sColDelimiter, $iEntire + $STR_NOCOUNT))
        ; Size array
        Local $aRetArray[$iDim_1][$iDim_2]
        ; Declare the variables
        Local $iFields, $aSplit
        ; Loop through the lines
        For $i = 0 To $iDim_1 - $iFlags - 1
            ; Split each line as required
            $aSplit = StringSplit($aLines[$i], $sColDelimiter, $iEntire + $STR_NOCOUNT)
            ; Count the items
            $iFields = UBound($aSplit)
            ; Resize array if required
            If $iFields > $iDim_2 Then
                If $bExpand Then
                    $iDim_2 = $iFields
                    ReDim $aRetArray[$iDim_1][$iDim_2]
                Else
                    Return SetError(3, 0, 0)
                EndIf
            EndIf
            ; Fill this line of the array
            For $j = 0 To $iFields - 1
                $aRetArray[$i + $iFlags][$j] = $aSplit[$j]
            Next
        Next
        ; Check at least 2 columns
        If $iDim_2 < 2 Then Return SetError(2, 0, 0)
        ; Set dimension count
        If $iFlags Then
            $aRetArray[0][0] = $iDim_1 - $iFlags
            $aRetArray[0][1] = $iDim_2
        EndIf
    Else ; 1D
        $aRetArray = StringSplit($sString, $sRowDelimiter, $iEntire + $iNoCount)
        If @error Then Return SetError(@error, 0, 0)
    EndIf
    Return $aRetArray
EndFunc   ;==>_StringToArray
M23

P.S. You also get a sneak preview of the Beta _StringToArray function I have been developing. ;)



i was curious to see the amendments you applied to the code. so naturally, i fed the older algorithm (post #2) and the newer one (post #6) as input to the newer algorithm (note: before comparison, both texts were edited to remove comments, blank lines and spaces). from the result, i see you amended the end index of the first (left-hand) part of the text passed to the next recursion level, and modified the stop condition:

post-47848-0-07623500-1424685255_thumb.p

however, it is rather slow. it took 3.5 sec to process, while feeding the same input files to the compiled SPDiff.exe (v0.1 at post #1) took less than 1 sec to complete. i assume you must have noticed it yourself, or you wouldn't have added the splash... processing yet larger files demonstrates a radical increase in processing time.

i'm still struggling to fully understand how it works, but i think i did manage to understand that there is recursion into text parts that are separated by the longest match found at previous recursion level. if it is possible to reduce processing time, perhaps it would be possible to remember "candidates" for longest match of each level (which i think are found anyway during the scan, right?), and when analyzing next level, use those candidates to pick from, instead of scanning the entire range of characters again?

(i'm assuming the time-consuming part is the loop that scans and compares the characters, i may be wrong though).


orbs,

 

however, it is rather slow. it took 3.5 sec to process, while feeding the same input files to the compiled SPDiff.exe (v0.1 at post #1) took less than 1 sec to complete

Only to be expected as AutoIt is interpreted while the other is using a compiled GNU diff utility - I would have been amazed had there not been a significant difference in execution time. ;)

 

i'm still struggling to fully understand how it works

You have it right, it looks for the longest matching string and then recursively looks for the longest in what remains on either side of that string until there is nothing left to search or no matches are found - quite a clever little algorithm. But as you suggest, probably not the fastest and, given the title of the thread in which you found it, I get the impression that the author was going for compact code rather than speed. :)

There may well be faster ways to get all the matching strings and short-circuit the recursion, but I am afraid that I am not interested in developing any of them. I only continued as far as I did because I hate being beaten by an algorithm as seemingly simple as that one. :D

M23



There may well be faster ways to get all the matching strings and short-circuit the recursion

 

i just checked and saw that it's not the recursion that takes time - so my thinking went astray. the very first call to find the longest match takes a good 2.7 sec.
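
A sketch consistent with that observation, under the assumption that the element comparisons in the nested loops dominate the run time: the first scan compares every character of one text with every character of the other, so two 1,000-character texts already cost 1,000,000 comparisons, while every later scan only covers what is left on one side of a match. The list `scan_sizes` and the short sample strings are illustrative only.

```python
scan_sizes = []  # element comparisons performed by each scan, in call order


def longest_match(a, a0, a1, b, b0, b1):
    """Longest common run of a[a0:a1] and b[b0:b1] as (start_a, start_b, length)."""
    scan_sizes.append((a1 - a0) * (b1 - b0))  # the loops visit every (i, j) pair
    sa, sb, n = a0, b0, 0
    runs = {}
    for i in range(a0, a1):
        new_runs = {}
        for j in range(b0, b1):
            if a[i] == b[j]:
                k = new_runs[j] = runs.get(j - 1, 0) + 1
                if k > n:
                    sa, sb, n = i - k + 1, j - k + 1, k
        runs = new_runs
    return sa, sb, n


def get_matches(a, a0, a1, b, b0, b1):
    sa, sb, n = longest_match(a, a0, a1, b, b0, b1)
    if n == 0:
        return []
    return (get_matches(a, a0, sa, b, b0, sb)
            + [(sa, sb, n)]
            + get_matches(a, sa + n, a1, b, sb + n, b1))


old = "Lorem ipsum new_text_in_1 dolor sit amet"
new = "Add_text_in_2 Lorem ipsum dolor sit amet"
get_matches(old, 0, len(old), new, 0, len(new))
print("first scan:", scan_sizes[0], "comparisons")
print("largest later scan:", max(scan_sizes[1:], default=0), "comparisons")
```

The first entry is always `len(old) * len(new)`, which grows quadratically when both texts grow.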

as i mentioned, i saw several versions of algorithms that solve the "longest common subsequence" problem, including the source code of the very same diff utility i'm using - they all look quite similar in their approach (scan until you find). so i guess AutoIt is not well suited for crunching data... :(  never mind. i'll keep working with what i have so far. thanks for your contribution Melba23, i learnt quite a bit!


#10 ·  Posted (edited)

v0.2: switch from pane view to selection bar & info link (updated at first post)

the file selection dialog was exported to a dedicated GUI called "Selection Bar". it's set on top and designed to take as little space as possible, so it's easier to drag & drop files into the input fields:

post-47848-0-83750100-1424770112_thumb.p

also, clicking the "Info" icon now opens a brief PDF documentation sheet.


#11 ·  Posted (edited)

v0.3: shift highlight back to beginning of line:
 
a scenario like this:
 
post-47848-0-75902900-1424775539_thumb.p
 
although technically correct, it would be more appropriate if it had looked like this:
 
post-47848-0-20303100-1424775549_thumb.p
 
this is now implemented.
 
and bugfix: "Files Analysis In Progress..." splash does not disappear.

For file comparison, I have had success with the Levenshtein algorithm. Unfortunately my code was not in AutoIt. However if you have a file matching/comparison problem, you could do worse than take a look at this.

http://en.wikipedia.org/wiki/Levenshtein_distance

For more googling: Levenshtein distance is closely related to the "longest common subsequence" problem.
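
A minimal sketch of how the two notions relate, using the textbook dynamic-programming formulations (the function names are illustrative). Levenshtein distance allows substitutions; if only insertions and deletions are allowed, the distance equals `len(a) + len(b) - 2 * lcs_length(a, b)`.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or keep if equal
        prev = cur
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(lcs_length("kitten", "sitting"))   # 4
```

Both functions fill a table of `len(a) * len(b)` cells, so they share the quadratic cost of a full character-by-character scan.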


For file comparison, I have had success with the Levenshtein algorithm. Unfortunately my code was not in AutoIt. However if you have a file matching/comparison problem, you could do worse than take a look at this.

http://en.wikipedia.org/wiki/Levenshtein_distance

For more googling: Levenshtein distance is closely related to the "longest common subsequence" problem.

 

thanks, i'll certainly have a look.

v0.4: minor improvements
 
toolbar color default: changed from white to bright gray
bugfix: Aerial View crashed if no files were selected. the main view is now unavailable until files are compared
info sheet accessible from the selection bar
full file names in title, longest common path part is consolidated
"Files Analysis In Progress..." splash is now integrated in the selection bar
files time precheck: if the time of the "older" file is newer than the time of the "newer" file, the user is asked whether to switch the files

v0.5: minor improvements
 
selection bar clear input before accept drag, only one file
shift right single @CR, also at end of text
flash selection bar
cancel Enter on selection bar


#16 ·  Posted (edited)

v0.6: "Recently Compared" list (a.k.a. MRU)

click the selection bar "Recently Compared" icon (the left-most icon) for a drop-down list of... well... recently compared files (up to 10 comparisons displayed):

post-47848-0-80204800-1425027965_thumb.p

Note:

by default, your selection will be automatically accepted. uncheck the option "Accept Selection" to have the selected files only placed into the input fields, so you can change one of them manually.

upon selection, files are re-compared. there is no reliance on earlier comparison results.


#17 ·  Posted (edited)

v0.7: "Word" icon to view diff result in your default RTF viewer

(commonly Microsoft Word)

from there you can save it as independent RTF file, print it, share it with your colleagues, etc.

and bugfix: duplicate MRU entries added if files are specified at cmdline


v0.8: small fixes:

- ctrl+W was accidentally not defined.

- cursor is always pointer, not a caret.

- when jumping to a diff, or when returning from aerial view, the selected diff may be positioned one line above the view.


SPDiff has reached the v1.0 milestone!

featuring some performance enhancements, and a comprehensive coloring parameter set - you can now specify the color of practically every element. excerpt from the info sheet - the elements whose colors can be set:

1: toolbar background
2: toolbar icons
3: toolbar separators
4: toolbar selected icons (currently applicable only for Aerial View)
5: toolbar input field background
6: toolbar input field text
7: main pane background
8: main pane text
9: removed text highlight
10: removed text
11: added text highlight
12: added text
It is not mandatory to specify all colors; specify -1 for colors you wish to leave at default.
 
Custom Colors Example: “Savannah” theme - featuring brown, beige and green:
/color:FEDCBA,005522,993300,009900,-1,005522,-1,772200
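
For a calling application, a sketch of how such a switch value could be interpreted. How SPDiff itself parses it is not shown in this thread; the RRGGBB byte order, the treatment of omitted trailing colors as defaults, and the name `parse_color_switch` are assumptions.

```python
ELEMENTS = 12  # the info sheet excerpt lists 12 colorable elements


def parse_color_switch(arg):
    """Parse '/color:RRGGBB,...' into 12 entries; None means 'leave at default'."""
    prefix = "/color:"
    if not arg.lower().startswith(prefix):
        raise ValueError("not a /color switch: " + arg)
    values = arg[len(prefix):].split(",")
    if len(values) > ELEMENTS:
        raise ValueError("too many colors")
    colors = []
    for value in values:
        value = value.strip()
        if value == "-1":
            colors.append(None)  # -1 keeps the default color
            continue
        if len(value) != 6:
            raise ValueError("expected 6 hex digits or -1: " + value)
        rgb = int(value, 16)  # raises ValueError on non-hex input
        colors.append(((rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF))
    # Assumption: colors omitted at the end also stay at their defaults
    colors += [None] * (ELEMENTS - len(colors))
    return colors


theme = parse_color_switch("/color:FEDCBA,005522,993300,009900,-1,005522,-1,772200")
print(theme[0], theme[4], len(theme))  # (254, 220, 186) None 12
```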

Can you add a feature:

display old text in right pane ?


