Jump to content

Natural Order String Comparison


wraithdu
 Share

Recommended Posts

Update 8/5/2011:

I've updated the _NaturalCompare function. Foremost, it will cope with very long strings of numbers without overflowing. I've also added an implementation of _ArraySort that takes a custom sorting function, and implemented _ArrayNaturalSort in this way.

This is a different algorithm than the other I found on the board. To me, it seems more efficient.

_NaturalCompare

; #FUNCTION# ====================================================================================================================
; Name...........: _NaturalCompare
; Description ...: Compare two strings using Natural (Alphabetical) sorting.
; Syntax.........: _NaturalCompare($s1, $s2[, $iCase = 0])
; Parameters ....: $s1, $s2 - Strings to compare
;                  $iCase   - [Optional] Case sensitive or insensitive comparison
;                  |0 - Case insensitive (default)
;                  |1 - Case sensitive
; Return values .: Success - One of the following:
;                  |0  - Strings are equal
;                  |-1 - $s1 comes before $s2
;                  |1  - $s1 goes after $s2
;                  Failure - Returns -2 and Sets @Error:
;                  |1 - $s1 or $s2 is not a string
;                  |2 - $iCase is invalid
; Author ........: Erik Pilsits
; Modified.......:
; Remarks .......: Original algorithm by Dave Koelle
; Related .......: StringCompare
; Link ..........: http://www.davekoelle.com/alphanum.html
; Example .......: Yes
; ===============================================================================================================================
Func _NaturalCompare($s1, $s2, $iCase = 0)
    ; check params
    If (Not IsString($s1)) Then $s1 = String($s1)
    If (Not IsString($s2)) Then $s2 = String($s2)
    ; check case, set default
    If $iCase <> 0 And $iCase <> 1 Then $iCase = 0

    Local $n = 0
    Local $s1chunk, $s2chunk
    Local $idx, $i1chunk, $i2chunk
    Local $s1temp, $s2temp

    While $n = 0
        ; get next chunk
        ; STRING 1
        $s1chunk = StringRegExp($s1, "^(\d+|\D+)", 1)
        If @error Then
            $s1chunk = ""
        Else
            $s1chunk = $s1chunk[0]
        EndIf
        ; STRING 2
        $s2chunk = StringRegExp($s2, "^(\d+|\D+)", 1)
        If @error Then
            $s2chunk = ""
        Else
            $s2chunk = $s2chunk[0]
        EndIf

        ; ran out of chunks, strings are the same, return 0
        If $s1chunk = "" And $s2chunk = "" Then Return 0

        ; remove chunks from strings
        $s1 = StringMid($s1, StringLen($s1chunk) + 1)
        $s2 = StringMid($s2, StringLen($s2chunk) + 1)

        Select
            ; Case 1: both chunks contain letters
            Case (Not StringIsDigit($s1chunk)) And (Not StringIsDigit($s2chunk))
                $n = StringCompare($s1chunk, $s2chunk, $iCase)
            ; Case 2: both chunks contain numbers
            Case StringIsDigit($s1chunk) And StringIsDigit($s2chunk)
                ; strip leading 0's
                $s1temp = $s1chunk
                $s2temp = $s2chunk
                $s1chunk = StringRegExpReplace($s1chunk, "^0*", "")
                $s2chunk = StringRegExpReplace($s2chunk, "^0*", "")
                ; record number of stripped 0's
                $s1temp = StringLen($s1temp) - StringLen($s1chunk)
                $s2temp = StringLen($s2temp) - StringLen($s2chunk)
                ; first check if one string is longer than the other, meaning a bigger number
                If StringLen($s1chunk) > StringLen($s2chunk) Then
                    Return 1
                ElseIf StringLen($s1chunk) < StringLen($s2chunk) Then
                    Return -1
                EndIf
                ; strings are equal length
                ; compare 8 digits at a time, starting from the left, to avoid overflow
                $idx = 1
                While 1
                    $i1chunk = Int(StringMid($s1chunk, $idx, 8))
                    $i2chunk = Int(StringMid($s2chunk, $idx, 8))
                    ; check for end of string
                    If $i1chunk = "" And $i2chunk = "" Then
                        ; check number of leading 0's removed, if any - windows sorts more leading 0's above fewer leading 0's, ie 00001 < 0001 < 001
                        If $s1temp > $s2temp Then
                            Return -1
                        ElseIf $s1temp < $s2temp Then
                            Return 1
                        Else
                            ; numbers are equal
                            ExitLoop
                        EndIf
                    EndIf
                    ; valid numbers, so compare
                    If $i1chunk > $i2chunk Then
                        Return 1
                    ElseIf $i1chunk < $i2chunk Then
                        Return -1
                    EndIf
                    ; chunks are equal, get next chunk of digits
                    $idx += 8
                WEnd
            ; Case 3: one chunk has letters, the other has numbers; or one is empty
            Case Else
                ; if we get here, this should be the last and deciding test, so return the result
                Return StringCompare($s1chunk, $s2chunk, $iCase)
        EndSelect
    WEnd

    Return $n
EndFunc

Example:

#include "_NaturalCompare.au3"

ConsoleWrite("StringCompare:" & @CRLF)
ConsoleWrite(StringCompare("abC10", "abc2") & @CRLF)
ConsoleWrite(StringCompare("abC10", "abc2", 1) & @CRLF)
ConsoleWrite("_NaturalCompare:" & @CRLF)
ConsoleWrite(_NaturalCompare("abC10", "abc2") & @CRLF)
ConsoleWrite(_NaturalCompare("abC10", "abc2", 1) & @CRLF)

Here's an implementation of array sorting that uses a custom sorting function:

_ArrayNaturalSort

#include-once
#include <_NaturalCompare.au3>
#include <_ArrayCustomSort.au3>

; #FUNCTION# ====================================================================================================================
; Name...........: _ArrayNaturalSort
; Description ...: Sort a 1D or 2D array on a specific index using the quicksort/insertionsort algorithms.
; Syntax.........: _ArrayNaturalSort(ByRef $avArray[, $iDescending = 0[, $iStart = 0[, $iEnd = 0[, $iSubItem = 0]]]])
; Parameters ....: $avArray     - Array to sort
;                  $iDescending - [optional] If set to 1, sort descendingly
;                  $iStart      - [optional] Index of array to start sorting at
;                  $iEnd        - [optional] Index of array to stop sorting at
;                  $iSubItem    - [optional] Sub-index to sort on in 2D arrays
; Return values .: Success - 1
;                  Failure - 0, sets @error:
;                  |1 - $avArray is not an array
;                  |2 - $iStart is greater than $iEnd
;                  |3 - $iSubItem is greater than subitem count
;                  |4 - $avArray has too many dimensions
;                  |5 - Invalid sort function
; Author ........: Erik Pilsits
; Modified.......:
; Remarks .......:
; Related .......:
; Link ..........;
; Example .......; No
; ===============================================================================================================================
Func _ArrayNaturalSort(ByRef $avArray, $iDescending = 0, $iStart = 0, $iEnd = 0, $iSubItem = 0)
    Return _ArrayCustomSort($avArray, "_NaturalCompare", $iDescending, $iStart, $iEnd, $iSubItem)
EndFunc   ;==>_ArrayNaturalSort

And the custom sorting function:

_ArrayCustomSort

#include-once
#include <Array.au3>

; #FUNCTION# ====================================================================================================================
; Name ..........: _ArrayCustomSort
; Description ...: Sort a 1D or 2D array on a specific index using the quicksort/insertionsort algorithms, based on a custom sorting function.
; Syntax ........: _ArrayCustomSort(Byref $avArray, $sSortFunc[, $iDescending = 0[, $iStart = 0[, $iEnd = 0[, $iSubItem = 0]]]])
; Parameters ....: $avArray             - [in/out] Array to sort
;                  $sSortFunc           - Name of custom sorting function. See Remarks for usage.
;                  $iDescending         - [optional] If set to 1, sort descendingly
;                  $iStart              - [optional] Index of array to start sorting at
;                  $iEnd                - [optional] Index of array to stop sorting at
;                  $iSubItem            - [optional] Sub-index to sort on in 2D arrays
; Return values .: Success - 1
;                  Failure - 0, sets @error:
;                  |1 - $avArray is not an array
;                  |2 - $iStart is greater than $iEnd
;                  |3 - $iSubItem is greater than subitem count
;                  |4 - $avArray has too many dimensions
;                  |5 - Invalid sort function
; Author ........: Erik Pilsits
; Modified ......: Erik Pilsits - removed IsNumber testing, LazyCoder - added $iSubItem option, Tylo - implemented stable QuickSort algo, Jos van der Zande - changed logic to correctly Sort arrays with mixed Values and Strings, Ultima - major optimization, code cleanup, removed $i_Dim parameter
; Remarks .......: Sorting function is called with two array elements as arguments. The function should return
;                  0 if they are equal,
;                  -1 if element one comes before element two,
;                  1 if element one comes after element two.
; Related .......:
; Link ..........:
; Example .......: No
; ===============================================================================================================================
Func _ArrayCustomSort(ByRef $avArray, $sSortFunc, $iDescending = 0, $iStart = 0, $iEnd = 0, $iSubItem = 0)
    If Not IsArray($avArray) Then Return SetError(1, 0, 0)
    If Not IsString($sSortFunc) Then Return SetError(5, 0, 0)

    Local $iUBound = UBound($avArray) - 1

    ; Bounds checking
    If $iEnd < 1 Or $iEnd > $iUBound Then $iEnd = $iUBound
    If $iStart < 0 Then $iStart = 0
    If $iStart > $iEnd Then Return SetError(2, 0, 0)

    ; Sort
    Switch UBound($avArray, 0)
        Case 1
            __ArrayCustomQuickSort1D($avArray, $sSortFunc, $iStart, $iEnd)
            If $iDescending Then _ArrayReverse($avArray, $iStart, $iEnd)
        Case 2
            Local $iSubMax = UBound($avArray, 2) - 1
            If $iSubItem > $iSubMax Then Return SetError(3, 0, 0)

            If $iDescending Then
                $iDescending = -1
            Else
                $iDescending = 1
            EndIf

            __ArrayCustomQuickSort2D($avArray, $sSortFunc, $iDescending, $iStart, $iEnd, $iSubItem, $iSubMax)
        Case Else
            Return SetError(4, 0, 0)
    EndSwitch

    Return 1
EndFunc   ;==>_ArrayCustomSort

; #INTERNAL_USE_ONLY#============================================================================================================
; Name...........: __ArrayCustomQuickSort1D
; Description ...: Helper function for sorting 1D arrays
; Syntax.........: __ArrayCustomQuickSort1D(ByRef $avArray, ByRef $sSortFunc, ByRef $iStart, ByRef $iEnd)
; Parameters ....: $avArray   - Array to sort
;                  $sSortFunc - Name of sorting function.
;                  $iStart    - Index of array to start sorting at
;                  $iEnd      - Index of array to stop sorting at
; Return values .: None
; Author ........: Jos van der Zande, LazyCoder, Tylo, Ultima
; Modified.......: Erik Pilsits - removed IsNumber testing
; Remarks .......: For Internal Use Only
; Related .......:
; Link ..........;
; Example .......;
; ===============================================================================================================================
Func __ArrayCustomQuickSort1D(ByRef $avArray, ByRef $sSortFunc, ByRef $iStart, ByRef $iEnd)
    If $iEnd <= $iStart Then Return

    Local $vTmp

    ; InsertionSort (faster for smaller segments)
    If ($iEnd - $iStart) < 15 Then
        Local $i, $j
        For $i = $iStart + 1 To $iEnd
            $vTmp = $avArray[$i]
            For $j = $i - 1 To $iStart Step -1
                If (Call($sSortFunc, $vTmp, $avArray[$j]) >= 0) Then ExitLoop
                $avArray[$j + 1] = $avArray[$j]
            Next
            $avArray[$j + 1] = $vTmp
        Next
        Return
    EndIf

    ; QuickSort
    Local $L = $iStart, $R = $iEnd, $vPivot = $avArray[Int(($iStart + $iEnd) / 2)]
    Do
        While (Call($sSortFunc, $avArray[$L], $vPivot) < 0)
            $L += 1
        WEnd
        While (Call($sSortFunc, $avArray[$R], $vPivot) > 0)
            $R -= 1
        WEnd

        ; Swap
        If $L <= $R Then
            $vTmp = $avArray[$L]
            $avArray[$L] = $avArray[$R]
            $avArray[$R] = $vTmp
            $L += 1
            $R -= 1
        EndIf
    Until $L > $R

    __ArrayCustomQuickSort1D($avArray, $sSortFunc, $iStart, $R)
    __ArrayCustomQuickSort1D($avArray, $sSortFunc, $L, $iEnd)
EndFunc   ;==>__ArrayCustomQuickSort1D

; #INTERNAL_USE_ONLY#============================================================================================================
; Name...........: __ArrayCustomQuickSort2D
; Description ...: Helper function for sorting 2D arrays
; Syntax.........: __ArrayCustomQuickSort2D(ByRef $avArray, ByRef $sSortFunc, ByRef $iStep, ByRef $iStart, ByRef $iEnd, ByRef $iSubItem, ByRef $iSubMax)
; Parameters ....: $avArray  - Array to sort
;                  $iStep    - Step size (should be 1 to sort ascending, -1 to sort descending!)
;                  $iStart   - Index of array to start sorting at
;                  $iEnd     - Index of array to stop sorting at
;                  $iSubItem - Sub-index to sort on in 2D arrays
;                  $iSubMax  - Maximum sub-index that array has
; Return values .: None
; Author ........: Jos van der Zande, LazyCoder, Tylo, Ultima
; Modified.......: Erik Pilsits - removed IsNumber testing
; Remarks .......: For Internal Use Only
; Related .......:
; Link ..........;
; Example .......;
; ===============================================================================================================================
Func __ArrayCustomQuickSort2D(ByRef $avArray, ByRef $sSortFunc, ByRef $iStep, ByRef $iStart, ByRef $iEnd, ByRef $iSubItem, ByRef $iSubMax)
    If $iEnd <= $iStart Then Return

    ; QuickSort
    Local $i, $vTmp, $L = $iStart, $R = $iEnd, $vPivot = $avArray[Int(($iStart + $iEnd) / 2)][$iSubItem]
    Do
        While ($iStep * Call($sSortFunc, $avArray[$L][$iSubItem], $vPivot) < 0)
            $L += 1
        WEnd
        While ($iStep * Call($sSortFunc, $avArray[$R][$iSubItem], $vPivot) > 0)
            $R -= 1
        WEnd

        ; Swap
        If $L <= $R Then
            For $i = 0 To $iSubMax
                $vTmp = $avArray[$L][$i]
                $avArray[$L][$i] = $avArray[$R][$i]
                $avArray[$R][$i] = $vTmp
            Next
            $L += 1
            $R -= 1
        EndIf
    Until $L > $R

    __ArrayCustomQuickSort2D($avArray, $sSortFunc, $iStep, $iStart, $R, $iSubItem, $iSubMax)
    __ArrayCustomQuickSort2D($avArray, $sSortFunc, $iStep, $L, $iEnd, $iSubItem, $iSubMax)
EndFunc   ;==>__ArrayCustomQuickSort2D

Example:

#include "_ArrayNaturalSort.au3"

Global $a[1], $array[10] = ["image1.jpg", "image2.jpg", "image3.jpg", "image10.jpg", "image11.jpg", _
                            "image12.jpg", "image20.jpg", "image21.jpg", "image22.jpg", "image23.jpg"]
                    
$a = $array
_ArraySort($a)
_ArrayDisplay($a, "_ArraySort")

$a = $array
_ArrayNaturalSort($a)
_ArrayDisplay($a, "_ArrayNaturalSort")
Edited by wraithdu
Link to comment
Share on other sites

It was posted over a year ago. The author mentioned a followup, but never did -

http://www.autoitscript.com/forum/index.php?showtopic=45755

He got his algorithm from one of the ones listed on this page -

http://www.codinghorror.com/blog/archives/001018.html

He also mentioned a rewrite to the _ArraySort() function that would take a user-defined comparison algorithm, much like C's qsort. I think this would be a cool idea. I'm not sure if the AutoIt devs would go for it, so I just modified the necessary _ArraySort***() funcs for my example.

Edited by wraithdu
Link to comment
Share on other sites

The plugin works MUCH faster. I implemented the original alphanum algorithm in a plugin. It sorts the same 5000 element array in 1.5 seconds. That's very acceptable :mellow: To test it, use the modified _ArrayNaturalSort function from above, but comment out the '#include "_NaturalCompare.au3"' line.

In your test script include the line '#AutoIt3Wrapper_PlugIn_Funcs=_NaturalCompare', and don't forget

$hDLL = PluginOpen("natcomp.dll")

...

PluginClose($hDLL)

This plugin uses MFC, so if you don't have it installed, you'll need the VC++ 2008 SP1 Redistributable.

natcomp_plugin.zip

Link to comment
Share on other sites

  • 1 year later...

Very nice!

It is possible to speed up the algorithm?

#include "_ArrayNaturalSort.au3"

$aMax=7000
Dim $aRecords[$aMax]
For $i=0 To $aMax-1
    $aRecords[$i]= Chr(Random(65,90,1)) & '_' & Random(1,$aMax,1)
Next

$begin = TimerInit()
_ArraySort($aRecords,0,0)
$dif = TimerDiff($begin)
$dif = Round($dif/1000,2) & ' sec.'
_ArrayDisplay($aRecords, "_ArraySort " & $dif)

$begin = TimerInit()
_ArrayNaturalSort($aRecords,0,0)
$dif = TimerDiff($begin)
$dif = Round($dif/1000,2) & ' sec.'
_ArrayDisplay($aRecords, "_ArrayNaturalSort " & $dif)
Link to comment
Share on other sites

  • 1 year later...

Hi, wraithdu. This is my machine code version. C Source from Martin Pool.

#Include-once
#Include <Memory.au3>

Global $_NaturalCompare_CodeBufferPtr, $_NaturalCompare_CodeBuffer

Func NaturalCompare_Shutdown()
    _MemVirtualFree($_NaturalCompare_CodeBufferPtr, 0, $MEM_RELEASE)
    $_NaturalCompare_CodeBufferPtr = 0
EndFunc

Func NaturalCompare_Startup()
    If Not $_NaturalCompare_CodeBuffer Then
        Local $Code
        If @AutoItX64 Then
            $Code = Binary("0x41574531D24531C9415641554154555756534883EC184963C14989D3668B2C414963C2668B1C42418D41014898488D0441400FB6FD4188EC83FF207F198D77F783FE04760583FF20750C668B2841FFC14883C002EBDB418D42014C89DA66892C244898498D04430FB6F34188DB83FE207F1B448D6EF74183FD04760583FE20750C668B1841FFC24883C002EBDA450FB6E44183EC304183FC090F8709010000450FB6DB4183EB304183FB090F87F70000006683FB3074066683FD3075634963C24C8D24424963C14C8D1C4166458B2B410FB6C583E83083F80966418B042476140FB6C083E83083F8090F860A010000E9B4000000440FB6F04183EE304183FE090F87F8000000664139C50F82E90000000F87E80000004983C3024983C402EBAB4963C24C8D2C424963C14C8D244131C066458B3424450FB6DE4183EB304183FB0966458B5D007614450FB6DB4183EB304183FB090F869F000000EB48450FB6FB4183EF304183FF090F8790000000664539DE730A85C041BBFFFFFFFFEB0A760E85C041BB01000000410F44C3EB0C664585F67506664585DB740A4983C4024983C502EB8C85C075406685DB75056685ED74344585C0751C8D5F9F8D47E083FB198D5EE00F46F88D469F66893C2483F8190F47DE66391C247220772341FFC141FFC2E930FEFFFF31C04883C4185B5E5F5D415C415D415E415FC383C8FFEBEAB801000000EBE3")
        Else
            $Code = Binary("0x5531C989E531D257565383EC1C8B5D0C8B4508668B1C4B668B045066895DEA8B5D08668945E28D4453028A5DE20FB6F383FE20885DF07F1A8D5EF783FB04760583FE20750D668B184283C00266895DE2EBD8668B45E28B5D0C8955E4668945DE8D444B020FB67DEA89FA0FB6DA83FB207F1A8D53F783FA04760583FB20750D668B184183C00266895DEAEBD80FB645F08B55E483E83083F8090F872301000081E7FF00000083EF3083FF090F871101000066837DEA30740766837DE230756D8B450C8D3C488B45088D04508945F08B45F0668B00668945E40FB645E483E83083F809668B07668945EC76150FB645EC83E83083F8090F8618010000E9C20000000FB645EC83E83083F8090F87080100008B45EC663945E40F82F60000000F87F50000008345F00283C702EBA28B450C8955D88D04488945EC8B45088D04508945E431C08B55E4668B120FB6FA668955E08B55EC83EF3083FF09668B12668955F076150FB67DF08B55D883EF3083FF090F869E000000EB470FB67DF083EF3083FF090F87910000008B55F0663955E0730985C0751D83C8FFEB18760885C07512B001EB0E66837DE000750766837DF000740A8345E4028345EC02EB888B55D885C0754766837DEA00750766837DE2007437837D1000751E8D469F83F819770383EE208D439F83F819668975DE770383EB2066895DEA668B5DEA66395DDE721577184241E906FEFFFF31C083C41C5B5E5F5DC2100083C8FFEBF1B801000000EBEA")
        EndIf

        Local $CodeLen = BinaryLen($Code)
        $_NaturalCompare_CodeBufferPtr = _MemVirtualAlloc(0, $CodeLen, $MEM_COMMIT, $PAGE_EXECUTE_READWRITE)
        $_NaturalCompare_CodeBuffer = DllStructCreate("byte[" & $CodeLen & "]", $_NaturalCompare_CodeBufferPtr)
        DllStructSetData($_NaturalCompare_CodeBuffer, 1, $Code)

        OnAutoItExitRegister("NaturalCompare_Shutdown")
    EndIf
EndFunc

Func NaturalCompareFast($String1, $String2, $Case = 0)
    If Not $_NaturalCompare_CodeBufferPtr Then NaturalCompare_Startup()
    Local $Ret = DllCall("user32.dll", "int", "CallWindowProc", "ptr", $_NaturalCompare_CodeBufferPtr, _
                                                    "wstr", $String1, _
                                                    "wstr", $String2, _
                                                    "int", $Case, _
                                                    "int", 0)
    Return $Ret[0]
EndFunc

Speed test:

ConsoleWrite("_NaturalCompare:" & @CRLF)
$Timer = TimerInit()
For $i = 1 To 10000
    _NaturalCompare("abC10", "abc2")
    _NaturalCompare("abC10", "abc2", 1)
Next
ConsoleWrite(TimerDiff($Timer) & @CRLF)

ConsoleWrite("NaturalCompareFast:" & @CRLF)
$Timer = TimerInit()
For $i = 1 To 10000
    NaturalCompareFast("abC10", "abc2")
    NaturalCompareFast("abC10", "abc2", 1)
Next
ConsoleWrite(TimerDiff($Timer) & @CRLF)

Result on my computre:

_NaturalCompare:
1288.4765591166
NaturalCompareFast:
472.212972604665

I hope this may help you.

新版 _ArrayAdd 的白痴作者,不管是誰,去死一死好了

 

Link to comment
Share on other sites

Heh, yeah, it won't win any speed comparisons for sure, I know it's slow. I wrote a plugin version as well above (haven't looked at or updated that in a while though). More to the point was the custom array sorting function, which is what I really wanted for my next project. I just updated this algorithm cause "it was there".

Link to comment
Share on other sites

Heh, yeah, it won't win any speed comparisons for sure, I know it's slow. I wrote a plugin version as well above (haven't looked at or updated that in a while though). More to the point was the custom array sorting function, which is what I really wanted for my next project. I just updated this algorithm cause "it was there".

Of course, for education or demonstrating algorithm, your code is best.

I just provide a simple binary code from C source if you really want use this function in program.

新版 _ArrayAdd 的白痴作者,不管是誰,去死一死好了

 

Link to comment
Share on other sites

  • 8 months later...
  • 3 years later...
  • 8 months later...

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...