_ArraySort lacks ability to compare case-sensitively

c.haslam

Looking at the code for __ArrayQuickSort1D() and __ArrayQuickSort2D(), I see, for example, StringCompare($vTmp, $vCur): there is no casesense parameter, so comparison of string elements is always case-insensitive.

I am thinking that _ArraySort() might have one more optional parameter: casesense. __ArrayQuickSort1D() and __ArrayQuickSort2D() would also need this parameter.

This change would, I think, take minimal effort, and would not break scripts.

Does this make sense? Should there be a feature request in Trac?

I haven't figured out how __ArrayDualPivotSort() does comparisons.
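
To make the proposal concrete, here is a minimal sketch of a sort that forwards a casesense flag to StringCompare(). The helper name _SortStringsSketch() and its parameter are assumptions made for illustration; this is a plain insertion sort, not the code in Array.au3.

```autoit
#include <StringConstants.au3>

; Hypothetical helper, not part of Array.au3: a plain insertion sort over a
; 1D array of strings that forwards a casesense flag to StringCompare().
Func _SortStringsSketch(ByRef $aArray, $iCaseSense = $STR_NOCASESENSE)
    Local $vTmp, $j
    For $i = 1 To UBound($aArray) - 1
        $vTmp = $aArray[$i]
        $j = $i - 1
        While $j >= 0
            ; Stop shifting once $vTmp is not smaller than the element on its left
            If StringCompare($vTmp, $aArray[$j], $iCaseSense) >= 0 Then ExitLoop
            $aArray[$j + 1] = $aArray[$j]
            $j -= 1
        WEnd
        $aArray[$j + 1] = $vTmp
    Next
EndFunc   ;==>_SortStringsSketch

Local $aWords[4] = ["b", "B", "a", "A"]
_SortStringsSketch($aWords, $STR_CASESENSE) ; "a" and "A" now compare as different strings
```

Because the flag defaults to $STR_NOCASESENSE, existing callers of such a function would keep today's behaviour, which is the reason an optional parameter should not break scripts.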

Spoiler

CDebug Dumps values of variables including arrays and DLL structs, to a GUI, to the Console, and to the Clipboard

 

Jos
3 minutes ago, c.haslam said:

I am thinking that _ArraySort() might have one more optional parameter: casesense. __ArrayQuickSort1D() and __ArrayQuickSort2D() would also need this parameter.

This change would, I think, take minimal effort, and would not break scripts.

Does this make sense? Should there be a feature request in Trac?

Sure, just submit your updated UDFs in a Trac ticket as a feature/update proposal.

Jos

SciTE4AutoIt3 Full installer Download page   - Beta files       Read before posting     How to post scriptsource        Forum Rules
 
Live for the present,
Dream of the future,
Learn from the past.
  :)

iamtheky

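There are also several meanings of "sort by case sense". Here are two of them:

All upper, then all lower

#include <Array.au3>

Local $str = "BabACc"

; Convert the string to an array of character codes
Local $aStr = StringToASCIIArray($str)

;~ _ArrayDisplay($aStr)
_ArraySort($aStr) ; numeric sort: codes 65-90 (upper case) come before 97-122 (lower case)
;~ _ArrayDisplay($aStr)

; Convert the sorted codes back to characters
For $i = 0 To UBound($aStr) - 1
    $aStr[$i] = ChrW($aStr[$i])
Next

_ArrayDisplay($aStr)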

 

lower, upper, lower, upper, lower, upper...

#include <Array.au3>

Local $myArray[8] = ["CC", "BB", "aa", "dd", "AA", "bb", "DD", "cc"]

_ArraySort($myArray)
;~ _ArrayDisplay($myArray)

; Where a lower case entry directly follows its upper case twin
; (first characters differ by 32), swap the pair so lower comes first
For $i = 1 To UBound($myArray) - 1
    If Asc(StringLeft($myArray[$i], 1)) - Asc(StringLeft($myArray[$i - 1], 1)) = 32 Then _ArraySwap($myArray, $i, $i - 1)
Next

_ArrayDisplay($myArray)

 



c.haslam

Looking further at the code, it appears to allow comparing strings with numbers. This can make results unpredictable.

I suggest that @error be set if _ArraySort() is called to sort:

  • a 1D array whose elements are a mixture of strings and numbers.
  • a column of a 2D array containing a mixture of strings and numbers.

Does this make sense?
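
A minimal sketch of such a check for the 1D case, assuming a hypothetical helper named _ArrayHasMixedTypes() (nothing like it exists in Array.au3). A caller could run it before sorting and set @error itself when it returns True.

```autoit
; Hypothetical pre-check, not part of Array.au3: returns True when a 1D array
; contains at least one number and at least one string.
Func _ArrayHasMixedTypes(Const ByRef $aArray)
    Local $bNum = False, $bStr = False
    For $i = 0 To UBound($aArray) - 1
        If IsNumber($aArray[$i]) Then
            $bNum = True
        ElseIf IsString($aArray[$i]) Then
            $bStr = True
        EndIf
        ; Both kinds seen: no need to scan further
        If $bNum And $bStr Then Return True
    Next
    Return False
EndFunc   ;==>_ArrayHasMixedTypes

Local $aData[3] = [10, "apple", 2]
If _ArrayHasMixedTypes($aData) Then ConsoleWrite("Array mixes strings and numbers" & @CRLF)
```

Note that a numeric string such as "10" counts as a string here, since IsNumber() looks at the variant type rather than the content.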


czardas

You can take a look at my _ArraySortXD() in the topic ArrayWorkshop. This gives you more control over numeric sorting. The standard _ArraySort fails with the numeric values 1/0 and -1^.5, but nobody has complained about this.

c.haslam

iamtheky,

It would be possible to make the extra parameter either an integer (per StringCompare's casesense parameter) or the name of a function. _ArraySort() (and the functions it calls) would call VarGetType() to determine whether string comparison is to be done by StringCompare() or a user-supplied function:

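If VarGetType($param) = "Int32" Then ; integer flag, as for StringCompare's casesense
    $func = StringCompare
    $casesense = $param
Else ; otherwise treat $param as a user-supplied comparison function
    $func = $param
EndIf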

I see this as probably being beyond my current scope. Perhaps we need Jos' thoughts on this.

czardas

This is already implemented, but probably doesn't do what you expect with numeric strings.

Edit: I was referring to StringCompare(), but after reading your post again, I see you were talking about adding a function parameter.

czardas

A quick test for case-sensitive sorting on ASCII alpha characters only. It might give you some ideas. This method is not suitable for alphanumeric strings, symbols or Unicode. A more complex approach will be needed for Unicode and (going by first impressions) this appears to be quite a challenge.
 

#include <Array.au3>

Local $aArray = ['Hello world','hello wo','heLlO wOrLd','HELLO WORLD','HelLo worlD','Z']

_ArrayDisplay($aArray)

For $i = 0 To UBound($aArray) -1
    For $j = 65 To 90
        $aArray[$i] = StringReplace($aArray[$i], Chr($j), $j & Chr($j), 0, 1)
        $aArray[$i] = StringReplace($aArray[$i], Chr($j +32), $j -55 & Chr($j +32), 0, 1)
    Next
Next
_ArrayDisplay($aArray)

_ArraySort($aArray)

For $i = 0 To UBound($aArray) -1
    For $j = 65 To 90
        $aArray[$i] = StringReplace($aArray[$i], $j & Chr($j), Chr($j), 0, 1)
        $aArray[$i] = StringReplace($aArray[$i], $j -55 & Chr($j +32), Chr($j +32), 0, 1)
    Next
Next

_ArrayDisplay($aArray)

Thought there was a bug. I put lower case before upper case to keep the maths simple. Individual replacements must contain the same number of chars for this to work. You could also add 100 to both upper and lower case ASCII values to make upper case characters appear before lower case.

czardas

Actually, this is a better version:

#include <Array.au3>

Local $aArray = ['Hello world','hello wo','heLlO wOrLd','HELLO WORLD','HelLo worlD','A']

_ArrayDisplay($aArray)

For $i = 0 To UBound($aArray) -1
    For $j = 65 To 90
        $aArray[$i] = StringReplace($aArray[$i], Chr($j), Chr($j) & $j, 0, 1)
        $aArray[$i] = StringReplace($aArray[$i], Chr($j +32), Chr($j +32) & $j -55, 0, 1)
    Next
Next
; _ArrayDisplay($aArray)

_ArraySort($aArray)

For $i = 0 To UBound($aArray) -1
    For $j = 65 To 90
        $aArray[$i] = StringReplace($aArray[$i], Chr($j) & $j, Chr($j), 0, 1)
        $aArray[$i] = StringReplace($aArray[$i], Chr($j +32) & $j -55, Chr($j +32), 0, 1)
    Next
Next

_ArrayDisplay($aArray)

Alternatively you could dig into the code for _ArraySort() and add case sensitivity to StringCompare(). That would be more efficient than making replacements. I'll let someone else try that. I see you suggested this already. :)

jdelaney

Good stuff...slight modification:

#include <Array.au3>

Local $aArray = ['Hello world','hello wo','heLlO wOrLd','HELLO WORLD','HelLo worlD','A']

_ArrayColInsert($aArray,1)

For $i = 0 To UBound($aArray) -1
    $aArray[$i][1] = $aArray[$i][0]
    For $j = 65 To 90
        $aArray[$i][1] = StringReplace($aArray[$i][1], Chr($j), Chr($j) & $j, 0, 1)
        $aArray[$i][1] = StringReplace($aArray[$i][1], Chr($j +32), Chr($j +32) & $j -55, 0, 1)
    Next
Next

_ArraySort($aArray,0,0,0,1)
_ArrayColDelete($aArray,1)

_ArrayDisplay($aArray)

 

IEbyXPATH-Grab IE DOM objects by XPATH IEscriptRecord-Makings of an IE script recorder ExcelFromXML-Create Excel docs without excel installed GetAllWindowControls-Output all control data on a given window.

c.haslam

Jos,

I can easily add a casesense flag to every call to StringCompare(), but I think that there is a problem.

A snippet from __ArrayQuickSort1D:

If IsNumber($vTmp) Then
    For $j = $i - 1 To $iStart Step -1
        $vCur = $aArray[$j]
        ; If $vTmp >= $vCur Then ExitLoop
        If ($vTmp >= $vCur And IsNumber($vCur)) Or (Not IsNumber($vCur) And StringCompare($vTmp, $vCur) >= 0) Then ExitLoop

I think that the following is equivalent:

If IsNumber($vTmp) Then
    For $j = $i - 1 To $iStart Step -1
        $vCur = $aArray[$j]
        If IsNumber($vCur) Then
            If $vTmp >= $vCur Then
                ExitLoop
            EndIf
        Else
            If StringCompare($vTmp, $vCur) >= 0 Then
                ExitLoop
            EndIf
        EndIf
        ; ... rest of the loop body omitted, as in the snippet above
    Next
EndIf

To StringCompare(), $vTmp is numeric but $vCur can be numeric or string. So the comparison can be of a numeric with a string. This may give the user the wrong sort sequence!

I quote the Help: "Most strings will be evaluated as 0 and so the result may well not be the one expected. It is recommended to force the items being compared into the same datatype using Number/String before the comparison."

The equivalent of the last line of the snippet occurs several times.

Further, in __ArrayDualPivotSort(), all comparisons between elements use comparison operators. StringCompare() is not called in this function.

I am willing to replace these operators with calls to this function:

Func __ArrayCompareElements($v1, $v2, $fCaseSense)
    If IsNumber($v1) Then
        If $v1=$v2 Then
            Return 0
        ElseIf $v1>$v2 Then
            Return 1
        Else
            Return -1
        EndIf
    Else
        Return StringCompare($v1, $v2, $fCaseSense)
    EndIf
EndFunc

However, I am concerned about script-breaking. In theory anyway, it is possible that fixing these problems may break some (lucky!) scripts.
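
One way to keep a number from ever being compared with a string is to decide mixed pairs by type alone. The sketch below is a variant of the helper above under that assumption: the name _CompareElementsSketch() is hypothetical, and the rule that numbers sort before strings is an arbitrary choice made for illustration, not current _ArraySort() behaviour.

```autoit
; Hypothetical variant of the comparison helper. Assumption: any number sorts
; before any string, so mixed pairs never reach an operator or StringCompare().
Func _CompareElementsSketch($v1, $v2, $iCaseSense = 0)
    Local $bNum1 = IsNumber($v1), $bNum2 = IsNumber($v2)
    If $bNum1 And $bNum2 Then
        ; Both numeric: plain numeric comparison
        If $v1 = $v2 Then Return 0
        If $v1 > $v2 Then Return 1
        Return -1
    EndIf
    If $bNum1 Then Return -1 ; number vs. string: number first (assumed rule)
    If $bNum2 Then Return 1  ; string vs. number: number first (assumed rule)
    ; Neither is a number: compare as strings, honouring the casesense flag
    Return StringCompare($v1, $v2, $iCaseSense)
EndFunc   ;==>_CompareElementsSketch
```

Scripts that currently rely on a string being evaluated as 0 in a numeric comparison would see a different order with a rule like this, which is exactly the script-breaking risk described above.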

 

 

czardas

I think people use pivot sort for speed. Also, there may be a tag sort option added in the next beta (speed optimization for 2D). If so, then that is the code you would need to modify. I personally think that case sense is rarely required: this being the only thread I've seen on the subject - not that I don't like it.

Here are the proposed changes:

 

Jos
15 hours ago, Jos said:

This change would, I think, take minimal effort, and would not break scripts.

@c.haslam, These are your words, not mine. ;)

Jos 


c.haslam

Because a new version of _ArraySort() is being proposed, I will not be submitting a feature request to Trac.

Over the years, I have had the need for case-sensitivity several times. This is why I started this thread. I have made a note to myself to remind me that _ArraySort() is not case-sensitive.

BTW AutoIt itself now has a feature where something is case-sensitive: map keys.

Thank you for giving me the wider picture.


czardas

@c.haslam - I think it's a good suggestion. When time permits, I'll have a play around with the idea.

jchd

@c.haslam

You seem to be looking for natural sort order. Searching the forum will surely reveal code.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

c.haslam

jchd,

Thank you for telling me the phrase to look for.

Actually, all I need is $STR_CASESENSE, but other users might want natural sort order.

I have copied __ArrayQuickSort1D() into my cDebug project, adding the parameter $STR_CASESENSE to StringCompare() calls, and giving my version of __ArrayQuickSort1D() a different name. There it is used to sort the keys of a map, once for numeric keys and again for string keys.


czardas

For the record, the method I suggested for ascii alpha only requires zero or one to be appended. This was my first thought but for some reason I abandoned it.
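
A rough sketch of that simpler tagging, assuming two hypothetical helpers named _CaseTagAdd() and _CaseTagRemove(). Each ASCII letter gets a single digit appended (0 after lower case, 1 after upper case) before sorting, and the digits are stripped again afterwards. The same limitation as before applies: ASCII letters only.

```autoit
; Hypothetical helpers: tag every ASCII letter with one digit, 0 after lower
; case and 1 after upper case, so a case-insensitive sort can tell them apart.
Func _CaseTagAdd($sText)
    ; ${1} keeps the back-reference separate from the literal digit that follows
    $sText = StringRegExpReplace($sText, "([a-z])", "${1}0")
    Return StringRegExpReplace($sText, "([A-Z])", "${1}1")
EndFunc   ;==>_CaseTagAdd

Func _CaseTagRemove($sText)
    ; Drop the single tag digit that follows each letter
    Return StringRegExpReplace($sText, "([A-Za-z])[01]", "$1")
EndFunc   ;==>_CaseTagRemove

Local $aArray = ['Hello world', 'hello wo', 'heLlO wOrLd', 'HELLO WORLD']
For $i = 0 To UBound($aArray) - 1
    $aArray[$i] = _CaseTagAdd($aArray[$i])
Next
; ... call _ArraySort($aArray) here (needs #include <Array.au3>) ...
For $i = 0 To UBound($aArray) - 1
    $aArray[$i] = _CaseTagRemove($aArray[$i])
Next
```

Because every letter receives exactly one tag character, the replacements all have the same length, which is the condition stated earlier for this kind of trick to work.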

c.haslam

Hmm!


czardas

Taking wrong turns is part and parcel of the experimental approach, without which learning and progress are less likely.
