Definition of 'Sorted'?


Koder

I've noticed what could be a problem with the overloaded operators when doing string comparisons. = and == differ in that == is case-sensitive and = is not; however, there are no case-sensitive counterparts for < and >.
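
A quick way to see the gap is to print all four comparisons for one pair of strings that differ only by case. This is just a minimal sketch; the variable names are made up for illustration. If both < and > report False for a pair that == says is different, a sort has nothing to order the two strings by.

```autoit
; Same letter, different case.
Local $sUpper = "A", $sLower = "a"

ConsoleWrite('"A" =  "a" : ' & ($sUpper = $sLower) & @CRLF)  ; case-insensitive equality
ConsoleWrite('"A" == "a" : ' & ($sUpper == $sLower) & @CRLF) ; case-sensitive equality
ConsoleWrite('"A" <  "a" : ' & ($sUpper < $sLower) & @CRLF)  ; no case-sensitive variant exists
ConsoleWrite('"A" >  "a" : ' & ($sUpper > $sLower) & @CRLF)
```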

This problem shows up when sorting and searching arrays. An array containing strings that differ only by case will be sorted differently depending on the original order of the array. I discovered this with a multi-dimensional array that stores font info for individual characters and needs to be binary searched often. In this array case is important, but the _Array UDFs do not handle it. Even if the array is sorted manually, _ArrayBinarySearch() will not always return the correct value.

After researching the issue, I've noticed that many of the _Array search and sort functions are algorithms taken directly from example functions in C (strongly typed). These example functions are usually for integers only, since C's comparison operators don't (normally) compare strings alphabetically.

So that leads me to the whole issue of sorting. I have always sorted by ASCII values, which is vastly different from how AutoIt sorts. But AutoIt does not sort consistently: "A" and "a" sort differently depending on when the sort algorithm gets to them, not on their values.

So which is correct? ASCII sorting or AutoIt sorting? If AutoIt sorting is correct then AutoIt has either flawed operators or flawed use of those operators.

I have found a way to use the existing operators to detect the problem and sort an array properly (though not in ASCII order). When sorting case-insensitively, it must still be decided which value is greater when case is the only difference; otherwise the same data can come back in an undefined order.

If $rc = $value Then ; a match (but maybe not a case-sensitive match)
    If $rc == $value Then Return $Mid ; a true case-sensitive match
    ; else not a case-sensitive match
    ; There is still a problem!
EndIf

In the above example, if the case does not match, where should the search resume: high or low?

In AutoIt this is undefined, so the sort and search functions must be coordinated in order to work properly. This could break other code that happens to do it differently.
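
To illustrate what "coordinated" means here, below is a minimal sketch in which the sort and the binary search share a single comparison function, so the "high or low?" question always gets the same answer in both places. The names _CmpCase, _SortWith and _SearchWith are made up for this example and are not part of Array.au3. It assumes the built-in StringCompare() with casesense = 1 gives a consistent case-sensitive ordering; whatever that ordering turns out to be, the point is only that both routines use the same one.

```autoit
; Sketch only: _CmpCase, _SortWith and _SearchWith are hypothetical names.

; The single comparison used by BOTH the sort and the search.
; Returns <0, 0 or >0.
Func _CmpCase($sA, $sB)
    Return StringCompare($sA, $sB, 1) ; 1 = case-sensitive
EndFunc

; Simple insertion sort of a 1D array, in place.
Func _SortWith(ByRef $aData)
    Local $vTmp, $j
    For $i = 1 To UBound($aData) - 1
        $vTmp = $aData[$i]
        $j = $i - 1
        While $j >= 0
            If _CmpCase($aData[$j], $vTmp) <= 0 Then ExitLoop
            $aData[$j + 1] = $aData[$j] ; shift larger element up
            $j -= 1
        WEnd
        $aData[$j + 1] = $vTmp
    Next
EndFunc

; Binary search of an array previously sorted with _SortWith.
; Returns the index of $sValue, or -1 if it is not present.
Func _SearchWith(Const ByRef $aSorted, $sValue)
    Local $iLow = 0, $iHigh = UBound($aSorted) - 1, $iMid, $iRc
    While $iLow <= $iHigh
        $iMid = Int(($iLow + $iHigh) / 2)
        $iRc = _CmpCase($aSorted[$iMid], $sValue)
        If $iRc = 0 Then Return $iMid
        If $iRc < 0 Then
            $iLow = $iMid + 1  ; element sorts before the target: resume high
        Else
            $iHigh = $iMid - 1 ; element sorts after the target: resume low
        EndIf
    WEnd
    Return -1
EndFunc

; Usage: mixed-case keys, sorted and searched with the same comparison.
Local $aKeys[4] = ["b", "A", "a", "B"]
_SortWith($aKeys)
ConsoleWrite("Index of 'a': " & _SearchWith($aKeys, "a") & @CRLF)
```

Because only an exact (case-sensitive) match returns 0, "a" and "A" can never be mistaken for each other, and a near-match simply falls through to the high/low branch like any other mismatch.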

I had to create a bunch of ASCII sort and search UDFs that use quick and shell sorts. These work great but need an extra string compare UDF. I could post them once I figure out how, and if anyone cares ;)
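
For reference, here is a rough sketch of what such a string compare UDF might look like. This is not the actual set of UDFs mentioned above, and the names _CodeUnitCompare and _TwoLevelCompare are invented for illustration. The first function orders strings purely by character code; the second compares case-insensitively first and falls back to the character-code order only when case is the sole difference, which is one possible way to settle the tie described earlier.

```autoit
; Sketch only: _CodeUnitCompare and _TwoLevelCompare are hypothetical names.

; Compare two strings by character code, one position at a time.
; Returns -1, 0 or 1.
Func _CodeUnitCompare($sA, $sB)
    Local $iLenA = StringLen($sA), $iLenB = StringLen($sB)
    Local $iMin = $iLenA
    If $iLenB < $iMin Then $iMin = $iLenB
    Local $iA, $iB
    For $i = 1 To $iMin
        $iA = AscW(StringMid($sA, $i, 1))
        $iB = AscW(StringMid($sB, $i, 1))
        If $iA < $iB Then Return -1
        If $iA > $iB Then Return 1
    Next
    ; All shared positions match, so the shorter string sorts first.
    If $iLenA < $iLenB Then Return -1
    If $iLenA > $iLenB Then Return 1
    Return 0
EndFunc

; Primary key: the lower-cased text. Tie-breaker: the original text.
Func _TwoLevelCompare($sA, $sB)
    Local $iRc = _CodeUnitCompare(StringLower($sA), StringLower($sB))
    If $iRc <> 0 Then Return $iRc
    Return _CodeUnitCompare($sA, $sB)
EndFunc

ConsoleWrite(_TwoLevelCompare("Apple", "banana") & @CRLF) ; -1: "a" sorts before "b"
ConsoleWrite(_TwoLevelCompare("apple", "Apple") & @CRLF)  ; 1: case is the only difference, upper case first
```

With this rule the result is 0 only for identical strings, so a sort built on it always returns the same order for the same data.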

Edited by Koder

I had this same issue a while back. I ended up writing my own comparison UDF:

; This function compares the given variants $vA and $vB
; Returns 1 if $vA > $vB
; Returns 0 if $vA = $vB
; Returns -1 if $vA < $vB
Func _ValueCompare($vA, $vB, $CaseSense = 0)
  ; compare numbers
  If IsNumber($vA) And IsNumber($vB) Then
    If $vA > $vB Then Return 1
    If $vA = $vB Then Return 0
    Return -1
  EndIf
  ; convert to strings
  If Not IsString($vA) Then $vA = String($vA)
  If Not IsString($vB) Then $vB = String($vB)
  ; compare strings
  If $CaseSense Then ; case-sensitive comparison
    Local $i = 1 ; StringMid() positions are 1-based
    While $i <= StringLen($vA) And $i <= StringLen($vB)
      If Asc(StringMid($vA, $i, 1)) > Asc(StringMid($vB, $i, 1)) Then Return 1
      If Asc(StringMid($vA, $i, 1)) < Asc(StringMid($vB, $i, 1)) Then Return -1
      $i = $i + 1
    WEnd
    ; all shared positions match: the longer string is the greater one
    If StringLen($vA) > StringLen($vB) Then Return 1
    If StringLen($vA) = StringLen($vB) Then Return 0
    Return -1
  Else ; case-insensitive comparison
    If $vA > $vB Then Return 1
    If $vA = $vB Then Return 0
    Return -1
  EndIf
EndFunc

Probably very slow compared to a native function...


I have a very similar UDF that I used in WinRunner. Case-insensitive comparison is irrelevant when sorting, so my UDF is either ASCII-sensitive or not. In effect it sorts (or helps sort) just like AutoIt, except that upper case consistently comes first.

Func alphanum_compare($item1, $item2, $NotASCII = 0)
    Local $rc = 0, $count = StringLen($item1), $i = StringLen($item2)
    If $count < $i Then $count = $i ; $count = length of the longer string
    $i = 0
    If IsNumber($item1) And IsNumber($item2) Then
        $rc = $item1 - $item2
    Else
        ; ASCII pass: stop at the first position where the characters differ
        While $rc = 0 And $i < $count
            $i = $i + 1
            $rc = Asc(StringMid($item1, $i, 1)) - Asc(StringMid($item2, $i, 1))
        WEnd
        If $NotASCII And $rc <> 0 Then ; don't bother to compare exact duplicates
            $NotASCII = $rc ; save the last $rc value
            $item1 = StringLower($item1)
            $item2 = StringLower($item2)
            $rc = 0
            $i = 0
            ; second pass on the lower-cased strings
            While $rc = 0 And $i < $count
                $i = $i + 1
                $rc = Asc(StringMid($item1, $i, 1)) - Asc(StringMid($item2, $i, 1))
            WEnd
            If $rc = 0 Then
                $rc = $NotASCII ; the difference must be due to case
            EndIf
        EndIf
    EndIf
    Return $rc
EndFunc

No pretty return codes, just positive, negative or zero. I tacked on the NotASCII part just today to make it compatible with the built-in stuff.

So I'm not the only one who has noticed this problem. This is a defect, but is it in the AutoIt operators or in the Array.au3 UDFs?
