
Array statistics like mode


eignxing

Hello, does anyone know of a way to find the mode (the most frequent number or string) of the values stored in an array? I've looked all over the forum and in the help file but couldn't find anything. I'd like to find the most frequent value, followed by the second most frequent, and so on. Can anyone help?

For instance, if I have 1, 1, 1, 1, 2, 2, 3, the most frequent is 1, then 2, and finally 3.


Here are some array functions I came up with in case anyone else searches for them.

_ArrayMode finds the mode of the array (if there is more than one mode, it will only report one :) )

_ArrayElements finds the number of unique elements in the array. You must include Array.au3 to use them.

I'm not sure how efficient or failsafe these are, but they seem to work for me.

;===============================================================================
;
; FunctionName:     _ArrayElements()
; Description:      Returns the number of unique elements from a 1D or 2D array
; Syntax:           _ArrayElements( $aArray, $dimension1, $dimension2, $progbar )
; Parameter(s):     $aArray     - The array to return unique elements from
;                   $dimension1 - The size of the first dimension
;                   $dimension2 - The size of the second dimension;
;                                 set to -1 for no 2nd dimension
;                   $progbar    - Whether to show a progress bar;
;                                 set to 0 for no or 1 for yes

; Requirement(s):   Requires Array.au3 UDF
; Return Value(s):  -1 if invalid array
;                    Otherwise it returns the number of unique elements in the array
; Author(s):        jon8763
;   
;===============================================================================

Func _ArrayElements($aArray, $dimension1, $dimension2, $progbar)
    Select
        Case $progbar = 0
            If $dimension1 < 0 Then
                Return -1
            EndIf
            If $dimension2 = -1 Then
                _ArraySort($aArray)
                Local $unq = 1, $i = 0
                Do
                    ; Compare whole values; StringInStr() also matched substrings ("bc" inside "abc")
                    If String($aArray[$i]) <> String($aArray[$i+1]) Then
                        $unq = $unq + 1
                    EndIf
                    $i = $i + 1
                Until $i = $dimension1 - 1
                Return $unq
            Else
                Dim $temp[$dimension1*$dimension2]
                Local $z = 0, $i = 0, $j = 0
                Do
                    Do
                        $temp[$z] = $aArray[$i][$j]
                        $j = $j + 1
                        $z = $z + 1
                    Until $j = $dimension2
                    $i = $i + 1
                    $j = 0
                Until $i = $dimension1
                _ArraySort($temp)
                $unq = 1
                $i = 0
                Do
                    ; Compare whole values; StringInStr() also matched substrings ("bc" inside "abc")
                    If String($temp[$i]) <> String($temp[$i+1]) Then
                        $unq = $unq + 1
                    EndIf
                    $i = $i + 1
                Until $i = $dimension1*$dimension2 - 1
                Return $unq
            EndIf
        Case $progbar = 1
            If $dimension1 < 0 Then
                Return -1
            EndIf
            If $dimension2 = -1 Then
                ProgressOn("_ArrayElements", "Finding unique array elements...")
                _ArraySort($aArray)
                Local $unq = 1, $i = 0
                Do
                    ; Compare whole values; StringInStr() also matched substrings ("bc" inside "abc")
                    If String($aArray[$i]) <> String($aArray[$i+1]) Then
                        $unq = $unq + 1
                    EndIf
                    $i = $i + 1
                    ProgressSet(($i/$dimension1)*100, "Elements found: "&$unq)
                Until $i = $dimension1 - 1
                ProgressOff()
                Return $unq
            Else
                ProgressOn("_ArrayElements", "Finding unique array elements...")
                Dim $temp[$dimension1*$dimension2]
                Local $z = 0, $i = 0, $j = 0
                Do
                    Do
                        $temp[$z] = $aArray[$i][$j]
                        $j = $j + 1
                        $z = $z + 1
                    Until $j = $dimension2
                    $i = $i + 1
                    $j = 0
                    ProgressSet(($i/$dimension1)*100, "Converting 2D array to 1D array")
                Until $i = $dimension1
                ProgressSet(100, "Sorting array, please wait...")
                _ArraySort($temp)
                $unq = 1
                $i = 0
                ProgressSet(($i/($dimension1*$dimension2))*100, "Elements found: "&$unq)
                Do
                    ; Compare whole values; StringInStr() also matched substrings ("bc" inside "abc")
                    If String($temp[$i]) <> String($temp[$i+1]) Then
                        $unq = $unq + 1
                        ProgressSet(($i/($dimension1*$dimension2))*100, "Elements found: "&$unq)
                    EndIf
                    $i = $i + 1
                Until $i = $dimension1*$dimension2 - 1
                ProgressSet(100, "Elements found: "&$unq)
                ProgressOff()
                Return $unq
            EndIf
    EndSelect
EndFunc

;===============================================================================
;
; FunctionName:     _ArrayMode()
; Description:      Returns the most frequently occurring element in the array
; Syntax:           _ArrayMode( $aArray, $dimension1, $dimension2 )
; Parameter(s):     $aArray     - The array to find the mode of
;                   $dimension1 - The size of the first dimension
;                   $dimension2 - The size of the second dimension;
;                                 set to -1 for no 2nd dimension
; Requirement(s):   Requires Array.au3 UDF
; Return Value(s):  -1 if invalid array
;                    Otherwise it returns the mode of the array
; Author(s):        jon8763
;   
;===============================================================================

Func _ArrayMode($aArray, $dimension1, $dimension2)
    If $dimension1 < 0 Then
        Return -1
    EndIf
    If $dimension2 = -1 Then
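        _ArraySort($aArray)
        ; $bar = longest run so far, $cnt = length of the current run,
        ; $last = element of the longest run (read after sorting)
        Local $bar = 0, $cnt = 1, $last = $aArray[0]
        For $i = 0 To $dimension1 - 2
            ; Compare whole values; StringInStr() also matched substrings
            If String($aArray[$i]) = String($aArray[$i + 1]) Then
                $cnt = $cnt + 1
            Else
                If $cnt > $bar Then
                    $bar = $cnt
                    $last = $aArray[$i]
                EndIf
                $cnt = 1 ; the next element starts a new run of length 1
            EndIf
        Next
        ; The final run never reaches the Else branch, so check it here
        If $cnt > $bar Then $last = $aArray[$dimension1 - 1]
        Return $last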
    Else
        Dim $temp[$dimension1*$dimension2]
        Local $z = 0, $i = 0, $j = 0
        Do
            Do
                $temp[$z] = $aArray[$i][$j]
                $j = $j + 1
                $z = $z + 1
            Until $j = $dimension2
            $i = $i + 1
            $j = 0
        Until $i = $dimension1
        _ArraySort($temp)
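        ; $bar = longest run so far, $cnt = length of the current run
        Local $bar = 0, $cnt = 1, $last = $temp[0]
        Local $total = $dimension1 * $dimension2
        For $i = 0 To $total - 2
            ; Compare whole values; StringInStr() also matched substrings
            If String($temp[$i]) = String($temp[$i + 1]) Then
                $cnt = $cnt + 1
            Else
                If $cnt > $bar Then
                    $bar = $cnt
                    $last = $temp[$i]
                EndIf
                $cnt = 1 ; the next element starts a new run of length 1
            EndIf
        Next
        ; The final run never reaches the Else branch, so check it here
        If $cnt > $bar Then $last = $temp[$total - 1]
        Return $last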
    EndIf
EndFunc
Edited by Jon8763

Good stuff! :)

Never willing to leave well-enough alone, I tweaked the _ArrayElements() function to taste (please ignore as you see fit).

I changed the input parameters because:

1. ByRef avoids the memory usage and time of duplicating a large input array.

2. The function can find out for itself how many dimensions there are, and what size they are, with UBound().

3. The progress bar seemed an unnecessary complication (for me, anyway).

4. Often the first index [0] is skipped as data because it only contains the count.

Functionally, all I did was assemble the results in a string instead of an array, because a string is faster to search for duplicates and doesn't require any UDFs to operate.

Finally, the results are returned in a 1D array with the count in $array[0].

If nothing else, it illustrates another way to skin the cat...

#include <array.au3> ; only used for _ArrayDisplay()

Dim $avTest[4][2] = [[3, ""], [1, "abc"], [2, "def"], [3, "abc"]]
$avReply = _ArrayElements($avTest, 1)
_ArrayDisplay($avReply, "Test")

;===============================================================================
;
; FunctionName:     _ArrayElements()
; Description:      Returns the unique elements from a 1D or 2D array
; Syntax:           _ArrayElements( $aArray, $iStart )
; Parameter(s):     $aArray - ByRef array to return unique elements from (array is not changed)
;                   $iStart - (Optional) Index to start at, default is 0
; Return Value(s):  On success returns an array of unique elements,  $aReturn[0] = count
;                   On failure returns 0 and sets @error (see code below)
; Author(s):        jon8763; Modified by PsaltyDS
;===============================================================================

Func _ArrayElements(ByRef $aArray, $iStart = 0)
    If Not IsArray($aArray) Then Return SetError(1, 0, 0)
    
    ; Setup to use SOH as delimiter
    Local $SOH = Chr(01), $sData = $SOH
    
    ; Setup for number of dimensions
    Local $iBound1 = UBound($aArray) - 1, $Dim2 = False, $iBound2 = 0
    Select
        Case UBound($aArray, 0) = 2
            $Dim2 = True
            $iBound2 = UBound($aArray, 2) - 1
        Case UBound($aArray, 0) > 2
            Return SetError(2, 0, 0)
    EndSelect
    
    ; Get list of unique elements
    For $m = $iStart To $iBound1
        If $Dim2 Then
            ; 2D
            For $n = 0 To $iBound2
                If Not StringInStr($sData, $SOH & $aArray[$m][$n] & $SOH) Then $sData &= $aArray[$m][$n] & $SOH
            Next
        Else
            ; 1D
            If Not StringInStr($sData, $SOH & $aArray[$m] & $SOH) Then $sData &= $aArray[$m] & $SOH
        EndIf
    Next
    
    ; Strip start and end delimiters
    $sData = StringTrimRight(StringTrimLeft($sData, 1), 1)
    
    ; Return results after testing for null set
    Local $avRET = StringSplit($sData, $SOH)
    If $avRET[0] = 1 And $avRET[1] = "" Then Local $avRET[1] = [0]
    Return $avRET
EndFunc   ;==>_ArrayElements
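
Why the element is wrapped in a delimiter on both sides before searching can be seen in a short check. This is only a sketch with made-up sample data ("abc", "def"); it uses nothing beyond Chr() and StringInStr().

```autoit
Local $SOH = Chr(1)
Local $sData = $SOH & "abc" & $SOH & "def" & $SOH

; Without delimiters, "ab" looks like a duplicate because it is a substring of "abc"
ConsoleWrite(StringInStr($sData, "ab") & @CRLF) ; non-zero: false positive

; With a delimiter on both sides, only whole elements match
ConsoleWrite(StringInStr($sData, $SOH & "ab" & $SOH) & @CRLF) ; 0: not in the list
ConsoleWrite(StringInStr($sData, $SOH & "abc" & $SOH) & @CRLF) ; 1: found at position 1
```

The approach assumes the data never contains Chr(1) itself. StringInStr() is also case-insensitive by default, so "ABC" and "abc" count as the same element.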

:)

Edited by PsaltyDS
Valuater's AutoIt 1-2-3, Class... Is now in Session!
For those who want somebody to write the script for them: RentACoder
"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law

Another shot at both of those functions this time. The _ArrayMode() function returns an array where [0] = mode count, and [1] thru [n] list all the elements with the same mode count. Running demo script attached:

#include <array.au3> ; Only for _ArrayDisplay()

; For this array, the mode element is "def" which occurs 3 times
Dim $sTest = "abc,def,ghi,jkl,mno,pqr,stu,vwx,yz1,234,567,890,def,mno,def" 
Dim $avTest = StringSplit($sTest, ",")
$avResult = _ArrayElements($avTest, 1)
_ArrayDisplay($avResult, "_ArrayElements()")
$avResult = _ArrayMode($avTest, 1)
_ArrayDisplay($avResult, "_ArrayMode()")

; For this array, the mode elements are "def", and "pqr" which occur 3 times each
$sTest = "abc,def,ghi,jkl,mno,pqr,stu,vwx,yz1,234,567,890,def,mno,pqr,def,567,pqr,234" 
$avTest = StringSplit($sTest, ",")
$avResult = _ArrayElements($avTest, 1)
_ArrayDisplay($avResult, "_ArrayElements()")
$avResult = _ArrayMode($avTest, 1)
_ArrayDisplay($avResult, "_ArrayMode()")

;===============================================================================
; FunctionName:     _ArrayElements()
; Description:      Returns the unique elements from a 1D or 2D array
; Syntax:           _ArrayElements( $aArray, $iStart )
; Parameter(s):     $aArray - ByRef array to return unique elements from (array is not changed)
;                   $iStart - (Optional) Index to start at, default is 0
; Return Value(s):  On success returns an array of unique elements,  $aReturn[0] = count
;                   On failure returns 0 and sets @error (see code below)
; Author(s):        jon8763; Modified by PsaltyDS
;===============================================================================
Func _ArrayElements(ByRef $aArray, $iStart = 0)
    If Not IsArray($aArray) Then Return SetError(1, 0, 0)

    ; Setup to use SOH as delimiter
    Local $SOH = Chr(01), $sData = $SOH

    ; Setup for number of dimensions
    Local $iBound1 = UBound($aArray) - 1, $Dim2 = False, $iBound2 = 0
    Select
        Case UBound($aArray, 0) = 2
            $Dim2 = True
            $iBound2 = UBound($aArray, 2) - 1
        Case UBound($aArray, 0) > 2
            Return SetError(2, 0, 0)
    EndSelect

    ; Get list of unique elements
    For $m = $iStart To $iBound1
        If $Dim2 Then
            ; 2D
            For $n = 0 To $iBound2
                If Not StringInStr($sData, $SOH & $aArray[$m][$n] & $SOH) Then $sData &= $aArray[$m][$n] & $SOH
            Next
        Else
            ; 1D
            If Not StringInStr($sData, $SOH & $aArray[$m] & $SOH) Then $sData &= $aArray[$m] & $SOH
        EndIf
    Next

    ; Strip start and end delimiters
    $sData = StringTrimRight(StringTrimLeft($sData, 1), 1)

    ; Return results after testing for null set
    Local $avRET = StringSplit($sData, $SOH)
    If $avRET[0] = 1 And $avRET[1] = "" Then Local $avRET[1] = [0]
    Return $avRET
EndFunc   ;==>_ArrayElements

;===============================================================================
;
; FunctionName:     _ArrayMode()
; Description:      Returns the most frequently occurring elements in the array
; Syntax:           _ArrayMode( $aArray [, $iStart] )
; Parameter(s):     $aArray - The ByRef array to find the mode of
;                   $iStart - (optional) The first index to check for data, default is 0
; Return Value(s):  On success returns a 1D array:
;                       [0] = Mode (number of instances of most common data element)
;                       [1] = First mode element
;                       [2] = Second mode element (with same mode count as first)
;                       [n] = Last mode element (with same mode count as first)
;                   On failure returns 0 and sets @error
; Author(s):        jon8763; modified by PsaltyDS
;===============================================================================
Func _ArrayMode(ByRef $aArray, $iStart = 0)
    ; Get list of unique elements
    Local $aData = _ArrayElements($aArray, $iStart)
    If @error Then Return SetError(@error, 0, 0)
    If $aData[0] = 0 Then Return $aData
    
    ; Setup to use SOH as delimiter
    Local $SOH = Chr(01), $sData = $SOH

    ; Setup for number of dimensions
    Local $iBound1 = UBound($aArray) - 1, $Dim2 = False, $iBound2 = 0
    If UBound($aArray, 0) = 2 Then
        $Dim2 = True
        $iBound2 = UBound($aArray, 2) - 1
    EndIf
    
    ; Assemble data string for searching
    For $m = $iStart To $iBound1
        If $Dim2 Then
            ; 2D
            For $n = 0 To $iBound2
                $sData &= $aArray[$m][$n] & $SOH
            Next
        Else
            ; 1D
            $sData &= $aArray[$m] & $SOH
        EndIf
    Next

    ; Check count of each unique element listed in $aData, highest count kept in $aCounts[0]
    Local $aCounts[$aData[0] + 1] = [0], $aRegExp[1]
    For $n = 1 To $aData[0]
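        ; \Q..\E treats the element as literal text. The lookahead leaves the trailing
        ; delimiter unconsumed, so adjacent duplicates are each counted. (?i) matches
        ; the case-insensitive StringInStr() test used in _ArrayElements().
        $aRegExp = StringRegExp($sData, "(?i)" & $SOH & "\Q" & $aData[$n] & "\E(?=" & $SOH & ")", 3)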
        $aCounts[$n] = UBound($aRegExp)
        If $aCounts[$n] > $aCounts[0] Then $aCounts[0] = $aCounts[$n]
    Next
    
    ; Count elements that match the mode number
    Local $iMatches = 0
    For $n = 1 To $aData[0]
        If $aCounts[$n] = $aCounts[0] Then $iMatches += 1
    Next
    
    ; Return all elements matching highest count
    Local $aRET[$iMatches + 1] = [$aCounts[0]], $m = 1
    For $n = 1 To $aData[0]
        ; Add elements where count matches mode
        If $aCounts[$n] = $aCounts[0] Then
            $aRET[$m] = $aData[$n]
            $m += 1
        EndIf
    Next
    Return $aRET
EndFunc   ;==>_ArrayMode
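
The original question also asked for the second most frequent element and so on, which neither version above reports. One possible approach is sketched below. It assumes a 1D array and that the Scripting.Dictionary COM object is available (it normally is on Windows). The function name _ArrayFrequencies() is hypothetical and not part of Array.au3.

```autoit
#include <Array.au3> ; _ArraySort() and _ArrayDisplay()

; Hypothetical helper, not part of the functions above.
; Returns a 2D array sorted by descending count: [n][0] = element, [n][1] = count.
Func _ArrayFrequencies(ByRef $aArray, $iStart = 0)
    If Not IsArray($aArray) Or UBound($aArray, 0) <> 1 Then Return SetError(1, 0, 0)
    If $iStart < 0 Or $iStart > UBound($aArray) - 1 Then Return SetError(2, 0, 0)

    ; Count occurrences; keys are compared as text, ignoring case
    Local $oCounts = ObjCreate("Scripting.Dictionary")
    If Not IsObj($oCounts) Then Return SetError(3, 0, 0)
    $oCounts.CompareMode = 1 ; 1 = text compare (case-insensitive)
    Local $sKey
    For $i = $iStart To UBound($aArray) - 1
        $sKey = String($aArray[$i])
        If $oCounts.Exists($sKey) Then
            $oCounts.Item($sKey) = $oCounts.Item($sKey) + 1
        Else
            $oCounts.Add($sKey, 1)
        EndIf
    Next

    ; Copy the pairs into a 2D array and sort on the count column, descending
    Local $aKeys = $oCounts.Keys()
    Local $aFreq[UBound($aKeys)][2]
    For $i = 0 To UBound($aKeys) - 1
        $aFreq[$i][0] = $aKeys[$i]
        $aFreq[$i][1] = $oCounts.Item($aKeys[$i])
    Next
    _ArraySort($aFreq, 1, 0, 0, 1)
    Return $aFreq
EndFunc   ;==>_ArrayFrequencies

; The data from the original question
Local $aTest[7] = [1, 1, 1, 1, 2, 2, 3]
Local $aRanked = _ArrayFrequencies($aTest)
_ArrayDisplay($aRanked, "Element | Count")
```

For the sample data this should list 1 first (count 4), then 2 (count 2), then 3 (count 1). Elements with equal counts come out in no guaranteed order, and all keys are returned as strings.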

:)


Even if the array you pass to _ArrayElements() is 0-based, the returned array is always 1-based (count in [0]) because you are using StringSplit(). Is this by design?

Also, when handling 2D arrays, you only have one $iStart, but one of the dimensions may start higher or lower than the other. I see some errors coming with that.

This may give you some ideas on keeping it in line with what you are passing:

http://www.autoitscript.com/forum/index.php?showtopic=7821

Func _ArrayUnique(ByRef $aArray, $vDelim = '', $iBase = 1, $iCase = 1)
    If $vDelim = '' Then $vDelim = Chr(01)
    Local $sHold
    For $iCC = $iBase To UBound($aArray) - 1
        If Not StringInStr($vDelim & $sHold, $vDelim & $aArray[$iCC] & $vDelim, $iCase) Then _
            $sHold &= $aArray[$iCC] & $vDelim
    Next
    If $sHold And $iBase = 1 Then
        $aArray = StringSplit(StringTrimRight($sHold, StringLen($vDelim)), $vDelim)
        Return SetError(0, 0, 0)
    ElseIf $sHold And $iBase = 0 Then
        Local $avArray = StringSplit(StringTrimRight($sHold, StringLen($vDelim)), $vDelim)
        ReDim $aArray[UBound($avArray) - 1]
        For $iCC = 1 To UBound($avArray) - 1
            $aArray[$iCC - 1] = $avArray[$iCC]
        Next
        Return SetError(0, 0, 0)
    EndIf
    Return SetError(2, 0, 0)
EndFunc

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


Even if the array you pass to _ArrayElements() is 0-based, the returned array is always 1-based (count in [0]) because you are using StringSplit(). Is this by design?

It is by design from me, and works as intended... but I'm not the OP. Haven't heard if it meets his requirements or not.
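
For anyone unsure what "1-based" means here, the following sketch shows the two array shapes StringSplit() can return. It assumes a current AutoIt release where the $STR_NOCOUNT flag exists in StringConstants.au3; that flag may not have been available in the version used when this thread was written.

```autoit
#include <StringConstants.au3> ; $STR_NOCOUNT

; Default: element count in [0], data from [1] onwards
Local $aOneBased = StringSplit("abc,def,ghi", ",")
ConsoleWrite($aOneBased[0] & @CRLF) ; 3, the element count
ConsoleWrite($aOneBased[1] & @CRLF) ; abc

; With $STR_NOCOUNT: no count element, data starts at [0]
Local $aZeroBased = StringSplit("abc,def,ghi", ",", $STR_NOCOUNT)
ConsoleWrite(UBound($aZeroBased) & @CRLF) ; 3
ConsoleWrite($aZeroBased[0] & @CRLF) ; abc
```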

Also, when handling 2D arrays, you only have one $iStart, but one of the dimensions may start higher or lower than the other. I see some errors coming with that.

Again, it is as I intended. The $iStart parameter causes the skipping to happen in the first subscript, so if [0][0] is a count rather than data (as with the array returned by WinList()), it can be excluded from the returned results.
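
The WinList() layout referred to here can be checked with a few lines. This is only an illustration of why row 0 would be skipped with $iStart = 1; it does not call the functions from this thread.

```autoit
; WinList() returns a 2D array: [0][0] holds the number of windows,
; [n][0] the window title and [n][1] the window handle for n >= 1.
Local $aWin = WinList()
ConsoleWrite("Windows reported: " & $aWin[0][0] & @CRLF)

; Row 0 is a count, not data, so the real entries start at index 1
For $i = 1 To $aWin[0][0]
    ConsoleWrite($i & ": " & $aWin[$i][0] & @CRLF)
Next
```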

This may give you some ideas on keeping it in line with what you are passing:

http://www.autoitscript.com/forum/index.php?showtopic=7821

That code modifies the original array, which I did not intend at all. The intent of my code was to get information ABOUT the ByRef array, not to make any changes to it.

I did get the use of Chr(01) as a delimiter and doing the repetitive parts on a string instead of an array from something of yours, though. Can't remember where I saw it, but it was a long time ago.

:)


I did get the use of Chr(01) as a delimiter and doing the repetitive parts on a string instead of an array from something of yours, though. Can't remember where I saw it, but it was a long time ago.

:)

If you look further up in that thread, I provided both solutions, the way you do it with yours, and by modifying the array.

Edit:

Otherwise, I'm not quite sure why you have ByRef to be honest.

Edited by SmOke_N


Otherwise, I'm not quite sure why you have ByRef to be honest.

It's certainly not required by the function. I just imagined it would be faster with a large array, because then the array doesn't have to be duplicated in memory when it is passed. The ByRef keywords could be removed without changing anything else about the functions.
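
Whether ByRef actually saves a copy is not verified here, but the difference in behaviour is easy to show. In the sketch below the two function names are hypothetical and exist only for the demonstration: a by-value parameter leaves the caller's array alone, while a ByRef parameter lets the function change it.

```autoit
; Hypothetical demo functions, not part of the code in this thread
Func _BumpByValue($aData)
    $aData[0] += 1 ; changes only the function's own copy
EndFunc   ;==>_BumpByValue

Func _BumpByRef(ByRef $aData)
    $aData[0] += 1 ; changes the caller's array
EndFunc   ;==>_BumpByRef

Local $aTest[1] = [10]
_BumpByValue($aTest)
ConsoleWrite($aTest[0] & @CRLF) ; 10, the caller's array is untouched
_BumpByRef($aTest)
ConsoleWrite($aTest[0] & @CRLF) ; 11, the caller's array was modified
```

Since _ArrayElements() and _ArrayMode() only read the array, either form gives the same results, as noted above.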

:)
