Jump to content

_ArrayGroupBy - 2D Array Group and Aggregation


0Ethan0
 Share

Recommended Posts

Hello fellow Autoits! (or is it Autoit-ers?)

After using Autoit for a while, using the forums to tackle issues, utilizing scripts others have shared, I finally found a way I can contribute back with a script I recently created to use in a small project I am working on, enter _ArrayGroupBy.

In a nutshell, it takes a 2D array and according to parameters you feed it, it returns a new array with grouped and aggregated columns.

With my luck, this probably exists somewhere in the dusty area of the forums where no search has gone (nor will go). Regardless, it was a nice way to remove some brain cobwebs 😆

Here comes the disclaimer: it is not fully tested, not much of error checking and I would say about 99% of the "features" I wanted to have there are... there 😅

Also, the script uses Erik Pilsits script _ArrayNaturalSort (thank you Erik!) so you will need to download the relevant files. However, if you wish, you can rely on _ArraySort only (just search replace and remove the #include from the script).

So, without further ado:

#include-once

; #INCLUDES# ====================================================================================================================
#include <Array.au3>
#include "_ArrayNaturalSort.au3" ; Credit to this file goes to Erik Pilsits.
                                 ; If unwanted, search for _ArrayNaturalSort and replace with _ArraySort
; ===============================================================================================================================

; #FUNCTION# ====================================================================================================================
; Name...........: _ArrayGroupBy
; Description ...: Group and aggregate 2D Array columns
; Syntax.........: _ArrayGroupBy(Const ByRef $aArrayInput, $aAggregations)
; Parameters ....: $aArrayInput   - The 2D array to use for grouping and aggregation
;                  $aAggregations - 2D array holding parameters of the aggregation format
;                                   Format: [Column to be grouped/aggregated, Group/aggregation function, Delimiter or ""]
;                                   NOTE 1: The row order in the aggregation parameters will reflect the column order in the final array
;                                   NOTE 2: A column can be used more than once.
;                                   Possible group/aggregation function:
;                                   GROUP ................: No implementation yet, please use GROUP_SORT instead.
;                                   GROUP_SORT ...........: Indicates a column to be used for grouping, sorted.
;                                   CONCAT ...............: Concatenate each item using the provided delimiter.
;                                   CONCAT_SORT ..........: Sort first then concatenate each item using the provided delimiter.
;                                   CONCAT_UNIQUE ........: Concatenate only unique items using the provided delimiter.
;                                   CONCAT_SORT_UNIQUE ...: Sort first then concatenate only unique items using the provided delimiter.
;                                   COUNT ................: Count all items in the group.
;                                   COUNT_UNIQUE .........: Count only unique items in the group.
;                                   FIRST ................: Returns the first entry of the group.
;                                   FIRST_SORT ...........: First sort then returns the first entry of the group.
;                                   LAST .................: Returns the last entry of the group.
;                                   LAST_SORT ............: First sort then returns the last entry of the group.
;                                   SUM ..................: Returns the sum of all numeric items
; Return values .: Success - New grouped/aggregated 2D array
;                  Failure: @error = 0 with @extended set as follows:
;                  | 1 = Incorrectly formatted parameter
;                  | 2 = Must have at least one GROUP parameter
;
; Author ........: Ethan Phoenix
; Modified.......: Nov. 24, 21 - Version 1.0, Initial release
;                  Nov. 24, 21 - Version 1.0.1, Added a few more aggregation function and optimized the code in some section
;                  Sep. 14, 22 - Version 1.0.2, Fixed issue with single row array (would return an unexpected array)
;                  Oct. 02, 22 - Version 1.1.0, Fixed a bug (overlooked a column count)
;                                               Added quotes cleanup in case of SUM
;                  Nov. 24, 22 - Version 1.2, Fixed variable not resetting issue and overlooked UBound counting rows instead of columns...
; Example .......: 
;   Local $aTestArray[][] = [ _
;                               ["123", "!!!", "GHI", 10], _
;                               ["456", "!!!", "DEF", 10], _
;                               ["456", "@@@", "ABC", 15], _
;                               ["123", "!!!", "ABC", 10], _
;                               ["111", "!!!", "DEF", 20], _
;                               ["123", "$$$", "ABC", 50] _
;                           ]
;   _ArrayDisplay($aTestArray, "Before _ArrayGroupBy")
;   
;   Local $aAggregations[][] = [ _
;                               [2, "GROUP", ""], _
;                               [2, "COUNT", ""], _
;                               [0, "GROUP", ""], _
;                               [1, "CONCAT_UNIQUE", ", "], _
;                               [3, "SUM", ""] _
;                              ]
;   
;   _ArrayDisplay(_ArrayGroupBy($aTestArray, $aAggregations), "After _ArrayGroupBy")
; ===============================================================================================================================

Func _ArrayGroupBy(Const ByRef $aArrayInput, $aAggregations)
    If UBound($aArrayInput) > 1 Then
        If UBound($aAggregations) > 0 AND UBound($aAggregations, 2) == 3 Then
            Local $aGroups = _ArrayFindAll($aAggregations, "GROUP", 0, 0, 0, 1, 1)
            If UBound($aGroups) > 0 Then
                Local $aArrayTemp[0][UBound($aArrayInput)]
                For $i=0 To UBound($aAggregations)-1
                    $aTempColumn = _ArrayExtract($aArrayInput, -1, -1, $aAggregations[$i][0], $aAggregations[$i][0])
                    _ArrayTranspose($aTempColumn)
                    _ArrayAdd($aArrayTemp, $aTempColumn)
                    $aAggregations[$i][0] = $i
                Next
                _ArrayAdd($aArrayTemp, "")
                _ArrayTranspose($aArrayTemp)
                For $i=0 To UBound($aArrayTemp)-1
                    For $i2=0 To UBound($aGroups)-1
                        $aArrayTemp[$i][UBound($aArrayTemp, 2)-1] &= $aArrayTemp[$i][$aGroups[$i2]]
                    Next
                Next
                $aTempColumn = _ArrayUnique($aArrayTemp, UBound($aArrayTemp, 2)-1, 0, 0, $ARRAYUNIQUE_NOCOUNT)
                _ArrayNaturalSort($aTempColumn)

                Local $aArrayOutput[UBound($aTempColumn)][UBound($aAggregations)]

                For $iGroup=0 To UBound($aTempColumn)-1
                    $aFilteredGroup = __ArrayExtractRange($aArrayTemp, _ArrayFindAll($aArrayTemp, $aTempColumn[$iGroup], 0, 0, 0, 0, UBound($aArrayTemp, 2)-1))
                    For $iGroupCol=0 To UBound($aGroups)-1
                        $aArrayOutput[$iGroup][$aGroups[$iGroupCol]] = $aFilteredGroup[0][$aGroups[$iGroupCol]]
                    Next
                    For $iAgCount=0 To UBound($aAggregations)-1
                        If StringInStr($aAggregations[$iAgCount][1], "GROUP") == 0 Then
                            $aCollector = _ArrayExtract($aFilteredGroup, -1, -1, $aAggregations[$iAgCount][0], $aAggregations[$iAgCount][0])
                            If StringInStr($aAggregations[$iAgCount][1], "SORT") == 0 Then
                                _ArrayNaturalSort($aCollector)
                            EndIf
                            Local $sAgResults = 0
                            Switch $aAggregations[$iAgCount][1]
                                Case "CONCAT"
                                    $sAgResults = _ArrayToString($aCollector, $aAggregations[$iAgCount][2])
                                Case "CONCAT_SORT"
                                    $sAgResults = _ArrayToString($aCollector, $aAggregations[$iAgCount][2])
                                Case "CONCAT_UNIQUE"
                                    $aCollector = _ArrayUnique($aCollector, 0, 0, 0, $ARRAYUNIQUE_NOCOUNT)
                                    $sAgResults = _ArrayToString($aCollector, $aAggregations[$iAgCount][2])
                                Case "CONCAT_SORT_UNIQUE"
                                    $aCollector = _ArrayUnique($aCollector, 0, 0, 0, $ARRAYUNIQUE_NOCOUNT)
                                    $sAgResults = _ArrayToString($aCollector, $aAggregations[$iAgCount][2])
                                Case "COUNT"
                                    $sAgResults = UBound($aCollector)
                                Case "COUNT_UNIQUE"
                                    $aCollector = _ArrayUnique($aCollector, 0, 0, 0, $ARRAYUNIQUE_NOCOUNT)
                                    $sAgResults = UBound($aCollector)
                                Case "FIRST"
                                    $sAgResults = $aCollector[0]
                                Case "FIRST_SORT"
                                    $sAgResults = $aCollector[0]
                                Case "LAST"
                                    $sAgResults = $aCollector[UBound($aCollector)-1]
                                Case "LAST_SORT"
                                    $sAgResults = $aCollector[UBound($aCollector)-1]
                                Case "SUM"
                                    For $sValue In $aCollector
                                        $sAgResults += Number(StringReplace($sValue, '"', ''))
                                    Next
                            EndSwitch
                            $aArrayOutput[$iGroup][$aAggregations[$iAgCount][0]] = $sAgResults
                        EndIf
                    Next
                Next
                Return $aArrayOutput
            Else
                Return SetError(0, 2, "Must have at least one GROUP parameter")
            EndIf
        Else
            Return SetError(0, 1, "Incorrectly formatted parameter")
        EndIf
    Else
        Return $aArrayInput
    EndIf
EndFunc

; #INTERNAL_USE_ONLY#============================================================================================================
; Name...........: __ArrayExtractRange
; Description ...: Extract range of rows from a given array
; Syntax.........: __ArrayExtractRange(ByRef $aArrayInput, $vRange)
; Parameters ....: $aArrayInput - Array to extract rows from
;                  $vRange      - An array or a string of indices to extract. When a string, using ; as delimiter.
;                                 Both can use hyphen - for spanning range.
; Return values .: A new array 
; Author ........: Ethan Phoenix
; Modified.......: Nov. 24, 21 - Version 1.0, Initial release
; Example .......: __ArrayExtractRange($aArrayInput, "0:2-4")
; ===============================================================================================================================

Func __ArrayExtractRange(Const ByRef $aArrayInput, $vRange)
    Local $aArrayOut[0][UBound($aArrayInput, 2)]
    If IsArray($vRange) <> 1 Then
        $vRange = StringSplit($vRange, ";", $STR_NOCOUNT )
    EndIf
    For $i=0 To UBound($vRange)-1
        $iStart = $vRange[$i]
        $iEnd = $vRange[$i]
        $iIsRange = StringInStr($vRange[$i], "-")
        If $iIsRange <> 0 Then
            $iStart = StringLeft($vRange[$i], $iIsRange-1)
            $iEnd = StringRight($vRange[$i], StringLen($vRange[$i])-$iIsRange)
        EndIf
        _ArrayAdd($aArrayOut, _ArrayExtract($aArrayInput, $iStart, $iEnd))
    Next
    Return $aArrayOut
EndFunc

Cheers!

Nov. 24, 2022 - UPDATE 1.2 :

Fixed variable not resetting issue and overlooked UBound counting rows instead of columns........

Oct. 02, 2022 - UPDATE 1.1.0 :

Fixed an overlooked column count which causes the results not to group properly. Also added quotes removal in case using SUM with numbers surrounded by quotes (which can happen in some CSV imported tables).

Sept. 14, 2022 - UPDATE 1.0.2 :

A small update for this script - it returned unexpected result in case the given array had only one row. I added a simple check to return the array if it has only one row since there is no reason for this script to do any 'calculations' on such an array.

 

 

 

_ArrayGroupBy.au3

Edited by 0Ethan0
Script Update 1.2
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...