Speed difference when loop through array by row or by column

Inpho · March 3, 2021

Hi All,

I wondered whether I could optimise my array manipulation functions any better and came across a discovery that's new to me.

If your array has more rows than columns, then it seems that it's quicker to loop through it column by column rather than row by row.

Here is a demonstration with some timing examples:

$iArrayRows = 400000
$iArrayCols = 2
$iLoopRows = $iArrayRows - 1
$iLoopColumns = $iArrayCols - 1

_ArrayCreateH()
_ArrayCreateH()
_ArrayCreateH()
_ArrayCreateH()
_ArrayCreateH()

ConsoleWrite("---------------" & @CRLF)

_ArrayCreateV()
_ArrayCreateV()
_ArrayCreateV()
_ArrayCreateV()
_ArrayCreateV()

Func _ArrayCreateH()

    Local $aArray[$iArrayRows][$iArrayCols]

    $hTimer = _Timer_Init()

    For $i = 1 To $iLoopRows
        For $ii = 0 To $iLoopColumns
            $aArray[$i][$ii] = "0000000000"
        Next
    Next

    $hTimer = _Timer_Diff($hTimer)

    ConsoleWrite(($hTimer / 1000) & @CRLF)
EndFunc

Func _ArrayCreateV()

    Local $aArray[$iArrayRows][$iArrayCols]

    $hTimer = _Timer_Init()

    For $ii = 0 To $iLoopColumns
        For $i = 1 To $iLoopRows

            $aArray[$i][$ii] = "0000000000"

        Next
    Next

    $hTimer = _Timer_Diff($hTimer)

    ConsoleWrite(($hTimer / 1000) & @CRLF)

EndFunc

Timings show a good saving with the above array size (has drastically more rows than columns):

1.7250778
1.755863
1.7677521
1.7441728
1.7371205
---------------
1.453319
1.4538587
1.4681471
1.4888286
1.4498426

I had no idea that there would be a difference but it seems so obvious now. Is this something that everybody has innately known forever and I'm only just finding out now?

RTFC · March 3, 2021

Well, yes and no. You have to be aware that memory storage is linear, so when storing data with dimensionality higher than 1 (2 dims = array/matrix, 3+ = tensor), storage order becomes relevant, for an array or matrix this is either ColMajor (one cell in all rows per column before the next column, so rows iteration is the minor loop) or RowMajor (one cell in all columns per row before the next row, so cols iteration is the minor loop). In both cases, if you perform cell I/O in the other order it involves repeated jumping back and forth in linear memory space, which likely implies more memory paging swaps (esp. for large data sets). AutoIt arrays are more complicated because they contain pointers to data of any type instead of contiguous raw data of a single type, but the basic idea is the same. Different programming environments may opt for either storage order; efficiency depends on how the data are most commonly accessed. In a computational environment such as Eigen, the default is ColMajor, but RowMajor order can be imposed if necessary. More info is here.

Quote

Algorithms that traverse a matrix row by row will go faster when the matrix is stored in row-major order because of better data locality. Similarly, column-by-column traversal is faster for column-major matrices. It may be worthwhile to experiment a bit to find out what is faster for your particular application.

BTW, if you're manipulating large numerical data sets and are worried about speed, you might want to consider using matrices instead of arrays (AutoIt UDF here).

Edited March 3, 2021 by RTFC

pixelsearch · March 3, 2021

@Inpho : could you please inverse the test, starting with the 5 _ArrayCreateV() ?

Gianni · March 3, 2021

??... it seems that the difference in speed is not due to the different access mode of the elements of the array, but to the greater number of crossings of the different nesting levels of the two "for next" loops In fact, if you comment the two commands to access the array, you will see that the time difference persists.

$iArrayRows = 400000
$iArrayCols = 2
$iLoopRows = $iArrayRows - 1
$iLoopColumns = $iArrayCols - 1

_ArrayCreateH()
_ArrayCreateH()
_ArrayCreateH()
_ArrayCreateH()
_ArrayCreateH()

ConsoleWrite("---------------" & @CRLF)

_ArrayCreateV()
_ArrayCreateV()
_ArrayCreateV()
_ArrayCreateV()
_ArrayCreateV()

Func _ArrayCreateH()

    Local $aArray[$iArrayRows][$iArrayCols]

    $hTimer = TimerInit()

    For $i = 1 To $iLoopRows
        For $ii = 0 To $iLoopColumns
            ; $aArray[$i][$ii] = "0000000000"
        Next
    Next

    $hTimer = TimerDiff($hTimer)

    ConsoleWrite(($hTimer / 1000) & @CRLF)
EndFunc   ;==>_ArrayCreateH

Func _ArrayCreateV()

    Local $aArray[$iArrayRows][$iArrayCols]

    $hTimer = TimerInit()

    For $ii = 0 To $iLoopColumns
        For $i = 1 To $iLoopRows

            ; $aArray[$i][$ii] = "0000000000"

        Next
    Next

    $hTimer = TimerDiff($hTimer)

    ConsoleWrite(($hTimer / 1000) & @CRLF)

EndFunc   ;==>_ArrayCreateV

AspirinJunkie · March 3, 2021

In the first example, you initialize 400,001 times a For loop.
In the second example you initialize a For loop 3 times.

I therefore suspect that the time difference is rather caused by the overhead of the for-loop initialization.

What RTFC says is true in principle, but since the pointer array in AutoIt requires jumping back and forth in memory each time, I think that no consecutive elements can be cast there.

Edited March 3, 2021 by AspirinJunkie

Nine · March 3, 2021

I think that speaks for itself :

$iArray1 = 400000
$iArray2 = 2

_ArrayCreate1()

ConsoleWrite("---------------" & @CRLF)

_ArrayCreate2()

ConsoleWrite("---------------" & @CRLF)

Func _ArrayCreate1()

  Local $aArray[$iArray1][$iArray2]

  $hTimer = TimerInit()
  For $i = 0 To $iArray1 - 1
    For $j = 0 To $iArray2 - 1
      ; $aArray[$i][$j] = "0000000000"
    Next
  Next
  $hTimer = TimerDiff($hTimer)
  ConsoleWrite(($hTimer / 1000) & @CRLF)

  $hTimer = TimerInit()
  For $j = 0 To $iArray2 - 1
    For $i = 0 To $iArray1 - 1
      ; $aArray[$i][$j] = "0000000000"
    Next
  Next
  $hTimer = TimerDiff($hTimer)
  ConsoleWrite(($hTimer / 1000) & @CRLF)

EndFunc   ;==>_ArrayCreateH

Func _ArrayCreate2()

  Local $aArray[$iArray2][$iArray1]

  $hTimer = TimerInit()
  For $i = 0 To $iArray2 - 1
    For $j = 0 To $iArray1 - 1
      ; $aArray[$i][j] = "0000000000"
    Next
  Next
  $hTimer = TimerDiff($hTimer)
  ConsoleWrite(($hTimer / 1000) & @CRLF)

  $hTimer = TimerInit()
  For $j = 0 To $iArray1 - 1
    For $i = 0 To $iArray2 - 1
      ; $aArray[$i][j] = "0000000000"
    Next
  Next
  $hTimer = TimerDiff($hTimer)
  ConsoleWrite(($hTimer / 1000) & @CRLF)
EndFunc   ;==>_ArrayCreateV

Results :

0.65700933651172
0.0324245133094954
---------------
0.0336796450801218
0.654968878294266
---------------

It is not the way the array is created, but the way the loop uses the array. As you can imagine, there is more calculations involved when the short loop is second. When the long loop is second there is less calculation and faster result.

Block all input without UAC	Save/Retrieve Images to/from Text	Monitor Management (VCP commands)
Tool to search in text (au3) files	Date Range Picker	Virtual Desktop Manager
Sudoku Game 2020	Overlapped Named Pipe IPC	HotString 2.0 - Hot keys with string
x64 Bitwise Operations	Multi-keyboards HotKeySet	Recursive Array Display
Fast and simple WCD IPC	Multiple Folders Selector	Printer Manager
GIF Animation (cached)	Screen Scraping	Multi-Threading Made Easy

Sign In

Speed difference when loop through array by row or by column

Recommended Posts

Inpho

Link to comment

Share on other sites

RTFC

Link to comment

Share on other sites

pixelsearch

Link to comment

Share on other sites

Gianni

Link to comment

Share on other sites

AspirinJunkie

Link to comment

Share on other sites

Nine

Link to comment

Share on other sites

Create an account or sign in to comment

Create an account

Sign in

Recently Browsing 0 members

Browse

AutoIt Resources

Release

Beta