Fastest Way to Delete Empty Records in 1d Array


Go to solution Solved by Andreik,

Here is what I am using, but maybe it can be done faster within this line, without having to go through the loop to test for empty values?  Thanks in advance!

$tmp_array = StringRegExpReplace($tmp_array, "\|\|+", "|")

 

#include <Array.au3>

$array = Get_Array()

_ArrayDisplay($array)
$array = StripEmptyRecords($array, 0)
_ArrayDisplay($array)

Func Get_Array()

    Local $array[8]

    $array[0] = "12345"
    $array[1] = "01621"
    $array[2] = "xyz"
    $array[3] = "abc@defg.com"
    $array[4] = " john smith"
    $array[5] = " sally turner "
    $array[6] = "  " ; whitespace-only record (index 5 was assigned twice before)
    $array[7] = "zxy"

    Return $array

EndFunc   ;==>Get_Array


Func StripEmptyRecords($array, $start_row)

    $tmp_array = _ArrayToString($array, "|", $start_row)
    $tmp_array = StringStripCR($tmp_array)

    $tmp_array = StringRegExpReplace($tmp_array, "\|\|+", "|")

    If StringRight($tmp_array, 1) = "|" Then
        $tmp_array = StringTrimRight($tmp_array, 1)
    EndIf

    If StringLeft($tmp_array, 1) = "|" Then
        $tmp_array = StringTrimLeft($tmp_array, 1)
    EndIf

    $final_array = StringSplit($tmp_array, "|", 2)

    For $x = UBound($final_array) - 1 To 0 Step -1
        If StringStripWS($final_array[$x], 8) = "" Then
            _ArrayDelete($final_array, $x)
        EndIf
    Next

    Return $final_array

EndFunc   ;==>StripEmptyRecords

 


  • Solution
Posted (edited)

Since _ArrayToString() already traverses the entire array, you can do the traversal yourself and keep only the non-empty data along the way:

#include <Array.au3>

$array = Get_Array()

_ArrayDisplay($array)
$array = StripEmptyRecords($array, 0)
_ArrayDisplay($array)

Func Get_Array()

    Local $array[8]

    $array[0] = "12345"
    $array[1] = "01621"
    $array[2] = "xyz"
    $array[3] = "abc@defg.com"
    $array[4] = " john smith"
    $array[5] = " sally turner "
    $array[6] = "  " ; whitespace-only record (index 5 was assigned twice before)
    $array[7] = "zxy"

    Return $array

EndFunc   ;==>Get_Array


Func StripEmptyRecords(ByRef $aData, $iStart)
    If Not IsArray($aData) Then Return SetError(1, 0, False)
    Local $iElements = UBound($aData)
    If $iStart >= $iElements Then Return SetError(2, 0, False)
    Local $sResult
    ; Append every record that still has content after all whitespace is stripped
    For $Index = $iStart To $iElements - 1
        $sResult &= (StringStripWS($aData[$Index], 8) ? $aData[$Index] & '|' : '')
    Next
    ; Drop the trailing "|" and split back into a 0-based array (flag 2 = no count element)
    Return StringSplit(StringTrimRight($sResult, 1), '|', 2)
EndFunc   ;==>StripEmptyRecords

 

Edited by Andreik

When the words fail... music speaks.


  • Moderators

Doing it without concatenation may be faster for larger arrays.

Based on what you're showing, you could do a simple For/Next loop with a blank return array, or you could add the bells and whistles to catch some errors, like so:

#include <Array.au3>

Global $array = Get_Array()

_ArrayDisplay($array)
$array = _StripEmptyElements($array)
_ArrayDisplay($array)

Func Get_Array()

    Local $array[8]

    $array[0] = "12345"
    $array[1] = "01621"
    $array[2] = "xyz"
    $array[3] = "abc@defg.com"
    $array[4] = " john smith"
    $array[5] = " sally turner "
    $array[6] = "  " & @CRLF & "  " ; whitespace-only record (index 5 was assigned twice before)
    $array[7] = "zxy"

    Return $array

EndFunc   ;==>Get_Array

; $bNoWS = True (default): elements that contain only whitespace also count as empty
Func _StripEmptyElements(ByRef $aArgs, $iStart = 0, $bNoWS = True)

    If UBound($aArgs, 2) Then Return SetError(1, 0, 0) ; only 1D arrays are supported
    
    If $iStart = Default Or $iStart == -1 Then $iStart = 0
    If $bNoWS = Default Or $bNoWS == -1 Then $bNoWS = True

    Local $iUB = UBound($aArgs)
    ; catch start out of bounds
    If $iStart < 0 Or $iStart > $iUB - 1 Then Return SetError(2, 0, 0)

    Local $aRet[$iUB]
    Local $iEnum = 0
    
    ; build array without concatenation
    For $i = $iStart To $iUB - 1
        If StringLen($aArgs[$i]) == 0 Then ContinueLoop
        If $bNoWS Then
            ; no (?m) here: in multiline mode a blank line inside a longer string would also match
            If StringRegExp($aArgs[$i], "^\s+$") Then ContinueLoop
        EndIf
        $aRet[$iEnum] = $aArgs[$i]
        $iEnum += 1
    Next

    If $iEnum = 0 Then
        ; nothing found, but rather than return a false
        ;  set error and return array where user can do what they want with it
        Return SetError(2, 0, $aArgs)
    EndIf
    
    ; resize return array
    ReDim $aRet[$iEnum]
    ; return extended as the ubound of new array
    Return SetExtended($iEnum, $aRet)
EndFunc

 

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


It doesn't seem to be faster, not even with 1 million records.

#include <Array.au3>

$iLen = 1

For $Index = 1 To 6
    $iLen *= 10
    $aData = BuildArray($iLen)
    ConsoleWrite('Records in array: ' & $iLen & @CRLF)
    ConsoleWrite('Empty records: ' & @extended & @CRLF)

    $iTimer = TimerInit()
    $aNew1 = StripEmptyRecords($aData, 0)
    ConsoleWrite(Round(TimerDiff($iTimer), 2) & ' ms' & @TAB)
    ConsoleWrite('Records: ' & UBound($aNew1) & @CRLF)

    $iTimer = TimerInit()
    $aNew2 = _StripEmptyElements($aData)
    ConsoleWrite(Round(TimerDiff($iTimer), 2) & ' ms' & @TAB)
    ConsoleWrite('Records: ' & UBound($aNew2) & @CRLF)

    ConsoleWrite(@CRLF)
Next

Func BuildArray($iLen)
    Local $aData[$iLen]
    Local $sData, $iEmpty = 0
    For $Index = 0 To $iLen - 1
        $sData = (Random(1, 10, 1) = 5 ? '' : RandomString(Random(10, 50, 1)))
        $aData[$Index] = $sData
        If Not $sData Then $iEmpty += 1
    Next
    Return SetError(0, $iEmpty, $aData)
EndFunc

Func RandomString($iLen)
    Local Static $aChars = StringSplit('abcdefghijklmnopqrstuvwxyz', '')
    Local $sString
    For $Index = 1 To $iLen
        $sString &= $aChars[Random(1, $aChars[0], 1)]
    Next
    Return $sString
EndFunc

Func StripEmptyRecords(ByRef $aData, $iStart)
    If Not IsArray($aData) Then Return SetError(1, 0, False)
    Local $iElements = UBound($aData)
    If $iStart >= $iElements Then Return SetError(1, 0, False)
    Local $sResult
    For $Index = $iStart To $iElements - 1
        $sResult &= (StringStripWS($aData[$Index], 8) ? $aData[$Index] & '|' : '')
    Next
    Return StringSplit(StringTrimRight($sResult, 1), '|', 2)
EndFunc

Func _StripEmptyElements(ByRef $aArgs, $iStart = 0, $bNoWS = True)

    If UBound($aArgs, 2) Then Return SetError(1, 0, 0) ; only 1D arrays are supported

    If $iStart = Default Or $iStart == -1 Then $iStart = 0
    If $bNoWS = Default Or $bNoWS == -1 Then $bNoWS = True

    Local $iUB = UBound($aArgs)
    ; catch start out of bounds
    If $iStart < 0 Or $iStart > $iUB - 1 Then Return SetError(2, 0, 0)

    Local $aRet[$iUB]
    Local $iEnum = 0

    ; build array without concatenation
    For $i = $iStart To $iUB - 1
        If StringLen($aArgs[$i]) == 0 Then ContinueLoop
        If $bNoWS Then
            If StringRegExp($aArgs[$i], "(?m)^\s+$") Then ContinueLoop
        EndIf
        $aRet[$iEnum] = $aArgs[$i]
        $iEnum += 1
    Next

    If $iEnum = 0 Then
        ; nothing found, but rather than return a false
        ;  set error and return array where user can do what they want with it
        Return SetError(2, 0, $aArgs)
    EndIf

    ; resize return array
    ReDim $aRet[$iEnum]
    ; return extended as the ubound of new array
    Return SetExtended($iEnum, $aRet)
EndFunc

Results (within each block, the first timing is StripEmptyRecords and the second is _StripEmptyElements):

Records in array: 10
Empty records: 0
0.1 ms    Records: 10
0.13 ms    Records: 10

Records in array: 100
Empty records: 6
0.5 ms    Records: 94
0.62 ms    Records: 94

Records in array: 1000
Empty records: 109
3.78 ms    Records: 891
5.15 ms    Records: 891

Records in array: 10000
Empty records: 996
31.35 ms    Records: 9004
41.58 ms    Records: 9004

Records in array: 100000
Empty records: 9824
312.28 ms    Records: 90176
399.45 ms    Records: 90176

Records in array: 1000000
Empty records: 100389
3134.44 ms    Records: 899611
4157.58 ms    Records: 899611

 


  • Moderators

That's pretty interesting.  The regex overhead is the only thing I can think of that would explain it, given that there is only a single ReDim.  If I'm being honest, I didn't even see your post before I posted, and I was too tired to test yours out once I had posted, lol.  But the proof is in the pudding, so to speak.


The first is quite fast.  I kept the second for reference.

Func StripEmpty2(ByRef $array, $iStart = 0)
  Local $sArray = StringRegExpReplace(_ArrayToString($array, Default, $iStart), "^\s*\||(?<=\|)\s*\||\|(?=[\s\|]*$)|\s*$", "")
  Return StringSplit($sArray, "|", $STR_NOCOUNT)
EndFunc

Both have been modified to resolve a misinterpretation of what $iStart means.

Func StripEmpty3(ByRef $array, $iStart = 0)
  Local $aTemp = _ArrayFindAll($array, "^\s*$", $iStart, Default, 0, 3)
  If @error Then
    If $iStart Then _ArrayDelete($array, "0-" & $iStart - 1)
    Return
  EndIf
  For $i = $iStart - 1 To 0 Step -1
    _ArrayInsert($aTemp, 0, $i)
  Next
  _ArrayInsert($aTemp, 0, UBound($aTemp))
  _ArrayDelete($array, $aTemp)
EndFunc
Edited by Nine
better SRER pattern

9 minutes ago, SmOke_N said:

That's pretty interesting.  The regex overhead is the only thing I can think of that would explain it, given that there is only a single ReDim.  If I'm being honest, I didn't even see your post before I posted, and I was too tired to test yours out once I had posted, lol.  But the proof is in the pudding, so to speak.

ReDim is quite expensive with large arrays. With only a few elements, every version is more than fast enough.
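
For anyone who wants to check the ReDim claim on their own machine, here is a minimal timing sketch. The array size and the number of elements cut off are arbitrary assumptions, not values from this thread, and the variable names are made up for illustration. It only measures one shrinking ReDim, so treat the number as a rough indication.

```autoit
; Rough timing sketch: how long does one shrinking ReDim take on a large array?
; Assumption: 1,000,000 elements, shrunk by 100,000 (both values are arbitrary).
Local $iSize = 1000000
Local $aBig[$iSize]
For $i = 0 To $iSize - 1
    $aBig[$i] = "x"
Next

Local $hTimer = TimerInit()
ReDim $aBig[$iSize - 100000] ; keeps the leading elements, drops the tail
ConsoleWrite("Single ReDim: " & Round(TimerDiff($hTimer), 2) & " ms" & @CRLF)
ConsoleWrite("New size: " & UBound($aBig) & @CRLF)
```

Comparing that figure with the total run time of the functions above should show how much of the difference the final ReDim can account for.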


I have added a function and packed the whole thing into a speed comparison:

#include <Array.au3>

Global Const $aArrayRaw = Get_Array()
Global $f_DecimalPlaces = 1
Global $iT, $a_Results[0][3]

Func Get_Array()
    Local $aArray[1e6]

    For $i = 0 To UBound($aArray) - 1
        $aArray[$i] = Random(0,2,1) = 2 ? " " : "x"
    Next

    Return $aArray
EndFunc   ;==>Get_Array

; the first measurement
$aArray = $aArrayRaw
ReDim $a_Results[UBound($a_Results) + 1][3]
$a_Results[UBound($a_Results) - 1][0] = "Andreik"
$iT = TimerInit()
$aArray = _strip_Andreik($aArray)
$iT = TimerDiff($iT)
$a_Results[UBound($a_Results) - 1][1] = ($iT)

; the second measurement
$aArray = $aArrayRaw
ReDim $a_Results[UBound($a_Results) + 1][3]
$a_Results[UBound($a_Results) - 1][0] = "SmOke_N"
$iT = TimerInit()
$aArray = _strip_SmOke_N($aArray)
$iT = TimerDiff($iT)
$a_Results[UBound($a_Results) - 1][1] = ($iT)

; the third measurement
$aArray = $aArrayRaw
ReDim $a_Results[UBound($a_Results) + 1][3]
$a_Results[UBound($a_Results) - 1][0] = "Nine 1"
$iT = TimerInit()
$aArray = _stripNine1($aArray)
$iT = TimerDiff($iT)
$a_Results[UBound($a_Results) - 1][1] = ($iT)

; the fourth measurement
$aArray = $aArrayRaw
ReDim $a_Results[UBound($a_Results) + 1][3]
$a_Results[UBound($a_Results) - 1][0] = "Nine 2"
$iT = TimerInit()
$aArray = _stripNine2($aArray)
$iT = TimerDiff($iT)
$a_Results[UBound($a_Results) - 1][1] = ($iT)

; the fifth measurement
$aArray = $aArrayRaw
ReDim $a_Results[UBound($a_Results) + 1][3]
$a_Results[UBound($a_Results) - 1][0] = "AspirinJunkie"
$iT = TimerInit()
_strip_AspirinJunkie($aArray)
$iT = TimerDiff($iT)
$a_Results[UBound($a_Results) - 1][1] = ($iT)

; calculate results and print them out
_ArraySort($a_Results, 0, 0, 0, 1)
For $i = 0 To UBound($a_Results) - 1
    $a_Results[$i][2] = Round($a_Results[$i][1] / $a_Results[0][1], 2)
    $a_Results[$i][1] = Round($a_Results[$i][1], $f_DecimalPlaces)
Next
_ArrayDisplay($a_Results, "Measurement Results", "", 16 + 64, Default, "name|time [ms]|factor")


Func _strip_Andreik(ByRef $aData, $iStart = 0)
    If Not IsArray($aData) Then Return SetError(1, 0, False)
    Local $iElements = UBound($aData)
    If $iStart >= $iElements Then Return SetError(2, 0, False)
    Local $sResult
    For $Index = $iStart To $iElements - 1
        $sResult &= (StringStripWS($aData[$Index], 8) ? $aData[$Index] & '|' : '')
    Next
    Return StringSplit(StringTrimRight($sResult, 1), '|', 2)
EndFunc

Func _stripNine1(ByRef $array, $iStart = 0)
    Local $sArray = StringRegExpReplace(_ArrayToString($array, Default, $iStart), "\|\s*$|(?<=\|)\s*\|", "")
    Return StringSplit(($iStart ? _ArrayToString($array, Default, 0, $iStart - 1) & "|" : "") & $sArray, "|", $STR_NOCOUNT)
EndFunc

Func _stripNine2(ByRef $array, $iStart = 0)
    Local $aTemp = _ArrayFindAll($array, "^\s*$", $iStart, Default, 0, 3)
    _ArrayInsert($aTemp, 0, UBound($aTemp))
    _ArrayDelete($array, $aTemp)
    Return $array
EndFunc


; $bNoWS = True (default): elements that contain only whitespace also count as empty
Func _strip_SmOke_N(ByRef $aArgs, $iStart = 0, $bNoWS = True)

    If UBound($aArgs, 2) Then Return SetError(1, 0, 0) ; only 1D arrays are supported
    
    If $iStart = Default Or $iStart == -1 Then $iStart = 0
    If $bNoWS = Default Or $bNoWS == -1 Then $bNoWS = True

    Local $iUB = UBound($aArgs)
    ; catch start out of bounds
    If $iStart < 0 Or $iStart > $iUB - 1 Then Return SetError(2, 0, 0)

    Local $aRet[$iUB]
    Local $iEnum = 0
    
    ; build array without concatenation
    For $i = $iStart To $iUB - 1
        If StringLen($aArgs[$i]) == 0 Then ContinueLoop
        If $bNoWS Then
            If StringRegExp($aArgs[$i], "(?m)^\s+$") Then ContinueLoop
        EndIf
        $aRet[$iEnum] = $aArgs[$i]
        $iEnum += 1
    Next

    If $iEnum = 0 Then
        ; nothing found, but rather than return a false
        ;  set error and return array where user can do what they want with it
        Return SetError(2, 0, $aArgs)
    EndIf

    ; resize return array
    ReDim $aRet[$iEnum]
    ; return extended as the ubound of new array
    Return SetExtended($iEnum, $aRet)
EndFunc

Func _strip_AspirinJunkie(ByRef $A, $iStart = 0, $iEnd = UBound($A) - 1)
    Local $x = $iStart
    For $i = $iStart To $iEnd
        If StringIsSpace($A[$i]) Then ContinueLoop
        $A[$x] = $A[$i]
        $x += 1
    Next
    Redim $A[$x]
EndFunc

Nine's first function performs best, but it would still need to be adapted for a few special cases where it currently fails, depending on the type of data:

1. If the first array element is empty, it stays in the array.
2. Elements that contain only line breaks are not recognized as empty strings (which may be correct, depending on the context).
3. Pipes ("|") occurring in the strings break the result, because "|" is also the separator.

 

Edited by AspirinJunkie

It looks like the function performs much better once the costly StringStripWS() is replaced with StringIsSpace():

Func _strip_Andreik(ByRef $aData, $iStart = 0)
    If Not IsArray($aData) Then Return SetError(1, 0, False)
    Local $iElements = UBound($aData)
    If $iStart >= $iElements Then Return SetError(2, 0, False)
    Local $sResult
    For $Index = $iStart To $iElements - 1
        $sResult &= (StringIsSpace($aData[$Index]) ? '' : $aData[$Index] & '|')
    Next
    Return StringSplit(StringTrimRight($sResult, 1), '|', 2)
EndFunc

Totally forgot about this function. Thanks @AspirinJunkie.
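
One thing worth guarding against: the benchmark above fills its blank records with " ", never with a zero-length string. The sketch below is a hypothetical in-place variant (the name _StripBlank and the test data are not from this thread) that checks StringLen() first, so it does not depend on how StringIsSpace() treats "". It also sidesteps a zero-size ReDim when nothing is left, by assigning an empty array instead.

```autoit
#include <Array.au3>

Local $aTest[6] = ["12345", "", "  ", " john smith", @CRLF, "zxy"]
_StripBlank($aTest)
_ArrayDisplay($aTest)

; Hypothetical helper, not from the thread: compacts a 1D array in place
; and returns the number of elements kept.
Func _StripBlank(ByRef $aData)
    If Not IsArray($aData) Or UBound($aData, 0) <> 1 Then Return SetError(1, 0, False)
    Local $iKeep = 0
    For $i = 0 To UBound($aData) - 1
        ; Skip zero-length elements explicitly, then whitespace-only ones
        If StringLen($aData[$i]) = 0 Then ContinueLoop
        If StringIsSpace($aData[$i]) Then ContinueLoop
        $aData[$iKeep] = $aData[$i]
        $iKeep += 1
    Next
    If $iKeep = 0 Then
        ; Nothing left: hand back a zero-element array instead of resizing to 0
        Local $aEmpty[0]
        $aData = $aEmpty
    Else
        ReDim $aData[$iKeep]
    EndIf
    Return $iKeep
EndFunc   ;==>_StripBlank
```

With the test data above, only "12345", " john smith" and "zxy" should remain.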

Edited by Andreik


@AspirinJunkie Thanks for the evaluation.  I edited the pattern of my first script above to handle all the special cases you mentioned (as far as my tests went).  As for CR, if it is required, OP can just add it along with \s as an alternate.  If the array data can contain pipes, OP can change the separator character to anything he wants, if need be.
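
To illustrate the separator point, here is a minimal sketch of the concatenate-and-split approach with a different delimiter. It assumes Chr(1) never occurs in the data; that assumption, the function name _StripEmptyKeepPipes and the test data are mine and not from this thread.

```autoit
Local $aTest[5] = ["a|b", "  ", "", "c", "d|e|f"]
Local $aClean = _StripEmptyKeepPipes($aTest)
For $i = 0 To UBound($aClean) - 1
    ConsoleWrite($i & ": [" & $aClean[$i] & "]" & @CRLF)
Next

; Hypothetical variant of the concatenate-and-split idea.
; Assumption: Chr(1) never appears in any record.
Func _StripEmptyKeepPipes(ByRef $aData)
    If Not IsArray($aData) Then Return SetError(1, 0, False)
    Local Const $sSep = Chr(1)
    Local $sResult = ""
    For $i = 0 To UBound($aData) - 1
        ; Skip records that are empty once all whitespace is stripped
        If StringLen(StringStripWS($aData[$i], 8)) = 0 Then ContinueLoop
        $sResult &= $aData[$i] & $sSep
    Next
    ; Every record was empty: report it instead of returning a one-element array
    If StringLen($sResult) = 0 Then Return SetError(2, 0, False)
    ; Drop the trailing separator and split into a 0-based array (flag 2 = no count)
    Return StringSplit(StringTrimRight($sResult, 1), $sSep, 2)
EndFunc   ;==>_StripEmptyKeepPipes
```

The records "a|b" and "d|e|f" should come back intact, because "|" is no longer treated as a separator.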


The pattern is still not enough to match all empty rows of the array. Try this:

#include <Array.au3>

Local $array[7]
$array[0] = "  "
$array[1] = "01621"
$array[2] = "xyz"
$array[3] = ""
$array[4] = " john smith"
$array[5] = " sally turner "
$array[6] = ""

$array = StripEmpty2($array)
_ArrayDisplay($array)

Func StripEmpty2(ByRef $array, $iStart = 0)
  Local $sArray = StringRegExpReplace(_ArrayToString($array, Default, $iStart), "\|\s*$|(?<=\|)\s*\|", "")
  Return StringSplit(($iStart ? _ArrayToString($array, Default, 0, $iStart - 1) & "|" : "") & $sArray, "|", $STR_NOCOUNT)
EndFunc

The first element is not stripped. Also, two or more empty rows at the end of the array leave an empty element behind:

#include <Array.au3>

Local $array[8]
$array[0] = "  "
$array[1] = "01621"
$array[2] = "xyz"
$array[3] = ""
$array[4] = " john smith"
$array[5] = " sally turner "
$array[6] = ""
$array[7] = ""

$array = StripEmpty2($array)
_ArrayDisplay($array)

Func StripEmpty2(ByRef $array, $iStart = 0)
  Local $sArray = StringRegExpReplace(_ArrayToString($array, Default, $iStart), "\|\s*$|(?<=\|)\s*\|", "")
  Return StringSplit(($iStart ? _ArrayToString($array, Default, 0, $iStart - 1) & "|" : "") & $sArray, "|", $STR_NOCOUNT)
EndFunc

 


@Nine Try this to see an unexpected behavior:

#include <Array.au3>

Local $array[10]
$array[0] = " "
$array[1] = "  "
$array[2] = "  "
$array[3] = "  test "
$array[4] = "   "
$array[5] = " "
$array[6] = "  more tests  "
$array[7] = " "
$array[8] = ""
$array[9] = "   "

ConsoleWrite(StringLen($array[6]) & @CRLF)

$array = StripEmpty2($array)
_ArrayDisplay($array)

ConsoleWrite(StringLen($array[1]) & @CRLF)

Func StripEmpty2(ByRef $array, $iStart = 0)
  Local $sArray = StringRegExpReplace(_ArrayToString($array, Default, $iStart), "^\s*\||(?<=\|)\s*\||\|(?=[\s\|]*$)", "")
  Return StringSplit($sArray, "|", $STR_NOCOUNT)
EndFunc

The spaces of the last row are appended to the last non-empty record, because the regex pattern matches only the last pipe delimiter and not the spaces that follow it.


You need one more update (don't hate me :lol: )

#include <Array.au3>

Local $array[3]
$array[0] = ''
$array[1] = ''
$array[2] = ' some data   '

ConsoleWrite(StringLen($array[2]) & @CRLF)
$array = StripEmpty2($array)
ConsoleWrite(StringLen($array[0]) & @CRLF)

_ArrayDisplay($array)

Func StripEmpty2(ByRef $array, $iStart = 0)
  Local $sArray = StringRegExpReplace(_ArrayToString($array, Default, $iStart), "^\s*\||(?<=\|)\s*\||\|(?=[\s\|]*$)|\s*$", "")
  Return StringSplit($sArray, "|", $STR_NOCOUNT)
EndFunc

I think you can remove the positive lookahead from the last alternative for a simpler pattern like this:

^\s*\||(?<=\|)\s*\||((\|\s*)*$)

It might be slightly faster but don't expect anything considerable.


Interesting.  I agree your pattern is simpler to follow, but for some reason that escapes me, the positive lookahead is a tad (10-15%) faster for any array length.

Maybe it is the capturing group that is causing it to be slower?
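
One way to test the capturing-group guess is to time the simpler pattern against the same pattern with a non-capturing group. This is only a sketch: the test data (size and share of blank records) is an assumption, and the second pattern is my own rewrite with (?: ... ), not something posted in this thread.

```autoit
#include <Array.au3>

; Hypothetical test data: 100000 records, roughly one in three whitespace-only
Local $aData[100000]
For $i = 0 To UBound($aData) - 1
    $aData[$i] = (Random(0, 2, 1) = 2) ? " " : "x"
Next
Local $sJoined = _ArrayToString($aData) ; joined with the default "|" delimiter

Local $aPatterns[2]
$aPatterns[0] = "^\s*\||(?<=\|)\s*\||((\|\s*)*$)" ; capturing groups, as posted above
$aPatterns[1] = "^\s*\||(?<=\|)\s*\||(?:\|\s*)*$" ; same alternatives, non-capturing

Local $hTimer, $sOut
For $i = 0 To UBound($aPatterns) - 1
    $hTimer = TimerInit()
    $sOut = StringRegExpReplace($sJoined, $aPatterns[$i], "")
    ConsoleWrite("Pattern " & $i & ": " & Round(TimerDiff($hTimer), 2) & " ms, " & _
            StringLen($sOut) & " characters left" & @CRLF)
Next
```

Both patterns should leave the same number of characters. If the second one is noticeably faster, the capture overhead is a plausible explanation; a single run is noisy, so repeating the loop a few times would give a fairer picture.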

Edited by Nine
