MyEarth

_FileListToArrayRec sorting?

4 posts in this topic

#1 ·  Posted (edited)

Hi people

English isn't my mother language and i have some difficulties to understand this sentence:

$iSort = $FLTAR_FASTSORT (2) Sorted with faster algorithm

1) requires NTFS

About "requires NTFS" using DriveGetFileSystem on the $sFilePath i think i don't have problem to switch to $FLTAR_SORT  if necessary

2) assumes files in folder returned sorted 

I really don't understand what it mean

3) not guaranteed

Not guaranteed what? Sorting? Why?  This option was present also on the old "RecFileListToArray" and inside the description of the UDF the words "not guaranteed" wasn't present

Sorted with faster algorithm (assumes files sorted within each folder - requires NTFS drive)

Pratically i don't understand if is secure to use $FLTAR_FASTSORT if before i check the file system type.

 

Edited by MyEarth

Share this post


Link to post
Share on other sites



#2 ·  Posted

MyEarth,

As I wrote the UDF, I feel qualified to answer.

In pre NTFS OSes, the files returned by repeated FileFindNextFile calls were almost never in alphabetical order - they were returned in whatever order they were stored within the FAT, which depended on write date, previous deletions, etc. So the basic UDF sort code does not presume the files within a folder are returned in any particular order and sorts them by default.

NTFS OSes usually return the files in alphabetical order, although this is not guaranteed, and so I added the $FLTAR_FASTSORT option to skip one of the steps in the sorting algorithm. As the sorting is one of the slowest parts of the UDF code, any shortcuts are welcome and not sorting files within a folder can make quite a difference if you are dealing with large volumes.

I hope that makes it clearer. Basically, if you want to be sure the files are sorted, use the $FLTAR_SORT parameter or you may find that they are not!

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind._______My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

Share this post


Link to post
Share on other sites

#3 ·  Posted

Understood Melba, thanks. A question about your choice to use the DualPivotSort, without any offence is just a question, are you sure is the fastest for sort the file? As you have say "sorting is one of the slowest parts of the UDF code"

Searching around there are many, many algorithms. I have tested a couple just for see:

11510 files no folder, my result:

WITHOUT SORT :1332.39364220872
WITH SORT DUALPIVOT:2043.70862777252
WITH SORT CLIB:1465.54492260888
WITH SORT VBS:1546.84827261565

Also the VBS one is fastest then DualPivotSort.

#include <Array.au3>
#include "FileListToArrayRec.au3"

Sleep(1000)

$begin = TimerInit()
$aArray = _FileListToArrayRec("C:\Windows", "*", 1, 1, 0, 2)
$dif = TimerDiff($begin)
ConsoleWrite("WITHOUT SORT :" & $dif & @CRLF)

Sleep(1000)

$begin = TimerInit()
$aArray = _FileListToArrayRec("C:\Windows", "*", 1, 1, 1, 2)
$dif = TimerDiff($begin)
ConsoleWrite("WITH SORT DUALPIVOT:" & $dif & @CRLF)
;~ _ArrayDisplay($aArray)

Sleep(1000)

$begin = TimerInit()
$aArray = _FileListToArrayRec("C:\Windows", "*", 1, 1, 0, 2)
_ArraySortClib($aArray) ; by Siao
$dif = TimerDiff($begin)
ConsoleWrite("WITH SORT CLIB:" & $dif & @CRLF)
_ArrayDisplay($aArray)

Sleep(1000)

$begin = TimerInit()
$aArray = _FileListToArrayRec("C:\Windows", "*", 1, 1, 0, 2)
_JSort1D($aArray)
$dif = TimerDiff($begin)
ConsoleWrite("WITH SORT VBS:" & $dif & @CRLF)
_ArrayDisplay($aArray)

Func _ArraySortClib(ByRef $Array, $iMode = 1, $fDescend = False, $iStart = 0, $iEnd = 0, $iColumn = 0, $iStrMax = 4095)
    If @AutoItX64 Then Return SetError(64, 0, 0)
    Local $iArrayDims = UBound($Array, 0)
    If @error Or $iArrayDims > 2 Then Return SetError(1, 0, 0)
    Local $iArraySize = UBound($Array, 1), $iColumnMax = UBound($Array, 2)
    If $iArraySize < 2 Then Return SetError(1, 0, 0)
    If $iEnd < 1 Or $iEnd > $iArraySize - 1 Then $iEnd = $iArraySize - 1
    If ($iEnd - $iStart < 2) Then Return SetError(2, 0, 0)
    If $iArrayDims = 2 And ($iColumnMax - $iColumn < 0) Then Return SetError(2, 0, 0)
    If $iStrMax < 1 Then Return SetError(2, 0, 0)
    Local $i, $j, $iCount = $iEnd - $iStart + 1, $fNumeric, $aRet, $sZero = ChrW(0), $sStrCmp, $sBufType = 'byte[', $hDll = DllOpen('msvcrt.dll'), $tSource, $tIndex, $tFloatCmp, $tCmpWrap = DllStructCreate('byte[64]'), $tEnumProc = DllStructCreate('byte[64]')
    If $hDll = -1 Then Return SetError(3, 0, 0)
    ;; initialize compare proc
    Switch $iMode
        Case 0
            $fNumeric = True
            $tFloatCmp = DllStructCreate('byte[36]')
            DllStructSetData($tFloatCmp, 1, '0x8B4C24048B542408DD01DC1ADFE0F6C440750D80E441740433C048C333C040C333C0C3')
            DllStructSetData($tCmpWrap, 1, '0xBA' & Hex(Binary(DllStructGetPtr($tFloatCmp)), 8) & '8B4424088B4C2404FF30FF31FFD283C408C3')
            DllStructSetData($tEnumProc, 1, '0x8B7424048B7C24088B4C240C8B442410893789470483C60883C708404975F1C21000')
        Case 1, 2
            $sStrCmp = "_strcmpi" ;case insensitive
            If $iMode = 2 Then $sStrCmp = "strcmp" ;case sensitive
            $aRet = DllCall('kernel32.dll', 'ptr', 'GetModuleHandle', 'str', 'msvcrt.dll')
            $aRet = DllCall('kernel32.dll', 'ptr', 'GetProcAddress', 'ptr', $aRet[0], 'str', $sStrCmp)
            If $aRet[0] = 0 Then Return SetError(3, 0, 0 * DllClose($hDll))
            DllStructSetData($tCmpWrap, 1, '0xBA' & Hex(Binary($aRet[0]), 8) & '8B4424088B4C2404FF30FF31FFD283C408C3')
            DllStructSetData($tEnumProc, 1, '0x8B7424048B7C24088B4C240C8B542410893789570483C7088A064684C075F9424975EDC21000')
        Case 3
            $sBufType = 'wchar['
            $aRet = DllCall('kernel32.dll', 'ptr', 'GetModuleHandle', 'str', 'kernel32.dll')
            $aRet = DllCall('kernel32.dll', 'ptr', 'GetProcAddress', 'ptr', $aRet[0], 'str', 'CompareStringW')
            If $aRet[0] = 0 Then Return SetError(3, 0, 0 * DllClose($hDll))
            DllStructSetData($tCmpWrap, 1, '0xBA' & Hex(Binary($aRet[0]), 8) & '8B4424088B4C24046AFFFF306AFFFF3168000000006800040000FFD283E802C3')
            DllStructSetData($tEnumProc, 1, '0x8B7424048B7C24088B4C240C8B542410893789570483C7080FB70683C60285C075F6424975EAC21000')
        Case Else
            Return SetError(2, 0, 0)
    EndSwitch
    ;; write data to memory
    If $fNumeric Then
        $tSource = DllStructCreate('double[' & $iCount & ']')
        If $iArrayDims = 1 Then
            For $i = 1 To $iCount
                DllStructSetData($tSource, 1, $Array[$iStart + $i - 1], $i)
            Next
        Else
            For $i = 1 To $iCount
                DllStructSetData($tSource, 1, $Array[$iStart + $i - 1][$iColumn], $i)
            Next
        EndIf
    Else
        Local $sMem = ""
        If $iArrayDims = 1 Then
            For $i = $iStart To $iEnd
                $sMem &= StringLeft($Array[$i], $iStrMax) & $sZero
            Next
        Else
            For $i = $iStart To $iEnd
                $sMem &= StringLeft($Array[$i][$iColumn], $iStrMax) & $sZero
            Next
        EndIf
        $tSource = DllStructCreate($sBufType & StringLen($sMem) + 1 & ']')
        DllStructSetData($tSource, 1, $sMem)
        $sMem = ""
    EndIf
    ;; index data
    $tIndex = DllStructCreate('int[' & $iCount * 2 & ']')
    DllCall('user32.dll', 'uint', 'CallWindowProc', 'ptr', DllStructGetPtr($tEnumProc), 'ptr', DllStructGetPtr($tSource), 'ptr', DllStructGetPtr($tIndex), 'int', $iCount, 'int', $iStart)
    ;; sort
    DllCall($hDll, 'none:cdecl', 'qsort', 'ptr', DllStructGetPtr($tIndex), 'int', $iCount, 'int', 8, 'ptr', DllStructGetPtr($tCmpWrap))
    DllClose($hDll)
    ;; rearrange the array by sorted index
    Local $aTmp = $Array, $iRef
    If $iArrayDims = 1 Then ; 1D
        If $fDescend Then
            For $i = 0 To $iCount - 1
                $iRef = DllStructGetData($tIndex, 1, $i * 2 + 2)
                $Array[$iEnd - $i] = $aTmp[$iRef]
            Next
        Else ; ascending
            For $i = $iStart To $iEnd
                $iRef = DllStructGetData($tIndex, 1, ($i - $iStart) * 2 + 2)
                $Array[$i] = $aTmp[$iRef]
            Next
        EndIf
    Else ; 2D
        If $fDescend Then
            For $i = 0 To $iCount - 1
                $iRef = DllStructGetData($tIndex, 1, $i * 2 + 2)
                For $j = 0 To $iColumnMax - 1
                    $Array[$iEnd - $i][$j] = $aTmp[$iRef][$j]
                Next
            Next
        Else ; ascending
            For $i = $iStart To $iEnd
                $iRef = DllStructGetData($tIndex, 1, ($i - $iStart) * 2 + 2)
                For $j = 0 To $iColumnMax - 1
                    $Array[$i][$j] = $aTmp[$iRef][$j]
                Next
            Next
        EndIf
    EndIf
    Return 1
EndFunc   ;==>_ArraySortClib

Func _JSort1D(ByRef $arEither2D1D)
    Local $code = '   function SortArray(arrArray) {'
    $code &= @LF & '            return arrArray.toArray().sort().join("|");'
    $code &= @LF & '        }'
    Local $jvs = ObjCreate("ScriptControl")
    $jvs.language = "jscript"
    $jvs.Timeout = -1
    $jvs.addcode ($code)
    Local $arEither2D1DSt = $jvs.Run("SortArray", $arEither2D1D)
    $arEither2D1D = StringSplit($arEither2D1DSt, "|")
    $jvs = ""
EndFunc   ;==>_JSort1D

I know now _FileListToArrayRec is part of autoit but i'd like to know your opinion about.

Share this post


Link to post
Share on other sites

#4 ·  Posted (edited)

MyEarth,

There have been many threads about the sorting algorithms of this function in the past and I have absolutely no intention of entering yet another pointless discussion. The arrays returned by your CLIB and VBS sort algorithms do NOT provide the same array as that provided by the current UDF - the files are certainly sorted alphabetically, but this almost certainly does NOT guarantee that they are in the order as would be returned by, say, Explorer. In particular, just look at the position of those files located in the root path - scattered throughout the array rather than grouped at the beginning before the first level subfolders.

If you want to provide a faster, better UDF then please do so - but make sure it performs to the same level as the current one before asking for it to be adopted.

M23

Edit: See this post for an explanation of why file tree sorting is more complex that you seem to think.

Edited by Melba23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind._______My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now