Modify

Opened 14 years ago

Closed 14 years ago

Last modified 13 years ago

#1499 closed Feature Request (Fixed)

Performance increase to internal __FTP_ListToArray()

Reported by: Beege Owned by: Jpm
Milestone: 3.3.7.0 Component: Standard UDFs
Version: Severity: None
Keywords: _FTPEx Cc:

Description

In the current internal function _FTP_ListToArray, every time a new item is added to the array, the array is Resized by only +1, which can be a major performance hit when getting directory listing of large numbers. If we instead Resize by ubound*2, and then trim unused slots at the end of function, we can avoid a lot or resizing. I learned this method from the original _filelisttoarray() func.

I have included a example with changes to the function and a demonstration of the new vs. old. I have put a set of XXXXXXXXXXXXXXXXXXXXXXXXXXX around any lines that I changed or added. Not that many changes were needed since the function already kept track of array size in index $array[0]. Thank you for your consideration and hard work in creating this wonderful language.:)

Attachments (1)

_FTP_Listtoarray.au3 (7.9 KB) - added by Beege 14 years ago.

Download all attachments as: .zip

Change History (11)

Changed 14 years ago by Beege

comment:1 Changed 14 years ago by TicketCleanup

  • Version 3.3.4.0 deleted

Automatic ticket cleanup.

comment:2 Changed 14 years ago by Jpm

  • Milestone set to 3.3.7.0
  • Owner changed from Gary to Jpm
  • Resolution set to Fixed
  • Status changed from new to closed

Fixed by revision [5802] in version: 3.3.7.0

comment:3 Changed 14 years ago by Spiff59

His version is not going to work as coded.
If $Return_Type is '1' (Directories only) or '2' (Files only) he never bothers to check the value of $Arraycount for the Redim() at the end of the routine. As coded, it simply defaults to returning a one-dimension array. Therefore, with $Return_Type set to '1' or '2', the proposed FTP_ListToArray() internal routine would only work correctly when called from _FTP_ListToArray(), and would fail when called by either _FTP_ListToArray2D() or _FTP_ListToArrayEx().

Also, why all the extra code to build and maintain separate arrays for files and directories, when they are ALWAYS combined at the end?

I'd suggest the following, which fixes the Redim() bug, eliminates a bunch of useless code, and is faster yet:

Func __FTP_ListToArray($l_FTPSession, $Return_Type = 0, $l_Flags = 0, $bFmt = 1, $ArrayCount = 6, $l_Context = 0)
	Local $tWIN32_FIND_DATA, $tFileTime, $IsDir, $callFindNext
	Local $Index = 0
	If $ArrayCount = 1 Then ; filename only
		Local $Array[1]
	Else
		Local $Array[1][$ArrayCount]
	EndIf

	If $Return_Type < 0 Or $Return_Type > 2 Then Return SetError(1, 0, $Array)

;~ Global Const $tagWIN32_FIND_DATA = "DWORD dwFileAttributes; dword ftCreationTime[2]; dword ftLastAccessTime[2]; dword ftLastWriteTime[2]; DWORD nFileSizeHigh; DWORD nFileSizeLow; dword dwReserved0; dword dwReserved1; WCHAR cFileName[260]; WCHAR cAlternateFileName[14];"
	$tWIN32_FIND_DATA = DllStructCreate($tagWIN32_FIND_DATA)
	Local $callFindFirst = DllCall($__ghWinInet_FTP, 'handle', 'FtpFindFirstFileW', 'handle', $l_FTPSession, 'wstr', "", 'ptr', DllStructGetPtr($tWIN32_FIND_DATA), 'dword', $l_Flags, 'dword_ptr', $l_Context)
	If @error Or Not $callFindFirst[0] Then Return SetError(1, _WinAPI_GetLastError(), 0)

	Do
		$IsDir = BitAND(DllStructGetData($tWIN32_FIND_DATA, "dwFileAttributes"), $FILE_ATTRIBUTE_DIRECTORY) = $FILE_ATTRIBUTE_DIRECTORY
		If ($IsDir And $Return_Type <> 2) Or (Not $IsDir And $Return_Type <> 1) Then
			$Index += 1
			If $ArrayCount = 1 Then
				If UBound($Array) < $Index+1 Then ReDim $Array[$Index*2]
				$Array[$Index] = DllStructGetData($tWIN32_FIND_DATA, "cFileName")
			Else
				If UBound($Array) < $Index+1 Then ReDim $Array[$Index*2][$ArrayCount]
				$Array[$Index][0] = DllStructGetData($tWIN32_FIND_DATA, "cFileName")

				$Array[$Index][1] = _WinAPI_MakeQWord(DllStructGetData($tWIN32_FIND_DATA, "nFileSizeLow"), DllStructGetData($tWIN32_FIND_DATA, "nFileSizeHigh"))
				If $ArrayCount = 6 Then
					$Array[$Index][2] = DllStructGetData($tWIN32_FIND_DATA, "dwFileAttributes")
					$tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftLastWriteTime"))
					$Array[$Index][3] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt)
					$tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftCreationTime"))
					$Array[$Index][4] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt)
					$tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftLastAccessTime"))
					$Array[$Index][5] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt)
				EndIf
			EndIf
		EndIf

		$callFindNext = DllCall($__ghWinInet_FTP, 'bool', 'InternetFindNextFileW', 'handle', $callFindFirst[0], 'ptr', DllStructGetPtr($tWIN32_FIND_DATA))
		If @error Then Return SetError(2, _WinAPI_GetLastError(), 0)
	Until Not $callFindNext[0]

	DllCall($__ghWinInet_FTP, 'bool', 'InternetCloseHandle', 'handle', $callFindFirst[0]) ; No need to test @error.

	If $ArrayCount = 1 Then ; filename only
		$Array[0] = $Index
		ReDim $Array[$Index+1]
	Else
		$Array[0][0] = $Index
		ReDim $Array[$Index + 1][$ArrayCount]
	EndIf
	Return $Array
EndFunc   ;==>__FTP_ListToArray

comment:4 Changed 14 years ago by Spiff59

Actually, this with the test "If $Return_Type - $IsDir <> 1 Then" to determine what is included in the returned array is likely quicker than the compound statement in the last example, and I'd think maintaining a simple $ArraySize variable would be cleaner than calling Ubound() for each iteration of the main loop. So, here's a final cleaned-up version:

Func __FTP_ListToArray($l_FTPSession, $Return_Type = 0, $l_Flags = 0, $bFmt = 1, $ArrayCount = 6, $l_Context = 0)
	Local $tWIN32_FIND_DATA, $tFileTime, $IsDir, $callFindNext
	Local $Index = 0, $ArraySize = 16
	If $ArrayCount = 1 Then ; filename only
		Local $Array[16]
	Else
		Local $Array[16][$ArrayCount]
	EndIf

	If $Return_Type < 0 Or $Return_Type > 2 Then Return SetError(1, 0, $Array)

;~ Global Const $tagWIN32_FIND_DATA = "DWORD dwFileAttributes; dword ftCreationTime[2]; dword ftLastAccessTime[2]; dword ftLastWriteTime[2]; DWORD nFileSizeHigh; DWORD nFileSizeLow; dword dwReserved0; dword dwReserved1; WCHAR cFileName[260]; WCHAR cAlternateFileName[14];"
	$tWIN32_FIND_DATA = DllStructCreate($tagWIN32_FIND_DATA)
	Local $callFindFirst = DllCall($__ghWinInet_FTP, 'handle', 'FtpFindFirstFileW', 'handle', $l_FTPSession, 'wstr', "", 'ptr', DllStructGetPtr($tWIN32_FIND_DATA), 'dword', $l_Flags, 'dword_ptr', $l_Context)
	If @error Or Not $callFindFirst[0] Then Return SetError(1, _WinAPI_GetLastError(), 0)

	Do
		$IsDir = BitAND(DllStructGetData($tWIN32_FIND_DATA, "dwFileAttributes"), $FILE_ATTRIBUTE_DIRECTORY) = $FILE_ATTRIBUTE_DIRECTORY
		If $Return_Type - $IsDir <> 1 Then
			$Index += 1
			If $ArrayCount = 1 Then
				If $Index = $ArraySize Then
					$ArraySize *= 2
					ReDim $Array[$ArraySize]
				EndIf
				$Array[$Index] = DllStructGetData($tWIN32_FIND_DATA, "cFileName")
			Else
				If $Index = $ArraySize Then
					$ArraySize *= 2
					ReDim $Array[$ArraySize][$ArrayCount]
				EndIf
				$Array[$Index][0] = DllStructGetData($tWIN32_FIND_DATA, "cFileName")
				$Array[$Index][1] = _WinAPI_MakeQWord(DllStructGetData($tWIN32_FIND_DATA, "nFileSizeLow"), DllStructGetData($tWIN32_FIND_DATA, "nFileSizeHigh"))
				If $ArrayCount = 6 Then
					$Array[$Index][2] = DllStructGetData($tWIN32_FIND_DATA, "dwFileAttributes")
					$tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftLastWriteTime"))
					$Array[$Index][3] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt)
					$tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftCreationTime"))
					$Array[$Index][4] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt)
					$tFileTime = DllStructCreate($tagFILETIME, DllStructGetPtr($tWIN32_FIND_DATA, "ftLastAccessTime"))
					$Array[$Index][5] = _Date_Time_FileTimeToStr( $tFileTime ,$bFmt)
				EndIf
			EndIf
		EndIf

		$callFindNext = DllCall($__ghWinInet_FTP, 'bool', 'InternetFindNextFileW', 'handle', $callFindFirst[0], 'ptr', DllStructGetPtr($tWIN32_FIND_DATA))
		If @error Then Return SetError(2, _WinAPI_GetLastError(), 0)
	Until Not $callFindNext[0]

	DllCall($__ghWinInet_FTP, 'bool', 'InternetCloseHandle', 'handle', $callFindFirst[0]) ; No need to test @error.

	If $ArrayCount = 1 Then ; filename only
		$Array[0] = $Index
		ReDim $Array[$Index+1]
	Else
		$Array[0][0] = $Index
		ReDim $Array[$Index + 1][$ArrayCount]
	EndIf
	Return $Array
EndFunc   ;==>__FTP_ListToArray

comment:5 Changed 14 years ago by anonymous

Although it should be:

	Local $Index = 0, $ArraySize = 16
	If $ArrayCount = 1 Then ; filename only
		Local $Array[$ArraySize]
	Else
		Local $Array[$ArraySize]$ArrayCount]
	EndIf

I didn't need to leave '16' hard-coded in the array definitions...

comment:6 Changed 14 years ago by Beege

Spiff59 is right about 2D arrays not working when user wants only files or only Dirs returned. The last block of code in the version I submitted should be:

	Switch $Return_Type
		Case 0
			If $ArrayCount = 1 Then
				ReDim $DirectoryArray[$DirectoryArray[0] + $FileArray[0] + 1]
				For $i = 1 To $FileIndex
					$DirectoryArray[$DirectoryArray[0] + $i] = $FileArray[$i]
				Next
				$DirectoryArray[0] += $FileArray[0]
			Else
				ReDim $DirectoryArray[$DirectoryArray[0][0] + $FileArray[0][0] + 1][$ArrayCount]
				For $i = 1 To $FileIndex
					For $j = 0 To $ArrayCount-1
					$DirectoryArray[$DirectoryArray[0][0] + $i][$j] = $FileArray[$i][$j]
					Next
				Next
				$DirectoryArray[0][0] += $FileArray[0][0]
			EndIf
			Return $DirectoryArray
		Case 1
			;XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
			If $ArrayCount = 1 Then
				ReDim $DirectoryArray[$DirectoryIndex+1]
			Else
				ReDim $DirectoryArray[$DirectoryIndex+1][$ArrayCount]
			EndIf
			;XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
			Return $DirectoryArray
		Case 2
			;XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
			If $ArrayCount = 1 Then
				ReDim $FileArray[$FileIndex +1]
			Else
				ReDim $FileArray[$FileIndex +1][$ArrayCount]
			EndIf
			;XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
			Return $FileArray
	EndSwitch

comment:7 Changed 14 years ago by Jpm

so What is the best the Spiff59 or the modified Beege?

comment:8 Changed 14 years ago by Spiff59

Not (unnecessarily) building two seperate arrays, which then have to be combined in the end, saves a ton of code, and makes for a more straight-forward, easily-maintained function.

It does however affect the order in which hits are returned in the output array.

The original function would place all folders first, followed by all files.
My (smaller, faster, cleaner) version returns a folder, then any files within that folder, then the next folder, and it's files, and so on...

There is no documentation, or requirement, regarding the order in which the returned array is sorted. There any advantage, nor disadvantage, to either sequence. So, I (of course), vote for the simpler version of the routine.

Maybe a fresh set of eyes will pipe in on this... ?

comment:9 Changed 14 years ago by Spiff59

Actually, maybe there is an advantage to the folder/files, folder/files, folder/files sequence of the newer, smaeller, version. It might allow one to use the returned array sequence to determine a folders contents, without having to compare the actual pathnames (although subfolders would complicate that coding). In any case, this array sequencing is no less useful than that of the existing routine.

comment:10 Changed 13 years ago by Beege

Sorry to be the one who caused a bug, but the mistake pointed out in my code submitted above never got corrected before being applied to the beta. Also my opinion on which is best (Spiff59 or mine), I personally would like to stick to the one I submitted. Not because its mine but because it leaves the return format of the array same. Spiff59s function is a bit quicker, but like he said it changes the return format which could possible cause users scripts to break or act/look differently. I would only push for a change like that if the performance increase was much higher.

Guidelines for posting comments:

  • You cannot re-open a ticket but you may still leave a comment if you have additional information to add.
  • In-depth discussions should take place on the forum.

For more information see the full version of the ticket guidelines here.

Add Comment

Modify Ticket

Action
as closed The owner will remain Jpm.
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.