orbs

unicode functions for filesystem operations?

18 posts in this topic

i was wondering about the status of unicode support for filesystem functions, to enable support for long paths (longer than MAX_PATH, i.e. 260 characters).

so i tested.

attached is a simple script. it attempts to create a folder with two nested subfolders (each with a 200-character name) using 4 methods:

DirCreate($path)

DirCreate("\\?\" & $path)

DllCall CreateDirectoryA with $path (default ANSI)

DllCall CreateDirectoryW with "\\?\" & $path (Unicode)

unlike DirCreate(), CreateDirectory from kernel32.dll has a limitation: the parent directory must already exist. now, since AutoIt has already taken care of this limitation - and if i assume correctly that it is using the ANSI version of CreateDirectory function from kernel32.dll - then it should be fairly simple to convert to unicode... no?

maybe just need to add the unicode prefix to the path if necessary? (yes i'm aware there's different prefix for UNC, and it's not needed for root of drives, but still...)

here's the test script, i named the functions according to their results:

Global Const $sTooLongPathRoot='C:\folder'
Global Const $sTooLongPath='123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_'
; the above is a string of 200 characters.

; ** uncomment only the one function you want to test!
;CreateRootAnd1Sub()
;CreateRootAnd2SubsTruncatedToTotal255()
;CreateRootAnd1SubWithANSI()
;CreateAllWithUnicode()
; ** don't forget to delete the test folder before next run!

Func CreateRootAnd1Sub()
    ; this is the native DirCreate
    DirCreate($sTooLongPathRoot&'\'&$sTooLongPath&'\'&$sTooLongPath)
EndFunc

Func CreateRootAnd2SubsTruncatedToTotal255()
    ; this is the native DirCreate with the unicode prefix
    DirCreate('\\?\'&$sTooLongPathRoot&'\'&$sTooLongPath&'\'&$sTooLongPath)
EndFunc

Func CreateRootAnd1SubWithANSI()
    ; this is direct call to the default ANSI version of CreateDirectory = identical result as DirCreate (first function)
    $aRet=DllCall('kernel32.dll','bool','CreateDirectoryA','str',$sTooLongPathRoot,'struct*',0)
    $aRet=DllCall('kernel32.dll','bool','CreateDirectoryA','str',$sTooLongPathRoot&'\'&$sTooLongPath,'struct*',0)
    $aRet=DllCall('kernel32.dll','bool','CreateDirectoryA','str',$sTooLongPathRoot&'\'&$sTooLongPath&'\'&$sTooLongPath,'struct*',0)
EndFunc

Func CreateAllWithUnicode()
    ; this is direct call to the unicode version of CreateDirectory in kernel32.dll with the unicode prefix
    $aRet=DllCall('kernel32.dll','bool','CreateDirectoryW','wstr','\\?\'&$sTooLongPathRoot,'struct*',0)
    $aRet=DllCall('kernel32.dll','bool','CreateDirectoryW','wstr','\\?\'&$sTooLongPathRoot&'\'&$sTooLongPath,'struct*',0)
    $aRet=DllCall('kernel32.dll','bool','CreateDirectoryW','wstr','\\?\'&$sTooLongPathRoot&'\'&$sTooLongPath&'\'&$sTooLongPath,'struct*',0)
EndFunc
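
For the "delete the test folder before next run" step, here is a minimal cleanup sketch in case the usual tools cannot remove the too long folders. It calls RemoveDirectoryW from kernel32.dll with the \\?\ prefix, deepest folder first. The function name `_Sketch_RemoveTestFolders` is made up for illustration, and the sketch assumes the folders are empty and laid out as root\sub\sub like in the test script.

```autoit
; Hypothetical cleanup helper, not part of the original script.
; Removes root\sub\sub, root\sub and root (in that order) using the \\?\ prefix.
; Assumes all three folders are empty; RemoveDirectoryW fails on non-empty folders.
Func _Sketch_RemoveTestFolders($sRoot, $sSub)
    Local $aPaths[3] = [ _
            '\\?\' & $sRoot & '\' & $sSub & '\' & $sSub, _
            '\\?\' & $sRoot & '\' & $sSub, _
            '\\?\' & $sRoot]
    Local $aRet, $nRemoved = 0
    For $i = 0 To UBound($aPaths) - 1
        $aRet = DllCall('kernel32.dll', 'bool', 'RemoveDirectoryW', 'wstr', $aPaths[$i])
        If @error Then Return SetError(1, @error, $nRemoved) ; the DllCall itself failed
        If $aRet[0] Then $nRemoved += 1 ; non-zero return means this folder was removed
    Next
    Return $nRemoved ; number of folders actually removed (0 to 3)
EndFunc
```

With the constants from the test script, the call would be `_Sketch_RemoveTestFolders($sTooLongPathRoot, $sTooLongPath)`.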


maybe just need to add the unicode prefix to the path if necessary? (yes i'm aware there's different prefix for UNC, and it's not needed for root of drives, but still...)

 

OK, so here's my take on this: _FU_DirCreate, which is intended to behave exactly like the native DirCreate (or so i think), but with Full Unicode support, hence the _FU_ prefix (no, it's not what you thought ;) ).

(note that there are 2 auxiliary functions to set the unicode prefix).

Global Const $sTooLongPathRoot='C:\folder'
Global Const $sTooLongPath='123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_'
; the above is a string of 200 characters.

_FU_DirCreate($sTooLongPathRoot&'\'&$sTooLongPath&'\'&$sTooLongPath)

Func _FU_DirCreate($sPath)
    Local $nFirstElement=7  ; assuming local drive
    $sPath=_String_SetUnicodePrefix($sPath)
    If @extended=2 Then $nFirstElement=_String_GetNthOccuranceOfCharInString($sPath,'\',6)  ; assumption incorrect, it's a UNC path
    Local $i
    Local $aRet
    For $i=$nFirstElement To StringLen($sPath)
        If StringMid($sPath,$i,1)='\' Then
            $aRet=DllCall('kernel32.dll','bool','CreateDirectoryW','wstr',StringLeft($sPath,$i),'struct*',0)
            If @error Then Return SetError(1,@error,0)
        EndIf
    Next
    $aRet=DllCall('kernel32.dll','bool','CreateDirectoryW','wstr',$sPath,'struct*',0)
    If @error Then Return SetError(1,@error,0)
    Return $aRet[0]
EndFunc

Func _String_SetUnicodePrefix($sTarget)
    If (StringLeft($sTarget,4)='\\?\') Then Return $sTarget ; $sTarget already has the unicode prefix
    If (StringLen($sTarget)=2 And StringRight($sTarget,1)=':') Or (StringLen($sTarget)=3 And StringRight($sTarget,2)=':\') Then Return $sTarget ; $sTarget is a drive root
    If StringLeft($sTarget,2)<>'\\' And StringMid($sTarget,2,1)<>':'  Then Return $sTarget ; $sTarget is a relative path
    If StringLeft($sTarget,2)='\\' Then
        Return SetExtended(2,'\\?\UNC\'&StringTrimLeft($sTarget,2))
    Else
        Return SetExtended(1,'\\?\'&$sTarget)
    EndIf
EndFunc

Func _String_GetNthOccuranceOfCharInString($sString,$sChar,$nOccurance)
    Local $nFound=0
    Local $i
    For $i=1 To StringLen($sString)
        If StringMid($sString,$i,1)=$sChar Then
            $nFound+=1
            If $nFound=$nOccurance Then Return $i
        EndIf
    Next
    Return 0
EndFunc

so now i have DirCreate with full unicode support for too long paths. but since AutoIt already has the major part (creating the parent directory structure) implemented in DirCreate, it may only be necessary to change DirCreate to call CreateDirectoryW, the unicode version, and add the unicode prefix.
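
To verify the result without relying on functions that may not cope with too long paths, one option is to query the folder directly through kernel32.dll. The sketch below uses GetFileAttributesW on a path that already carries the \\?\ prefix. The name `_Sketch_DirExists` is hypothetical, and the two numeric values are the documented Windows constants INVALID_FILE_ATTRIBUTES and FILE_ATTRIBUTE_DIRECTORY.

```autoit
; Hypothetical helper, not part of the original post.
; Returns True if $sPrefixedPath (already carrying the \\?\ prefix) is an existing directory.
Func _Sketch_DirExists($sPrefixedPath)
    Local $aRet = DllCall('kernel32.dll', 'dword', 'GetFileAttributesW', 'wstr', $sPrefixedPath)
    If @error Then Return SetError(1, @error, False) ; the DllCall itself failed
    If $aRet[0] = 4294967295 Then Return False ; INVALID_FILE_ATTRIBUTES: path not found or not accessible
    Return BitAND($aRet[0], 0x10) <> 0 ; 0x10 = FILE_ATTRIBUTE_DIRECTORY
EndFunc
```

For example, `_Sketch_DirExists('\\?\C:\folder')` could be called after running the script above to see whether the root was created.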

The 'unicode' prefix you talk about is really a Windows 'long filename' prefix, which AutoIt doesn't support at all.

The 'unicode' prefix you talk about is really a Windows 'long filename' prefix...

 

terminology. this prefix is used solely in conjunction with the unicode versions of the internal functions of kernel32.dll (and perhaps others), hence i refer to it as "the unicode prefix".

 

...which AutoIt doesn't support at all.

 

that's just my point - if even my humble self could write a DirCreate alternative to support long file names, then for the AutoIt developers (or advanced users) it should be a breeze, and a considerable added value to the entire AutoIt ecosystem.

That is true, but AutoIt checks validity of the user input which results in rejection of a path with the prefix. If I remember correctly it was Jon who fixed one particular and related bug by adding this check.


That is true, but AutoIt checks validity of the user input which results in rejection of a path with the prefix. If I remember correctly it was Jon who fixed one particular and related bug by adding this check.

 

as the script in the first post demonstrates (2nd Func), at least for DirCreate(), the path with the unicode prefix is not rejected (and it actually delivers a *slightly* better result than without it). i made some additional tests with other functions, and although the results are still poor, the prefix by itself is not rejected.

although i'm all for bug squashing - if that was indeed the case - perhaps a reconsideration is due.

well, until too long paths are natively supported, i did to the rest of the functions what i did to DirCreate() above, and the result is this UDF:

[embedded link to the UDF topic]

Except that long file names (greater than 256 characters) aren't supported in Windows, so why would you expect AutoIt to support it? It's a Windows limitation, not an AutoIt one. Use Unicode file names and paths ("\\?\c:\someobscenelylongfilename") if you have longer lengths.


If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator

...long file names (greater than 256 characters) aren't supported in Windows...

 

if you mean "not supported" by the front-end Windows Explorer, then you are correct; however these do exist. not too common, but common enough. if you mean "not supported" by the Windows kernel, that is incorrect. practically all functions in kernel32.dll offer support for too long paths.

... why would you expect AutoIt to support it? ...

 

 

why not? relatively simple, as i've demonstrated in post #2 here. and as mentioned, it's a good added value.

It's a Windows limitation...

 

not accurate; Windows kernel does not impose this limitation, it's Windows front-end elements that do. the real limitation is that Windows may allow the front-end to create too long paths, but not to access them later on. if it was not possible at all to create too long paths, we wouldn't have any problem at all. but we do face this problem. are you going to wait for Microsoft to solve it?  ;)

 ...not an AutoIt one. 

 

yes, it is an AutoIt limitation. although AutoIt can't fix the world, it can enhance its own features. and there is no reason AutoIt shouldn't support what the Windows kernel supports.

Unicode file names and paths ("\\?\c:\someobscenelylongfilename") if you have longer lengths.

 

and that's exactly what i did. still i feel it would be handled better by AutoIt developers, but that's the best i can offer.

Windows does not let you access a path longer than approximately 260 characters unless you use a Unicode file path. That is a limit imposed by the API from Microsoft; take it up with them.

http://msdn.microsoft.com/en-us/library/aa365247.aspx
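
As a rough illustration of that limit, a script could decide up front whether a path needs the prefixed Unicode form. This is only a sketch: the names `$SKETCH_MAX_PATH` and `_Sketch_PathNeedsPrefix` are made up for this example. It assumes the documented MAX_PATH of 260 (which includes the terminating null) and the lower limit of MAX_PATH minus 12 that applies when creating directories.

```autoit
Global Const $SKETCH_MAX_PATH = 260 ; MAX_PATH as defined by the Windows API (hypothetical constant name)

; Hypothetical helper: returns True when $sPath is too long for the non-prefixed API calls.
; Directories get a lower limit (MAX_PATH - 12) so that an 8.3 file name can still be appended.
Func _Sketch_PathNeedsPrefix($sPath, $bIsDirectory = False)
    Local $nLimit = $SKETCH_MAX_PATH
    If $bIsDirectory Then $nLimit -= 12
    ; the limit counts the terminating null, so a string of $nLimit characters is already too long
    Return StringLen($sPath) >= $nLimit
EndFunc
```

A short path such as `C:\folder` stays below either limit, while the two nested 200-character folders from the first post exceed both.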


 take it up with them.

 

taken up.

and i'm sure it isn't the first case where AutoIt takes on a Microsoft inanity...  ;)

Wraithdu's _LargeFileCopy handles UNC paths, I had similar issues which he fully corrected in his UDF around 2.5 years ago. And FYI I do NOT use a Unicode OS as I am in USA but all UNC functions work great.

[embedded link to the _LargeFileCopy topic]

This does not help with deletions by the way, just copying, but it would save you a lot of time to just use his functions.

Ian


My projects:

  • IP Scanner - Multi-threaded ping tool to scan your available networks for used and available IP addresses, shows ping times, resolves IPs in to host names, and allows individual IPs to be pinged.
  • INFSniff - Great technicians tool - a tool which scans DriverPacks archives for INF files and parses out the HWIDs to a database file, and rapidly scans the local machine's HWIDs, searches the database for matches, and installs them.
  • PPK3 (Persistent Process Killer V3) - Another for the techs - suppress running processes that you need to keep away, helpful when fighting spyware/viruses.
  • Sync Tool - Folder sync tool with lots of real time information and several checking methods.
  • USMT Front End - Front End for Microsoft's User State Migration Tool, including all files needed for USMT 3.01 and 4.01, 32 bit and 64 bit versions.
  • Audit Tool - Computer audit tool to gather vital hardware, Windows, and Office information for IT managers and field techs. Capabilities include creating a customized site agent.
  • CSV Viewer - Displays CSV files with automatic column sizing and font selection. Lines can also be copied to the clipboard for data extraction.
  • MyDirStat - Lists number and size of files on a drive or specified path, allows for deletion within the app.
  • 2048 Game - My version of 2048, fun tile game.
  • Juice Lab - Ecigarette liquid making calculator.
  • Data Protector - Secure notes to save sensitive information.
  • VHD Footer - Add a footer to a forensic hard drive image to allow it to be mounted or used as a virtual machine hard drive.
  • Find in File - Searches files containing a specified phrase.

 

And FYI I do NOT use a Unicode OS as I am in USA

Yes you do, whatever Microsoft planet you live on!

Windows has had Unicode support from Win 98 SE. I confess W98SE "support" was way below par but it quickly got much better over time.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Many users of my own humble AutoIt offerings have requested long path support. It would be great to see AutoIt support this natively.

In the meantime, thank you orbs!

;o) Cor


nothing is foolproof to the sufficiently talented fool..

During the last release I actually went through all the path functions and made sure that they didn't truncate paths automatically, so that if they ended up getting passed to kernel functions that support unicode long paths they should work. A few AutoIt functions, though, end up using a shell32.dll function, and no shell32 functions support long paths (DirCopy, DirMove, FileRecycle, Open/Save dialogs, etc.), so they couldn't be made to work without a full rewrite (too low a priority).

DirCreate looks to be using CreateDirectoryW, so in theory I would have expected it to work already, but I see that it does some checking for backslashes, so it may be making incorrect assumptions. We probably need to tag functions that support these paths, but as I said it's super super low priority. Feel free to make bug reports on the functions that you know to be broken so I can track it, though.

Having a quick look, it looks like most of the functions are going wrong because they use wsplitpath() to sanitize input, which messes up on \\?\ style paths. Well, actually wsplitpath doesn't mess up on short \\?\ paths, but it does on long ones. I reckon it would be possible to replace all uses of wsplitpath with a custom one that doesn't barf in these cases, and that would probably fix a number of functions at the same time.
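
AutoIt's internal code is not shown in this thread, so the following is only an illustration, written in AutoIt script, of what a prefix-aware split might look like: the long-path prefix is peeled off first and the remainder is split by plain string operations with no length assumptions. The function name `_Sketch_SplitLongPath` and the five-element result layout are assumptions made for this example.

```autoit
; Hypothetical prefix-aware splitter, not AutoIt's internal implementation.
; Returns a 5-element array: [0] prefix, [1] drive or server\share, [2] directory, [3] name, [4] extension.
Func _Sketch_SplitLongPath($sPath)
    Local $aParts[5] = ['', '', '', '', '']
    ; peel off the long-path prefix, if any (the UNC form must be checked first)
    If StringLeft($sPath, 8) = '\\?\UNC\' Then
        $aParts[0] = '\\?\UNC\'
    ElseIf StringLeft($sPath, 4) = '\\?\' Then
        $aParts[0] = '\\?\'
    EndIf
    Local $sRest = StringTrimLeft($sPath, StringLen($aParts[0]))
    If $aParts[0] = '\\?\UNC\' Then
        ; for UNC paths, server\share plays the role of the drive
        Local $nShareEnd = StringInStr($sRest, '\', 0, 2)
        If $nShareEnd = 0 Then $nShareEnd = StringLen($sRest) + 1
        $aParts[1] = StringLeft($sRest, $nShareEnd - 1)
        $sRest = StringTrimLeft($sRest, $nShareEnd - 1)
    ElseIf StringMid($sRest, 2, 1) = ':' Then
        $aParts[1] = StringLeft($sRest, 2)
        $sRest = StringTrimLeft($sRest, 2)
    EndIf
    ; everything up to and including the last backslash is the directory part
    Local $nLastSlash = StringInStr($sRest, '\', 0, -1)
    $aParts[2] = StringLeft($sRest, $nLastSlash)
    Local $sFile = StringTrimLeft($sRest, $nLastSlash)
    ; the extension starts at the last dot of the file name, if there is one
    Local $nLastDot = StringInStr($sFile, '.', 0, -1)
    If $nLastDot > 0 Then
        $aParts[3] = StringLeft($sFile, $nLastDot - 1)
        $aParts[4] = StringTrimLeft($sFile, $nLastDot - 1)
    Else
        $aParts[3] = $sFile
    EndIf
    Return $aParts
EndFunc
```

Keeping the prefix as its own element means the caller can reattach it unchanged when passing the path on to the W functions.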

Great news!

Orbs' UDF has a nice range of functions that work almost as-is (need only UNC-style paths supplied), as well as a useful list of those which don't:

[embedded link to the UDF topic]

This could be a useful reference for the tagging.

;o) Cor


This topic is now closed to further replies.