Retrieve Unicode results from Console


KaFu

Hiho,

Following this discussion, I learned from mfecteau & Ascend4nt that comspec output can be forced to Unicode with the /u switch. Reading some docs about the dir command, I saw that the console output can be re-routed to a pipe... which led me to this post by ProgAndy (original by Paul Campbell (PaulIA) / PsaltyDS). I just threw it all together and made some minor modifications (wchar). Maybe I'll benchmark it later on; in the meantime I'll just assume it's fast as hell ;)...

This technique lets you retrieve any Unicode results from the console; the appended _FileListToArrayEx(), which uses the DIR command, is just one possible implementation example.

#include <NamedPipes.au3>
#include <WinAPI.au3>

; Example Start
#include <array.au3> ; for example only
$timer = TimerInit()
$aFiles = _FileListToArrayEx(@ProgramFilesDir, "*.*", 0, -1, True, True)
ConsoleWrite(TimerDiff($timer) & @CRLF & $aFiles[0] & " files read" & @CRLF)
_ArrayDisplay($aFiles)
; Example End

;===============================================================================
;
; Description:      Lists all or selected files and/or folders in a specified path (similar to using Dir with the /B switch)
; Syntax:           _FileListToArrayEx($s_path, $s_mask = "*.*", $i_flag = 0, $s_exclude = -1, $f_recurse = False, $f_full_path = False)
; Parameter(s):     $s_path = Path to generate the file list for
;                   $s_mask = The filter to use. Search the AutoIt3 manual for the word "WildCards" for details; multiple masks are supported
;                           Example: *.exe; *.txt will find all .exe and .txt files
;                   $i_flag = Determines whether to return files, folders or both
;                           $i_flag=0 (Default) Return both files and folders
;                           $i_flag=1 Return files only
;                           $i_flag=2 Return folders only
;                   $s_exclude = Exclude a file from the list by all or part of its name
;                           Example: Unins* will remove all files/folders that start with Unins
;                   $f_recurse = True to include subfolders (adds the /s switch)
;                   $f_full_path = True to return full paths, False to return names only
;
; Requirement(s):   _RunWaitStdOut() below (NamedPipes.au3, WinAPI.au3)
; Return Value(s):  On Success - Returns an array containing the list of files and folders in the specified path
;                   On Failure - Returns 0 and sets @error and @extended to the same value:
;                       1 = Path not found or invalid
;                       2 = Invalid $s_mask or invalid $s_exclude
;                       3 = Invalid $i_flag
;                       4 = No file(s) found
;                       5 = No file(s) left after applying $s_exclude
;
; Author(s):        SmOke_N, modified by mfecteau, Ascend4nt & KaFu
;                   http://www.autoitscript.com/forum/index.php?showtopic=33930&view=findpost&p=799369
; Note(s):          The array returned is one-dimensional and is made up as follows:
;                   $array[0] = Number of Files\Folders returned
;                   $array[1] = 1st File\Folder
;                   $array[2] = 2nd File\Folder
;                   $array[3] = 3rd File\Folder
;                   $array[n] = nth File\Folder
;
;                   The original version wrote all files to a "reserved" .tmp file (thanks to gafrost for the example);
;                   this version reads the dir output through a pipe instead, so no temporary file is created
;===============================================================================
Func _FileListToArrayEx($s_path, $s_mask = "*.*", $i_flag = 0, $s_exclude = -1, $f_recurse = False, $f_full_path = False)

    If FileExists($s_path) = 0 Then Return SetError(1, 1, 0)

    ; Strip trailing backslash, and add one after to make sure there's only one
    $s_path = StringRegExpReplace($s_path, "[\\/]+\z", "") & "\"

    ; Set all defaults
    If $s_mask = -1 Or $s_mask = Default Then $s_mask = "*.*"
    If $i_flag = -1 Or $i_flag = Default Then $i_flag = 0
    If $s_exclude = -1 Or $s_exclude = Default Then $s_exclude = ""

    ; Look for bad chars
    If StringRegExp($s_mask, "[/:><\|]") Or StringRegExp($s_exclude, "[/:><\|]") Then
        Return SetError(2, 2, 0)
    EndIf

    ; Strip leading spaces between semi colon delimiter
    $s_mask = StringRegExpReplace($s_mask, "\s*;\s*", ";")
    If $s_exclude Then $s_exclude = StringRegExpReplace($s_exclude, "\s*;\s*", ";")

    ; Confirm mask has something in it
    If StringStripWS($s_mask, 8) = "" Then Return SetError(2, 2, 0)
    If $i_flag < 0 Or $i_flag > 2 Then Return SetError(3, 3, 0)

    ; Validate and create path + mask params
    Local $a_split = StringSplit($s_mask, ";"), $s_hold_split = ""
    For $i = 1 To $a_split[0]
        If StringStripWS($a_split[$i], 8) = "" Then ContinueLoop
        If StringRegExp($a_split[$i], "^\..*?\..*?\z") Then
            $a_split[$i] &= "*" & $a_split[$i]
        EndIf
        $s_hold_split &= '"' & $s_path & $a_split[$i] & '" '
    Next
    $s_hold_split = StringTrimRight($s_hold_split, 1)
    If $s_hold_split = "" Then $s_hold_split = '"' & $s_path & '*.*"'

    Local $i_pid, $s_stdout, $s_hold_out, $s_dir_file_only = "", $s_recurse = "/s "
    If $i_flag = 1 Then $s_dir_file_only = ":-d"
    If $i_flag = 2 Then $s_dir_file_only = ":D"
    If Not $f_recurse Then $s_recurse = ""

    $command = @ComSpec & " /u /c dir /b " & $s_recurse & "/a" & $s_dir_file_only & " " & $s_hold_split
    $s_hold_out = _RunWaitStdOut($command, "", @SW_HIDE)
    ; ConsoleWrite($command & @crlf & $s_hold_out & @crlf & @extended & @crlf)

    $s_hold_out = StringRegExpReplace($s_hold_out, "\v+\z", "")
    If Not $s_hold_out Then Return SetError(4, 4, 0)

    ; Parse data and find matches based on flags
    Local $a_fsplit = StringSplit(StringStripCR($s_hold_out), @LF), $s_hold_ret
    $s_hold_out = ""

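    ; The pattern is wrapped in \Q...\E below, so step out of the literal quoting for each wildcard and alternation
    If $s_exclude Then $s_exclude = StringReplace(StringReplace($s_exclude, "*", "\E.*?\Q"), ";", "\E|\Q")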

    For $i = 1 To $a_fsplit[0]
        If $s_exclude And StringRegExp(StringRegExpReplace( _
                $a_fsplit[$i], "(.*?[\\/]+)*(.*?\z)", "\2"), "(?i)\Q" & $s_exclude & "\E") Then ContinueLoop
        If StringRegExp($a_fsplit[$i], "^\w:[\\/]+") = 0 Then $a_fsplit[$i] = $s_path & $a_fsplit[$i]
        If $f_full_path Then
            $s_hold_ret &= $a_fsplit[$i] & Chr(1)
        Else
            $s_hold_ret &= StringRegExpReplace($a_fsplit[$i], "((?:.*?[\\/]+)*)(.*?\z)", "$2") & Chr(1)
        EndIf
    Next

    $s_hold_ret = StringTrimRight($s_hold_ret, 1)
    If $s_hold_ret = "" Then Return SetError(5, 5, 0)

    Return StringSplit($s_hold_ret, Chr(1))
EndFunc   ;==>_FileListToArrayEx

; ====================================================================================================
; Execute a command and return its console output (@extended = exit code)
; ====================================================================================================
; Paul Campbell (PaulIA), ProgAndy, modified by KaFu and trancexx
; http://www.autoitscript.com/forum/index.php?showtopic=76607&view=findpost&p=555091

Func _RunWaitStdOut($sCmd, $sWorkingDir = "", $state = @SW_SHOW)

    Local $iBytes, $sData, $hReadPipe, $hWritePipe, $tBuffer, $tProcess, $tSecurity, $tStartup
    Local $STILL_ACTIVE = 0x103
    Local Const $STARTF_USESHOWWINDOW = 0x1
    Local Const $STARTF_USESTDHANDLES = 0x100

    ; Set up security attributes
;~     $tSecurity = DllStructCreate($tagSECURITY_ATTRIBUTES)
;~     DllStructSetData($tSecurity, "Length", DllStructGetSize($tSecurity))
;~     DllStructSetData($tSecurity, "InheritHandle", True)

    ; Create a pipe for the child process's STDOUT
    _NamedPipes_CreatePipe($hReadPipe, $hWritePipe);, $tSecurity)

    ;**************
    _WinAPI_SetHandleInformation($hReadPipe, 1, 0) ; redundant in this new situation
    _WinAPI_SetHandleInformation($hWritePipe, 1, 1)
    ;**************

    ; Create child process
    $tProcess = DllStructCreate($tagPROCESS_INFORMATION)
    $tStartup = DllStructCreate($tagSTARTUPINFO)
    DllStructSetData($tStartup, "Size", DllStructGetSize($tStartup))
    DllStructSetData($tStartup, "Flags", BitOR($STARTF_USESTDHANDLES, $STARTF_USESHOWWINDOW))
    DllStructSetData($tStartup, "StdOutput", $hWritePipe)
    DllStructSetData($tStartup, "StdError", $hWritePipe)
    DllStructSetData($tStartup, "ShowWindow", $state)
    _WinAPI_CreateProcess("", $sCmd, 0, 0, True, 0, 0, $sWorkingDir, DllStructGetPtr($tStartup), DllStructGetPtr($tProcess))
    Local $handle = DllStructGetData($tProcess, "hProcess"), $exitCode
    _WinAPI_CloseHandle(DllStructGetData($tProcess, "hThread"))

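    ; Close our copy of the write end before reading; otherwise ReadFile never sees the end of the pipe
    _WinAPI_CloseHandle($hWritePipe)

    ; Read data from the child process until it closes its end of the pipe.
    ; Reading before waiting for the exit avoids a stall when the output exceeds the pipe buffer.
    $tBuffer = DllStructCreate("wchar Text[4096]")
    Local $pBuffer = DllStructGetPtr($tBuffer)
    While 1
        _WinAPI_ReadFile($hReadPipe, $pBuffer, 4096, $iBytes)
        If $iBytes = 0 Then ExitLoop
        ; $iBytes counts bytes; each wchar takes two of them
        $sData &= StringLeft(DllStructGetData($tBuffer, "Text"), $iBytes / 2)
    WEnd
    _WinAPI_CloseHandle($hReadPipe)

    ; Wait for the child to exit, then fetch its exit code.
    ; The code is element [2] of the DllCall result; [0] is only the BOOL return value.
    _WinAPI_WaitForSingleObject($handle)
    Local $aRet = DllCall("kernel32.dll", "bool", "GetExitCodeProcess", "handle", $handle, "dword*", 0)
    If Not @error Then $exitCode = $aRet[2]
    _WinAPI_CloseHandle($handle)
    Return SetExtended($exitCode, $sData)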
EndFunc   ;==>_RunWaitStdOut

Especially take note of these lines

$command = @ComSpec & " /u /c " & $CustomCommand
$s_hold_out = _RunWaitStdOut($command, "", @SW_HIDE)

Setting $CustomCommand to any command that writes Unicode results to the console lets you pipe them directly into $s_hold_out for further processing.
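As a rough illustration of the two lines above, here is a minimal sketch with a different command swapped in. It assumes _RunWaitStdOut() from this post (together with its includes) is part of the same script; the choice of "set" and the variable names are mine and not from the original. Keep in mind that /u only affects cmd.exe's own internal commands (dir, set, echo, type and so on); an external program decides for itself which encoding it writes.

```autoit
; Assumes _RunWaitStdOut() from this post (and its #includes) is part of this script.
#include <Array.au3> ; for _ArrayDisplay only

; "set" is an internal cmd.exe command, so /u makes it write UTF-16 to the pipe
Local $sCustomCommand = "set" ; hypothetical example command
Local $sOutput = _RunWaitStdOut(@ComSpec & " /u /c " & $sCustomCommand, "", @SW_HIDE)
Local $iExitCode = @extended ; exit code as reported by _RunWaitStdOut()

; Strip trailing whitespace (flag 2), drop the CRs and split into one line per variable
Local $aLines = StringSplit(StringStripCR(StringStripWS($sOutput, 2)), @LF)
ConsoleWrite("Exit code " & $iExitCode & ", " & $aLines[0] & " lines" & @CRLF)
_ArrayDisplay($aLines)
```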

Edited by KaFu

Fixed bug ;) ...

$sData &= StringLeft(DllStructGetData($tBuffer, "Text"), $iBytes)

needs to be

$sData &= StringLeft(DllStructGetData($tBuffer, "Text"), $iBytes / 2)

because the buffer holds wchar (2-byte Unicode) characters, while $iBytes counts bytes.
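The byte/character mismatch is easy to see in isolation. A small sketch (the file name is just a made-up sample string):

```autoit
; UTF-16 LE is the encoding cmd /u writes to a pipe: two bytes per character here
Local $sName = "test.txt" ; sample string, not from the original post
Local $dRaw = StringToBinary($sName, 2) ; flag 2 = UTF-16 little endian
ConsoleWrite(StringLen($sName) & " characters = " & BinaryLen($dRaw) & " bytes" & @CRLF)
; prints "8 characters = 16 bytes"
```

So a ReadFile call that reports 16 bytes has delivered only 8 characters, which is why the count passed to StringLeft() has to be halved.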


Interesting technique, KaFu. I created my own 'dir' FileListToString/Array program a while back, but there the unicode 'dir' output is redirected to a file. I'm not sure how much of a speed difference you can get with your method, but it's certainly a unique way to go. If need be, I will let the 'dir' command run in the background and then process the file once it's done. Unfortunately, redirecting to a file won't give you a progress indicator, which would be its only drawback. You could technically 'peek' at the file's contents now and then to grab a bit of file/folder info, however. (v3.3.6.1+).

Even with the no-progress issue, I've yet to see anyone write a fast and *working* FileList UDF that allows multiple wildcards, filtering by attributes (hidden/read-only etc, date-modified/created etc), sorting (based on folder, extension, date), filtering-out of symbolic links/junctions, and so on. 'Dir' kills every single FileListToArray UDF speed-wise when it comes to more advanced stuff than just a single generic filter (*.xxx).

I'm curious how using a stdout pipe will work out.. Taking a guess, I'd say it *might* slow down the dir command a bit, but it could solve the progress issue at a minimum.

Anyway, I leave all that testing up to you KaFu haha. Nice work though
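For comparison, the redirect-to-file approach described in this post could look roughly like the sketch below. This is not Ascend4nt's actual code; the temp file name and the search path are made up, and I'm assuming that FileOpen mode 32 makes AutoIt read the BOM-less file as UTF-16 LE, which is worth verifying on your AutoIt version.

```autoit
; Hypothetical temp file; cmd /u writes UTF-16 LE without a BOM when redirected
Local $sTempFile = @TempDir & "\~dirlist.tmp"
Local $sCmd = @ComSpec & ' /u /c dir /b /a "' & @WindowsDir & '\*.log" > "' & $sTempFile & '"'
RunWait($sCmd, "", @SW_HIDE)

; Mode 32 = UTF-16 little endian (assumption: this also applies when no BOM is present)
Local $hFile = FileOpen($sTempFile, 32)
If $hFile = -1 Then Exit 1
Local $sList = FileRead($hFile)
FileClose($hFile)
FileDelete($sTempFile)

ConsoleWrite(StringLen($sList) & " characters read" & @CRLF)
```

The extra disk round trip (write, reopen, read, delete) is the part the pipe version avoids.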


Thanks for the feedback ;). Did a speed test against @ProgramFilesDir, 10 iterations (repeated 3 times): 14.7 sec with my example above, 25.7 sec with the temporary file from your original example ;). I wouldn't have thought the difference would be so huge; truth be told, I thought you might have been right that the pipes are a lot slower.
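A timing loop for this kind of comparison might look like the following sketch. It assumes _FileListToArrayEx() and _RunWaitStdOut() from the first post are part of the script; it is not the exact harness used for the numbers above.

```autoit
; Assumes _FileListToArrayEx() and _RunWaitStdOut() from the first post are part of this script.
Local Const $iRuns = 10
Local $hTimer = TimerInit()
For $i = 1 To $iRuns
    _FileListToArrayEx(@ProgramFilesDir, "*.*", 0, -1, True, True)
Next
ConsoleWrite("Pipe version: " & Round(TimerDiff($hTimer) / 1000, 1) & " sec for " & $iRuns & " runs" & @CRLF)
```

The first pass over a folder tree is usually slower than later ones because of file system caching, so it makes sense to do a warm-up run or repeat the whole measurement a few times before comparing methods.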

What additionally came to mind: this is of course just one example of the usage. In principle this is a fast way to get any Unicode result from @ComSpec, not only for the DIR command but for any program writing to the console... changed the post title accordingly.

And btw, I'm interested in this stuff because this was the starting point for me to code SMF, not being able to retrieve unicode results from the DIR command and other FileListToArray functions available back then (Alas, started it in 2008 :))...

Edited by KaFu

That's a nice speed difference. I wonder how much of a difference you would get reading from or piping out to a file on a different drive?

I hadn't thought of it before, but obviously writing the file on the same drive you are using 'dir' on will incur a performance penalty.

When I have time I might look further into this, and maybe try to understand all the piping stuff. Hopefully it shouldn't be too hard to port to x64, and maybe there are some ways of squeezing even more performance out of it.

One thing I have investigated is the 'undocumented' function 'NtQueryDirectoryFile', which actually works quite well for reading in a complete folder at a time. The only issue with it is that each folder has to be 'opened', read to a buffer, and traversed. Using AutoIt this is a slow process, though it might very well be fast in Assembly language. Still, I'm thinking 'dir' has got to be the quickest and simplest.


One thing I have investigated is the 'undocumented' function 'NtQueryDirectoryFile', which actually works quite well for reading in a complete folder at a time. The only issue with it is that each folder has to be 'opened', read to a buffer, and traversed. Using AutoIt this is a slow process, though it might very well be fast in Assembly language. Still, I'm thinking 'dir' has got to be the quickest and simplest.

I'd imagine the NT functions would probably kill the idea of using it on a non-windows backup drive.

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


Smoke, actually this 'undocumented' function isn't NTFS specific, and is associated with the main FindFirst/Next Windows API functions, so should be able to read FAT file systems as well. Was there some other file system you had in mind? If so, please do tell how you are accessing it using AutoIt?

Anyway, the way I call it, it returns all the same information as my own _WinAPI_FileFind UDF, including shortnames and basic file attributes. Some extra info it returns are Size-on-Disk, Attribute-Change-Time, and some extended attribute info.


Smoke, actually this 'undocumented' function isn't NTFS specific, and is associated with the main FindFirst/Next Windows API functions, so should be able to read FAT file systems as well. Was there some other file system you had in mind? If so, please do tell how you are accessing it using AutoIt?

Anyway, the way I call it, it returns all the same information as my own _WinAPI_FileFind UDF, including shortnames and basic file attributes. Some extra info it returns are Size-on-Disk, Attribute-Change-Time, and some extended attribute info.

I just remember running into issues on a client's system that used a drive for nothing but raw storage of files and applications. It was network shared so all the computers had access to one core location. The issue I believe I was having (because no OS was installed) was more with file monitoring (trying to avoid the First/Next lag).

It's been some time since I wrote the monitor system, but I could have sworn I had it working with MS OSes using NT* APIs, but it failed on the non-OS system.


Smoke,

Just to double check, I tested the NtQueryDirectoryFile method today on a USB thumbdrive (FAT32) and an external backup USB storage drive (NTFS), and both returned the correct info without any errors, so I don't see any issues with using my method (it works on Win2000 -> Win7 x64 too). There are *certain* classes of information that you can't use on anything other than an NTFS volume, but as long as you stick with the basics you should be fine.

btw, sorry KaFu - didn't mean to hijack the thread hehe.


  • 1 year later...
  • 11 months later...

It returns 1631 files, but it should be 80.

See topic FileFindFirstFile in AutoIt.chm

"Wildcards: In general, * denotes zero or more characters, and ? denotes zero or one character. If your file search string contains only wildcards (or is "*.*"), then see the example below for the return value!

You can use only one wildcard in the filename part or in the extension part i.e. a*.b?.

?? seems equivalent to * (not described in Microsoft documentation).

When using a 3-char extension any extension starting with those 3 chars will match, .e.g. "*.log" will match "test.log_1". (not described either in Microsoft documentation)."

Maybe this is your case.

The point of world view


I think ValeryVal is right :). If I add a ConsoleWrite($command & @crlf) directly before the _RunWaitStdOut(), it gives me this syntax for the call you've defined:

C:\Windows\System32\cmd.exe /u /c dir /b /s /a "C:\Windows\s*.???.*"

Running this in a command prompt returns some thousand hits on my system too.


?? seems equivalent to * (not described in Microsoft documentation).

Just an FYI, ?? does not equal * when used as a wildcard; it just means two or fewer unknown characters. That part of the help file is wrong. I tested this extensively on multiple versions of Windows, and it never returns the same files that using just * does.
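One way to check the wildcard behaviour on your own system is a small scratch-folder test like the sketch below. The folder name, the file names and the _CountMatches() helper are all made up for illustration. Results can differ between Windows versions and volumes (for example depending on whether 8.3 short names are generated), so no expected output is given.

```autoit
; Create a scratch folder with a few empty files (all names are hypothetical)
Local $sDir = @TempDir & "\wildcard_test"
DirCreate($sDir)
Local $aNames[4] = ["a.log", "ab.log", "abc.log", "test.log_1"]
For $i = 0 To UBound($aNames) - 1
    FileClose(FileOpen($sDir & "\" & $aNames[$i], 2)) ; 2 = create/overwrite an empty file
Next

; Compare how many files each pattern matches
Local $aPatterns[4] = ["*", "??", "??.log", "*.log"]
For $i = 0 To UBound($aPatterns) - 1
    ConsoleWrite($aPatterns[$i] & " -> " & _CountMatches($sDir & "\" & $aPatterns[$i]) & " match(es)" & @CRLF)
Next
DirRemove($sDir, 1) ; 1 = remove the folder together with its files

; Hypothetical helper: counts the entries FileFindFirstFile/FileFindNextFile return for a pattern
Func _CountMatches($sPattern)
    Local $iCount = 0, $hSearch = FileFindFirstFile($sPattern)
    If $hSearch = -1 Then Return 0
    While 1
        FileFindNextFile($hSearch)
        If @error Then ExitLoop
        $iCount += 1
    WEnd
    FileClose($hSearch)
    Return $iCount
EndFunc   ;==>_CountMatches
```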
