lbsl

Best optimized way to use FileReadLine?

16 posts in this topic

#1 ·  Posted (edited)

Good day folks,

I'm having problems reading log files with the _FileReadToArray() function, so I'm forced to use the FileReadLine function instead.

_FileReadToArray() doesn't work because it stops reading the file when it encounters a null character as the first character on a line. (_FileCountLines() fails for the same reason.) Unfortunately, the log files I'm trying to read contain many of these.

Is there any particular way to crank up the performance of the FileReadLine function?

Also, is there any possibility that a future build of AutoIt would use the LOF (length of file) method on files, instead of trying to estimate the last line in a file based on specific character codes, for the _FileReadToArray function?

Perhaps it could allow two reading modes, raw and plain: in plain mode, special character codes are replaced by symbol tags like [null], and in raw mode only LF and CR are used to divide the content into array elements. (I can understand that the null character probably has to be tagged anyway, as the array is likely null-terminated as well?)

Regards,

Vince.
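
On the FileReadLine performance question, the usual advice is to open the file once with FileOpen(), pass the handle rather than the filename, and leave out the line number so that each call continues where the previous one stopped. A minimal sketch, assuming a hypothetical demo file in @TempDir that the script creates for itself (the file name and the $sLogPath, $hLog and $iCount names are illustrative only):

```autoit
#include <FileConstants.au3>

; Hypothetical demo file; substitute the real log path.
Local $sLogPath = @TempDir & "\filereadline_demo.log"

; Create a small three-line sample so the sketch runs on its own.
Local $hOut = FileOpen($sLogPath, $FO_OVERWRITE)
If $hOut = -1 Then Exit 1
FileWrite($hOut, "first" & @CRLF & "second" & @CRLF & "third" & @CRLF)
FileClose($hOut)

; Open once and reuse the handle. Passing a filename instead would make
; FileReadLine() open and close the file on every call.
Local $hLog = FileOpen($sLogPath, $FO_READ)
If $hLog = -1 Then Exit 1

Local $sLine, $iCount = 0
While 1
    ; No line number given: each call continues after the previous line.
    $sLine = FileReadLine($hLog)
    If @error Then ExitLoop ; non-zero @error: end of file or read error
    $iCount += 1
WEnd
FileClose($hLog)
FileDelete($sLogPath)

ConsoleWrite("Lines read: " & $iCount & @CRLF)
```

Whether this is fast enough for a large log is something to measure on the real file; it only removes the per-call reopen and rescan overhead.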

Edited by lbsl

#2 ·  Posted (edited)

I tried to adjust _FileReadToArray myself, but I found that something goes wrong with the contents loaded by FileRead() when the whole file is read into a buffer.

If I use ConsoleWrite, it also stops printing the rest of the buffer as soon as it encounters the first null character on the line.

Edited by lbsl


Check out this thread:


Thanks for the link. (I found so many references about _FileReadToArray not working, but this one was the needle in the haystack.)

Meanwhile, I made a minor adjustment to replace the null characters. It works, but I will see whether the binary functions perform faster.

Func _FileReadToArray($sFilePath, ByRef $aArray)
    Local $hFile = FileOpen($sFilePath, $FO_READ)
    If $hFile = -1 Then Return SetError(1, 0, 0);; unable to open the file
    ;; Read the whole file into a buffer
    Local $tbuffer = FileRead($hFile, FileGetSize($sFilePath))
    Local $aFile = ""
;~   $aFile = StringStripWS($aFile, 2)
    ;; Copy the buffer one character at a time, tagging each null character as "[null]"
    For $x = 1 To FileGetSize($sFilePath)
       If Asc(StringMid($tbuffer, $x, 1)) > 0 Then
          $aFile = $aFile & StringMid($tbuffer, $x, 1)
       Else
          $aFile = $aFile & "[null]"
       EndIf
    Next

    ;; Remove the last line separator, if any, at the end of the file
    If StringRight($aFile, 1) = @LF Then $aFile = StringTrimRight($aFile, 1)
    If StringRight($aFile, 1) = @CR Then $aFile = StringTrimRight($aFile, 1)
    FileClose($hFile)
    If StringInStr($aFile, @LF) Then
        $aArray = StringSplit(StringStripCR($aFile), @LF)
    ElseIf StringInStr($aFile, @CR) Then ;; @LF does not exist so split on the @CR
        $aArray = StringSplit($aFile, @CR)
    Else ;; unable to split the file
        If StringLen($aFile) Then
            Dim $aArray[2] = [1, $aFile]
        Else
            Return SetError(2, 0, 0)
        EndIf
    EndIf
    Return 1
EndFunc   ;==>_FileReadToArray


#include <WinAPI.au3>

Global $sFile, $hFile, $sText, $nBytes, $tBuffer
$sFile = @ScriptDir & '\test.txt' ; @ScriptDir has no trailing backslash

; read 100 bytes from end of file
$tBuffer = DLLStructCreate("byte[100]")
$hFile = _WinAPI_CreateFile($sFile, 2, 2)
_WinAPI_SetFilePointer($hFile, -100, 2)
_WinAPI_ReadFile($hFile, DLLStructGetPtr($tBuffer), 100, $nBytes)
_WinAPI_CloseHandle($hFile)
$sText = BinaryToString(DLLStructGetData($tBuffer, 1))
$sText = StringReplace($sText, Chr(0), '<NULL>')
ConsoleWrite($sText)


You could try replacing the entire For/Next loop with: StringReplace($tbuffer, Chr(0), "[null]")

Processing the entire file in one pass ought to be considerably faster.
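
As a rough illustration of the single-pass idea, the sketch below tags the null characters in a small in-memory buffer and assigns the result, since StringReplace() returns the new string rather than changing its argument. The sample data and the $sBuffer, $sTagged and $iReplaced names are made up for the example:

```autoit
; Hypothetical sample buffer: two lines, each containing a null character.
Local $sBuffer = "abc" & Chr(0) & "def" & @CRLF & Chr(0) & "ghi"

; A single call tags every null character in the buffer.
Local $sTagged = StringReplace($sBuffer, Chr(0), "[null]")
; StringReplace() reports the number of replacements in @extended.
Local $iReplaced = @extended

ConsoleWrite($sTagged & @CRLF)
ConsoleWrite("Null characters tagged: " & $iReplaced & @CRLF)
```

Reading @extended straight after the call is a cheap way to see how many null characters a log actually contained.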


#include <WinAPI.au3>

Global $sFile, $hFile, $sText, $nBytes, $tBuffer
$sFile = @ScriptDir & '\test.txt' ; @ScriptDir has no trailing backslash

; read 100 bytes from end of file
$tBuffer = DLLStructCreate("byte[100]")
$hFile = _WinAPI_CreateFile($sFile, 2, 2)
_WinAPI_SetFilePointer($hFile, -100, 2)
_WinAPI_ReadFile($hFile, DLLStructGetPtr($tBuffer), 100, $nBytes)
_WinAPI_CloseHandle($hFile)
$sText = BinaryToString(DLLStructGetData($tBuffer, 1))
$sText = StringReplace($sText, Chr(0), '<NULL>')
ConsoleWrite($sText)

This looks quite fast... I have an 8 MB log file to feed it, which should at least give me a noticeable difference.

You could try replacing the entire For/Next loop with: StringReplace($tbuffer, Chr(0), "[null]")

Processing the entire file in one pass ought to be considerably faster.

Yes, you are right about that one.

I did just that, and also cleaned up some code in the _FileReadToArray function by simply replacing all CRLF and CR combinations with an LF.

I didn't understand why CR and LF were both filtered separately if either of them would be used to split lines.

At least this works so far:

Func _FileReadToArray($sFilePath, ByRef $aArray)
    Local $hFile = FileOpen($sFilePath, $FO_READ)
    If $hFile = -1 Then Return SetError(1, 0, 0);; unable to open the file
    ;; Read the whole file into a buffer
    Local $tbuffer = FileRead($hFile, FileGetSize($sFilePath))
    FileClose($hFile)
    Local $aFile = StringReplace(BinaryToString($tbuffer), Chr(0), "[nul]")
    $aFile = StringReplace(BinaryToString($aFile), Chr(13)&Chr(10), Chr(10))
    $aFile = StringReplace(BinaryToString($aFile), Chr(13), Chr(10))

    If StringRight($aFile, 1) = @LF Then $aFile = StringTrimRight($aFile, 1)
    If StringInStr($aFile, @LF) Then
        $aArray = StringSplit($aFile, @LF)
    Else ;; unable to split the file
        If StringLen($aFile) Then
            Dim $aArray[2] = [1, $aFile]
        Else
            Return SetError(2, 0, 0)
        EndIf
     EndIf
    Return 1
EndFunc   ;==>_FileReadToArray

I also attempted to add filters for [eth], [stx], [etx], etc., but that seemed a bit too much for it to process.


Some OSes use CRLF, some use LF, and some use CR; if you split on only one of them, you can't split the lines correctly. Also, you're tripling the time needed to process the file by replacing the NUL with "" and also replacing the CR and CRLF with LF.



#10 ·  Posted

lbsl,

You might be interested in this little SRE which forces all line endings (whether @CR, @LF or @CRLF) into @CRLF to ensure you can split the lines correctly - it works even if there is a mixture of endings within the same file: ;)

$sText = StringRegExpReplace($sText, "((?<!\x0d)\x0a|\x0d(?!\x0a))", @CRLF)

M23
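
To see what that replacement does to a mixed file, here is a small sketch that normalizes a hand-made string and then splits it on the full @CRLF pair. The sample text and the $sMixed and $aLines names are assumptions for illustration; the third StringSplit() argument (1) tells it to treat the whole delimiter string as one separator:

```autoit
; Hypothetical sample with all three kinds of line ending mixed together.
Local $sMixed = "one" & @LF & "two" & @CR & "three" & @CRLF & "four"

; A lone LF (not preceded by CR) or a lone CR (not followed by LF) becomes CRLF.
; Existing CRLF pairs match neither alternative and are left untouched.
$sMixed = StringRegExpReplace($sMixed, "((?<!\x0d)\x0a|\x0d(?!\x0a))", @CRLF)

; Flag 1: split on the entire "@CRLF" string, not on each character of it.
Local $aLines = StringSplit($sMixed, @CRLF, 1)

For $i = 1 To $aLines[0]
    ConsoleWrite($i & ": " & $aLines[$i] & @CRLF)
Next
```

Without flag 1, StringSplit() would split on CR and LF individually and produce an empty element between the two characters of every pair.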



#11 ·  Posted (edited)

This looks quite fast... I have an 8 MB log file to feed it, which should at least give me a noticeable difference.

My solution is REALLY fast because it reads only the few desired bytes from the end of the file (and not the whole file from the beginning), so its speed doesn't depend on the size of the file.

EDIT:

You can also use StringSplit() on result ($sText) to get these few last rows in array ...
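
A short sketch of that splitting step, with one caveat worth keeping in mind: a fixed-size read from the end of a file will usually start in the middle of a line, so the first row is probably a fragment. The sample text below stands in for the buffer contents, and the $sTail and $aRows names are illustrative only:

```autoit
; Hypothetical tail of a log, as a fixed-size read from the end might return it:
; the first row is cut off in the middle of a line.
Local $sTail = "rtial entry" & @CRLF & "entry A" & @CRLF & "entry B" & @CRLF

; Drop one trailing line break so StringSplit() does not add an empty last row.
If StringRight($sTail, 2) = @CRLF Then $sTail = StringTrimRight($sTail, 2)

; Strip the CRs and split on LF; $aRows[0] holds the row count.
Local $aRows = StringSplit(StringStripCR($sTail), @LF)

; Skip $aRows[1], which is probably a fragment of a longer line.
For $i = 2 To $aRows[0]
    ConsoleWrite($aRows[$i] & @CRLF)
Next
```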

Edited by Zedna


#12 ·  Posted

To throw in my two cents on your latest posted version...

The $afile variable returned from your first StringReplace() call would be a string type, so the BinaryToString($afile) calls in the next 2 lines could be yanked.


#13 ·  Posted

Some OSes use CRLF, some use LF, and some use CR; if you split on only one of them, you can't split the lines correctly. Also, you're tripling the time needed to process the file by replacing the NUL with "" and also replacing the CR and CRLF with LF.

That is true, but I don't notice an immense speed drop from this.

I had more performance problems splitting the log contents across two RichEdit controls.

It looks like concatenating results into a string is fast only up to a certain number of lines; after that you have to append the contents to the RichEdit box and clear the string before filling it up again.

lbsl,

You might be interested in this little SRE which forces all line endings (whether @CR, @LF or @CRLF) into @CRLF to ensure you can split the lines correctly - it works even if there is a mixture of endings within the same file: ;)

$sText = StringRegExpReplace($sText, "((?<!\x0d)\x0a|\x0d(?!\x0a))", @CRLF)

M23

Thanks for the snippet. The results were, however, different from those of the two StringReplace lines (I did replace the @LF filter with @CRLF in the string-matching lines below the replace lines). I also thought I had read something particular about comparing two chars using StringRegExp in general.

My solution is REALLY fast because it reads only the few desired bytes from the end of the file (and not the whole file from the beginning), so its speed doesn't depend on the size of the file.

EDIT:

You can also use StringSplit() on result ($sText) to get these few last rows in array ...

It's not about reading the last few lines; I need to read the whole file anyway. The problem is the content that goes missing because of the null characters.

To throw in my two cents on your latest posted version...

The $afile variable returned from your first StringReplace() call would be a string type, so the BinaryToString($afile) calls in the next 2 lines could be yanked.

Thanks for the pennies, applied the change :)


#14 ·  Posted (edited)

#include <WinAPI.au3>

Global $sFile, $hFile, $sText, $nBytes, $tBuffer
$sFile = @ScriptDir & '\test.txt' ; @ScriptDir has no trailing backslash

; read 100 bytes from end of file
$tBuffer = DLLStructCreate("byte[100]")
$hFile = _WinAPI_CreateFile($sFile, 2, 2)
_WinAPI_SetFilePointer($hFile, -100, 2)
_WinAPI_ReadFile($hFile, DLLStructGetPtr($tBuffer), 100, $nBytes)
_WinAPI_CloseHandle($hFile)
$sText = BinaryToString(DLLStructGetData($tBuffer, 1))
$sText = StringReplace($sText, Chr(0), '<NULL>')
ConsoleWrite($sText)

I have been toying with this snippet. It works great on files that aren't opened by any other program, but it fails if the file is opened by another program, regardless of whether that file was opened in shared read mode. Working reads would have been very handy for quickly picking up changes made to the file being read. I don't get any errors, but $nBytes always returns 0, and I have no idea why no error is returned in this case.

I'm going to fool around with FileSetPos (as FileRead() does work).

Edited by lbsl


Use Share parameter in _WinAPI_CreateFile()

$hFile = _WinAPI_CreateFile($sFile, 2, 2, 7) ; share for READ+WRITE+DELETE

Yes, I have tried that, but no luck; the application that has it open has surely locked it for write access (unfortunately, I did not write that application). I even looked into working with the overlapped structure, but that doesn't seem to be of use if no bytes are ever read in the first place.
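
One way to narrow down why $nBytes stays at 0 is to check the return value of each WinAPI call instead of only the byte count. The sketch below is not a fix for the sharing problem; it only surfaces the error text. It assumes a hypothetical demo file in @TempDir that the script creates itself, and it also guards against seeking back further than the file is long, which makes the seek fail on small files:

```autoit
#include <WinAPI.au3>

; Hypothetical demo file; substitute the log that the other program holds open.
Local $sFile = @TempDir & "\tailread_demo.log"
FileDelete($sFile)
FileWrite($sFile, "0123456789" & @CRLF & "last line" & @CRLF)

; Never seek back further than the file is long.
Local $iTail = 100
If FileGetSize($sFile) < $iTail Then $iTail = FileGetSize($sFile)

; Creation 2 = open existing, access 2 = read, share 7 = delete + read + write.
Local $hFile = _WinAPI_CreateFile($sFile, 2, 2, 7)
If $hFile = 0 Then
    ; The wrapper returns 0 on failure; ask Windows why.
    ConsoleWrite("CreateFile failed: " & _WinAPI_GetLastErrorMessage() & @CRLF)
    Exit 1
EndIf

Local $tBuffer = DllStructCreate("byte[" & $iTail & "]")
Local $nBytes = 0
_WinAPI_SetFilePointer($hFile, -1 * $iTail, 2) ; 2 = relative to end of file
If Not _WinAPI_ReadFile($hFile, DllStructGetPtr($tBuffer), $iTail, $nBytes) Then
    ConsoleWrite("ReadFile failed: " & _WinAPI_GetLastErrorMessage() & @CRLF)
EndIf
_WinAPI_CloseHandle($hFile)
FileDelete($sFile)

ConsoleWrite("Bytes read: " & $nBytes & @CRLF)
```

If the other application opened the log without granting read sharing, the CreateFile check is where a sharing violation would be expected to show up.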

Based on all the advice above, this is the final modified _FileReadToArray, which allows reading from any arbitrary position in the file and includes the null filtering.

; #FUNCTION# ====================================================================================================================
; Name...........: _FileReadToArray
; Description ...: Reads the specified file into an array.
; Syntax.........: _FileReadToArray($sFilePath, ByRef $aArray, ByRef $offset)
; Parameters ....: $sFilePath - Path and filename of the file to be read.
;                 $aArray    - The array to store the contents of the file.
;                 $offset    - The file position to start reading from. Pass 0 to read the whole file; on return it holds the last measured file size.
; Return values .: Success - Returns a 1
;                 Failure - Returns a 0
;                 @Error  - 0 = No error.
;                 |1 = Error opening specified file
;                 |2 = Unable to Split the file
; Author ........: Jonathan Bennett <jon at hiddensoft dot com>, Valik - Support Windows Unix and Mac line separator
; Modified.......: Lbsl - added loading from offset for reading live modified files.
; Remarks .......: $aArray[0] will contain the number of records read into the array.
; Related .......: _FileWriteFromArray
; Link ..........:
; Example .......: Yes
; ===============================================================================================================================
Func _FileReadToArray($sFilePath, ByRef $aArray, ByRef $offset)
    Local $hFile = FileOpen($sFilePath, $FO_READ)
    Local $set_offset_first = false
    If $hFile = -1 Then Return SetError(1, 0, 0);; unable to open the file
    ;; Measure the file size before reading
    Local $fSize = FileGetSize($sFilePath)

    FileSetPos($hFile, $offset, $FILE_BEGIN)

    ; When fetching all contents for the first time, we know we are going to read
    ; $fSize bytes, so we set the offset to the current end of the file. The next
    ; pass can then continue from that position.
    If $offset = 0 Then
        $offset = $fSize
        $set_offset_first = True
    EndIf

    Local $tbuffer = FileRead($hFile, FileGetSize($sFilePath))
    FileClose($hFile)
    ; On a later pass, the $fSize measured before the read may already be stale,
    ; because a file that is being updated live can grow while we read it. No exact
    ; byte count can be known in advance, so we measure the size again after the
    ; file has been closed and use that as the next offset.
    If Not $set_offset_first Then
        $offset = FileGetSize($sFilePath)
    EndIf

    Local $aFile = StringReplace(BinaryToString($tbuffer), Chr(0), "[nul]")
    If StringRight($aFile, 1) = @LF Then $aFile = StringTrimRight($aFile, 1)
    If StringRight($aFile, 1) = @CR Then $aFile = StringTrimRight($aFile, 1)

    If StringInStr($aFile, @LF) Then
        $aArray = StringSplit(StringStripCR($aFile), @LF)
    ElseIf StringInStr($aFile, @CR) Then ;; @LF does not exist so split on the @CR
        $aArray = StringSplit($aFile, @CR)
    Else ;; unable to split the file
        If StringLen($aFile) Then
            Dim $aArray[2] = [1, $aFile]
        Else
            Return SetError(2, 0, 0)
        EndIf
    EndIf
    Return 1 ; success, as documented in the header
EndFunc   ;==>_FileReadToArray

Here's the test snippet:

#include <File.au3>

Local $sFilePath = @ScriptDir&"\test.txt"
Local $last_updated = FileGetTime($sFilePath,0,1)
Local $sFileCurrentPosition = 0
Local $sLines = 0
Dim $sFileContent = 0

ConsoleWrite('Initial offset:' &$sFileCurrentPosition&@CRLF)

_FileReadToArray($sFilePath, $sFileContent, $sFileCurrentPosition)

If @error or Ubound($sFileContent) < 1 Then
   If Ubound($sFileContent) < 1 Then
      ConsoleWrite('Array content empty' )
   Else
        ConsoleWrite('Error occurred' )
   EndIf

   Exit
EndIf

Dim $sDisplayText[Ubound($sFileContent)]
ConsoleWrite('File-> ['&$sFilePath&"]"&@CRLF)

ConsoleWrite("---------------------------------------------------------------------"&@CRLF)
ConsoleWrite("--------------------------Unaltered content--------------------------"&@CRLF)
ConsoleWrite("---------------------------------------------------------------------"&@CRLF)

For $x = 0 To UBound($sFileContent)-1
   $sDisplayText[$x] = $sFileContent[$x]
   If $x > 0 Then
      ConsoleWrite($sFileContent[$x]&@CRLF)
   EndIf
Next
$sLines = $sFileContent[0]


While 1
    If FileGetTime($sFilePath,0,1) > $last_updated Then
       $last_updated = FileGetTime($sFilePath,0,1)
      Dim $snwFileContent = 0
      ConsoleWrite("---------------------------------------------------------------------"&@CRLF)
      ConsoleWrite("----------------------File modification detected---------------------"&@CRLF)

      If _FileCountLines($sFilePath) <= $sLines Then
         ;File has no added content, read it again
         $sLines = _FileCountLines($sFilePath)
         $sFileCurrentPosition = 0
         ConsoleWrite("---------------------------Lines modified----------------------------"&@CRLF)
         ConsoleWrite("---------------------------------------------------------------------"&@CRLF)
      Else
         ConsoleWrite("-----------------------------Lines added-----------------------------"&@CRLF)
         ConsoleWrite("---------------------------------------------------------------------"&@CRLF)
      EndIf

      ConsoleWrite('going to read offset:' &$sFileCurrentPosition&@CRLF)
      _FileReadToArray($sFilePath, $snwFileContent, $sFileCurrentPosition)
      ConsoleWrite('next offset:' &$sFileCurrentPosition&@CRLF)
      ConsoleWrite("------------------------------Read lines-----------------------------"&@CRLF)
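      If UBound($snwFileContent) > 0 Then
         ReDim $sDisplayText[$sDisplayText[0]+$snwFileContent[0]+1]
         For $x = 1 To $snwFileContent[0]
            ConsoleWrite($snwFileContent[$x]&@CRLF)
            $sDisplayText[$sDisplayText[0]+$x] = $snwFileContent[$x]
         Next
         ; Keep the stored line count in step with the appended lines
         $sDisplayText[0] += $snwFileContent[0]
      Else
         ConsoleWrite('File has no content or inaccessible?'&@CRLF)
      EndIf
    EndIf
    Sleep(250) ; avoid pegging the CPU while polling for changes
WEnd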

Just fool around with the test.txt in notepad and save it after each change.

Or simply let it loose on a datalogger.
