Biatu

Faster _FileCountLines/FileReadLine?

23 posts in this topic

Does anyone have a machine-code version of _FileCountLines and/or FileReadLine? I am currently processing log files that in some cases contain millions of lines. The log is updated in real time, and my script reads only the new lines in the file. The further down the log I get, the slower each call to FileReadLine becomes, and I need to be able to keep up with the log. As for _FileCountLines (or much faster alternatives), I cannot use FileRead because of file size concerns: these logs can easily exceed 200 MB.

 

Thank you,


What is what? What is what.




#2 ·  Posted (edited)

How about a tool that monitors the log file like "tail"? You could call the tool from AutoIt and process the new lines in the script then.
http://tailforwin32.sourceforge.net/

Here you get even more tools: https://stackify.com/13-ways-to-tail-a-log-file-on-windows-unix/

Edited by water
Stackify link added

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 


#3 ·  Posted (edited)

@Biatu,

my (very simple) implementation of tailing a log file, in native AutoIt:

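Global $hFile = FileOpen('text.txt') ; open once and keep the handle
If $hFile = -1 Then Exit 1 ; FileOpen returns -1 if the file cannot be opened
Global $sLine = ''

While True
    $sLine = FileReadLine($hFile) ; no line number given: reads the next line from the current position
    If @error Then
        Sleep(250) ; nothing new to read yet, so wait instead of spinning the CPU
    Else
        ConsoleWrite($sLine & @CRLF)
    EndIf
WEnd

FileClose($hFile) ; not reached as written: the loop has no exit condition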

concept: you open a file and get a handle to it. then you keep the handle, and use it constantly to read the file. as long as there is something to read, you read it and the file position advances to the end. if there is nothing new to read, then nothing needs to be done - but the handle is still kept.

you may want to elaborate it to detect log rotation, or to optimize your entire script performance, or whatever.

P.S. you have the source code for _FileCountLines(). look at it, and see why you should absolutely NOT be using it for this purpose.

Edited by orbs

#4 ·  Posted

23 minutes ago, orbs said:

my (very simple) implementation of tailing a log file, in native AutoIt: [...]

Right, but if I'm not mistaken, the larger the file, the longer it will take FileReadLine to even go to that line. One of these logs is over 700 MB (27,981,714 lines).


#5 ·  Posted

You could always use FileSetPos to set the pointer where you want to start reading.
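A rough sketch of what that could look like when the script is restarted between runs: save the byte offset with FileGetPos at the end of a run, and jump back to it with FileSetPos on the next one. The state file name 'text.txt.pos' is made up for illustration, and the sketch assumes a plain single-byte log without a BOM, so treat it as a starting point rather than tested production code.

```autoit
#include <FileConstants.au3>

; Hypothetical file names, used only for illustration.
Global Const $sLog = 'text.txt'
Global Const $sState = 'text.txt.pos'

Global $hFile = FileOpen($sLog, $FO_READ)
If $hFile = -1 Then Exit 1

; Resume from the byte offset saved by the previous run (0 if none was saved).
Global $iPos = Number(FileRead($sState))
If $iPos > FileGetSize($sLog) Then $iPos = 0 ; file shrank: assume the log was rotated
FileSetPos($hFile, $iPos, $FILE_BEGIN)

; Read only what was appended after the saved offset.
Global $sLine
While True
    $sLine = FileReadLine($hFile)
    If @error Then ExitLoop ; end of file reached
    ConsoleWrite($sLine & @CRLF)
WEnd

; Remember where this run stopped.
$iPos = FileGetPos($hFile)
FileClose($hFile)

Global $hState = FileOpen($sState, $FO_OVERWRITE)
FileWrite($hState, $iPos)
FileClose($hState)
```

The cost of a run then depends on how much was appended since the last one, not on the total size of the log.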


#6 ·  Posted

45 minutes ago, Biatu said:

the larger the file the longer it will take FileReadLine to even goto that line

that is incorrect. goto which line? i do not specify a line number. i do not need to, because the whole point of keeping a handle open is that the position remains at the end of the file at all times - unless there is new data to read. you can even use FileRead instead of FileReadLine, and it will still work. try it.

44 minutes ago, water said:

You could always use FileSetPos to set the pointer where you want to start reading.

no need, file position is set after FileRead/FileReadLine.


#7 ·  Posted (edited)

What I mean is:

To speed up reading of new lines you could set the pointer to the end of the file.
From there you can read all lines which have been added after FileOpen without the need to read through the whole file.

Edited by water

#8 ·  Posted

2 hours ago, orbs said:

my (very simple) implementation of tailing a log file, in native AutoIt: [...]

This one looks promising: it doesn't need to read the whole file, and it works much faster than counting the lines first and then looping over each line starting at the last-read line number.



#9 ·  Posted (edited)

It does read the whole file.  The file gets opened in read mode by default. The script reads all lines and outputs them to the console starting from line 1.
According to the help file for FileReadLine:
"If no line number to read is given, the "next" line will be read. ("Next" for a newly opened file is initially the first line.)"

Edited by water
Quote the help file


#10 ·  Posted

I would use something like this:

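#include <FileConstants.au3>
Global $sLine = ''
Global $hFile = FileOpen('text.txt', $FO_READ) ; Open the file in read mode
If $hFile = -1 Then Exit 1 ; FileOpen returns -1 if the file cannot be opened
FileSetPos($hFile, 0, $FILE_END) ; Move the current file position to the end of the file
While True
    $sLine = FileReadLine($hFile) ; Returns only lines appended after the position set above
    If @error Then
        Sleep(250) ; Nothing new to read yet, so wait instead of spinning the CPU
    Else
        ConsoleWrite($sLine & @CRLF)
    EndIf
WEnd
FileClose($hFile) ; Not reached as written: the loop has no exit condition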


#11 ·  Posted (edited)

water,
I thought you would rather have suggested this way, which seems very promising for this particular case :)

Edited by mikell


#12 ·  Posted

What I posted above is nearly the same as the script you link to :)



#13 ·  Posted (edited)

Hmm, doesn't your old code allow reading several lines?

Edited by mikell


#14 ·  Posted

My old code moves the file position to 20 characters before the end of the file.
Then it reads all remaining bytes and splits it into single lines at @CRLF.

You would have to guess the maximum line length. Then change 20 to twice this value plus 2 for every CRLF.
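For readers who do not have the old script at hand, here is a minimal sketch of the idea described above. It is not the original code: the helper name _TailLines and the default of 4096 bytes are assumptions made for illustration, and it assumes a single-byte (ANSI) log with CRLF line endings.

```autoit
#include <FileConstants.au3>
#include <StringConstants.au3>

; Hypothetical helper (not the original script): return the last lines of a
; file by reading only the final $iTailBytes bytes instead of the whole file.
Func _TailLines($sPath, $iTailBytes = 4096)
    Local $hFile = FileOpen($sPath, $FO_READ)
    If $hFile = -1 Then Return SetError(1, 0, 0)

    ; Never seek back further than the start of the file.
    Local $iSize = FileGetSize($sPath)
    If $iTailBytes > $iSize Then $iTailBytes = $iSize
    FileSetPos($hFile, -$iTailBytes, $FILE_END)

    ; Read from the new position to the end of the file.
    Local $sChunk = FileRead($hFile)
    FileClose($hFile)

    ; Split at whole CRLF sequences; element [0] holds the number of pieces.
    ; The first piece may be a partial line because the seek can land mid-line,
    ; and the last piece is empty if the file ends with CRLF.
    Return StringSplit($sChunk, @CRLF, $STR_ENTIRESPLIT)
EndFunc

; Usage: print whatever lines fit into the last 200 bytes of the file.
Global $aLines = _TailLines('text.txt', 200)
If IsArray($aLines) Then
    For $i = 1 To $aLines[0]
        ConsoleWrite($aLines[$i] & @CRLF)
    Next
EndIf
```

The work done depends only on the number of bytes requested, so it should not slow down as the log grows. If the file is stored as UTF-16, the byte count would need to be doubled, which is where the "twice this value" above comes from.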



#15 ·  Posted

I understood that. This approach could make Biatu's script extremely fast, assuming that the new lines to read are the last ones in the file.

BTW the code in post#10 returns nothing for me - and never ends  :)


#16 ·  Posted

Do you add new lines to the file while the script is running? If yes you should get them written to the SciTE output pane.

I meant to point out that the script never ends because no statement checks a condition to end the loop. Right now you have to cancel the script to end it.
It depends on what the OP needs.
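If an exit condition is wanted, one possible variant is sketched below. The ESC hotkey and the 250 ms pause are assumptions chosen for illustration, not something the OP asked for. Note that HotKeySet registers the key system-wide while the script is running.

```autoit
#include <FileConstants.au3>

Global $bRunning = True
HotKeySet('{ESC}', '_StopTail') ; assumption: pressing ESC should end the script

Global $hFile = FileOpen('text.txt', $FO_READ)
If $hFile = -1 Then Exit 1
FileSetPos($hFile, 0, $FILE_END) ; start at the end, as in post #10

Global $sLine
While $bRunning
    $sLine = FileReadLine($hFile)
    If @error Then
        Sleep(250) ; nothing new yet: wait a little before polling again
    Else
        ConsoleWrite($sLine & @CRLF)
    EndIf
WEnd

FileClose($hFile) ; reached once ESC has been pressed

; Hotkey handler: clears the flag so the loop above finishes cleanly.
Func _StopTail()
    $bRunning = False
EndFunc
```

With the flag in place, the loop ends after the current iteration and FileClose is actually executed.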



#17 ·  Posted

19 hours ago, mikell said:

Hmm doesn't your old code allow to read several lines ?

I just tested and noticed that FileReadLine can only read the last line when the line parameter is set to -1. You can't set it to -2 to read the second-to-last line.



#18 ·  Posted

water,
I meant nothing more than what you said in post #14: using FileSetPos with $FILE_END and then going back, say, 200 characters, we can grab the last several lines before EOF :)


#19 ·  Posted

I know ;) I just wanted to make sure that my proposal really works. Unfortunately it doesn't.
FileSetPos always returns 0.

Need to do some more tests.



#20 ·  Posted

Which proposal doesn't work? The one in post #10 or the one from the link?
For me the one from the link always works :)
