
Modified _FileCountLines()


Spiff59

I found that the production version of _FileCountLines() improperly handles blank lines at the end of a file. It rarely reports the same number of lines shown in Notepad or SciTE.

This modified function is 60-70% faster and uses half the memory (so it can read larger files and is less likely to fail with "memory allocation" errors), and it properly handles blank lines at the end of a file.

Anyone see anything wrong with the following replacement candidate?

Thank you.

#include <FileConstants.au3>

Func _FileCountLines($sFilePath)
    Local $hFile = FileOpen($sFilePath, $FO_READ)
    If $hFile = -1 Then Return SetError(1, 0, 0)
    Local $sFileContent = FileRead($hFile), $aTerminator[2] = ["\n", "\r"] ; linefeed, carriage return (regex patterns)
    FileClose($hFile)
    Local $count = 0 ; declared before the loop so it exists even if no terminator is found
    For $x = 0 to 1
        ; the replaced string is discarded; only @extended (number of matches) is used
        StringRegExpReplace($sFileContent, $aTerminator[$x], $aTerminator[$x])
        If @extended Then
            $count = @extended
            If StringRight($sFileContent, 1) <> $aTerminator[$x] Then $count += 1
            ExitLoop
        EndIf
    Next
    If Not $count Then
        If StringLen($sFileContent) Then
            $count = 1 ; single-line file
        Else
            Return SetError(2, 0, 0) ; 0-byte file
        EndIf
    EndIf
    Return $count
EndFunc   ;==>_FileCountLines

Edit: I see that one could save the result of the StringRight() call and test it for the single-line/0-byte condition, eliminating the need for the StringLen() call.
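The refinement described in the edit note can be sketched as below. This is a hypothetical variant, not the posted code: the name _FileCountLinesVariant is made up, and it assumes the convention argued for later in this thread, namely that every terminator starts a new line, so n terminators delimit n + 1 lines. The last character is read once and reused for the 0-byte test, so no StringLen() call is needed.

```autoit
#include <FileConstants.au3>

; Hypothetical variant (the name is made up for illustration). The last character is
; read once and reused for the 0-byte test, so no separate StringLen() call is needed.
; Assumed convention: every terminator starts a new line, so n terminators => n + 1 lines.
Func _FileCountLinesVariant($sFilePath)
    Local $hFile = FileOpen($sFilePath, $FO_READ)
    If $hFile = -1 Then Return SetError(1, 0, 0)
    Local $sFileContent = FileRead($hFile)
    FileClose($hFile)

    Local $sLastChar = StringRight($sFileContent, 1) ; "" only when the file is empty
    If $sLastChar = "" Then Return SetError(2, 0, 0) ; 0-byte file

    ; Count LFs first (covers LF and CRLF files); fall back to CRs for CR-only files.
    StringRegExpReplace($sFileContent, "\n", "")
    Local $iTerminators = @extended
    If $iTerminators = 0 Then
        StringRegExpReplace($sFileContent, "\r", "")
        $iTerminators = @extended
    EndIf
    Return $iTerminators + 1 ; no terminator at all means a single-line file
EndFunc   ;==>_FileCountLinesVariant

; Usage: a two-line file that ends in @CRLF counts as 3 lines under this convention.
Local $sDemo = @TempDir & "\fcl_variant_demo.txt"
FileDelete($sDemo)
FileWrite($sDemo, "Line 1" & @CRLF & "Line 2" & @CRLF)
ConsoleWrite(_FileCountLinesVariant($sDemo) & @CRLF)
FileDelete($sDemo)
```

Under a different convention (not counting the empty line after a final terminator), only the final `+ 1` would need to depend on whether `$sLastChar` is a terminator.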

Edited by Spiff59

It's best to declare variables outside of loops as this can affect performance too.


Spiff59, I like the design. The use of SRER (StringRegExpReplace) and @extended is interesting. On the other hand, I personally like that the original function does not include trailing blank lines in the line count. I don't view that as improper.


Hello Spiff59.

If we use a function that uses FileOpen on a file larger than 179639503 bits, a memory allocation error is returned.

So neither the production function nor yours can be used on large files.

I don't know if it's a good choice, but the code below counts lines faster than your function on large files:

Func _FileCountLines2($sFilePath)
    Local $hFile = FileOpen($sFilePath, 0) ; 0 = read mode
    If $hFile = -1 Then Return SetError(1, 0, 0)
    Local $nbLine = 0
    While 1
        FileReadLine($hFile) ; read (and discard) the next line
        If @error = -1 Then ExitLoop ; -1 = end of file reached
        $nbLine += 1
    WEnd
    FileClose($hFile)
    Return $nbLine
EndFunc   ;==>_FileCountLines2

On some files, your function returns a different number of lines. For example, I tested an NDX file.

My function and SciTE give me the same count, but your function does not (I can give you the file for testing); the same thing happens with a FIC file and many other file types (I tried the first files I found :)).

The file is 436 KB; the function above counts it in 0.012 s and yours in 0.14 s.

On a pure text file the difference is smaller (about the same time on a 1.6 MB log file), but the bigger the file, the larger the difference (2 s on a 50 MB file).

I noticed that my function does not count an empty last line, but yours does. So that can be a drawback (or not); it depends on your point of view. ;)
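The difference in how the empty last line is treated can be reproduced with a small self-contained script. This is a sketch, not code from either post, and both helper names are hypothetical: it writes a two-line file ending in @CRLF, counts it with a FileReadLine() loop like the one above, and counts it again as "LFs + 1", the convention Spiff59 argues for.

```autoit
; Hypothetical helper: counts lines the way the FileReadLine() loop above does.
Func _CountByReadLine($sFilePath)
    Local $hFile = FileOpen($sFilePath, 0) ; 0 = read mode
    If $hFile = -1 Then Return SetError(1, 0, 0)
    Local $iLines = 0
    While 1
        FileReadLine($hFile)
        If @error = -1 Then ExitLoop ; end of file reached
        $iLines += 1
    WEnd
    FileClose($hFile)
    Return $iLines
EndFunc   ;==>_CountByReadLine

; Hypothetical helper: counts lines as "number of LFs + 1" for a non-empty file.
Func _CountByTerminators($sFilePath)
    Local $sText = FileRead($sFilePath)
    If @error > 0 Then Return SetError(1, 0, 0) ; file could not be read
    If $sText = "" Then Return 0
    StringRegExpReplace($sText, "\n", "")
    Local $iTerminators = @extended
    Return $iTerminators + 1
EndFunc   ;==>_CountByTerminators

Local $sTest = @TempDir & "\fcl_compare.txt" ; hypothetical scratch file
FileDelete($sTest)
FileWrite($sTest, "first" & @CRLF & "second" & @CRLF) ; ends with a line terminator
ConsoleWrite("FileReadLine loop: " & _CountByReadLine($sTest) & @CRLF)
ConsoleWrite("LFs + 1:           " & _CountByTerminators($sTest) & @CRLF)
FileDelete($sTest)
```

The loop stops once the second line has been read, while the terminator count also includes the empty line that follows the final @CRLF, so the two results differ by one for any file that ends in a terminator.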

Edited by Tlem

Best Regards, Thierry


I don't understand the logic in anyone preferring to strip trailing blank lines from the file.

Isn't the name of the function FileCountLines(), not FileStripTrailingBlankLinesAndThenCountLines()?

Notepad, Word, SciTE, DOS Edit, and every other editor in the world do not remove these lines.

_FileReadToArray(), _FileReadLine(), and _FileGetSize() read these lines.

I also found BugTracker ticket #1831 out there, that shows that the production FCL version stops processing a file when it encounters certain special characters. This SRER version does not.

So...

1. This routine is considerably faster.

2. Uses half as much memory

3. Reports the same number of lines as returned by FileReadToArray() and every text editor in existence.

4. Processes files containing control characters.

What's not to like? lol

@Tlem

I can't reproduce your numbers. Whether a 1000-line file is input, or a file consisting of 12 million lines, the SRER version is substantially faster. When I scan a batch of .au3 files (both with and without a trailing blank line or lines), the line-count from this version matches, the others do not.

Edited by Spiff59

Maybe you are right, but in that case the built-in function's result is wrong, and so is the result of some Windows and Linux programs (like wc). ^^

So finally, who's right? :)

For text files, I think you are right, but on other files, if we consider that the problem is about leading lines, your function reports a really wrong number of lines. ;)

And there's still a problem with the file size. ;)

Edit:

This behavior makes me doubt:

#include <File.au3>
Global $File = "TestFile.txt"
; Create a new test file with 5 lines and one final blank line (6 lines for Spiff59)
If FileExists($File) Then FileDelete($File)
FileWrite($File, "Line 1" & @CRLF & _
     "Line 2" & @CRLF & _
     "Line 3" & @CRLF & _
     "Line 4" & @CRLF & _
     "Line 5" & @CRLF)
; here Spiff59's _FileCountLines reports 6 lines!!!
MsgBox(64, "FileCountLine Tests 1", "Built in _FileCountLines return " & _FileCountLines($File) & " line(s)" & @CRLF & _
   "Spiff59 _FileCountLines return " & _Spiff59_FileCountLines($File) & " line(s)" & @TAB & @TAB & _
   @CRLF & @CRLF & " Yeahhh it's ok")
; Then we add one line to the end of file.
FileWriteLine($File, "This line should be the line number 7")
; here Spiff59's _FileCountLines reports 7 lines (is that right?)
MsgBox(64, "FileCountLine Tests 2", "Built in _FileCountLines return " & _FileCountLines($File) & " line(s)" & @CRLF & _
   "Spiff59 _FileCountLines return " & _Spiff59_FileCountLines($File) & " line(s)" & @TAB & @TAB & _
   @CRLF & @CRLF & " Yeahhh it's ok again")
; But what if we open the file in an editor?
ShellExecute($File)
Sleep(2000)
MsgBox(16, "FileCountLine Tests", "Hoouuchhh the line we added isn't the line number 7 !!!")

Func _Spiff59_FileCountLines($sFilePath)
    Local $hFile = FileOpen($sFilePath, $FO_READ)
    If $hFile = -1 Then Return SetError(1, 0, 0)
    Local $sFileContent = FileRead($hFile), $aTerminator[2] = ["\n", "\r"] ; linefeed, carriage return
    FileClose($hFile)
    Local $count = 0
    For $x = 0 to 1
        StringRegExpReplace($sFileContent, $aTerminator[$x], $aTerminator[$x])
        If @extended Then
            $count = @extended
            If StringRight($sFileContent, 1) <> $aTerminator[$x] Then $count += 1
            ExitLoop
        EndIf
    Next
    If Not $count Then
        If StringLen($sFileContent) Then
            $count = 1 ; single-line file
        Else
            Return SetError(2, 0, 0) ; 0-byte file
        EndIf
    EndIf
    Return $count
EndFunc   ;==>_Spiff59_FileCountLines
Edited by Tlem

Best Regards, Thierry


The reason your text is on line 6 is that line 6 is a blank line. If it contained an LF or a CRLF, your new line would go onto line 7, but because it's blank, it goes on line 6.

P.S. What do you mean here?

For text files, I think you are right, but on other files, if we consider that the problem is about leading lines, your function reports a really wrong number of lines.

Edited by BrewManNH

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files  -  _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -  Latin Square password generator


Along the lines of what BrewMan said, the @CRLF at the end of line 5 is creating line 6.

FileWriteLine() starts from that point (line 6), writes the new data and, per the help file ("If the line does NOT end in @CR or @LF then a @CRLF will be automatically added."), appends another @CRLF, which creates line 7.

I can press the down-arrow key in Notepad and it stops on line 7, as it should.
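One way to confirm which line the FileWriteLine() text landed on, without opening an editor, is to ask FileReadLine() for a specific line number. The sketch below rebuilds the file from Tlem's test (five @CRLF-terminated lines followed by one appended line); the scratch file name is made up.

```autoit
Local $sFile = @TempDir & "\fcl_line_demo.txt" ; hypothetical scratch file
FileDelete($sFile)
; Five lines, each ending in @CRLF, as in the test script above.
FileWrite($sFile, "Line 1" & @CRLF & "Line 2" & @CRLF & "Line 3" & @CRLF & "Line 4" & @CRLF & "Line 5" & @CRLF)
; The text has no terminator of its own, so FileWriteLine() appends @CRLF after it.
FileWriteLine($sFile, "This line should be the line number 7")
; Given a file name and a line number, FileReadLine() opens the file, reads that line and closes it.
ConsoleWrite("Line 6: " & FileReadLine($sFile, 6) & @CRLF)
FileDelete($sFile)
```

The appended text is returned for line 6; what an editor displays as line 7 is the empty position after the final @CRLF.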


I know the explanations, but what I want to say is that this point of view is harder to understand and differs from standard usage in Windows: if you don't know that the last line is a trailing blank line, it can disrupt the code you want to write.

It also raises an important point:

The built-in _FileCountLines function is incorrect.

FileReadLine is incorrect too, because it doesn't count the trailing blank line.

And maybe other functions that depend on this behavior are too. ^^

@BrewManNH

I mean that if you use the _FileCountLines function on files other than text files, the number of lines isn't always correct. It depends on the file. :)

The loop with FileReadLine always returns the correct number of lines (if you set aside the trailing blank line).

Try it with the attached file and sample.

Just so we don't forget: the file-size problem is still present.

FileCount example.zip

Edited by Tlem

Best Regards, Thierry


I believe the following version is about as fast as pure AutoIt can be and it doesn't have any file size or memory limitation.

By default, it simply counts linefeeds (this is compatible with both the traditional Unix-like [LF] and DOS/Windows [CRLF] conventions) using an 8 MB buffer, which should cope with any decent PC's RAM and any hard disk or SSD buffer. Adjust the buffer size to vary the results slightly.

One can change the line-counting behavior by passing a non-zero second argument. It then counts carriage returns only (suitable for the traditional Mac text-file convention [CR]).

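Func _FileCountLines($sFilePath, $CRonly = 0)
    Local $hFile = FileOpen($sFilePath) ; default mode 0 = read
    If $hFile = -1 Then Return SetError(1, 0, 0)
    Local $iLineCount = 0, $sBuffer, $iReadBytes, $bDone
    Local Const $BUFFER_SIZE = 8 * 1024 * 1024
    Local $sTermination = @LF
    If $CRonly Then $sTermination = @CR
    ; read the file in fixed-size chunks so memory use stays bounded
    Do
        $sBuffer = FileRead($hFile, $BUFFER_SIZE)
        $bDone = (@extended <> $BUFFER_SIZE) ; a short read means the end of the file was reached
        StringRegExpReplace($sBuffer, $sTermination, "")
        $iLineCount += @extended ; @extended = number of terminators in this chunk
    Until $bDone
    If FileGetPos($hFile) > 0 Then ; the file is not empty
        FileSetPos($hFile, -1, 1) ; step back one position from the current one (1 = $FILE_CURRENT)
        ; a last line without a terminator after it still counts as a line
        If FileRead($hFile, 1) <> $sTermination Then $iLineCount += 1
    EndIf
    FileClose($hFile)
    Return $iLineCount
EndFunc   ;==>_FileCountLines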

Edit: fixed empty file case. Oops.
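Counting per buffer gives the same total as counting the whole file because each terminator counted above is a single character, so no terminator can be split across two reads (a two-character @CRLF pattern could be). A rough in-memory sketch of that argument follows; the helper name and chunk sizes are illustrative, not from the post. It slices a string with StringMid() and sums the per-chunk counts.

```autoit
; Hypothetical helper (not from the post): counts a single-character terminator by
; scanning the text in fixed-size chunks, the way the buffered loop above scans a file.
Func _CountInChunks($sText, $sTerminator, $iChunkSize)
    Local $iTotal = 0, $iPos = 1, $sChunk
    While $iPos <= StringLen($sText)
        $sChunk = StringMid($sText, $iPos, $iChunkSize)
        StringRegExpReplace($sChunk, $sTerminator, "")
        $iTotal += @extended ; matches found in this chunk only
        $iPos += $iChunkSize
    WEnd
    Return $iTotal
EndFunc   ;==>_CountInChunks

Local $sSample = "alpha" & @CRLF & "beta" & @CRLF & "gamma" & @LF & "delta"
StringRegExpReplace($sSample, @LF, "") ; whole-string count for comparison
Local $iWhole = @extended
For $iSize = 1 To 8
    ConsoleWrite("chunk size " & $iSize & ": " & _CountInChunks($sSample, @LF, $iSize) & " (whole string: " & $iWhole & ")" & @CRLF)
Next
```

Every chunk size yields the same count as the whole string, which is why the buffer size only affects speed, not the result.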

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


I mean that if you use the _FileCountLines function on files other than text files, the number of lines isn't always correct

Why would you EVER use FileCountLines on anything other than text files? It's designed to be used on text files, because binary files don't have "lines": they may contain line-end characters, but they're rarely readable as text, except for some of the data in them.

Edited by BrewManNH


@BrewManNH

Well, you are right, but as Spiff59 said, the function name isn't FileStripTrailingBlankLinesAndThenCountLines(), and by that logic the function name isn't _TextFileCountLine() either. :)

The documentation doesn't specify what type of file to use, but you can read:

It does not count a final @LF as a line

^^

@jchd

Your function works like Spiff59's (same result and time) but without the size problem, and it can be used with an option for those who don't count the trailing blank line. ;)

So I think it's the best choice. But note that on "binary files", you have the same problem. ;)

Best Regards, Thierry


The "trailing blank line" is a myth.

Open any text file in most editors: they will show you one more line than actually exists.

Now you've hit on something: an empty file should give 0. Fixed.

There is no good definition of "line" in a random binary file.


The function's name may not be accurate, but then there's plenty of that out there. Just because it doesn't say it's only for text files doesn't mean that it makes sense to try and count the lines in files that don't have lines.


@jchd,

Open any text file in most editors: it will show you one more line than actually exist.

Yes - that has long bugged me about Notepad++.

There is no good definition of "line" in a random binary file.

In fact, de facto usage of the term "text" implies lines delimited by \n or \r\n, at least in Windows (I don't know about Unix-based systems).

kylomas

Edited by kylomas

Forum Rules  -  Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill


I don't understand the logic in anyone preferring to strip trailing blank lines from the file.

The original authors of the function felt the same way. Otherwise they would not have removed the trailing linefeeds with the command:

Local $sFileContent = StringStripWS(FileRead($hFile), 2)

Isn't the name of the function FileCountLines(), not FileStripTrailingBlankLinesAndThenCountLines()?

Actually it's _FileCountLines(). Get it straight before you try to mock people.

I also found BugTracker ticket #1831 out there, that shows that the production FCL version stops processing a file when it encounters certain special characters.

Yeah, special characters being NULL characters. Almost all C functions stop processing a string when they hit a NULL character.

What's not to like? lol

I don't like anything that could be considered script breaking just to gain a few milliseconds. Make it backwards compatible with the original function that has been included with AutoIt for years now, and then I'll get on board. How can you not understand that logic?

I don't like anything that could be considered script breaking just to gain a few milliseconds. Make it backwards compatible with the original function that has been included with AutoIt for years now, and then I'll get on board. How can you not understand that logic?

I'd place the substantial performance gain third in the list of advantages this version has over the existing code.

Second would be the improved memory usage. jchd's version apparently eliminates any memory restrictions.

Easily first on the list is making it return an accurate number that is consistent with many other AutoIt functions and matches every text editor out there. I screwed up one of my first reports in school and chewed a black, gooey hole horizontally right through the entire width of a page of 132-column greenbar by printing a 40-odd page report on a single line of a single page. I learned my lesson about carriage control after that. Once you park a carriage return/linefeed at the end of a line, you have moved to the next line. If there's a @CRLF at the end of line 4, then line 5 exists.

I don't see much validity in the "that's the way it's always been" argument, when it's always been broke. But in 4 years I've never used this function even once, so getting it corrected is not something I find important enough to advocate any further. I just think it is the correct thing to do.


I don't see much validity in the "that's the way it's always been" argument, when it's always been broke.

You can't call it broken if the function was designed to do that and, most importantly, tells you that it does. The doc clearly states in the remarks: It does not count a final @LF as a line.

Don't get me wrong, I support your/jchd's changes to this function, but for one reason only: they remove the memory limitations. That's something I did not know before reading this thread.

So if you do put in a ticket for this to be changed, I recommend the following bits of advice. Do not mark it as a bug, because it is not; this is a feature request. Leave out your opinion on how you think it should work, because your opinion probably won't matter to the devs. Also leave out your little bits about how performance is increased by 60-70%. That's just a bonus. For the most part they don't care about performance enhancements; anything that even starts to get close to the "speed is the root of all evil" realm seems to get shot down. Take, for example, my recent improvements to _ArrayDisplay(). They add 2 lines of code, remain backwards compatible, and make the function about 4x faster, yet it looks like they're still going to be shot down.

OK, so when grabbing the link to my _ArrayDisplay ticket, I see you already submitted a ticket. I'm disappointed you didn't include jchd's version, as that version removes the memory limitations, IMO the only problem with the function. And I also see you later called the design of the function a bug.

I'm going to point them to this thread as I would like them to see the other versions of this function.

Edited by Beege

Yes, I jumped the gun in my enthusiasm and posted the bug tracker ticket before much peer review or jchd's contribution.

I do understand that the behavior of the current routine is (vaguely) documented.

The docs probably ought to state that it removes all CRs, all LFs, and all combinations of carriage-control characters from the end of the file, and that multiple lines containing any number of spaces and/or carriage-control characters may be removed.
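The effect of the StringStripWS(..., 2) call quoted earlier can be shown on a string. This sketch models only that stripping step followed by a simple "LFs + 1" count (the helper _LinesByLF is hypothetical); it is not the full production function, whose remaining code is not shown in this thread.

```autoit
; Hypothetical helper: counts lines as "number of LFs + 1" for non-empty text.
Func _LinesByLF($sText)
    If $sText = "" Then Return 0
    StringRegExpReplace($sText, @LF, "")
    Local $iCount = @extended
    Return $iCount + 1
EndFunc   ;==>_LinesByLF

; Two real lines, then an empty line and a line holding only spaces.
Local $sRaw = "first" & @CRLF & "second" & @CRLF & @CRLF & "   " & @CRLF
; Flag 2 strips trailing whitespace; CR, LF, tab and space all count as whitespace.
Local $sStripped = StringStripWS($sRaw, 2)

ConsoleWrite("before stripping: " & _LinesByLF($sRaw) & @CRLF)
ConsoleWrite("after stripping:  " & _LinesByLF($sStripped) & @CRLF)
```

Everything after "second" is whitespace, so the blank line and the spaces-only line disappear along with the final @CRLF, which is the kind of silent trimming described above.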

My argument is that the current routine is unconventional, an oddball. It is of little use in its present state.

The result becomes less valuable when there is a built-in hidden edit removing an unknown amount of data from the end of the file. I don't think the function should treat a @CRLF as some sort of unwanted, meaningless line terminator, when in fact it is a request to create a new line. _FileCountLines ought to perform like other similar functions and editors out there. Leaving the option of trimming whitespace to the scripter's discretion would be my wish.

PS - I also experienced how tough it is getting some changes accepted back in 2009 when a bunch of us were hoping to get a backward-compatible recursive _FileListToArray() put in to replace the existing routine. Some of the final offerings in the 275-post thread are still excellent candidates, IMHO.

PPS - I've apparently also had enough success getting some tickets approved that I haven't abandoned the effort at making contributions :)

Edited by Spiff59