ryeguy

New FAST line counter

53 posts in this topic

#1 ·  Posted (edited)

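The line counter included in file.au3 is incredibly slow, so I decided to write a new one. Instead of reading every line, it probes the file with FileReadLine in steps of 1000 lines until it runs past the end, backs off one step, and then repeats the same search in steps of 100, 10 and 1. The code is: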

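Func LineCount($file)
   Local $thousands = 0, $hundreds, $tens, $ones
   ; Probe every 1000th line until FileReadLine fails, then back off one step.
   ; @error is -1 past the end of the file and 1 if the file cannot be read,
   ; so testing for any non-zero value also avoids looping forever on a bad path.
   Do
      $thousands = $thousands + 1000
      FileReadLine($file, $thousands)
   Until @error <> 0
   $thousands = $thousands - 1000
   ; Refine in steps of 100
   $hundreds = $thousands
   Do
      $hundreds = $hundreds + 100
      FileReadLine($file, $hundreds)
   Until @error <> 0
   $hundreds = $hundreds - 100
   ; Refine in steps of 10
   $tens = $hundreds
   Do
      $tens = $tens + 10
      FileReadLine($file, $tens)
   Until @error <> 0
   $tens = $tens - 10
   ; Refine line by line; $ones ends on the first line that does not exist
   $ones = $tens
   Do
      $ones = $ones + 1
      FileReadLine($file, $ones)
   Until @error <> 0
   Return $ones - 1
EndFunc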

This works well for files under 10,000 lines, and you will rarely need to handle anything bigger. If you do, add a larger step (ten thousands or hundred thousands) ahead of the thousands loop (I'm sure you can figure out how), because it will make a huge difference.

Speed difference test (file was 4360 lines long):

Old method: 62.718 seconds
New method: 0.573 seconds

109 times faster!!

Edited by Jos


Wow, that sure is a speed increase. I have not tested it yet, but it sounds good. :)


#3 ·  Posted (edited)

I tested it with a script of mine, and your UDF is slower than the original.

This is the result:

---------------------------
Debug Test
---------------------------
LineCount counted 2818 lines in 0.434 seconds
_FileCountLines counted 2818 lines in 0.199 seconds
---------------------------

I can't imagine you can make a UDF faster than _FileCountLines.

Edit:

If you want to test this yourselves, use my test script.

#include <File.au3>

$TestFile = "C:\scripts\Gui\NotepadNG.au3"

$StartTime = TimerInit()
   $CountLines_1 = LineCount($TestFile) & " lines in "
$Test1_Time = Round(TimerDiff($StartTime) / 1000, 3) & " seconds"

$StartTime = TimerInit()
   $CountLines_2 = _FileCountLines($TestFile) & " lines in "
$Test2_Time = Round(TimerDiff($StartTime) / 1000, 3) & " seconds"

MsgBox(64, "Debug Test", "LineCount counted " & $CountLines_1 & $Test1_Time & @CRLF _
& "_FileCountLines counted " & $CountLines_2 & $Test2_Time)

Func LineCount($file)
  Local $ones
  Local $tens
  Local $hundreds
  Local $thousands
  $thousands = 0
  Do
     $thousands = $thousands + 1000
     FileReadLine($file, $thousands)
  Until @error = -1
  $thousands = $thousands - 1000
  $hundreds = $thousands
  Do
     $hundreds = $hundreds + 100
     FileReadLine($file, $hundreds)
  Until @error = -1
  $hundreds = $hundreds - 100
  $tens = $hundreds
  Do
     $tens = $tens + 10
     FileReadLine($file, $tens)
  Until @error = -1
  $tens = $tens - 10
  $ones = $tens
  Do
     $ones = $ones + 1
     FileReadLine($file, $ones)
  Until @error = -1
  Return $ones - 1
EndFunc
Edited by Jos


#4 ·  Posted (edited)

Try this one... it really is faster :)

Func _NFileCountLines($file)
   Local $HFile,$AArray
   $HFile = FileOpen($File, 0)
   If $HFile = -1 Then
      SetError(1)
      Return 0
   EndIf
   $AArray = StringSplit( FileRead($HFile, FileGetSize($File)), @LF)
   FileClose($HFile)
   Return $AArray[0]
EndFunc
Edited by Jos

Visit the SciTE4AutoIt3 Download page for the latest versions        Beta files                                                          Forum Rules
 
Live for the present,
Dream of the future,
Learn from the past.
  :)


#5 ·  Posted (edited)

Just out of curiosity, why would someone need to know ONLY the number of lines in a file?

JdeB, why not...

Func _NFileCountLines($sFile)
  Local $AArray
  ; FileExists (not FileExist) is the built-in function name
  If Not FileExists($sFile) Then
     SetError(1)
     Return 0
  EndIf
  $AArray = StringSplit(FileRead($sFile, FileGetSize($sFile)), @LF)
  Return $AArray[0]
EndFunc

...the file is opened only once either way, isn't it?

Edited by Jos


#6 ·  Posted (edited)

Great ideas. Here's my version.

Func _NFileCountLines($sFile)
   Local $AArray
   Local $Content
   Local $FileSize
   If Not FileExists($sFile) Then
      Return 0
   Else
      $FileSize = FileGetSize($sFile)
      $Content = FileRead($sFile, $FileSize)
      If NOT StringInStr($Content, @LF) Then
         Return 1
      Else
         $AArray = StringSplit($Content, @LF)
         Return $AArray[0]
      EndIf
   EndIf
EndFunc

I'll test again and see what the difference is between the official UDF and this one.

Edit:

Definitely faster :)

Results:

---------------------------
Debug Test
---------------------------
_NFileCountLines counted 2818 lines in 0.049 seconds
_FileCountLines counted 2818 lines in 0.203 seconds
---------------------------

Edited by Jos


#7 ·  Posted (edited)

It is also true that for very large files (millions of lines, say), loading the whole file into an array may be a useless waste of memory.
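One way around the memory concern, sketched here under assumptions, is to read the file in fixed-size chunks and count the line feeds in each chunk instead of splitting the whole content into an array. The function name `_ChunkedCountLines` and the 64 KB default chunk size are illustrative choices, not part of File.au3. The sketch relies on `StringReplace` reporting the number of replacements in `@extended`, and it only recognises @LF and @CRLF line endings.

```autoit
; Hypothetical helper: name and chunk size are assumptions, not part of File.au3.
; Counts lines chunk by chunk, so only one chunk is held in memory at a time.
Func _ChunkedCountLines($sFile, $iChunk = 65536)
   Local $hFile = FileOpen($sFile, 0) ; 0 = read mode
   If $hFile = -1 Then Return SetError(1, 0, 0)

   Local $iLines = 0, $sChunk = "", $sLast = "", $iErr = 0
   While 1
      $sChunk = FileRead($hFile, $iChunk)
      $iErr = @error ; save it before any other function call resets it
      If StringLen($sChunk) > 0 Then
         ; The return value is discarded; @extended holds the number of @LF found.
         StringReplace($sChunk, @LF, "")
         $iLines += @extended
         $sLast = StringRight($sChunk, 1)
      EndIf
      ; Stop at end of file (-1), on a read error, or when nothing was returned.
      If $iErr <> 0 Or StringLen($sChunk) = 0 Then ExitLoop
   WEnd
   FileClose($hFile)

   ; A final line without a trailing line feed still counts as a line.
   If StringLen($sLast) > 0 And $sLast <> @LF Then $iLines += 1
   Return $iLines
EndFunc
```

Called as `_ChunkedCountLines("C:\some\file.txt")` (the path is a placeholder), it returns 0 and sets @error to 1 when the file cannot be opened. Whether it beats the StringSplit version on ordinary files would need to be measured; the point is only that memory use stays bounded.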

Edited by ezzetabi


#8 ·  Posted (edited)

#cs
NOTE: the start point is the number of zeros in the line number you would like to start at.
For example, to start at 1000000 lines you would use wLineCount(filehandle, 6)
#ce
Func wLineCount($wFile, $wStart)
   FileReadLine($wFile)
   If @error = 1 Then
      Return "ERROR"
   Else
      $wCounted = 0
      For $wCount = $wStart to 0 Step -1
         While 1
            FileReadLine($wFile, 10 ^ $wCount + $wCounted)
            If @error = -1 Then
               ExitLoop
            Else
               $wCounted = 10 ^ $wCount + $wCounted
            EndIf
         WEnd 
      Next
      Return $wCounted
   EndIf
EndFunc

I just made this. With it you can choose the step size the file reading starts with, unlike yours, which always starts with steps of 1000.

Please tell me what you think...

Edited by Wolvereness

Offering any help to anyone (to my capabilities of course)Want to say thanks? Click here! [quote name='Albert Einstein']Only two things are infinite, the universe and human stupidity, and I'm not sure about the former.[/quote][quote name='Wolvereness' date='7:35PM Central, Jan 11, 2005']I'm NEVER wrong, I call it something else[/quote]


#9 ·  Posted (edited)

And the winner is:

The file has 10000 lines, each 385 characters long; the last line is empty:

First post: LineCount counted 9999 lines in 36.209 seconds
Standard UDF: _FileCountLines counted 9999 lines in 3.766 seconds
My version: _NFileCountLines counted 10000 lines in 3.25 seconds
Larry's version: _LineCount counted 9999 lines in 36.894 seconds

:)
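The 9999 versus 10000 difference in these results is consistent with how StringSplit treats a trailing line feed: text that ends in @LF splits into one extra, empty element. A minimal sketch of that effect and one possible adjustment follows; the helper name `_CountLinesInString` is hypothetical and not part of any include.

```autoit
; Hypothetical helper: counts lines in a string and ignores the empty
; element that StringSplit produces after a trailing line feed.
Func _CountLinesInString($sText)
   If StringLen($sText) = 0 Then Return 0
   Local $aLines = StringSplit($sText, @LF)
   Local $iCount = $aLines[0] ; element 0 holds the number of pieces
   If StringLen($aLines[$iCount]) = 0 Then $iCount -= 1
   Return $iCount
EndFunc

; Two lines of text, each terminated by a line feed.
Local $sSample = "first" & @LF & "second" & @LF
Local $aRaw = StringSplit($sSample, @LF)
ConsoleWrite("StringSplit elements: " & $aRaw[0] & @CRLF) ; 3: "first", "second", ""
ConsoleWrite("Adjusted line count: " & _CountLinesInString($sSample) & @CRLF) ; 2
```

Which of the two numbers is "right" depends on whether an empty last line should count, so treat the adjustment as a design choice, not a correction.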

#include <File.au3>
$TestFile = "1notin2.txt"
$StartTime = TimerInit()
$CountLines_1 = LineCount($TestFile) & " lines in "
$Test1_Time = Round(TimerDiff($StartTime) / 1000, 3) & " seconds"
$StartTime = TimerInit()
$CountLines_2 = _FileCountLines ($TestFile) & " lines in "
$Test2_Time = Round(TimerDiff($StartTime) / 1000, 3) & " seconds"
$StartTime = TimerInit()
$CountLines_3 = _NFileCountLines($TestFile) & " lines in "
$Test3_Time = Round(TimerDiff($StartTime) / 1000, 3) & " seconds"
$StartTime = TimerInit()
$CountLines_4 = _LineCount($TestFile) & " lines in "
$Test4_Time = Round(TimerDiff($StartTime) / 1000, 3) & " seconds"
MsgBox(64, "Debug Test", "LineCount counted " & $CountLines_1 & $Test1_Time & @CRLF _
       & "_FileCountLines counted " & $CountLines_2 & $Test2_Time & @CRLF _
       & "_NFileCountLines counted " & $CountLines_3 & $Test3_Time & @CRLF _
       & "_LineCount counted " & $CountLines_4 & $Test4_Time)
;
Func LineCount($file)
   Local $ones
   Local $tens
   Local $hundreds
   Local $thousands
   $thousands = 0
   Do
      $thousands = $thousands + 1000
      FileReadLine($file, $thousands)
   Until @error = -1
   $thousands = $thousands - 1000
   $hundreds = $thousands
   Do
      $hundreds = $hundreds + 100
      FileReadLine($file, $hundreds)
   Until @error = -1
   $hundreds = $hundreds - 100
   $tens = $hundreds
   Do
      $tens = $tens + 10
      FileReadLine($file, $tens)
   Until @error = -1
   $tens = $tens - 10
   $ones = $tens
   Do
      $ones = $ones + 1
      FileReadLine($file, $ones)
   Until @error = -1
   Return $ones - 1
EndFunc  ;==>LineCount
;
Func _NFileCountLines($file)
   Local $HFile, $AArray
   $HFile = FileOpen($file, 0)
   If $HFile = -1 Then
      SetError(1)
      Return 0
   EndIf
   $AArray = StringSplit( FileRead($HFile, FileGetSize($file)), @LF)
   FileClose($HFile)
   Return $AArray[0]
EndFunc  ;==>_NFileCountLines
;
Func _LineCount($LC_File)
   FileReadLine($LC_File)
   If @error Then Return -1
   $LC_Counted = 0
   $LC_Step = 10000
   While $LC_Step >= 1
      FileReadLine($LC_File, $LC_Counted + $LC_Step)
      If @error Then
         $LC_Step = $LC_Step / 10
      Else
         $LC_Counted = $LC_Counted + $LC_Step
      EndIf
   Wend
   Return $LC_Counted
EndFunc  ;==>_LineCount
Edited by Jos


#10 ·  Posted (edited)

Whoa :)

OK, here's the deal. Before I knew include files existed, I made my own line counter, which is almost exactly the same as the included one, EXCEPT that I used:

FileReadLine($file,$i)

instead of

FileReadLine($file)

The difference? About a 100x loss in speed!

So there I was thinking the included counter was slow, when it was really that one little parameter slowing my own version down.
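The slowdown is plausibly explained by how `FileReadLine` handles an explicit line number: as far as I know, the AutoIt help file warns that passing an incrementing line number forces the file to be re-read from the beginning on every call, so a counting loop ends up doing work roughly proportional to the square of the line count. A minimal sketch of the sequential alternative, using a hypothetical function name `_SeqCountLines`:

```autoit
; Hypothetical sketch: count lines by reading sequentially through a handle.
; Each FileReadLine call continues from the previous position, so the file
; is scanned only once.
Func _SeqCountLines($sFile)
   Local $hFile = FileOpen($sFile, 0) ; 0 = read mode
   If $hFile = -1 Then Return SetError(1, 0, 0)

   Local $iLines = 0
   While 1
      FileReadLine($hFile) ; no line number: read the next line
      If @error <> 0 Then ExitLoop ; -1 at end of file, positive on a read error
      $iLines += 1
   WEnd
   FileClose($hFile)
   Return $iLines
EndFunc
```

Opening the file once and passing the handle matters here too: when a filename is passed, the file has to be opened and closed again on every call.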

Edited by ryeguy

Share this post


Link to post
Share on other sites

[quote]Here are my results... so the winner goes to the standard include... correct...[/quote]

Nah...

On my calculator, .884 is bigger than .262 :)



[quote]What I meant... is that the winner, "_NFileCountLines", should go into the standard include... goofus

Lar.[/quote]

Oops, my internal translator failed here... :">

I will make the change when I get the stuff from Jeremy...



I did this (http://www.autoitscript.com/fileman/users/public/piccaso/linecount.zip) for marc some time ago. It's an external solution (a DLL), but somehow it fits in here :">


CoProc Multi Process Helper libraryTrashBin.nfshost.com store your AutoIt related files here!AutoIt User Map


Why don't you include my line counter in the competition???



#16 ·  Posted (edited)

[quote]Why don't you include my line counter in the competition???[/quote]

Of course:

Used wLineCount($TestFile,3)

Time = 33.29 seconds

First post: LineCount counted 9999 lines in 36.209 seconds
Standard UDF: _FileCountLines counted 9999 lines in 3.766 seconds
My version: _NFileCountLines counted 10000 lines in 3.25 seconds
Larry's version: _LineCount counted 9999 lines in 36.894 seconds
Wolvereness's version: wLineCount counted 10000 lines in 33.29 seconds

Edited by JdeB


Just for the heck of it, I created an internal function, which was 3 times faster in the previously described test.



so what is it then?


[quote]Just for personal knowledge, why would someone need to know ONLY the number of lines in a file?[/quote]

I have used this to read a file of UPCs to determine how many UPCs were being sent to a cash register. (The cash registers we used only had room for 6001 items, so I needed to know how close to the limit we were.)

We have enough youth. How about a fountain of SMART?


[quote]so what is it then?[/quote]

I called it FileCountRec() on my PC, but that doesn't mean it will be included in the official AutoIt version... I just wanted to test with internal code to see how fast it would be.
