Performance Degradation In Loop


Problem:

The code posted below degrades dramatically in performance the longer the loop runs. The values below are averages of multiple runs, and the per-batch times keep worsening as the record count grows:

Records 1 - 1000: 4.7 seconds
Records 1001 - 2000: 17.1 seconds
Records 2001 - 3000: 28.6 seconds

History:

I had posted a theoretical question in the bugs area (thanks to all who responded), but I consider this issue likely to be me rather than the tool. I have also tried FileWriteLine and FileReadLine instead of FileWrite and FileRead (to try to ensure that arrays/memory usage were not growing without bound), but similar performance was observed.

Code Overview:

This code is designed (and debugged, and works :whistle: ) to store a big file in a variable $filecontents, read its @CRLF-delimited lines into an array $lines, break each line into fields using @TAB as a delimiter, do "stuff useful to me" with certain fields, and then write the output.

$statusfile and the If block dealing with $thousands were inserted only for obtaining performance information.

$inputfilename = FileOpenDialog("Open FILE", "c:\", "(*.*)", 1)
$outputfilename = $inputfilename & ".out"
$statusfilename = $outputfilename & ".status"
$hInputFile = FileOpen($inputfilename, 0)
$inputfilesize = FileGetSize($inputfilename)

$filecontents = FileRead($hInputFile, $inputfilesize)
FileClose($hInputFile)
$statusfile = FileOpen($statusfilename, 1)
$lines = StringSplit($filecontents, @LF) ; split on @LF; elements may keep a trailing @CR
Dim $Time[Int($lines[0] / 1000) + 1] ; +1: $thousands can reach Int($lines[0]/1000)
;ProgressOn("Progress","Parsing Lines")
$fileout = "" ; initialise the output buffer before appending to it
$thousands = 0
$begin = TimerStart()
For $i = 1 To $lines[0]
    $record = StringSplit($lines[$i], @TAB)
    If $record[0] > 37 Then
        $ichg_dos     = $record[1]
        $ichg_dtrpt   = $record[2]
        $ichg_chgcode = $record[3]
        $ichg_patname = $record[13]
        $ichg_units   = $record[18]
        $ichg_acctnum = $record[35]
        $ichg_ichg    = $record[37]
        $lineout = $ichg_ichg & "," & $ichg_acctnum & "," & $ichg_dos & "," & $ichg_chgcode & "," & $ichg_patname & "," & $ichg_units
        $fileout = $fileout & $lineout & @LF
    EndIf
    If Int($i / 1000) = ($i / 1000) Then ; true on every 1000th record
        $thousands = $thousands + 1
        $Time[$thousands] = TimerStop($begin)
        $begin = TimerStart()
        $statuslineout = $thousands & "," & $Time[$thousands] & @LF
        FileWriteLine($statusfile, $statuslineout)
        TrayTip($i, $Time[$thousands], 3)
        ;ProgressOff()
        ;MsgBox(4096, $thousands, $statuslineout, 3)
    EndIf
Next
$outputfile = FileOpen($outputfilename, 2)
FileWrite($outputfile, $fileout)
FileClose($outputfile)
FileClose($statusfile)

Anyone have any ideas?


I'd expect StringSplit to have O(n) time complexity. In other words, StringSplit should take longer to execute on longer strings.

$record = StringSplit($lines[$i],@tab)

How long does this statement take to execute each time? Maybe it's the main problem?
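
Something like this would time the call in isolation - a rough sketch only, with a made-up 40-field test line, using the same TimerStart()/TimerStop() calls as your script:

$testline = ""
For $i = 1 To 40
    $testline = $testline & "field" & $i & @TAB ; build a ~40-field dummy record
Next
$begin = TimerStart()
For $i = 1 To 1000
    $record = StringSplit($testline, @TAB)
Next
MsgBox(4096, "StringSplit timing", "1000 splits took " & TimerStop($begin) & " ms")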


The strings are all the same length - in this case, each element of the array $lines is approximately 2000 characters.


  • Developers

Could it be because $fileout gets bigger every time it loops? Would it help to write each line out as soon as it is formatted?

$INPUTFILENAME = FileOpenDialog("Open FILE","c:\","(*.*)",1)
$OUTPUTFILENAME = $INPUTFILENAME & ".out"
$STATUSFILENAME = $OUTPUTFILENAME & ".status"
$HINPUTFILE = FileOpen($INPUTFILENAME,0)
$INPUTFILESIZE = FileGetSize($INPUTFILENAME) 

$FILECONTENTS = FileRead($HINPUTFILE,$INPUTFILESIZE)
FileClose($HINPUTFILE)
$STATUSFILE = FileOpen($STATUSFILENAME,1)
$LINES = StringSplit($FILECONTENTS,@LF)
Dim $TIME[Int($LINES[0]/1000) + 1] ; +1: $THOUSANDS can reach Int($LINES[0]/1000)
;ProgressOn("Progress","Parsing Lines")
$THOUSANDS = 0
$BEGIN = TimerStart()
$OUTPUTFILE = FileOpen($OUTPUTFILENAME,2)
For $I = 1 To $LINES[0] 
   $RECORD = StringSplit($LINES[$I],@TAB)
   If $RECORD[0] > 37 Then
      $ICHG_DOS     = $RECORD[1]
      $ICHG_DTRPT   = $RECORD[2]
      $ICHG_CHGCODE = $RECORD[3]
      $ICHG_PATNAME = $RECORD[13]
      $ICHG_UNITS   = $RECORD[18]
      $ICHG_ACCTNUM = $RECORD[35]
      $ICHG_ICHG    = $RECORD[37]
      $LINEOUT = $ICHG_ICHG & "," & $ICHG_ACCTNUM & "," & $ICHG_DOS & "," & $ICHG_CHGCODE & "," & $ICHG_PATNAME & "," & $ICHG_UNITS
      FileWriteLine($OUTPUTFILE,$LINEOUT)
   EndIf 
   If Int($I / 1000) = ($I / 1000) Then ; true on every 1000th record
      $THOUSANDS = $THOUSANDS + 1
      $TIME[$THOUSANDS] = TimerStop($BEGIN)
      $BEGIN = TimerStart()
      $STATUSLINEOUT = $THOUSANDS & "," & $TIME[$THOUSANDS] & @LF
      FileWriteLine($STATUSFILE,$STATUSLINEOUT)
      TrayTip($I,$TIME[$THOUSANDS],3)
     ; ProgressOff()
     ;msgbox(4096,$thousands,$statuslineout,3)
   EndIf
Next
FileClose($OUTPUTFILE)
FileClose($STATUSFILE)


My timing is performed outside of the FileWrite command - except for the $statusfile being used to log performance data - so the FileWrite of a huge variable is irrelevant.

However, creating the large $fileout variable appears to be the culprit: I switched to using FileWriteLine within the loop, and my times through the For loop are now down to a consistent 700 msec per 1000 records. Thanks to all who read and offered comments.

For Extra Credit :whistle:

I don't know why growing that variable would affect performance so much. Does it make sense that appending bytes to an existing variable would take MORE time than writing to a file handle?? The machine in question never showed memory utilization greater than 200 MB out of 1 GB on a 2 GHz PC, there was no paging going on, etc.

Just so we know: at completion with my test data, the length of the variable would have been 112 chars per line (including the @LF) * 15635 lines:

112 * 15635 = 1,751,120 one-byte chars - which seems like a huge performance hit for less than 2 MB of memory, particularly compared with the fact that I'm now performing 15635 FileWriteLine operations instead of 1 FileWrite.


One of the problems could be the line

$fileout = $fileout & $lineout & @LF

Now my understanding of the way strings work is that if there isn't enough room in the destination string's storage, it deletes the old storage, allocates enough new storage for the new string, and then copies the existing string into it.

So let us assume that each line adds exactly 112 bytes and that there are 15635 lines. Initially $fileout is empty (allocated size 0). The first time round the loop, $fileout is too small to hold 112 bytes, so we allocate 112 bytes of memory and copy 112 bytes. Next time round the loop we want to assign 224 bytes to $fileout. $fileout is too small, so we delete its current memory, allocate 224 bytes and copy 224 bytes. Next loop, we delete 224 bytes of memory, allocate 336 bytes and copy 336 bytes. So after 15635 loops we have:

1. Performed 15634 memory deletions (total memory deleted is 112*122,218,795 = 13,688,505,040 bytes, i.e. roughly 12GB de-allocated)

2. Performed 15635 memory allocations (total memory allocated is 112*122,234,430 = 13,690,256,160 bytes, again roughly 12GB)

3. Copied 112 + 2*112 + 3*112 + ... + 15635*112 bytes, i.e. 13,690,256,160 bytes, i.e. roughly 12GB. Technical aside: 1 + 2 + ... + (n-1) + n = n*(n+1)/2, so the byte count is 112*(15635*15636)/2.

Now memory allocations and deletions, especially of large blocks, can be slow (especially since we're doing a lot of allocations interspersed with de-allocations, which could fragment the heap - a well-known performance bottleneck).

Copying 12GB of data is also not a good performance enhancer :angry:
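
The effect is easy to reproduce with no file I/O at all - a stripped-down sketch using the same TimerStart()/TimerStop() calls as the original script (the 112-character line is just a stand-in):

$line = ""
For $i = 1 To 111
    $line = $line & "x" ; fake a 111-char record (112 with the @LF)
Next
$buf = ""
$begin = TimerStart()
For $i = 1 To 5000
    $buf = $buf & $line & @LF ; every append re-copies everything built so far
    If Mod($i, 1000) = 0 Then
        TrayTip("Appends", $i & " lines: " & TimerStop($begin) & " ms for the last 1000", 3)
        $begin = TimerStart()
    EndIf
Next

Each successive batch of 1000 appends should take visibly longer than the one before, exactly as in the figures at the top of the thread.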

Aside to developers (especially Jon) - this analysis is based on the assumption that the line $a = $a & $b uses the AString::assign function.

Now there are two technical improvements that could be made:

1. If we could preallocate the size of storage for a string then we could get rid of the memory allocations/de-allocations (a script-level workaround in the same spirit is sketched after this list)

2. When AString::assign needs to grow memory, it is more efficient to grow the size of the array by a fixed factor. There is research (which I could probably track down if necessary) stating that the fixed factor should be 1.5. The downside is that there is some wasted memory (i.e. memory that is allocated but doesn't contain any data)
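
In the meantime, the spirit of improvement 1 can be faked at script level by flushing a small buffer to disk every 1000 lines, so that no string ever grows beyond about 112KB. A sketch against the variable names in the script above, assuming $outputfile has been opened before the loop (untested):

$chunk = ""
For $i = 1 To $lines[0]
    ; ... build $lineout from $record exactly as before ...
    $chunk = $chunk & $lineout & @LF ; cheap: $chunk never grows past ~112KB
    If Mod($i, 1000) = 0 Then
        FileWrite($outputfile, $chunk) ; flush the small buffer
        $chunk = ""
    EndIf
Next
If $chunk <> "" Then FileWrite($outputfile, $chunk) ; flush whatever is left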

This could explain why performing a file write of each line improves the performance. File writes perform indexing operations, which means that they don't need to copy what has already been written to the file (not quite true in the presence of file caches, but then the actual writing to disk will use DMA or similar techniques which will not tie up the CPU).

I welcome any comments from developers as to why this analysis is flawed B)

P.S. I've just had another thought. $a = $a & $b probably creates a temporary variable of size sizeof($a) + sizeof($b), copies $a and $b into this temporary and then, after growing the memory, copies the temporary into the new memory, finally de-allocating the memory for the temporary. This would double the memory allocations, the de-allocations and the size of the data copied (24GB copied - aaagggghhhhh!)

GrahamS


Graham,

Thanks so much - even if you're off in some detail as to exactly which method is being used, what you've stated matches well with my results. I was seeing memory usage going up and down, but didn't put the pieces together.

I knew there was some reason those computer science guys didn't all want to be Electrical Engineers like me :whistle:

This also brings up the question:

How would I dimension a string variable beforehand (in those cases where it is possible to calculate the size), or assign a "huge enough to never worry about it" size to the variable, and then write to it?

Would it just be the following (and if so, thank god and greyhound for line continuation)?

Dim $sVar = "xxxxxxxxxxxxxx.........to insane number of x?"

If I understand your analysis correctly, the issue I ran into here is not the size of the variable, but having to constantly "redim" it.

Hmm... the docs appear to be mute on how to specify the amount of memory to be allocated (dimensioned) by a non-array variable, though the number of elements in an array is covered pretty nicely.


Funny, I "instinctively" use the 1.5 grow factor whenever I do reallocation, although I've never heard that it's a "good" number to use. To me, it's worth wasting a little space to avoid having to realloc every time you want to add something.


This also brings up the question:

How would I dimension a string variable beforehand (in those cases where it is possible to calculate the size), or assign a "huge enough to never worry about it" size to the variable, and then write to it?

I don't think it can be done just now. However, the proposed ReDim command (which is intended to redimension arrays) could also reallocate the size of string variables.

Would it just be the following (and if so, thank god and greyhound for line continuation)?

Dim $sVar = "xxxxxxxxxxxxxx.........to insane number of x?"

That would work, but I would hate to type 2 million x's, even with cut and paste.

If I understand your analysis correctly, the issue I ran into here is not the size of the variable, but having to constantly "redim" it.

That would be a big component, but don't forget the 12GB of byte copying.

Heh, just had an amazing thought. Add an optimisation phase to AutoIt B)

It should be possible to recognise constructs of the type $a = $a & $b and, providing that there is enough room in $a, just copy $b into $a at the correct place, eliding the temporary. Wow :angry: :whistle:

GrahamS


Funny, I "instinctively" use the 1.5 grow factor whenever I do reallocation, although I've never heard that it's a "good" number to use. To me, it's worth wasting a little space to avoid having to realloc every time you want to add something.

OK, a bit of googling for the 1.5 recommendation throws up:

A discussion on comp.lang.c++.moderated, which appears to conclude that the correct number is actually the golden ratio, i.e. about 1.61803,

and, perhaps more importantly, the exact article that I remember, which is Herb Sutter's More Effective C++ (Item 13). He quotes Andrew Koenig's September 1998 column in the Journal of Object-Oriented Programming as containing the analysis.

This item is available online at Guru of the Week (GotW), which contains a good discussion of the correct growth strategy. By the way, GotW is an excellent resource in general.

According to a message on the Boost list, the Koenig article is not available online.
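
As a back-of-envelope check on what the fixed factor buys here, this throwaway snippet counts the reallocations a 1.5x policy would need to reach the 1,751,120-byte string from earlier in the thread (not part of anyone's script above):

$cap = 112 ; start with one line's worth of storage
$reallocs = 0
$copied = 0
While $cap < 1751120
    $cap = Int($cap * 1.5) ; grow by the fixed factor
    $reallocs = $reallocs + 1
    $copied = $copied + $cap ; each grow copies the whole buffer once
WEnd
MsgBox(4096, "1.5x growth", $reallocs & " reallocations, " & $copied & " bytes copied")

That comes out at a couple of dozen reallocations and a few megabytes of copying, against 15635 reallocations and roughly 12GB copied for the grow-by-one-line behaviour analysed above.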

GrahamS


That would work, but I would hate to type 2 million x's, even with cut and paste

You don't have to (tongue firmly in cheek):

DimString("$var", 10000)

Func DimString($StringVariable, $intReallyBig)
    Opt("SendKeyDelay", 1) ; with thanks to Valik :)
    $lines_needed = 1 + Int($intReallyBig / 4000) ; AutoIt has a line max of 4096
    Send("{ENTER 2}")
    Send("Dim ")
    Send($StringVariable)
    Send('="')
    For $i = 1 To $lines_needed
        Send("{x 4000}")
        If $i <> $lines_needed Then
            Send(" _ {ENTER}")
        EndIf
    Next
    Send('"') ; close the quote so the typed Dim statement is valid
    Send("{ENTER 2}")
EndFunc

Running for cover!!! :whistle:


This also brings up the question:

How would I dimension a string variable beforehand (in those cases where it is possible to calculate the size), or assign a "huge enough to never worry about it" size to the variable, and then write to it?

Instead of a special Dim statement, wouldn't something like this work ..

$sVar = StringRepeat(" ", 1000)

.. of course, it means a new intrinsic StringRepeat function :whistle: .. which also has other benefits, by the way (padding and formatting; populating arrays in collusion with StringSplit; etc)
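
Until such an intrinsic exists, a user function can stand in. A minimal sketch - the name _StringRepeat is made up, and the piece-doubling is only there to keep the number of concatenations down to ~log2(n), per the discussion above:

Func _StringRepeat($sText, $iCount)
    Local $sResult = ""
    While $iCount > 0
        If Mod($iCount, 2) = 1 Then $sResult = $sResult & $sText
        $sText = $sText & $sText ; double the repeated piece each pass
        $iCount = Int($iCount / 2)
    WEnd
    Return $sResult
EndFunc

$sVar = _StringRepeat(" ", 1000) ; a 1000-character string, "preallocated"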


  • Administrators

D'oh. The strings used to work in a similar way, but it got removed during some bug hunt - i.e. if they needed to grow, then instead of just adding a few bytes they doubled the amount of space. I'll add that code back in tonight for both the Variant strings and the AString strings.

So 1.6 is the magic number eh? :whistle: I think I used to deallocate the memory if less than half of the string memory was used too (so that variables that were allocated a massive amount of memory didn't stay massive).


  • Administrators

Heh, just had an amazing thought. Add an optimisation phase to AutoIt  :whistle:

That's on my list. The lexer could be sped up _loads_ at the expense of a lot of memory, so I'm trying to think of a nice balance.

One addition that would help in this case and be relatively easy to implement is to add a string append operator

$a &= $b

This could call an append function in the string classes. No temporaries required.
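
For illustration, the hot line from the original script would then collapse to the following (assuming the operator behaves as proposed - it does not exist yet):

; today: builds a temporary copy of the whole of $fileout on every pass
$fileout = $fileout & $lineout & @LF
; with the proposed operator: appends in place when there is room
$fileout &= $lineout & @LF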

GrahamS

