Suppir

AutoIt speed in regular expressions

20 posts in this topic

#1 ·  Posted (edited)

Hello!

I've tested some typical string-replacement operations in AutoIt (3.3.4) and Perl (5.10):

1) open a file (100 MB of ANSI text) and read it line by line

2) replace fragments with a simple regular expression (s/(.+?):.+/$1/)

3) write to another file

I've got these results:

Perl - 2.4 seconds;

AutoIt - about 16 seconds.

I thought Perl and PCRE were similar regex flavours (with similar speed). So the problem must be in the way AutoIt works with PCRE.

Edited by Suppir




#2 ·  Posted (edited)

whatever

Edited by MvGulik

#3 ·  Posted (edited)

Ok, try this code.

First, create the test file with this code:

$OUT = FileOpen("test.txt", 2) ; 2 = write mode, erase previous contents
For $x = 1 To 1000000
    FileWriteLine($OUT, "Some text: 1234567890")
Next
FileClose($OUT)

Then try this AutoIt code on the file:

#include <Timers.au3>

$IN = FileOpen("test.txt", 0)
$OUT = FileOpen("test2.txt", 2)

$starttime = _Timer_Init()
While True
    $sLine = FileReadLine($IN)
    if @error = -1 Then ExitLoop
    $sLine = StringRegExpReplace($sLine, "(.+?):.+", "$1")
    FileWriteLine($OUT, $sLine)
WEnd
MsgBox(0, "", _Timer_Diff($starttime))
; Close both handles once the timing has been shown
FileClose($IN)
FileClose($OUT)

Then try this (absolutely equivalent) code in Perl:

use Benchmark;
open(IN, "test.txt");
open(OUT, ">test2.txt");
$t0 = new Benchmark;
while(<IN>){
    s/(.+?):.+/$1/;
    print OUT
}
$t1 = new Benchmark;
$td = timediff($t1, $t0);
print "the code took:",timestr($td),"\n";
<>

Results (on my home Celeron 420):

AutoIt - 29 sec

Perl - 6.5 sec

Try it on your PC.

Edited by Suppir


I've also tested the speed at work on my Pentium E5300, and the difference in speed was between 7 and 10 times.


#5 ·  Posted (edited)

It would be faster to read the whole file at once instead of one line at a time. Also get rid of Timers.au3; the way you're using it you may as well use the native functions (TimerInit and TimerDiff).

Edited by AdmiralAlkex


Suppir,

What is taking the time is the multiple FileWriteLine commands - they open and close the file each time, and they have an awful lot of opening and closing to do. It would be better to read in the whole file at once - although you have to change the RegEx to cope with this:

$CREATE = FileOpen("test.txt", 2)
For $x = 1 to 1000000
    FileWriteLine($CREATE, "Some text: 1234567890")
Next
FileClose($CREATE)

ConsoleWrite("Written!" & @CRLF)

$IN = FileOpen("test.txt", 0)
$OUT_1 = FileOpen("test1.txt", 2)
$OUT_2 = FileOpen("test2.txt", 2)

$starttime = TimerInit()
While True
    $sLine = FileReadLine($IN)
    if @error = -1 Then ExitLoop
    $sLine = StringRegExpReplace($sLine, ": 1234567890", "")
    FileWriteLine($OUT_1, $sLine)
WEnd
ConsoleWrite(TimerDiff($starttime) & @CRLF)

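FileClose($OUT_1)

; Reopen the input so the next read starts at the beginning of the file
FileClose($IN)
$IN = FileOpen("test.txt", 0)

$starttime = TimerInit()
$sText = FileRead($IN)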
$sText = StringRegExpReplace($sText, ": 1234567890", "")
FileWrite($OUT_2, $sText)
ConsoleWrite(TimerDiff($starttime) & @CRLF)

FileClose($IN)
FileClose($OUT_2)

This gives me 41.3 secs and 2.8 secs and 2 identical output files. The moral? Optimise the code for the language - do not blindly copy the same code. What is important is the result. :D

By the way, you also do not need the Timer include file; the built-in commands are faster and just as accurate.
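
For the pattern from the original benchmark, reading the whole file could look like the sketch below. It is an untimed illustration, assuming the test.txt created above. The character class [^\r\n] stands in for the dot so that a match cannot run across a line break, and the output file name is arbitrary.

```autoit
; Sketch only: apply the original per-line substitution to the whole file in one pass.
; "test.txt" is the file created above; "test_whole.txt" is a made-up output name.
Local $sText = FileRead("test.txt")

; [^\r\n] instead of "." keeps every match inside a single line,
; whichever newline convention the regex engine is using.
$sText = StringRegExpReplace($sText, "([^:\r\n]+):[^\r\n]+", "$1")

Local $hOut = FileOpen("test_whole.txt", 2) ; 2 = write mode, erase previous contents
FileWrite($hOut, $sText)
FileClose($hOut)
```

The line endings are left untouched because neither character class can consume them.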

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind._______My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 


#7 ·  Posted (edited)

Melba23, your code is much faster!

I think Perl buffers its file input and output smartly. It does not write each line; it collects lines in memory and writes them in chunks. So it is fast in both cases: line by line, or the whole file at once.

AutoIt, on the other hand, seems to write to disk every time FileWriteLine() is called, which makes it very slow.

BUT there is a problem with your approach. If all the text is in one variable, it is hard to do complicated per-line text processing. We would need to split the text into lines (to work on each one separately), and the splitting would take a long time.
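
The buffering described above can be imitated by hand. The following is a minimal sketch, not a measured fix: it keeps the line-by-line loop from earlier in the thread but collects the results in a string and writes them out in chunks. The file names follow the earlier posts; the 64 KB threshold and the variable names are assumptions for illustration, and whether this actually narrows the gap with Perl would have to be timed.

```autoit
; Sketch only: process line by line, but buffer the output and flush it in chunks.
Local $hIn = FileOpen("test.txt", 0)    ; 0 = read mode
Local $hOut = FileOpen("test2.txt", 2)  ; 2 = write mode, erase previous contents
Local $sBuffer = "", $sLine

While True
    $sLine = FileReadLine($hIn)
    If @error Then ExitLoop             ; end of file (or a read error)
    ; Each line is still handled on its own, so per-line logic stays simple
    $sBuffer &= StringRegExpReplace($sLine, "(.+?):.+", "$1") & @CRLF
    ; Assumed threshold: flush once roughly 64 KB of text has accumulated
    If StringLen($sBuffer) > 65536 Then
        FileWrite($hOut, $sBuffer)
        $sBuffer = ""
    EndIf
WEnd

; Write whatever is left in the buffer
If $sBuffer <> "" Then FileWrite($hOut, $sBuffer)
FileClose($hIn)
FileClose($hOut)
```

This keeps each line available separately for more complicated work while reducing the number of write calls.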

Edited by Suppir


Suppir,

You are correct about the problem with having the text in a single variable. AutoIt can, of course, read the whole file into an array and then you can work on each line separately - but that is pretty slow too:

#include <File.au3>

Local $aArray
$starttime = TimerInit()
_FileReadToArray("test.txt", $aArray)
For $i = 1 To $aArray[0]
    $aArray[$i] = StringRegExpReplace($aArray[$i], ": 1234567890", "")
Next
_FileWriteFromArray("test3.txt", $aArray, 1)
ConsoleWrite(TimerDiff($starttime) & @CRLF)

This runs at about 30 secs on my machine.

But to return to your original statement - it is not the PCRE that is slow in AutoIt, it is the array/file manipulation, and that is not easily fixed. :D

M23



Melba23, I suppose you're right - this is not a PCRE problem.

But there is still a problem with reading/writing files line by line.

If any of the AutoIt authors are here, I hope they will look at this problem and try to solve it!


Melba23, I have tried the _FileReadToArray() function, and it has a very bad (and undocumented) behaviour.

The function cuts (trims) the ends of lines if they consist of spaces.

I had a huge problem because of this function recently - it damaged my database.


Suppir,

Have you put in a bug report? That is the way to get things fixed around here. Let me have a play around this evening before you do though! :D

M23



Melba23 wrote: "Have you put in a bug report? That is the way to get things fixed around here. Let me have a play around this evening before you do though!"

No. My English is not good enough to explain it properly...


Here is the function (from File.au3):

Func _FileReadToArray($sFilePath, ByRef $aArray)
    Local $hFile = FileOpen($sFilePath, $FO_READ)
    If $hFile = -1 Then Return SetError(1, 0, 0);; unable to open the file
    ;; Read the file and remove any trailing white spaces
    Local $aFile = FileRead($hFile, FileGetSize($sFilePath))
;~  $aFile = StringStripWS($aFile, 2)
    ; remove last line separator if any at the end of the file

    ; HERE IS A TRIM, AND IT IS NOT DOCUMENTED ANYWHERE. That caused data damage in my database.
    If StringRight($aFile, 1) = @LF Then $aFile = StringTrimRight($aFile, 1) 
    If StringRight($aFile, 1) = @CR Then $aFile = StringTrimRight($aFile, 1)

    FileClose($hFile)
    If StringInStr($aFile, @LF) Then
        $aArray = StringSplit(StringStripCR($aFile), @LF)
    ElseIf StringInStr($aFile, @CR) Then ;; @LF does not exist so split on the @CR
        $aArray = StringSplit($aFile, @CR)
    Else ;; unable to split the file
        If StringLen($aFile) Then
            Dim $aArray[2] = [1, $aFile]
        Else
            Return SetError(2, 0, 0)
        EndIf
    EndIf
    Return 1
EndFunc   ;==>_FileReadToArray


Suppir,

Those lines are there to remove a final <newline> at the end of the file which would otherwise cause a blank element to be added to the array when it is split with StringSplit. The function used to be much stricter and took ALL blank lines away from the end of the file, as you can see here.
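
A minimal sketch of the effect described above, using a hypothetical two-line string in place of a real file. It assumes the version of the function quoted in this thread, where only a final line separator is dropped; under that assumption, trailing spaces inside a line survive the split.

```autoit
; Hypothetical sample data: the first line ends in two spaces, the text ends with @CRLF
Local $sData = "abc  " & @CRLF & "def" & @CRLF

; Splitting as-is: the trailing separator produces an extra, empty last element
Local $aRaw = StringSplit(StringStripCR($sData), @LF)
ConsoleWrite("Elements without trimming: " & $aRaw[0] & @CRLF)         ; 3

; Same trimming as in the quoted function: drop one final @LF, then one final @CR
Local $sTrimmed = $sData
If StringRight($sTrimmed, 1) = @LF Then $sTrimmed = StringTrimRight($sTrimmed, 1)
If StringRight($sTrimmed, 1) = @CR Then $sTrimmed = StringTrimRight($sTrimmed, 1)

Local $aTrimmed = StringSplit(StringStripCR($sTrimmed), @LF)
ConsoleWrite("Elements with trimming: " & $aTrimmed[0] & @CRLF)        ; 2
ConsoleWrite("First line length: " & StringLen($aTrimmed[1]) & @CRLF)  ; 5, spaces kept
```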

As you now know that this happens, it should be easy for you to write your own version of the UDF to prevent it damaging your databases. I do not believe that the Devs would entertain adding a parameter to skip this line as your case seems pretty specific and they tend, understandably, not to want to code for "special cases".

Here is my suggestion as to how you could rewrite the function:

; #FUNCTION# ====================================================================================================================
; Name...........: _Suppir_FileReadToArray
; Description ...: Reads the specified file into an array.
; Syntax.........: _Suppir_FileReadToArray($sFilePath, ByRef $aArray[, $iStrip = True])
; Parameters ....: $sFilePath - Path and filename of the file to be read.
;                  $aArray    - The array to store the contents of the file.
;                  $iStrip    - True (default) = Remove a final line separator at the end of the file
;                             - False = Leave a final line separator at the end of the file
; Return values .: Success - Returns a 1
;                  Failure - Returns a 0
;                  @Error  - 0 = No error.
;                  |1 = Error opening specified file
;                  |2 = Unable to Split the file
; Author ........: Jonathan Bennett <jon at hiddensoft dot com>, Valik - Support Windows Unix and Mac line separator
; Modified.......: Jpm - fixed empty line at the end, Gary Fixed file contains only 1 line.
; Remarks .......: $aArray[0] will contain the number of records read into the array.
; Related .......: _FileWriteFromArray
; Link ..........:
; Example .......: Yes
; ===============================================================================================================================
Func _Suppir_FileReadToArray($sFilePath, ByRef $aArray, $iStrip = True)
    Local $hFile = FileOpen($sFilePath, $FO_READ)
    If $hFile = -1 Then Return SetError(1, 0, 0);; unable to open the file
    ;; Read the whole file (the trailing-whitespace strip below is commented out)
    Local $aFile = FileRead($hFile, FileGetSize($sFilePath))
;~  $aFile = StringStripWS($aFile, 2)
    ; if required, remove last line separator if any at the end of the file
    If $iStrip = True Then
        If StringRight($aFile, 1) = @LF Then $aFile = StringTrimRight($aFile, 1)
        If StringRight($aFile, 1) = @CR Then $aFile = StringTrimRight($aFile, 1)
    EndIf
    FileClose($hFile)
    If StringInStr($aFile, @LF) Then
        $aArray = StringSplit(StringStripCR($aFile), @LF)
    ElseIf StringInStr($aFile, @CR) Then ;; @LF does not exist so split on the @CR
        $aArray = StringSplit($aFile, @CR)
    Else ;; unable to split the file
        If StringLen($aFile) Then
            Dim $aArray[2] = [1, $aFile]
        Else
            Return SetError(2, 0, 0)
        EndIf
    EndIf
    Return 1
EndFunc   ;==>_Suppir_FileReadToArray

Just put it in your own include folder (you do have one I suppose?) and you have your own UDF to use! :D

M23



I find it a bit unfair to blindly benchmark Perl against AutoIt.

The first is a language specially tailored to manipulating text with regular expressions, while the second is a generic Windows scripting tool. The first is interpreted, the second uses an intermediate syntax tree. You couldn't remove regexp support from the first without devastating the structure of the language itself, while removal could be a build option for the second (if it made any sense to remove the feature). The first is essentially devoted to processing text efficiently, while the second includes extensive support for a specific OS and many unique OS features, which by themselves certainly total a huge number of development hours. The first has benefited from support and enhancements by large development teams for their professional needs, while the second relies on voluntary contributions from a very small number of dedicated individuals.

And yet AutoIt proves it can do a very decent job processing strings in general and regexps as in your example(s) when a suitable reading strategy is used.

All in all, I hope readers appreciate the good performance of AutoIt.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


Thanks, everybody!

I have been using AutoIt for several months. In many cases this language is awesome. And I hope the developers will fix the problem with reading/writing files line by line.


#17 ·  Posted (edited)

One of the things that makes AutoIt slower is that its syntax is case-insensitive.

I am talking about script execution:

$var = $VaR

_Func() = _func()

To compare "_Func" and "_func", AutoIt probably converts them to upper or lower case, and it does the same for every command, which takes time.

Edited by Godless


Yes, partly, but by how much is an open question. Of course it would certainly be possible to have a suitably cryptic switch that makes the syntax parser case-sensitive, but is it worth the burden?

Another thing is that most string comparisons don't need to be case-insensitive. But even if the range of string comparison operators were complete, how many scripts would use them? I don't see == being used that much!
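
For readers unfamiliar with the operators mentioned here, a short sketch of the difference: in AutoIt the = operator compares strings without regard to case, == compares them case-sensitively, and StringCompare() takes a flag for either behaviour. The sample strings are arbitrary.

```autoit
Local $sA = "_Func", $sB = "_func"

ConsoleWrite(($sA = $sB) & @CRLF)                 ; True: "=" ignores case
ConsoleWrite(($sA == $sB) & @CRLF)                ; False: "==" is case-sensitive
ConsoleWrite(StringCompare($sA, $sB) & @CRLF)     ; 0: equal, the default ignores case
ConsoleWrite(StringCompare($sA, $sB, 1) & @CRLF)  ; non-zero: flag 1 = case-sensitive
```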

Nonetheless, drops make oceans...



Suppir wrote: "Melba23, I have tried the _FileReadToArray() function, and it has a very bad (and undocumented) behaviour. The function cuts (trims) the ends of lines if they consist of spaces. I had a huge problem because of this function recently - it damaged my database."

Replace _FileReadToArray() with this function and you won't have that problem.

Func _StringToArray2($sStr)
   If FileExists($sStr) Then $sStr = FileRead($sStr)
   If $sStr = "" Then Return SetError(1);; Empty strings not allowed
   Local $aRegEx = StringRegExp($sStr, "(?i)(.*)(?:\z|\v)+", 3)
   If @Error Then Return SetError(2) ;; RegExp failed
   Return $aRegEx
EndFunc    ;<===> _StringToArray2()

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"

