tylo

AutoIt v3.X with 10-30x speedup?

9 posts in this topic

#1 ·  Posted (edited)

Jon, and The Development Team!

Speed up by 10-30x? Yep, it is even realistic, because a large part of the work is already done. The changes needed in the current code may be surprisingly small (compared to all the benefits), and would by and large consist of removing existing code: the code that does lexical analysis (token recognition from the input source), symbol lookup (funcs, vars, macros), and checking for syntax errors DURING execution!

Just let me say that this project's current or future success does not in any way depend on doing this. Like the rest of you, I love AutoIt the way it is. So this is simply a suggestion for future improvements. I also find it intriguing that AutoIt could do some heavy computation on its own - and speedily!

As some of you know, I wrote the au3check utility tool just before this summer. This tool does a full lexical and grammar check of au3 source code, and it also checks that the symbols in the input (funcs, vars and macros) are defined. That utility still works with the latest AutoIt3 beta - you will only need to update the symbols text file (.def), which defines all the current predefined functions, variables and macros.

The au3check can be downloaded from:

http://home.tiscali.no/tylohome/sw/Au3Check.zip

http://home.tiscali.no/tylohome/sw/Au3Check_src_vc6.zip

The work on au3check made me think that AutoIt could fairly easily be modified to execute code in a much more efficient way than I believe it does today - more like a language such as Tcl. The following is how I believe AutoIt executes its scripts today. This is based only on briefly wading through the source code, and on how I see the execution behave:

A. An initial brief scan of the source is done, to collect entry points for functions, checking that all functions called are defined, and that all loops are well formed, e.g. each While has a Wend, etc.

B. The code is executed by scanning the source code directly. Everything is done here: lexical analysis, syntax checking, grammar parsing and semantic analysis, which results in execution of C++ code that is an interpretation of the AutoIt source code (of course, the C++ code is itself compiled into machine code).

(note: if this is all wrong, much of the following is based on wrong premises)

There are two disadvantages with this approach:

1. The full source is never checked for being correct - only executed code is checked. Conditional blocks that are seldom executed may have syntax errors, and it may take many runs before anyone notices. (Of course, au3check does exactly this - it checks the entire source code for being correct in one go.)

2. Speed. The constant lexical analysis and syntax/grammar checking during execution is a resource hog. Most of the execution time is spent on this, and only a small percentage on real execution of the interpreted code.

--

In the following, I describe an alternative route. The entire input is fully scanned and parsed by a modification of the au3check code, which checks syntax and grammar and, in the same go, translates the parsed tokens into an internal byte-code stream. This byte-code stream is fully crunched and chewed: correct, ready-made input for the parser.

Example. Take the following AutoIt source:

$myLongVariableName = 0
While ($myLongVariableName <= 10)
   $myLongVariableName = $myLongVariableName + 1
  ; Call a function
   myLongFunctionName($myLongVariableName, 1.0)
Wend

The modified au3check code would produce the following internal byte-stream (below each symbol, I have written how many bytes that element of the byte-stream takes up):

VAR ptr = NUM 0 NL WHILE ( VAR ptr LE NUM 10 ) NL VAR ptr = VAR ptr + NUM 1 NL
  1  4  1  1  8  1   1   1  1   4  1   1   8 1  1   4  1  1   1  4  1  1  8  1

FUN ptr ( VAR ptr , NUM 1.0 ) NL WEND NL
  1  4  1  1   4  1  1   8  1  1   1   1

As you see, the following symbols are represented as one-byte constants:

FUN, VAR, NUM, WHILE, WEND, LE, NL

'=', '(', ')', ',', '+'

whereas 'ptr' is a four-byte pointer (or eight bytes on 64-bit systems) to a variable/function object. Similarly, the number constants (8-byte binary double representations) are "arguments" to their respective byte-codes, i.e. they immediately follow a FUN, VAR, or NUM byte-code. I didn't include strings in the example, but a STR byte-code would require two arguments: a length (4 bytes) and then 'length' bytes representing the string.
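As a rough illustration of this encoding, here is a minimal C++ sketch of such an emitter. The opcode names and values, and the ByteStream helper itself, are all my invention for illustration - nothing here is actual AutoIt internals:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical one-byte opcodes (names follow the example above; values assumed).
enum Op : uint8_t { OP_VAR = 1, OP_FUN, OP_NUM, OP_WHILE, OP_WEND, OP_LE, OP_NL,
                    OP_ASSIGN, OP_LPAREN, OP_RPAREN, OP_COMMA, OP_PLUS };

struct ByteStream {
    std::vector<uint8_t> buf;

    void emit_op(Op op) { buf.push_back(op); }

    // VAR/FUN opcodes are followed by a raw pointer-sized argument
    // (4 bytes on 32-bit, 8 on 64-bit) to the resolved variable/function object.
    void emit_ptr(const void* p) {
        uint8_t tmp[sizeof p];
        std::memcpy(tmp, &p, sizeof p);
        buf.insert(buf.end(), tmp, tmp + sizeof p);
    }

    // NUM is followed by an 8-byte IEEE-754 double.
    void emit_num(double d) {
        emit_op(OP_NUM);
        uint8_t tmp[sizeof d];
        std::memcpy(tmp, &d, sizeof d);
        buf.insert(buf.end(), tmp, tmp + sizeof d);
    }
};
```

Emitting `$myLongVariableName <= 10` would then be three calls: VAR+pointer, OP_LE, and NUM+double - a handful of bytes regardless of how long the variable name is.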

The byte-code stream is very similar to the input for the current parser, but there is no longer any need for lexical analysis, syntax/grammar checking, or expensive lookup of variables and functions during execution.

So, down to business. My part would be to modify, or guide you in modifying, the au3check source so that it also produces the byte-stream. This is luckily a relatively small and easy task. Then this source code must be taken into the current project, which shouldn't be a big deal either, because it doesn't use any external libraries, or STL.

The hardest part is to adapt the parser engine to take the byte-stream instead of the current source code input. The lexer is replaced by the one in au3check (but, as said, it is only used once, when creating the byte-code).

For those worried about the size of the executable, I wouldn't think it should increase by much, if at all! The current au3check.exe is only 16 KB (upx'ed), and it won't expand much by adding the byte-code stream writing. Then a lot of code in the current AutoIt project can be thrown out, which may even it out (maybe).

/edited: spellings etc.

Edited by tylo

blub




Very close to what is actually happening. What happens for each line is the generation of a series of tokens, where each token knows its type, an optional value in a Variant (for literals, etc.), and the character position in the line, so that errors can be reported accurately. The Variant will contain a string with the name for variables and functions. I do not see a way around using the variable names in the tokens, because of the use of local variables and global variables in the same function.

I like the idea of pre-tokenizing the input stream. The check for structure integrity (WHILE-WEND pairs etc.) would happen in a subsequent pass through the token stream. The processing during loops will be greatly reduced.

I would also recommend that an evaluation be converted to reverse polish notation during tokenization, to make processing much easier at run time.

For example, 3 + 4 * $var would become 3 4 $var * +.
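For the curious, this infix-to-RPN conversion is exactly what the classic shunting-yard algorithm does. A minimal C++ sketch (function name and string-token representation are illustrative, not anything from the AutoIt codebase):

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Convert a whitespace-separated infix expression to RPN via shunting-yard.
// Operand tokens pass straight through to the output; operators wait on a
// stack until an operator of lower precedence arrives.
std::vector<std::string> to_rpn(const std::string& expr) {
    static const std::map<std::string, int> prec = {
        {"+", 1}, {"-", 1}, {"*", 2}, {"/", 2}};
    std::vector<std::string> out, ops;
    std::istringstream in(expr);
    std::string tok;
    while (in >> tok) {
        auto it = prec.find(tok);
        if (it == prec.end()) {
            out.push_back(tok);            // operand: emit immediately
        } else {
            // pop operators of higher or equal precedence first
            while (!ops.empty() && prec.at(ops.back()) >= it->second) {
                out.push_back(ops.back());
                ops.pop_back();
            }
            ops.push_back(tok);
        }
    }
    while (!ops.empty()) { out.push_back(ops.back()); ops.pop_back(); }
    return out;
}
```

With this, `to_rpn("3 + 4 * $var")` yields the token sequence `3 4 $var * +`, matching the example above.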

This should happen in a separate class that just reads, #includes, stores, and tokenizes the script. Then an execution class will call the lines of the token stream to execute the script.

Now that the variants are smaller, we could store the token stream in the AutoIt executables and remove the tokenizing code from the executable, making it much smaller and faster. It would mean that the EXE2AUT program could not just copy the script that is stored (which happens presently), but would have to re-create the script from the tokens. A pass through Tidy would probably be a good idea.

When is this going to happen? I have no clue. I am going to try to create a routine to convert a standard expression into reverse polish notation.


David Nuttall
Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius

AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...


Do you need to keep the parentheses and newlines in the bytecode? You don't even need them to reproduce human-readable source code. Since you know where things like parentheses and newlines should go to make the code look nice, you can just put them back in if you need to go from bytecode back to source code.


I'm reading a book about parsing using Java that has a completely different way of representing the program internally. What they did was use objects to capture the code. For example, there might be a WhileLoop class, so that when a while loop occurs in the code, you populate this object and it exists as part of another object. The WhileLoop class might have properties like BooleanExpressionToCheck and CodeBlockToExecute. The CodeBlock class is just a linked list of Command objects. A WhileLoop is derived from (or implements the interface of) the Command class. At the top of the hierarchy is a Program object, which you execute to start the whole ball rolling.
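To make the idea concrete, here is a sketch of that object hierarchy in C++ rather than the book's Java. The class names (Command, CodeBlock, WhileLoop) follow the description above; the CountDown and Tally classes are invented here purely to exercise the design:

```cpp
#include <memory>
#include <vector>

struct Command {                        // everything executable derives from this
    virtual ~Command() = default;
    virtual void execute() = 0;
};

struct CodeBlock : Command {            // just a sequence of Commands
    std::vector<std::unique_ptr<Command>> commands;
    void execute() override {
        for (auto& c : commands) c->execute();
    }
};

struct BoolExpression {                 // stand-in for a real expression node
    virtual ~BoolExpression() = default;
    virtual bool eval() = 0;
};

struct WhileLoop : Command {            // a loop is itself just another Command
    std::unique_ptr<BoolExpression> condition;  // BooleanExpressionToCheck
    std::unique_ptr<CodeBlock> body;            // CodeBlockToExecute
    void execute() override {
        while (condition->eval()) body->execute();
    }
};

// Toy concrete nodes, for demonstration only:
struct CountDown : BoolExpression {     // true for the first n evaluations
    int n;
    explicit CountDown(int n) : n(n) {}
    bool eval() override { return n-- > 0; }
};

struct Tally : Command {                // increments an external counter
    int* hits;
    explicit Tally(int* h) : hits(h) {}
    void execute() override { ++*hits; }
};

int run_demo() {                        // build and run: loop three times, tally
    int hits = 0;
    WhileLoop loop;
    loop.condition = std::make_unique<CountDown>(3);
    loop.body = std::make_unique<CodeBlock>();
    loop.body->commands.push_back(std::make_unique<Tally>(&hits));
    loop.execute();
    return hits;
}
```

The nice property is that "executing the program" is just calling execute() on the root object; no re-parsing happens inside the loop.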

Anyway, it was a cool idea I had never heard of. I just need a compilable programming language to implement it in. I don't like C++ because I don't like managing my own memory. I don't like .NET because it is platform dependent (less so with Mono, but still somewhat), same for VB6. I might end up having to learn Java because it solves these problems for me.


I always intended to have the script pre-tokenized, but currently the token uses a Variant which, although a lot smaller since the last change, would still be too big to store a large script. I was going to try to have a specialised token type that was smaller, but it's one of those things that has been on the todo list forever, and something new to do always comes along first :(

The pre-run syntax checking was also on the list. I read a couple of PDFs on it, but it soon got pretty complicated and was stored in the "later" pile :ph34r:

Anything like this would have to be done in stages too; I don't like making massive changes all at once to critical parts. Maybe optimizing/separating the lexer first - IIRC the lexing of each line on the fly takes the most execution time.


Very close to what is actually happening.

Thanks for verifying, David.

I like the idea of pre-tokenizing the input stream. The check for structure integrity (WHILE-WEND pairs etc.) would happen in a subsequent pass through the token stream. The processing during loops will be greatly reduced.

Indeed, au3check does both the tokenizing (lexical analysis) and the grammar checking (or structure integrity check), in a single pass. The grammar checking is done very fast and compactly by C code generated by the yacc tool (from a grammar definition). It reports syntax and grammar errors, similar to what AutoIt does now. The fact that the pre-tokenized stream is also pre-checked for structure integrity makes the parsing/execution simpler, faster, and smaller - just as you say.

(JohnDeHope): Do you need to keep the parentheses and newlines in the bytecode? You don't even need them to reproduce human-readable source code. Since you know where things like parentheses and newlines should go to make the code look nice, you can just put them back in if you need to go from bytecode back to source code.

You don't need them, as you say. But leaving them in makes it easier to rewrite the current parser to accept the byte-code stream, because it will receive the same token stream it processes today. Once it is rewritten and working, you may optimize away those redundant tokens - but that won't speed things up much.

I would also recommend that an evaluation be converted to reverse polish notation during tokenization, to make processing much easier at run time.

For example, 3 + 4 * $var would become 3 4 $var * +.

Basically, I'd say the same as I said to JohnDeHope. These optimizations are insignificant compared to executing a pre-tokenized stream vs. continuously interpreting the source code. In a first attempt, keeping the token sequence lets you use the current parser engine logic, which is well tested.

This should happen in a separate class that just reads, #includes, stores, and tokenizes the script. Then an execution class will call the lines of the token stream to execute the script.

Exactly. As I said, this code is already written in the au3check program, except for outputting the analyzed tokens to a byte-stream that the parser can execute.

Now that the variants are smaller, we could store the token stream in the AutoIt executables and remove the tokenizing code from the executable, making it much smaller and faster. It would mean that the EXE2AUT program could not just copy the script that is stored (which happens presently), but would have to re-create the script from the tokens. A pass through Tidy would probably be a good idea

Yep.

When is this going to happen? I have no clue. I am going to try to create a routine to convert a standard expression into reverse polish notation.

Following the argument I gave above, I wouldn't start by writing an RPN conversion routine, given that the tokenizer/grammar checker is already in place. The byte-code generation is trivial (I can do it); it's up to you to say how easy it will be to rewrite the code-executor/parser that exists today.

(Jon): Anything like this would have to be done in stages too, i don't like making massive changes all at once to critical parts. Maybe optimizing/seperating the lexer first - IIRC the lexing off each line on the fly takes the most execution time.

This is exactly what I am trying to argue above. The lexer should be replaced by pre-tokenizing, but the parser/executor should keep its logic.

blub


I was very enthusiastic when I read this topic at the time.

Are you still working on this?


#8 ·  Posted (edited)

Yes, I am working on it, partly by developing au3check and partly by thinking about it. Jon is busy with the GUI stuff right now, and I am busy myself, but here are a few thoughts:

1. Jon recently improved the lexer by caching some lexed lines, which gave about a 20% speedup. I realize that we also need to replace the parser in order to get real speed. As Nutster suggested, the parser could improve the expression representation in the token stream. By creating an expression tree (google that), we can traverse it in post-order to directly output a postfix (reverse polish notation) representation of the stream, which is very efficient (and easy) to execute.
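A post-order walk emitting postfix is only a few lines. Here is a minimal C++ sketch (node layout and names are my own illustration, not proposed AutoIt code):

```cpp
#include <memory>
#include <string>
#include <vector>

// Minimal expression-tree node: a token plus optional left/right children.
struct ExprNode {
    std::string token;                       // operator or operand
    std::unique_ptr<ExprNode> left, right;
    ExprNode(std::string t) : token(std::move(t)) {}
};

// Post-order traversal: left subtree, right subtree, then the node itself.
// Appending tokens in that order yields the postfix (RPN) form directly.
void postfix(const ExprNode* n, std::vector<std::string>& out) {
    if (!n) return;
    postfix(n->left.get(), out);
    postfix(n->right.get(), out);
    out.push_back(n->token);
}
```

For the tree of `3 + 4 * $var` (root `+`, with `3` on the left and a `*` subtree of `4` and `$var` on the right), the walk emits `3 4 $var * +` - the same sequence Nutster gave earlier in the thread.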

2. Another benefit of a new lexer/parser which scans/analyzes the entire source in one pass before execution is that it can do variable and function lookup only once. During execution, we can access the variable value / function directly. E.g. local variable access (token stream: 'LOCALVAR offset') can be done with:

return g_localVariant[ g_currentCallLevel ][ offset ];

without any expensive lookup search, as is currently done.

Detail: g_localVariant is a stack of vectors. For each function call, g_currentCallLevel is increased and a vector is pushed onto the stack. This vector's length is the number of local variables declared in that function, which is fixed for each function. offset = 2 means the 3rd variable declared in that function (indexed from 0).
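A minimal C++ sketch of this scheme, using double as a stand-in for the real Variant type and hypothetical enter/leave helpers for the call stack:

```cpp
#include <vector>

using Variant = double;   // placeholder for AutoIt's real Variant type

// One vector of locals per active function call; indexed by call level.
std::vector<std::vector<Variant>> g_localVariant;
int g_currentCallLevel = -1;

void enter_function(int numLocals) {      // push a frame on function call
    g_localVariant.emplace_back(numLocals);
    ++g_currentCallLevel;
}

void leave_function() {                   // pop the frame on return
    g_localVariant.pop_back();
    --g_currentCallLevel;
}

Variant& local(int offset) {              // O(1) access, no name lookup
    return g_localVariant[g_currentCallLevel][offset];
}
```

The key point is that `local(offset)` is two array indexings, compared to a string search through a symbol table on every variable reference.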

That's it for now - hopefully we can work further on this.

Edited by tylo

blub


After adding the caching and only getting about a 20% speed up (I thought it would be way more), I'm not sure the lexer is the bottleneck anymore. I think it may be the expression parser and the way Variants are moved around internally. At some point I'm going to create a completely pre-lexed script and see how fast it runs - that should show what the slow part is.

