Experimental (Academic) AutoIT Script Interpreter [C++]

twitchyliquid64 · January 3, 2012

For a long time now, I have been curious about the dealing of Interpreters, and compilers, and often, this curiousity manifests into some experimentation.

My Interpreter (nicknamed PARADIME) is an attempt at interpreting the autoit syntax, to gain a better understanding of how AutoIT 'ticks' and also to cure my curiousity to see if I can write an interpreter for an existing language.

The current (UNFINISHED) result I am quite happy with. A great deal of the syntatical features of autoit are implemented, with most intended to be implemented.

The Following functionality operates correctly in my Interpreter:

Global Declarations
'=' Assignments
Function calls (including recursive)
Variant Datatype (Implementing Arrays, INT32, INT64, double, string)
Operators: + - / * > < <> >= <= = ==
Singleline IF
Multiline IF
WHILE statements
About 20 macros
About 12 Builtin Functions{
ConsoleWrite
FileRead
FileOpen
FileClose
MsgBox(Non optional params only)
Stringlen
StringLeft/Right
StringTrimLeft/Right
TimerInit
(TimerDiff() is bugged, however)
}
Arrays

For example, the following code will execute correctly.

$t = 1
$t2 = 2

if $t = $t2 then MsgBox(48, "TEST", "EQUALITY")
if $t <> $t2 then MsgBox(48, "TEST", "NOT EQUALITY")

Global $mate = 89, $eee, $f = 55, $arraydestroytest[65000]


MsgBox(48, "TEST", "This Code is running in Paradime: " & $eee)


While $mate < 4000
    $arraydestroytest[$mate] = $mate
    $mate = $mate + 1
    $r = "FFFF" & "00043"
WEnd
MsgBox(48, "TEST", "This Code is running in Paradime: " & $mate & " " & $arraydestroytest[$mate-1])

COnsoleWrite(Stringlen("LOL RECURSION"))
ConsoleWrite("Macro Test:" & @LF)
ConsoleWrite("Program files: " & @PROGRAMFILESDIR & @LF)
ConsoleWrite("Common files: " & @CommonFilesDir & @CR)
ConsoleWrite("My Documents: " & @MyDocumentsDir & @CR)
ConsoleWrite("AppDataC files: " & @AppDataCommonDir & @CR)
ConsoleWrite("DesktopC files: " & @DesktopCommonDir & @CR)
ConsoleWrite("DocumentsC files: " & @DocumentsCommonDir & @CR)
ConsoleWrite("FavouritesC files: " & @FavoritesCommonDir & @CR)
ConsoleWrite("ProgramsC files: " & @ProgramsCommonDir & @CR)
ConsoleWrite("StartMC files: " & @StartMenuCommonDir & @CR)
ConsoleWrite("Startup files: " & @StartupCommonDir & @CR)
ConsoleWrite("AppData files: " & @AppDataDir & @CR)
ConsoleWrite("Desktop files: " & @DesktopDir & @CR)
ConsoleWrite("Favs files: " & @FavoritesDir & @CR)
ConsoleWrite("Program files: " & @ProgramsDir & @CR)
ConsoleWrite("Start Menu files: " & @StartMenuDir & @CR)
ConsoleWrite("Startup files: " & @StartupDir & @CR)

ConsoleWrite(@CRLF & "Computer: " & @ComputerName & @CR)
ConsoleWrite("WIN: " & @WindowsDir & @CR)
ConsoleWrite("Working: " & @WorkingDir & @CR)
ConsoleWrite("System: " & @SystemDir & @CR)
ConsoleWrite("IP1: " & @IPAddress1 & @CR)
ConsoleWrite("IP2: " & @IPAddress2 & @CR)
ConsoleWrite("IP3: " & @IPAddress3 & @CR)
ConsoleWrite("IP4: " & @IPAddress4 & @CR)
ConsoleWrite("TempDir: " & @TempDir & @CR)
ConsoleWrite("Username: " & @UserName & @CR)
ConsoleWrite("HomeDrive: " & @HomeDrive & @CR)
ConsoleWrite("HomePath: " & @HomePath & @CR)
ConsoleWrite("HomeShare: " & @HomeShare & @CR)
ConsoleWrite("LogonServer: " & @LogonServer & @CR)
ConsoleWrite("LogonDomain: " & @LogonDomain & @CR)
ConsoleWrite("LogonDNSDomain: " & @LogonDNSDomain & @CR)

Academic Discourse:

The biggest thing that surprised me was how well written/optimized AutoIT was (or how inefficient a C++ coder I am, having 6 months of experience ^^')

My interpreter runs approximately 5.4X slower than the AutoIT interpreter, dispite the datastructures being similar. My guess is that these speed differences are due to two things:

-Pointer passing: nearly everything large in the public autoit source has its pointer passed around as opposed to the datatype. Substantial portions of my code do not pointer-pass, reducing speed. Also, my inexperience/rush in writing this would attenuate this with potentially inferior code (relative to autoIT)

-Operator evaluation: I originally thought that AutoIT's decision to treat every operand as a VARIANT class would incur a noticable overhead, so I thought I could sidestep it by using my original TOKEN datastructure from the lexing stage. Now I realise that this overhead is unavoidable, as im still doing typechecking and conversions with the token datatype. The only difference is Jonathon is doing it in a pretty little variant class and my parser_eval.cpp is littered with switch statements for every operand possibility for every operator. (Please dont look at the source, you will cry).

PARADIME Implements, from scratch,

-Custom Lexer/tokeniser

-Stateful Recursive Decent Parser

-Shunting yard algorithm for expression evaluation

-implements std::map for Variable and builtinfunction pointer lookup

-implements std::vector for token storage.

Parser

Evidently, I have attempted to deviate from Jonathons chosen parsing approach to test the validity of other algorithms, and initial results indicate that my parsing model is applicable.

Both of our interpreters use the recursive decent model for traversing nested structures. Paradime has various parsing states transparent to parsing of the tokens themselves. The two main states are EXEC and IGNORE, where EXEC, executes the code up to the corresponding end of the code block (ENDIF, WEND etc), whereas IGNORE 'ignores' the contained code. I did not quite understand how Jon traversed nested structures, so I cannot comment further on his methods here.

Handling of Expressions is done entirely different on the two interpreters. Jonathon uses a LALR Shift/Reduce Algorithm, where as I use dijkastras shunting yard algorithm. Thus far, both approaches seem entirely applicable.

Variant Storage:

Done the same on both interpreters. Array handling code is practically copied, It was better than anything I could ever make.

Lookup Speed:

One other thing I noticed is that Macros and Builtin functions have no optimal lookup table (in the public autoIT source). Perhaps, to improve speed, these things could be stored in a red/black binary tree to increase efficiency?

Conclusion:

All in all, the parsing and interpreting backbone is a magnificent piece of work, and all my attempts to replicate it and deconstruct (from the publicsource) it have only increased my sense of awe. I express my most sincere thanks to the autoit developers for such, and I hope that development of AutoIT never stops. One day, when I get out of highschool I would like to develop autoIT, who knows.

Paradime Sourcecode:

As previously mentioned, the vast majority of the sourcecode is created from scratch. However, the There was no point re-inventing the wheel when implementing some macros and some builtins, and the code for array handling in variants, and one or two syntatical expressions. These elements of the sourcecode are clearly labelled at the top and have the GNU license attached (Code from before Autoit went to closedsource). Credit is clearly given.

Please dont look at it. It is poorly written, undercommented, and due to my bad choice to use the token structure as the operand structure, a good deal of parsing logic is littered in hundreds of lines of switch statements. (eww)

http://code.google.com/p/paradime-interpreter/source/browse/#hg%2FParadime%2Fcore

Please, dont judge me.

SciTE integration:

Thanks to LaCastiglione:

command.38.*.au3="C:Paradime.exe" "$(FilePath)"
command.name.38.*.au3=Paradime
command.save.before.38.*.au3=1
command.shortcut.38.*.au3=Ctrl+F7

Drop Paradime.exe into your C: drive.

Future of Paradime:

I will implement NOT, AND, OR, FOR-NEXT, SWITCH-CASE-ENDSWITCH, and user defined functions. Then I will deviate from autoit, exploring new, custom language constructs, but thats another academic project entirely.

-hyperzap

Edited January 3, 2012 by twitchyliquid64

Jon · January 3, 2012

The internals have progressed quite a bit since that public source code as well - some parts from scratch. There's lots of weird optimizations. Quite a lot of Copy-on-write activity which really sped things up a lot as well. The main slow down the last time I checked was the eval code which still creates a lot of copies of data as it works the values out using stacks - it's one of the scarier areas to contemplate rewriting though...

Edited January 3, 2012 by Jon

Jon · January 3, 2012

Lookup Speed:
One other thing I noticed is that Macros and Builtin functions have no optimal lookup table (in the public autoIT source). Perhaps, to improve speed, these things could be stored in a red/black binary tree to increase efficiency?

I'd need to check but I think I changed macro/built-in function lookup to be resolved during the lexing/token stage so it didn't do a runtime lookup (the token contains an index to the function).

For user functions, the names of all the functions are stored in a sorted list - which then uses a binary search for lookup.

Variable lookup is done with splay trees.

Edited January 3, 2012 by Jon

GEOSoft · January 3, 2012

I have the feeling that there isn't much of the current code that even resembles the last public released source code. Anyone looking at that old source shouldn't be under the impression that it still looks like that.

Valik · January 3, 2012

It doesn't look anything like that, really. Some parts have been re-written multiple times.

twitchyliquid64 · January 3, 2012

I have the feeling that there isn't much of the current code that even resembles the last public released source code. Anyone looking at that old source shouldn't be under the impression that it still looks like that.

I am/was aware that there were differences in the source code, and thus I have modelled my studies based on the following underlying assumption:

The available autoit source is an implementation of behaviour. Any revision is based on the 
fundamental elements of this implementation. (for instance, I would expect the token structure
 to be mostly the same, and the variant structure to be similar save the addition of binary 
and Boolean types. Furthermore, the shift/reduce algorithm and recursive decent are unlikely
 to be changed much, save optimization)

Edited January 3, 2012 by twitchyliquid64

twitchyliquid64 · January 3, 2012

The internals have progressed quite a bit since that public source code as well - some parts from scratch. There's lots of weird optimizations. Quite a lot of Copy-on-write activity which really sped things up a lot as well. The main slow down the last time I checked was the eval code which still creates a lot of copies of data as it works the values out using stacks - it's one of the scarier areas to contemplate rewriting though...

I don't believe I know of a method of expression parsing that does not use stacks. (perhaps manadar /mat can jump in on this one).

The most optimal thing I can think of would be to convert all expressions to RPN form at compile time, then all you would need is one simple token/variant pointer stack to evaluate the expression.

twitchyliquid64 · January 3, 2012

It doesn't look anything like that, really. Some parts have been re-written multiple times.

How much different?

I don't understand the point of having a public version available, so upcoming developers understand how to integrate functionality into the interpreter, only to let the public version remain un-updated to the point it cannot be used as a introduction point for developers.

Is this the case? Or are the internals roughly the same (save optimizations)?

jvanegmond · January 3, 2012

I don't believe I know of a method of expression parsing that does not use stacks. (perhaps manadar /mat can jump in on this one).

The point is not to not use stacks.

Also, inb4 pratt parser.

GEOSoft · January 3, 2012

The public release version of the source really isn't very relevant any more.

That was back in the days when AutoIt was open source which no longer applies and that is why it's never updated.

Valik · January 4, 2012

George hit it. The source is available because that version of AutoIt is open source. No other reason really.

Your assumptions are incorrect. The Variant class - for example - has been re-written 2 or 3 times. Much of the core of AutoIt is different. Some of the functions are maybe the same give or take a bug-fix.

jchd · January 4, 2012

At this point the published code (which I have no intention to ever look at) is probably more a "how not to do it" example than anything else.

twitchyliquid64 · January 4, 2012

At this point the published code (which I have no intention to ever look at) is probably more a "how not to do it" example than anything else.

I disagree entirely.

It's well written, it's just not optimal and as valik said, elements have been rewritten as better methods have become known.

(I'm only talking about the interpreter core here, btw)

twitchyliquid64 · January 4, 2012

George hit it. The source is available because that version of AutoIt is open source. No other reason really.
Your assumptions are incorrect. The Variant class - for example - has been re-written 2 or 3 times. Much of the core of AutoIt is different. Some of the functions are maybe the same give or take a bug-fix.

Wow...2/3 times??? Really??? What was wrong with it to sanction those re-writes! It seemed quite fine to me.

Valik · January 4, 2012

At this point the published code (which I have no intention to ever look at) is probably more a "how not to do it" example than anything else.

This.

It's well written

No. No it's not.

Wow...2/3 times??? Really??? What was wrong with it to sanction those re-writes! It seemed quite fine to me.

Everything. It's still hopelessly bad but it's massive and it works. The same can be said for quite a number or parts of AutoIt.

Richard Robertson · January 4, 2012

Do you know if any of the original code is still there?

Valik · January 4, 2012

I'm sure there's lots of original code still there. What that may be, though, I do not know.

twitchyliquid64 · January 4, 2012

Everything. It's still hopelessly bad but it's massive and it works. The same can be said for quite a number or parts of AutoIt.

Can you be more specific? WHY is it hopelessly bad? What design goals does it not achieve, how does it underperform the expectations you would have for an 'ideal' Variant Class?

czardas · January 4, 2012

I imagine those are the parts we're not allowed to see.

Valik · January 4, 2012

It's a bloated mess of inter-connected pieces that should be separate. It uses about a billion switch statements to do what C++ can do for you if you know what abstract classes are.

Experimental (Academic) AutoIT Script Interpreter [C++]

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Create an account or sign in to comment

Create an account

Sign in

Recently Browsing 0 members

Similar Content