Sign in to follow this  
Followers 0
JohnDeHope3

Compilers And Interpreters

15 posts in this topic

I have this driving facination with writing an interpreter. I am particularly impressed with the complexity of AutoIt 3 versus the older 2 version. I wonder if the developers have any suggestion (book, website) that they think is particularly valuable to learn from? I've already written a tokenizer and a parser, for reference, but I'm not a C/C++ programmer. Of course if you have to learn C to do it, then maybe I'll have to learn C.

Share this post


Link to post
Share on other sites



Hey those're some great links. Thanks!

Share this post


Link to post
Share on other sites

Well Jon tell me this, what sort of intermediate language does AutoIt use? Does it compile to bytecode or is everything held in data structures in a tree or what? That's really where I am at in my journey. I can take a source code files, tokenize it, parse it to ensure the syntax is correct, but now I can't figure out how to make an interpreter for my list of valid tokens. I'm not so much looking for advice, but to know where cool people (Jon) went so maybe I'll know the direction and can move towards it.

Thanks!

Share this post


Link to post
Share on other sites

Well Jon tell me this, what sort of intermediate language does AutoIt use? Does it compile to bytecode or is everything held in data structures in a tree or what? That's really where I am at in my journey. I can take a source code files, tokenize it, parse it to ensure the syntax is correct, but now I can't figure out how to make an interpreter for my list of valid tokens. I'm not so much looking for advice, but to know where cool people (Jon) went so maybe I'll know the direction and can move towards it.

Thanks!

Nothing fancy. It parses line by line into an array of tokens. The tokens are things like TOK_EQUALS, TOK_VARIABLE, etc. I think at this point you could run it through a grammar checker to check that the line is valid, but for simplicity I did it keyword by keyword.

For example the first token is checked, and it is "IF" so then the "Keyword_IF" function is called and this just does rough checking. Is the IF keyword followed by an expression? Is it then followed by a THEN keyword...etc. Pretty basic. I have a generic "evaluate expression" function that takes a stream of tokens and works out the result (or gives an error).

Share this post


Link to post
Share on other sites

Can I ask, if the if is found to be false, how you decide where to begin executing the else part, without even know if there is one? I have an answer I have been thinking of using, but I am curious to hear yours. I hope yours is the same because mine is the only one I've even been able to imagine, much less think I can code.

What about functions? Those seem really tricky to me. I am not sure if I should maintain them as a totally separate list of tokens, or if I should just maintain one huge list and jump around inside of it.

Share this post


Link to post
Share on other sites

Basically that is the idea (jump around inside the list of statements). I am looking at how to simplify the way that we call user-defined functions.

If the IF expression fails then it either jump to the next line (single-line If) or jumps to the end of the structure. AutoIt scans all lines before executing to find all the beginnings and endings of the structures, so it checks to see where the If on line 40 ends by reading the list of structures identified at the beginning.

Yeah, we did the whole thing in C++. :ph34r:


David Nuttall
Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius

AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...

Share this post


Link to post
Share on other sites

Thanks for the info Nutster. Let me see if I have this straight. If I do I might venture forth and try to take a look at the source code of the compiler (shudder).

You make a nice long (linked?) list of tokens. Then for the IF tokens you scan the list and look for important spots, like the coresponding ELSE or ENDIF, and cache that information (where, in the IF token?). For function calls you put each parameter in the function call on the data stack and then jump to the (pre-cached) token where the function definition is at, and start executing it.

Does that sound about right?

Share this post


Link to post
Share on other sites

A list of lines, each of which has a list of tokens, to be specific. The information about structure start/stop is stored in yet another list.

Something like: IF, Stars on Line 40, ends Line 51 (ELSEIF, ELSE, ENDIF)

TOK_IF, 40, 51

For functions, the values from the actual parameters are evaluated and stored in an array that gets passed to the C++ function that actually runs the UDF by jumping to the first line of the function. When the EndFunc or Return is hit, the function returns to the place where it was called. I am looking at a way to improve this operation.


David Nuttall
Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius

AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...

Share this post


Link to post
Share on other sites

I am looking at a way to improve this operation.

Can I ask what you are seeking to improve?

A list of lines, each of which has a list of tokens, to be specific.

Why do the tokens remain in "lines"? Is it just to be able to say which line it's on? What if a line has more than one statement? Or perhaps AutoIt doesn't handle that.

Something like: IF, Stars on Line 40, ends Line 51

Why not maintain a pointer to the token where execution would jump to, rather than a line number?

I'm not arguing with you, I just want to understand the logic. I really apreciate your time and insight :ph34r:

Share this post


Link to post
Share on other sites

AutoIt is line-oriented so it is fairly easy to manage things by a list or array of lines, where each line is stored as a series of tokens. This also makes it easier to manage error reporting, where we tell the user which line the error occured on. The only times multiple commands are allowed on a line is in the single-line If statement, where a command is executed if the If expression is true.

This morning, I submitted some of my improvements to Jon, our project manager here at the AutoIt International Development Team. :ph34r: We are using a stack to keep track of which structure is the current one and where it starts and ends. When we hit a ContinueLoop or an ExitLoop, then we start popping elements until we get to the loop that we want to manipulate.

You could use a token location instead of line number, just as easily, I guess. But you need to keep track of where the token is from in the source file to report errors to your users, unless you are compiling it, then your object file might not need that much detail.

We handle built-in functions and user-defined functions (UDF's) a little differently. Built-in functions are passed the evaluations of the actual parameters in an array. The actual parameters of UDF's are assigned to a set of local variables for the function. Some of the functions that are assigned ByRef actually have the VariableTable point to the original variable, not a new storage with the evaluated result of the actual parameters, which is what happens with the other parameters.

I checked last night and determined that the first pass checks are not used to determine the start and end locations. For each structure, we read the the lines to see which keywords (the first token when we are dealing with structures) are meaningful to the structure. i.e. For looks for Next. If looks for ElseIf, Else and EndIf and so on.

Hope this helps.


David Nuttall
Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius

AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...

Share this post


Link to post
Share on other sites

It really does thank's Nutster. I bought a book over the weekend that tackles this subject from a Java perspective, very heavy in OOP. I got some really cool ideas out of it. But I need an OOP language to do the things they do they way they do them. I appreciate all your help and information!

Share this post


Link to post
Share on other sites

Would it be easy then to parse lines having multiple statements on a single line using a delimeter such as a : or ! or :: or !! or whatever ?


Agreement is not necessary - thinking for one's self is!

My-Colors.jpg

cuniform2.gif

Share this post


Link to post
Share on other sites

Anybody resurrecting a 2 year old thread should be banned on the spot. Especially when it's resurrected with stupid ideas.

Share this post


Link to post
Share on other sites

Then consider me banned.


Agreement is not necessary - thinking for one's self is!

My-Colors.jpg

cuniform2.gif

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0