Jump to content
Sign in to follow this  

Pratt's parser

Recommended Posts

For those of you (and I know from previous discussions that there are people out there) making your own compilers and interpreters and choosing not to use tools like yacc.

Pratt's parser is a predictive parser, but with a number of improvements that make it so easy to implement you wonder why in the 38 since the original paper was published it hasn't been used in more than a small handful of projects.

Look it up if you ever consider hand writing a parser. I managed to write a calculator from first principles in 30 mins. 20 minutes of that was writing the lexer.


Edit: Just to give you an idea of how unused this is... Google "Pratt's Parser" and you'll get this thread as the first result.

Edited by Mat

Share this post

Link to post
Share on other sites

hmmm. maybe, considering I'm doing a complete re-write of my project in C++.

ongoing projects:-firestorm: Largescale P2P Social NetworkCompleted Autoit Programs/Scripts: Variable Pickler | Networked Streaming Audio (in pure autoIT) | firenet p2p web messenger | Proxy Checker | Dynamic Execute() Code Generator | P2P UDF | Graph Theory Proof of Concept - Breadth First search

Share this post

Link to post
Share on other sites

Nice thread, Mat.

Any docs/links too? Would be great.


Simple: http://effbot.org/zone/simple-top-down-parsing.htm

More in depth: http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/

Also in depth: http://javascript.crockford.com/tdop/tdop.html

I'll also tidy up the source code for the calculator I made and post that. All I need to add is a tertiary operator to prove that this parser can do anything.

Share this post

Link to post
Share on other sites

I was thinking of that, unfortunately I got started in C as I was going to be using asm to implement the final thing and I didn't want one of those algorithms that looked nice in python when you don't worry about data types. It would be easy to do. Especially given that autoit makes it almost as easy as python.

Share this post

Link to post
Share on other sites

Ok, I tidied it up a bit to show you how nice it is to build it incrementally. Initially, for binary operators and no lexer, it was an 80 line C program, of which a lot were utility functions to do with the functions rather than the parser. To make the code easier to understand it only works with integers.

Usually I'd seperate this out into different files, but for now I'm posting them as standalone c programs... They are all tested compiling on gcc. For those involving sqrt you'll need to link the maths library in (gcc -lm file.c).

#include <stdio.h>
#define TOK_END 0
#define TOK_INT 1
#define TOK_OP  2
#define OP_PLUS 0
#define OP_MUL  1
// 7 + 2 * 3
int pos = 0;
struct TOKEN
    int Type;
    int Val;
} test[] = {
    {TOK_INT, 7},
    {TOK_OP, OP_PLUS},
    {TOK_INT, 2},
    {TOK_OP, OP_MUL},
    {TOK_INT, 3}
}, tok;
int Op_Add(int left, int right)
    return left + right;
int Op_Mul(int left, int right)
    return left * right;
    int Precedence;
    void* func;
} Ops[] = {
int lbp(struct TOKEN token)
    if (token.Type == TOK_OP)
        return Ops[token.Val].Precedence;
    return 0;
int led(struct TOKEN token, int left)
    int right = expression(lbp(token));
    int (*fun)(int, int) = Ops[token.Val].func;
    return fun(left, right);
int nud(struct TOKEN token)
    if (token.Type == TOK_INT)
        return token.Val;
    return -1;
int expression(int rbp)
    struct TOKEN token = tok;
    tok = test[++pos];
    int left = nud(token);
    while (lbp(tok) > rbp)
        token = tok;
        tok = test[++pos];
        left = led(token, left);
    return left;
int main(int argc, char **argv)
    tok = test[0];
    printf("%i\n", expression(0));
    return 0;

(Simple.c Simple.exe)

It then got more complex gradually:

Next up is error checking. Again it's very easy to add, as you just need to add default statements to the switches and else statements for ifs for unexpeced tokens. After that I'll do the same for AutoIt.

The lexer is a mess at the moment, so I'll tidy that up sometime.

If anyones interested about what I did to go from one step to the next then just ask :graduated:


Edit: Added exes.

Edited by Mat

Share this post

Link to post
Share on other sites

And for those interested: Simple.au3

Cool. :graduated:

btw, Why did you make the distinction between TOK_OP with params and TOK_INT with params instead of just TOK_INT and TOK_ADD, TOK_MUL? Only because python example does it like that?

$s = "7 + 2 * 3"
Global $test = lex($s)

MsgBox(0, $s, expression(0))

Func lex($s)
    Local $out[100][2]
    $arr = StringSplit($s, '', 2)
    $n = 0
    For $i = 0 To UBound($arr)-1
        Switch $arr[$i]
            Case " "
            Case "0" To "9"
                $out[$n][0] = $TOK_INT
                $out[$n][1] = lexNumber($arr, $i)
                $n += 1
            Case "+"
                $out[$n][0] = $TOK_OP
                $out[$n][1] = $OP_ADD
                $n += 1
            Case "*"
                $out[$n][0] = $TOK_OP
                $out[$n][1] = $OP_MUL
                $n += 1
    $out[$n][0] = 0
    $out[$n][1] = 0
    Return $out

Func lexNumber($s, ByRef $i)
    $out = ""
    While $s[$i] >= "0" And $s[$i] <= "9"
        $out &= $s[$i]
        $i += 1

        If $i = UBound($s) Then ExitLoop
    $i -= 1
    Return $out
Edited by Manadar

Share this post

Link to post
Share on other sites

Thats a nice simple way to do it... I was thinking of using the operator array and adding strings to it rather than looking for the tokens in the switch statement... It would make it very easy to add operators.

Also, I think that it would be possible to lose the global $test, probably by combining the nud,led and expression functions and putting it in there as a static.

Share this post

Link to post
Share on other sites

Well... Actually the correct way to do it is to make the lex function yeild tokens. My version of the parser uses $pos and $pos-1 because I was lazy and AutoIt doesn't do structs very easily. If you look at the c examples where I use a struct for tokens, it copies the token, then fills tok with the next one. (or it should do anyway, I might have been lazy there as well)

Global $tok

; int calc(string)
Func calc($expr)
    $tok = lexGetToken($expr)

    Return expression(0)

; int nud(TOKEN)
Func nud($token)

; int led(TOKEN)
Func led($token)

; int expression(int)
Func expression($rbp)
    Local $token = $tok  ; Stores the last token
    $tok = lexGetToken() ; Fills the current one

    Local $left = nud($token) ; Rather than using $pos-1


    return $left

; TOKEN lexGetToken( [string] )
Func lexGetToken($in = "")
    Local static $pos = 0
    Local static $str = $in

    Local $ret[2] ; Token of the form [TYPE, VALUE]


    Return $ret

The problem is that it gets very messy passing 2 element arrays around all the time. And accessing them is not nearly as easy to read as tok.Type and tok.Val.

Because nud and led never call themselves, only calling expression, it is possible to merge them into a single function, and replace Global $tok with Local Static $tok. expression calls lexGetToken and the user only has to call calc. Nice and clean.

It's also possible to take out expression calling itself completely by using a stack, but that would be pointless in AutoIt. It does work well in asm though.

Edited by Mat

Share this post

Link to post
Share on other sites

btw, Why did you make the distinction between TOK_OP with params and TOK_INT with params instead of just TOK_INT and TOK_ADD, TOK_MUL? Only because python example does it like that?

Sorry, didn't see this.

The reason is that it makes it considerably easier to add and manage operators (including dynamically: More on that later).

You'll see in the C examples that there is an array of operators, with details like flags, precedence and then a pointer to the function to call. When I implemented the lexer, I didn't hard-program the operators in, instead added a pointer to the string containing the operator to that array. The lexer then searches through the operators and returns {TOK_OP,OP_????}. When I add an operator, I add a OP_* constant (optional btw), write the function for the operator and add the operator details to the array. Simple :graduated: It won't get any easier than that. There will be one more operator flag (User) which would mean user defined operators are handled just as easily, with a slightly different pointer to the function.

Additionally, most operators behave the same, and those that are different it is managed using their flags (Right associative, unary and postfix). This makes functions like lbp only a few lines long as it just needs to check that the token is of type TOK_OP and then read the precedence from the table. Obvious exceptions are those that have more than one meaning (+, -).

The last reason is because of the language I am ultimately going to use this for. It allows definition of operators, and an operator is any string that passes ispunct and isn't a special character like parenthesis/brackets.

Share this post

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
Sign in to follow this  

  • Recently Browsing   0 members

    No registered users viewing this page.

  • Similar Content

    • By Colduction
      Hi again guys!, i had COVID-19 for twice and i couldn't check the forum since 3 or 4 months ago till now! i hope you will get better if you're fighting for beat COVID-19

      I have two question, first is about extracting all of the IP Address from an IP Ranges, for e.g: (Start and End are variable and will be defined by the user) and for second one, i have a friend that he is Python programmer, he made a IP Parser that it can support large txt files (1TB) and it can parse all of them under 10min and it also supports low-end PCs that have 1 GB RAM!

      The list that his program parses are:
      #1765497 8082 #1765496 8084 #1965493 8089 #9565495 8086 #2565492 8081 and it converts very very fast to this: I wonder how to do this via AutoIt, if you can help me in this way, i will be happy✌❤

      Thanks for your helps.

    • By Alok_Arora
      Hello Sir, 
      While searching for the solution to my problem, I have just gone through some of your old post. 
      My name is Alok Arora 
      Email id is *snip*
      I am writing a code to store logs for each script I run. This I did by using FileWriteLog function and it is successfully storing logs in txt file. 
      Now, I have to work on to read the log file and if any script has been mistakenly clicked twice in a day script will pop up a message that task already done for the day by verifying entries in the log file. 
      I have the logic for it.. 
      I mean a variable will read the log file and will search for the entry and will perform action if entry found or not. 
      I believe for reading the file I can write a code like
      $i =fileread("logfile.txt")
      Now I am stuck on how to compare the log entry in log file. 
      Can you please help me in this? 
    • By TheDcoder
      See this thread for info:
    • By TheAutomator
      I'm writing a recursive decent parser in Autoit!
      The programming language i'm making is called HighLevel.
      I'm doing this for learning purposes, because it's fun and because I can implement it into my other project:
      Fullscreen Console With custom programming language!
      It's not easy...

      In Autoit you don't have objects like in Java or Visual Basic, so I had to figure out a way to still convert the code to an abstract syntax tree.
      I used nested array's and array based dictionary's instead of objects.
      The code is still very dirty and I need to make a lot of modifications but if you're careful with testing you'll see what it can do already.
      Console window
      Because this code eventually will get implemented into my console project I crafted a nice little console window (with a custom sci-fi looking theme, yeah i was a little bored haha).
      {ESC} is your panic button for now, it terminates the script completely.
      If you get an error while opening a script the text will turn red.
      To minimize it press the blue button, to close it use the red one, to drag the gui just grab it on one of the sides.
      The console window will display what you write to it with your "HighLevel-script" and some additional information:

      How to test it:
      Download: HighLevel.Au3, Debug.Au3 (includes a function to display nested arrays for debugging), GUI.bmp (for the console)
      Compile the Autoit code to EXE.
      The GUI.bmp must be in the same folder as the EXE file!
      Write a HighLevel-script (text file) and drag it into the compiled autoit-exe.
      The custom made little console window will pop up in the left top corner of your screen and your HighLevel-script (the text file) will be interpreted and executed.
      The Language:
      exit script:     Abort      show / hide the console:     Show     Hide      write to/clear the console:     Write 'this is a ''string''!'     Clear variables: test_var_1 = 123 some_list = ['a', 5, true] some_list[1] = 3 math = 1 + 2 * 3 / 4 - -5 & test_var beep (under construction):     Beep F, optD wait X seconds:     Wait X      Messages:     Message 'Hello World!'      move/click the mouse:     Move X, Y     Click      send keys (under construction):     Send 'HighLevel', True      if's:     If false     ElseIf true         # this part will run     Else     End subs:     Sub X         # do stuff     End     Call X      for loops:     For X = 1 to 10         # X iterates     End Values:     Input 'Give me input'     Random     YesNo 'yes or no' operators:     + - * / & > = ! < ( ) And Not Or  
      Example script:
      # my first HighLevel script message 'Hello World!' message 'Lets write to the console...' clear # clear the console... list = ['a', 16, true] for i = 0 to 2     write list[i]     wait 1 end sub test     if YesNo 'would you like to quit?'         message 'Goodbye!'         abort     else         write 1 + 2 * 3 & ' math!'     end end call test  
      test script.HighLevel
    • By Fade91
      Hello, first time poster here
      I am working on a project that has to parse a log file in real time. The thing is I know it's hard for Autoit to attach itself to log files when they're already in use by other programs, at least in my experience.
      I was taking a look at this thread because the log file is quite large and I think Autoit might be a little slow on it's own.

      The thing is I don't know how to use this properly to extract all data out of a log file or is there a native way to do this using Autoit.
      Basically , I just need a log parser that is able to read from a log that is 'already opened' and 'being written to'
  • Create New...