Sign in to follow this  
Followers 0
Mat

Pratt's parser

13 posts in this topic

#1 ·  Posted (edited)

For those of you (and I know from previous discussions that there are people out there) making your own compilers and interpreters and choosing not to use tools like yacc.

Pratt's parser is a predictive parser, but with a number of improvements that make it so easy to implement you wonder why in the 38 since the original paper was published it hasn't been used in more than a small handful of projects.

Look it up if you ever consider hand writing a parser. I managed to write a calculator from first principles in 30 mins. 20 minutes of that was writing the lexer.

Mat

Edit: Just to give you an idea of how unused this is... Google "Pratt's Parser" and you'll get this thread as the first result.

Edited by Mat

Share this post


Link to post
Share on other sites



hmmm. maybe, considering I'm doing a complete re-write of my project in C++.


ongoing projects:-firestorm: Largescale P2P Social NetworkCompleted Autoit Programs/Scripts: Variable Pickler | Networked Streaming Audio (in pure autoIT) | firenet p2p web messenger | Proxy Checker | Dynamic Execute() Code Generator | P2P UDF | Graph Theory Proof of Concept - Breadth First search

Share this post


Link to post
Share on other sites

Nice thread, Mat.

Any docs/links too? Would be great.

Well...

Simple: http://effbot.org/zone/simple-top-down-parsing.htm

More in depth: http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/

Also in depth: http://javascript.crockford.com/tdop/tdop.html

I'll also tidy up the source code for the calculator I made and post that. All I need to add is a tertiary operator to prove that this parser can do anything.

Share this post


Link to post
Share on other sites

I was thinking of that, unfortunately I got started in C as I was going to be using asm to implement the final thing and I didn't want one of those algorithms that looked nice in python when you don't worry about data types. It would be easy to do. Especially given that autoit makes it almost as easy as python.

Share this post


Link to post
Share on other sites

#7 ·  Posted (edited)

Ok, I tidied it up a bit to show you how nice it is to build it incrementally. Initially, for binary operators and no lexer, it was an 80 line C program, of which a lot were utility functions to do with the functions rather than the parser. To make the code easier to understand it only works with integers.

Usually I'd seperate this out into different files, but for now I'm posting them as standalone c programs... They are all tested compiling on gcc. For those involving sqrt you'll need to link the maths library in (gcc -lm file.c).

#include <stdio.h>
 
#define TOK_END 0
#define TOK_INT 1
#define TOK_OP  2
 
#define OP_PLUS 0
#define OP_MUL  1
 
// 7 + 2 * 3
 
int pos = 0;
struct TOKEN
{
    int Type;
    int Val;
} test[] = {
    {TOK_INT, 7},
    {TOK_OP, OP_PLUS},
    {TOK_INT, 2},
    {TOK_OP, OP_MUL},
    {TOK_INT, 3}
}, tok;
 
int Op_Add(int left, int right)
{
    return left + right;
}
 
int Op_Mul(int left, int right)
{
    return left * right;
}
 
struct OPERATOR
{
    int Precedence;
    void* func;
} Ops[] = {
    {30,&Op_Add},
    {40,&Op_Mul}
};
 
int lbp(struct TOKEN token)
{
    if (token.Type == TOK_OP)
        return Ops[token.Val].Precedence;
    return 0;
}
 
int led(struct TOKEN token, int left)
{
    int right = expression(lbp(token));
 
    int (*fun)(int, int) = Ops[token.Val].func;
    return fun(left, right);
}
 
int nud(struct TOKEN token)
{
    if (token.Type == TOK_INT)
        return token.Val;
    return -1;
}
 
int expression(int rbp)
{
    struct TOKEN token = tok;
    tok = test[++pos];
    int left = nud(token);
 
    while (lbp(tok) > rbp)
    {
        token = tok;
        tok = test[++pos];
        left = led(token, left);
    }
 
    return left;
}
 
int main(int argc, char **argv)
{
    tok = test[0];
 
    printf("%i\n", expression(0));
    return 0;
}

(Simple.c Simple.exe)

It then got more complex gradually:

Next up is error checking. Again it's very easy to add, as you just need to add default statements to the switches and else statements for ifs for unexpeced tokens. After that I'll do the same for AutoIt.

The lexer is a mess at the moment, so I'll tidy that up sometime.

If anyones interested about what I did to go from one step to the next then just ask :graduated:

Mat

Edit: Added exes.

Edited by Mat

Share this post


Link to post
Share on other sites

#9 ·  Posted (edited)

And for those interested: Simple.au3

Cool. :graduated:

btw, Why did you make the distinction between TOK_OP with params and TOK_INT with params instead of just TOK_INT and TOK_ADD, TOK_MUL? Only because python example does it like that?

$s = "7 + 2 * 3"
Global $test = lex($s)

MsgBox(0, $s, expression(0))


Func lex($s)
    Local $out[100][2]
    $arr = StringSplit($s, '', 2)
    $n = 0
    For $i = 0 To UBound($arr)-1
        Switch $arr[$i]
            Case " "
                ContinueLoop
            Case "0" To "9"
                $out[$n][0] = $TOK_INT
                $out[$n][1] = lexNumber($arr, $i)
                $n += 1
            Case "+"
                $out[$n][0] = $TOK_OP
                $out[$n][1] = $OP_ADD
                $n += 1
            Case "*"
                $out[$n][0] = $TOK_OP
                $out[$n][1] = $OP_MUL
                $n += 1
        EndSwitch
    Next
    $out[$n][0] = 0
    $out[$n][1] = 0
    Return $out
EndFunc

Func lexNumber($s, ByRef $i)
    $out = ""
    While $s[$i] >= "0" And $s[$i] <= "9"
        $out &= $s[$i]
        $i += 1

        If $i = UBound($s) Then ExitLoop
    WEnd
    $i -= 1
    Return $out
EndFunc
Edited by Manadar

Share this post


Link to post
Share on other sites

Thats a nice simple way to do it... I was thinking of using the operator array and adding strings to it rather than looking for the tokens in the switch statement... It would make it very easy to add operators.

Also, I think that it would be possible to lose the global $test, probably by combining the nud,led and expression functions and putting it in there as a static.

Share this post


Link to post
Share on other sites

Also, I think that it would be possible to lose the global $test, probably by combining the nud,led and expression functions and putting it in there as a static.

Just give $test as the first parameter on everything.

Share this post


Link to post
Share on other sites

#12 ·  Posted (edited)

Well... Actually the correct way to do it is to make the lex function yeild tokens. My version of the parser uses $pos and $pos-1 because I was lazy and AutoIt doesn't do structs very easily. If you look at the c examples where I use a struct for tokens, it copies the token, then fills tok with the next one. (or it should do anyway, I might have been lazy there as well)

Global $tok

; int calc(string)
Func calc($expr)
    $tok = lexGetToken($expr)

    Return expression(0)
EndFunc

; int nud(TOKEN)
Func nud($token)
    ...
EndFunc

; int led(TOKEN)
Func led($token)
    ...
EndFunc

; int expression(int)
Func expression($rbp)
    Local $token = $tok  ; Stores the last token
    $tok = lexGetToken() ; Fills the current one

    Local $left = nud($token) ; Rather than using $pos-1

    ...

    return $left
EndFunc

; TOKEN lexGetToken( [string] )
Func lexGetToken($in = "")
    Local static $pos = 0
    Local static $str = $in

    Local $ret[2] ; Token of the form [TYPE, VALUE]

    ...

    Return $ret
EndFunc

The problem is that it gets very messy passing 2 element arrays around all the time. And accessing them is not nearly as easy to read as tok.Type and tok.Val.

Because nud and led never call themselves, only calling expression, it is possible to merge them into a single function, and replace Global $tok with Local Static $tok. expression calls lexGetToken and the user only has to call calc. Nice and clean.

It's also possible to take out expression calling itself completely by using a stack, but that would be pointless in AutoIt. It does work well in asm though.

Edited by Mat

Share this post


Link to post
Share on other sites

btw, Why did you make the distinction between TOK_OP with params and TOK_INT with params instead of just TOK_INT and TOK_ADD, TOK_MUL? Only because python example does it like that?

Sorry, didn't see this.

The reason is that it makes it considerably easier to add and manage operators (including dynamically: More on that later).

You'll see in the C examples that there is an array of operators, with details like flags, precedence and then a pointer to the function to call. When I implemented the lexer, I didn't hard-program the operators in, instead added a pointer to the string containing the operator to that array. The lexer then searches through the operators and returns {TOK_OP,OP_????}. When I add an operator, I add a OP_* constant (optional btw), write the function for the operator and add the operator details to the array. Simple :graduated: It won't get any easier than that. There will be one more operator flag (User) which would mean user defined operators are handled just as easily, with a slightly different pointer to the function.

Additionally, most operators behave the same, and those that are different it is managed using their flags (Right associative, unary and postfix). This makes functions like lbp only a few lines long as it just needs to check that the token is of type TOK_OP and then read the precedence from the table. Obvious exceptions are those that have more than one meaning (+, -).

The last reason is because of the language I am ultimately going to use this for. It allows definition of operators, and an operator is any string that passes ispunct and isn't a special character like parenthesis/brackets.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0

  • Similar Content

    • Fade91
      By Fade91
      Hello, first time poster here
      I am working on a project that has to parse a log file in real time. The thing is I know it's hard for Autoit to attach itself to log files when they're already in use by other programs, at least in my experience.
      I was taking a look at this thread because the log file is quite large and I think Autoit might be a little slow on it's own.

      The thing is I don't know how to use this properly to extract all data out of a log file or is there a native way to do this using Autoit.
      Basically , I just need a log parser that is able to read from a log that is 'already opened' and 'being written to'
       
      Thanks!
    • TheAutomator
      By TheAutomator
      Hi everyone
      Are there any people here that know how to use a parser generator that uses 'bnf' (not 'ebnf'!)
      i'm using the gold parser:
      http://goldparser.org/index.htm
      not possible in autoit:
      https://www.autoitscript.com/forum/topic/167663-activex-gold-parser-in-autoit-is-this-possible/
      What i'm looking for is the best way to parse a list:
      <List> ::= <Item> ',' <List> | <Item> <Item> ::= Number | String for example, the tree returned from:
      'test', 1, 2 is:

      since its not possible in AutoIT code I did it in vb script.
      this is what i did:
      ' in the parse loop: Case Rule_List_Comma set result = new list call result.input(.tokens(0).data,.tokens(2).data) ' the list class: class list private arg0 private arg1 public sub input(a,b) set arg0 = a set arg1 = b end sub private sub push(item,byref stack) redim preserve stack(ubound(stack) + 1) stack(ubound(stack)) = item end sub public function value value = array(arg0.value) value2 = arg1.value if isarray(value2) then for each thing in value2 push thing,value next else push value2,value end if end function end class  
      is there a better way to do this?
       
      regards,
      TheAutomator
    • TheAutomator
      By TheAutomator
      hi everyone
      i have an interesting question about the gold parser:
      info:
      http://www.goldparser.org/index.htm
      download (regsvr32 to register):
      http://www.goldparser.org/engine/1/vb6/index.htm
      on the website it seems that this dll can be used as an activex object,
      does that mean that it can be used in autoit to?
      help for the activeX dll:
      http://www.goldparser.org/engine/1/vb6/doc/index.htm
      it gives me error code '4' if i try to use it... 
      Const $gpMsgAccept = 3 Const $gpMsgCommentBlockRead = 9 Const $gpMsgCommentError = 7 Const $gpMsgCommentLineRead = 10 Const $gpMsgInternalError = 8 Const $gpMsgLexicalError = 5 Const $gpMsgNotLoadedError = 4 Const $gpMsgReduction = 2 Const $gpMsgSyntaxError = 6 Const $gpMsgTokenRead = 1 $Parser = ObjCreate("goldparserengine.goldparser") $Parser.LoadCompiledGrammar("test_script.cgt") $Parser.OpenFile("Program.txt") $Response = $Parser.Parse() MsgBox(0,'test',$Response) If there are people interested in answering or helping feel free to reply and then i will upload the "test_script.cgt" somewhere if you want
      i know this question is a bit specific but you never know.. 
      Thanks for reading!
      TheAutomator.
    • myspacee
      By myspacee
      Hello,
      try to read simple XML and i decided to make generic script.
      I've a value (140403_0_21_00_XX_C044C021_00N_00F) and i MUST retrieve a value in same node id (="21")
      <PubPlain> <spread0 id="1" pdfname="140403_0_01_54_MN_M048C001_00N_00F" /> <spread1 id="2" pdfname="140403_0_02_00_XX_C002C047_00N_00F" /> <spread2 id="3" pdfname="140403_0_03_54_XX_CL16C003_00N_00F" /> <spread3 id="4" pdfname="140403_0_04_54_XX_C004CL15_00N_00F" /> <spread4 id="5" pdfname="140403_0_05_54_XX_ML14M005_00N_00F" /> <spread5 id="6" pdfname="140403_0_06_54_XX_C006CL13_00N_00F" /> <spread6 id="7" pdfname="140403_0_07_54_XX_CL12C007_00N_00F" /> <spread7 id="8" pdfname="140403_0_08_54_XX_C008CL11_00N_00F" /> <spread8 id="9" pdfname="140403_0_09_54_XX_CL10C009_00N_00F" /> <spread9 id="10" pdfname="140403_0_10_54_XX_M010CL09_00N_00F" /> <spread10 id="11" pdfname="140403_0_11_54_XX_CL08C011_00N_00F" /> <spread11 id="12" pdfname="140403_0_12_54_XX_C012CL07_00N_00F" /> <spread12 id="13" pdfname="140403_0_13_54_XX_CL06C013_00N_00F" /> <spread13 id="14" pdfname="140403_0_14_54_XX_C014CL05_00N_00F" /> <spread14 id="15" pdfname="140403_0_15_54_XX_CL04C015_00N_00F" /> <spread15 id="16" pdfname="140403_0_16_54_XX_C016CL03_00N_00F" /> <spread16 id="17" pdfname="140403_0_17_54_XX_CL02C017_00N_00F" /> <spread17 id="18" pdfname="140403_0_18_54_XX_C018CL01_00N_00F" /> <spread18 id="19" pdfname="140403_0_19_00_XX_C046C019_00N_00F" /> <spread19 id="20" pdfname="140403_0_20_00_XX_C020C045_00N_00F" /> <spread20 id="21" pdfname="140403_0_21_00_XX_C044C021_00N_00F" /> <spread21 id="22" pdfname="140403_0_22_00_XX_M022C043_00N_00F" /> <spread22 id="23" pdfname="140403_0_23_00_XX_C042C023_00N_00F" /> <spread23 id="24" pdfname="140403_0_24_00_XX_C024C041_00N_00F" /> <spread24 id="25" pdfname="140403_0_25_00_XX_M040C025_00N_00F" /> <spread25 id="26" pdfname="140403_0_26_00_XX_C026C039_00N_00F" /> <spread26 id="27" pdfname="140403_0_27_00_XX_C038C027_00N_00F" /> <spread27 id="28" pdfname="140403_0_28_00_XX_C028C037_00N_00F" /> <spread28 id="29" pdfname="140403_0_29_00_XX_C036C029_00N_00F" /> <spread29 id="30" pdfname="140403_0_30_00_XX_C030C035_00N_00F" /> <spread30 id="31" pdfname="140403_0_31_00_XX_M034C031_00N_00F" /> <spread31 id="32" pdfname="140403_0_32_00_XX_C032C033_00N_00F" /> </PubPlain> My XMLs are simple but tend to vary in certain conditions. So i haven't any sure foothold, I must read XML everytime.
      I've read several approaches to XML reading but can't determine which best.
      (dead?) _XMLDomWrapper.au3 seems to be the best choice;
      can you suggest best way to implement XML reading in AI ?
      Thank you,
      m.
    • Decipher
      By Decipher
      Hi,

      I'm inviting all autoit forum members to contribute to a HTML parser udf. I going to attempt to replicate a python module called BeautifulSoup. It would be greatly appreciated if some senior Autoit programmers took interest in this topic. There is no template other than the module written in python located here and the documentation here.

      I can't wait to see what this develops into.