Jump to content

Speeding Up Function Lookups


Recommended Posts

David, you probably have a much better idea of actual realistic performance gains by this idea with your experience, so please comment if that is the case.

Anyway, I hate how functions are looked up. That strcmp crap is slow and tedious if you ask me. I much much prefer a Finite State Machine in this situation (Hereafter referred to as FSM). There is one major disadvantage to an FSM, and that is memory. It will use up boatloads of memory by its nature, but, the trade-off is that it only has to look at each character 1 time and then it figures out what it's looking for and is basically a hell of a lot faster. With strcmp, you compare the entire string multiple times before finding a match (Or worse, not finding a match).

Although I don't know how to do all those nice timing's and stuff to see how much of a performance gain it would be, I'm willing to say it would probably be an exponential increase in the time it takes to look up a function, especially any function name located at the end of the list, or a user function as they could be recognized as not being built-in almost instantly (Instead of having to iterat the entire list as done now).

Because an FSM uses a character based approach, each character is read once, then if it is searched for in a big-ass list. If it matches, then it moves to the next character and so on until either you find the function you're looking for or it finds an invalid character. The worst case scenario in an FSM for the number of characters read is actually a full match, because it has to read every character to make the match.

However, the memory requirement is big. It uses a multi-dimensional array (For simplicity, also doable in a single dimension, but requires good math to access the elements). Each row has 1 column for every valid character that may appear in a function name and 1 column for all remaining characters (Terminators, basically). In a worst case scenario with no words that are even close to similar, you will consume this much memory:

(NumCharsInAllWords - (NumWords-1)) * NumColumns

So lets give some numbers:

10 words each completely unique with 10 characters each.

We'll use 27 columsn, a-z plus the terminator column.

You have:

((10*10)-(10-1)) * 27

(100 - 9) * 27

91 * 27 = 2,457 bytes of memory to hold 10 words at 10 letters.

However, this is worst case math. In reality, because of Jon's naming scheme, there is a lot of commonality between functions. Lots start with Win, File, et cetera. That reduces the memory impact some.

I have worked a little on a C++ class to manage an FSM, although it does still need some (lots...) work. I don't think it would be too hard to get some basic hack working enough to test with.

Thoughts from the rest of you on this? Jon, what's your priority, speed or less memory use? David, you have any experience with FSM's or similar?

Link to comment
Share on other sites

  • Replies 64
  • Created
  • Last Reply

Top Posters In This Topic

Top Posters In This Topic

  • Administrators

If we simply had the function names in alphabetical order when could use a binary chop search which would cut down the number of string compares from worst case of 180 (or whatever the number of functions is these days) to, let me see, 90, 45, 22, 11, 5, 2, 1, ah, to an average of 7 stricmps per function lookup with only the smallest amount of programming required.

I was going to do similar for the user function list, so how does that sound?

Link to comment
Share on other sites

I was thinking of using a binary search as well.

The problem with the FSM is the amount of memory it takes and adjusting the list of entries is a pain unless you are using a generator program, like lex or GOLD Parser. Also the code generated by lex can be quite large. I have not really looked GOLD yet, so I have no idea how big its code is.

I would also use a struct to manage the information, rather than 2 arrays: function names and min/max args becomes one struct that can be managed together.

We should manage the keywords, macros as well as the functions in the same way.

// in script.h
struct FuncInfo {
    AString m_name;
    enum FuncList m_lookup;
    unsigned short m_min;
    unsigned short m_max;
};

struct KWInfo {
    AString m_name;
    enum KWList m_lookup;
};

struct MacroInfo {
    AString m_name;
    enum MacroList m_lookup;
};

extern const FuncInfo functions[];
extern const KWInfo keywords[];
extern const MacroInfo macros[];

// in script_lists.cpp (generated)

#include "script.h"

const FuncInfo functions[] = {
    {"abs", F_ABS, 1, 1}, 
    ...
    {"AutoItSetOption", F_AUTOITSETOPTION, 2, 2},
    ....
    {"opt", F_AUTOITSETOPTION, 2, 2},
    ...
    {"WinWaitNotActive", F_WINWAITNOTACTIVE, 1, 3},
    {"", F_MAX, 0,0}   // end of list
};

const KWInfo keywords[] = {
    {"and", K_AND}, 
    ...
    {"while", K_WHILE},
    {"", K_MAX}
};

extern const MacroInfo macros[] = {
    {"AppDataCommonDir", M_APPDATACOMMONDIR},
    ....
    {"Year", M_YEAR},
    {"", M_MAX}
};

This kind of set up will also allow for easy aliasing of functions and keywords, as I did with opt and AutoItSetOption above.

edit: fix spelling/grammar mistakes

Edited by Nutster

David Nuttall
Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius

AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...

Link to comment
Share on other sites

If you use a structure like David suggested, why not totally get rid of the constants and use pointers to functions? That's the route my aborted DLL idea was taking. The function signatures are very similar, most take vParam and vResult, some take iNumParams. It would be quite easy to just make all functions have this signature:

AUT_RESULT Function(VectorVariant &vParams, uint iNumParams, Variant &vResult);

Then you can do run-time binding through ->* or .* to either a global object or an instance of AutoIt_Script that has been passed to the function responsible for calling things (Hell, if done in the parser as it currently is, you can just bind the function pointer with *this and there you go). This would eliminate the entire case structure, which is like a billion lines of code.

Plus, to me, it makes sense that each function should be represented by an object which contains all the necessary information to activate that function. That includes the name, min/max parameters and the pointer to the actual function that does the work.

Plus, wouldn't it be so nice to just write:

// Incoming parameters:
// const char *szName - Function name
// vParams, iNumParams, vResult - We all know what these are.
SCRIPT_FUNCTION *pfn;  // Would be a typedef of the above signature

pfn = Lookup(szName)    // Returns NULL if not found

if (pfn != NULL)
    this->*pfn(vParams, iNumParams, vResult)  // Call pointer to member
else
{
    // Must be a user function or non-existent function
}

That code right there would replace a several hundred line case structure.

Link to comment
Share on other sites

  • Administrators

I had a quick try with stuff like:

typedef AUT_RESULT (*AU3_FUNCTION)(VectorVariant &vParams, uint iNumParams, Variant &vResult);

typedef struct

{

AString sName; // Function name

AU3_FUNCTION lpFunc; // Pointer to function

uchar cMin; // Min params

uchar cMax; // Max params

} AU3_FuncInfo;

Declared with (because you seem to need static when pre-declaring member vars, as in the current function/keyword lists. That is really lame and seems nuts that you can't pre initialize any member data without using a constuctor)

static AU3_FuncInfo m_FuncList[];

AU3_FuncInfo AutoIt_Script::m_FuncList[] = {

{"RandomTest", &Random, 0, 2}

};

This bombs with the &Random until you change the Random function to a static too (Like WndProc). And of course if we do that then every function will have to access other member vars and functions with the whole g_oScript.nnnn syntax.

It makes some sense because the static declaration of the array means that there is only one copy no matter how many instances there are, so it can't contain a pointer for a specific instance...

Any other ideas? :D

Link to comment
Share on other sites

Jon, doesn't it work if you call it like this (This assumes you're calling from a non-static member function, such as Parser_FunctionCall())

this->*m_FuncList[0].lpFunc(vParams, iNumParams, vResult);

This binds the pointer to the function (lpFunc) to an actual instantiated object (this).

Link to comment
Share on other sites

Ohhhh, duh! You typedef'd it wrong, it should be:

typedef AUT_RESULT (AutoIt_Script::*AU3_FUNCTION)(VectorVariant &vParams, uint iNumParams, Variant &vResult);

You forgot to tell the typedef you wanted a pointer to a function in the AutoIt_Script class. It was just looking for a regular ole function, not a member.

Link to comment
Share on other sites

  • Administrators

Still can't get it to work. If I do the typedef like that then I get errors because it has no ida what "AutoIt_Script" is. I need to typedef it before as the Autoit_Script class contains a member array that uses the typedef....

I wonder if I can replace the lpFunc member of the struct with a void * instead...

Link to comment
Share on other sites

Just forward declare the class. At the top of whatever file that is, place:

class AutoIt_Script;

If that doesn't fix it, let me know. I know this can be done, because this is the same method I used to create that DLL framework.

Link to comment
Share on other sites

Jon, this doesn't need declared in the header file before the class... Why don't you just make the list a static member of AutoIt_Script, then move the definition to the .cpp file at the top of it? Here's a small quick example I wrote:

#include <iostream>
#include <string>

class Cow;  // Forward declare so we can typedef

typedef void (Cow::*PCOW)(const char *);
struct info
{
    std::string name;
    PCOW lpFunc;
};

class Cow
{
public:
    void Speak(const char *say)
    {
        std::cout<<say<<std::endl;
    }
    static info list[];
};

// This definition could be in another file, the linker will find it.
info Cow::list[] = {
    {"Betsy", Speak}  // Doesn't need qualified because list[] is a member
};


int main()
{
    Cow *c = new Cow;
    std::cout<<c->list[0].name<<std::endl;
    (c->*c->list[0].lpFunc)("MOOOOOOOOOOOO!");
    delete c;
    return 0;
}
Edited by Valik
Link to comment
Share on other sites

  • Administrators

Does that code run? From what I've been told the static member variable is created once, and once only _before_ the instance of the class is created. So how can it contain a pointer to a function that has no memory address yet... I know that you are using another pointer in the actual call, but what _is_ lpFunc being set to when it is initialized?

That or I've been lied to about static members.

Link to comment
Share on other sites

Yes, that code runs. I wrote and tested it before I posted it. It's not using the address of the function. That's not how pointer to members work. Taken from The C++ Programming Language Special Edition:

Just like ordinary pointers to functions, pointers to member functions are used when we need to refer to a function without having to know its name.  However, a pointer to member isn't a pointer to a piece of memory the way a pointer to a variable or a pointer to a function is.  It is more like an offset into a structure or an index into an array.  When a pointer to member is combined with a pointer to an object of the right type, it yields something that identifies a particular member of a particular object.

Link to comment
Share on other sites

  • Administrators

Ah, that was the only other thing I could think of.

I really should get around to buying a c++ book to update my google-learnt c++ :D

Thanks, V. I'll have a play tommorrow. And at least with this sort of thing it is easy to implement and test alongside the existing code. :huh2:

Link to comment
Share on other sites

I would recommend The C++ Programming Language Special Edition. It was the first programming book I read (BAD IDEA, not an easy starter book). I didn't quite get a lot of things, but it's great for reference and I really need to re-read it so I can get more out of it. It's definitely a good book, though. I picked the above pointer-to-member trick from it.

Also, have you read either of Bruce Eckel's free books? Thinking in C++ 1 and 2 are both good (Well, 1 is, I have only read the Concurrency chapter in volume 2, but it was good). They are free to download from Mindview.net.

Bjarne's book is rather pricey though (Or was when I bought it). I got it use from some Amazon affiliate or something for ~$40-$50 (Like $85 new...). The book was in new condition, though.

Link to comment
Share on other sites

Now where did I put the C/C++ textbooks I wrote? :DHonest, I did!

Do not put the & on the function name when making a function pointer.

double (*sf)(double) = sin; // declare a function pointer called sf that takes a double and returns a double. sf is initialized to point to sin.

x=sf(y); // call sin with y, returning to x

sf=tan; // assign a new function

x=sf(y); // call tan with y, returning to x

this->* should not be needed inside the function. Just use the same notation as above.

AU3_FuncInfo AutoIt_Script::m_FuncList[] = {
{"RandomTest", Random, 0, 2}  // remember case sensitivity
};

int FuncListSize = sizeof(m_FuncList)/sizeof(AU3_FuncInfo));

// inside AutoIt_Script
   if (funcname == m_FuncList[i] && iNumParams >= m_Min && iNumParams <= m_Max) {
       if (m_FuncList[i].lpFunc(vParams, iNumParams, vResult)==AUT_ERR) {
             // screw this.  Abort!!!
       }
       return vResult;
   }
Edited by Nutster

David Nuttall
Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius

AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...

Link to comment
Share on other sites

That doesn't work here, David. The compiler will complain:

error C2440: 'initializing' : cannot convert from 'void (__thiscall Cow::D )(const char *)' to 'void (__cdecl *)(const char *)'

The only 2 ways around that:

1) Make each function static (As Jon already mentioned)

2) Use pointer-to-members.

That brings me to *this. A pointer-to-member is absolutely useless by itself. It requires an object to be bound to. In this case, *this is a good match because it is of the right type and is an instantiated object. g_oScript would also be a safe choice, although, that uses a global variable which we want to avoid.

Your idea would be fine were we not calling class members, but I don't see how to do it like that when we are calling them.

Link to comment
Share on other sites

  • Administrators

:D

I just called a MsgBox function via:

return (this->*m_FuncList[nFunction].lpFunc)(vParams, iNumParams, vResult);

I find myself giggling in a troubling way.

I'll have to rework about 20% of the functions (single line funcs, Larry's magic number funcs :huh2::), not same 3 parameters etc) but even so this is very cool. Surely a couple of KB off the exe size.

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...