



StringRegExp beating StringInStr() and C++ iostream by miles


3 replies to this topic

#1 Aktonius


    Prodigy

  • Active Members
  • 182 posts

Posted 02 July 2012 - 05:42 PM

On a big text (the one I use), StringInStr($text, "substr") takes about 150-200 ms on average to loop through the whole text and finish scanning it for the string.

The following C++ example averages about 15-30 ms.

#include "stdafx.h"   // Visual Studio precompiled header
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    string line;
    size_t found;
    ifstream myfile("C:/Users/xxx/xxx/xxx/x/c++examples/io/iotexst/iotexst/file.xml");
    if (myfile.is_open())
    {
        // Read the file line by line; the loop stops when getline() fails (EOF or error)
        while (getline(myfile, line))
        {
            // Search the current line for a string that is not in the file
            found = line.find("something that cant be found so we put a real test");
            if (found != string::npos)
                cout << "first 'name' found at: " << int(found) << endl;
        }
        myfile.close();
    }
    else
        cout << "Unable to open file";
    return 0;
}



But most interesting of all: on the same file, read into a string and searched with StringRegExp(readed_text, "substr", 0), the average time to finish is 3-8 ms!

My first guess is that StringRegExp is that much faster because of how it determines that the text can't be found: it first tries to match the starting characters of the substring.

Edited by Aktonius, 02 July 2012 - 05:44 PM.






#2 Mat


    43 38 48 31 30 4E 34 4F 32

  • MVPs
  • 4,040 posts

Posted 02 July 2012 - 07:56 PM

They should be using the KMP algorithm, so I am very surprised they are that slow, particularly the C++ code. At the end of the day the regex has to be compiled, so it should always be possible to write equivalent or faster code in a low-level language.

Edited by Mat, 02 July 2012 - 08:08 PM.



#3 JohnQSmith


    Polymath

  • Active Members
  • 226 posts

Posted 02 July 2012 - 08:04 PM

Quote: "on a big text"

Have you considered the impact of caching? Try timing the same scan 1000 times for each method and see what the results are.

#4 Shaggi


    Universalist

  • Active Members
  • 296 posts

Posted 11 July 2012 - 02:11 AM

If you read the whole text and then scan it, instead of doing sequential line-by-line reads, the C++ example will be much quicker.
I'm guessing, for comparison, your AutoIt script doesn't do something like:
While Not @error
    StringRegExp(FileReadLine($file), ...)
WEnd




