Xenobiologist

StringRegExp and StringRegExpReplace

30 posts in this topic

Hi,

I'm using StringRegExp and StringRegExpReplace for getting info from big text files.

I haven't run into an error so far, but just so I know: are there any limitations?

For example, a buffer size for backreferences, or a buffer size for something else?

Thanks!

Mega


Scripts & functions Organize Includes Let Scite organize the include files

Yahtzee The game "Yahtzee" (Kniffel, DiceLion)

LoginWrapper Secure scripts by adding a query (authentication)

_RunOnlyOnThis UDF Make sure that a script can only be executed on ... (Windows / HD / ...)

Internet-Café Server/Client Application Open CD, Start Browser, Lock remote client, etc.

MultipleFuncsWithOneHotkey Start different funcs by hitting one hotkey different times

#3 ·  Posted (edited)

LIMITATIONS

There are some size limitations in PCRE but it is hoped that they will never in practice be relevant.

The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE is compiled with the default internal linkage size of 2. If you want to process regular expressions that are truly enormous, you can compile PCRE with an internal linkage size of 3 or 4 (see the README file in the source distribution and the pcrebuild documentation for details). In these cases the limit is substantially larger. However, the speed of execution is slower.

All values in repeating quantifiers must be less than 65536. There is no limit to the number of parenthesized subpatterns, but there can be no more than 65535 capturing subpatterns. The maximum length of name for a named subpattern is 32 characters, and the maximum number of named subpatterns is 10000.

The maximum length of a subject string is the largest positive number that an integer variable can hold. However, when using the traditional matching function, PCRE uses recursion to handle subpatterns and indefinite repetition. This means that the available stack space may limit the size of a subject string that can be processed by certain patterns. For a discussion of stack issues, see the pcrestack documentation.

Thanks, then these are the limitations. I hope AutoIt's implementation does not add some more ;)

Mega

Edited by Xenobiologist


When handling big files, keep in mind the string size limit ;) As far as I remember, it depends on the installed memory size. On my notebook this code

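; Append one million characters per pass until AutoIt runs out of memory.
$string = ""
While 1
    For $i = 0 To 999999
        $string &= "a"
    Next
    ; Later replies point out that this figure is a length in characters (in units of 1024*1024), not the memory in use.
    ConsoleWrite(BinaryLen($string) / 1024 / 1024 & @CRLF)
WEnd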

kicked me out at ~ 236 MB with an "Error allocating memory".

Also, according to the help file, the maximum string length is 2,147,483,647 characters (if you've got enough spare RAM :evil:).
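One way to stay clear of the string size limit when pulling data out of big text files is to avoid holding the whole file in a single variable. The following is only a minimal sketch, assuming the information of interest can be matched one line at a time. The file name `big.log` and the pattern are hypothetical placeholders, not anything from this thread.

```autoit
; Hypothetical input file and pattern, for illustration only.
Local $sPath = @ScriptDir & "\big.log"
Local $sPattern = "(?i)error\s+(\d+)"

; Mode 0 opens the file for reading.
Local $hFile = FileOpen($sPath, 0)
If $hFile = -1 Then
    ConsoleWrite("Could not open " & $sPath & @CRLF)
    Exit 1
EndIf

Local $sLine, $aMatch
While 1
    $sLine = FileReadLine($hFile)
    If @error Then ExitLoop ; end of file or read error

    ; Flag 1 returns an array of captured groups; @error is set when nothing matches.
    $aMatch = StringRegExp($sLine, $sPattern, 1)
    If Not @error Then ConsoleWrite($aMatch[0] & @CRLF)
WEnd

FileClose($hFile)
```

This keeps only one line in memory at a time, at the cost of not being able to match patterns that span several lines.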


We also work in Unicode, so 1 char = 2 bytes. You'll almost always hit memory limits before our own data structure limits come into play. And also, we copy things around internally so a string stored in a variable and then copied/used will likely allocate a larger 2nd copy temporarily.
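The difference between counting characters and counting bytes can be illustrated with a short sketch. It measures the size of the text when encoded as UTF-16, on the assumption that this mirrors the "1 char = 2 bytes" internal storage described above; it does not inspect AutoIt's actual memory use.

```autoit
Local $sText = "abc"

; StringLen counts characters, not bytes.
Local $iChars = StringLen($sText)

; Flag 2 encodes the text as UTF-16 little endian before the bytes are counted.
Local $iBytesUtf16 = BinaryLen(StringToBinary($sText, 2))

; Flag 1 encodes the same text as ANSI.
Local $iBytesAnsi = BinaryLen(StringToBinary($sText, 1))

ConsoleWrite("Characters:   " & $iChars & @CRLF)
ConsoleWrite("UTF-16 bytes: " & $iBytesUtf16 & @CRLF)
ConsoleWrite("ANSI bytes:   " & $iBytesAnsi & @CRLF)
```

When reporting how large a string got, it helps to say which of these numbers is meant.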


And also, we copy things around internally so a string stored in a variable and then copied/used will likely allocate a larger 2nd copy temporarily.

I've noticed this watching the memory usage of KaFu's script. The memory usage goes up very high and then drops to normal levels again.


#7 ·  Posted (edited)

kicked me out at ~ 236 MB with an "Error allocating memory".

I got to 888 MB :idea: It helps a lot to have x64 and a bucketload of memory (6GB and the standard pagefile of 9GB).

That was running with x86 AutoIt by the way.

For kicks I tried with x64, and gave up after around 70 minutes because I was bored. It took something like 10-30 seconds per megabyte ;)

The whole PC was brought down to its knees :evil:

Posted Image

Task manager had the update speed set to "High" :evil:

Edited by AdmiralAlkex


I got to 888 MB

This is pretty damn close to the max you'll get in a 32-bit application. Especially after fragmentation and what-not. The 32-bit address space is only 2GB so the maximum string you can get before you run out of RAM is 1GB since appending to a 1GB string requires a second 1GB copy to be made and that will exhaust the available address space.


#9 ·  Posted (edited)

This is pretty damn close to the max you'll get in a 32-bit application. Especially after fragmentation and what-not. The 32-bit address space is only 2GB so the maximum string you can get before you run out of RAM is 1GB since appending to a 1GB string requires a second 1GB copy to be made and that will exhaust the available address space.

I must be missing something here, what you said doesn't make much (any) sense. Testing was done on Win7 (rc) x64 and AutoIt has the "large address aware" flag set, so the max is 4GB. Or? ;)

Edit: I'm talking about the address space here (but I hope you get that).

Edited by AdmiralAlkex


I must be missing something here, what you said doesn't make much (any) sense.

Um, pardon me? It makes perfect sense since:

  • You implied you were using 32-bit since the rest of your post goes on to talk about the behavior of a 64-bit process.
  • You do not mention if you have large address support enabled for your OS. Both the flag in the application header (it is set for AutoIt) and the boot-time flag for Windows need to be set.

Testing was done on Win7 (rc) x64 and AutoIt has the "large address aware" flag set, so the max is 4GB. Or? ;)

It's not 4GB, it's 3GB.


I'm just more confused now. When I said "x86 AutoIt", I was referring to AutoIt3.exe, not Windows. So the limit should be 4GB.

What the fuck are you talking about? 32-bit processes are capped at a 2GB address space. If the application AND Windows are both configured correctly then the address space is extended to 3GB. The available address space has absolutely fuck all to do with the amount of RAM in the system (which is capped at 4GB for licensing reasons in 32-bit Windows).


What the fuck are you talking about? 32-bit processes are capped at a 2GB address space. If the application AND Windows are both configured correctly then the address space is extended to 3GB. The available address space has absolutely fuck all to do with the amount of RAM in the system (which is capped at 4GB for licensing reasons in 32-bit Windows).

Valik, let's try this again.

Windows x64.

AutoIt x86 with "large address aware" set.

That means the address space should be 4GB. LINK

And I already have proof of this: look at the 888MB string. First of all, it is Unicode, so it takes 1776MB of memory. Then, when you add to it, it creates a copy. So it had to use 3552MB!

Hell I could rerun the script above and take a screenshot if you want.

The limit IS 4GB.
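The arithmetic behind this argument can be written out as a short sketch. It takes the thread's figures at face value: the reported 888 is read as a length in characters (in units of 1024*1024), each character takes 2 bytes, and appending briefly needs a second full copy. The factor of 2 for the copy is the thread's working assumption, not a measured value.

```autoit
; Figures quoted in the thread; none of them are measured here.
Local $iChars = 888 * 1024 * 1024   ; string length in characters
Local $iBytesPerChar = 2            ; "1 char = 2 bytes"
Local $iCopies = 2                  ; original plus a temporary copy while appending

; Size of one copy of the string, and the assumed peak while appending, in MB.
Local $fStringMB = $iChars * $iBytesPerChar / 1024 / 1024
Local $fPeakMB = $fStringMB * $iCopies

ConsoleWrite("Characters: " & $iChars & @CRLF)
ConsoleWrite("One copy:   " & $fStringMB & " MB" & @CRLF)
ConsoleWrite("Peak usage: " & $fPeakMB & " MB of a " & 4 * 1024 & " MB address space" & @CRLF)
```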


Getting information from you is like sticking my dick in a grinder.

Valik, let's try this again.

Windows x64.

AutoIt x86 with "large address aware" set.

That means the address space should be 4GB. LINK

Alright, fine. Maybe that page is newer than the one I read that said the limit was 3GB.

And I already have proof of this: look at the 888MB string. First of all, it is Unicode, so it takes 1776MB of memory. Then, when you add to it, it creates a copy. So it had to use 3552MB!

Hell I could rerun the script above and take a screenshot if you want.

The limit IS 4GB.

What the fuck are you talking about now? You can't have an 888MB string that takes up 1776MB of memory. Do you mean you have a 931135488-character string that takes up 1776MB of memory?

If you caused all this fucking confusion because you idiotically referred to the length of the string in units of size I'm going to punch you in the soul so hard everyone in your family will feel it.


Getting information from you is like sticking my dick in a grinder.

Sounds fun.

Alright, fine. Maybe that page is newer than the one I read that said the limit was 3GB.

Did you miss this?

"Up to 3 GB with IMAGE_FILE_LARGE_ADDRESS_AWARE and 4GT"

That is what you're talking about. 32-bit addresses on a 32-bit OS.

What the fuck are you talking about now? You can't have an 888MB string that takes up 1776MB of memory. Do you mean you have a 931135488-character string that takes up 1776MB of memory?

If you caused all this fucking confusion because you idiotically referred to the length of the string in units of size I'm going to punch you in the soul so hard everyone in your family will feel it.

How the fuck can you blame me for that? You can't blame me if KaFu can't count!

We have been talking about the 4GB limit, which you apparently didn't even know about. It has been there always. That's a pretty long time.


#16 ·  Posted (edited)

Did you miss this?

"Up to 3 GB with IMAGE_FILE_LARGE_ADDRESS_AWARE and 4GT"

That is what you're talking about. 32-bit addresses on a 32-bit OS.

Right, I've already admitted the page I was reading didn't explain there was a difference with the 64-bit version of Windows.

How the fuck can you blame me for that? You can't blame me if KaFu can't count!

You ran the code and reported the results without verifying it. Since you were the only person here who knew you should be operating in a 4GB address space (more on this later) you should have recognized right off the bat the number was fishy.

We have been talking about the 4GB limit, which you apparently didn't even know about.

Correct, because it wasn't until much later in this thread that you implied (and you still have not confirmed) that large address support was enabled in your OS. It's kind of an important detail to mention, don't you think?

It has been there always. That's a pretty long time.

Okay, this statement is just absolute bullshit.

First of all, as you yourself mention, the limit is 3GB on a 32-bit OS. Since up until ~2002 or so (I'm sure this date is wrong, I don't give a shit, the point isn't changed) Microsoft didn't have a 64-bit version of their OS, that right there eliminates "always".

Second point, large address support requires the OS to be configured. Unless the switch is on by default on a 64-bit OS, that still means the vast majority of 64-bit machines still have their 32-bit processes capped at a 2GB limit.

Third point, would it have fucking killed you to mention "with large address support" enabled? I mean, as I mentioned previously, it's kind of a fucking important detail. If it's on by default that's fine but don't just assume everybody is aware of that little technical tidbit. If it's not on and you turned it on but neglected to mention it... well thanks for changing the rules and not telling anybody.

Edited by Valik
Typo


Well, my defence, if I'm allowed to have any, is that IncreaseUserVA is not set at all. This is the default behaviour for x86 code under an x64 Windows.

Feel free to check for yourself if you have one installed (but I'm guessing you don't).

Kids *sigh*...

Hey I'm twenty! ;)


Okay guys, thanks for the discussion of how much RAM a process can deal with under a 32-bit OS. ;)

What error message will occur when a RegExp function fails for the reason mentioned above?
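The thread does not settle what happens when a regular expression function runs out of memory or stack space. What a script can always do is check `@error` after each call. The sketch below covers only the two failure values I am confident of for `StringRegExp` with flag 1: 1 for no match and 2 for a bad pattern, with `@extended` giving the offset of the error in the pattern. The helper `_DescribeRegExpError` is a hypothetical name introduced for illustration, not a built-in function, and whether resource exhaustion is reported through `@error` at all is an open assumption.

```autoit
; Hypothetical helper: turns the @error/@extended pair of a StringRegExp call into text.
Func _DescribeRegExpError($iError, $iExtended)
    Switch $iError
        Case 0
            Return "ok"
        Case 1
            Return "no match"
        Case 2
            Return "bad pattern, error at offset " & $iExtended
        Case Else
            Return "unexpected @error value " & $iError
    EndSwitch
EndFunc

Local $aResult

; A valid pattern that does not match the subject.
$aResult = StringRegExp("abc", "\d+", 1)
ConsoleWrite(_DescribeRegExpError(@error, @extended) & @CRLF)

; A malformed pattern (unclosed group).
$aResult = StringRegExp("abc", "(abc", 1)
ConsoleWrite(_DescribeRegExpError(@error, @extended) & @CRLF)
```

`@error` and `@extended` are passed straight into the helper because any later function call may overwrite them.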

