
Are there limitations on REGEX match extractions?



I know this isn't a REGEX (regular expressions) forum, but we have a very bright and knowledgeable group who, I have no doubt, could answer this for me.

(This is not about any limitations that may exist in AutoIt, but about REGEX itself.)

I'm trying to use a regular expression to extract the first 500 characters of an XML field and store them in a capture group \n (REGEX's \1 to \9).

I AM able to extract up to 254 characters, but nothing over that.

Does anyone know of REGEX having a limit of 254 and not being able to work with anything over that?

Here's the description from the server's Application event; we want to extract the first 500 characters of the "Exception" field:

(The text WITHIN the field in the following sample is 450 characters long.)

...<Exception>Slm.PrivateConsolidation.PreQualifyException: 10: Illegal Characters Borrower Employer^Illegal Characters in Borrower annualIncomeSchedule at Slm.Interfaces.PCW.OvationConsolidationManager.PreQualifyApplication(Application app) in C:\Projects\PCC\Release\src\slm\interfaces\PCW\OvationConsolidationManager.cs:line 149 at Slm.Batch.PCC.PrequalifyApplications.PreQualify() in C:\Projects\PCC\Release\src\slm\batch\PCC\PrequalifyApplications.cs:line 118</Exception>...

I'm using the following REGEX expression (against the above sample, to capture up to the first 254 characters of the Exception field):

.*<Exception>\(\(.\{1,254\}\)</Exception>\).*

But the REGEX fails if I change the upper bound of the range to anything above 254.

I'm being told by a software vendor (to whom we've raised this concern) that this is a limitation of REGEX itself, but a colleague of mine has shown me tests he's performed indicating that it isn't a REGEX-imposed limitation but a limitation of the vendor's implementation of it in their product.

Does anyone know of any such limitations on REGEX 'range' functionality?

Any assistance provided would be greatly appreciated!

(Even if it's just a URL where such a limitation may be confirmed.)

Thanks!

Van Renier


I think it is a vendor limitation. It works with AutoIt's PCRE implementation, as this sample shows.

Also note that I have changed your original regexp to match AutoIt syntax. And I do think you have a logical error in your original regexp (could that be because you provided it as a sample?)

;NOTE: Requires SciTe for debugging
testRegexp()
Func testRegexp()
    $data = '...<Exception>Slm.PrivateConsolidation.PreQualifyException: 10: Illegal Characters Borrower Employer^Illegal Characters in Borrower annualIncomeSchedule at Slm.Interfaces.PCW.OvationConsolidationManager.PreQualifyApplication(Application app) in C:\Projects\PCC\Release\src\slm\interfaces\PCW\OvationConsolidationManager.cs:line 149 at Slm.Batch.PCC.PrequalifyApplications.PreQualify() in C:\Projects\PCC\Release\src\slm\batch\PCC\PrequalifyApplications.cs:line 118 This text is added to islustrate a lengthy match</Exception>...   '
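    ; The original regexp (kept below, commented out) could have a logical error: </Exception> must directly follow the range, so a longer field cannot match.
    ; $regexp = '.*<Exception>\(\(.\{1,254\}\)</Exception>\).*'
    $regexp = '.*<Exception>(.{0,270}).*</Exception>' ; AutoIt (PCRE) syntax: capture at most the first 270 chars, then skip the rest of the field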
    $res = StringRegExp($data, $regexp, 3)
    dbgarr($res)
    dbg("StringLen($res[0]) = " & Stringlen($res[0]))
EndFunc
Func dbgarr($arr, $line=@ScriptLineNumber, $err=@error, $ext=@extended)
    Local $i
    If IsArray($arr) Then 
        For $i = 0 To UBound($arr) - 1 ; UBound($arr, 0) would return the number of dimensions, not the element count
            dbg("dbgarr[" & $i & "]:=" & $arr[$i])
        Next
    EndIf
EndFunc
Func dbg($msg, $line=@ScriptLineNumber, $err=@error, $ext=@extended)
    ConsoleWrite("(" & $line & ") := (" & $err & ")(" & $ext & ") : " & $msg & @CRLF)
EndFunc

My output from this is:

>"D:\portableapps\PortableApps\autoit-v3.2.3.0\SciTe\..\autoit3.exe" /ErrorStdOut "C:\slettes\test.au3" 
(28) := (0)(0) : dbgarr[0]:=Slm.PrivateConsolidation.PreQualifyException: 10: Illegal Characters Borrower Employer^Illegal Characters in Borrower annualIncomeSchedule at Slm.Interfaces.PCW.OvationConsolidationManager.PreQualifyApplication(Application app) in C:\Projects\PCC\Release\src\slm\interfa
(9) := (0)(0) : StringLen($res[0]) = 270
>Exit code: 0   Time: 0.299
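For a cross-check outside AutoIt, here is a minimal sketch using Python's built-in `re` module as a second, independent engine. It is only an illustration, not the vendor's setup: the `data` string is a made-up stand-in with 450 characters between the tags (the length quoted in the question), and `first_chars` and `whole_field` are hypothetical helper names. It shows two things: a range bound above 254 is accepted, and a pattern that puts `</Exception>` directly after the range cannot match a field longer than the bound.

```python
import re

# Hypothetical stand-in for the event text: 450 characters between the tags,
# the length quoted in the question. The content itself is made up.
body = "x" * 450
data = "..." + "<Exception>" + body + "</Exception>" + "..."


def first_chars(text, limit):
    """Return up to `limit` leading characters of the Exception field, or None."""
    # Capture at most `limit` chars, then let .*? skip the rest of the field.
    pattern = r"<Exception>(.{0,%d}).*?</Exception>" % limit
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None


def whole_field(text, limit):
    """Original-style pattern: the closing tag must directly follow the range."""
    pattern = r"<Exception>(.{1,%d})</Exception>" % limit
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None


for limit in (254, 255, 270, 500):
    captured = first_chars(data, limit)
    print(limit, len(captured), whole_field(data, limit) is not None)
```

With this 450-character field, `first_chars` returns 254, 255 and 270 characters for the first three bounds and the full 450 for a bound of 500, while `whole_field` only matches once the bound is at least as long as the field. So with this engine, a failure at a bound of 255 or more would come from the pattern's shape, not from the size of the range.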

Thanks a lot, Uten!

I thought it was a vendor-imposed limitation, and not REGEX itself.

Thanks for pointing out the error in the logical expression, too.

I was cutting/pasting a lot during my testing, and forgot to remove that last '\)'.

I appreciate the time, effort, and support!

Van
