
Stupid Perl regexp compatibility on lookbehind assertions like (?<=ab(c|de))



Posted

I spent two days debugging this with no clue at all, until I found in the documentation that the PCRE engine (which AutoIt uses) does not allow patterns like (?<=ab(c|de)). Please refer to the documentation below:

http://www.autoitscript.com/autoit3/pcrepattern.html

"

However, if there are several top-level alternatives, they do not all have to have the same fixed length. Thus

(?<=bullock|donkey)

is permitted, but

(?<!dogs?|cats?)

causes an error at compile time. Branches that match different length strings are permitted only at the top level of a lookbehind assertion. This is an extension compared with Perl (at least for 5.8), which requires all branches to match the same length of string. An assertion such as

(?<=ab(c|de))

is not permitted, because its single top-level branch can match two different lengths, but it is acceptable if rewritten to use two top-level branches:

(?<=abc|abde)

"

So is there an alternative way to do something like (?<=\w+)? Thanks.

Posted

Your "(?<=ab(c|de))" just needs to be rewritten as "(?<=abc|abde)", and then it works, as explained in that doc.

Your "(?<=\w+)" doesn't make any sense. What are you trying to get?

;)
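
The rewrite with two top-level branches can be checked with a short script. This is a minimal sketch: the sample string and the trailing [A-Z] are made up for illustration, and only the lookbehind itself comes from the documentation quoted above.

```autoit
#include <Array.au3> ; for _ArrayDisplay() only

; Hypothetical sample text, chosen only for illustration.
Local $sTxt = "abcX abdeY abZ"

; Each top-level branch has its own fixed length: "abc" (3) and "abde" (4).
; Flag 3 returns an array of all global matches.
Local $aMatch = StringRegExp($sTxt, "(?<=abc|abde)[A-Z]", 3)
If @error Then
    MsgBox(0, "Error", "No match was found, or the pattern did not compile")
Else
    _ArrayDisplay($aMatch)
EndIf
```

With this input the array should contain X and Y. The Z is skipped because it is preceded by neither "abc" nor "abde".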

Posted


Thanks for replying.

Actually, I want to parse HTML code and retrieve the words between the HTML tags, e.g.

<a> this is a program</a>

<b> to retrieve the sentences bwteen tags and break down into words</b>

will be broken down into

this

is

a

program

to

retrieve

the

sentences

bwteen

tags

and

break

down

into

words

using the regular expression

(?<=>[^<>]*)([^<>\s\n\r]+)(?=[^<>]*<)

The regular expression I am using works in some other parsers but not in AutoIt.

Please see the attachment. Thanks.

post-54973-12623258321856_thumb.jpg

Posted

#include <Array.au3> ; for _ArrayDisplay() only

$sTxt = "<a> this is a program</a>" & @LF
$sTxt &= "<b> to retrieve the sentences bwteen tags and break down into words</b>"

; Strip the tags first, then collect every word from what is left.
; Flag 3 returns an array of all global matches.
$aMatch = StringRegExp(StringRegExpReplace($sTxt, "(?i)(<.+?>)", ""), "(?i)(\b\w+\b)", 3)
If Not @error Then
    _ArrayDisplay($aMatch)
Else
    MsgBox(0, "Error", "No match was found")
EndIf

George
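
If only text that actually sits between a ">" and a "<" should be counted, which is what the variable-length lookbehind was trying to express, a two-step approach avoids the lookbehind altogether. This is a minimal sketch that reuses the sample text from the thread; the variable names $aRuns and $aWords are hypothetical.

```autoit
#include <Array.au3> ; for _ArrayDisplay() and _ArrayToString()

Local $sTxt = "<a> this is a program</a>" & @LF
$sTxt &= "<b> to retrieve the sentences bwteen tags and break down into words</b>"

; Step 1: capture each run of text that follows ">" and is followed by "<".
; A lookahead is fine here; only a variable-length lookbehind is rejected.
Local $aRuns = StringRegExp($sTxt, ">([^<>]+)(?=<)", 3)
If @error Then
    MsgBox(0, "Error", "No text between tags was found")
    Exit
EndIf

; Step 2: join the runs with spaces and pull out the individual words.
Local $aWords = StringRegExp(_ArrayToString($aRuns, " "), "\w+", 3)
If @error Then
    MsgBox(0, "Error", "No words were found")
Else
    _ArrayDisplay($aWords)
EndIf
```

The first pass does the job of the ">" lookbehind by consuming the ">" and capturing what follows it, so the second pass only has to split plain text into words.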

