Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#1267 closed Feature Request (Rejected)

Regular expression - Specify line endings default code to CRLF instead of LF

Reported by: Mison
Owned by:
Milestone:
Component: AutoIt
Version:
Severity: None
Keywords:
Cc:

Description

Multiline mode (?m) is pretty useless when used with a string entered into an Edit control, because Edit uses CRLF as its line ending while the regular expression engine uses LF.

In other words, the string is always treated as a single line, even in multiline mode.

Attachments (0)

Change History (4)

comment:1 follow-up: Changed 10 years ago by Valik

  • Resolution set to Rejected
  • Status changed from new to closed

Uh, you're missing something rather obvious here. A CRLF still contains a LF so lines still break in the correct place as far as the regex engine is concerned. They just have an extra CR in there as far as you are concerned. You also have the option of using StringStripCR() to remove the CR characters.
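
The StringStripCR() route mentioned above could look roughly like the sketch below. It assumes the text arrives with CRLF line endings, as it would from an Edit control; the variable names and the sample string are illustrative and not taken from the ticket.

```autoit
#include <Array.au3>

; Hypothetical input standing in for text read from an Edit control.
Local $sText = "abc" & @CRLF & "def" & @CRLF & "ghi"

; StringStripCR() removes every carriage return, leaving LF-only line endings.
Local $sLfOnly = StringStripCR($sText)

; Flag 3 returns an array of all global matches.
Local $aMatches = StringRegExp($sLfOnly, "(?m)\w+$", 3)
If @error Then
    ConsoleWrite("No match, @error = " & @error & @CRLF)
Else
    _ArrayDisplay($aMatches)
EndIf
```

One trade-off of this approach is that character offsets in the stripped string no longer line up with the original CRLF text.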

comment:2 in reply to: ↑ 1 Changed 10 years ago by Mison

Thanks for the prompt reply, Valik.

This is my example code, works fine with LF but not with CRLF.

#include <Array.au3>

$string = "abc" & @CRLF & "def" & @CRLF & "ghi" ; not ok
;~ $string = "abc" & @LF & "def" & @LF & "ghi" ; ok

; Flag 3 returns an array of all global matches.
$string_array = StringRegExp($string, "(?m)\w+$", 3)
_ArrayDisplay($string_array)

And I thought it's because of this:

By default, PCRE interprets the linefeed (LF) character as indicating
the end of a line. This is the normal newline character on Unix-like
systems. You can compile PCRE to use carriage return (CR) instead, by
adding...

I'm sorry if this request is not relevant.

comment:3 Changed 10 years ago by Valik

The following pattern works with both CRLF and LF terminated strings: "(?m)\w+\r?$"

Your pattern matches word characters immediately followed by a newline. But a CRLF line-ending means between the word and the newline is a CR. Your pattern does not account for this character and so your pattern doesn't match. This is correct. The only reason you get any result at all with your original pattern is $ also matches end-of-string.

My pattern allows for an optional CR prior to the LF. You may be better served to ignore all trailing whitespace instead but I don't know your requirements.
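
A small sketch for trying the "(?m)\w+\r?$" pattern against both kinds of line ending. The helper _ShowMatches() and the test strings are hypothetical, written only for this illustration. The helper prints the length of each match so that a CR captured by the optional \r shows up as one extra character.

```autoit
; Two hypothetical test strings that differ only in their line endings.
Local $sCrLf = "abc" & @CRLF & "def" & @CRLF & "ghi"
Local $sLf = "abc" & @LF & "def" & @LF & "ghi"

_ShowMatches($sCrLf, "(?m)\w+\r?$")
_ShowMatches($sLf, "(?m)\w+\r?$")

; Illustrative helper (not part of AutoIt): prints each match with its length.
Func _ShowMatches($sSubject, $sPattern)
    ; Flag 3 returns an array of all global matches.
    Local $aMatches = StringRegExp($sSubject, $sPattern, 3)
    If @error Then
        ConsoleWrite("no match" & @CRLF)
        Return
    EndIf
    For $i = 0 To UBound($aMatches) - 1
        ; Strip the CR only for display; the reported length is that of the raw match.
        ConsoleWrite(StringStripCR($aMatches[$i]) & " (length " & StringLen($aMatches[$i]) & ")" & @CRLF)
    Next
EndFunc
```

With an LF-default engine as described in this ticket, the CRLF string would be expected to report an extra character on every line except the last, which is the side effect of matching the CR rather than skipping it.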

comment:4 Changed 10 years ago by Mison

It's true that your pattern will match any word characters followed by an optional CR and end-of-line, but any CR that is present will be included in the match (on the 1st and 2nd lines). This pattern excludes the CR: "(?m)\w+(?=\r?$)", at the cost of 4 more characters (I know you guys would have to write far more than 4 characters to implement my request, and I fully understand if you refuse).

Anyway, last night I finally found a solution to this problem. The PCRE manual states that "specify a newline convention by starting a pattern.. override the default and the options given to pcre_compile()".

So this pattern does the trick: "(*ANYCRLF)(?m)\w+$", and I now think it's best to leave the newline default the way it is.
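
For comparison, here is a rough sketch that runs the lookahead pattern and the (*ANYCRLF) pattern from this comment over the same CRLF string. The variable names and the idea of summing match lengths are assumptions made for the illustration; it also assumes the bundled PCRE build accepts the (*ANYCRLF) prefix, as reported above.

```autoit
; Hypothetical CRLF-terminated input, as in the example code from comment 2.
Local $sCrLf = "abc" & @CRLF & "def" & @CRLF & "ghi"

; The two CR-excluding patterns discussed in this comment.
Local $aPatterns[2] = ["(?m)\w+(?=\r?$)", "(*ANYCRLF)(?m)\w+$"]

For $i = 0 To UBound($aPatterns) - 1
    ; Flag 3 returns an array of all global matches.
    Local $aMatches = StringRegExp($sCrLf, $aPatterns[$i], 3)
    If @error Then
        ConsoleWrite($aPatterns[$i] & " -> no match" & @CRLF)
        ContinueLoop
    EndIf
    ; Sum the match lengths: a total of 9 means three 3-letter words with no CR captured.
    Local $iTotal = 0
    For $j = 0 To UBound($aMatches) - 1
        $iTotal += StringLen($aMatches[$j])
    Next
    ConsoleWrite($aPatterns[$i] & " -> " & UBound($aMatches) & " matches, " & $iTotal & " characters in total" & @CRLF)
Next
```

The lookahead version works without changing the engine's newline convention, while the (*ANYCRLF) prefix changes what counts as a line ending for that one pattern only.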
