Regex: remove all duplicate lines from a sorted file


9 hours ago, Nine said:

Just be careful with the last line.  If it does not have any \R (newline sequence) at the end,.....

Just make this \R optional
BTW, using * instead of + allows removing blank lines: "(?m)^(.*?)\R(\1(\R|$))+"

Edit
False affirmation. Sorry

Edited by mikell
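For anyone who wants to try the last-line case outside AutoIt, here is a rough Python sketch. It assumes `\n` as a stand-in for `\R` (Python's `re` module has no `\R`), a replacement of `\1\n` (the thread does not show the replacement string), and a hypothetical "strict" pattern that requires a newline after every repeat.

```python
import re

# Hypothetical patterns for illustration; \n stands in for PCRE's \R,
# which Python's re module does not support.
STRICT = re.compile(r"(?m)^(.*?)\n(\1\n)+")        # every repeat must end in a newline
TOLERANT = re.compile(r"(?m)^(.*?)\n(\1(\n|$))+")  # a repeat may also end at end of text


def dedupe(pattern, text):
    # Collapse each run of identical adjacent lines into a single line.
    # The replacement "\1\n" is an assumption, not taken from the thread.
    return pattern.sub(r"\1\n", text)


text = "a\nb\nb"  # sorted input with no newline after the last line

strict_result = dedupe(STRICT, text)      # duplicate "b" on the last line is missed
tolerant_result = dedupe(TOLERANT, text)  # duplicate "b" is collapsed
```

With the assumed replacement, the tolerant version also ends the output with a newline, which may or may not be wanted.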

16 minutes ago, mikell said:

BTW a * instead of + allows to remove blank lines  "(?m)^(.*?)\R(\1\R?)+"

Using \R? instead of \R|$ will not work properly if a line is repeated at the start of another line that has additional characters. It will remove the repeated part of that line and leave the rest behind, as with these entries:

Quote

Bxnnxd xccxxnt
Bxnnxd xccxxnt (xgx-rxlxtxd)
Bxnnxd xccxxnt (nxt xgx-rxlxtxd)

It will remove "Bxnnxd xccxxnt" from the second line and leave the bracketed part behind.
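The difference can be reproduced with a small Python sketch. This is only an approximation of the AutoIt call: `\n` is assumed in place of `\R` (Python's `re` has no `\R`), and the replacement `\1\n` is an assumption, since the replacement string is not shown in the thread.

```python
import re

# \n stands in for PCRE's \R; the replacement "\1\n" is assumed.
OPTIONAL_NL = re.compile(r"(?m)^(.*?)\n(\1\n?)+")   # the \R? variant
NL_OR_END = re.compile(r"(?m)^(.*?)\n(\1(\n|$))+")  # the (\R|$) variant

entries = (
    "Bxnnxd xccxxnt\n"
    "Bxnnxd xccxxnt (xgx-rxlxtxd)\n"
    "Bxnnxd xccxxnt (nxt xgx-rxlxtxd)\n"
)

# With an optional newline, \1 can match a mere prefix of the next line,
# so that prefix is deleted and the remainder is left behind.
damaged = OPTIONAL_NL.sub(r"\1\n", entries)

# Requiring a newline or the end of the text after \1 forces a
# whole-line match, so these three distinct entries are left alone.
intact = NL_OR_END.sub(r"\1\n", entries)

print(damaged)
print(intact)
```

In this run the second entry loses its "Bxnnxd xccxxnt" prefix under the first pattern, while the second pattern leaves the text unchanged.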


  • 2 weeks later...

Thanks, everyone, for your input. The next day it occurred to me that this task might be possible in Excel, and after asking some colleagues about it, I discovered that yes, indeed, this task is super simple in Excel, and even faster than the AutoIt method.
