
Seeking better way to eliminate multiple blank lines



As I've worked more with StringRegExpReplace statements, I've come to believe they can do just about any pattern matching and replacement ... and that I'm limited only by my own knowledge of the "language". But I've hit a wall on trying to even get a handle on one fairly basic task:

I need a way to collapse runs of MULTIPLE consecutive blank lines down to one (single blank lines are fine).

A central problem of my particular case is that the blank lines I'm dealing with can be of all different kinds: an LF ... a CRLF ... a blank + LF ... two blanks + LF ... and so on and so on.  All that's definite is that the blank lines don't have any letters, numbers or symbols. IOW, there's nothing to "see" on the line.

What I've resorted to is this array-based approach ... which works for my current use:

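#include <Array.au3> ; needed for _ArrayDelete and _ArrayToString

; Split on LF; a CR left over from a CRLF pair is removed by StringStripWS below.
Local $finalPass = StringSplit($article, @LF, 2)        ; 2 = no count in [0]
Local $uMax = UBound($finalPass) - 1
Local $delMode = False                                  ; True while the line after $i is blank
; Walk backwards so that _ArrayDelete never shifts an index still to be visited.
For $i = $uMax To 0 Step -1
    $finalPass[$i] = StringStripWS($finalPass[$i], 3)   ; 3 = strip leading + trailing whitespace (every line)
    If $finalPass[$i] = "" Then
        If $delMode Then
            _ArrayDelete($finalPass, $i)                ; second or later blank line of a run
        Else
            $delMode = True                             ; first blank line of a run: keep it
        EndIf
    Else
        $delMode = False
    EndIf
Next
$article = _ArrayToString($finalPass, @CRLF)            ; rejoin with uniform CRLF line endings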

But does anyone know of a sure-fire RegEx statement that can get the same result?  (It's OK if there's not one. But I'd sure like to know if there is.)

Thanks in advance for any help.

 


Try this:

Local $s = @LF & @CRLF & "   " & @CR & "   aaa" & @LF & "bbb" & @CRLF & " " & @LF & @CRLF & "   " & @CR & "ccc" & @LF & "ddd" & @CR & "   " & @LF & "eee" & @LF & "fff" & @LF & "ggg" & @LF & @CRLF & "   "
ConsoleWrite(StringRegExpReplace($s, "^(\h*\R)+|\R\K((?:\h*\R\h*)+)(?=\R)|(\R\h*)*$", "") & @LF)

In fact this isn't as easy as it first looks and several ambiguities remain.

First, line termination can be LF, CR or CRLF, but all three are gracefully dealt with by \R.

Then one must specify what "blank" means.  Do non-breaking spaces count?  Can you expect to hit a FF or a VT?  What about the various special Unicode codepoints that may render as a blank line?
All this can be handled by extending the pattern to match whatever can be expected.  Here I've considered only the \h class.
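
Before extending the pattern, it can help to check what \h and \R already match on your own installation. The following is only a rough probe, not part of the solution above: it prints 1 for a match and 0 for no match, and deliberately assumes nothing about the outcome, since that may depend on the AutoIt version and on any (*BSR_...) setting placed at the start of a pattern.

```autoit
; Probe sketch: which candidate "blank" characters do \h and \R cover here?
; StringRegExp with the default flag returns 1 (match) or 0 (no match).
ConsoleWrite("space           matches \h: " & StringRegExp(" ", "^\h\z") & @LF)
ConsoleWrite("tab             matches \h: " & StringRegExp(@TAB, "^\h\z") & @LF)
ConsoleWrite("no-break U+00A0 matches \h: " & StringRegExp(ChrW(0x00A0), "^\h\z") & @LF)
ConsoleWrite("vertical tab    matches \R: " & StringRegExp(Chr(11), "^\R\z") & @LF)
ConsoleWrite("form feed       matches \R: " & StringRegExp(Chr(12), "^\R\z") & @LF)
```

Anything that reports 0 and can occur in the input would have to be added to the pattern explicitly, for example through a character class used in place of \h.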

That said, there are three cases of "unwanted blank lines", and the details depend on what exactly you want.

First, blank lines before any text, at the top of the subject.  I've assumed that you'd want to remove them all.  This is matched by the first part of the alternation: ^(\h*\R)+
But you might decide that a single leading blank line should be kept verbatim.

Then, extraneous blank lines after the end of the text.  Again I've assumed that you'd want to remove anything not "meaningful". This is dealt with by the last part of the alternation: (\R\h*)*$

Finally, we need a pattern to remove multiple blank lines in the subject body.  This is done by the middle part of the alternation: \R\K((?:\h*\R\h*)+)(?=\R)
Note the use of \K to "forget" what we have just matched.  It stands in for a lookbehind, because lookbehind assertions can only span a fixed-length subpattern and neither \R nor \h* is fixed-length.

All in all the pattern as it is isn't perfect for all use cases but would probably do what you want in real life.  It's quite possible that a simpler pattern can be built, using same or slightly different conventions.
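
To make the three parts easier to follow and to modify one at a time, the same pattern can be assembled from named pieces. This is just a restatement of the pattern above; the variable names and the tiny sample string are illustrative and not from the original post.

```autoit
; The three alternatives of the pattern, named for readability.
Local $sLeading = "^(\h*\R)+"                    ; blank lines before any text
Local $sInterior = "\R\K((?:\h*\R\h*)+)(?=\R)"   ; surplus blank lines between text lines
Local $sTrailing = "(\R\h*)*$"                   ; line breaks and blanks after the last text
Local $sPattern = $sLeading & "|" & $sInterior & "|" & $sTrailing

; Sample: two blank lines between "aaa" and "bbb" (three consecutive line feeds).
Local $sText = "aaa" & @LF & @LF & @LF & "bbb"
ConsoleWrite(StringRegExpReplace($sText, $sPattern, "") & @LF)
```

On this sample the interior alternative should remove one of the three line feeds, which leaves a single blank line between the two text lines. Changing the convention for the top or bottom of the text then only means editing $sLeading or $sTrailing.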

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


I'm not going to pretend to know why any of my regexes do anything, but this appears to remove all the duplicate 'blank' lines. If there are spaces, though, it keeps them and places them before the next non-space value.  So it does the thing, but is also kind of quirky about it :)

#include <Array.au3>

Local $s = @LF & @CRLF & "   " & @CR & "   aaa" & @LF & "bbb" & @CRLF & "  " & @LF & @CRLF & "        " & @CR & "ccc" & @LF & "ddd" & @CR & " " & @LF & "eee" & @LF & "fff" & @LF & "ggg" & @LF & @CRLF & "   "

_ArrayDisplay(StringRegExp($s , "(\s+(?:\S+?)+)" , 3))

 

Edited by iamtheky


The problem is that this removes ALL blank lines.

The OP said: (single blank lines are fine)


Oh, quite right.  I was thinking of single line feeds... carry on.


As stated, I already distorted this requirement a bit by removing all blank lines at the beginning and end of the text, which I suppose is correct.  Still, it isn't hard to modify the pattern to allow a single blank line there, if needed.


@jchd: First, let me say thanks.  1) your solution is quite remarkable in what it does and 2) it does fill the need I have.

Here are clarifications on the points you've brought up:

• do unbreakable white spaces count? ... no, they aren't in the picture.
• can you expect to hit a FF or a VT? ... same: they shouldn't occur.
• special Unicode code points that may render a blank line? ... I'm not expecting any.

• blank lines before any text, at top of subject ... I prefer that they evaporate.
• extraneous blank lines after the end of text ... likewise: they can be dropped.

My initial tests show that, when using "" as the replacement string, it does eliminate the extra lines.  And with what you've outlined above, I can experiment with any fringe cases I find, and also strive to understand the aspects of what you've provided.  I'll admit that your solution is beyond anything I was trying.

I appreciate what you've provided. Thanks.
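
For trying out fringe cases, a small harness along these lines may be handy. It is only a sketch: the helper _ShowInvisibles and the test strings are made up for illustration, and the pattern is the one from jchd's post. It prints each input next to its result with line endings and blanks made visible, so the outcomes can be judged by eye rather than taken on trust.

```autoit
; Pattern from jchd's post above.
Local $sPattern = "^(\h*\R)+|\R\K((?:\h*\R\h*)+)(?=\R)|(\R\h*)*$"

; A few hypothetical fringe cases; add your own as needed.
Local $aCases[4]
$aCases[0] = "aaa" & @LF & @LF & @LF & "bbb"                        ; plain LF run
$aCases[1] = "aaa" & @CRLF & "  " & @CRLF & @TAB & @CRLF & "bbb"    ; CRLF with blanks and a tab
$aCases[2] = @LF & " " & @LF & "aaa"                                ; blank lines at the top
$aCases[3] = "aaa" & @LF & "   " & @LF                              ; blank lines at the bottom

For $i = 0 To UBound($aCases) - 1
    Local $sOut = StringRegExpReplace($aCases[$i], $sPattern, "")
    ConsoleWrite(_ShowInvisibles($aCases[$i]) & "  ->  " & _ShowInvisibles($sOut) & @LF)
Next

; Illustrative helper: replaces CR, LF, TAB and space with visible tags.
Func _ShowInvisibles($sText)
    $sText = StringReplace($sText, @CR, "<CR>")
    $sText = StringReplace($sText, @LF, "<LF>")
    $sText = StringReplace($sText, @TAB, "<TAB>")
    Return StringReplace($sText, " ", "<SP>")
EndFunc
```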

 
