qwert, posted June 8, 2019

As I've worked more with StringRegExpReplace statements, I've come to believe they can do just about any pattern matching and replacement, and that I'm limited only by my own knowledge of the "language". But I've hit a wall trying to even get a handle on one fairly basic task: I need a way to replace MULTIPLE consecutive blank lines (single blank lines are fine).

A central problem in my particular case is that the blank lines I'm dealing with can be of all different kinds: an LF, a CRLF, a blank + LF, two blanks + LF, and so on. All that's definite is that the blank lines don't contain any letters, numbers or symbols. IOW, there's nothing to "see" on the line.

What I've resorted to is this array-based approach, which works for my current use:

```autoit
#include <Array.au3> ; needed for _ArrayDelete and _ArrayToString

Local $finalPass = StringSplit($article, @LF, 2) ; 2 = no count in [0]
Local $uMax = UBound($finalPass) - 1
Local $delMode = False
; Walk backwards so deleting an element doesn't shift the elements still to be visited
For $i = $uMax To 0 Step -1
    $finalPass[$i] = StringStripWS($finalPass[$i], 3) ; 3 = strip leading + trailing whitespace (incl. CR)
    If $finalPass[$i] = "" Then
        If $delMode Then
            _ArrayDelete($finalPass, $i) ; another blank line in the same run: drop it
        Else
            $delMode = True ; first blank line met in this run: keep it
        EndIf
    Else
        $delMode = False
    EndIf
Next
$article = _ArrayToString($finalPass, @CRLF)
```

But does anyone know of a sure-fire RegEx statement that can get the same result? (It's OK if there's not one, but I'd sure like to know if there is.) Thanks in advance for any help.
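All of the blank-line variants listed above can be described with two PCRE escapes: `\h` (horizontal whitespace such as spaces and tabs) and `\R` (any line break, including LF, CR and CRLF). A minimal sketch, assuming the sample strings below are representative; they are made up for illustration and are not taken from the actual article text:

```autoit
; Illustrative samples (hypothetical): five blank-line variants plus one line that has text
Local $aSamples[6] = [@LF, @CRLF, " " & @LF, "  " & @LF, @TAB & @CRLF, " x" & @LF]
Local $iBlank = 0
For $i = 0 To UBound($aSamples) - 1
    ; \A and \z anchor at the very start and end of the sample,
    ; \h* = any run of spaces/tabs, \R = LF, CR or CRLF (plus a few rarer breaks)
    If StringRegExp($aSamples[$i], "\A\h*\R\z") Then $iBlank += 1
Next
ConsoleWrite("Blank lines recognized: " & $iBlank & " of " & UBound($aSamples) & @LF)
```

For these six samples it should report 5 of 6; only the `" x"` line is rejected, because `\R` cannot match the `x`.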
jchd, posted June 9, 2019 (edited)

Try this:

```autoit
; Sample text mixing LF, CR and CRLF breaks with space-only lines
Local $s = @LF & @CRLF & " " & @CR & " aaa" & @LF & "bbb" & @CRLF & " " & @LF & @CRLF & " " & @CR & "ccc" & @LF & "ddd" & @CR & " " & @LF & "eee" & @LF & "fff" & @LF & "ggg" & @LF & @CRLF & " "
ConsoleWrite(StringRegExpReplace($s, "^(\h*\R)+|\R\K((?:\h*\R\h*)+)(?=\R)|(\R\h*)*$", "") & @LF)
```

In fact this isn't as easy as it first looks, and several ambiguities remain.

First, a line terminator can be LF, CR or CRLF, but all three are gracefully handled by `\R`.

Then one must specify what "blank" means. Do non-breaking spaces count? Can you expect to hit a FF or a VT? What about the various special Unicode codepoints that may render as a blank line? All of this can be handled by adding whatever is needed to match what can be expected. Here I've considered only the `\h` class.

That said, there are three cases of "unwanted blank lines", and the details depend on exactly what you want.

First, blank lines before any text, at the top of the subject. I've adopted the convention that you'd want to remove them all. This is matched by the first part of the alternation: `^(\h*\R)+`. But you might consider that a single leading blank line should be kept verbatim.

Then, extraneous blank lines after the end of the text. Again, I've adopted the convention that you'd want to remove anything not "meaningful". This is handled by the last part of the alternation: `(\R\h*)*$`.

Finally, we need a pattern to remove multiple blank lines within the body of the subject. This is done by the middle part of the alternation: `\R\K((?:\h*\R\h*)+)(?=\R)`. Note the use of `\K` to "forget" what has just been matched. This is needed because a lookbehind assertion can only span a fixed-length subpattern, and neither `\R` nor `\h*` is fixed-length.

All in all, the pattern as it stands isn't perfect for every use case, but it would probably do what you want in real life. It's quite possible that a simpler pattern can be built, using the same or slightly different conventions.
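To check each of the three cases separately, the same pattern can be assembled from its parts. A small sketch; the variable names `$sLeading`, `$sMiddle` and `$sTrailing` are illustrative and not from the post, and the `<CR>`/`<LF>` markers exist only to make the line breaks visible in the console:

```autoit
; The three alternatives of the pattern above, kept in separate (hypothetical) variables
Local $sLeading  = "^(\h*\R)+"                  ; all blank lines before the first text line
Local $sMiddle   = "\R\K((?:\h*\R\h*)+)(?=\R)"  ; extra blank lines inside the text (one is kept)
Local $sTrailing = "(\R\h*)*$"                  ; breaks and blank lines after the last text line
Local $sPattern  = $sLeading & "|" & $sMiddle & "|" & $sTrailing

Local $s = @LF & @CRLF & " " & @CR & " aaa" & @LF & "bbb" & @CRLF & " " & @LF & @CRLF & " " & @CR & "ccc" & @LF & "ddd" & @CR & " " & @LF & "eee" & @LF & "fff" & @LF & "ggg" & @LF & @CRLF & " "
Local $sResult = StringRegExpReplace($s, $sPattern, "")

; Show CR and LF as visible markers; brackets delimit the result
ConsoleWrite("[" & StringReplace(StringReplace($sResult, @CR, "<CR>"), @LF, "<LF>") & "]" & @LF)
; expected: [ aaa<LF>bbb<CR><LF><CR>ccc<LF>ddd<CR> <LF>eee<LF>fff<LF>ggg]
```

In that output the three blank lines between `bbb` and `ccc` have shrunk to one (the bare `<CR>` line), the single space-only line between `ddd` and `eee` is left alone, and everything before ` aaa` and after `ggg` is gone.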
Edited June 9, 2019 by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
iamtheky, posted June 9, 2019 (edited)

Not going to pretend to know why any of my regexes do anything, but this appears to remove all the duplicate 'blank' lines. If there are spaces, though, it keeps them and places them before the next non-space value. So it does the thing, but it's also kind of quirky about it.

```autoit
#include <Array.au3> ; needed for _ArrayDisplay

Local $s = @LF & @CRLF & " " & @CR & " aaa" & @LF & "bbb" & @CRLF & " " & @LF & @CRLF & " " & @CR & "ccc" & @LF & "ddd" & @CR & " " & @LF & "eee" & @LF & "fff" & @LF & "ggg" & @LF & @CRLF & " "
_ArrayDisplay(StringRegExp($s, "(\s+(?:\S+?)+)", 3)) ; 3 = return an array of all global matches
```

Edited June 9, 2019 by iamtheky

,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-. |(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/ (_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_) | | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) ( | | | | |)| | \ / | | | | | |)| | `--. | |) \ | | `-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_| '-' '-' (__) (__) (_) (__)
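Printing the matches with the line breaks made visible shows what this pattern actually returns: each match is a whole run of whitespace, blank lines included, followed by the next word, and the whitespace after the last word is not returned at all. A sketch that writes to the console instead of opening the `_ArrayDisplay` window; the `<CR>`/`<LF>` markers are only for display:

```autoit
Local $s = @LF & @CRLF & " " & @CR & " aaa" & @LF & "bbb" & @CRLF & " " & @LF & @CRLF & " " & @CR & "ccc" & @LF & "ddd" & @CR & " " & @LF & "eee" & @LF & "fff" & @LF & "ggg" & @LF & @CRLF & " "
Local $aMatches = StringRegExp($s, "(\s+(?:\S+?)+)", 3) ; 3 = array of all global matches
For $i = 0 To UBound($aMatches) - 1
    ; Replace CR/LF with visible markers; brackets delimit each match
    ConsoleWrite($i & ": [" & StringReplace(StringReplace($aMatches[$i], @CR, "<CR>"), @LF, "<LF>") & "]" & @LF)
Next
```

For this sample there are 7 matches; for instance match 2 is `<CR><LF> <LF><CR><LF> <CR>ccc`, i.e. all three blank lines between `bbb` and `ccc` travel with `ccc`.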
jchd, posted June 9, 2019

Problem is, that removes ALL blank lines. The OP said: "(single blank lines are fine)".
iamtheky, posted June 9, 2019

Oh, quite right. I was thinking of single line feeds... carry on.
jchd, posted June 9, 2019

As stated, I already distorted this requirement a bit by removing all blank lines at the beginning and end of the text, which I suppose is correct. Yet it isn't hard to modify the pattern to allow only one blank line there, if needed.
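For the beginning of the text, one possible modification (a sketch, not the poster's own version) matches one leading blank line, uses `\K` to leave it out of the replaced text, and then removes only the blank lines after it. The assumption here is that the first leading blank line is the one worth keeping; only the first alternative changes, and the trailing side is left exactly as in the original pattern:

```autoit
; Only the first alternative differs from the original pattern:
;   ^\h*\R      matches the first leading blank line, which \K then keeps out of the match
;   (?:\h*\R)+  matches every further leading blank line, which gets removed
Local $sPattern = "^\h*\R\K(?:\h*\R)+|\R\K((?:\h*\R\h*)+)(?=\R)|(\R\h*)*$"

Local $s = @LF & @CRLF & " " & @CR & " aaa" & @LF & "bbb" & @CRLF & " " & @LF & @CRLF & " " & @CR & "ccc" & @LF & "ddd" & @CR & " " & @LF & "eee" & @LF & "fff" & @LF & "ggg" & @LF & @CRLF & " "
Local $sResult = StringRegExpReplace($s, $sPattern, "")

; Show CR and LF as visible markers; brackets delimit the result
ConsoleWrite("[" & StringReplace(StringReplace($sResult, @CR, "<CR>"), @LF, "<LF>") & "]" & @LF)
; expected: [<LF> aaa<LF>bbb<CR><LF><CR>ccc<LF>ddd<CR> <LF>eee<LF>fff<LF>ggg]
```

A text that starts with exactly one blank line is left untouched by this alternative, since `(?:\h*\R)+` needs at least one more blank line to match.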
qwert (author), posted June 9, 2019

@jchd: First, let me say thanks. 1) Your solution is quite remarkable in what it does, and 2) it does fill the need I have.

Here are clarifications on the points you've brought up:
• Do non-breaking spaces count? No, they aren't in the picture.
• Can you expect to hit a FF or a VT? Same: they shouldn't occur.
• Special Unicode code points that may render a blank line? I'm not expecting any.
• Blank lines before any text, at the top of the subject: I prefer that they evaporate.
• Extraneous blank lines after the end of the text: likewise, they can be dropped.

My initial tests show that, when using "" as the replacement string, it does eliminate the extra lines. And with what you've outlined above, I can experiment with any fringe cases I find, plus strive to understand the aspects of what you've provided. I'll admit that your solution is beyond anything I was trying.

I appreciate what you've provided. Thanks.
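Since the array loop in the first post wrote its result back with `@CRLF` separators, a drop-in replacement may also want to normalize the line breaks. A minimal sketch; the helper name `_CollapseBlankLines` and its `$bNormalizeEol` parameter are hypothetical, and the second `StringRegExpReplace` call is an added assumption, not part of the suggested pattern:

```autoit
; Hypothetical helper: the suggested pattern, optionally followed by line-break normalization
Func _CollapseBlankLines($sText, $bNormalizeEol = True)
    ; Remove leading blank lines, extra blank lines inside the text, and trailing blank lines
    Local $sOut = StringRegExpReplace($sText, "^(\h*\R)+|\R\K((?:\h*\R\h*)+)(?=\R)|(\R\h*)*$", "")
    ; Assumption: rewrite every remaining LF, CR or CRLF as @CRLF, as _ArrayToString(..., @CRLF) did
    If $bNormalizeEol Then $sOut = StringRegExpReplace($sOut, "\R", @CRLF)
    Return $sOut
EndFunc

Local $s = @LF & @CRLF & " " & @CR & " aaa" & @LF & "bbb" & @CRLF & " " & @LF & @CRLF & " " & @CR & "ccc" & @LF & "ddd" & @CR & " " & @LF & "eee" & @LF & "fff" & @LF & "ggg" & @LF & @CRLF & " "
Local $sClean = _CollapseBlankLines($s) ; e.g. $article = _CollapseBlankLines($article)
ConsoleWrite($sClean & @CRLF)
```

One difference from the array loop: `StringStripWS($finalPass[$i], 3)` also trimmed spaces from text lines and emptied whitespace-only lines, whereas this sketch leaves ` aaa` and a lone `" "` blank line as they are.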