
RegExp to delete duplicate lines



I got this code from PhoenixXL in this post:

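#include <Array.au3>

Local $File = "First Line" & @CR & "Second Line" & @CRLF & "first line"
; Split the text into lines: each match is a run of non-vertical-whitespace characters
Local $aRegEx = StringRegExp( $File, '[^\v]+', 3 )
If @error Then Exit -1 ;No matches

_ArrayDisplay( $aRegEx )
; By default _ArrayUnique stores the number of unique elements in element [0]...
$aRegEx = _ArrayUnique( $aRegEx )
_ArrayDelete( $aRegEx, 0 ) ; ...so remove that count before displaying
_ArrayDisplay( $aRegEx )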


What RegExp pattern would make the code treat "First Line" as a duplicate of "first line     ", "    first      line", "firstline" and "f i r s t l i n e"?

Merry Christmas, Folks!


Here are a couple of methods.

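#include <Array.au3>

Local $File = "First Line" & @CR & "Second Line" & @CRLF & _
        "first line" & @CRLF & "First Line" & @CRLF & _
        "first line     " & @CRLF & "    first      line" & @CRLF & _
        "firstline" & @CRLF & "  second  line " & @CRLF & "f i r s t l i n e"

; Split the text into lines (runs of non-vertical-whitespace characters)
Local $aRegEx = StringRegExp($File, '\V+', 3)
If @error Then Exit -1 ;No matches

_ArrayDisplay($aRegEx)
Local $iUbnd = UBound($aRegEx) - 1
For $j = 0 To $iUbnd
    If $j >= $iUbnd Then ExitLoop ; no later lines left to compare
    ; Compare line $j with every later line, walking backwards so that deleting
    ; element $i does not shift any element that still has to be checked.
    ; StringStripWS(..., 8) removes all whitespace and StringCompare is
    ; case-insensitive by default, so "First Line" equals "f i r s t l i n e".
    For $i = $iUbnd To $j + 1 Step -1
        If StringCompare(StringStripWS($aRegEx[$i], 8), StringStripWS($aRegEx[$j], 8)) = 0 Then
            _ArrayDelete($aRegEx, $i)
            $iUbnd -= 1
        EndIf
    Next
Next

_ArrayDisplay($aRegEx)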
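The first method compares every pair of lines and calls _ArrayDelete for each duplicate it finds, which is fine for short lists. A minimal single-pass sketch, assuming the same definition of "duplicate" (equal after StringStripWS(..., 8), compared without regard to case): each line is reduced to a key, and a line is kept only if its key has not been seen before. The function name _UniqueLinesIgnoreWS is hypothetical, and the seen keys are kept in a plain @LF-delimited string rather than a dictionary so the sketch needs no includes.

```autoit
; Hypothetical helper (not from the thread): returns the lines of $sText with
; duplicates removed, keeping the first spelling of each line. Two lines are
; treated as duplicates if they are equal after all whitespace is stripped,
; ignoring case.
Func _UniqueLinesIgnoreWS($sText)
    Local $aLines = StringRegExp($sText, '\V+', 3)
    If @error Then Return SetError(1, 0, 0) ; no lines found
    Local $aOut[UBound($aLines)], $iCount = 0, $sKey
    ; Keys seen so far, each followed by @LF. StringStripWS(..., 8) removes @LF,
    ; so a key can never contain the delimiter.
    Local $sSeen = @LF
    For $i = 0 To UBound($aLines) - 1
        $sKey = StringStripWS($aLines[$i], 8)
        ; StringInStr is case-insensitive by default
        If Not StringInStr($sSeen, @LF & $sKey & @LF) Then
            $sSeen &= $sKey & @LF
            $aOut[$iCount] = $aLines[$i]
            $iCount += 1
        EndIf
    Next
    ReDim $aOut[$iCount]
    Return $aOut
EndFunc

Local $sText = "First Line" & @CR & "Second Line" & @CRLF & "first line     " & @CRLF & _
        "    first      line" & @CRLF & "f i r s t l i n e" & @CRLF & "  second  line "
Local $aUnique = _UniqueLinesIgnoreWS($sText)
For $i = 0 To UBound($aUnique) - 1
    ConsoleWrite($aUnique[$i] & @LF) ; prints "First Line", then "Second Line"
Next
```

Because the key check is a substring search on a growing string, this is still not linear for very large inputs, but it avoids resizing the array once per deleted line.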

And a second method, which builds a whitespace-tolerant regular expression from each line:

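#include <Array.au3>

Local $File = "First Line" & @CR & "Second Line" & @CRLF & _
        "first line" & @CRLF & "First Line" & @CRLF & _
        "first line     " & @CRLF & "    first      line" & @CRLF & _
        "firstline" & @CRLF & "f i r s t l i n e"

; Split the text into lines
Local $aRegEx = StringRegExp($File, '\V+', 3)
If @error Then Exit -1 ;No matches

_ArrayDisplay($aRegEx)
Local $iUbnd = UBound($aRegEx) - 1
Local $iCount = 0, $aNewArray[$iUbnd + 1], $sPattern
For $j = 0 To $iUbnd
    ; Build a case-insensitive, multiline pattern from the line with all whitespace
    ; removed, allowing optional horizontal whitespace (\h*) around every character,
    ; e.g. "First Line" -> ^\h*F\h*i\h*r\h*s\h*t\h*L\h*i\h*n\h*e\h*
    ; The match must end at a line break or the end of the text, so "First Lines"
    ; is not removed as a copy of "First Line".
    ; Characters are not escaped: lines containing . ( ) * etc. need extra handling.
    $sPattern = "(?im)(^\h*" & StringRegExpReplace(StringStripWS($aRegEx[$j], 8), "(\V+?)", "\1\\h*") & "(\v+|\z))"
    ConsoleWrite($sPattern & @TAB & $aRegEx[$j] & @LF)
    ; If this line (or an equivalent one) is still in the text, keep it once and
    ; remove every equivalent line so later duplicates no longer match
    If StringRegExp($File, $sPattern) Then
        $aNewArray[$iCount] = $aRegEx[$j]
        $File = StringRegExpReplace($File, $sPattern, "")
        $iCount += 1
    EndIf
    If $File = "" Then ExitLoop ; every line has been accounted for
Next

ReDim $aNewArray[$iCount]
_ArrayDisplay($aNewArray, "Unique")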
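The pattern-building step of the second method can also be pulled out into a function. The sketch below is a variation, not the code from the thread: the helper name _LooseLinePattern is hypothetical, and it assumes two extra safeguards are wanted. Regex metacharacters in the line are escaped, so a line such as "a.b" is matched literally, and the caller anchors the pattern at both ends, so "First Lines" is not treated as a copy of "First Line".

```autoit
; Hypothetical helper (not from the thread): builds a pattern fragment that
; matches $sLine with any amount of horizontal whitespace (\h*) before, between
; and after its non-whitespace characters. Regex metacharacters are escaped.
Func _LooseLinePattern($sLine)
    Local Const $sMeta = "\.^$|[](){}*+?"
    ; Empty delimiter splits into single characters; flag 2 = no count in [0]
    Local $aChars = StringSplit(StringStripWS($sLine, 8), "", 2)
    Local $sFrag = "\h*", $sChar
    For $i = 0 To UBound($aChars) - 1
        $sChar = $aChars[$i]
        If $sChar = "" Then ContinueLoop ; whitespace-only input
        If StringInStr($sMeta, $sChar) Then $sChar = "\" & $sChar ; escape metacharacter
        $sFrag &= $sChar & "\h*"
    Next
    Return $sFrag
EndFunc

; (?i) ignores case; ^ and $ require the whole string to match, not just a prefix
Local $sPat = "(?i)^" & _LooseLinePattern("First Line") & "$"
ConsoleWrite(StringRegExp("    first      line", $sPat) & @LF) ; 1
ConsoleWrite(StringRegExp("f i r s t l i n e", $sPat) & @LF)   ; 1
ConsoleWrite(StringRegExp("First Lines", $sPat) & @LF)         ; 0
```

To search a whole block of text, as the second method does, the same fragment could be wrapped as "(?im)^" & _LooseLinePattern($aRegEx[$j]) & "(\v+|\z)" instead.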