Sign in to follow this  
Followers 0
leuce

How to merge values of same-key entries in a simple flat file

7 posts in this topic

Hello everyone

Please help me think of a solution for processing a flat file in which the values of identical keys must be merged.

The file contains two "columns" separated by a tab:

Key [tab] Value

Previously, when I had to deal with this type of file, the data contained no duplicate keys:

A [tab] Text for A
B [tab] Text for B
C [tab] Text for C
D [tab] Text for D
E [tab] Text for E

But in the file I have to process today, there are some duplicate keys:

A [tab] Text for A
B [tab] Text for B
C [tab] Text for C
D [tab] Text for D
D [tab] More text for D
D [tab] Even more text for D
E [tab] Text for E

And I need to merge the values of those keys, i.e.:

A [tab] Text for A
B [tab] Text for B
C [tab] Text for C
D [tab] Text for D # More text for D # Even more text for D
E [tab] Text for E

Separating the merged values with e.g. "#" is entirely optional -- I just did it here to make it visually clearer.

I'm sure the solution to this must be simple, but I'm stumped.  Can anyone give me some ideas about the mechamism that would solve this problem?

Thanks

Samuel

Share this post


Link to post
Share on other sites



#2 ·  Posted (edited)

Hi leuce,

  there is probably some cool SRE solutions for this but if I had to do it I would go with the strings and file functions.

I have a similar case in my sig called Simple Text Mainpulation.

let me know if you have any questions.

Bill

Edited by billo

Share this post


Link to post
Share on other sites

#3 ·  Posted (edited)

Well I would handle this with arrays. Use >CSVSplit or some function to produce two arrays, by changing the delimiter to @TAB. Then create a unique array with items from the first column. Loop through and concatenate the strings in the second column as you go. I skipped one or two steps. Finally use _ArrayToCSV to create the TSV.

BTW, if your flat files contain double quotes, then _CSVSplit and _ArrayToCSV may not be suitable.

Edit

On a second thought, the above is not a good way to start. First remove existing duplicates before proceeding. Then create 2D arrays and loop through as before. The general principle is the same.

Edited by czardas

Share this post


Link to post
Share on other sites

#4 ·  Posted (edited)

Try this:

#include <Array.au3>

Global $aData = StringSplitW(FileRead(@ScriptDir & "\Text.txt"), @TAB)
Global $aNew[UBound($aData)][2], $i = 0, $c = 0

Do
    If $i > UBound($aData) - 2 Then ExitLoop
    If $aData[$i][0] <> $aData[$i + 1][0] Then
        $aNew[$c][0] = $aData[$i][0]
        $aNew[$c][1] &= $aData[$i][1]
        $c += 1
    Else
        $aNew[$c][0] = $aData[$i][0]
        $aNew[$c][1] &= $aData[$i][1]  & " # "
    EndIf
    $i += 1
Until False
$i -= 1
If $aData[$i][0] <> $aData[$i + 1][0] Then  ;check last 2 entries
    $i += 1
    $aNew[$c][0] = $aData[$i][0]
    $aNew[$c][1] &= $aData[$i][1]
Else
    $i += 1
    $aNew[$c][0] = $aData[$i][0]
    $aNew[$c][1] &= $aData[$i][1]  & " # "
EndIf
ReDim $aNew[$c + 1][2]
_ArrayDisplay($aNew)

Global $sNew, $hFile
$hFile = FileOpen(@ScriptDir & "\New.txt", 2) ;write array to a new file
For $i = 0 To UBound($aNew) - 1
    $sNew &= $aNew[$i][0] & @TAB & $aNew[$i][1] & @CRLF
Next
FileWrite($hFile, $sNew)
FileClose($hFile)
ConsoleWrite($sNew & @LF)

Exit

; #FUNCTION# ========================================================================================================================================
; Name .................:   StringSplitW()
; Description ..........:   Splits  a string into columns instead of rows as it is done by SplitString(), like a csv file to a 2d array ;-)
; Syntax ...............:   StringSplitW($sString, $sDelimiter, $iWidthLen)
; Parameters ...........:   $sString - string to split
;                           $sDelimiter - [optional] the delimter how to split the string
;                           $iWidthLen - [optional] length of the row (amount of columns - default is 256)
; Return values .......:    Success - 2d array
;                           Error 1 - either $sString or $delimter is not set
;                           Error 2 - array width exceeded
;                           Error 3 - error splitting string
;
; Version .............:    v0.96 build 2015-01-20 beta
; Author ..............:    UEZ
; Modified ............:
; Remarks .............:    RegEx take from http://stackoverflow.com/questions/4476812/regular-expressions-how-to-replace-a-character-within-quotes
; Related .............:    StringSplit, StringReplace, StringRegExpReplace, StringLen, StringStripCR
; ===================================================================================================================================================
Func StringSplitW($sString, $sDelimiter = ";", $sQuotationMark = '"', $sDummy = "¦", $iWidthLen = 256)
    If $sString = "" Or $sDelimiter = "" Then Return SetError(1, 0, 0)
    Local $chk, $iWidth, $i, $j, $k, $iLen, $iMax = 1, $iMaxWidth
    Local $aPos[1], $l = 0
    Local $aSplit =  StringSplit(StringStripCR($sString), @LF)
    If @error Then Return SetError(3, 0, 0)
    Local $aVertical[$aSplit[0]][$iWidthLen], $iDelimiterLen = StringLen($sDelimiter) - 1, $sLine
    For $k = 1 To $aSplit[0]
        $iLen = StringLen($aSplit[$k])
        If $iLen > 1 Then
            $sLine = StringRegExpReplace($aSplit[$k], '(?m)\' & $sDelimiter & '(?=[^' & $sQuotationMark & ']*' & $sQuotationMark & '(?:[^' & $sQuotationMark & '\r\n]*' & $sQuotationMark & '[^' & $sQuotationMark & ']*' & $sQuotationMark & ')*[^' & $sQuotationMark & '\r\n]*$)', $sDummy)
            $chk = StringReplace($sLine, $sDelimiter, $sDelimiter)
            $iWidth = @extended
            If $iWidth > $iWidthLen Then Return SetError(2, 0, 0)
            If $iWidth >= $iMax Then $iMax = $iWidth + 1
            Switch $iWidth
                Case 0
                    $aVertical[$l][0] = $sLine
                Case Else
                    Dim $aPos[$iWidth * 2 + 2]
                    $j = 1
                    $aPos[0] = 1
                    For $i = 0 To $iWidth - 1
                        $aPos[$j] = StringInStr($sLine, $sDelimiter, 0, $i + 1) - 1
                        $aPos[$j + 1] = $aPos[$j] + 2 + $iDelimiterLen
                        $j += 2
                    Next
                    $aPos[UBound($aPos) - 1] = StringLen($sLine)
                    $j = 0
                    For $i = 0 To UBound($aPos) - 1 Step 2
                        $aVertical[$l][$j] = StringMid(StringReplace($sLine, $sDummy, $sDelimiter), $aPos[$i], $aPos[$i + 1] - $aPos[$i] + 1)
                        $j += 1
                    Next
                EndSwitch
                $l += 1
        EndIf
    Next
    ReDim $aVertical[$l][$iMax]
    Return $aVertical
EndFunc

Change the path to the text file. Further the text file must be sorted as shown in the 1st post.

 

Br,

UEZ

Edited by UEZ

Please don't send me any personal message and ask for support! I will not reply!

Selection of finest graphical examples at Codepen.io

The own fart smells best!
Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!
¯\_(ツ)_/¯  ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ

Share this post


Link to post
Share on other sites

#6 ·  Posted (edited)

Create a unique array with items from the first column. Loop through and concatenate the strings in the second column as you go.

Wow, that is actually quite simple and it makes perfect sense. Thanks.

 

Have you read through this >recent thread?

Thanks, that looks like a nice thread. I bookmarked it and will read it later when I have time to read it slooooooowly. (-:

Try this...

It works, thanks.

Edited by leuce

Share this post


Link to post
Share on other sites

another way
with this, there is no need to sort the input text

#include <array.au3>
#include <file.au3>
Global $source, $WorkArray[1][2], $output[1][2], $TempRow, $delim = Chr(9)
Global $separator[2] = ["", "#"]
_FileReadToArray(".\file.txt", $source) ; <--- change input file here
ReDim $WorkArray[$source[0]][2] ; create a new [2D] array with same nr. of lines of file.txt
For $i = 1 To $source[0] ; scan all lines of the array
    $TempRow = StringSplit($source[$i], $delim, 2) ; split the line according to delimiter
    $WorkArray[$i - 1][0] = $TempRow[0]
    $WorkArray[$i - 1][1] = $TempRow[1]
Next
$SingleKeys = _ArrayUnique($WorkArray) ; This array contains single keys
ReDim $output[$SingleKeys[0]][2] ; the output array contains one line for each key
For $i = 1 To $SingleKeys[0]
    $SameKey = _ArrayFindAll($WorkArray, $SingleKeys[$i]) ; find duplicates for each key
    ; and merge duplicates
    For $ii = 0 To UBound($SameKey) - 1
        $output[$i - 1][1] &= $WorkArray[$SameKey[$ii]][1] & $separator[$ii < UBound($SameKey) - 1]
    Next
Next
_ArrayDisplay($output)

small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0