leuce Posted December 13, 2013 Share Posted December 13, 2013 Hello everyone Please help me think of a solution for processing a flat file in which the values of identical keys must be merged. The file contains two "columns" separated by a tab: Key [tab] Value Previously, when I had to deal with this type of file, the data contained no duplicate keys: A [tab] Text for A B [tab] Text for B C [tab] Text for C D [tab] Text for D E [tab] Text for E But in the file I have to process today, there are some duplicate keys: A [tab] Text for A B [tab] Text for B C [tab] Text for C D [tab] Text for D D [tab] More text for D D [tab] Even more text for D E [tab] Text for E And I need to merge the values of those keys, i.e.: A [tab] Text for A B [tab] Text for B C [tab] Text for C D [tab] Text for D # More text for D # Even more text for D E [tab] Text for E Separating the merged values with e.g. "#" is entirely optional -- I just did it here to make it visually clearer. I'm sure the solution to this must be simple, but I'm stumped. Can anyone give me some ideas about the mechamism that would solve this problem? Thanks Samuel Link to comment Share on other sites More sharing options...
l3ill Posted December 13, 2013 Share Posted December 13, 2013 (edited) Hi leuce, there is probably some cool SRE solutions for this but if I had to do it I would go with the strings and file functions. I have a similar case in my sig called Simple Text Mainpulation. let me know if you have any questions. Bill Edited December 13, 2013 by billo My Contributions... SnippetBrowser NewSciTE PathFinder Text File Manipulation FTP Connection Tester / INI File - Read, Write, Save & Load Example Link to comment Share on other sites More sharing options...
czardas Posted December 13, 2013 Share Posted December 13, 2013 (edited) Well I would handle this with arrays. Use >CSVSplit or some function to produce two arrays, by changing the delimiter to @TAB. Then create a unique array with items from the first column. Loop through and concatenate the strings in the second column as you go. I skipped one or two steps. Finally use _ArrayToCSV to create the TSV. BTW, if your flat files contain double quotes, then _CSVSplit and _ArrayToCSV may not be suitable. Edit On a second thought, the above is not a good way to start. First remove existing duplicates before proceeding. Then create 2D arrays and loop through as before. The general principle is the same. Edited December 13, 2013 by czardas operator64 ArrayWorkshop Link to comment Share on other sites More sharing options...
UEZ Posted December 13, 2013 Share Posted December 13, 2013 (edited) Try this:expandcollapse popup#include <Array.au3> Global $aData = StringSplitW(FileRead(@ScriptDir & "\Text.txt"), @TAB) Global $aNew[UBound($aData)][2], $i = 0, $c = 0 Do If $i > UBound($aData) - 2 Then ExitLoop If $aData[$i][0] <> $aData[$i + 1][0] Then $aNew[$c][0] = $aData[$i][0] $aNew[$c][1] &= $aData[$i][1] $c += 1 Else $aNew[$c][0] = $aData[$i][0] $aNew[$c][1] &= $aData[$i][1] & " # " EndIf $i += 1 Until False $i -= 1 If $aData[$i][0] <> $aData[$i + 1][0] Then ;check last 2 entries $i += 1 $aNew[$c][0] = $aData[$i][0] $aNew[$c][1] &= $aData[$i][1] Else $i += 1 $aNew[$c][0] = $aData[$i][0] $aNew[$c][1] &= $aData[$i][1] & " # " EndIf ReDim $aNew[$c + 1][2] _ArrayDisplay($aNew) Global $sNew, $hFile $hFile = FileOpen(@ScriptDir & "\New.txt", 2) ;write array to a new file For $i = 0 To UBound($aNew) - 1 $sNew &= $aNew[$i][0] & @TAB & $aNew[$i][1] & @CRLF Next FileWrite($hFile, $sNew) FileClose($hFile) ConsoleWrite($sNew & @LF) Exit ; #FUNCTION# ======================================================================================================================================== ; Name .................: StringSplitW() ; Description ..........: Splits a string into columns instead of rows as it is done by SplitString(), like a csv file to a 2d array ;-) ; Syntax ...............: StringSplitW($sString, $sDelimiter, $iWidthLen) ; Parameters ...........: $sString - string to split ; $sDelimiter - [optional] the delimter how to split the string ; $iWidthLen - [optional] length of the row (amount of columns - default is 256) ; Return values .......: Success - 2d array ; Error 1 - either $sString or $delimter is not set ; Error 2 - array width exceeded ; Error 3 - error splitting string ; ; Version .............: v0.96 build 2015-01-20 beta ; Author ..............: UEZ ; Modified ............: ; Remarks .............: RegEx take from http://stackoverflow.com/questions/4476812/regular-expressions-how-to-replace-a-character-within-quotes ; Related .............: StringSplit, StringReplace, StringRegExpReplace, StringLen, StringStripCR ; =================================================================================================================================================== Func StringSplitW($sString, $sDelimiter = ";", $sQuotationMark = '"', $sDummy = "¦", $iWidthLen = 256) If $sString = "" Or $sDelimiter = "" Then Return SetError(1, 0, 0) Local $chk, $iWidth, $i, $j, $k, $iLen, $iMax = 1, $iMaxWidth Local $aPos[1], $l = 0 Local $aSplit = StringSplit(StringStripCR($sString), @LF) If @error Then Return SetError(3, 0, 0) Local $aVertical[$aSplit[0]][$iWidthLen], $iDelimiterLen = StringLen($sDelimiter) - 1, $sLine For $k = 1 To $aSplit[0] $iLen = StringLen($aSplit[$k]) If $iLen > 1 Then $sLine = StringRegExpReplace($aSplit[$k], '(?m)\' & $sDelimiter & '(?=[^' & $sQuotationMark & ']*' & $sQuotationMark & '(?:[^' & $sQuotationMark & '\r\n]*' & $sQuotationMark & '[^' & $sQuotationMark & ']*' & $sQuotationMark & ')*[^' & $sQuotationMark & '\r\n]*$)', $sDummy) $chk = StringReplace($sLine, $sDelimiter, $sDelimiter) $iWidth = @extended If $iWidth > $iWidthLen Then Return SetError(2, 0, 0) If $iWidth >= $iMax Then $iMax = $iWidth + 1 Switch $iWidth Case 0 $aVertical[$l][0] = $sLine Case Else Dim $aPos[$iWidth * 2 + 2] $j = 1 $aPos[0] = 1 For $i = 0 To $iWidth - 1 $aPos[$j] = StringInStr($sLine, $sDelimiter, 0, $i + 1) - 1 $aPos[$j + 1] = $aPos[$j] + 2 + $iDelimiterLen $j += 2 Next $aPos[UBound($aPos) - 1] = StringLen($sLine) $j = 0 For $i = 0 To UBound($aPos) - 1 Step 2 $aVertical[$l][$j] = StringMid(StringReplace($sLine, $sDummy, $sDelimiter), $aPos[$i], $aPos[$i + 1] - $aPos[$i] + 1) $j += 1 Next EndSwitch $l += 1 EndIf Next ReDim $aVertical[$l][$iMax] Return $aVertical EndFuncChange the path to the text file. Further the text file must be sorted as shown in the 1st post. Br,UEZ Edited January 19, 2015 by UEZ Please don't send me any personal message and ask for support! I will not reply! Selection of finest graphical examples at Codepen.io The own fart smells best! ✌Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!¯\_(ツ)_/¯ ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ Link to comment Share on other sites More sharing options...
Danp2 Posted December 13, 2013 Share Posted December 13, 2013 Have you read through this >recent thread? Latest Webdriver UDF Release Webdriver Wiki FAQs Link to comment Share on other sites More sharing options...
leuce Posted December 13, 2013 Author Share Posted December 13, 2013 (edited) Create a unique array with items from the first column. Loop through and concatenate the strings in the second column as you go.Wow, that is actually quite simple and it makes perfect sense. Thanks. Have you read through this >recent thread?Thanks, that looks like a nice thread. I bookmarked it and will read it later when I have time to read it slooooooowly. (-:Try this...It works, thanks. Edited December 13, 2013 by leuce Link to comment Share on other sites More sharing options...
Gianni Posted December 13, 2013 Share Posted December 13, 2013 another way with this, there is no need to sort the input text #include <array.au3> #include <file.au3> Global $source, $WorkArray[1][2], $output[1][2], $TempRow, $delim = Chr(9) Global $separator[2] = ["", "#"] _FileReadToArray(".\file.txt", $source) ; <--- change input file here ReDim $WorkArray[$source[0]][2] ; create a new [2D] array with same nr. of lines of file.txt For $i = 1 To $source[0] ; scan all lines of the array $TempRow = StringSplit($source[$i], $delim, 2) ; split the line according to delimiter $WorkArray[$i - 1][0] = $TempRow[0] $WorkArray[$i - 1][1] = $TempRow[1] Next $SingleKeys = _ArrayUnique($WorkArray) ; This array contains single keys ReDim $output[$SingleKeys[0]][2] ; the output array contains one line for each key For $i = 1 To $SingleKeys[0] $SameKey = _ArrayFindAll($WorkArray, $SingleKeys[$i]) ; find duplicates for each key ; and merge duplicates For $ii = 0 To UBound($SameKey) - 1 $output[$i - 1][1] &= $WorkArray[$SameKey[$ii]][1] & $separator[$ii < UBound($SameKey) - 1] Next Next _ArrayDisplay($output) Chimp small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt.... Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now