masvil, posted January 17, 2008 (edited January 17, 2008):

I'm trying to find duplicate lines across several files (all files in a specified directory and its subdirectories) and store them in a new file, together with a reference to the files they were found in. Criterion: only lines that begin with a specified string count.

Example:

c:\dir\hello.txt content:
    hi, wou are you?
    specifiedstring have a nice trip
    I really like it

c:\dir\goodbye.txt content:
    specifiedstring you are beautiful
    specifiedstring what's up I wish you stay well
    hi, my dear
    specifiedstring have a nice trip
    I really like it

c:\dir\subdir\adios.txt content:
    specifiedstring what's up I wish you stay well
    where are you going?
    specifiedstring the pen is on the table

After processing those files I need result.txt to contain:
    "specifiedstring have a nice trip" found in hello.txt and goodbye.txt
    "specifiedstring what's up I wish you stay well" found in goodbye.txt and adios.txt

Any help, please?

Thatsgreat2345, posted January 18, 2008:

Use the UDF below to list the files, then go through each file: get its line count with _FileCountLines, loop over that many lines with a For loop, read each line with FileReadLine, and use StringInStr to check whether the string is there. If it is, save the line to an array with _ArrayAdd, or to some sort of text buffer, which you can then use after all the files have been checked.

The UDF below will grab every file in the directory and in every subdirectory. THIS UDF WAS PROGRAMMED BY SMOKE_N, NOT BY ME.

    Func _FileListToArrayEx($sPath, $sFilter = '*.*', $iFlag = 0, $sExclude = '', $iRecurse = False)
        ; Validate the arguments and apply defaults
        If Not FileExists($sPath) Then Return SetError(1, 1, '')
        If $sFilter = -1 Or $sFilter = Default Then $sFilter = '*.*'
        If $iFlag = -1 Or $iFlag = Default Then $iFlag = 0
        If $sExclude = -1 Or $sExclude = Default Then $sExclude = ''
        Local $aBadChar[6] = ['\', '/', ':', '>', '<', '|']
        $sFilter = StringRegExpReplace($sFilter, '\s*;\s*', ';')
        If StringRight($sPath, 1) <> '\' Then $sPath &= '\'
        For $iCC = 0 To 5
            If StringInStr($sFilter, $aBadChar[$iCC]) Or _
                    StringInStr($sExclude, $aBadChar[$iCC]) Then Return SetError(2, 2, '')
        Next
        If StringStripWS($sFilter, 8) = '' Then Return SetError(2, 2, '')
        If Not ($iFlag = 0 Or $iFlag = 1 Or $iFlag = 2) Then Return SetError(3, 3, '')

        ; Temporary file that receives the output of "dir"
        Local $oFSO = ObjCreate("Scripting.FileSystemObject"), $sTFolder
        $sTFolder = $oFSO.GetSpecialFolder(2)
        Local $hOutFile = @TempDir & '\' & $oFSO.GetTempName

        ; Build the quoted list of path\filter arguments for "dir"
        If Not StringInStr($sFilter, ';') Then $sFilter &= ';'
        Local $aSplit = StringSplit(StringStripWS($sFilter, 8), ';'), $sRead, $sHoldSplit
        For $iCC = 1 To $aSplit[0]
            If StringStripWS($aSplit[$iCC], 8) = '' Then ContinueLoop
            If StringLeft($aSplit[$iCC], 1) = '.' And _
                    UBound(StringSplit($aSplit[$iCC], '.')) - 2 = 1 Then $aSplit[$iCC] = '*' & $aSplit[$iCC]
            $sHoldSplit &= '"' & $sPath & $aSplit[$iCC] & '" '
        Next
        $sHoldSplit = StringTrimRight($sHoldSplit, 1)

        ; /s makes "dir" recurse into subdirectories
        If $iRecurse Then
            RunWait(@ComSpec & ' /c dir /b /s /a ' & $sHoldSplit & ' > "' & $hOutFile & '"', '', @SW_HIDE)
        Else
            RunWait(@ComSpec & ' /c dir /b /a ' & $sHoldSplit & ' /o-e /od > "' & $hOutFile & '"', '', @SW_HIDE)
        EndIf
        $sRead &= FileRead($hOutFile)
        If Not FileExists($hOutFile) Then Return SetError(4, 4, '')
        FileDelete($hOutFile)
        If StringStripWS($sRead, 8) = '' Then Return SetError(4, 4, '')

        ; Filter the listing: $iFlag 0 = files and folders, 1 = files only, 2 = folders only
        Local $aFSplit = StringSplit(StringTrimRight(StringStripCR($sRead), 1), @LF)
        Local $sHold
        For $iCC = 1 To $aFSplit[0]
            If $sExclude And StringLeft($aFSplit[$iCC], _
                    StringLen(StringReplace($sExclude, '*', ''))) = StringReplace($sExclude, '*', '') Then ContinueLoop
            Switch $iFlag
                Case 0
                    If StringRegExp($aFSplit[$iCC], '\w:\\') = 0 Then
                        $sHold &= $sPath & $aFSplit[$iCC] & Chr(1)
                    Else
                        $sHold &= $aFSplit[$iCC] & Chr(1)
                    EndIf
                Case 1
                    If StringInStr(FileGetAttrib($sPath & '\' & $aFSplit[$iCC]), 'd') = 0 And _
                            StringInStr(FileGetAttrib($aFSplit[$iCC]), 'd') = 0 Then
                        If StringRegExp($aFSplit[$iCC], '\w:\\') = 0 Then
                            $sHold &= $sPath & $aFSplit[$iCC] & Chr(1)
                        Else
                            $sHold &= $aFSplit[$iCC] & Chr(1)
                        EndIf
                    EndIf
                Case 2
                    If StringInStr(FileGetAttrib($sPath & '\' & $aFSplit[$iCC]), 'd') Or _
                            StringInStr(FileGetAttrib($aFSplit[$iCC]), 'd') Then
                        If StringRegExp($aFSplit[$iCC], '\w:\\') = 0 Then
                            $sHold &= $sPath & $aFSplit[$iCC] & Chr(1)
                        Else
                            $sHold &= $aFSplit[$iCC] & Chr(1)
                        EndIf
                    EndIf
            EndSwitch
        Next
        If StringTrimRight($sHold, 1) Then Return StringSplit(StringTrimRight($sHold, 1), Chr(1))
        Return SetError(4, 4, '')
    EndFunc
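
The per-file step described in this post (read every line, keep it only if it matches) might look like the sketch below. It is an illustration under a few assumptions, not the poster's code: it uses _FileReadToArray from File.au3 instead of a _FileCountLines/FileReadLine loop, and it tests the start of the line with StringLeft rather than StringInStr, because the original criterion is "begins with". _GetMatchingLines is a hypothetical helper name, not part of any UDF.

```autoit
#include <File.au3>

; Hypothetical helper (not part of any UDF): returns an array of the lines in
; $sFile that begin with $sPrefix. Element [0] holds the number of matches.
; Sets @error to 1 if the file cannot be read, 2 if no line matches.
Func _GetMatchingLines($sFile, $sPrefix)
    Local $aLines
    If Not _FileReadToArray($sFile, $aLines) Then Return SetError(1, 0, 0)

    Local $sHold = ""
    For $i = 1 To $aLines[0]
        ; "=" compares strings case-insensitively; use "==" for a case-sensitive test
        If StringLeft($aLines[$i], StringLen($sPrefix)) = $sPrefix Then
            $sHold &= $aLines[$i] & Chr(1)
        EndIf
    Next
    If $sHold = "" Then Return SetError(2, 0, 0)

    ; Drop the trailing delimiter, then split the buffer into an array
    Return StringSplit(StringTrimRight($sHold, 1), Chr(1))
EndFunc   ;==>_GetMatchingLines

; Usage: print the matching lines of one file to the console
Local $aHits = _GetMatchingLines(@ScriptDir & "\hello.txt", "specifiedstring")
If Not @error Then
    For $i = 1 To $aHits[0]
        ConsoleWrite($aHits[$i] & @CRLF)
    Next
EndIf
```

Collecting the matches in a Chr(1)-delimited buffer and splitting once at the end avoids resizing an array for every hit, which is the same trick the UDF above uses for its file list.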

AnythinG, posted January 18, 2008 (edited January 18, 2008):

Well... this does the job ^-^! But it writes to "log.txt" twice. I mean, if a line is found in "file1" and in "file2", it writes that the line is found in "file1" and "file2", and then again that the line is found in "file2" and "file1". Anyway, here it is ^-^!

    #include <file.au3>

    $NumberOfFiles = 3
    Dim $file, $temp_file
    Dim $files[$NumberOfFiles]
    $files[0] = @ScriptDir & "\file1.TXT"
    $files[1] = @ScriptDir & "\file2.TXT"
    $files[2] = @ScriptDir & "\file3.TXT"

    $thefile = FileOpen(@ScriptDir & "\log.txt", 2)

    For $i = 0 To $NumberOfFiles - 1
        If Not _FileReadToArray($files[$i], $file) Then
            ;error opening file
            MsgBox(0, "", "error!")
        Else
            ;the lines in $files[$i] are in $file
            For $j = 1 To $file[0]
                For $k = 0 To $NumberOfFiles - 1
                    If $k = $i Then
                        ;is the same file!
                    Else
                        If Not _FileReadToArray($files[$k], $temp_file) Then
                            ;error?
                            MsgBox(0, "", "error!")
                        Else
                            For $l = 1 To $temp_file[0]
                                If $temp_file[$l] = $file[$j] Then
                                    FileWrite($thefile, $file[$j] & " found in " & $files[$i] & " and " & $files[$k] & @CRLF)
                                EndIf
                            Next
                        EndIf
                    EndIf
                Next
            Next
        EndIf
    Next
    FileClose($thefile)
    MsgBox(0, "", "done!")

Edit: hope this helps you! ^-^ Saludos! =P

randallc, posted January 18, 2008 (edited January 18, 2008):

Hi,

Here's the logic:

    ; _FileCompareDupes.au3
    #include<array.au3>
    #include<file.au3>
    Local $arFiles[3] = [@ScriptDir & "\hello.txt", @ScriptDir & "\goodbye.txt", @ScriptDir & "\adios.txt" ]
    Local $arArrays[3], $sCumul, $sResults = @ScriptDir & "\resultDupes", $c = FileDelete($sResults);$arTmp[1], $arTmp2[1],
    ; get all the files to arrays which only contain matching lines [for speed, change to use RegExp on large fileread files]
    _FilesReadArrays($arFiles, $arArrays, "specifiedstring")
    ; loop through 2-file comparison for dupes to Delimited string
    For $i = 0 To UBound($arFiles) - 2 ;the last file will already have been checked against all others
        For $j = $i + 1 To UBound($arFiles) - 1 ;start at $i+1 so as not to repeat comparison of any 2 files
            Local $sStr = @TAB & ":Found in " & $arFiles[$i] & " and " & $arFiles[$j] & @CRLF ;document which files have the dupes
            ;compare 2 files, return the array of dupe lines
            Local $ArrayCompare = _ArrayCompare($arArrays[$i], $arArrays[$j], 1, 1)
            ;If @OSTYPE = "WIN32_WINDOWS" Then Return 0 ;not Win 9x
            If $ArrayCompare[0] <> "" Then $sCumul &= _ArrayToString($ArrayCompare, $sStr) & $sStr
        Next
    Next
    FileWrite($sResults, $sCumul)
    Run("notepad " & $sResults)

The full script, with the functions, is attached.

Best, Randall

ExcelCOM... AccessCom.. Word2... FileListToArrayNew...SearchMiner... Regexps...SQL...Explorer...Array2D.. _GUIListView...array problem...APITailRW

masvil, posted January 19, 2008 (edited January 19, 2008):

Thanks for your effort, you're saving me from a serious problem at work!

@AnythinG and randallc: your scripts work great! I have to ask you for one final bit of help, because I'm not good with arrays and, as I have to solve this problem urgently, I have no time to learn about them. I need to check all files in @ScriptDir; could you add that, please?

randallc, posted January 19, 2008 (edited January 20, 2008):

Hi,

In mine, just change the "$arFiles[3]" line to:

    $arFiles=_FileListToArray(@scriptdir)
    _ArrayDelete($arFiles,0)

Best, Randall

[PS: If you need subfolders, use the newer version in my sig links, _FileListToArrayNew.]

masvil, posted January 20, 2008 (edited January 20, 2008):

randallc said:
    In mine just change "$arFiles[3] line to;
    $arFiles=_FileListToArray(@scripdir)
    _ArrayDelete($arFiles,0)

Done, but I get:

    C:\temp\_FileCompareDupes\_FileCompareDupes.au3 (38) : ==> Array variable subscript badly formatted.:
    ReDim $arTmp2[$k]
    ReDim $arTmp2[^ ERROR

PS: please also change "@scripdir" to "@scriptdir" in your last post.

randallc, posted January 20, 2008:

Hi,

Yes, some compensating changes would be needed!

    Local $arFiles=_FileListToArray(@ScriptDir,"*.txt",1)
    _ArrayDelete($arFiles,0)
    _ArrayDisplay($arFiles)
    ;~ Local $arFiles[3] = [@ScriptDir & "\hello.txt", @ScriptDir & "\goodbye.txt", @ScriptDir & "\adios.txt" ]
    Local $arArrays[UBound($arFiles)], $sCumul, $sResults = @ScriptDir & "\resultDupes", $c = FileDelete($sResults);$arTmp[1], $arTmp2[1],

[A second code box followed here; its contents are garbled beyond recovery in the source.]

Best, Randall
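
Putting the thread together, the original goal (every file in a directory and its subdirectories, each duplicate reported once) could also be approached in a single pass that maps each matching line to the files containing it. This is only a sketch and not randallc's attached script. It assumes a recent AutoIt (3.3.10 or later), where _FileListToArrayRec ships in File.au3, and it relies on the Windows Scripting.Dictionary COM object. $sPrefix, $sRoot and the result.txt name are placeholders taken from the first post.

```autoit
#include <File.au3>

Local $sPrefix = "specifiedstring"      ; placeholder: lines must begin with this
Local $sRoot = @ScriptDir               ; placeholder: directory to scan
Local $sResult = $sRoot & "\result.txt" ; placeholder: output file

; Remove an old result first so it is not scanned as an input file
FileDelete($sResult)

; Arguments: 1 = files only, 1 = recurse into subfolders, 0 = unsorted, 2 = full paths
Local $aFiles = _FileListToArrayRec($sRoot, "*.txt", 1, 1, 0, 2)
If @error Then Exit MsgBox(16, "Error", "No .txt files found in " & $sRoot)

; Maps line text -> Chr(1)-delimited list of the files that contain it.
; Dictionary keys are case-sensitive by default.
Local $oSeen = ObjCreate("Scripting.Dictionary")
Local $aLines, $sLine, $oInFile

For $i = 1 To $aFiles[0]
    If Not _FileReadToArray($aFiles[$i], $aLines) Then ContinueLoop ; skip unreadable files
    ; Per-file set, so a line repeated inside one file is counted once
    $oInFile = ObjCreate("Scripting.Dictionary")
    For $j = 1 To $aLines[0]
        $sLine = $aLines[$j]
        If StringLeft($sLine, StringLen($sPrefix)) <> $sPrefix Then ContinueLoop
        If $oInFile.Exists($sLine) Then ContinueLoop
        $oInFile.Add($sLine, 1)
        If $oSeen.Exists($sLine) Then
            $oSeen.Item($sLine) = $oSeen.Item($sLine) & Chr(1) & $aFiles[$i]
        Else
            $oSeen.Add($sLine, $aFiles[$i])
        EndIf
    Next
Next

; Write only the lines that were seen in more than one file
Local $hOut = FileOpen($sResult, 2) ; 2 = open for writing, erase previous contents
If $hOut = -1 Then Exit MsgBox(16, "Error", "Cannot write " & $sResult)
Local $aKeys = $oSeen.Keys(), $sFiles
For $i = 0 To UBound($aKeys) - 1
    $sFiles = $oSeen.Item($aKeys[$i])
    ; The delimiter is present only when at least two files were recorded
    If StringInStr($sFiles, Chr(1)) Then
        FileWriteLine($hOut, '"' & $aKeys[$i] & '" found in ' & StringReplace($sFiles, Chr(1), " and "))
    EndIf
Next
FileClose($hOut)
```

Compared with the pairwise comparisons earlier in the thread, this reads each file once and reports a line that occurs in three or more files on a single row, listing all of the files. The file names in the output are full paths rather than bare names like hello.txt.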