way1000 Posted October 27, 2017

I have to delete many files that have the same text content, even when the text lines appear in a different order in each file, so I need some help creating a solution. For example:

file1.txt:
text line a
text line c
text line b

deleted: file2.txt:
text line a
text line b
text line c

The text content is exactly the same; only the line order differs. The solution needs an option to browse to a folder and batch-delete files whose text content duplicates another file's in a different line order.

JLogan3o13 (Moderators) Posted October 27, 2017 (edited)

Does file2.txt, with the same content but a different order, stay or get deleted? Sorry, I re-read your post.

gruntydatsun Posted October 27, 2017 (edited)

Sure.
Read both files into their own arrays, array 1 and array 2.
Check that both arrays are the same size with UBound.
Join a1 and a2 with _ArrayConcatenate to make a3.
Cut down the new a3 using _ArrayUnique.
Check with UBound that a3 is the same size as a1 and a2. If it's different, the files are not the same; if it's the same, they are.

Edit: forgot the last bit: delete one of the files.

gruntydatsun Posted October 27, 2017

Slow day here, so here's one way:

    #include <File.au3>
    #include <array.au3>

    Dim $aFile1, $aFile2, $aUnique

    _FileReadToArray(@ScriptDir & "\file1.txt", $aFile1, 0) ;read file 1 into zero based array $aFile1
    _FileReadToArray(@ScriptDir & "\file2.txt", $aFile2, 0) ;read file 2 into zero based array $aFile2

    If UBound($aFile1) = UBound($aFile2) Then ;if both arrays have same size
        _ArrayConcatenate($aFile1, $aFile2) ;merge a2 into a1
        $aUnique = _ArrayUnique($aFile1, 0, 0, 0, 0) ;generate zero based array of unique lines in A1
    EndIf

    MsgBox(1, "RESULT", (UBound($aFile2) = UBound($aUnique)) ? "Files are the same" : "Files are different")

For the rest, you could get the folder path, get a file listing of that folder into two separate arrays, then loop through both in a nested loop, checking file 1 against every other file, then file 2 against every other file, and so on until you hit the end.

    For $x = 0 To UBound($array1) - 1
        For $y = 0 To UBound($array2) - 1
            ;is $array1[$x] a match with $array2[$y]
            ;if yes store path of file in $array2[$y] in $array3 to delete at the end
            ;or delete $array2[$y] as you go and check for errors when reading files in
        Next
    Next
    ;delete all the files in $array3

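One caveat with the unique-count check above: it can misjudge files that contain repeated lines. A file holding a, b, c and a file holding a, a, b both have three lines, and the merged array still has three unique lines, so the check reports a match. The following is a minimal sketch of an alternative, not the method from the thread: sort both arrays and compare them element by element. `_SameLinesAnyOrder` is a hypothetical helper name, and the comparison is case-insensitive (the `StringCompare` default), on the assumption that `_ArraySort` orders strings the same way.

```autoit
#include <Array.au3>
#include <File.au3>

; Hypothetical helper (not from the thread): True when both files hold the
; same lines in any order, counting repeated lines.
Func _SameLinesAnyOrder($sPath1, $sPath2)
    Local $aLines1, $aLines2
    ; flag 0 = zero-based arrays, as in the script above
    If Not _FileReadToArray($sPath1, $aLines1, 0) Then Return SetError(1, 0, False)
    If Not _FileReadToArray($sPath2, $aLines2, 0) Then Return SetError(2, 0, False)
    If UBound($aLines1) <> UBound($aLines2) Then Return False

    _ArraySort($aLines1)
    _ArraySort($aLines2)
    For $i = 0 To UBound($aLines1) - 1
        ; StringCompare is case-insensitive by default
        If StringCompare($aLines1[$i], $aLines2[$i]) <> 0 Then Return False
    Next
    Return True
EndFunc

Local $bSame = _SameLinesAnyOrder(@ScriptDir & "\file1.txt", @ScriptDir & "\file2.txt")
If @error Then
    MsgBox(16, "RESULT", "Could not read one of the files")
Else
    MsgBox(64, "RESULT", $bSame ? "Files are the same" : "Files are different")
EndIf
```

Because each sorted array keeps its repeated lines, a, b, c against a, a, b fails at the second element instead of passing.
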
way1000 (Author) Posted October 27, 2017 (edited)

Now I have this script, but it says the files are different even when they are the same:

    #include <File.au3>
    #include <array.au3>

    Dim $aFile1, $aFile2, $aUnique, $array1, $array2

    _FileReadToArray(@ScriptDir & "\file1.txt", $aFile1, 0) ;read file 1 into zero based array $aFile1
    _FileReadToArray(@ScriptDir & "\file2.txt", $aFile2, 0) ;read file 2 into zero based array $aFile2

    If UBound($aFile1) = UBound($aFile2) Then ;if both arrays have same size
        _ArrayConcatenate($aFile1, $aFile2) ;merge a2 into a1
        $aUnique = _ArrayUnique($aFile1, 0, 0, 0, 0) ;generate zero based array of unique lines in A1
    EndIf

    MsgBox(1, "RESULT", (UBound($aFile2) = UBound($aUnique)) ? "Files are the same" : "Files are different")

    For $x = 0 To UBound($array1) - 1
        For $y = 0 To UBound($array2) - 1
            ;is $array1[$x] a match with $array2[$y]
            ;if yes store path of file in $array2[$y] in $array3 to delete at the end
            ;or delete $array2[$y] as you go and check for errors when reading files in
        Next
    Next
    ;delete all the files in $array3

How do I fix it?

gruntydatsun Posted October 27, 2017

Attach a few sample files that make the problem happen and I'll have a look. You're in luck... I have no life.

SlackerAl Posted October 27, 2017 (edited)

Your problem will most likely be that you have trailing spaces, an additional blank line, different line terminators, incorrect encoding or a control character. You can easily test your logic by replacing the file read with:

    ;Dim $aFile1, $aFile2, $aUnique, $array1, $array2
    Dim $aUnique, $array1, $array2

    ;_FileReadToArray(@Scriptdir & "\file1.txt",$aFile1,0) ;read file 1 into zero based array $aFile1
    ;_FileReadToArray(@Scriptdir & "\file2.txt",$aFile2,0) ;read file 2 into zero based array $aFile2
    Local $aFile1[3] = ["test1", "test2", "test3"]
    Local $aFile2[3] = ["test3", "test2", "test1"]

Take small steps with the above, manually changing the arrays until you find your data error. You probably need to consider pre-processing your files to prevent such problems (e.g. strip trailing spaces etc.).

Edit: Sorry Grunty, I thought you might have gone to sleep :-), I'll get out of the way.

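The pre-processing suggested above could look roughly like this. It is a minimal sketch under stated assumptions: `_CleanLines` is a hypothetical helper name, it only strips trailing whitespace and drops blank lines, and it does nothing about encoding differences. It also assumes an AutoIt version that accepts zero-element arrays.

```autoit
#include <StringConstants.au3>

; Hypothetical helper: strips trailing whitespace from each line and drops
; blank lines, in place. Returns the number of lines kept.
Func _CleanLines(ByRef $aLines)
    Local $iKept = 0, $sLine
    For $i = 0 To UBound($aLines) - 1
        ; StringStripWS treats spaces, tabs, CR and LF as whitespace
        $sLine = StringStripWS($aLines[$i], $STR_STRIPTRAILING)
        If StringLen($sLine) = 0 Then ContinueLoop ; skip blank lines
        $aLines[$iKept] = $sLine
        $iKept += 1
    Next
    If $iKept > 0 Then
        ReDim $aLines[$iKept]
    Else
        Local $aEmpty[0] ; assumes zero-element arrays are allowed
        $aLines = $aEmpty
    EndIf
    Return $iKept
EndFunc

; Test data in the style of the arrays above, with deliberate noise added
Local $aTest[4] = ["test1  ", "test2" & @TAB, "", "test3" & @CR]
Local $iKept = _CleanLines($aTest)
ConsoleWrite($iKept & " lines kept" & @CRLF)
```

Running both arrays through a step like this before the size check removes the trailing-space, blank-line and stray carriage-return cases from the list of suspects.
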
gruntydatsun Posted October 27, 2017

That sounds like good advice, SlackerAl. Find out why the match is failing, then write something to clean up the input files, or fix the process that generates them, to deal with that. Load your input files in Notepad++ and turn on Show All Characters. It'll be pretty obvious what's different.

way1000 (Author) Posted October 27, 2017 (edited)

2 hours ago, gruntydatsun said:
    attach a few sample files that make the problem happen and i'll have a look. you're in luck... i have no life

How can I make it run on a folder with thousands of files, comparing them and deleting the duplicates?

gruntydatsun Posted October 27, 2017

Hi, sorry, I meant for you to attach the two files themselves as attachments to a post on this thread.

To do a folder with thousands of files, use FileSelectFolder to let the user pick the path, then get a directory listing of files into two arrays with _FileListToArray, then loop through both in a nested loop, checking file 1 against every other file, then file 2 against every other file, and so on until you hit the end.

Download Notepad++, install it and open those text files in it. Select View > Show Symbol > Show All Characters. This will show the non-printing characters that are probably stopping your matching (carriage returns, tabs etc.).

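For thousands of files, a nested loop re-reads every file many times. The sketch below swaps in a different technique from the one described above: it builds one order-independent signature per file (cleaned, sorted lines joined together) and remembers signatures already seen in a `Scripting.Dictionary` COM object. Everything named here is an assumption for illustration: the `_LineSignature` helper, the `$DRY_RUN` switch, the `*.txt` filter, and the choice to ignore letter case, trailing whitespace and blank lines. It keeps the first file of each group and, while `$DRY_RUN` is True, only reports what it would delete.

```autoit
#include <Array.au3>
#include <File.au3>
#include <StringConstants.au3>

Global Const $DRY_RUN = True ; hypothetical switch: set to False to really delete

_Main()

Func _Main()
    Local $sDir = FileSelectFolder("Pick the folder to scan", "")
    If @error Then Return ; user cancelled

    ; full paths of *.txt files; element 0 holds the count
    Local $aFiles = _FileListToArray($sDir, "*.txt", $FLTA_FILES, True)
    If @error Then Return ; bad path or no matching files

    Local $oSeen = ObjCreate("Scripting.Dictionary") ; signature -> first file seen
    Local $sKey
    For $i = 1 To $aFiles[0]
        $sKey = _LineSignature($aFiles[$i])
        If @error Then ContinueLoop ; unreadable or empty file, skip it
        If $oSeen.Exists($sKey) Then
            ConsoleWrite($aFiles[$i] & " duplicates " & $oSeen.Item($sKey) & @CRLF)
            If Not $DRY_RUN Then FileDelete($aFiles[$i])
        Else
            $oSeen.Add($sKey, $aFiles[$i])
        EndIf
    Next
EndFunc

; Hypothetical helper: returns the file's lines, lower-cased, stripped of
; trailing whitespace, sorted and joined with @LF.
Func _LineSignature($sPath)
    Local $aLines
    If Not _FileReadToArray($sPath, $aLines, 0) Then Return SetError(1, 0, "")
    For $i = 0 To UBound($aLines) - 1
        $aLines[$i] = StringLower(StringStripWS($aLines[$i], $STR_STRIPTRAILING))
    Next
    _ArraySort($aLines)
    ; blank lines sort first, so they end up as leading line feeds: drop them
    Return StringRegExpReplace(_ArrayToString($aLines, @LF), "^\n+", "")
EndFunc
```

Each file is read once, so the work grows with the number of files rather than with the number of pairs. For very large files the signatures get long; hashing them would be a natural next step, which this sketch leaves out.
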