Posted

I have to delete many files that have the same text content, even if the order of the text lines is different in each file.

So I need some help to create a solution.

eg:

file1.txt:

text line a
text line c
text line b

deleted:

file2.txt:

text line a
text line b
text line c

same text content exactly but different text line order


It has to have an option to browse to a folder and batch delete files whose text content is a duplicate in a different line order.

Posted (edited)

Sure. Read both files into their own arrays, a1 and a2.

Check that both arrays are the same size with UBound.

Join a1 and a2 with _ArrayConcatenate to make a3.

Cut a3 down to its unique lines with _ArrayUnique.

Check with UBound that a3 is the same size as a1 and a2.

If it's different, the files are not the same.

If it's the same, they are.

Edit: forgot the last bit, delete one of the files.

Edited by gruntydatsun
forgot the last part of this guys question
Posted

slow day here so here's one way:
 

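#include <File.au3>
#include <Array.au3>

Local $aFile1, $aFile2, $aUnique

; Read each file into a zero-based array (flag 0 = no count in element 0)
If Not _FileReadToArray(@ScriptDir & "\file1.txt", $aFile1, 0) Then Exit MsgBox(16, "ERROR", "Cannot read file1.txt")
If Not _FileReadToArray(@ScriptDir & "\file2.txt", $aFile2, 0) Then Exit MsgBox(16, "ERROR", "Cannot read file2.txt")

; Note: this test is only reliable when no line is repeated within a file
If UBound($aFile1) = UBound($aFile2) Then               ; both files have the same number of lines
    _ArrayConcatenate($aFile1, $aFile2)                 ; append $aFile2 to $aFile1
    $aUnique = _ArrayUnique($aFile1, 0, 0, 0, 0)        ; zero-based array of unique lines, case-insensitive, no count
EndIf

; UBound($aUnique) is 0 when the line counts differ, so such files are reported as different
MsgBox(0, "RESULT", (UBound($aFile2) = UBound($aUnique)) ? "Files are the same" : "Files are different")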

For the rest, you could get the folder path, get a file listing of that folder into two separate arrays, then loop through both in a nested loop, checking file 1 against every other file, then file 2 against every other file, and so on until you hit the end. Just make sure a file is never compared with itself, or every file will look like a duplicate.

For $x = 0 To UBound($array1) - 1
    For $y = $x + 1 To UBound($array2) - 1              ;start after $x so a file is never checked against itself
        ;is $array1[$x] a match with $array2[$y]
        ;if yes store path of file in $array2[$y] in $array3 to delete at the end
        ;or delete $array2[$y] as you go and check for errors when reading files in
    Next
Next

;delete all the files in $array3

 

Posted (edited)

 

Now I have this script, but it says that the files are different even when the files are the same.

#include <File.au3>
#include <array.au3>

Dim $aFile1, $aFile2, $aUnique, $array1, $array2

_FileReadToArray(@Scriptdir & "\file1.txt",$aFile1,0)   ;read file 1 into zero based array $aFile1
_FileReadToArray(@Scriptdir & "\file2.txt",$aFile2,0)   ;read file 2 into zero based array $aFile2

if UBound($aFile1) = UBound($aFile2) Then               ;if both arrays have same size
    _ArrayConcatenate($aFile1,$aFile2)                  ;merge a2 into a1
    $aUnique = _ArrayUnique($aFile1,0,0,0,0)            ;generate zero based array of unique lines in A1
EndIf

msgbox(1,"RESULT",(UBound($aFile2) = UBound($aUnique)) ? "Files are the same" : "Files are different")


For $x = 0 to Ubound($array1)-1
    For $y = 0 to Ubound($array2)-1
        ;is $array1[$x] a match with $array2[$y]
        ;if yes store path of file in $array2[$y] in $array3 to delete at the end
        ;or delete $array2[$y] as you go and check for errors when reading files in
    Next
Next

;delete all the files in $array3

How do I fix it?

Edited by way1000
Posted (edited)

Your problem will most likely be that you have trailing spaces, an additional blank line, different line terminators, incorrect encoding or a control character.

You can easily test your logic by replacing the file read with:

;Dim $aFile1, $aFile2, $aUnique, $array1, $array2
Dim $aUnique, $array1, $array2

;_FileReadToArray(@Scriptdir & "\file1.txt",$aFile1,0)   ;read file 1 into zero based array $aFile1
;_FileReadToArray(@Scriptdir & "\file2.txt",$aFile2,0)   ;read file 2 into zero based array $aFile2

Local $aFile1[3] = ["test1", "test2", "test3"]
Local $aFile2[3] = ["test3", "test2", "test1"]

Take small steps with the above, manually changing the arrays until you find your data error. You probably need to consider pre-processing your files to prevent such problems (e.g. strip trailing spaces etc.).
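
As a rough illustration of that kind of pre-processing, here is a minimal sketch that reuses the in-memory test arrays, with some stray whitespace added on purpose. `_NormalizeLines` is a hypothetical helper name, not a standard UDF function. It trims leading and trailing whitespace, lowercases each line (an assumption, chosen to mirror the case-insensitive `_ArrayUnique` call in the earlier script) and sorts the lines, so two files match when their sorted lines are identical. Comparing sorted lines instead of counting unique lines should also cope with a line that repeats within a file.

```autoit
#include <Array.au3>

; Test data: same lines, different order, with stray whitespace added on purpose
Local $aFile1[3] = ["test1 ", "test2", "test3"]
Local $aFile2[3] = ["test3", @TAB & "test2", "test1"]

_NormalizeLines($aFile1)
_NormalizeLines($aFile2)

; After normalizing, files with the same lines give identical joined strings
Local $bSame = (_ArrayToString($aFile1, @LF) == _ArrayToString($aFile2, @LF))
MsgBox(0, "RESULT", $bSame ? "Files are the same" : "Files are different")

; Hypothetical helper (not part of the standard UDFs): cleans and sorts the lines in place
Func _NormalizeLines(ByRef $aLines)
    For $i = 0 To UBound($aLines) - 1
        ; flag 3 = strip leading (1) + trailing (2) whitespace, which covers spaces and tabs
        $aLines[$i] = StringLower(StringStripWS($aLines[$i], 3))
    Next
    _ArraySort($aLines) ; ascending sort, so the original line order no longer matters
EndFunc
```

Removing the trim or the lowercase step one at a time is a quick way to see which kind of difference is breaking the match in the real files.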

 

Edit: Sorry Grunty, I thought you might have gone to sleep :-), I'll get out of the way.

Edited by SlackerAl

Problem solving step 1: Write a simple, self-contained, running, replicator of your problem.

Posted

That sounds like good advice, SlackerAl.

Find out why the match is failing, then write something to clean up the input files, or fix the process that generates them, to deal with that.

Load your input files up in Notepad++ and turn on Show All Characters. It'll be pretty obvious what's different.
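
If Notepad++ is not to hand, a similar check can be sketched in AutoIt itself by printing the code of every character in each line, which makes a trailing space (32) or a tab (9) stand out. This is only a sketch: `_DumpCodes` is a hypothetical helper name, and the path assumes the `file1.txt` from the earlier posts sits next to the script.

```autoit
#include <File.au3>

Local $aLines
If Not _FileReadToArray(@ScriptDir & "\file1.txt", $aLines, 0) Then Exit MsgBox(16, "ERROR", "Cannot read file1.txt")

; Write each line's character codes to the console (shown in the SciTE output pane)
For $i = 0 To UBound($aLines) - 1
    ConsoleWrite("Line " & ($i + 1) & ": " & _DumpCodes($aLines[$i]) & @CRLF)
Next

; Hypothetical helper: returns the code of each character, separated by spaces
Func _DumpCodes($sLine)
    Local $sOut = ""
    For $c = 1 To StringLen($sLine)
        $sOut &= AscW(StringMid($sLine, $c, 1)) & " "
    Next
    Return $sOut
EndFunc
```

Running it on both files and comparing the output line by line should show where the two differ.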

Posted (edited)
2 hours ago, gruntydatsun said:

attach a few sample files that make the problem happen and i'll have a look.   you're in luck... i have no life :)

 

 

How can I make it run on a folder with thousands of files, comparing them and deleting the duplicates?

Screenshot_1.png

Screenshot_2.png

Edited by way1000
Posted

Hi, sorry I meant to attach the two files themselves as attachments to a post on this thread.

To do a folder with thousands of files, use FileSelectFolder to let the user pick the path, then get a directory listing of the files into two arrays with _FileListToArray. Then loop through both in a nested loop, checking file 1 against every other file, then file 2 against every other file, and so on until you hit the end.

  • Download Notepad++, install and open those text files in it.
  • Select View > Show Symbol > Show All Characters
  • This will show the non-printing characters that are probably stopping your matching (carriage returns, tabs etc)
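
For the folder scan itself, here is one possible sketch. Rather than the nested loop, it swaps in a single pass: each file is reduced to a "signature" (its trimmed, lowercased, sorted lines joined into one string) and a `Scripting.Dictionary` remembers which signatures have already been seen, so each file is read only once. `_FileSignature` is a hypothetical helper name. Restricting the scan to `*.txt`, ignoring case and trimming whitespace are all assumptions, not something stated in the thread.

```autoit
#include <File.au3>
#include <Array.au3>

Local $sFolder = FileSelectFolder("Select the folder to scan", "")
If @error Then Exit ; the user cancelled the dialog

; Flag 1 = files only, True = return full paths; element 0 holds the file count
Local $aFiles = _FileListToArray($sFolder, "*.txt", 1, True)
If @error Then Exit MsgBox(16, "ERROR", "No .txt files found in " & $sFolder)

Local $oSeen = ObjCreate("Scripting.Dictionary") ; signature -> first file found with it
Local $iDeleted = 0

For $i = 1 To $aFiles[0]
    Local $sSig = _FileSignature($aFiles[$i])
    If @error Then ContinueLoop ; unreadable or empty file: leave it alone
    If $oSeen.Exists($sSig) Then
        ; Same lines as an earlier file, so delete this one
        If FileDelete($aFiles[$i]) Then $iDeleted += 1
    Else
        $oSeen.Add($sSig, $aFiles[$i])
    EndIf
Next

MsgBox(0, "DONE", $iDeleted & " duplicate file(s) deleted")

; Hypothetical helper: the file's trimmed, lowercased, sorted lines joined into one string
Func _FileSignature($sPath)
    Local $aLines
    If Not _FileReadToArray($sPath, $aLines, 0) Then Return SetError(1, 0, "")
    For $j = 0 To UBound($aLines) - 1
        $aLines[$j] = StringLower(StringStripWS($aLines[$j], 3)) ; 3 = leading + trailing
    Next
    _ArraySort($aLines) ; sorting makes the original line order irrelevant
    Return _ArrayToString($aLines, @LF)
EndFunc
```

Since `FileDelete` is permanent, it is worth trying this on a copy of the folder first. The first file found with a given content is the one that is kept.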
     
