< Solved > Urgent help needed please - Compare files


PCI

Hi everyone, I hope some of the masters and MVPs can help me with this.

I have 2 files to compare: file1.txt and file2.txt.

Both files have about 20000 lines, and it's hard for me to go through them line by line.

Here's an example of the lines:

212121212121ýxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

313131313131ýxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

I need to read the first string (212121212121, before the ý) from file1.txt and find the same string in file2.txt. If anything else on that line differs between the two files, the whole line should be copied to another file, result.txt.

I'm really sorry, I got stuck on this because I couldn't figure out whether I should use FileRead, _FileReadToArray, FileReadLine or StringMid.

Please help me, at least with how to start my script.

Thank you so much

PCI

Edited by PCI

This is a basic script to do what you want.

I have made the assumption that the 2 files contain the same number of lines and that the lines are in the same order. If this is not the case then something a little more complex will be required.

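#include <File.au3>

Global $aArray1 = 0
Global $aArray2 = 0
Global $hDiff = -1

; Read each file into an array; element [0] holds the line count
_FileReadToArray("C:\File1.txt", $aArray1)
_FileReadToArray("C:\File2.txt", $aArray2)

; Mode 10 = 2 (overwrite) + 8 (create the directory if needed)
$hDiff = FileOpen("C:\Diff.txt", 10)

; Compare the files line by line and log every pair that differs
For $i = 1 To UBound($aArray1) - 1
    If $aArray1[$i] <> $aArray2[$i] Then
        FileWriteLine($hDiff, "File1:" & $aArray1[$i] & @CRLF & "File2:" & $aArray2[$i])
    EndIf
Next
FileClose($hDiff)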

Edit: Fixed typo

Edited by Bowmore

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to build bigger and better idiots. So far, the universe is winning."- Rick Cook

Thank you so much, Bowmore. The issue is that sometimes the same first string appears on several lines with differences, like this:

File1.txt

10757_160491010_0_1ý6.0000ýýNETýITEMýýý2ý1ýTestingýTestingýNýFULLý2012-01-01ý3000-12-31

File2.txt

10757_160491010_0_1ý6.0000ýýNETýITEMýýý2ý1ýTestingýTestingýNýFULLý2012-01-11ý3000-12-31

10757_160491010_0_1ý6.0000ýýNETýITEMýýý2ý1ýTestingýTestingýNýFULLý2012-01-31ý3000-12-31

So it's important to match on the first string before the first ý rather than on the line position.

Thank you for your valuable time


Sorry, I did not read your first post carefully enough. This version checks every pair of lines where the first part matches, and writes the lines to a new file if anything else on the lines from file1 and file2 is different.

#include <File.au3>

Global $aArray1 = 0
Global $aArray2 = 0
Global $hDiff = -1
Global $sID1, $sID2

_FileReadToArray("C:\File1.txt", $aArray1)
_FileReadToArray("C:\File2.txt", $aArray2)
$hDiff = FileOpen("C:\Diff.txt", 10)

For $i = 1 To UBound($aArray1) - 1
    ; ID = everything before the first "ý" on the line
    $sID1 = StringLeft($aArray1[$i], StringInStr($aArray1[$i], "ý", 1, 1) - 1)
    For $j = 1 To UBound($aArray2) - 1
        $sID2 = StringLeft($aArray2[$j], StringInStr($aArray2[$j], "ý", 1, 1) - 1)
        If $sID1 == $sID2 Then ; Check if the first part is the same
            ; then check if anything else on the lines is different
            If $aArray1[$i] <> $aArray2[$j] Then
                FileWriteLine($hDiff, "File1: Line" & $i & ":" & $aArray1[$i] & @CRLF & "File2: Line" & $j & ":" & $aArray2[$j])
            EndIf
        EndIf
    Next
Next
FileClose($hDiff)
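
The ID extraction used in the script above can be pulled out into a small helper. This is only a sketch: _GetKey() is a name made up for this example, and returning the whole line when no "ý" is present is an assumption about what is wanted in that case.

```autoit
; Hypothetical helper: returns the text before the first occurrence of $sDelim.
; Assumption: a line without the delimiter is treated as being all key.
Func _GetKey($sLine, $sDelim = "ý")
    Local $iPos = StringInStr($sLine, $sDelim, 1) ; 1 = case-sensitive search
    If $iPos = 0 Then Return $sLine
    Return StringLeft($sLine, $iPos - 1)
EndFunc   ;==>_GetKey

; Usage with the sample line from the first post
ConsoleWrite(_GetKey("212121212121ýxxxxxxxx") & @CRLF)
ConsoleWrite(_GetKey("no delimiter on this line") & @CRLF)
```

The script file has to be saved in an encoding that preserves the "ý" character, otherwise the search will not find it.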


Thank you Bowmore, I'm currently testing your script. I will post the results soon; it's taking some time to check the 20000 lines.

Thank you so much again


Thank you Bowmore for the help, I appreciate it a lot.

Here's what I got after 977 seconds of comparing 13000 lines.

In Diff.txt:

File1: Line571:212121212121ýXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXY

File2: Line415:212121212121ýXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

File1: Line571:212121212121ýXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXY

File2: Line945:212121212121ýXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Question: why is the same line (line 571) compared twice in the example above?

Is it because the same string was found more than once?

Edited by PCI

The reason some lines are compared twice is that the first part of line 571 in file 1 matches the first part of both line 415 and line 945 in file 2, so line 571 in file 1 gets compared with each of those lines.

PS: If this is something you are going to have to do on a regular basis, the script can be made considerably faster by sorting the arrays and walking the index on the second array manually rather than looping through the entire array for each line. You could also add a little GUI for the user to select the input and output files and show progress.
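
A different technique from the sort-and-walk idea above, but aimed at the same speed-up: index file 2 by ID once, then look up each file 1 ID directly instead of rescanning the whole array. This is only a sketch. It assumes the Windows Scripting.Dictionary COM object is available, reuses the file paths from the scripts above, and _KeyOf() is a helper name made up for this example.

```autoit
#include <File.au3>

Global $aFile1, $aFile2
_FileReadToArray("C:\File1.txt", $aFile1) ; element [0] holds the line count
If @error Then Exit 1
_FileReadToArray("C:\File2.txt", $aFile2)
If @error Then Exit 1

; Index file 2: ID -> "|"-separated list of the line numbers carrying that ID
Global $oIndex = ObjCreate("Scripting.Dictionary")
If Not IsObj($oIndex) Then Exit 1
Global $sKey
For $j = 1 To $aFile2[0]
    $sKey = _KeyOf($aFile2[$j])
    If $sKey = "" Then ContinueLoop ; skip blank lines and empty IDs
    If $oIndex.Exists($sKey) Then
        $oIndex.Item($sKey) = $oIndex.Item($sKey) & "|" & $j
    Else
        $oIndex.Add($sKey, String($j))
    EndIf
Next

Global $hDiff = FileOpen("C:\Diff.txt", 10)
Global $aHits, $iLine
For $i = 1 To $aFile1[0]
    $sKey = _KeyOf($aFile1[$i])
    If $sKey = "" Then ContinueLoop
    If Not $oIndex.Exists($sKey) Then ContinueLoop ; ID not present in file 2
    ; Only the file 2 lines that share this ID are compared
    $aHits = StringSplit($oIndex.Item($sKey), "|")
    For $k = 1 To $aHits[0]
        $iLine = Number($aHits[$k])
        If $aFile1[$i] <> $aFile2[$iLine] Then
            FileWriteLine($hDiff, "File1: Line" & $i & ":" & $aFile1[$i] & @CRLF & "File2: Line" & $iLine & ":" & $aFile2[$iLine])
        EndIf
    Next
Next
FileClose($hDiff)

; Hypothetical helper: text before the first "ý", or the whole line if there is none
Func _KeyOf($sLine)
    Local $iPos = StringInStr($sLine, "ý", 1)
    If $iPos = 0 Then Return $sLine
    Return StringLeft($sLine, $iPos - 1)
EndFunc   ;==>_KeyOf
```

Each line is read and indexed once, so the work grows with the number of lines plus the number of matching pairs rather than with the product of the two file sizes. The dictionary compares keys case-sensitively by default, which matches the == test in the script above.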



This is more of a question than a suggestion. Rather than reading partial strings, would it be faster or slower to use _ArraySearch() after sorting both arrays first? This way wouldn't identify partial matches. Something like this (untested):

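#include <Array.au3>

; Sort from index 1 so the line count in element [0] stays in place
_ArraySort($aArray1, 0, 1)
_ArraySort($aArray2, 0, 1)

For $i = 1 To UBound($aArray1) - 1
    ; After sorting, $i and $match are positions in the sorted arrays, not original line numbers
    $match = _ArraySearch($aArray2, $aArray1[$i], 1)
    If @error Then ContinueLoop ; no identical line in file 2
    FileWriteLine($hDiff, "File1: Line" & $i & ":" & $aArray1[$i] & " File2: Line " & $match)
Next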

It could be that this is a horrendous way to do it, as I say I really don't know, but I'd be interested to hear.
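
For what it's worth, as far as I know _ArraySearch() walks the array element by element, so sorting first should not speed it up; _ArrayBinarySearch() is the UDF function that relies on a sorted array. A minimal sketch with made-up sample data (the array contents are placeholders, not lines from the files in this thread):

```autoit
#include <Array.au3>

; Placeholder data standing in for the lines of a file
Local $aLines[5] = ["delta", "alpha", "charlie", "bravo", "echo"]

; _ArrayBinarySearch() expects the array to be sorted in ascending order
_ArraySort($aLines)

Local $iPos = _ArrayBinarySearch($aLines, "charlie")
If @error Then
    ConsoleWrite("Not found" & @CRLF)
Else
    ConsoleWrite("Found at index " & $iPos & @CRLF)
EndIf
```

Either search looks for the whole line, so it can only tell whether an identical line exists. It does not pair up lines that share an ID but differ elsewhere.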

Thank you Bowmore for your inputs

I will try it and keep you posted.

Thank you so much !


I'm stuck on the file compare, which takes way too long.

Here's what I'm stuck on:

1- I need to compare the lines in fileA with the lines in fileB.

2- If a line in fileA exists in fileB, do not display it in the log.

3- If a line in fileA exists in fileB but has differences, output "Difference :" with the line number and the line.

4- If a line in fileA does not exist in fileB, output "Missing :" with the line number and the line content.

5- If a line in fileB does not exist in fileA, output "Missing :" with the line number and the line content.

Please advise,

I'm losing my hair ;)

PCI, simple beginner
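
One possible reading of the five requirements above, as a sketch only. It assumes that "exists" in point 2 means an identical line, that "exists but has differences" in point 3 means the text before the first "ý" matches, and that the Windows Scripting.Dictionary COM object is available. The file paths and the names _Report() and _KeyOf() are made up for this example.

```autoit
#include <File.au3>

Global $aA, $aB
_FileReadToArray("C:\FileA.txt", $aA) ; assumed path
If @error Then Exit 1
_FileReadToArray("C:\FileB.txt", $aB) ; assumed path
If @error Then Exit 1

Global $hLog = FileOpen("C:\Compare.log", 10) ; assumed path
_Report($aA, $aB, "fileA", "fileB") ; points 2, 3 and 4
_Report($aB, $aA, "fileB", "fileA") ; point 5, plus differences seen from fileB
FileClose($hLog)

; Logs every line of $aFrom that has no identical line in $aTo
Func _Report(ByRef $aFrom, ByRef $aTo, $sFrom, $sTo)
    Local $oLines = ObjCreate("Scripting.Dictionary") ; whole lines of $aTo
    Local $oKeys = ObjCreate("Scripting.Dictionary") ; IDs found in $aTo
    For $j = 1 To $aTo[0]
        If $aTo[$j] = "" Then ContinueLoop ; ignore blank lines
        $oLines.Item($aTo[$j]) = 1
        $oKeys.Item(_KeyOf($aTo[$j])) = 1
    Next
    For $i = 1 To $aFrom[0]
        If $aFrom[$i] = "" Then ContinueLoop
        If $oLines.Exists($aFrom[$i]) Then ContinueLoop ; identical line: not logged
        If $oKeys.Exists(_KeyOf($aFrom[$i])) Then
            FileWriteLine($hLog, "Difference : " & $sFrom & " line " & $i & ": " & $aFrom[$i])
        Else
            FileWriteLine($hLog, "Missing in " & $sTo & " : " & $sFrom & " line " & $i & ": " & $aFrom[$i])
        EndIf
    Next
EndFunc   ;==>_Report

; Hypothetical helper: text before the first "ý", or the whole line if there is none
Func _KeyOf($sLine)
    Local $iPos = StringInStr($sLine, "ý", 1)
    If $iPos = 0 Then Return $sLine
    Return StringLeft($sLine, $iPos - 1)
EndFunc   ;==>_KeyOf
```

The dictionary lookups are case-sensitive by default, unlike the <> operator used in the earlier scripts, so lines that differ only in letter case are reported as differences here.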


Why not use something like WinMerge if it's that urgent.

Unfortunately, WinMerge does not give me the flexibility to choose which parts of a line to compare and which differences to report.

For now I was hoping to learn to automate processes with AutoIt, and hopefully have solid coding skills in the future.

Thank you

Have you actually coded anything? The only code I see in this thread has been provided by people other than yourself.

Yes, I did code some routines, but they never worked for me ;)

Here's my code:

#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>
#include <EditConstants.au3>
#include <misc.au3>
#include <file.au3>
;~ #NoTrayIcon
;=================================
Global $aArray1 = 0
Global $aArray2 = 0
Global $hDiff = -1
;=================================
_Singleton(@ScriptName, 0)
Dim $iNumber = 0
$VTC = ""
  If $VTC = "" Then _VTC_GUI()
Func _VTC_GUI()
$VTC_Production_Input=0
$VTC_Integration_Input=0
GuiCreate(" File Compare - Testing",520,128,-1,-1,$WS_BORDER,$WS_EX_ACCEPTFILES)
$VTC_Production_Label=GUICtrlCreateLabel("  Files - PRODUCTION ", 15, 12)
$VTC_Production_Input=GUICtrlCreateInput("", 180, 10, 210, 20)
GUICtrlSetData ($VTC_Production_Input, $VTC)
GUICTRLSetState ( $VTC_Production_Input, $GUI_DROPACCEPTED)
$VTC_Production_Browse_Button=GUICtrlCreateButton("Browse",400,8)
$VTC_Integration_Label=GUICtrlCreateLabel( "  Files - INTEGRATION ", 15, 42)
$VTC_Integration_Input=GUICtrlCreateInput("", 180, 40, 210, 20)
$VTC_Integration_Browse_Button=GUICtrlCreateButton("Browse",400,38)
GUICTRLSetState ( $VTC_Integration_Input, $GUI_DROPACCEPTED)
$Comparefiles=GuiCtrlCreateButton("Compare  Files",110,69)
$ConfigurationExitWithoutSaving=GuiCtrlCreateButton("Exit Without Comparing",260,69)
GUISetState()
While 1
  $msg=GuiGetMsg()
  If $msg=$VTC_Production_Browse_Button Then
   $VTC_Production_Browse_ButtonInput = FileOpenDialog("Select Production  File","", "All (*.Rep;*.Txt)")
   GUICtrlSetData($VTC_Production_Input, $VTC_Production_Browse_ButtonInput, "0")
  EndIf
  If $msg=$VTC_Integration_Browse_Button Then
   $VTC_Integration_Browse_ButtonInput = FileOpenDialog ("Select Integration  File","", "All (*.Rep;*.Txt)")
   GUICtrlSetData($VTC_Integration_Input, $VTC_Integration_Browse_ButtonInput, "0")
  EndIf
  If $msg=$Comparefiles Then
   ProgressOn("Compare Files", "Comparing Files...", "0% Complete", Default, (@DesktopHeight / 2) - (@DesktopHeight / 6) , 10)
   ;==================================================================================================================================
   $Production_VTC_Read = GUICTRLRead($VTC_Production_Input)
   $Integration_VTC_Read = GUICTRLRead($VTC_Integration_Input)
   _FileReadToArray($Production_VTC_Read,$aArray1)
   _FileReadToArray($Integration_VTC_Read,$aArray2)
   $hDiff = FileOpen("C:\Temp\Differences.txt", 10)
   For $i = 1 To UBound($aArray1) - 1
    $iNumber = Round($i / UBound($aArray1) * 100, 2)
    ProgressSet($iNumber, $iNumber & "% Complete")
    If Mod($i, 5) = 0 Then
     $msg = GUIGetMsg()
     If $msg=$ConfigurationExitWithoutSaving Then
      $ExitDialog = MsgBox(36, "Are You Sure?", "Are you sure you want to exit?")
      If $ExitDialog = 6 then Exit
     EndIf
    EndIf
    ; ID = everything before the 12th "ý" on the line
    $sID1 = StringLeft($aArray1[$i],StringInStr($aArray1[$i],"ý",1,12)-1)
    For $j = 1 To UBound($aArray2) - 1
     $sID2 = StringLeft($aArray2[$j],StringInStr($aArray2[$j],"ý",1,12)-1)
     If $sID1 == $sID2 Then ;Check if the first part is the same
      ;then check if anything else on the lines is different
      If $aArray1[$i] <> $aArray2[$j] Then
       FileWriteLine($hDiff, " File Production  Line #"  & $i & ": " & $aArray1[$i] & @CRLF &  " File Integration Line #"  & $j & ": " & $aArray2[$j] & @CRLF & @CRLF)
      EndIf
     EndIf
     ;$msg=GuiGetMsg()
    Next
   Next
   ProgressOff()
   FileClose($hDiff)
   ; Both paths are quoted because the program path contains spaces
   Run('"C:\Program Files\IDM Computer Solutions\UltraEdit\uedit32.exe" "C:\Temp\Differences.txt"')
   ;==================================================================================================================================
  EndIf
      If $msg=$ConfigurationExitWithoutSaving Then
      $ExitDialog = MsgBox(36, "Are You Sure?", "Are you sure you want to exit?")
      If $ExitDialog = 6 then Exit
      EndIf
  Wend
EndFunc
Exit

Try this:

#include <SQLite.au3>
#include <SQLite.Dll.au3>
#include <Array.au3>
#include <File.au3> ; _FileReadToArray, _FileWriteFromArray

Main()

Func Main()
    ; init SQLite
    _SQLite_Startup()
    _SQLite_SafeMode(0) ; speed up SQLite UDF

    ; create a :memory: DB
    Local $hDB = _SQLite_Open()
    _SQLite_Exec($hDB, "CREATE TABLE Strings (StrKey CHAR, Source INTEGER, Line integer, StrRest CHAR);")
    Local $dir = @ScriptDir & "\"
    Local $file[2] = ["file1.txt", "file2.txt"]
    If @error Then Return
    ; process input files
    Local $txtstr, $strrestpos
    _SQLite_Exec($hDB, "begin;")
    For $i = 0 To 1
        ConsoleWrite("Processing file " & $dir & $file[$i] & @LF)
        _FileReadToArray($dir & $file[$i], $txtstr)
        ; process input lines
        If Not @error Then
            For $j = 1 To $txtstr[0]
                ; split each line at the first 'ý': key before it, rest from it onwards
                $strrestpos = StringInStr($txtstr[$j], 'ý', 2)
                _SQLite_Exec($hDB, "insert into Strings (Source, Line, StrKey, StrRest) values (" & _
                        $i & "," & _
                        $j & "," & _
                        _SQLite_FastEscape(StringLeft($txtstr[$j], $strrestpos - 1)) & "," & _
                        _SQLite_FastEscape(StringMid($txtstr[$j], $strrestpos)) & ");")
            Next
        EndIf
    Next
    _SQLite_Exec($hDB, "CREATE index ixstrkey on Strings (StrKey collate nocase, Source, Line);")
    _SQLite_Exec($hDB, "commit;")
    ; create log file
    Local $nrows, $ncols, $hlog
    ConsoleWrite("Creating log file" & @LF)
    $hlog = FileOpen($dir & "compare.log", 2)
    ; log orphan lines in files
    For $i = 0 To 1
        $j = Int($i = 0)
        _SQLite_GetTable($hDB, "select 'Line ' || line || '" & @CRLF & "' || Strkey || strrest from Strings X where Source = " & $i & " and " & _
                "not exists (select 1 from Strings Y where Y.Strkey = X.Strkey and Y.Source != X.Source) order by line;", _
                $txtstr, $nrows, $ncols)
        If $nrows Then
            FileWriteLine($hlog, "Orphan lines in " & $file[$i] & " :")
            _FileWriteFromArray($hlog, $txtstr, 2)
            FileWriteLine($hlog, @CRLF)
        EndIf
    Next
    ; log differences
    _SQLite_GetTable($hDB, "select '" & $file[0] & "  line ' || X.line || '" & @CRLF & "' || X.Strkey || X.strrest || '" & @CRLF & _
            $file[1] & "  line ' || Y.line || '" & @CRLF & "' || Y.Strkey || Y.strrest || '" & @CRLF & "' " & _
            "from Strings X join Strings Y on X.Source = 0 and Y.Source = 1 and x.strkey = y.strkey and " & _
            "X.Strrest != Y.Strrest order by X.line;", _
            $txtstr, $nrows, $ncols)
    If $nrows Then
        FileWriteLine($hlog, "Differences :")
        _FileWriteFromArray($hlog, $txtstr, 2)
    EndIf
    FileClose($hlog)
EndFunc   ;==>Main


Thank you so much for your feedback and help, jchd. I'm trying it, but I think there's something broken in the code; correct me if I'm wrong.

Between lines 26 and 33 I get errors. (27) : ==> Unknown function name.:

$strrestpos = StringInStr($txtstr[$j], 'ý', 2)
_SQLite_Exec($hDB, "insert into Strings (Source, Line, StrKey, StrRest) values (" & _
        $i & "," & _
        $j & "," & _
        _SQLite_FastEscape(StringLeft($txtstr[$j], $strrestpos - 1)) & "," & _
        _SQLite_FastEscape(StringMid($txtstr[$j], $strrestpos)) & ");")
Next

Thank you

PCI

Edited by PCI