CiRc2K5

compare two files

22 posts in this topic

Hi, I am new to AutoIt. What I am trying to do is get AutoIt to open two different files and compare them line by line. What I want to compare is where the tabs are located in each line. Sometimes there is a tab, then a label, then only one tab before the checksum. So if it finds that a line in the source file has one tab, the label, then a tab and the checksum before the EOL, can it keep the text that is already in the destination file and then write a tab, FIX-ME, a tab, the checksum and the EOL (CRLF)? Or the other way round: if the source file has two tabs and then the checksum before the EOL and the destination doesn't, keep the text before them and add the extra tab.

 

Thanks in advance,

Doug




Hi Doug,

In order for people to understand exactly what you are trying to do, it is best to provide some example files.

These do not need to be your "live" files, but some files that follow the same format you are expecting in the files.

Perhaps a resulting file also? (i.e. 2 files that are being compared and 1 file that has the changes you want made in the comparison process).

This way we can better make sense of what you're trying to do as otherwise it's a little hard to follow exactly what you're after.

Cheers!


Welcome to Autoit and the forum!
And could you please post what you have tried so far? We do not spoon feed users here. We try to answer questions users have when working on their projects.



#4 ·  Posted (edited)

Thanks mpower, here is an example of an outdated but correct file:

Interface   18  길드 개설       1.00
Interface   19  길     해체    1.00
Interface   20  길드 목록       1.00
Interface   21  길드상세정보      1.00
Interface   22  보낸 사람       1.00
Interface   23  받을 물품       1.00
Interface   24  리더    선택    1.00

It doesn't show here, but there is a tab, not a space, after Interface, after the line number and after the text. There can be just one tab before the 1.00 if there is a tab inside the text, as in line 24; otherwise there are two tabs before the 1.00.

I was doing this by hand, but this file is 15,000 lines and my eyes go buggy after fixing 100 lines or so. So I was hoping that for lines like 18 and 24, where only the tab is missing, we add the tab before the 1.00. When the tab is missing inside the text, like line 19, keep the text and insert a Fix-Me (or something similar) at the end of the text. Afterwards I can search for each Fix-Me line by hand and figure out next to which word the tab should be placed. "Perhaps a resulting file also? (i.e. 2 files that are being compared and 1 file that has the changes you want made in the comparison process)." Yes, I think a new file built from the 2 compared files would be best. We can call them A, B and fixed, with A being the correct file to compare against.

@water No, sorry, I haven't tried to code anything yet; I was first checking whether this is possible with AutoIt. I already have a major task with this file and didn't want to add to it by trying to code something that may or may not do what I am looking for. Thanks for the welcome.

 

Edited by CiRc2K5


#5 ·  Posted (edited)

So essentially you would like to "align" the data so that it is all equally tab delimited? e.g. Interface<TAB>##<TAB>TEXT<TAB><TAB>#.##?

Edited by mpower


But in the rare case where there is a tab in the text, it would be Interface<TAB>##<TAB>TEXT<TAB>TEXT (FIX ME)<TAB>#.##?

That tab in the text is important, and having <TAB><TAB>#.## there instead causes the interface to crash. So I don't think it's only "aligning": it would need to read the source file to see which case applies. Then each line would be aligned either with two <TAB>s at the end of the line or as text<TAB>text<TAB>#.##.
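A minimal sketch of the A/B/fixed idea described above, assuming every line ends in <TAB>value (for example 1.00) and that each source line uses one of the two layouts Doug describes. The helper names _FixLine and _FixFiles and the file names A.txt, B.txt and fixed.txt are made up for illustration, and writing the output as UTF-8 is an assumption about what the interface file expects. When the source has <TAB><TAB> before the value, the translated line is forced to exactly two tabs there; when the source has a tab inside the text but the translation lost it, the line gets <TAB>FIX-ME<TAB> before the value so it can be found and fixed by hand.

```autoit
#include <FileConstants.au3>

; Sketch: rebuild one translated line so its tab layout follows the source line.
; Assumes lines look like Interface<TAB>##<TAB>text<TAB>text<TAB>value
; or Interface<TAB>##<TAB>text<TAB><TAB>value, as in the examples in this thread.
Func _FixLine($sSrc, $sDst)
    ; Split the translated line into everything before the last run of tabs, and the final value.
    Local $aDst = StringRegExp($sDst, '^(.*?)\t+([^\t]*)$', 1)
    If @error Then Return $sDst ; no tab at all: leave the line for manual review
    Local $sHead = $aDst[0], $sValue = $aDst[1]

    ; Source ends in <TAB><TAB>value: no tab in the text, so force exactly two tabs.
    If StringRegExp($sSrc, '\t\t[^\t]*$') Then Return $sHead & @TAB & @TAB & $sValue

    ; Source has a tab inside the text: Interface<TAB>##<TAB>text<TAB>text has 3 tabs.
    StringReplace($sHead, @TAB, "")
    If @extended >= 3 Then Return $sHead & @TAB & $sValue
    ; The translation lost the tab inside the text: mark the line for a manual fix.
    Return $sHead & @TAB & "FIX-ME" & @TAB & $sValue
EndFunc

; Hypothetical wrapper: A = source (reference), B = translation, output = fixed file.
; Returns False if a file can't be read or the line counts differ.
Func _FixFiles($sA, $sB, $sOut)
    Local $aA = FileReadToArray($sA)
    If @error Then Return False
    Local $aB = FileReadToArray($sB)
    If @error Or UBound($aA) <> UBound($aB) Then Return False
    Local $hOut = FileOpen($sOut, $FO_OVERWRITE + $FO_UTF8)
    If $hOut = -1 Then Return False
    For $i = 0 To UBound($aA) - 1
        FileWriteLine($hOut, _FixLine($aA[$i], $aB[$i]))
    Next
    FileClose($hOut)
    Return True
EndFunc

If Not _FixFiles("A.txt", "B.txt", "fixed.txt") Then ConsoleWrite("Could not build fixed.txt" & @CRLF)
```

Lines that already have the right layout come out unchanged, so fixed.txt can be compared against B.txt to review every change. The sketch refuses to run when the two files have different line counts, because a line-by-line repair only makes sense when line N of A corresponds to line N of B.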


#7 ·  Posted (edited)

Would it be sufficient to strip out all TABs from the text and replace them with spaces instead? That way each 'column' of data is separated by a TAB.

e.g.:  Interface<TAB>##<TAB>TEXT<space>TEXT (FIX ME)<TAB>#.##?

Would you still need (FIX ME) then?

Edited by mpower


Could something like this work?
I used the example from post #4, assuming that the 2 files have the same number of lines and that ## is the line number (starting at 1) in the real files.

$a1 = FileReadToArray("1.txt")
$a2 = FileReadToArray("2.txt")
Local $errors
For $i = 0 to UBound($a1)-1
   If not ($a1[$i] == $a2[$i]) Then 
     $line1 = StringRegExp($a1[$i], '^Interface\h+\d+\h+(.*?)(\h+)1\.00$', 3)
     $line2 = StringRegExp($a2[$i], '^Interface\h+\d+\h+(.*?)(\h+)1\.00$', 3)
     If not ($line1[0] == $line2[0]) Then $errors &= "text error - "
     If not ($line1[1] == $line2[1]) Then $errors &= "tab error - "
     $errors &= "line " & $i+1 & @crlf
   EndIf
Next
Msgbox(0,"", $errors)

 


@mpower If we replace the tab in the text with a space, it crashes the interface file. I was able to do a search and replace in Notepad++ to make all lines Interface<TAB>##<TAB>TEXT<TAB><TAB>#.##. Even if it just shows the line numbers with errors, as mikell suggests above, that's fine.

@mikell I tried running your script and I get: "C:\Program Files (x86)\AutoIt3\compare.au3" (8) : ==> Subscript used on non-accessible variable.:
If not ($line1[0] == $line2[0]) Then $errors &= "text error - "
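One way to make the regex version robust, sketched below. With flag 3, StringRegExp does not return an array when the pattern fails to match (it sets @error instead), and subscripting that result is what produces "Subscript used on non-accessible variable". A line whose final value is not exactly 1.00, or that has trailing spaces, would fail this pattern, for example. The function name _CompareLine is hypothetical; it reports such lines instead of stopping the script.

```autoit
; Pattern from the script above: the text, then the whitespace run before a final "1.00".
Global Const $g_sPattern = '^Interface\h+\d+\h+(.*?)(\h+)1\.00$'

; Compares one pair of lines. Returns "" when they agree, otherwise a one-line report.
Func _CompareLine($s1, $s2, $iLine)
    If $s1 == $s2 Then Return ""
    Local $aL1 = StringRegExp($s1, $g_sPattern, 3)
    ; No match: @error is set and there is no array, so check before subscripting.
    If @error Then Return "line " & $iLine & ": 1.txt does not match the pattern" & @CRLF
    Local $aL2 = StringRegExp($s2, $g_sPattern, 3)
    If @error Then Return "line " & $iLine & ": 2.txt does not match the pattern" & @CRLF
    Local $sErr = ""
    If Not ($aL1[0] == $aL2[0]) Then $sErr &= "text error - "
    If Not ($aL1[1] == $aL2[1]) Then $sErr &= "tab error - "
    Return $sErr & "line " & $iLine & @CRLF
EndFunc

; Demo: the first line ends in 2.00, so it is reported instead of crashing the script.
ConsoleWrite(_CompareLine("Interface" & @TAB & "5" & @TAB & "A" & @TAB & @TAB & "2.00", _
        "Interface" & @TAB & "5" & @TAB & "A" & @TAB & @TAB & "1.00", 5))
```

In the original loop, `$errors &= _CompareLine($a1[$i], $a2[$i], $i + 1)` would replace the body of the For loop. Lines reported as "does not match the pattern" are the ones worth opening in an editor to see what the regex did not expect.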


Hmm, without seeing the real text I can't say what makes the regex fail.
But if a basic line comparison returning the line number is fine, then this should be enough:

$a1 = FileReadToArray("1.txt")
$a2 = FileReadToArray("2.txt")
Local $errors
For $i = 0 to UBound($a1)-1
   If not ($a1[$i] == $a2[$i]) Then $errors &= "line " & $i+1 & @crlf
Next
Msgbox(0,"errors", $errors)

 


Thanks again, mikell, but I don't think I have made it clear enough that it can't be a simple compare, as the source (1.txt) is Korean text. The second file (2.txt) is the source text converted to English, and the Korean-to-English conversion process is where all the tab errors came into the file. Here are a few lines from both files, showing both the two-tabs-before-1.00 case and the single-tab-before-1.00 case. It needs to be able to see whether the source has two tabs at the end before the 1.00, e.g. Interface<TAB>##<TAB>TEXT<TAB><TAB>1.00, or a tab in the text and only one tab before the 1.00, e.g. Interface<TAB>##<TAB>TEXT<TAB>TEXT<TAB>1.00.

Interface<TAB>67<TAB>오아시스<TAB>오아시스<TAB>1.00
Interface<TAB>69<TAB>예지몽<TAB><TAB>1.00
Interface<TAB>70<TAB>예지몽<TAB><TAB>1.00

Interface<TAB>67<TAB>Oasis<TAB>Oasis<TAB>1.00
Interface<TAB>69<TAB>Precognition<TAB><TAB>1.00
Interface<TAB>70<TAB>Precognition<TAB><TAB>1.00
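Going by these examples, both correct layouts split into the same five tab-separated fields, and only the fourth field differs (it is empty in the <TAB><TAB>1.00 case), while a line that lost a tab has just four fields. A small sketch to check this on sample lines, assuming the real file follows the same two layouts; the helper _ShowFields is only for illustration.

```autoit
; Writes the tab-separated fields of a line to the console and returns the field count.
Func _ShowFields($sLine)
    Local $aF = StringSplit($sLine, @TAB) ; $aF[0] holds the number of fields
    Local $sOut = $aF[0] & " fields:"
    For $j = 1 To $aF[0]
        $sOut &= " [" & $aF[$j] & "]" ; empty fields show up as []
    Next
    ConsoleWrite($sOut & @CRLF)
    Return $aF[0]
EndFunc

; The two correct layouts from the English examples, with real tab characters.
_ShowFields("Interface" & @TAB & "67" & @TAB & "Oasis" & @TAB & "Oasis" & @TAB & "1.00")
_ShowFields("Interface" & @TAB & "69" & @TAB & "Precognition" & @TAB & @TAB & "1.00")
; A broken line: the tab inside the text was lost during translation.
_ShowFields("Interface" & @TAB & "67" & @TAB & "Oasis Oasis" & @TAB & "1.00")
```

This prints 5, 5 and 4 fields respectively. StringSplit keeps empty strings between consecutive delimiters, which is why the <TAB><TAB> line still counts five fields.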


If I understand you correctly, you only need to compare the interface number and the value at the end, since the text (with possibly tab in it) can't be compared.

Is my assumption correct?



Regarding "since the text (with possibly tab in it) can't be compared": is that a statement that there can be no compare feature to find this out?

If no:

No, both files already have matching interface and line numbers. The problem is that the newly converted file doesn't have tabs in the right spot on some of the lines, which causes the interface file to crash. If we compare the source line Interface<TAB>67<TAB>오아시스<TAB>오아시스<TAB>1.00 to Interface<TAB>67<TAB>Oasis<TAB>Oasis<TAB>1.00, it would match, as all tabs are in the right spot. Now if we compare the same source line, Interface<TAB>67<TAB>오아시스<TAB>오아시스<TAB>1.00, to a slightly different Interface<TAB>67<TAB>Oasis Oasis<TAB>1.00, it should fail, as there is no <TAB> between the words in the translated file.

If yes:

Would it be sufficient to strip out all TABs from the text and replace them with spaces instead? That way each 'column' of data is separated by a TAB.

e.g.:  Interface<TAB>##<TAB>TEXT<space>TEXT (FIX ME)<TAB>#.##?

Would you still need (FIX ME) then?

I would love to try this before going back to manually editing that massive file.

 

Thanks

Doug

 

 


#14 ·  Posted (edited)

This translation thing is new... I suppose focusing on the tabs is now the only way, meaning that when comparing lines there is an error if:
- the number of tabs differs between a line from file1 and the same line from file2
- there is a double tab in the line from file1 but not in the same line from file2 (or vice versa)
Example
     Interface<TAB>67<TAB>오아시스<TAB>오아시스<TAB>1.00   reference
=> Interface<TAB>67<TAB>Oasis<TAB>Oasis<TAB>1.00   no error
=> Interface<TAB>67<TAB>Oasis Oasis<TAB>1.00   error number of tabs
=> Interface<TAB>67<TAB>Oasis Oasis<TAB><TAB>1.00   error double tab
If this approach is correct then this should work

$a1 = FileReadToArray("1.txt")
$a2 = FileReadToArray("2.txt")
Local $errors
For $i = 0 to UBound($a1)-1
   StringReplace($a1[$i], @TAB, "")
   $n1 = @extended                       ;  counts tabs
   $db1 = StringInStr($a1[$i], @TAB&@TAB)  ; position of the first double tab (0 if none)
   StringReplace($a2[$i], @TAB, "")
   $n2 = @extended
   $db2 = StringInStr($a2[$i], @TAB&@TAB)
   If $n1 <> $n2 OR $db1 <> $db2 Then $errors &= "line " & $i+1 & @crlf
Next
Msgbox(0,"errors", $errors)

 

Edited by mikell
thanks guinness for finding the typo


You're missing & @mikell.



mikell, it is almost perfect now, but one small tweak to the tab check is needed: it gives me an error for lines with two tabs at the end and none in the text.

"Example
     Interface<TAB>67<TAB>오아시스<TAB>오아시스<TAB>1.00   reference
=> Interface<TAB>67<TAB>Oasis<TAB>Oasis<TAB>1.00   no error - Perfect: 1 tab in the text area, so 1 before the 1.00
=> Interface<TAB>67<TAB>Oasis Oasis<TAB>1.00   error number of tabs - Perfect: none in the text area, so there should be two at the end
=> Interface<TAB>67<TAB>Oasis Oasis<TAB><TAB>1.00   error double tab - No error would be correct, as there is no tab in the text
If this approach is correct then this should work"


Interface<TAB>67<TAB>오아시스<TAB>오아시스<TAB>1.00   = line in file1
Interface<TAB>67<TAB>Oasis Oasis<TAB><TAB>1.00   = the same line in file 2
The code compares the position and number of tabs in the same line (line 67) of the 2 files.
If the tab position/number differs between these 2 lines, it returns an error.
Isn't this the expected result?

 


Sorry, you are correct that line 67 would be an error. The routine is successful in finding <TAB> errors when the tab is in the text, but it gives a false error for every line that has <TAB><TAB>1.00, even when the lines match. So we are getting much closer.


Interface<TAB>67<TAB>오아시스오아시스<TAB><TAB>1.00   = line in file1
Interface<TAB>67<TAB>OasisOasis<TAB><TAB>1.00   = the same line in file 2
This should return no error : the double tab exists in the 2 lines and the global number of tabs in line is the same for the 2 lines
Interface<TAB>67<TAB>Oasis<TAB>Oasis<TAB><TAB>1.00
This returns an error : the double tab exists in the 2 lines but the global number of tabs is not the same
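A likely cause of the false errors on <TAB><TAB>1.00 lines: StringInStr returns the position of the double tab, not just whether it exists, and that position depends on the length of the text in front of it, which differs between Korean and English. In Interface<TAB>69<TAB>예지몽<TAB><TAB>1.00 the double tab starts at character 17 (counting each Korean syllable as one character), in Interface<TAB>69<TAB>Precognition<TAB><TAB>1.00 at character 26, so $db1 <> $db2 even though both lines are correct. The sketch below is an untested adaptation of the loop above that compares only found/not found. It also compares just the lines both files have and reports a difference in line count, since indexing $a2[$i] past its last line would otherwise stop the script. _CompareTabs is a hypothetical name, and the demo uses placeholder text in place of the Korean.

```autoit
; Sketch of the tab check with two assumed fixes: double tabs are compared as
; found/not found, and unequal line counts are reported instead of crashing.
Func _CompareTabs($a1, $a2)
    Local $iCount = UBound($a1), $sReport = ""
    If UBound($a2) <> $iCount Then
        $sReport &= "line count differs: " & UBound($a1) & " vs " & UBound($a2) & @CRLF
        If UBound($a2) < $iCount Then $iCount = UBound($a2) ; only compare lines both files have
    EndIf
    Local $n1, $n2, $bDb1, $bDb2
    For $i = 0 To $iCount - 1
        StringReplace($a1[$i], @TAB, "")
        $n1 = @extended ; number of tabs in the source line
        StringReplace($a2[$i], @TAB, "")
        $n2 = @extended ; number of tabs in the translated line
        ; StringInStr returns a position (0 if not found); keep only found / not found.
        $bDb1 = StringInStr($a1[$i], @TAB & @TAB) > 0
        $bDb2 = StringInStr($a2[$i], @TAB & @TAB) > 0
        If $n1 <> $n2 Or $bDb1 <> $bDb2 Then $sReport &= "line " & $i + 1 & @CRLF
    Next
    Return $sReport
EndFunc

; Demo: "K"/"KKK" stand in for the Korean text; the English file is one line short.
Local $aKr[3] = ["Interface" & @TAB & "67" & @TAB & "K" & @TAB & "K" & @TAB & "1.00", _
        "Interface" & @TAB & "69" & @TAB & "KKK" & @TAB & @TAB & "1.00", _
        "Interface" & @TAB & "70" & @TAB & "KKK" & @TAB & @TAB & "1.00"]
Local $aEn[2] = ["Interface" & @TAB & "67" & @TAB & "Oasis Oasis" & @TAB & "1.00", _
        "Interface" & @TAB & "69" & @TAB & "Precognition" & @TAB & @TAB & "1.00"]
ConsoleWrite(_CompareTabs($aKr, $aEn))
```

The demo reports the line count difference and line 1 (a tab lost inside the text). With the original position comparison, line 2 would be flagged as well. On the real files it would be called as `MsgBox(0, "errors", _CompareTabs(FileReadToArray("1.txt"), FileReadToArray("2.txt")))`.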


#20 ·  Posted (edited)

Sorry, mikell, I had to go into the hospital, hence the long delay before responding. With the main files I get the error I attached (error.bmp). It turned out that the source (Korean) had more lines than file 2 (English) after a patch update. So I manually added the missing Korean lines, and after running again I get past the error. However, it claims that all lines have errors, yet checking the first 10 lines by hand there is no mismatch between the two files. I decided to do a short test with only 10 lines in each file, and again it shows that all lines have errors (error2.bmp). I have included files 1 and 2 that I used for my test. If you check the tabs manually, there should be 0 errors.

Is there also a routine we can add that first compares the line numbers from file 1 to file 2 and saves the differences as changes.txt? Most of the changes are additions, but some lines have been removed as well. So I was hoping the changes file could say "Add line 1190 the text..." for added lines and something like "line 1175 removed..." for removed lines. Then it would create a new file 2 reflecting the changes, so that it matches the number of lines and the tab feature checks the right lines.

 1.txt

 2.txt

Hope this makes sense; I had to go home on some high-end oxycodone and I'm a bit loopy right now.

Thanks 

Doug

Edited by CiRc2K5
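A sketch of the changes.txt idea, assuming the first two fields (for example Interface<TAB>67) identify a line and are unique within each file, so lines can be matched by that key instead of by position. It uses the Scripting.Dictionary COM object, which is available on Windows; _DiffById and _LineKey are hypothetical names. It only produces the report; building a realigned copy of 2.txt from the same keys is left out.

```autoit
; Returns the first two tab-separated fields of a line, used as its key (assumed unique).
Func _LineKey($sLine)
    Local $aF = StringSplit($sLine, @TAB)
    If $aF[0] < 2 Then Return $sLine
    Return $aF[1] & @TAB & $aF[2]
EndFunc

; Reports lines present in only one of the two arrays, matched by _LineKey.
Func _DiffById($aSrc, $aDst)
    Local $oSrc = ObjCreate("Scripting.Dictionary")
    Local $oDst = ObjCreate("Scripting.Dictionary")
    Local $sKey, $sReport = ""
    For $i = 0 To UBound($aSrc) - 1
        $sKey = _LineKey($aSrc[$i])
        If Not ($oSrc.Exists($sKey)) Then $oSrc.Add($sKey, $i)
    Next
    For $i = 0 To UBound($aDst) - 1
        $sKey = _LineKey($aDst[$i])
        If Not ($oDst.Exists($sKey)) Then $oDst.Add($sKey, $i)
    Next
    ; In the source but not in the translation: these lines need to be added.
    For $i = 0 To UBound($aSrc) - 1
        If Not ($oDst.Exists(_LineKey($aSrc[$i]))) Then $sReport &= "Add line " & $i + 1 & ": " & $aSrc[$i] & @CRLF
    Next
    ; In the translation but no longer in the source: these lines were removed.
    For $i = 0 To UBound($aDst) - 1
        If Not ($oSrc.Exists(_LineKey($aDst[$i]))) Then $sReport &= "Line " & $i + 1 & " removed: " & $aDst[$i] & @CRLF
    Next
    Return $sReport
EndFunc

; Demo: the source gained line 70, and line 68 no longer exists in it.
Local $aDemoSrc[3] = ["Interface" & @TAB & "67" & @TAB & "K" & @TAB & "K" & @TAB & "1.00", _
        "Interface" & @TAB & "69" & @TAB & "K" & @TAB & @TAB & "1.00", _
        "Interface" & @TAB & "70" & @TAB & "K" & @TAB & @TAB & "1.00"]
Local $aDemoDst[3] = ["Interface" & @TAB & "67" & @TAB & "Oasis" & @TAB & "Oasis" & @TAB & "1.00", _
        "Interface" & @TAB & "68" & @TAB & "Old" & @TAB & @TAB & "1.00", _
        "Interface" & @TAB & "69" & @TAB & "Precognition" & @TAB & @TAB & "1.00"]
ConsoleWrite(_DiffById($aDemoSrc, $aDemoDst))
```

To write the report for the real files: `FileWrite("changes.txt", _DiffById(FileReadToArray("1.txt"), FileReadToArray("2.txt")))`. FileWrite with a file name appends, so delete an old changes.txt first. Matching by key rather than by line position is what lets the report survive inserted or removed lines.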
