
How to compare 2 large text files?


Nubie

Hi all! I'm new and learning the basics of AutoIt. My problem: I can't compare the text of large files (over 10,000 lines and over 1 MB in size).

Example: I have File1 and File2.

File1 has:

A=10

B=20

C=30

File2 has:

B=20

C=30

A=40

D=50

The result I want in File3:

A=40

D=50

I found this approach. It does what I want, but only for small files with few lines. Please teach me how to do it for large files.

; @ScriptDir has no trailing backslash, so the separator must be part of the file name
$Path_In1 = @ScriptDir & '\test_in1.txt'
$Path_In2 = @ScriptDir & '\test_in2.txt'
$Path_Out = @ScriptDir & '\test_Out.txt'
$sText1 = FileRead($Path_In1)
$sText2 = FileRead($Path_In2)

$sText_Out = _Unique_Lines_Text2($sText1, $sText2)
If @error Then
MsgBox(0, 'Error', 'Error = ' & @error)
Exit
Else
$hFile = FileOpen($Path_Out, 2) ; open for writing, erasing previous contents
FileWrite($hFile, $sText_Out)
FileClose($hFile)
EndIf

; @error = 1 - no unused character is available to stand in for '['
; @error = 2 - Not found (the second text has no unique lines)
; not case sensitive, String = StRiNg = STRING
Func _Unique_Lines_Text2($sText1, $sText2, $sep = @CRLF)
Local $i, $k, $aText, $s, $Trg = 0, $LenSep

If StringInStr($sText1 & $sText2, '[') And $sep <> '[' Then ; if the problematic character is present, replace it
For $i = 0 To 255
$s = Chr($i)
If Not StringInStr($sText1 & $sText2, $s) Then
If StringInStr($sep, $s) Then ContinueLoop
$sText1 = StringReplace($sText1, '[', $s)
$sText2 = StringReplace($sText2, '[', $s)
$Trg = 1
ExitLoop
EndIf
Next
If Not $Trg Then Return SetError(1, 0, '')
EndIf

$LenSep = StringLen($sep)

$aText = StringSplit($sText1, $sep, 1) ; create variables from the lines of the first file
For $i = 1 To $aText[0]
Assign($aText[$i] & '/', 2, 1)
Next
Assign('/', 2, 1)

$aText = StringSplit($sText2, $sep, 1)

$k = 0
$sText1 = ''
For $i = 1 To $aText[0]
Assign($aText[$i] & '/', Eval($aText[$i] & '/')+1, 1) ; create local variables, or increment the value of ones that already exist
If Eval($aText[$i] & '/') = 1 Then
$sText1 &= $aText[$i] & $sep
$k += 1
EndIf
Next
If $k = 0 Then Return SetError(2, 0, '')
If $Trg Then $sText1 = StringReplace($sText1, $s, '[')
Return StringTrimRight($sText1, $LenSep)
EndFunc

Shouldn't the result be this?

A=10

A=40

D=50

#include <Array.au3>
Global $sLines = StringStripCR(StringStripWS(FileRead(@ScriptDir & "\File1.txt"), 3)) & @LF & StringStripCR(StringStripWS(FileRead(@ScriptDir & "\File2.txt"), 3))
Global $aLines = StringSplit($sLines, @LF, 2)
_ArraySort($aLines)
Global $i = 0
While $i < UBound($aLines) - 1
    If $aLines[$i] = $aLines[$i + 1] Then
        $aLines[$i] = ""
        $aLines[$i + 1] = ""
    EndIf
    $i += 1
WEnd
$aResult = _ArrayUnique($aLines)
_ArraySort($aResult, 0, 1)
_ArrayDelete($aResult, 1)
$aResult[0] = UBound($aResult) - 1
_ArrayDisplay($aResult)

Br,

UEZ


Thanks UEZ, but your way is not quite correct, only close.

With A=10 and A=40, you can read it as A=old and A=new. I need the result to show A=new :)

And the result doesn't need to show everything, because File2 is the main file and is compared against File1. If File1 has lines such as x=1, y=2, z=3 that File2 doesn't have, ignore them; they don't need to appear in the result. But if File2 has lines such as 1=A, 2=B, 3=C that File1 doesn't have, they need to go to File3. If a line in File2 is the same as a line in File1, it doesn't need to appear in the result.
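
One way to express the rule above for large files is a lookup table: put every line of File1 into a `Scripting.Dictionary`, then keep only the File2 lines that are not in it. This is a minimal sketch, not code posted in this topic. `_LinesOnlyInSecond` is a hypothetical helper name, and the case-insensitive compare mode is an assumption carried over from the first script.

```autoit
; Sketch: return the lines of $aNew (File2) that do not occur in $aOld (File1).
; _LinesOnlyInSecond is a hypothetical name, not an AutoIt built-in or UDF.
Func _LinesOnlyInSecond($aOld, $aNew)
    Local $oSeen = ObjCreate('Scripting.Dictionary')
    $oSeen.CompareMode = 1 ; 1 = text compare, so 'a=1' and 'A=1' are the same key
    ; Register every File1 line once.
    For $i = 0 To UBound($aOld) - 1
        If Not $oSeen.Exists($aOld[$i]) Then $oSeen.Add($aOld[$i], 1)
    Next
    ; Keep the File2 lines that were never registered.
    Local $sOut = ''
    For $i = 0 To UBound($aNew) - 1
        If Not $oSeen.Exists($aNew[$i]) Then $sOut &= $aNew[$i] & @CRLF
    Next
    Return StringTrimRight($sOut, 2) ; drop the trailing @CRLF
EndFunc

; The example from this topic, as zero-based arrays of lines.
Local $aFile1[3] = ['A=10', 'B=20', 'C=30']
Local $aFile2[4] = ['B=20', 'C=30', 'A=40', 'D=50']
ConsoleWrite(_LinesOnlyInSecond($aFile1, $aFile2) & @CRLF) ; A=40 and D=50
```

With real files, the two arrays could come from `FileReadToArray` (available in recent AutoIt releases) and the returned string could be written out with `FileWrite`. Each line costs one dictionary lookup, so the work should grow roughly with the total number of lines instead of with every pair of lines.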


Yes, try this :)

http://www.mediafire.com/?ayabg1xie6s6y3w

Notice: File2 is the main file for the comparison, as I said before.

Btw, I have tried this. It works correctly with my example in this topic, but not with my files. I don't know why. Please help me :(

$sToMatch = "File1.txt"
$sMatchFrom = "File2.txt"
$output = "File3.txt"
$sOutPut = _myFileReturnInfo(FileRead($sToMatch), FileRead($sMatchFrom))
FileDelete($output)
FileWrite($output,$sOutPut)

;Actual function
Func _myFileReturnInfo($sFile1, $sFile2)
;Might have your file reads here or whatever
Local $aSplit = StringSplit(StringStripCR($sFile1), @LF);Create file 1 array
;With RegExp, we don't really need a big function
Local $sHoldText
For $i = 1 To $aSplit[0]
If StringRegExp($sFile2, $aSplit[$i]) Then
$sHoldText &= ""
Else
$sHoldText &= $aSplit[$i] & @CRLF
EndIf
Next
Return StringTrimRight($sHoldText, 2);trim off the last carriage return + line feed
EndFunc
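
One possible reason the function above behaves differently on real files: `StringRegExp` treats each line as a regular-expression pattern, so characters such as `.`, `[`, `(`, `+` or `\` in the data change what is matched, and a short line like `A=1` is also found inside `A=10`. As posted, the loop also walks the lines of the first argument, so the argument order decides which file is the "main" one. The sketch below swaps the regular expression for a literal whole-line test with `StringInStr`. `_HasLine` is a hypothetical helper name.

```autoit
; Sketch: literal, whole-line lookup without regular expressions.
; _HasLine is a hypothetical helper name.
Func _HasLine($sText, $sLine)
    ; Normalise line endings to @LF and pad both ends, so that every line,
    ; including the first and the last, is surrounded by @LF.
    Local $sPadded = @LF & StringStripCR($sText) & @LF
    ; StringInStr compares literally (not case sensitive by default).
    Return StringInStr($sPadded, @LF & $sLine & @LF) > 0
EndFunc

Local $sFile = 'A=10' & @CRLF & 'B=20'
ConsoleWrite(_HasLine($sFile, 'A=10') & @CRLF) ; True: the whole line exists
ConsoleWrite(_HasLine($sFile, 'A=1') & @CRLF)  ; False: only part of a line
ConsoleWrite(_HasLine($sFile, 'B=2.') & @CRLF) ; False: '.' is not a wildcard here
```

This helper rebuilds the padded text on every call, which is fine for a demonstration but slow for files with thousands of lines; there the padding would be better done once, outside the loop.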

  • Moderators

Nubie1,

my nick have limit post then I must create this nick

And what do the Forum Rules say about creating multiple accounts? It is one of the cardinal sins around here. :naughty:

I have lifted the 5 post limit and merged the 2 accounts - please do not do it again. ;)

M23


What exactly is this code supposed to be doing?

For $i = 1 To $aText[0]
    Assign($aText[$i] & '/', 2, 1)
Next
Assign('/', 2, 1)

Because Assign doesn't work the way I think you want it to work in this. Especially the last Assign, that's not going to create anything.
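
For reference, here is a minimal sketch of how `Assign` and `Eval` behave with an ordinary variable name. It says nothing about names built from arbitrary file lines, which is the open question here. The names `hits` and `never_assigned` are made up for the illustration.

```autoit
; Sketch with plain, made-up variable names.
Assign('hits', 2, 1)                   ; flag 1 forces creation in the local scope
ConsoleWrite(Eval('hits') & @CRLF)     ; 2
Assign('hits', Eval('hits') + 1, 1)    ; read the value, add 1, write it back
ConsoleWrite(Eval('hits') & @CRLF)     ; 3

; Eval of a name that was never assigned returns "" and sets @error.
; "" counts as 0 in arithmetic, so adding 1 gives 1.
ConsoleWrite(IsDeclared('never_assigned') & @CRLF)    ; 0
ConsoleWrite((Eval('never_assigned') + 1) & @CRLF)    ; 1
```

The last line is the behaviour the counting loop in the first script depends on: a line seen only in the second file ends up with the value 1.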


Nubie

"Error = 2" is not a bug in the script. It means that the second file does not contain any unique rows. Your first file is larger than the second and contains more rows, including the rows that are in the second file.

Or did you not expect such a situation?

Your rule:

File1 has:

A=10

B=20

C=30

File2 has:

B=20

C=30

A=40

D=50

The result I want in File3:

A=40

D=50

What do you think should be here?

File1 has:

A=10

B=20

C=30

File2 has:

B=20

C=30

The result I want in File3:

???????????

The algorithm is simple:

1. First, the problem with square brackets is worked around.

2. Each line of the first file becomes a variable and is assigned a value of 1.

3. Variables are created from the lines of the second file.

4. The value 1 is added to each of these variables.

5. If the variable already existed from the first file, its value is now greater than 1.

6. Unique variables contain the value 1 and are added to the list.

7. Non-unique variables contain a value greater than 1 and are not added to the list.

8. If there is not a single unique variable, the counter stays at 0 and the function returns the error "Error = 2".

9. The trigger restores the bracket in the results.
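
Steps 2 to 8 can also be sketched with counters held in a `Scripting.Dictionary` instead of variables created by `Assign`. This is a swapped-in technique, not the code posted above. Dictionary keys may contain any character, so steps 1 and 9 (the bracket workaround) are not needed in this variant. `_UniqueLinesSketch` is a hypothetical name.

```autoit
; Sketch of steps 2-8 with dictionary counters (hypothetical helper name).
Func _UniqueLinesSketch($sText1, $sText2, $sSep = @CRLF)
    Local $oCount = ObjCreate('Scripting.Dictionary')
    $oCount.CompareMode = 1 ; text compare: String = StRiNg = STRING
    ; Step 2: every line of the first text gets a counter with the value 1.
    Local $aLines = StringSplit($sText1, $sSep, 1)
    For $i = 1 To $aLines[0]
        If Not $oCount.Exists($aLines[$i]) Then $oCount.Add($aLines[$i], 1)
    Next
    $aLines = StringSplit($sText2, $sSep, 1)
    Local $sOut = '', $iUnique = 0
    For $i = 1 To $aLines[0]
        ; Steps 3-5: add 1 to an existing counter, or start a new one at 1.
        If $oCount.Exists($aLines[$i]) Then
            $oCount.Item($aLines[$i]) = $oCount.Item($aLines[$i]) + 1
        Else
            $oCount.Add($aLines[$i], 1)
        EndIf
        ; Steps 6-7: only a counter equal to 1 marks a unique line.
        If $oCount.Item($aLines[$i]) = 1 Then
            $sOut &= $aLines[$i] & $sSep
            $iUnique += 1
        EndIf
    Next
    ; Step 8: nothing unique was found.
    If $iUnique = 0 Then Return SetError(2, 0, '')
    Return StringTrimRight($sOut, StringLen($sSep))
EndFunc

Local $sOld = 'A=10' & @CRLF & 'B=20' & @CRLF & 'C=30'
Local $sNew = 'B=20' & @CRLF & 'C=30' & @CRLF & 'A=40' & @CRLF & 'D=50'
ConsoleWrite(_UniqueLinesSketch($sOld, $sNew) & @CRLF) ; A=40 and D=50
```

Calling it with the second example above (File2 holding only B=20 and C=30) returns an empty string with @error set to 2, which matches the "Error = 2" case described in step 8.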
