Nubie

How to compare 2 large text files?

31 posts in this topic

Hi all! I'm new and learning the basics of AutoIt. My problem: I can't compare the text of large files (over 10,000 lines and over 1 MB in size).

Example: I have File1 and File2.

File1 contains:

A=10
B=20
C=30

File2 contains:

B=20
C=30
A=40
D=50

The result I want in File3:

A=40
D=50

I found the approach below. It does what I want, but only for small files with few lines. Please teach me how to do this for large files.

$Path_In1 = @ScriptDir & '\test_in1.txt'
$Path_In2 = @ScriptDir & '\test_in2.txt'
$Path_Out = @ScriptDir & '\test_Out.txt'
$sText1 = FileRead($Path_In1)
$sText2 = FileRead($Path_In2)

$sText_Out = _Unique_Lines_Text2($sText1, $sText2)
If @error Then
    MsgBox(0, 'Error', 'Error = ' & @error)
    Exit
Else
    $hFile = FileOpen($Path_Out, 2) ; write to the file (mode 2 = overwrite)
    FileWrite($hFile, $sText_Out)
    FileClose($hFile)
EndIf

; @error = 1 - no unused character available to substitute for '['
; @error = 2 - Not found
; not case sensitive, String = StRiNg = STRING
Func _Unique_Lines_Text2($sText1, $sText2, $sep = @CRLF)
    Local $i, $k, $aText, $s, $Trg = 0, $LenSep

    If StringInStr($sText1 & $sText2, '[') And $sep <> '[' Then ; if the problem character '[' is present, replace it
        For $i = 0 To 255 ; look for a character that occurs in neither text
            $s = Chr($i)
            If Not StringInStr($sText1 & $sText2, $s) Then
                If StringInStr($sep, $s) Then ContinueLoop
                $sText1 = StringReplace($sText1, '[', $s)
                $sText2 = StringReplace($sText2, '[', $s)
                $Trg = 1
                ExitLoop
            EndIf
        Next
        If Not $Trg Then Return SetError(1, 0, '')
    EndIf

    $LenSep = StringLen($sep)

    $aText = StringSplit($sText1, $sep, 1) ; create variables from the lines of the first file
    For $i = 1 To $aText[0]
        Assign($aText[$i] & '/', 2, 1)
    Next
    Assign('/', 2, 1) ; mark the empty line as seen so blank lines are not reported

    $aText = StringSplit($sText2, $sep, 1)

    $k = 0
    $sText1 = ''
    For $i = 1 To $aText[0]
        Assign($aText[$i] & '/', Eval($aText[$i] & '/') + 1, 1) ; create local variables, or increment the ones already created
        If Eval($aText[$i] & '/') = 1 Then ; value 1 = line seen for the first time, and only in the second file
            $sText1 &= $aText[$i] & $sep
            $k += 1
        EndIf
    Next
    If $k = 0 Then Return SetError(2, 0, '')
    If $Trg Then $sText1 = StringReplace($sText1, $s, '[') ; restore the original '['
    Return StringTrimRight($sText1, $LenSep)
EndFunc




#2 ·  Posted (edited)

Shouldn't the result be this?

A=10

A=40

D=50

#include <Array.au3>
Global $sLines = StringStripCR(StringStripWS(FileRead(@ScriptDir & "\File1.txt"), 3)) & @LF & StringStripCR(StringStripWS(FileRead(@ScriptDir & "\File2.txt"), 3))
Global $aLines = StringSplit($sLines, @LF, 2)
_ArraySort($aLines)
Global $i = 0
While $i < UBound($aLines) - 1
    If $aLines[$i] = $aLines[$i + 1] Then
        $aLines[$i] = ""
        $aLines[$i + 1] = ""
    EndIf
    $i += 1
WEnd
$aResult = _ArrayUnique($aLines)
_ArraySort($aResult, 0, 1)
_ArrayDelete($aResult, 1)
$aResult[0] = UBound($aResult) - 1
_ArrayDisplay($aResult)

Br,

UEZ

Edited by UEZ


#3 ·  Posted (edited)

Thanks UEZ, but your approach isn't quite correct, only close.

With A=10 and A=40, think of them as A=old and A=new. I need the result to show only A=new :)

The result also doesn't need to show everything, because File2 is the main file and is compared against File1. If File1 has lines such as x=1, y=2, z=3 that File2 doesn't have, ignore them; they don't belong in the result. But if File2 has lines such as 1=A, 2=B, 3=C that File1 doesn't have, they must be written to File3. Lines that File2 shares with File1 don't need to be shown.

Edited by Nubie


Nubie

1. Can you provide the files for testing?

2. There is no limit on the number of variables, so that should not cause an error.

3. Watch the size (memory use) of the process while the script is running.


Yes, try this :)

http://www.mediafire.com/?ayabg1xie6s6y3w

Notice: File2 is the main file for the comparison, as I said before.

By the way, I have tried the code below. It works correctly with my example in this topic, but not with my real files. I don't know why. Please help me :(

$sToMatch = "File1.txt"
$sMatchFrom = "File2.txt"
$output = "File3.txt"
$sOutPut = _myFileReturnInfo(FileRead($sToMatch), FileRead($sMatchFrom))
FileDelete($output)
FileWrite($output, $sOutPut)

; Actual function
Func _myFileReturnInfo($sFile1, $sFile2)
    ; Might have your file reads here or whatever
    Local $aSplit = StringSplit(StringStripCR($sFile1), @LF) ; Create file 1 array
    ; With RegExp, we don't really need a big function
    Local $sHoldText
    For $i = 1 To $aSplit[0]
        If StringRegExp($sFile2, $aSplit[$i]) Then ; each line of file 1 is used as a pattern
            $sHoldText &= ""
        Else
            $sHoldText &= $aSplit[$i] & @CRLF
        EndIf
    Next
    Return StringTrimRight($sHoldText, 2) ; trim off the last carriage return + line feed
EndFunc
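
A possible reason the StringRegExp version behaves differently on real files: each line of File1 is passed to StringRegExp as a pattern, so characters such as `.`, `(`, `[` or `+` are read as pattern syntax, and a short line also matches inside a longer one. The loop also walks File1, while the stated goal is to report lines of File2. Below is a minimal sketch of a literal, whole-line check. The function name `_LinesOnlyInFile2` is hypothetical (not from this thread), the comparison is case-sensitive by choice, and lines are assumed to end in @CRLF or @LF.

```autoit
; Hypothetical helper: returns the lines of $sFile2 that do not occur
; as a complete line in $sFile1. Matching is literal, not a regular expression.
Func _LinesOnlyInFile2($sFile1, $sFile2)
    ; Wrap File1 in line feeds so that every line is delimited on both sides
    Local $sHaystack = @LF & StringStripCR($sFile1) & @LF
    Local $aLines = StringSplit(StringStripCR($sFile2), @LF)
    Local $sOut = ""
    For $i = 1 To $aLines[0]
        If StringLen($aLines[$i]) = 0 Then ContinueLoop ; skip blank lines
        ; Third parameter 1 = case-sensitive search; looking for LF & line & LF
        ; forces a whole-line match instead of a substring match
        If Not StringInStr($sHaystack, @LF & $aLines[$i] & @LF, 1) Then
            $sOut &= $aLines[$i] & @CRLF
        EndIf
    Next
    Return StringTrimRight($sOut, 2) ; drop the trailing @CRLF
EndFunc
```

With the sample data from the first post, `_LinesOnlyInFile2(FileRead("File1.txt"), FileRead("File2.txt"))` should give the two lines A=40 and D=50. Every line triggers a scan of the whole first file, so for very large files a hash-based lookup (for example a Scripting.Dictionary object) would likely scale better.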


Yes, but with over 10,000 lines it reports an error.

Is there any way to fix this?


Nubie

I didn't ask for just any file. I need a file that produces the error.


If I do that, the comparison will miss something.

For instance, in my example here:

If I only read line 1 to line 2 of File1 and File2, I'll miss A=40 and C=30 in the comparison.

My bad :(


Nubie

I copied the text from your file 5 times and got 15,000 rows. I made a copy of the file and changed 3 lines. No errors. The script returns three rows.


Yo, I'm here. My account hit its post limit, so I had to create this one.

Sorry AZJIO!!! My mistake!!! I tested again and there is no error now!!! I don't know why I got an error before :(

Thanks AZJIO for your very quick support :)

Nubie1,

My account hit its post limit, so I had to create this one.

And what do the Forum Rules say about creating multiple accounts? It is one of the cardinal sins around here. :naughty:

I have lifted the 5 post limit and merged the 2 accounts - please do not do it again. ;)

M23



Yes sir !!!


#16 ·  Posted (edited)

OMG!!! I'm going crazy :(

The ... error came back again :ranting:

It's still the old code, but now I get an error. Please help me and teach me why :(

Here are the files and source code I used: http://www.mediafire.com/?an1t8sf1y1bv4y5

Wow, I found the problem: it is caused by the "-" symbol. Is there any way to fix this? For example:

counter-strike

Edited by Nubie


What exactly is this code supposed to be doing?

For $i = 1 To $aText[0]
    Assign($aText[$i] & '/', 2, 1)
Next
Assign('/', 2, 1)

Because Assign doesn't work the way I think you want it to work in this. Especially the last Assign, that's not going to create anything.



AZJIO's function is hard for my little brain to understand :(

But I need it for my work on MD5 checksums and for a Questions/Answers project.

It does everything I need, except for this problem :mad2:


Nubie

"Error = 2" is not a bug in the script. It means that the second file contains no unique rows. Your first file is larger than the second and contains more rows, including the rows that are in the second file.

Or did you not expect such a situation?

Your rule:

File1 contains:

A=10
B=20
C=30

File2 contains:

B=20
C=30
A=40
D=50

The result you want in File3:

A=40
D=50

What do you think the result should be here?

File1 contains:

A=10
B=20
C=30

File2 contains:

B=20
C=30

The result you want in File3:

???????????

The algorithm is simple:

1. First, the problem with square brackets is worked around.
2. Each line of the first file becomes a variable and is assigned a starting value (the posted code uses 2).
3. Variables are then created from the lines of the second file.
4. 1 is added to the value of each such variable.
5. If the variable already existed from the first file, its value is now greater than 1.
6. Unique variables hold the value 1 and are added to the list.
7. Non-unique variables hold a value greater than 1 and are not added to the list.
8. If there is not a single unique variable, the counter stays at 0, which gives the error "Error = 2".
9. The trigger restores the square bracket in the results.
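
The counting steps above can also be sketched with a Scripting.Dictionary COM object as the counter, in place of Assign/Eval. This is an alternative illustration, not AZJIO's function: the name `_UniqueLinesDict` is hypothetical, and the sketch assumes a Windows system where the Scripting.Dictionary object is available. Since dictionary keys are ordinary strings rather than variable names, the square-bracket substitution of steps 1 and 9 is left out of this variant.

```autoit
; Hypothetical dictionary-based variant of the counting algorithm.
; Returns the lines that occur only in $sText2.
; @error = 1 - dictionary object could not be created (assumption of this sketch)
; @error = 2 - no unique lines found
Func _UniqueLinesDict($sText1, $sText2)
    Local $oSeen = ObjCreate("Scripting.Dictionary")
    If Not IsObj($oSeen) Then Return SetError(1, 0, "")
    $oSeen.CompareMode = 1 ; text compare, so String = STRING as in the original

    ; Step 2: every line of the first text gets the value 1
    Local $aLines = StringSplit(StringStripCR($sText1), @LF)
    For $i = 1 To $aLines[0]
        $oSeen.Item($aLines[$i]) = 1
    Next

    ; Steps 3-7: add 1 for each line of the second text; a result of 1 means unique
    Local $sOut = "", $iCount = 0, $iValue
    $aLines = StringSplit(StringStripCR($sText2), @LF)
    For $i = 1 To $aLines[0]
        If StringLen($aLines[$i]) = 0 Then ContinueLoop ; blank lines never count as unique
        $iValue = 1
        If $oSeen.Exists($aLines[$i]) Then $iValue = $oSeen.Item($aLines[$i]) + 1
        $oSeen.Item($aLines[$i]) = $iValue
        If $iValue = 1 Then
            $sOut &= $aLines[$i] & @CRLF
            $iCount += 1
        EndIf
    Next

    ; Step 8: nothing unique was found
    If $iCount = 0 Then Return SetError(2, 0, "")
    Return StringTrimRight($sOut, 2) ; drop the trailing @CRLF
EndFunc
```

Each line costs one dictionary lookup, so the running time grows roughly in proportion to the total number of lines, which is the property that matters for files of 10,000 lines and more.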


BrewManNH

What exactly is this code supposed to be doing?

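It creates a local variable for each line of the first file (the posted code assigns the value 2).

The '/' suffix prevents collisions with the ordinary variables used inside the function.

The following call excludes blank lines, which would otherwise be regarded as unique:

Assign('/', 2, 1)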

Nubie

Try or
