tkocsir

Comparing lines of files

16 posts in this topic

Hi!

I have to read in some files, search each file for the same specific line, then read the following lines up to a blank line, adding each line to an array. This is a two-dimensional array. If a line has already appeared earlier (in another file, for example), it must not be added again; instead, the count in the second column must be increased by 1 to show how many times the line was found.

The script works almost perfectly, except that it doesn't... I'll show you the script and then explain what is not working.

#include <File.au3>
#include <Array.au3>

Global $files
Global $array[1][2]
Global $i = 0
$array[0][1] = 1

_dir(@ScriptDir)
main()

;ConsoleWrite($i & @TAB & $array[$i - 1][1] & @CRLF)

_ArraySort($array, 0, 0, 0, 2)
For $k = 0 To $i - 1
        ConsoleWrite($array[$k][0] & @TAB & $array[$k][1] & @CRLF)
Next

Func _dir($dir)
    Local $arritem, $item
    $arritem = _FileListToArray($dir, '*')
    If IsArray($arritem) Then
        For $n = 1 To $arritem[0]
            $item = $dir & '\' & $arritem[$n]
            If StringInStr(FileGetAttrib($item), 'D') Then ;This is a folder
                _dir($item) ;Call recursively
            Else ;This is a file
                $files &= $item & '|'
            EndIf
        Next
    EndIf
EndFunc

Func main()
    $files = StringSplit($files, '|')
    For $z = 0 To $files[0]
        $item = $files[$z]
        $fOpen = FileOpen($item)
        If StringRight($item, 4) = '.au3' Then ContinueLoop
        While 1
            $line = FileReadLine($fOpen)
            If @error Then ExitLoop
            If StringInStr($line, '[Program Groups]') = 1 Then
                FileReadLine($fOpen)
                FileReadLine($fOpen) ; skipping two unnecessary lines
                While 2
                    $line2 = FileReadLine($fOpen)
                    If @error Or $line2 = '' Then ExitLoop
                    $add = False
                    For $j = 0 To $i
                        If $line2 = $array[$j][0] Then
                            $array[$j][1] += 1
                        Else
                            ReDim $array[$i + 1][2]
                            $array[$i][0] = $line2
                            $add = True
                            ;$i += 1
                        EndIf
                    Next
                    If $add = True Then $i += 1 ; but this will be True always, because one line is enough to set it True
                WEnd
            EndIf
        WEnd
    Next
EndFunc

I am testing this on 3 files, which are duplicates of each other. Each contains the same 56 lines after the "[Program Groups]" line.

The problem is that $add is always True when the inner For loop ends, because a single non-matching row is enough to set it. So $i is always incremented. After main() ends, $i is 168 (3×56) instead of 56. You can check this by removing the comment character from the ConsoleWrite line after main(). The loop after it shows each line and how many times it appeared (after sorting in descending order). The output now is something like this:

oneline 3
anotherline 3
someline 3
oneline 2
anotherline 2
someline 2
oneline 1
anotherline 1
someline 1

But I need a result like this:

oneline 3
anotherline 3
someline 3

What is wrong? I know I should probably change something around the $i += 1, but I don't know what... Please help, the script is almost done.

Thanks!




#2 ·  Posted (edited)

tkocsir,

You are overcomplicating the issue. You already have a count in your array, so use it:

#include <File.au3>
#include <Array.au3>

Global $files = "Test.txt|Test.txt|Test.txt" ; Simulate getting filenames
Global $array[1][2]
Global $i = 0
$array[0][1] = 0

;_dir(@ScriptDir)

main()

_ArraySort($array, 0, 0, 0, 2)

_ArrayDisplay($array)

Func main()
    $files = StringSplit($files, '|')

    For $z = 1 To $files[0] ; Not 0 To $files[0] - you know the [0] element is not a filename as you are already using it as a count!
        $item = $files[$z]

        If StringRight($item, 4) = '.au3' Then ContinueLoop

        $fOpen = FileOpen($item) ; Put this after the test or you eat handles

        While 1

            $line = FileReadLine($fOpen)
            If @error Then ExitLoop

            If StringInStr($line, '[Program Groups]') = 1 Then

                FileReadLine($fOpen)
                FileReadLine($fOpen) ; skipping two unnecessary lines

                While 2
                    $line2 = FileReadLine($fOpen)
                    If @error Or $line2 = '' Then ExitLoop 2 ; You need to exit both loops

                    ; See if there is a match
                    For $j = 0 To $array[0][1]
                        If $line2 = $array[$j][0] Then
                            ; if so increase count
                            $array[$j][1] += 1
                            ; And exit the loop
                            ExitLoop
                        EndIf
                    Next
                    ; Now see if we checked all lines
                    If $j > $array[0][1] Then
                        ; If we did then we need to add another line
                        $array[0][1] += 1
                        ReDim $array[$array[0][1] + 1][2]
                        $array[$array[0][1]][0] = $line2
                        $array[$array[0][1]][1] = 1
                    EndIf
                WEnd
            EndIf
        WEnd

        FileClose($fOpen) ; Do not forget to close files

    Next

EndFunc   ;==>main

I hope the comments are clear enough - please ask if not.

M23

Edit: I tested using this file format:

[Program Groups]
Unnecessary 1
Unnecessary 2
Line 1
Line 2
Line 3
Line 4
Line 5
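For larger sets of files, searching the whole array for every line gets slower as the array grows. A minimal alternative sketch, assuming the Windows Scripting.Dictionary COM object (created with ObjCreate) is available: it keeps one key per unique line, with the hit count as the value. The $aBlocks array is hypothetical and stands in for the lines read after "[Program Groups]" in each file. CompareMode = 1 makes the keys case-insensitive, matching AutoIt's = operator on strings; it has to be set before any key is added.

```autoit
#include <Array.au3>

; Hypothetical stand-in for the blocks read from three files
Global $aBlocks[3] = ["Line 1|Line 2|Line 3", "Line 1|Line 2|Line 3", "Line 1|Line 2|Line 3"]
Global $aLines, $aKeys

; Key = line text, value = how many times the line was seen
Global $oCount = ObjCreate("Scripting.Dictionary")
$oCount.CompareMode = 1 ; text (case-insensitive) compare, set before adding keys

For $b = 0 To UBound($aBlocks) - 1
    $aLines = StringSplit($aBlocks[$b], "|")
    For $n = 1 To $aLines[0] ; element [0] holds the count from StringSplit
        If $oCount.Exists($aLines[$n]) Then
            ; Seen before: just increase its count
            $oCount.Item($aLines[$n]) = $oCount.Item($aLines[$n]) + 1
        Else
            ; First time: add it once with a count of 1
            $oCount.Add($aLines[$n], 1)
        EndIf
    Next
Next

; Copy into a [line, count] array so _ArraySort/_ArrayDisplay can be used
$aKeys = $oCount.Keys()
Global $aResult[UBound($aKeys)][2]
For $n = 0 To UBound($aKeys) - 1
    $aResult[$n][0] = $aKeys[$n]
    $aResult[$n][1] = $oCount.Item($aKeys[$n])
Next

_ArraySort($aResult, 1, 0, 0, 1) ; descending on column 1 (the count)
For $n = 0 To UBound($aResult) - 1
    ConsoleWrite($aResult[$n][0] & @TAB & $aResult[$n][1] & @CRLF)
Next
```

Each line is looked up once per occurrence instead of being compared with every row already stored, so no separate "was it added?" flag is needed.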
Edited by Melba23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind._______My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 


Thank you for your fast and professional reply, Melba23!

Maybe my English is not perfect, but I understand all of your comments.

The script works great with the test.txt example file, but when I test it on the "real" ones, _ArrayDisplay does not show the lines in exactly descending order. Lines with more hits are mostly near the top of the window, but the order is not right. For example:

exampleline 3
something 3
justaline 2
helloworld 2
lineline 3
someline 2
itsaline 2
lineee 1
lliinnee 2
weirdline 1

I haven't had time to look into the problem yet, but if you have any idea, just let me know. And thanks again!


tkocsir,

Can you post some examples of "real" files so I can test the code with them?

M23



I attached 4 real files. They are in Hungarian, but I renamed the Hungarian line to "[Program Groups]".

test.zip


tkocsir,

That is the script, not the files.

M23



Sorry, this is the right file.

test.zip


tkocsir,

Change the _ArraySort line to read:

_ArraySort($array, 1, 0, 0, 1)

and all should be well.

Let me know if you still have problems.

M23



It works! Thank you very much for your help, you are great!


tkocsir,

Glad I could help.

The problem was that the wrong _ArraySort syntax you had used initially did not matter with the test files I had used. As soon as I used your files it became obvious that it was not sorting correctly.

So your fault for getting the syntax wrong in the first place and my fault for not checking it! ;)
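For reference, the positional parameters of _ArraySort are ($aArray, $iDescending, $iStart, $iEnd, $iSubItem). In a two-column array the columns are numbered 0 and 1, so a sub-item of 2 points past the last column, and $iDescending = 0 asks for ascending order. A small sketch, assuming the layout used in Melba23's first reply (row 0 holds the number of unique lines in column 1): starting the sort at row 1 keeps that bookkeeping row out of the sorted data. The sample values are made up.

```autoit
#include <Array.au3>

; Made-up data in the same layout: row 0 = ["", number of unique lines],
; rows 1..n = [line text, hit count]
Global $array[4][2] = [["", 3], ["lime", 2], ["apple", 3], ["fruit", 1]]

; $iDescending = 1, $iStart = 1 (skip row 0), $iEnd = 0 (to the last row),
; $iSubItem = 1 (sort on the count column)
_ArraySort($array, 1, 1, 0, 1)

; Print only the data rows; row 0 still holds the unique-line count
For $k = 1 To $array[0][1]
    ConsoleWrite($array[$k][0] & @TAB & $array[$k][1] & @CRLF)
Next
```

This should print apple, lime and fruit with counts 3, 2 and 1, while row 0 stays where it is.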

M23



#11 ·  Posted (edited)

Dear Melba23,

Could you please help me a little more?

I tried to sort the result in alphabetical order using _ArraySort($array, 0, 0, 0, 2), but it doesn't work. Or rather, it only half works: the elements are sorted alphabetically, but in "groups". The names run from a to z, and then it starts again, over and over. I don't know why.

My goal is for the script to produce a list where the elements are sorted by the number of times they appear (this is what the script does right now, thanks!), and elements that appear the same number of times are listed in alphabetical order.

For example:

Now:

orange 49
lime 34
zebra 34
apple 34
fruit 28

My goal:

orange 49
apple 34
lime 34
zebra 34
fruit 28
Edited by tkocsir


tkocsir,

You need to find the rows that have the same value in the second column and then sort them on the first column. Here is an example:

#include <Array.au3>

; Create a mixed-up array
Global $aArray[9][2] = [ _
            ["C", 1], _
            ["A", 1], _
            ["B", 1], _
            ["C", 3], _
            ["B", 2], _
            ["A", 2], _
            ["B", 3], _
            ["A", 3], _
            ["C", 2]]

_ArrayDisplay($aArray, "Raw Data")

; Sort on second column
_ArraySort($aArray, 1, 0, 0, 1)

_ArrayDisplay($aArray, "Initial Sort")

; Now to sort first column within groups

; Set initial values to bound array
$iBegin = 0
$iFinal = UBound($aArray) - 1

; Now we move down the array
While 1

    ; This is the group value we are looking for
    $vValue = $aArray[$iBegin][1]
    ; So look for first row not to match it
    For $iRow = $iBegin To $iFinal
        If $aArray[$iRow][1] <> $vValue Then
            ExitLoop ; No match to exit the loop
        EndIf
    Next

    ; We now point to the final group entry + 1 so move back 1
    $iEnd = $iRow - 1

    ; And we now sort this section on the first column
    _ArraySort($aArray, 0, $iBegin, $iEnd)

    _ArrayDisplay($aArray, "Sorted " & $iBegin & " - " & $iEnd)

    ; Check we are not at end of loop
    If $iRow > $iFinal Then
        ExitLoop ; This is the escape route when we hit the bottom of the array
    EndIf

    ; Reset the Begin value for the next loop
    $iBegin = $iRow

WEnd

_ArrayDisplay($aArray, "Final")

All clear?

M23
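As another option, a single pass can compare both keys at once instead of sorting group by group. The sketch below is a plain insertion sort; the function name _SortByCountThenName is hypothetical and not part of Array.au3. It orders rows by column 1 descending and breaks ties on column 0 with StringCompare, which is case-insensitive by default. The optional $iStart parameter lets it skip a bookkeeping row such as the count in row 0. Insertion sort is O(n²), which should be fine for a few hundred lines but is not meant for very large arrays. The data is the orange/lime/zebra/apple/fruit example from tkocsir's post.

```autoit
; Data from the example in the question above
Global $array[5][2] = [["orange", 49], ["lime", 34], ["zebra", 34], ["apple", 34], ["fruit", 28]]

_SortByCountThenName($array)

For $k = 0 To UBound($array) - 1
    ConsoleWrite($array[$k][0] & @TAB & $array[$k][1] & @CRLF)
Next

; Sort rows $iStart..end: column 1 descending, then column 0 ascending
Func _SortByCountThenName(ByRef $aData, $iStart = 0)
    Local $sName, $iCount, $j
    For $i = $iStart + 1 To UBound($aData) - 1
        ; Take the current row out and find where it belongs
        $sName = $aData[$i][0]
        $iCount = $aData[$i][1]
        $j = $i - 1
        While $j >= $iStart
            ; Stop when the row above should stay above the current row
            If $aData[$j][1] > $iCount Then ExitLoop
            If $aData[$j][1] = $iCount And StringCompare($aData[$j][0], $sName) <= 0 Then ExitLoop
            ; Otherwise shift that row down by one
            $aData[$j + 1][0] = $aData[$j][0]
            $aData[$j + 1][1] = $aData[$j][1]
            $j -= 1
        WEnd
        $aData[$j + 1][0] = $sName
        $aData[$j + 1][1] = $iCount
    Next
EndFunc
```

With this data the order should come out as orange 49, apple 34, lime 34, zebra 34, fruit 28. For the array layout in Melba23's first reply, _SortByCountThenName($array, 1) would leave row 0 untouched.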



The method is clear, but I don't understand why it doesn't do what it should... When I add this code to my script, the final list is sorted alphabetically and not by the numbers. Your example works, but with the lines from the files it doesn't. I tried to figure out the problem but couldn't. (The only changes I made to your code were commenting out the _ArrayDisplay lines and renaming $aArray to $array to match my code.)


You are probably storing the numbers as strings, so they sort a little differently.
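If the counts do end up as strings (for example, when they are read back from a file), converting the column with Number() before sorting keeps the comparison numeric. A minimal sketch with made-up values; as strings, "9" and "49" may be compared as text, character by character, so "9" would not necessarily sort below "49".

```autoit
#include <Array.au3>

; Made-up data where the counts are strings
Global $array[3][2] = [["orange", "49"], ["lime", "9"], ["fruit", "28"]]

; Convert the count column to real numbers before sorting
For $k = 0 To UBound($array) - 1
    $array[$k][1] = Number($array[$k][1])
Next

_ArraySort($array, 1, 0, 0, 1) ; descending on column 1

For $k = 0 To UBound($array) - 1
    ConsoleWrite($array[$k][0] & @TAB & $array[$k][1] & @CRLF)
Next
```

After the conversion the rows should come out as orange 49, fruit 28, lime 9.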

Br,

UEZ


Please don't send me any personal message and ask for support! I will not reply!

Selection of finest graphical examples at Codepen.io

The own fart smells best!
Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!
¯\_(ツ)_/¯  ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ


#15 ·  Posted (edited)

Try this:

 

#include <Array.au3>
Global $array[5][2] = [["orange", 49], ["lime", 34], ["zebra", 34], ["apple", 34], ["fruit", 28]]

_ArrayDisplay($array, "Before")

Global $asort[2] = [1, 0]
_ArraySort_MultiColumn($array, $asort, 1)

_ArrayDisplay($array, "After")

; #FUNCTION# =============================================================================
; Name.............:    _ArraySort_MultiColumn
; Description ...:      Sorts an array on the given columns (multi-column sort)
; Syntax...........:    _ArraySort_MultiColumn(ByRef $aSort, ByRef $aIndices[, $oDir = 0[, $iDir = 0]])
; Parameters ...:       $aSort - array to sort
;                       $aIndices - zero-based column indices, in the order in which they should be sorted
;                       $oDir - direction for the first column in $aIndices: 1 = descending, 0 = ascending
;                       $iDir - direction for the following columns within groups: 1 = descending, 0 = ascending
; Author .........:     UEZ
; Version ........:     v0.70 build 2013-11-20 Beta
; =========================================================================================
Func _ArraySort_MultiColumn(ByRef $aSort, ByRef $aIndices, $oDir = 0, $iDir = 0)
    If Not IsArray($aIndices) Or Not IsArray($aSort) Then Return SetError(1, 0, 0) ;checks that both parameters are arrays
    If UBound($aIndices) > UBound($aSort, 2) Then Return SetError(2, 0, 0) ;more indices than $aSort has columns
    Local $1st, $2nd, $x, $j, $k, $l = 0
    For $x = 0 To UBound($aIndices) - 1 ;check if array content makes sense
        If Not IsInt($aIndices[$x]) Then Return SetError(3, 0, 0) ;array content is not numeric
    Next
    If UBound($aIndices) = 1 Then Return _ArraySort($aSort, $oDir, 0, 0, $aIndices[0]) ;check if only one index is given
    _ArraySort($aSort, $oDir, 0, 0, $aIndices[0])
    Do
        $1st = $aIndices[$l]
        $2nd = $aIndices[$l + 1]
        $j = 0
        $k = 1
        While $k < UBound($aSort)
            If $aSort[$j][$1st] <> $aSort[$k][$1st] Then
                If $k - $j > 1  Then
                    _ArraySort($aSort, $iDir , $j, $k - 1, $2nd)
                    $j = $k
                Else
                    $j = $k
                EndIf
            EndIf
            $k += 1
        WEnd
        If $k - $j > 1 Then _ArraySort($aSort, $iDir, $j, $k - 1, $2nd) ; last group: $k = UBound here, so the last row is $k - 1
        $l += 1
    Until $l = UBound($aIndices) - 1
    Return 1
EndFunc

Br,

UEZ

Edited by UEZ


Thanks UEZ (and Melba23), now it works great!
