
Sorting Text and picking x number of repeated lines


Fr33b0w

Go to solution Solved by mikell,


Hi all. I have been searching the forum for a while, and I would appreciate it very much if someone could point me in the right direction. I need to sort the lines in a certain .txt file and get all lines that repeat exactly (for example) 3 times, no more and no less. Or exactly 4 times, but no more or less.

Content of .txt file:

001
002
003
004
005
005
005
006
007
008
008
008

In this case I should get 005 and 008 extracted to another .txt file. Is there any UDF that can do this easily? I would like to build on it from there. Thanks in advance.


  • Moderators

Fr33b0w,

Fun little problem: :)

#include <Array.au3>
#include <File.au3>

$aLines = FileReadToArray("Lines.txt")

_ArrayDisplay($aLines, "Lines")

Global $aCounter[1000] ; Enough elements to match all possible values in the array (000 to 999)

; FileReadToArray returns a 0-based array, so start at index 0
For $i = 0 To UBound($aLines) - 1
    ; loop through the lines and increase counter element for each number encountered
    ConsoleWrite($aLines[$i] & @CRLF)
    $aCounter[Number($aLines[$i])] += 1
Next

_ArrayDisplay($aCounter, "Counter")

; Now see if any have the required number of matches
$iRequired = 3
$sMatches = ""
For $i = 1 To UBound($aCounter) - 1
    If $aCounter[$i] = $iRequired Then
        ; Add to string
        $sMatches &= $i & "|"
    EndIf
Next

; Convert string to array
$aMatches = StringSplit(StringTrimRight($sMatches, 1), "|")

_ArrayDisplay($aMatches, "Matches")
All clear? ;)

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

My UDFs:

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 


I don't code much, but AutoIt has helped me in the past with many things. I don't go deep into code and I only write something once in a while, which is how I end up asking newbie questions. I was hoping not to take up your time, Melba, because every time someone helps it is you, or you among others, and my problems are basic while you are an expert in this. Thank you very much, I have my guidance now!

Edited:

I am reading it and can't stop laughing at how satisfied I am. I would never have managed to write this; it looks so small, but it does a big job for me. It will work great! Many, many thanks!

Edited by Fr33b0w

  • Moderators

Fr33b0w,

Glad I could help. :)

M23


For me it works like this:

Global $aLines

_FileReadToArray("Lines.txt", $aLines)



...



_FileWriteFromArray("Lines2.txt", $aMatches, 1)
 

That is OK, I guess?

And there is a problem... how can I avoid it?

It writes the data to the file as a number, so I get "3" instead of "003", but I really need it as "003".

Edited by Fr33b0w

  • Moderators

Fr33b0w,

Nearly - you are using the UDF function and I was using the native one. We end up in the same place if you do this:

_FileReadToArray("Lines.txt", $aLines, 0)
Otherwise you get a count in the [0] element and the end result will be skewed. ;)

M23
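
The indexing difference described above can be seen side by side. This is only a sketch: it assumes AutoIt 3.3.10 or later (where the native FileReadToArray exists and _FileReadToArray accepts a flags argument) and a non-empty Lines.txt next to the script. The variable names are made up for illustration.

```autoit
#include <File.au3> ; _FileReadToArray and the $FRTA_* constants

; Native function: 0-based array, the line count arrives in @extended
Local $aNative = FileReadToArray("Lines.txt")
Local $iError = @error, $iCount = @extended
If $iError Then
    ConsoleWrite("FileReadToArray failed, @error = " & $iError & @CRLF)
    Exit 1
EndIf
ConsoleWrite("Native: first line is [0] = " & $aNative[0] & ", count = " & $iCount & @CRLF)

; UDF with its default flag: [0] holds the count, lines start at [1]
Local $aWithCount
_FileReadToArray("Lines.txt", $aWithCount)
ConsoleWrite("UDF default: [0] = " & $aWithCount[0] & ", first line is [1] = " & $aWithCount[1] & @CRLF)

; UDF with $FRTA_NOCOUNT (value 0): same layout as the native function
Local $aNoCount
_FileReadToArray("Lines.txt", $aNoCount, $FRTA_NOCOUNT)
ConsoleWrite("UDF no count: first line is [0] = " & $aNoCount[0] & @CRLF)
```

Whichever variant is used, the loop that walks the lines has to start at the index where the first real line lives: 0 for the native function and for $FRTA_NOCOUNT, 1 when [0] holds the count.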


Oh yes, yes, I was editing and trying, and got it working in the end, but there is just one more little problem.

It writes the data to the file as a number, so I get "3" instead of "003", but I really need "003" as the final result. Is there an easy way to solve that? Thanks.


  • Moderators

Fr33b0w,

Of course - just use StringFormat: ;)

#include <Constants.au3>

$sNumber = "3"

$sPadded = StringFormat("%03i", $sNumber)

MsgBox($MB_SYSTEMMODAL, "Result", $sNumber & @CRLF & @CRLF & $sPadded)
M23
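
As a sketch of where the padding could go: the matching loop from the first script builds its result from the numeric index $i, so that is the place to apply StringFormat. The names $aCounter and $iRequired follow the earlier script; the sample counts are made up so that the snippet runs on its own, and @CRLF is used as the separator only to get one item per line.

```autoit
; Made-up counts: index 5 and index 8 were each seen three times
Local $aCounter[10] = [0, 1, 1, 1, 1, 3, 1, 1, 3, 0]
Local $iRequired = 3
Local $sMatches = ""

For $i = 1 To UBound($aCounter) - 1
    If $aCounter[$i] = $iRequired Then
        ; "%03i" left-pads the integer with zeros to a width of 3
        $sMatches &= StringFormat("%03i", $i) & @CRLF
    EndIf
Next

ConsoleWrite($sMatches) ; prints 005 and 008, one per line
```

This only restores a fixed three-digit width. It cannot recover the original text of a line, so it does not help once the file contains anything other than zero-padded numbers.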


Well, I am trying, but I can't do it. The problem is with those strings and arrays.

_FileReadToArray("Lines.txt", $aLines)
_ArrayDisplay($aLines, "Lines")

The display here is normal (001, 002 and so on), but in the other part

_ArrayDisplay($aCounter, "Counter")

it strips the leading zeroes. Also, I will have data somewhere like:

x1
x2
x3

and so on. It will not work.

I even tried converting the zeroes to the letter "o" and then converting them back to zeroes at the end (before writing to the file), but it won't work that way. It won't work with text data, just with numbers. Your brain and knowledge are very far away from mine. The stuff I write is very basic and unoptimized, so I have a problem solving this one. I hope there is a solution.

Maybe the example that I gave was too simple. Can I try another one?

If I have data like:

x1
x2
x3
x4
x4
x4
x5
x6
001
002
002
002
003
004
004
004
005
006

and I need the strings that repeat 3 times in the text file, I should get:

x4
002
004

I can't get "x4" as a result from the example above. I don't know what to change. I did try, but it won't work for me.


  • Moderators

Also i will have somewhere data like:

x1

x2

x3

I really hate it when people move the goalposts half way through the thread. :mad:

Before I do anything else - are there any other different line formats that you expect to find in this file and wish to count? :huh:

Better tell me now - because doing it later is going to bring my participation in this thread to a very sudden stop. ;)

M23


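Here is a different route. I created a function that returns an array with the count of every distinct line. You can then loop through that return for what you need.

#include <Array.au3> ; needed for _ArrayUnique, _ArrayDelete, _ArraySort, _ArraySearch and _ArrayDisplay
Global Enum $iData_String, $iData_Count, $iData_UBound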
; example of something read from a file, via _FileReadToArray
Local $aData[18] = ["x1","x2","x3","x4","x4","x4","x5","x6","001","002","002","002","003","004","004","004","005","006"]

$aReturn = CountLines($aData)

; Loop through and get where count = 3 or 4
For $i = 0 To UBound($aReturn)-1
    If $aReturn[$i][$iData_Count] = 3 Or $aReturn[$i][$iData_Count] = 4 Then
        ; your code
        ConsoleWrite("Line=[" & $aReturn[$i][$iData_String] & "] contains count=[" & $aReturn[$i][$iData_Count] & "]" & @CRLF)
    EndIf
Next
_ArrayDisplay($aReturn)
Exit

Func CountLines($aData)
    ; Get unique values and sorted data
    Local $aDataUnique = _ArrayUnique($aData)
    ; Remove counts
    _ArrayDelete($aDataUnique,0)
    _ArraySort($aData)

    ; Create new array to house counts/data
    Local $aDataWithCounts[UBound($aDataUnique)][$iData_UBound]

    ; Populate the new array
    For $i = 0 To UBound($aDataUnique)-1
        $aDataWithCounts[$i][$iData_String] = $aDataUnique[$i]
    Next

    ; Get counts in sorted, to insert into $aDataWithCounts
    Local $sLast, $iCount = 0
    For $i = 0 to UBound($aData) - 1
        If $aData[$i] == $sLast Then
            $iCount += 1
        Else
            $sLast = $aData[$i]
            $iCount = 1
        EndIf
        $aDataWithCounts[_ArraySearch($aDataWithCounts,$sLast,0,0,0,0,1,0)][$iData_Count] = $iCount
    Next
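    ; Final write for the last run, searching column 0 only (same arguments as inside the loop)
    If $iCount > 0 Then $aDataWithCounts[_ArraySearch($aDataWithCounts,$sLast,0,0,0,0,1,0)][$iData_Count] = $iCount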
    Return $aDataWithCounts
EndFunc

output:

Line=[x4] contains count=[3]
Line=[002] contains count=[3]
Line=[004] contains count=[3]


I am really sorry. I thought that if it works for one line, it will work for the others no matter what the line contains. I am dealing with all kinds of data; at the moment I have data like:

001
003
005
006
007
008
009
010
011
012ab
014
015
016
017
018
021
022
023
024ab
025
026
028
030
031abc
032
033
034
035
036

NED Ibrahim Afellay
NED Nigel De Jong
NED Rafael Van Der Vaart
NED Eljero Elia
NED Georginio Wijnaldum
NED John Heitinga
NED Maarten Stekelenburg
CZE Tomas Sivok
CZE Milan Baros
CZE Jaroslav Plasil
CZE Tomas Hubschman
CZE Zdenek Pospech
CZE Michal Kadlec
ENG Ashley Cole

aa
bb
cc
dd
ee
ff
gg

a
b
c
d
e
f
g

I thought that if it worked for one line it would work for every other line, and that it would not change the data. In the future there might be some new kind of data, who knows. But basically it shouldn't change any lines, just take the data as it is. As the data stays as it is in every line, there shouldn't be a mistake anywhere (I guess).

I am using it for collecting stickers. I have one list of all the stickers I own, and if I want a list of every sticker that I have 3 times or more, for exchange purposes, I would run a script and get that list. You are quite right there, Melba23, as always: I need to extract the data that repeats 3 times or more from the list into a new file, and the output must have one item per line (a new line after every item), not commas or similar. I mention that detail as well in case someone writes code that I don't know how to change. Yes, you were right, Melba23. I need 3 times or more, or 4 times or more. Sorry for causing problems.

 

@jdelaney, thank you. I don't need anything complex, just one fixed number of repetitions or more. So if I ask for data that repeats 3 times, it means 3 times or more all the way. Some other time I might need items that repeat 4 times or more, and then it will be just 4 times or more. Thank you, people, for your time, I really appreciate it. I will try to work through your addition, jdelaney, thanks.

So I changed this for myself:

$repnumber = 3 ;number of repetitions in file

    If $aReturn[$i][$iData_Count] >= $repnumber Then

and I am not sure how to change this:

Local $aData[18] = ["x1","x2","x3","x4","x4","x4","x5","x6","001","002","002","002","003","004","004","004","005","006"]

I tried before to count the lines of a file, but I couldn't manage to use that count as the number of array elements to be read.

$CountLines = _FileCountLines("file.txt")

For example, as Melba23 wrote:

Global $aCounter[1000]

So something like Global $aCounter[_FileCountLines("file.txt")], but that will not work of course...

Edited by Fr33b0w
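
One possible way to replace the hard-coded $aData and the fixed 1000, shown as a sketch rather than a definitive answer: read the file into an array and size any helper array from UBound(). It assumes AutoIt 3.3.10 or later for the $FRTA_NOCOUNT flag and uses the file name "file.txt" from the post above. $iLines is a name introduced only for this illustration.

```autoit
#include <Array.au3> ; _ArrayDisplay
#include <File.au3>  ; _FileReadToArray and the $FRTA_* constants

Local $aData
If Not _FileReadToArray("file.txt", $aData, $FRTA_NOCOUNT) Then
    ConsoleWrite("Could not read file.txt, @error = " & @error & @CRLF)
    Exit 1
EndIf

; With $FRTA_NOCOUNT the array is 0-based and holds only the lines,
; so UBound() is the number of lines read.
Local $iLines = UBound($aData)

; An array dimension can be an expression, so a counter array can be
; sized from the data instead of using a fixed 1000.
Local $aCounter[$iLines][2]

ConsoleWrite("Lines read: " & $iLines & @CRLF)
_ArrayDisplay($aData, "Lines from file")
```

The resulting $aData has the same 0-based layout as the hard-coded array, so it could be handed to CountLines($aData) in its place.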

  • Solution

A bit simpler

#include <Array.au3>
#include <File.au3>

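$n = 3 ; required number of repetitions
Local $aLines
_FileReadToArray("Lines.txt", $aLines) ; $aLines[0] holds the line count
$string = _ArrayToString($aLines)      ; join everything into one "|"-separated string

Local $result[1]
For $i = 1 to $aLines[0]
   ; Remove every occurrence of the line; @extended is the number of replacements made
   ; Note: StringReplace matches substrings, so a line like "a" would also match inside "aa"
   $string = StringReplace($string, $aLines[$i], "")
   If @extended = $n Then _ArrayAdd($result, $aLines[$i])
Next
$result[0] = UBound($result)-1 ; store the number of matches in [0]
_ArrayDisplay($result)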
x1
x2
x3
x4
x4
x4
x5
x6
001
002
002
002
003
004
004
004
005
006

Edit

works with 3.3.8.1
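
Because StringReplace counts substring hits, the approach above can miscount when one line is contained in another (for example "a" and "aa"). A hedged variation, not mikell's code: wrap every line in a marker character before joining, so that only whole lines can match. The marker Chr(1) is an assumption (it must never occur in the data), and $sMark, $sJoined and $iMinimum are names invented for this sketch. It uses >= to follow the "3 times or more" requirement stated earlier in the thread.

```autoit
; Made-up sample: "a" occurs twice, "aa" three times, "b" once
Local $aLines[6] = ["a", "aa", "aa", "aa", "b", "a"]
Local $iMinimum = 3
Local $sMark = Chr(1) ; assumed never to appear in the data

; Wrap each line in its own pair of markers and join them
Local $sJoined = ""
For $i = 0 To UBound($aLines) - 1
    $sJoined &= $sMark & $aLines[$i] & $sMark
Next

Local $sResult = ""
For $i = 0 To UBound($aLines) - 1
    ; Replacing the wrapped line counts whole-line occurrences only;
    ; the last argument (1) makes the comparison case-sensitive
    $sJoined = StringReplace($sJoined, $sMark & $aLines[$i] & $sMark, "", 0, 1)
    If @extended >= $iMinimum Then $sResult &= $aLines[$i] & @CRLF
Next

ConsoleWrite($sResult)
```

Since each line is removed from the joined string the first time it is processed, later copies of the same line find zero occurrences and are not reported twice.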

Edited by mikell

  • Moderators

Fr33b0w,

This will work for any content on a line:

#include <Array.au3>
#include <File.au3>

$aLines = FileReadToArray("Lines.txt")

Global $aCounter[1000][2] = [[0]] ; Enough elements to match all possible values in the array

; FileReadToArray returns a 0-based array, so start at index 0
For $i = 0 To UBound($aLines) - 1
    ; Loop through the lines
    $sItem = $aLines[$i]
    ; Ignore blanks
    If $sItem Then
        ; Does it exist in the array so far?
        $iIndex = _ArraySearch($aCounter, $sItem, 1, 0, 2)
        If $iIndex >0 Then
            ; If it exists, increase the count
            $aCounter[$iIndex][1] += 1
        Else
            ; It does not, so add it
            $aCounter[0][0] += 1
            $aCounter[$aCounter[0][0]][0] = $sItem ; This is the item value
            $aCounter[$aCounter[0][0]][1] = 1      ; This is the instance count
        EndIf
    EndIf
Next

_ArrayDisplay($aCounter, "Counter")

; Now see if any items have the required number of matches
$iRequired = 3
$sMatches = ""
For $i = 1 To UBound($aCounter) - 1
    If $aCounter[$i][1] = $iRequired Then
        ; Add to string
        $sMatches &= $aCounter[$i][0] & "|"
    EndIf
Next

; Convert string to array
$aMatches = StringSplit(StringTrimRight($sMatches, 1), "|")

_ArrayDisplay($aMatches, "Matches at " & $iRequired)
I used this file:

x1
x2
x3
x4
x4
x4
x5
x6

001
003
005
006
007
008
009
010
011
012ab
014
015
016
017
018
021
022
023
024ab
025
026
028
030
031abc
032
033
034
035
036

NED Ibrahim Afellay
NED Nigel De Jong
NED Rafael Van Der Vaart
NED Eljero Elia
NED Georginio Wijnaldum
NED John Heitinga
NED Maarten Stekelenburg
CZE Tomas Sivok
CZE Milan Baros
CZE Jaroslav Plasil
CZE Tomas Hubschman
CZE Zdenek Pospech
CZE Michal Kadlec
ENG Ashley Cole

aa
bb
cc
dd
ee
ff
gg

a
b
c
d
e
f
g
and it returned "x4" - which is what I expected. :)

M23

Edit:

mikell,

Clever. :thumbsup:

Edited by Melba23
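
For larger lists, another option is to let a dictionary do the lookup instead of searching an array for every line. The sketch below is an alternative technique, not the method used in the posts above. It assumes a Windows system where the Scripting.Dictionary COM object is available; the sample data and the names $oCounts and $iMinimum are made up for illustration.

```autoit
; Sample lines; in practice these would come from FileReadToArray
Local $aLines[8] = ["x4", "001", "x4", "", "NED Nigel De Jong", "x4", "001", "001"]
Local $iMinimum = 3

; Scripting.Dictionary is a Windows COM object, not part of AutoIt itself
Local $oCounts = ObjCreate("Scripting.Dictionary")
If Not IsObj($oCounts) Then
    ConsoleWrite("Could not create Scripting.Dictionary" & @CRLF)
    Exit 1
EndIf

For $sLine In $aLines
    If $sLine = "" Then ContinueLoop ; skip blank lines
    If $oCounts.Exists($sLine) Then
        $oCounts.Item($sLine) = $oCounts.Item($sLine) + 1
    Else
        $oCounts.Add($sLine, 1)
    EndIf
Next

; Collect every line seen at least $iMinimum times, one per line
Local $sOutput = ""
For $sKey In $oCounts.Keys()
    If $oCounts.Item($sKey) >= $iMinimum Then $sOutput &= $sKey & @CRLF
Next

ConsoleWrite($sOutput)
```

In the sample data, x4 and 001 each occur three times. The lines are stored as string keys, so leading zeros and mixed text are kept exactly as they appear in the file. By default the dictionary compares keys case-sensitively.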


Guys, a big THANK YOU for your time. This is code I will try to learn from. Mikell's version works with 3.3.8.0 also; I have just tried it. Thank you, Mikell. Melba23, you always help me. I wish I had 0.01% of what you know. Jdelaney, thank you very much also. You guys made my day!


Here is my version (a little bit late):

 

Global $aString[18] = ["x1", "x1", "x2", "x4", "x4", "x4", "x5", "x6", "001", "002", "002", "002", "003", "004", "004", "004", "004", "005"]
;~ Global $aString = StringSplit(StringStripCR(FileRead(@ScriptDir & "\1.txt")), @LF, 2)
ConsoleWrite(FindDup($aString) & @LF)

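; Returns every line that occurs at least $iMinDuplicates times in a row, one per line.
; Equal lines must be adjacent (for example, sorted input), because only neighbours are compared.
Func FindDup($aArray, $iMinDuplicates = 3)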
    Local $i, $z = 1, $sResult
    For $i = 1 To UBound($aArray) - 1
        If $aArray[$i - 1] = $aArray[$i] Then
            $z += 1
            If $z = $iMinDuplicates Then $sResult &= $aArray[$i] & @CRLF
        Else
            $z = 1
        EndIf
    Next
    Return $sResult
EndFunc

Edit: made some modifications.

Br,

UEZ

Edited by UEZ

