myspacee

Order lines of text inside a thousand files

myspacee

Hello,
for work I download more than 1000 files, each with a few lines of text inside. Here is a simplified example of the content:

______________________________ file_ordered_in_right_way.txt
Steve loves Apple
My dad is not an Avocado, but a lawyer
What do you call a sad strawberry? A blueberry
Gorilla loves Banana
1000 Blackberry phones
Orange Family - Dutch Passion

I extracted some words and use them as a reference to order all the other files:

___ reference wordlist
Apple
Avocado
Strawberry
Banana
Blackberry
Orange

Now I want to open all the files, one by one, and order their lines using my reference wordlist. Here are some example files with the lines out of order:
 

______________________________ messed_file1.txt
Gorilla loves Banana    1.568
Steve loves Apple
My dad is not an Avocado, but a lawyer
What do you call a sad strawberry? A blueberry
Orange Family - Dutch Passion
1000 Blackberry phones

______________________________ messed_file2.txt
What do you call a sad strawberry? A blueberry
1000 Blackberry phones
Steve loves Apple
Gorilla loves Banana    2.000
Orange Family - Dutch Passion
My dad is not an Avocado, but a lawyer

______________________________ messed_file3.txt
Gorilla loves Banana    9.853
Steve loves Apple
Orange Family - Dutch Passion
What do you call a sad strawberry? A blueberry
1000 Blackberry phones
My dad is not an Avocado, but a lawyer

How can I reorder the content of the messed files using the reference wordlist?

In DOS I can do this with findstr and output redirection:

findstr "Apple" "messed_file1.txt" > "ordered_file1.txt"
findstr "Avocado" "messed_file1.txt" >> "ordered_file1.txt"
findstr "Strawberry" "messed_file1.txt" >> "ordered_file1.txt"
findstr "Banana" "messed_file1.txt" >> "ordered_file1.txt"
findstr "Blackberry" "messed_file1.txt" >> "ordered_file1.txt"
findstr "Orange" "messed_file1.txt" >> "ordered_file1.txt"

I've already written a command-line AutoIt script that lets me open this large number of files and do the classic things with text: text replace, delete, remove HTML, etc.

I would like to add a sort function to my little text Swiss Army knife.

Thank you for any help,
m.

SlackerAl

Read the file into a 2D array and, in the second column, assign an ID based on your keyword, e.g. Apple = 1, Avocado = 2, Strawberry = 3, etc. Then sort the array on that value. You'll find it faster to read the whole file into an array first rather than reading line by line (if the files are large).
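
The ID assignment described above could be sketched roughly like this. It is only an illustration: the function name _KeywordRank is made up, and it assumes the reference array keeps its element count in [0].

```autoit
; Hypothetical helper (name invented for this sketch):
; returns the 1-based position of the first reference word found in $sLine,
; or 0 when none of the reference words occurs in the line.
; Assumes $aRefWords[0] holds the number of words.
Func _KeywordRank($sLine, Const ByRef $aRefWords)
    For $i = 1 To $aRefWords[0]
        ; StringInStr is case-insensitive by default, so "strawberry" matches "Strawberry"
        If StringInStr($sLine, $aRefWords[$i]) Then Return $i
    Next
    Return 0
EndFunc   ;==>_KeywordRank

Local $aRefWords[] = [6, "Apple", "Avocado", "Strawberry", "Banana", "Blackberry", "Orange"]
ConsoleWrite(_KeywordRank("Gorilla loves Banana    1.568", $aRefWords) & @CRLF) ; writes 4
```

The returned number is what would go into the second column before sorting.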


Problem solving step 1: Write a simple, self-contained, running, replicator of your problem.

myspacee

Hello,
the files contain about 20 to 30 lines each.

I like Slacker's idea, but I have no idea how to write the code. I imagine creating two arrays: one for my reference list,
the other for the file content. But how do I compare a line against the list (or assign the right value to a string/array element)?

Should I use the StringInStr() function?
 

#include <Array.au3>
#include <File.au3>
#include <MsgBoxConstants.au3> ; needed for $MB_SYSTEMMODAL

Global $aArray_file_referencelist = FileReadToArray("referencelist.txt")
Global $aArray_file_to_order = FileReadToArray("test.txt")
Global $iLineCount = @extended ; line count of test.txt, the most recent read
If @error Then
    MsgBox($MB_SYSTEMMODAL, "", "There was an error reading the file. @error: " & @error) ; An error occurred reading test.txt.
Else
    For $i = 0 To $iLineCount - 1 ; Loop through the array. UBound($aArray_file_to_order) can also be used.
        MsgBox($MB_SYSTEMMODAL, "", $aArray_file_to_order[$i]) ; Display each line of the file.
    Next
EndIf

 

thank you for your help,
m.

Melba23

myspacee,

Boring afternoon, so here is how I would do it:

#include <Array.au3>

Global $aRefWords[] = [6, "Apple", "Avocado", "Strawberry", "Banana", "Blackberry", "Orange"]

; Simulate reading file with FileReadToArray
Global $aLines_1[] = [6, "Gorilla loves Banana    1.568", _
                        "Steve loves Apple", _
                        "My dad is not an Avocado, but a lawyer", _
                        "What do you call a sad strawberry? A blueberry", _
                        "Orange Family - Dutch Passion", _
                        "1000 Blackberry phones"]

; Add another column
_ArrayColInsert($aLines_1, 1)

; And here is the result
;_ArrayDisplay($aLines_1)

; Now look for key words
For $i = 1 To $aRefWords[0]
    For $j = 1 To $aLines_1[0][0]
        ; We check first if the line has already been matched - no point in checking further if it has
        If Not($aLines_1[$j][1]) And StringinStr($aLines_1[$j][0], $aRefWords[$i]) Then
            $aLines_1[$j][1] = $i
            ; And no point in looping further if we have a match
            ExitLoop
        EndIf
    Next
Next

; And here is the result
;_ArrayDisplay($aLines_1)

; Now we sort the array on column 1
_ArraySort($aLines_1, 0, 1, 0, 1)

; And here is the result
;_ArrayDisplay($aLines_1)

; Now remove the additional column and reconvert to 1D
_ArrayColDelete($aLines_1, 1, True)

; And here is the result
_ArrayDisplay($aLines_1)

; Which you can write back to the file using _FileWriteFromArray (needs #include <File.au3>)

You can uncomment the various _ArrayDisplay lines to see what is happening as the code runs.
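
To run the same idea against real files, something along these lines might work. Treat it as a sketch under assumptions: the function name _SortFileByRefWords and the "ordered_" output prefix are invented for illustration, and it ranks each line by the first reference word it contains, which is a slight variation on the loop above. Lines with no reference word get rank 0 and end up at the top. I would not rely on _ArraySort keeping the original relative order of lines that share the same rank.

```autoit
#include <Array.au3>
#include <File.au3>

; Hypothetical wrapper: reorder the lines of one file by the reference words.
Func _SortFileByRefWords($sInFile, $sOutFile, Const ByRef $aRefWords)
    Local $aLines = FileReadToArray($sInFile) ; 0-based array, no count element
    If @error Then Return SetError(1, 0, False)

    ; Build a 2D array: column 0 = line text, column 1 = rank
    Local $iCount = UBound($aLines)
    Local $aRanked[$iCount][2]
    For $j = 0 To $iCount - 1
        $aRanked[$j][0] = $aLines[$j]
        $aRanked[$j][1] = 0 ; stays 0 when no reference word is found
        For $i = 1 To $aRefWords[0]
            If StringInStr($aLines[$j], $aRefWords[$i]) Then
                $aRanked[$j][1] = $i
                ExitLoop
            EndIf
        Next
    Next

    _ArraySort($aRanked, 0, 0, 0, 1) ; ascending, all rows, sort on column 1
    _ArrayColDelete($aRanked, 1, True) ; drop the rank column and convert back to 1D
    Return _FileWriteFromArray($sOutFile, $aRanked) = 1
EndFunc   ;==>_SortFileByRefWords

Local $aRefWords[] = [6, "Apple", "Avocado", "Strawberry", "Banana", "Blackberry", "Orange"]
Local $aFiles = _FileListToArray(@ScriptDir, "messed_file*.txt", 1) ; 1 = files only
If Not @error Then
    For $k = 1 To $aFiles[0]
        ; Output name is an assumption: ordered_messed_file1.txt and so on
        _SortFileByRefWords(@ScriptDir & "\" & $aFiles[$k], @ScriptDir & "\ordered_" & $aFiles[$k], $aRefWords)
    Next
EndIf
```

The originals are left untouched, so the result can be compared against the input before anything is overwritten.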

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind


 

myspacee

Thank you!
I'll try this tomorrow at work.

m.

Malkey

Another one for tomorrow.
In this example, if a line does not contain any of the sort words, that line is ignored, i.e. left out of the modified sorted file.

#include <Array.au3>
#include <File.au3>
Opt("WinTitleMatchMode", -2) ;1=start, 2=subStr, 3=exact, 4=advanced, -1 to -4=Nocase

; ------------------- Create 3 test .txt files from commented data below ----------------
Local $sFiles = StringRegExpReplace(FileRead(@ScriptFullPath), "(?s)^.*?#cs\v+|\v+#ce.*", "")
Local $aFiles = StringRegExp($sFiles, "(?s)_+\h+(\w+.txt)\R(.+?)(?:\R\R|$)", 3)
For $i = 0 To UBound($aFiles) - 1 Step 2
    If FileExists($aFiles[$i]) Then FileDelete($aFiles[$i])
    FileWrite($aFiles[$i], $aFiles[$i + 1])
    ;ShellExecute($aFiles[$i])
Next
; -------------------------- End of Create 3 test .txt files ----------------------------

Local $aRefWords[] = [6, "Apple", "Avocado", "Strawberry", "Banana", "Blackberry", "Orange"]
Local $aFileList = _FileListToArray(@ScriptDir, "messed_file*.txt", 1)
;_ArrayDisplay($aFileList)
For $i = 1 To UBound($aFileList) - 1
    Local $sSortedFile = ""
    For $j = 1 To $aRefWords[0]
        $sSortedFile &= StringRegExpReplace(FileRead($aFileList[$i]), "(?is)^.*?(\V*\b\Q" & $aRefWords[$j] & "\E\b\V*)(?:\v|$).*$", "$1") & @CRLF
    Next
    Local $sModFileName = StringRegExpReplace($aFileList[$i], "(\.txt)", "Mod$1")
    ;ConsoleWrite($sModFileName & @CRLF)
    If FileExists($sModFileName) Then FileDelete($sModFileName)
    FileWrite($sModFileName, $sSortedFile)
    ShellExecute($sModFileName)
Next

; Clean up all added files
Sleep(1000)
$ret = MsgBox(4, "Clean Up", "Press Yes to delete all files that this script added to this script's directory.") ; $MB_YESNO 4 Yes and No
If $ret = 6 Then ; YES $IDYES (6)
    For $i = 1 To UBound($aFileList) - 1
        FileDelete($aFileList[$i])
        WinClose($aFileList[$i], "")
        FileDelete(StringRegExpReplace($aFileList[$i], "(\.txt)", "Mod$1"))
        WinClose(StringRegExpReplace($aFileList[$i], "(\.txt)", "Mod$1"), "")
    Next
EndIf

#cs
______________________________ messed_file1.txt
Gorilla loves Banana    1.568
Steve loves Apple
My dad is not an Avocado, but a lawyer
What do you call a sad strawberry? A blueberry
A line without any ref. words present
Orange Family - Dutch Passion
1000 Blackberry phones

______________________________ messed_file2.txt
What do you call a sad strawberry? A blueberry
1000 Blackberry phones
Steve loves Apple
Gorilla loves Banana    2.000
Orange Family - Dutch Passion
My dad is not an Avocado, but a lawyer

______________________________ messed_file3.txt
Gorilla loves Banana    9.853
Steve loves Apple
Orange Family - Dutch Passion
What do you call a sad strawberry? A blueberry
1000 Blackberry phones
My dad is not an Avocado, but a lawyer
#ce
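
One case worth guarding against with a regex approach like this: StringRegExpReplace returns its input unchanged when the pattern does not match, so a file that lacks one of the reference words would have its whole content appended for that word. A possible guard, as a sketch only (the function name _FirstLineWithWord is made up), is to extract the line with StringRegExp and check @error:

```autoit
; Hypothetical helper: returns the first line of $sText that contains $sWord
; as a whole word (case-insensitive), or "" when no line contains it.
Func _FirstLineWithWord($sText, $sWord)
    ; \V* = any run of characters that are not vertical whitespace, so the
    ; capture group covers the whole line around the word
    Local $aMatch = StringRegExp($sText, "(?i)(\V*\b\Q" & $sWord & "\E\b\V*)", 1)
    If @error Then Return "" ; no match: the word is not in the text
    Return $aMatch[0]
EndFunc   ;==>_FirstLineWithWord

Local $sText = "Gorilla loves Banana    1.568" & @CRLF & "Steve loves Apple"
ConsoleWrite("[" & _FirstLineWithWord($sText, "Apple") & "]" & @CRLF) ; [Steve loves Apple]
ConsoleWrite("[" & _FirstLineWithWord($sText, "Kiwi") & "]" & @CRLF) ; []
```

In the loop over the reference words, the returned line would then be appended only when it is not empty.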

 
