
Sorting Text Files/Removing Dupes


Ic3c0ld

Ok I have a file arranged like this.

sdfdsfas 1243123 asdfs 2 asdfsdf
sdfdsfas 234123 asdfs 4 asdfsdf
sdfdsfas 89823 asdfs 1 asdfsdf
sdfdsfas 1943345 asdfs 3 asdfsdf

I need AutoIt to rearrange it so that it looks like this:

sdfdsfas 89823 asdfs 1 asdfsdf
sdfdsfas 1243123 asdfs 2 asdfsdf
sdfdsfas 1943345 asdfs 3 asdfsdf
sdfdsfas 234123 asdfs 4 asdfsdf

As you can see, this is sorted by the numerical value of the fourth column (index 3 when counting from zero, which is why the Perl below sets $column = 3). I can do this task quickly and easily with Perl, as shown in the line below. This is what I am currently using, but I want to do it in AutoIt.

perl -ne "BEGIN {$column = 3; }" -e "s/\r?\n//; @F=split / /, $_; push @sort_col, $F[$column]; push @lines, qq~$_\n~; END {print @lines[sort {$sort_col[$a] <=> $sort_col[$b]} 0..$#sort_col]}" SortMe.txt > SortedLog.txt

Secondly, I want to know if it's possible to remove the duplicates from the file, which I am also doing with Perl right now. I searched the forums and found code someone had posted for this, but it is not what I need, and it did not work for me: when I ran it, my program hung. His version stores the file in an array. I changed it to write to a file instead, and it still hung. It's very troublesome to spread all these tasks across multiple files. Thanks for any help you can give me.


Perl will be more elegant for this task because string handling was a core design goal (Practical Extraction and Report Language).

I too would recommend an Array because it is one of the few places in AutoIt that you'll find ready-made convenience routines to help you with sorting. You can use StringSplit and _ArraySort.

To get rid of dups you can iterate through the sorted array and compare the _ArrayToString value of the current row to the previous one and toss the dup before you write the results to a file.
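The split, sort, and compare-to-previous steps can be sketched as follows. This is a minimal sketch in Python rather than AutoIt, only to show the logic; the function name `sort_and_dedupe` and the sample data are made up for illustration. It assumes single-space separators and a zero-based column index of 3, as in the original question. The full line is used as a tie-breaker so that identical lines always end up next to each other, which the compare-to-previous step relies on.

```python
def sort_and_dedupe(lines, column=3, sep=" "):
    """Sort lines by the integer in a zero-based column, then drop adjacent duplicates."""
    # Sort on (number, whole line) so identical lines are guaranteed to be adjacent.
    ordered = sorted(lines, key=lambda line: (int(line.split(sep)[column]), line))
    result = []
    for line in ordered:
        # Keep a line only if it differs from the one kept just before it.
        if not result or line != result[-1]:
            result.append(line)
    return result


# Hypothetical sample: the lines from the question plus one repeated line.
sample = [
    "sdfdsfas 1243123 asdfs 2 asdfsdf",
    "sdfdsfas 234123 asdfs 4 asdfsdf",
    "sdfdsfas 89823 asdfs 1 asdfsdf",
    "sdfdsfas 89823 asdfs 1 asdfsdf",
    "sdfdsfas 1943345 asdfs 3 asdfsdf",
]
for line in sort_and_dedupe(sample):
    print(line)
```

In AutoIt the same shape would presumably be StringSplit for the key, _ArraySort for the ordering, and a loop that skips a row when it equals the previous one.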

Dale

Edited by DaleHohm

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y

Doesn't work needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble


I'm sure someone else can do better, but here is something to get you started.

I've adapted one of my routines for your scenario.

It does NOT remove dupes. I will leave that to someone else.

I'm assuming the column data is separated by a single space.

It copies the sorted data to the clipboard; I'm too tired to adapt it for file writing.

Some comments are included. If you need to know what anything does, PM me.

HardCopy

;*** Written with Autoit Beta 3.1.1.80

#include <File.au3>
#include <Array.au3>
Dim $MyArray, $TempArray, $status = 0

_FileReadToArray("D:\somedata.txt", $MyArray) ; Change to your data file location. Reads the data straight into an array.

_Swapit()                     ; First pass: pad column 4 and swap it into column 1
_ArraySort($MyArray, 0, 1)    ; Sort ascending from index 1 (index 0 holds the line count)

_Swapit()                     ; Second pass: strip the padding and swap the columns back
_ArrayToClip($MyArray, 1)     ; Copy to clipboard for pasting wherever

MsgBox(0, "Completed", "Data Saved to Clipboard")

Exit
;*** Quit



;*** Swaps Columns, for sorting on required column 
Func _Swapit()
    For $x = 1 to UBound($MyArray)-1
        $TempArray = StringSplit($MyArray[$x]," "); splitting the dataline on the 'SPACE' Delimiter
        _pad($status)
        _ArraySwap( $TempArray[1], $TempArray[4] )  ; Swaps column 4 with column 1 / and back again on second call.
        
        $MyArray[$x] = _ArrayToString($TempArray," ",1)
    Next
        $status = 1
EndFunc


;*** Needed to sort numerics sequentially.
Func _pad($status)
    Local $Pdg = "0000000000"
    If $status = 0 Then
        $TempArray[4] = StringRight($Pdg & $TempArray[4],10)
    Else
        $TempArray[1] = Int($TempArray[1])
    EndIf
EndFunc
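Why the padding is needed can be shown with a small sketch. It is written in Python instead of AutoIt purely for illustration, and the sample values are invented. It assumes the sort compares the lines as text, which is what makes the padding necessary in the first place.

```python
# Hypothetical values for the sort column.
values = ["2", "10", "1", "9"]

# Text comparison goes character by character, so "10" lands before "2".
as_text = sorted(values)

# Left-padding to a fixed width (10, as in _pad) makes text order match numeric order.
padded = sorted(v.zfill(10) for v in values)

# Converting back to an integer strips the padding, like Int() on the second pass.
restored = [str(int(v)) for v in padded]

print(as_text)
print(restored)
```

The fixed width of 10 only works while no value has more than 10 digits; longer values would be truncated by StringRight in the routine above.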

Contributions: UDF _DateYearFirst

Children are like Farts, you can just about stand your own.

Why am I not a Vegetarian?...Well...my ancestors didn't fight & evolve to the Top of the food chain for me to survive on Salad


Thank you, HardCopy, that was very generous of you. And yes, the examples I gave are nearly identical to my real list, apart from the asdfsdfsdf placeholder parts. I do realize that Perl is better suited for the job, but having people download Perl just to use this doesn't seem very practical. In the meantime, while reading your replies, I have kept doing it in Perl:

Copy /b vlog-2005*.txt MergedLog.txt
perl -ne "BEGIN {$column = 1; }" -e "s/\r?\n//; @F=split /:/, $_; push @sort_col, $F[$column]; push @lines, qq~$_\n~; END {print @lines[sort {$sort_col[$a] <=> $sort_col[$b]} 0..$#sort_col]}" MergedLog.txt > SortedMerge.txt
Dupes.pl SortedMerge.txt
del MergedLog.txt
del SortedMerge.txt

==Dupes.pl====Found On Google==

#!/usr/bin/perl -w
#
# dupes.pl
#
# Removes dupes from wordlists. Each word must use
# its own line.


die "Usage: $0 wordlist" if (@ARGV != 1);
open(OUTPUT, ">FormattedLogs.txt") || die "cannot write FormattedLogs.txt\n";
open(INPUT, $ARGV[0]) || die "wordlist not found\n";

# Read every line of the input file into memory.
while (<INPUT>) {
    push(@words, $_);
}
close(INPUT);

# Print each line the first time it is seen; skip later repeats.
foreach $word (@words) {
    @parts = split("\n", $word);
    $dupe = $parts[0];
    unless ($seen{$dupe}) {
        print OUTPUT "$dupe\n";
        $seen{$dupe} = 1;
    }
}

close(OUTPUT);

This is exactly what I wanted to do. Too bad I can't just download Perl in the background and install it behind their backs. The download I can do, but the installer window would have to be hidden, along with all those Next, Next, Agree prompts. :o Also note that these are different logs, which is why the Perl is splitting them at the colon delimiter.
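For anyone porting the Dupes.pl idea to another language, the core of it is a "seen" lookup that keeps the first copy of each line. Here is a minimal sketch in Python, not AutoIt; the function name `remove_dupes` and the sample log lines are invented for illustration.

```python
def remove_dupes(lines):
    """Return lines with later repeats removed, keeping first-seen order."""
    seen = set()
    unique = []
    for line in lines:
        text = line.rstrip("\r\n")  # ignore line endings when comparing
        if text not in seen:
            seen.add(text)
            unique.append(text)
    return unique


# Hypothetical colon-delimited log lines with two repeats.
log = ["a:1:x\n", "b:2:y\n", "a:1:x\n", "c:3:z\n", "b:2:y\n"]
print(remove_dupes(log))
```

Unlike comparing each row to the previous one, this does not need the input to be sorted first, so the dedupe step could run before or after the sort.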

Edited by Ic3c0ld

Why not compile your Perl code and distribute the binary? See here.

Dale
