I just finished a throw-together script to sort some text files out...


brodie28

#include <Array.au3>
#include <File.au3>

For $i = 1 To 100
    $fname = "passlist" & $i & ".txt"
    Dim $aArray[9999999]                 ; oversized buffer; trimmed after the read
    ConsoleWrite($fname & @CRLF)
    _FileReadToArray($fname, $aArray)
    $size = $aArray[0]                   ; element 0 holds the line count
    ReDim $aArray[$size]
    For $x = 1 To $aArray[0]
        ConsoleWrite("file   " & $i & "  " & $x & " from  " & $aArray[0] & @CRLF)
        If StringIsAlNum($aArray[$x]) = 0 Then
            _ArrayDelete($aArray, $x)    ; shifts and ReDims the whole array each call
        EndIf
    Next
    _FileWriteFromArray($fname, $aArray)
Next

Basically I have 100 text files, each with about 95,000 lines of text. I want to go through all of them and delete every line that is not purely alphanumeric.
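For reference (a sketch, not from the thread), this is what StringIsAlNum() counts as "alphanumeric only" — note that blank lines fail the test too, so they get deleted as well:

ConsoleWrite(StringIsAlNum("abc123") & @CRLF)  ; 1 - letters and digits only
ConsoleWrite(StringIsAlNum("abc 123") & @CRLF) ; 0 - a space fails the test
ConsoleWrite(StringIsAlNum("p@ss") & @CRLF)    ; 0 - punctuation fails the test
ConsoleWrite(StringIsAlNum("") & @CRLF)        ; 0 - empty lines fail too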

This works... but god, is it slow. Any ideas on why it only writes to the console about once a second?

EDIT: It is actually about twice a second... Maybe that's just as fast as it can go?
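A quick way to see where the time goes (a sketch using the thread's numbers, not code from the post): time a single _ArrayDelete() on an array of this size. Every call has to shift each element above the deleted index and ReDim the array, so the deletes, not the ConsoleWrite(), are the likely bottleneck:

#include <Array.au3>

Local $aTest[95000]              ; roughly one file's worth of lines
Local $hTimer = TimerInit()
_ArrayDelete($aTest, 1)          ; shifts ~95,000 elements and ReDims once
ConsoleWrite("one delete: " & TimerDiff($hTimer) & " ms" & @CRLF)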


I have to keep ReDim-ing the array because I'm not sure how to define an array of unknown size in AutoIt (I'm not sure you can).

So I have to ReDim it for every text file. Anyway, the script gets stuck in the nested For loop, which is where the time is being taken.
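(For what it's worth — a sketch, assuming the stock File.au3 UDF — _FileReadToArray() sizes the array itself, so the huge Dim isn't needed; a bare declaration is enough:)

#include <File.au3>

Local $aArray                    ; no size needed; the UDF dimensions it
If _FileReadToArray("passlist1.txt", $aArray) Then
    ConsoleWrite("lines read: " & $aArray[0] & @CRLF) ; element 0 holds the count
EndIf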

SmOke_N, if I get rid of ConsoleWrite(), do you think it will make a difference? It's probably going to take hours anyway, and I would kind of like to know where it's up to. Any way to do that without ConsoleWrite()?
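(One option — a sketch, not from the thread: keep the progress display but throttle it, e.g. update a ToolTip() every 1,000 lines instead of writing every line to the console:)

#include <File.au3>

Local $aArray
_FileReadToArray("passlist1.txt", $aArray)
For $x = 1 To $aArray[0]
    If Mod($x, 1000) = 0 Then ToolTip("line " & $x & " of " & $aArray[0])
    ; ... per-line work goes here ...
Next
ToolTip("") ; clear the tip when finished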


I ran into an error where _ArrayDelete() was ReDim-ing the array to a smaller size, so after a while the loop index would run past the end of the array.

So I did this. Strangely, it is going much, much faster this way, seemingly for no reason.

#include <Array.au3>
#include <File.au3>

For $i = 1 To 100
    $fname = "passlist" & $i & ".txt"
    Dim $aArray[9999999]
    ConsoleWrite($fname & @CRLF)
    _FileReadToArray($fname, $aArray)
    $size = $aArray[0]
    ReDim $aArray[$size]
    For $x = 1 To $size
        ConsoleWrite("file   " & $i & "  " & $x & " from  " & $size & @CRLF)
        If StringIsAlNum($aArray[$x]) = 0 Then
            _ArrayDelete($aArray, $x)
            $size = $size - 1 ; track the shrinking upper bound
        EndIf
    Next
    _FileWriteFromArray($fname, $aArray)
Next
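(The usual idiom for deleting while iterating — a sketch, assuming the same passlist naming — is to walk the array backwards, so _ArrayDelete() only shifts elements that have already been visited and the index can never run past the shrinking upper bound:)

#include <Array.au3>
#include <File.au3>

Local $aArray
_FileReadToArray("passlist1.txt", $aArray)
For $x = $aArray[0] To 1 Step -1
    If StringIsAlNum($aArray[$x]) = 0 Then _ArrayDelete($aArray, $x)
Next

It still pays one ReDim per deleted line, though, so it fixes the out-of-range error rather than the speed.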

SmOke_N (Moderator)


Well, the ReDim issue is really an issue with _ArrayDelete(). You have to remember, you're ReDim-ing 90-odd thousand times per file, across 100 files. That's CRAZY lol.

This is a partial concept (Not tested):

#include <File.au3>

Local $a_array
For $i = 1 To 100
    If FileExists("passlist" & $i & ".txt") Then _FileRemoveNonAlNumLines_Array($a_array, "passlist" & $i & ".txt")
Next

_FileWriteFromArray("passlist_checked_.txt", $a_array, 1)

Func _FileRemoveNonAlNumLines_Array(ByRef $av_array, $s_file)
    Local $a_split_file = StringSplit(StringStripCR(FileRead($s_file)), @LF)

    Local $i_add = 0, $a_ret, $i_ub
    If IsArray($av_array) = 0 Then
        $a_ret = $a_split_file ; first file: compact the split array in place
    Else
        ; later files: grow the existing array once, then append behind it
        $a_ret = $av_array
        $i_ub = UBound($a_ret)
        ReDim $a_ret[$i_ub + $a_split_file[0] + 1]
        $i_add = $i_ub - 1 ; next write lands just past the existing lines
    EndIf

    ; copy only the alphanumeric lines forward: one grow and one shrink
    ; ReDim per file instead of one ReDim per deleted line
    For $i = 1 To $a_split_file[0]
        If StringIsAlNum($a_split_file[$i]) Then
            $i_add += 1
            $a_ret[$i_add] = $a_split_file[$i]
        EndIf
    Next

    If Not $i_add Then Return
    ReDim $a_ret[$i_add + 1]
    $av_array = $a_ret

    Return $av_array
EndFunc

Edit:

This way, instead of ReDim-ing up to 9 million times, you ReDim no more than 200 times (two per file).

Also, I'd seriously think about writing to a file after every return, and just emptying the array. Looping through 9 million elements to write one file is ridiculous. That's more than half of the maximum allowed array elements.

You'd find the speed increases dramatically from start to finish.


SmOke_N (Moderator)

I was amazed that _FileWriteFromArray() didn't have an append option!

Anyway, this would probably be faster...

Local $a_array
For $i = 1 To 2
    If FileExists("passlist" & $i & ".txt") Then
        _FileRemoveNonAlNumLines_Array($a_array, "passlist" & $i & ".txt")
        _FileWriteFromArray_Append("passlist_checked_.txt", $a_array, 1)
        $a_array = ""
    EndIf
Next

Func _FileWriteFromArray_Append($s_file, $a_array, $i_base = 0, $i_ubound = 0)
    If IsArray($a_array) = 0 Then Return SetError(1, 0, 0)

    If FileExists($s_file) = 0 Then FileClose(FileOpen($s_file, 2)) ; create the file if needed

    If $i_ubound = 0 Or $i_ubound = -1 Or $i_ubound = Default Then $i_ubound = UBound($a_array) - 1

    ; pull in what is already there and normalise the trailing line break
    Local $s_write_to_file = FileRead($s_file)
    If $s_write_to_file <> "" Then
        $s_write_to_file = StringRegExpReplace($s_write_to_file, "[\r\n]+\z", "") & @CRLF
    EndIf

    For $i = $i_base To $i_ubound
        $s_write_to_file &= $a_array[$i] & @CRLF
    Next

    ; write back in overwrite mode (2); FileWrite() on a bare filename opens
    ; in append mode and would duplicate the content that was just read back
    Local $h_file = FileOpen($s_file, 2)
    If $h_file = -1 Then Return SetError(2, 0, 0)
    Local $i_ret = FileWrite($h_file, StringTrimRight($s_write_to_file, 2))
    FileClose($h_file)
    Return $i_ret
EndFunc

; plus _FileRemoveNonAlNumLines_Array() exactly as in the previous post

Or, skipping the shared array and writing each file's kept lines straight back out:

For $i = 1 To 100
    If FileExists("passlist" & $i & ".txt") Then
        _MyCustomFunction("passlist" & $i & ".txt", "passlist_chk_.txt")
    EndIf
Next

Func _MyCustomFunction($s_file, $s_out_file)
    Local $a_split_file = StringSplit(StringStripCR(FileRead($s_file)), @LF)

    Local $s_hold_line = ""
    For $i = 1 To $a_split_file[0]
        If StringIsAlNum($a_split_file[$i]) Then
            $s_hold_line &= $a_split_file[$i] & @CRLF
        EndIf
    Next

    Return FileWrite($s_out_file, $s_hold_line)
EndFunc

Would ultimately be faster than anything we've done thus far.
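(A related sketch, not from the post: since all the kept lines go to one output file, opening it once in append mode avoids both the full re-read in _FileWriteFromArray_Append() above and reopening the output on every FileWrite() call:)

Local $h_out = FileOpen("passlist_chk_.txt", 1) ; 1 = append-write mode
Local $a_lines
For $i = 1 To 100
    If Not FileExists("passlist" & $i & ".txt") Then ContinueLoop
    $a_lines = StringSplit(StringStripCR(FileRead("passlist" & $i & ".txt")), @LF)
    For $n = 1 To $a_lines[0]
        If StringIsAlNum($a_lines[$n]) Then FileWriteLine($h_out, $a_lines[$n])
    Next
Next
FileClose($h_out)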


SmOke_N (Moderator)

Damn I used 2 AutoIt tags, so I can't edit!!

Anyway...

For the first suggestion above, the output was:

It took: 0.875693977865902 seconds to finish 1 file with 95,000 lines.

My method I suggested last:

It took: 0.803650743320729 seconds to finish 1 file with 95,000 lines.

So, with that said... looks like you could finish in a couple of minutes with that method :).
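(For reference, a minimal harness of the kind that could produce those numbers — an assumption, since the actual benchmark script wasn't posted — wrapping one file in TimerInit()/TimerDiff() with _FileRemoveNonAlNumLines_Array() from above:)

Local $a_array, $hTimer = TimerInit()
_FileRemoveNonAlNumLines_Array($a_array, "passlist1.txt")
ConsoleWrite("It took: " & TimerDiff($hTimer) / 1000 & " seconds" & @CRLF)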


brodie28

Thanks, that last one worked perfectly, and MUCH faster than anything else I tried.

I assume it worked, anyway; Notepad really struggles to open text files this huge.

EDIT:

Something didn't work. The output is much smaller than it should be, and a lot of lines seem to have been deleted when they shouldn't have been. I'll try to see why.

Edit 2:

I'm an idiot. The script I used to split one massive text file into 100 smaller ones had an error (I accidentally left something inside a loop when it should have been outside), so all the text files turned out the same. That's now been corrected, and the last script is working perfectly.

