stamat

Fast delete multiple lines from text file

19 posts in this topic

I have to automate going through a text file and deleting quite a lot of lines.

I use _FileReadToArray() to read the file into an array. But deleting hundreds of elements from the array one by one with _ArrayDelete() takes a long time. Do you know any more time-efficient methods?
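If you are on a recent AutoIt (3.3.12.0 or later is assumed here; older Array.au3 versions only accept a single index), _ArrayDelete can take a range string, so a whole block of elements is removed in one call and the array is resized once instead of once per element. A minimal sketch on sample data:

```autoit
#include <Array.au3>

; Sample data standing in for the array that _FileReadToArray returns
Local $aLines[10] = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]

; One call removes elements 3 to 7 (inclusive); assumes range-string support in _ArrayDelete
_ArrayDelete($aLines, "3-7")

ConsoleWrite(_ArrayToString($aLines, ",") & @CRLF) ; zero,one,two,eight,nine
```

With an array from _FileReadToArray, element [0] holds the line count; as far as I know _ArrayDelete does not adjust it, so reset it afterwards with `$aLines[0] = UBound($aLines) - 1`.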

#2 ·  Posted (edited)

Welcome to AutoIt and the forum!

How large in megabytes is your file?

Edited by water

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

Thanks, water!

The file is only a couple of MB, but it has over 30k lines.

Since I usually have to delete a range of lines (array elements), e.g. from 2000 to 3000, I thought I might use an array split function and then concatenate the resulting arrays. But there is no such array split function in AutoIt. :( Any suggestions?

You delete the lines and then write the array back to disk?


This reads the file into an array, ignores some records and writes the rest to an output file.

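#include <File.au3>

Global $aInput
If Not _FileReadToArray("C:\temp\Input.txt", $aInput) Then Exit ; $aInput[0] = number of lines
$hOutput = FileOpen("C:\temp\Output.txt", 2) ; 2 = overwrite (1 would append to an existing file)
For $iIndex = 1 To $aInput[0]
    ; Replace "..." with your own test; FileWriteLine adds the line break that FileWrite omits
    If $aInput[$iIndex] <> "..." Then FileWriteLine($hOutput, $aInput[$iIndex])
Next
FileClose($hOutput)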


This also uses a loop to go through each individual index. This is slow. I need a function which splits an array. E.g.

Local $avArray[10]
$avArray[0] = "JPM"
$avArray[1] = "Holger"
$avArray[2] = "Jon"
$avArray[3] = "Larry"
$avArray[4] = "Jeremy"
$avArray[5] = "Valik"
$avArray[6] = "Cyberslug"
$avArray[7] = "Nutster"
$avArray[8] = "JdeB"
$avArray[9] = "Tylo"

Let's say I need to get rid of lines 3 through 7. I could split the array at positions 3 and 7. Then concatenate the first and last arrays. This will be faster than iterating through all the elements. Is this possible? Or is there a better way? Thanks.
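Newer versions of Array.au3 (3.3.12.0 and later is an assumption about your installation) do have the pieces for exactly this split-and-concatenate idea: _ArrayExtract copies a block of rows into a new array, and _ArrayConcatenate appends one array to another. A sketch using the names above, treating "lines 3 through 7" as indices 3 to 7:

```autoit
#include <Array.au3>

Local $avArray[10] = ["JPM", "Holger", "Jon", "Larry", "Jeremy", "Valik", "Cyberslug", "Nutster", "JdeB", "Tylo"]

; "Split": copy the block before the unwanted range and the block after it
Local $aHead = _ArrayExtract($avArray, 0, 2)
Local $aTail = _ArrayExtract($avArray, 8, UBound($avArray) - 1)

; "Concatenate": append the tail to the head (the result ends up in $aHead)
_ArrayConcatenate($aHead, $aTail)

ConsoleWrite(_ArrayToString($aHead, ",") & @CRLF) ; JPM,Holger,Jon,JdeB,Tylo
```

_ArrayExtract still copies element by element internally, so the saving over repeated _ArrayDelete calls comes from not resizing the array once per deleted element. If the range touches the first or last element, only one of the two extracts is needed.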

Looping through an array is quite fast. The following example fills an array with 30000 elements and checks each element in less than half a second.

What's "slow" is reading and writing a file.

Global $aArray[30000]
Global $iTimer = TimerInit()
For $i = 0 To UBound($aArray) - 1
    $aArray[$i] = Random(1,100000, 1)
Next
ConsoleWrite(TimerDiff($iTimer) & @LF)
Global $bFlag
$iTimer = TimerInit()
ConsoleWrite(UBound($aArray) - 1 & @LF)
For $i = 0 To UBound($aArray) - 1
    If $aArray[$i] > 1000 Then $bFlag = True
Next
ConsoleWrite(TimerDiff($iTimer) & @LF)

BTW: How often do you need to process the file (every 5 minutes, daily, once)? And how fast do you need it?


True, looping through an array is fast, but the _ArrayDelete() function is slow. What I need is an "array splice" function. Thanks for the help, water. I have a limit of 5 posts per day, so I won't be able to answer until tomorrow. :(
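A splice is not hard to write yourself: shift the elements after the range down in a single pass and ReDim once at the end, so the cost is one loop over the array rather than one resize per deleted element. _ArraySplice below is a hypothetical helper for illustration, not a function from Array.au3:

```autoit
#include <Array.au3>

; Hypothetical helper (not part of Array.au3): removes elements $iStart..$iEnd (inclusive)
; from a 1D array in place and returns the new size, or -1 with @error set
Func _ArraySplice(ByRef $aArray, $iStart, $iEnd)
    Local $iSize = UBound($aArray)
    If $iStart < 0 Or $iEnd >= $iSize Or $iStart > $iEnd Then Return SetError(1, 0, -1)
    Local $iCount = $iEnd - $iStart + 1
    ; AutoIt arrays cannot be resized to zero elements
    If $iCount = $iSize Then Return SetError(2, 0, -1)
    ; Move every element after the range down by $iCount places
    For $i = $iEnd + 1 To $iSize - 1
        $aArray[$i - $iCount] = $aArray[$i]
    Next
    ; Shrink the array once, dropping the now-duplicated tail
    ReDim $aArray[$iSize - $iCount]
    Return $iSize - $iCount
EndFunc

; Usage: remove indices 3 to 7 from a 10-element array
Local $aTest[10] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
_ArraySplice($aTest, 3, 7)
ConsoleWrite(_ArrayToString($aTest, ",") & @CRLF) ; 0,1,2,8,9
```

For an array from _FileReadToArray, update the count in element [0] afterwards, e.g. `$aInput[0] = UBound($aInput) - 1`.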

stamat,

Melba23 has lifted your 5-post limit for the first 24 hours so we can go on discussing your problem.


#12 ·  Posted (edited)

$sText = ''
For $i = 1 To 20
    $sText &= $i & ' Line' & @CRLF
Next
; MsgBox(0, 'Message', $sText)
$iStartingLine = 10
$iEndingLine = 15

$iStartingLine -= 1
; $sText = FileRead(@ScriptDir & '\file.txt')
; Position of the @CRLF that ends the line just before the range
$iPos1 = StringInStr($sText, @CRLF, 1, $iStartingLine)
; Position of the @CRLF that ends the last line of the range, counted from $iPos1
$iPos2 = StringInStr($sText, @CRLF, 1, $iEndingLine - $iStartingLine, $iPos1 + 1)
; MsgBox(0, 'Message', $iPos1 & @CRLF & $iPos2)
; Keep everything up to $iPos1 (the @CR) and everything after $iPos2 (from the @LF on)
$sText = StringLeft($sText, $iPos1) & StringTrimLeft($sText, $iPos2)
MsgBox(0, 'Message', $sText)

Edited by AZJIO

#13 ·  Posted (edited)

I modified my example from post #6 so it "deletes" records 3 to 7:

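#include <File.au3>

Global $aInput
If Not _FileReadToArray("C:\temp\Input.txt", $aInput) Then Exit ; $aInput[0] = number of lines
$hOutput = FileOpen("C:\temp\Output.txt", 2) ; 2 = overwrite (1 would append to an existing file)
For $iIndex = 1 To $aInput[0]
    ; Write every line except 3 to 7; FileWriteLine adds the line break
    If $iIndex < 3 Or $iIndex > 7 Then FileWriteLine($hOutput, $aInput[$iIndex])
Next
FileClose($hOutput)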

BTW:

As you can see my example does not call _ArrayDelete so it should be quite fast.

Edited by water

#14 ·  Posted (edited)

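$sText = ''
For $i = 1 To 20
    $sText &= $i & ' Line' & @CRLF
Next
; $sText ='1 Line'
; MsgBox(0, 'Preview', $sText)
_StringDelete($sText, 2, 18)
MsgBox(0, 'After', $sText)
; MsgBox(0, 'After, @error=' & @error, $sText)

; Deletes lines $iStart to $iEnd (1-based, inclusive) from a @CRLF-delimited string in place
Func _StringDelete(ByRef $sText, $iStart, $iEnd)
    ; Accept the range in either order
    If $iStart > $iEnd Then
        Local $tmp = $iStart
        $iStart = $iEnd
        $iEnd = $tmp
    EndIf
    Local $iPosStart, $iPosEnd
    $iStart -= 1
    If $iStart < 1 Then
        ; Range starts at line 1: nothing to keep in front of it
        $iPosStart = 0
        $iStart = 0
    Else
        ; @CRLF ending the line before the range; if missing, the range is past the end
        $iPosStart = StringInStr($sText, @CRLF, 1, $iStart)
        If Not $iPosStart Then Return
    EndIf
    ; @CRLF ending line $iEnd
    $iPosEnd = StringInStr($sText, @CRLF, 1, $iEnd - $iStart, $iPosStart + 1)
    If $iPosEnd Then
        ; No preceding @CR is kept when starting at line 1, so drop the @LF as well
        If $iPosStart = 0 Then $iPosEnd += 1
        $sText = StringLeft($sText, $iPosStart) & StringTrimLeft($sText, $iPosEnd)
    Else
        ; Range runs past the last line: keep what precedes it, including its @CRLF
        If $iPosStart Then
            $sText = StringLeft($sText, $iPosStart + 1)
        Else
            $sText = ''
        EndIf
    EndIf
EndFunc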

Edited by AZJIO

water, I tried your code again. And it is fast - 60k rows written in 1.3 sec. So instead of deleting array elements and writing the array back to disk, I will write each line individually using the file write stream. Thanks!! I will try to complete the code asap but I'm sure this solves my problem.
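A variation on the same idea, sketched under assumptions (hypothetical file names in @TempDir, lines 3 to 7 removed): collect the kept lines in one string and hand it to FileWrite once, instead of writing every line separately. Whether this is measurably faster depends on the file and the AutoIt version, so time it against the per-line version:

```autoit
#include <File.au3>

; Hypothetical file names in the temp folder; replace with your own paths
Local $sInFile = @TempDir & "\frta_input.txt", $sOutFile = @TempDir & "\frta_output.txt"
Local $iFirst = 3, $iLast = 7 ; 1-based line numbers to drop

; Create a 20-line sample input so the sketch runs on its own
Local $hFile = FileOpen($sInFile, 2) ; 2 = overwrite
For $i = 1 To 20
    FileWriteLine($hFile, "Line " & $i)
Next
FileClose($hFile)

Local $aInput
If Not _FileReadToArray($sInFile, $aInput) Then Exit 1
; Build the whole output in memory, then write it with a single call
Local $sOut = ""
For $i = 1 To $aInput[0]
    If $i < $iFirst Or $i > $iLast Then $sOut &= $aInput[$i] & @CRLF
Next
$hFile = FileOpen($sOutFile, 2)
FileWrite($hFile, $sOut)
FileClose($hFile)
```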

AZJIO, your code works with strings, not arrays. To use it I would have to convert the array to a string, which would make it hard for me to work with. Thanks for the input.

Glad to hear it's working for you :D

That's perfect to end the day. Now it's time for bed!


Thanks again. See you around on the forums :)

#18 ·  Posted (edited)

Do you need to read the file into an array at all? Reading it as a string is far faster.

You can also do it this way, but check that the indices do not exceed the size of the array:

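#include <File.au3>

Global $aInput
If Not _FileReadToArray("C:\temp\Input.txt", $aInput) Then Exit
$hOutput = FileOpen("C:\temp\Output.txt", 2) ; 2 = overwrite
; Keep lines 1 to 2 (capped in case the file is shorter)
$iKeep = 2
If $aInput[0] < $iKeep Then $iKeep = $aInput[0]
For $iIndex = 1 To $iKeep
    FileWriteLine($hOutput, $aInput[$iIndex])
Next
; Skip lines 3 to 7 and keep line 8 onwards (no iterations if the file is shorter)
For $iIndex = 8 To $aInput[0]
    FileWriteLine($hOutput, $aInput[$iIndex])
Next
FileClose($hOutput)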
Edited by AZJIO

#19 ·  Posted (edited)

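You probably misunderstood me: you don't need the array at all. Read the file straight into a string and delete the lines there:

$sText = FileRead(@ScriptDir & '\file.txt') ; the whole file as one string
_StringDelete($sText, 2, 18) ; _StringDelete from post #14 (deletes lines 2 to 18)
; then write $sText back with FileWrite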
Edited by AZJIO
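A self-contained sketch of the same string round trip (read the file once, cut, write it back once). It swaps in a regular expression via StringRegExpReplace instead of _StringDelete; the file name is hypothetical, and the pattern assumes every removed line, including line $iLast, ends with a line break:

```autoit
; Hypothetical file name; a 20-line sample is created so the sketch runs on its own
Local $sFile = @TempDir & "\strdel_sample.txt"
Local $iFirst = 3, $iLast = 7 ; 1-based line numbers to drop

Local $hFile = FileOpen($sFile, 2) ; 2 = overwrite
For $i = 1 To 20
    FileWriteLine($hFile, "Line " & $i)
Next
FileClose($hFile)

; Read the whole file as one string (no array needed)
Local $sText = FileRead($sFile)
; Group 1 = the first ($iFirst - 1) lines; the next ($iLast - $iFirst + 1) lines are matched and dropped
Local $sPattern = "\A((?:.*\R){" & ($iFirst - 1) & "})(?:.*\R){" & ($iLast - $iFirst + 1) & "}"
$sText = StringRegExpReplace($sText, $sPattern, "\1", 1)
; @extended holds the number of replacements; 0 means the file had too few lines
If @extended = 1 Then
    $hFile = FileOpen($sFile, 2)
    FileWrite($hFile, $sText)
    FileClose($hFile)
EndIf
```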
