stg68

Reading file best method question

Hello,

I am trying to find the fastest way to do the following.

Your help will be appreciated:

1. I have a large txt file. I want to read this file, split it into words on spaces, and write the words to a second file, one word per line.

2. This is optional for now, but would be nice to have if it is easily achievable: if a word already exists in the second file, skip writing it.

Regards

...I have a large txt file...

"Large" is a relative term. The best solution would depend on how large "large" is.

Here is a modified version of a remove dup UDF:

http://www.autoitscript.com/forum/index.ph...st&p=499222


Thank you all for your help.

Now I can use the _ArrayRemoveDuplicates function to remove duplicates. Thanks!

My next step is to build an array from a multi-line txt file, where each word is an element of the array.

Thanks!

_FileReadToArray() will create an array with each line being an element.

FileRead() and StringSplit($string, " ") will break all words into elements.
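
A minimal sketch of what StringSplit() hands back, for illustration only. The sample string and the variable names are made up here and do not come from the thread. It shows two details worth knowing before writing the result to a file: element [0] holds the number of pieces, and two spaces in a row produce an empty element.

```autoit
; Hypothetical sample text; note the double space between "two" and "three".
Local $sText = "one two  three"

; With the default flag, each character of the delimiter string is a separator
; and element [0] receives the number of pieces.
Local $aWords = StringSplit($sText, " ")

; The double space yields one empty piece, so the count here is 4, not 3.
ConsoleWrite("Count: " & $aWords[0] & @CRLF)

For $i = 1 To $aWords[0]
    ; Brackets make the empty piece visible in the console output.
    ConsoleWrite($i & ": [" & $aWords[$i] & "]" & @CRLF)
Next
```

So any loop over the words should start at index 1, and may want to skip empty strings.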

My question is: if I use _FileReadToArray(), why do I need to use FileRead()?

I do understand that _FileReadToArray() will create an array with each line being an element. So how can I split it afterwards?

Thanks

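I guess I was showing you 2 separate solutions.

You can just do:

#include <File.au3>

; _FileReadToArray() fills the array passed as its second parameter;
; element [0] holds the number of lines read.
Dim $array
_FileReadToArray("myfile.txt", $array)

For $X = 1 To $array[0]
    $tempArray = StringSplit($array[$X], " ")
Next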

Please tell me what I am doing wrong here. I just want to write each split element of the array to a file.

#include<file.au3>
#include<array.au3>

Dim $array
_FileReadToArray("c:\temp\test\book.txt",$array)
_ArrayDisplay($array, " ")

For $X = 1 to $array[0]
    $tempArray = StringSplit($array[$X], " ")
    FileWriteLine("c:\temp\test\BookResults.txt",$tempArray[$x] &@CRLF)
Next

Thank you!

I'll look at your code in a bit - for now, try this:

#include <Array.au3>
#include <File.au3>

;OutputFileHandle
$OFH = FileOpen("output.txt", 2)

; Check if file opened for writing OK
If $OFH = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
EndIf

$var = StringReplace(FileRead("input.txt"), @CRLF, " ")
$varArray = StringSplit($var, " ")
_ArrayDisplay ($varArray)
_ArrayRemoveDuplicates($varArray)
_ArrayDisplay ($varArray)
_FileWriteFromArray($OFH, $varArray)

;==================================================================
; Function Name:  _ArrayRemoveDuplicates()
;
; Description    :  Removes duplicate elements from an Array
; Parameter(s)   :  $avArray
;                   $iBase
;                   $iCaseSense
;                   $sDelimter
; Requirement(s) :  None
; Return Value(s):  On Success - Returns 1 and the cleaned up Array is set
;                   On Failure - Returns -1 and sets @Error
;                        @Error=1 $avArray is not an array
;                        @Error=2 $iBase is different from 0 or 1
;                        @Error=3 $iCaseSense is different from 0 or 1
; Author         :  uteotw, but ALL the credits go to nitro322 and SmOke_N, see link below
; Note(s)        :  None
; Link           :  http://www.autoitscript.com/forum/index.php?showtopic=7821
; Example        :  Yes
;==================================================================
Func _ArrayRemoveDuplicates(ByRef $avArray, $iBase = 0, $iCaseSense = 0, $sDelimter = "")
    Local $sHold

    If Not IsArray($avArray) Then
        SetError(1)
        Return -1
    EndIf
    If Not ($iBase = 0 Or $iBase = 1) Then
        SetError(2)
        Return -1
    EndIf
    If $iBase = 1 And $avArray[0] = 0 Then
        SetError(0)
        Return 0
    EndIf
    If Not ($iCaseSense = 0 Or $iCaseSense = 1) Then
        SetError(3)
        Return -1
    EndIf
    If $sDelimter = "" Then
        $sDelimter = Chr(01) & Chr(01)
    EndIf

    If $iBase = 0 Then
        For $i = $iBase To UBound($avArray) - 1
            If Not StringInStr($sDelimter & $sHold, $sDelimter & $avArray[$i] & $sDelimter, $iCaseSense) Then
                $sHold &= $avArray[$i] & $sDelimter
            EndIf
        Next
        $avNewArray = StringSplit(StringTrimRight($sHold, StringLen($sDelimter)), $sDelimter, 1)
        ReDim $avArray[$avNewArray[0]]
        For $i = 1 To $avNewArray[0]
            $avArray[$i - 1] = $avNewArray[$i]
        Next
    ElseIf $iBase = 1 Then
        For $i = $iBase To UBound($avArray) - 1
            If Not StringInStr($sDelimter & $sHold, $sDelimter & $avArray[$i] & $sDelimter, $iCaseSense) Then
                $sHold &= $avArray[$i] & $sDelimter
            EndIf
        Next
        $avArray = StringSplit(StringTrimRight($sHold, StringLen($sDelimter)), $sDelimter, 1)
    EndIf

    Return 1
EndFunc   ;==>_ArrayRemoveDuplicates


Please tell me what I am doing wrong here...

Try this for your code:
#include<file.au3>
#include<array.au3>

;OutputFileHandle
$OFH = FileOpen("output.txt", 2)

; Check if file opened for writing OK
If $OFH = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
EndIf

Dim $array
_FileReadToArray("input.txt", $array)
_ArrayDisplay($array, " ")

For $X = 1 To $array[0]
    $tempArray = StringSplit($array[$X], " ")
    For $Y = 1 To $tempArray[0]
        FileWriteLine($OFH, $tempArray[$Y])
    Next
Next

Thank you! It works!

Is there a way to make some cosmetic changes?

When it writes from the array, it inserts an empty line, and the second line contains the total number of elements. Is there a way to avoid that?

Thanks!

Edited by stg68

...When it writes from the array, it inserts an empty line, and the second line contains the total number of elements. Is there a way to avoid that?...

See the code below. I could not edit the code in this post without a forum barf.

Edited by herewasplato

New code after PM

;OutputFileHandle
$OFH = FileOpen("output.txt", 2)

; Check if file opened for writing OK
If $OFH = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
EndIf

$var = StringReplace(FileRead("input.txt"), @CRLF, " ")
$varArray = StringSplit($var, " ")

_ArrayRemoveDuplicates($varArray, 1)

; Write the unique element count to the output file
FileWriteLine($OFH, $varArray[0])

; Start at 2 to avoid the extra line at the beginning???
For $i = 2 To $varArray[0]
    FileWriteLine($OFH, $varArray[$i])
Next

; Close the handle so the output is flushed to disk
FileClose($OFH)

;==================================================================
; Function Name:  _ArrayRemoveDuplicates()
;
; Description    :  Removes duplicate elements from an Array
; Parameter(s)   :  $avArray
;                   $iBase
;                   $iCaseSense
;                   $sDelimter
; Requirement(s) :  None
; Return Value(s):  On Success - Returns 1 and the cleaned up Array is set
;                   On Failure - Returns -1 and sets @Error
;                        @Error=1 $avArray is not an array
;                        @Error=2 $iBase is different from 0 or 1
;                        @Error=3 $iCaseSense is different from 0 or 1
; Author         :  uteotw, but ALL the credits go to nitro322 and SmOke_N, see link below
; Note(s)        :  None
; Link           :  http://www.autoitscript.com/forum/index.php?showtopic=7821
; Example        :  Yes
;==================================================================
Func _ArrayRemoveDuplicates(ByRef $avArray, $iBase = 0, $iCaseSense = 0, $sDelimter = "")
    Local $sHold

    If Not IsArray($avArray) Then
        SetError(1)
        Return -1
    EndIf
    If Not ($iBase = 0 Or $iBase = 1) Then
        SetError(2)
        Return -1
    EndIf
    If $iBase = 1 And $avArray[0] = 0 Then
        SetError(0)
        Return 0
    EndIf
    If Not ($iCaseSense = 0 Or $iCaseSense = 1) Then
        SetError(3)
        Return -1
    EndIf
    If $sDelimter = "" Then
        $sDelimter = Chr(01) & Chr(01)
    EndIf

    If $iBase = 0 Then
        For $i = $iBase To UBound($avArray) - 1
            If Not StringInStr($sDelimter & $sHold, $sDelimter & $avArray[$i] & $sDelimter, $iCaseSense) Then
                $sHold &= $avArray[$i] & $sDelimter
            EndIf
        Next
        $avNewArray = StringSplit(StringTrimRight($sHold, StringLen($sDelimter)), $sDelimter, 1)
        ReDim $avArray[$avNewArray[0]]
        For $i = 1 To $avNewArray[0]
            $avArray[$i - 1] = $avNewArray[$i]
        Next
    ElseIf $iBase = 1 Then
        For $i = $iBase To UBound($avArray) - 1
            If Not StringInStr($sDelimter & $sHold, $sDelimter & $avArray[$i] & $sDelimter, $iCaseSense) Then
                $sHold &= $avArray[$i] & $sDelimter
            EndIf
        Next
        $avArray = StringSplit(StringTrimRight($sHold, StringLen($sDelimter)), $sDelimter, 1)
    EndIf

    Return 1
EndFunc   ;==>_ArrayRemoveDuplicates
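
For comparison, here is a sketch of a different approach that writes neither a count line nor empty lines. It is not the UDF above. It swaps in two other techniques: StringRegExp() with the pattern "\S+" to pull out runs of non-whitespace characters, and the Windows Scripting.Dictionary COM object as a "seen" set. It assumes the same input.txt and output.txt names used in the thread, and all variable names are made up for illustration. Whether stray spaces or line-break characters caused the empty line in the thread is a guess; the thread does not confirm it.

```autoit
; Read the whole input file into one string.
Local $sText = FileRead("input.txt")
If @error Then
    MsgBox(0, "Error", "Unable to read input.txt.")
    Exit
EndIf

; Flag 3 returns a 0-based array of all matches: no count element,
; and no empty strings, because "\S+" needs at least one character.
Local $aWords = StringRegExp($sText, "\S+", 3)
If @error Then Exit ; no words found

; Mode 2 = open for writing and erase previous contents.
Local $hOut = FileOpen("output.txt", 2)
If $hOut = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
EndIf

; The dictionary only tracks which words were already written.
Local $oSeen = ObjCreate("Scripting.Dictionary")
If Not IsObj($oSeen) Then
    FileClose($hOut)
    Exit
EndIf
$oSeen.CompareMode = 1 ; 1 = text compare (case-insensitive); set before adding keys

For $i = 0 To UBound($aWords) - 1
    If Not $oSeen.Exists($aWords[$i]) Then
        $oSeen.Add($aWords[$i], 1)
        FileWriteLine($hOut, $aWords[$i])
    EndIf
Next

FileClose($hOut)
```

The case-insensitive compare mirrors the UDF's default of $iCaseSense = 0. On a large file this is also likely to do less work per word, since the UDF searches a growing string with StringInStr() for every element, but that has not been measured here.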

