fetch 2/more - phrases

Greetings,

I have a big text file with data as follows:

phrase1; phrase2; phrase3

phrase4; phrase 5

phrase6; phrase 7; phrase8

etc...

I need to find the phrases that consist of 2 or more words and remove them,

so the desired output is the text file with only 1-word phrases.

Please help.

thank you

Hi,

#include <file.au3>
#include <array.au3>

Global $arphrases
;read text file into array
_FileReadToArray ("c:\mybigtextfile.txt", $arphrases)
;loop over array from last to 1st item
For $i = UBound ($arphrases) - 1 To 1 Step -1
    ;performing a StringSplit
    $temp = StringSplit ($arphrases [$i], ";")
    ;more than 1 element -> delete item in array
    If $temp [0] > 1 Then _ArrayDelete ($arphrases, $i)
Next
;Display result
_ArrayDisplay ($arphrases)
;to save, write back to the original file or to another one
;_FileWriteFromArray ("c:\result.txt", $arphrases, 1)

;-))

Stefan

@edit: Missed the [$i] at line with StringSplit -> corrected

@edit: Rereading the thread: if you want phrase1; phrase2; phrase3 to become only phrase1:

#include <file.au3>
;only needed for array display then...
#include <array.au3>

Global $arphrases
;read text file into array
_FileReadToArray ("c:\mybigtextfile.txt", $arphrases)
;loop over array from last to 1st item
For $i = UBound ($arphrases) - 1 To 1 Step -1
    ;performing a StringSplit
    $temp = StringSplit ($arphrases [$i], ";")
    ;more than 1 element -> keep only the first phrase
    If $temp [0] > 1 Then $arphrases [$i] = $temp [1]
Next
;Display result
_ArrayDisplay ($arphrases)
;to save, write back to the original file or to another one
;_FileWriteFromArray ("c:\result.txt", $arphrases, 1)
Edited by 99ojo

input txt file:

phrase1; phrase2; phrase3

phrase4; phrase 5

phrase6; phrase 7; phrase8

etc...

output txt file:

phrase1; phrase2

phrase4; phrase 5

phrase8

etc...

where the deleted phrases 3, 6 and 7 consist of 2+ words (e.g. "amazing blue car", "fine french wine", etc.; the quotes only mark the examples)

Hi,

Now it's a little bit clearer.

I think this does what you want:

#include <file.au3>
#include <array.au3>

Global $arphrases
;read text file into array
_FileReadToArray ("c:\mybigtextfile.txt", $arphrases)
;loop over array from last to 1st item
For $i = UBound ($arphrases) - 1 To 1 Step -1
    ;StringSplit to get separate phrases
    $temp = StringSplit ($arphrases [$i], ";")
    $string = ""
    ;loop over return array from StringSplit
    For $j = 1 To $temp [0]
        ;strip leading, trailing and doubled spaces (flag 7) so empty elements are not counted as words
        $phrase = StringStripWS ($temp [$j], 7)
        ;StringSplit to get separate words in phrase
        $temp1 = StringSplit ($phrase, " ")
        ;if the phrase has fewer than three words, keep it and 'rebuild' the line
        ;(use < 2 instead to keep only one-word phrases)
        If $temp1 [0] < 3 Then
            $string &= $phrase & "; "
        EndIf
    Next
    ;at least one phrase was kept
    If $string <> "" Then
        ;save string into array, get rid of the trailing "; "
        $arphrases [$i] = StringTrimRight ($string, 2)
    Else
        ;all phrases have 3 or more words -> delete item in array
        _ArrayDelete ($arphrases, $i)
    EndIf
Next
;Display result
_ArrayDisplay ($arphrases)
;to save, write back to the original file or to another one
;_FileWriteFromArray ("c:\result.txt", $arphrases, 1)

;-))

Stefan

@Edit: Did some code corrections... :idea:
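
The per-line filtering above could also be pulled into a small function, which makes the word limit easy to change. This is only a sketch: the function name _FilterPhrases and the parameter $iMaxWords are made up for illustration and are not part of any UDF. It assumes phrases are separated by ";" and words by single spaces.

```autoit
; Hypothetical helper (not from the thread): keep only the phrases of a line
; that have at most $iMaxWords words.
Func _FilterPhrases($sLine, $iMaxWords = 1)
    Local $aPhrases = StringSplit($sLine, ";")
    Local $sKept = ""
    For $j = 1 To $aPhrases[0]
        ; Flag 7 = strip leading, trailing and doubled whitespace
        Local $sPhrase = StringStripWS($aPhrases[$j], 7)
        ; Skip empty entries, e.g. from a trailing ";"
        If $sPhrase = "" Then ContinueLoop
        ; Splitting on a single space puts the word count in element [0]
        Local $aWords = StringSplit($sPhrase, " ")
        If $aWords[0] <= $iMaxWords Then $sKept &= $sPhrase & "; "
    Next
    ; Drop the trailing "; "
    Return StringTrimRight($sKept, 2)
EndFunc

; Prints: red; wine
ConsoleWrite(_FilterPhrases("red; amazing blue car; wine") & @CRLF)
```

With the default of 1 only single-word phrases survive, which matches the first post; passing 2 as the second argument gives the "fewer than three words" behaviour of the script above.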

Edited by 99ojo

Not really,

I need to browse for $arphrases, as I do not know them all and the file is very big.
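
For what it's worth, $arphrases does not have to be known in advance: _FileReadToArray fills it from the file, one element per line. If "browse" means picking the file by hand, a minimal sketch could look like this. The dialog title, the filter and the variable $sPath are illustrative choices, not something from the scripts above.

```autoit
#include <File.au3>

; Let the user pick the input file instead of hard-coding the path.
Global $sPath = FileOpenDialog("Select the phrase file", @ScriptDir, "Text files (*.txt)")
If @error Then Exit ; dialog was cancelled

; _FileReadToArray fills $arphrases itself: one element per line,
; with the number of lines in element [0].
Global $arphrases
If Not _FileReadToArray($sPath, $arphrases) Then Exit ; file could not be read
ConsoleWrite("Lines read: " & $arphrases[0] & @CRLF)
```

After that, the loop from the script above can run on $arphrases unchanged. If the file is too large to hold in memory, reading it line by line with FileOpen and FileReadLine would be an alternative.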
