yton

fetch 2/more - phrases


Greetings,

I have a big text file with data as follows:

phrase1; phrase2; phrase3

phrase4; phrase 5

phrase6; phrase 7; phrase8

etc...

I need to find the phrases of 2 or more words and remove them, so the desired output is the text file with only 1-word phrases.

Please help.

Thank you.


Hi,

#include <file.au3>
#include <array.au3>

;array must be declared before it is passed to _FileReadToArray
Global $arphrases
;read text file into array ($arphrases [0] holds the line count)
_FileReadToArray ("c:\mybigtextfile.txt", $arphrases)
;loop over array from last to 1st item
For $i = UBound ($arphrases) - 1 To 1 Step -1
    ;performing a StringSplit
    $temp = StringSplit ($arphrases [$i], ";")
    ;More elements than 1 -> delete item in array
    If $temp [0] > 1 Then _ArrayDelete ($arphrases, $i)
Next
;Display result
_ArrayDisplay ($arphrases)
;if you want, write into origin file or to another
;_FileWriteFromArray ("c:\result.txt", $arphrases, 1)

;-))

Stefan

@edit: Missed the [$i] at line with StringSplit -> corrected

@edit: rereading the thread: if you want phrase1; phrase2; phrase3 to become only phrase1:

#include <file.au3>
;only needed for array display then...
#include <array.au3>

;array must be declared before it is passed to _FileReadToArray
Global $arphrases
;read text file into array ($arphrases [0] holds the line count)
_FileReadToArray ("c:\mybigtextfile.txt", $arphrases)
;loop over array from last to 1st item
For $i = UBound ($arphrases) - 1 To 1 Step -1
    ;performing a StringSplit
    $temp = StringSplit ($arphrases [$i], ";")
    ;More elements than 1 -> keep only the first phrase
    If $temp [0] > 1 Then $arphrases [$i] = $temp [1]
Next
;Display result
_ArrayDisplay ($arphrases)
;if you want, write into origin file or to another
;_FileWriteFromArray ("c:\result.txt", $arphrases, 1)
Edited by 99ojo


Hi,

please post an example of the input file and the output you expect.

I don't have any idea what you want.

;-))

Stefan


input txt file:

phrase1; phrase2; phrase3

phrase4; phrase 5

phrase6; phrase 7; phrase8

etc...

output txt file:

phrase1; phrase2

phrase4; phrase 5

phrase8

etc...

where the deleted phrase3, phrase6 and phrase7 each consist of 2+ words (e.g. "amazing blue car", "fine french wine" etc.; the quotes are only for illustration)


Hi,

now it's a little bit clearer.

I think this does what you want:

#include <file.au3>
#include <array.au3>

Global $arphrases
;read text file into array
_FileReadToArray ("c:\mybigtextfile.txt", $arphrases)
;loop over array from last to 1st item
For $i = UBound ($arphrases) - 1 To 1 Step -1
    ;StringSplit to get separate phrases
    $temp = StringSplit ($arphrases [$i], ";")
    $string = ""
    ;loop over return array from StringSplit
    For $j = 1 To $temp [0]
        ;strip blanks on both sides (flag 3) so a trailing blank is not counted as an extra word
        $phrase = StringStripWS ($temp [$j], 3)
        ;StringSplit to get separate words in phrase
        $temp1 = StringSplit ($phrase, " ")
        ;if you have less than three words save phrase into string and 'rebuild' phrases
        ConsoleWrite ($phrase & @CRLF)
        If $temp1 [0] < 3 Then
            $string &= $phrase & "; "
        EndIf
    Next
    ;at least one phrase with fewer than 3 words was kept
    If $string <> "" Then
        ;save string into array, get rid of last blank and ;
        $arphrases [$i] = StringTrimRight ($string, 2)
    Else
        ;all phrases with 3 or more words -> delete item in array
        _ArrayDelete ($arphrases, $i)
    EndIf
Next
;Display result
_ArrayDisplay ($arphrases)
;if you want, write into origin file or to another
;_FileWriteFromArray ("c:\result.txt", $arphrases, 1)
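
If only 1-word phrases should survive, as asked in the first post, the word limit can be made a parameter. This is only a rough sketch: _KeepShortPhrases and its $iMaxWords parameter are made-up names for illustration (not part of any UDF), and it assumes that words inside a phrase are separated by blanks.

```autoit
; Hypothetical helper (not from the standard UDFs): keeps only those phrases
; of a ";"-separated line that have at most $iMaxWords words.
Func _KeepShortPhrases($sLine, $iMaxWords = 1)
    Local $aPhrases = StringSplit($sLine, ";")
    Local $sKept = ""
    For $j = 1 To $aPhrases[0]
        ; flag 7 = strip leading (1) + trailing (2) blanks and collapse double blanks (4)
        Local $sPhrase = StringStripWS($aPhrases[$j], 7)
        ; skip empty pieces, e.g. from a blank line or a trailing ";"
        If $sPhrase = "" Then ContinueLoop
        Local $aWords = StringSplit($sPhrase, " ")
        If $aWords[0] <= $iMaxWords Then $sKept &= $sPhrase & "; "
    Next
    ; drop the trailing "; " (an empty result stays empty)
    Return StringTrimRight($sKept, 2)
EndFunc

; Example: the 3-word phrase is dropped, the 1-word phrases stay
ConsoleWrite(_KeepShortPhrases("car; amazing blue car; wine") & @CRLF) ; prints "car; wine"
```

Inside the loop over $arphrases the result would replace the line, and a line for which the helper returns an empty string could be removed with _ArrayDelete as above. Calling it with 2 as the second parameter would keep 2-word phrases as well.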

;-))

Stefan

@Edit: Did some code corrections...

Edited by 99ojo


Not really,

I need to browse for the $arphrases, as I do not know them all because the file is very big.
