yton, posted May 6, 2010

Greetings,

I have a big text file with data as follows:

    phrase1; phrase2; phrase3
    phrase4; phrase 5
    phrase6; phrase 7; phrase8
    etc...

I need to find the phrases of 2 or more words and remove them, so the desired output is the text file with only 1-word phrases.

Please help. Thank you.

99ojo, posted May 6, 2010 (edited May 6, 2010 by 99ojo)

Hi,

    #include <file.au3>
    #include <array.au3>

    Global $arphrases

    ;read text file into array
    _FileReadToArray ("c:\mybigtextfile.txt", $arphrases)

    ;loop over array from last to 1st item
    For $i = UBound ($arphrases) - 1 To 1 Step -1
        ;performing a StringSplit
        $temp = StringSplit ($arphrases [$i], ";")
        ;more than 1 element -> delete item in array
        If $temp [0] > 1 Then
            _ArrayDelete ($arphrases, $i)
        EndIf
    Next

    ;Display result
    _ArrayDisplay ($arphrases)

    ;if you want, write into origin file or to another
    ;_FileWriteFromArray ("c:\result.txt", $arphrases, 1)

;-))
Stefan

@edit: Missed the [$i] in the line with StringSplit -> corrected.

@edit: Rereading the thread: if you want "phrase1; phrase2; phrase3" to become only "phrase1":

    #include <file.au3>
    ;only needed for array display then...
    #include <array.au3>

    Global $arphrases

    ;read text file into array
    _FileReadToArray ("c:\mybigtextfile.txt", $arphrases)

    ;loop over array from last to 1st item
    For $i = UBound ($arphrases) - 1 To 1 Step -1
        ;performing a StringSplit
        $temp = StringSplit ($arphrases [$i], ";")
        ;more than 1 element -> keep only the first phrase
        If $temp [0] > 1 Then
            $arphrases [$i] = $temp [1]
        EndIf
    Next

    ;Display result
    _ArrayDisplay ($arphrases)

    ;if you want, write into origin file or to another
    ;_FileWriteFromArray ("c:\result.txt", $arphrases, 1)

yton, posted May 6, 2010

Well, the question is how to find these 2+ word phrases?

99ojo, posted May 6, 2010

Hi,

please post an example of the input file and the output you expect. I don't have any idea what you want.

;-))
Stefan

yton, posted May 6, 2010

Input txt file:

    phrase1; phrase2; phrase3
    phrase4; phrase 5
    phrase6; phrase 7; phrase8
    etc...

Output txt file:

    phrase1; phrase2
    phrase4; phrase 5
    phrase8
    etc...

The deleted phrases (phrase3, phrase6 and phrase 7) consist of 2+ words, e.g. "amazing blue car", "fine french wine" etc. (the quotes are just for the example).

99ojo, posted May 6, 2010 (edited May 6, 2010 by 99ojo)

Hi,

now it's a little bit clearer. I think this does what you want:

    #include <file.au3>
    #include <array.au3>

    Global $arphrases

    ;read text file into array
    _FileReadToArray ("c:\mybigtextfile.txt", $arphrases)

    ;loop over array from last to 1st item
    For $i = UBound ($arphrases) - 1 To 1 Step -1
        ;StringSplit to get separate phrases
        $temp = StringSplit ($arphrases [$i], ";")
        $string = ""
        ;loop over return array from StringSplit
        For $j = 1 To $temp [0]
            ;strip leading and trailing whitespace (flag 3) so it is not counted as a word
            $phrase = StringStripWS ($temp [$j], 3)
            ;StringSplit to get separate words in phrase
            $temp1 = StringSplit ($phrase, " ")
            ConsoleWrite ($phrase & @CRLF)
            ;if you have less than three words, save phrase into string and 'rebuild' phrases
            If $temp1 [0] < 3 Then
                $string &= $phrase & "; "
            EndIf
        Next
        ;there is at least one phrase with less than three words
        If $string <> "" Then
            ;save string into array, get rid of last blank and ;
            $arphrases [$i] = StringTrimRight ($string, 2)
        Else
            ;all phrases with 3 or more words -> delete item in array
            _ArrayDelete ($arphrases, $i)
        EndIf
    Next

    ;Display result
    _ArrayDisplay ($arphrases)

    ;if you want, write into origin file or to another
    ;_FileWriteFromArray ("c:\result.txt", $arphrases, 1)

;-))
Stefan

@Edit: Did some code corrections...

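The code above keeps every phrase with fewer than three words, while the first post asks for only 1-word phrases. A minimal sketch of the stricter variant follows, assuming that any whitespace inside a trimmed phrase means "2 or more words". The helper name `_KeepSingleWordPhrases` and the sample lines are hypothetical, made up for illustration; they do not come from the thread.

```autoit
; Hypothetical helper (not from the thread): returns the line with every
; phrase of two or more words removed, or "" if no phrase survives.
Func _KeepSingleWordPhrases($sLine)
    Local $aPhrases = StringSplit($sLine, ";")
    Local $sKept = ""
    Local $sPhrase
    For $j = 1 To $aPhrases[0]
        ; strip leading and trailing whitespace (flag 3 = 1 + 2)
        $sPhrase = StringStripWS($aPhrases[$j], 3)
        ; keep only non-empty phrases that contain no inner whitespace
        If $sPhrase <> "" And Not StringRegExp($sPhrase, "\s") Then
            $sKept &= $sPhrase & "; "
        EndIf
    Next
    ; drop the trailing "; "
    Return StringTrimRight($sKept, 2)
EndFunc

; made-up sample lines, only to show the behaviour
Global $aSample[3] = ["red; amazing blue car; wine", "fine french wine", "car"]
For $i = 0 To UBound($aSample) - 1
    ConsoleWrite("[" & _KeepSingleWordPhrases($aSample[$i]) & "]" & @CRLF)
Next
```

In the file loop above, a line for which the helper returns "" would be the one to remove with _ArrayDelete; any other line would be replaced by the returned string.
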
yton, posted May 6, 2010

@99ojo: Not really. I need to browse for $arphrases, as I do not know them all; the file is very big.

yton, posted May 6, 2010 (edited May 6, 2010 by yton)

It's clear to me now, thanks! )