myspacee

Read text file and create a wordlist

13 posts in this topic

Hello to all,

I'm trying to write a script that reads a text file and returns a list of all the words in it.

The final goal is to create a wordlist, sort it, and delete the dupes (I have code for this).

Can anyone help?

m.

Just writing on the fly without testing but this should do it.

#include<array.au3>
$sFile = FileRead("C:\Path\somefile.txt")
$sStr = StringReplace(StringStripCR($sFile), @LF, Chr(32))
$sStr = StringRegExpReplace($sStr, ",|\.|\?|!|:|;", "")
$aWords = StringSplit($sStr, Chr(32), 2)
$aWords = _ArrayUnique($aWords)
_ArrayDisplay($aWords, "Returned Word List")

You might want to throw _ArraySort() in after the _ArrayUnique()
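
A minimal sketch of that sorting step, assuming the array carries its element count in [0] (as StringSplit() returns by default). The sample words are made up for illustration. Starting the sort at index 1 keeps the count from being sorted in among the words.

```autoit
#include <Array.au3>

; Hypothetical sample data; StringSplit() with the default flag stores the
; element count in [0] and the words in [1]..[n].
Local $aWords = StringSplit("pear apple fig", " ")

; Sort ascending, starting at index 1 so the count in [0] is left alone.
_ArraySort($aWords, 0, 1)

_ArrayDisplay($aWords, "Sorted, count still in [0]")
```

If the array is 0-based with no count element, the start index can simply be left at its default of 0.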


George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"


Thank you for the reply, but

I'm using AutoIt 3.2.12.1 and can't find the _ArrayUnique() function.

Am I doing something wrong?

m.

I guess it wasn't in 3.2.12

Here it is

Just add it to your script for now. The next release version of AutoIt will include it.

Func _ArrayUnique($aArray, $iDimension = 1, $iBase = 0, $iCase = 0, $vDelim = "|")
    Local $iUboundDim
    ;$aArray used to be ByRef, but litlmike altered it to allow for the choosing of 1 Array Dimension, without altering the original array
    If $vDelim = "|" Then $vDelim = Chr(01) ; by SmOke_N, modified by litlmike
    If Not IsArray($aArray) Then Return SetError(1, 0, 0) ;Check to see if it is valid array

    ;Checks that the given Dimension is Valid
    If Not $iDimension > 0 Then
        Return SetError(3, 0, 0) ;Check to see if it is valid array dimension, Should be greater than 0
    Else
        ;If Dimension Exists, then get the number of "Rows"
        $iUboundDim = UBound($aArray, 1) ;Get Number of "Rows"
        If @error Then Return SetError(3, 0, 0) ;2 = Array dimension is invalid.

        ;If $iDimension Exists, And the number of "Rows" is Valid:
        If $iDimension > 1 Then ;Makes sure the Array dimension desired is more than 1-dimensional
            Local $aArrayTmp[1] ;Declare blank array, which will hold the dimension declared by user
            For $i = 0 To $iUboundDim - 1 ;Loop through "Rows"
                _ArrayAdd($aArrayTmp, $aArray[$i][$iDimension - 1]) ;$iDimension-1 to match Dimension
            Next
            _ArrayDelete($aArrayTmp, 0) ;Get rid of 1st-element which is blank
        Else ;Makes sure the Array dimension desired is 1-dimensional
            ;If Dimension Exists, And the number of "Rows" is Valid, and the Dimension desired is not > 1, then:
            ;For the Case that the array is 1-Dimensional
            If UBound($aArray, 0) = 1 Then ;Makes sure the Array is only 1-Dimensional
                Dim $aArrayTmp[1] ;Declare blank array, which will hold the dimension declared by user
                For $i = 0 To $iUboundDim - 1
                    _ArrayAdd($aArrayTmp, $aArray[$i])
                Next
                _ArrayDelete($aArrayTmp, 0) ;Get rid of 1st-element which is blank
            Else ;For the Case that the array is 2-Dimensional
                Dim $aArrayTmp[1] ;Declare blank array, which will hold the dimension declared by user
                For $i = 0 To $iUboundDim - 1
                    _ArrayAdd($aArrayTmp, $aArray[$i][$iDimension - 1]) ;$iDimension-1 to match Dimension
                Next
                _ArrayDelete($aArrayTmp, 0) ;Get rid of 1st-element which is blank
            EndIf
        EndIf
    EndIf

    Local $sHold ;String that holds the Unique array info
    For $iCC = $iBase To UBound($aArrayTmp) - 1 ;Loop Through array
        ;If Not the case that the element is already in $sHold, then add it
        If Not StringInStr($vDelim & $sHold, $vDelim & $aArrayTmp[$iCC] & $vDelim, $iCase) Then _
                $sHold &= $aArrayTmp[$iCC] & $vDelim
    Next
    If $sHold Then
        $aArrayTmp = StringSplit(StringTrimRight($sHold, StringLen($vDelim)), $vDelim, 1) ;Split the string into an array
        Return $aArrayTmp ;SmOke_N's version used to Return SetError(0, 0, 0)
    EndIf
    Return SetError(2, 0, 0) ;If the script gets this far, it has failed
EndFunc   ;==>_ArrayUnique
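
For anyone wondering how the function removes duplicates: it builds one long string of delimiter-separated words and only appends a word when delimiter & word & delimiter is not already in that string. Below is a stripped-down sketch of just that idea for a 1-dimensional array. The name _UniqueSketch and the sample words are hypothetical, and this is not the UDF itself.

```autoit
; Minimal sketch of the delimiter-string technique; _UniqueSketch is a
; hypothetical name, not part of Array.au3.
Func _UniqueSketch($aIn)
    Local $sDelim = Chr(1) ; assumed never to occur inside a word
    Local $sHold = $sDelim ; grows into Chr(1) & word & Chr(1) & word & Chr(1) ...
    For $i = 0 To UBound($aIn) - 1
        If StringLen($aIn[$i]) = 0 Then ContinueLoop ; skip empty entries
        ; A word is new when delimiter & word & delimiter is not yet in $sHold.
        ; StringInStr() is case-insensitive by default, so "The" counts as a dupe of "the".
        If Not StringInStr($sHold, $sDelim & $aIn[$i] & $sDelim) Then $sHold &= $aIn[$i] & $sDelim
    Next
    If $sHold = $sDelim Then Return SetError(1, 0, 0) ; no words at all
    ; Trim the leading and trailing delimiter, then split; the count lands in [0].
    Return StringSplit(StringTrimLeft(StringTrimRight($sHold, 1), 1), $sDelim)
EndFunc   ;==>_UniqueSketch

Local $aIn[5] = ["the", "cat", "The", "cat", "sat"]
Local $aOut = _UniqueSketch($aIn)
ConsoleWrite($aOut[0] & " unique words" & @CRLF)
```

The wrapping delimiters are what stop "cat" from being treated as already present just because "category" is in the string.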


George


Thank you!

In the meantime I found some code and wrote this:

#include <GuiConstantsEx.au3>
#include <String.au3>
#include <Array.au3>


$file = FileOpen("test.txt", 0)

; Check if file opened for reading OK
If $file = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
Else
; Read the whole file into one string

        $chars = FileRead($file)
        $chars = StringReplace($chars, "  ", " ")
        $chars = StringReplace($chars, ",", "")
        $chars = StringReplace($chars, ".", "")
        $chars = StringReplace($chars, "-", "")
        $chars = StringReplace($chars, "!", "")
        $chars = StringReplace($chars, "£", "")
        
        $chars = StringReplace($chars, "$", "")
        $chars = StringReplace($chars, "%", "")
        $chars = StringReplace($chars, "&", "")
        $chars = StringReplace($chars, "/", "")
        $chars = StringReplace($chars, "(", "")
        
        $chars = StringReplace($chars, ")", "")
        $chars = StringReplace($chars, "=", "")
        $chars = StringReplace($chars, "?", "")
        $chars = StringReplace($chars, "^", "")
        $chars = StringReplace($chars, "(", "")
        
        $chars = StringReplace($chars, "[", "")
        $chars = StringReplace($chars, "]", "")
        $chars = StringReplace($chars, "@", "")
        $chars = StringReplace($chars, "#", "")
        $chars = StringReplace($chars, "§", "")
        
        $chars = StringReplace($chars, ";", "")
        $chars = StringReplace($chars, ":", "")
        $chars = StringReplace($chars, "_", "")
        $chars = StringReplace($chars, "-", "")
        $chars = StringReplace($chars, "+", "")
        
        $chars = StringReplace($chars, "*", "")
        $chars = StringReplace($chars, chr(34), "")
        $chars = StringReplace($chars, "'", "")
;~      $chars = StringReplace($chars, "", "")
;~      $chars = StringReplace($chars, "+", "")

        $chars = StringReplace($chars, "0", "")
        $chars = StringReplace($chars, "1", "")
        $chars = StringReplace($chars, "2", "")
        $chars = StringReplace($chars, "3", "")
        $chars = StringReplace($chars, "4", "")
        
        $chars = StringReplace($chars, "5", "")
        $chars = StringReplace($chars, "6", "")
        $chars = StringReplace($chars, "7", "")
        $chars = StringReplace($chars, "8", "")
        $chars = StringReplace($chars, "9", "")


        $array = StringSplit($chars, " ")
        ; Sort from index 1 so the element count in [0] stays in place
        _ArraySort($array, 0, 1)
        _ArrayDisplay($array, "BEFORE : with dupes")

        $mynewarray = dupecheckerthingy($array)
        _ArrayDisplay($mynewarray, "AFTER : without dupes")

EndIf

FileClose($file)

Func dupecheckerthingy($showmethearray)
    Local $tmparray[1] = ['']
    For $i = 1 To UBound($showmethearray) - 1
        _ArraySearch($tmparray, $showmethearray[$i])
        If @error Then _ArrayAdd($tmparray, $showmethearray[$i])
    Next
    $tmparray[0] = 'Number of elements = ' & UBound($tmparray) - 1
    Return $tmparray
EndFunc

Now I need help writing the array to a text file as a wordlist.

Can you help?

m.


#6 ·  Posted (edited)

That's essentially the long way of doing what I gave you, except that yours also strips digits and more punctuation, which I have added to this version. The single StringRegExpReplace() line takes care of all those $chars = StringReplace() lines that you have.

#include <Array.au3>
$sFile = FileRead("C:\Path\somefile.txt")
;; Strip the @CRs and replace @LFs with spaces
$sStr = StringReplace(StringStripCR($sFile), @LF, Chr(32))
;; Remove punctuation (except ') and digits.
;; This one line does the same as all the StringReplace() lines in the other code.
;; A character class is used so that *, +, ( and ) need no escaping.
$sStr = StringRegExpReplace($sStr, "[\d\x22~`@#%^&*()=/\[\]{}<>\\+,.?!:;|]", "")
;; Collapse runs of spaces so the split does not produce empty entries
$sStr = StringStripWS($sStr, 7)
;; Split the text on the spaces to a 0-based array
$aWords = StringSplit($sStr, Chr(32), 2)
;; Remove all but one occurrence of each word (the result has its count in [0])
$aWords = _ArrayUnique($aWords)
;; Sort the list (array), starting at index 1 to leave the count in place
_ArraySort($aWords, 0, 1)
;; Write the words to a file
$hOut = FileOpen(@DesktopDir & "\Word_List.txt", 2)
For $i = 1 To UBound($aWords) - 1
    FileWriteLine($hOut, $aWords[$i])
Next
FileClose($hOut)
;; View the file
ShellExecute(@DesktopDir & "\Word_List.txt")
Edited by GEOSoft

George


I'm using the _ArrayUnique() function but not your code, because it returns this as output:

(using the first page of the Bible as a test)

1
2

Then I applied the function to my own code, with a nice result:

#include <GuiConstantsEx.au3>
#include <String.au3>
#include <Array.au3>


$file = FileOpen("test.txt", 0)

; Check if file opened for reading OK
If $file = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
Else
; Read the whole file into one string

    $chars = FileRead($file)
;~  $chars = StringRegExpReplace($chars, ",|.|-|!|£|$|%|&|/|(|)|=|?|^|[|]|@|#|§|;|:|_|\|-|+|*|~|0|1|2|3|4|5|6|7|8|9|\.|\?", "")
;~  $chars = StringReplace($chars, "  ", " ")
    $chars = StringReplace($chars, ",", "")
    $chars = StringReplace($chars, ".", "")
    $chars = StringReplace($chars, "-", "")
    $chars = StringReplace($chars, "!", "")
    $chars = StringReplace($chars, "£", "")
    
    $chars = StringReplace($chars, "$", "")
    $chars = StringReplace($chars, "%", "")
    $chars = StringReplace($chars, "&", "")
    $chars = StringReplace($chars, "/", "")
    $chars = StringReplace($chars, "(", "")
    
    $chars = StringReplace($chars, ")", "")
    $chars = StringReplace($chars, "=", "")
    $chars = StringReplace($chars, "?", "")
    $chars = StringReplace($chars, "^", "")
    $chars = StringReplace($chars, "(", "")
    
    $chars = StringReplace($chars, "[", "")
    $chars = StringReplace($chars, "]", "")
    $chars = StringReplace($chars, "@", "")
    $chars = StringReplace($chars, "#", "")
    $chars = StringReplace($chars, "§", "")
    
    $chars = StringReplace($chars, ";", "")
    $chars = StringReplace($chars, ":", "")
    $chars = StringReplace($chars, "_", "")
    $chars = StringReplace($chars, "-", "")
    $chars = StringReplace($chars, "+", "")
    
    $chars = StringReplace($chars, "*", "")
    $chars = StringReplace($chars, chr(34), "")
    $chars = StringReplace($chars, "'", "")
;~      $chars = StringReplace($chars, "", "")
;~      $chars = StringReplace($chars, "+", "")

    $chars = StringReplace($chars, "0", "")
    $chars = StringReplace($chars, "1", "")
    $chars = StringReplace($chars, "2", "")
    $chars = StringReplace($chars, "3", "")
    $chars = StringReplace($chars, "4", "")
    
    $chars = StringReplace($chars, "5", "")
    $chars = StringReplace($chars, "6", "")
    $chars = StringReplace($chars, "7", "")
    $chars = StringReplace($chars, "8", "")
    $chars = StringReplace($chars, "9", "")
;-------------------------------------------------------- order in array
    $array = StringSplit($chars, " ")
    ; Sort from index 1 so the element count in [0] stays in place
    _ArraySort($array, 0, 1)
;~  _ArrayDisplay($array, "BEFORE : with dupes")
;-------------------------------------------------------- delete dupes
    ; $iBase = 1 skips the count in [0]; the result gets a fresh count in [0]
    $array = _ArrayUnique($array, 1, 1)
    _ArraySort($array, 0, 1)
;~  _ArrayRemoveBlanks($Array)
    _ArrayDisplay($array, "Returned Word List")
;-------------------------------------------------------- write wordlist
FileDelete("test_sorted.txt")
$file_sorted = FileOpen("test_sorted.txt", 2)
for $i = 1 to UBound($array) - 1
;~  if $array[$i] <> "" then;or $array[$i] <> @CRLF then;or $array[$i] <> @CR or $array[$i] <> @LF Then
        FileWrite($file_sorted, $array[$i] & @CRLF)
;~  EndIf
next


EndIf

FileClose($file)
FileClose($file_sorted)


; Removes elements that contain only whitespace characters and returns the new array.
; The count of the returned array is at $aRet[0].
Func _ArrayRemoveBlanks($aID)
    Local $sTmp = ''
    For $i = 0 to Ubound($aID) -1
        If StringRegExpReplace($aID[$i], "\s", "") Then $sTmp &= $aID[$i] & Chr(0)
    Next
    Return StringSplit(StringTrimRight($sTmp, 1), Chr(0))
EndFunc


; Paste _ArrayUnique() here, exactly as posted earlier in this thread.

But I get a lot of blank lines in the text file. Is there any way to delete the empty lines from it?

Thank you for the help,

m.
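
One way to avoid the empty entries in the first place, sketched under the assumption that they come from doubled spaces left behind once the punctuation is stripped: collapse the whitespace with StringStripWS() before calling StringSplit(). The sample text is made up.

```autoit
; Hypothetical input with doubled spaces and a trailing space.
Local $sText = "in  the   beginning "

; Flag 7 = 1 (strip leading) + 2 (strip trailing) + 4 (collapse runs of spaces).
$sText = StringStripWS($sText, 7)

; No doubled spaces remain, so the split yields no empty elements.
Local $aWords = StringSplit($sText, " ")
ConsoleWrite($aWords[0] & " words, no empty entries" & @CRLF)
```

Line breaks still need to be turned into spaces first (for example with StringStripCR() and StringReplace() on @LF), otherwise two words on neighbouring lines stay glued together.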


I've updated the script; it now deletes blank lines as well.

Copy and paste a big text into the file "test.txt", run the script, and it creates your wordlist.

I found some imprecision in the new _ArrayUnique() function planned for the next AutoIt release:

dupes at the head and bottom of the list.

Can someone teach me to use StringRegExpReplace()? I can't figure out how to manage a big list of bad chars...

#include <GuiConstantsEx.au3>
#include <String.au3>
#include <Array.au3>


$file = FileOpen("test.txt", 0)

; Check if file opened for reading OK
If $file = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
Else
; Read the whole file into one string

    $chars = FileRead($file)

    $chars = StringReplace($chars, ",", "")
    $chars = StringReplace($chars, ".", "")
    $chars = StringReplace($chars, "-", "")
    $chars = StringReplace($chars, "!", "")
    $chars = StringReplace($chars, "£", "")

    $chars = StringReplace($chars, "$", "")
    $chars = StringReplace($chars, "%", "")
    $chars = StringReplace($chars, "&", "")
    $chars = StringReplace($chars, "/", "")
    $chars = StringReplace($chars, "(", "")
    
    $chars = StringReplace($chars, ")", "")
    $chars = StringReplace($chars, "=", "")
    $chars = StringReplace($chars, "?", "")
    $chars = StringReplace($chars, "^", "")
    $chars = StringReplace($chars, "(", "")
    
    $chars = StringReplace($chars, "[", "")
    $chars = StringReplace($chars, "]", "")
    $chars = StringReplace($chars, "@", "")
    $chars = StringReplace($chars, "#", "")
    $chars = StringReplace($chars, "§", "")
    
    $chars = StringReplace($chars, ";", "")
    $chars = StringReplace($chars, ":", "")
    $chars = StringReplace($chars, "_", "")
    $chars = StringReplace($chars, "-", "")
    $chars = StringReplace($chars, "+", "")
    
    $chars = StringReplace($chars, "*", "")
    $chars = StringReplace($chars, chr(34), "")
    $chars = StringReplace($chars, "'", "")
    $chars = StringReplace($chars, "\", "")
    $chars = StringReplace($chars, "<", "")
    $chars = StringReplace($chars, ">", "")
    $chars = StringReplace($chars, "»", "")
    $chars = StringReplace($chars, "©", "")
    ; (More lines of this kind followed here; the non-ASCII symbols they stripped did not survive in this copy of the post.)
    $chars = StringReplace($chars, "¡", "")
    $chars = StringReplace($chars, "¢", "")
    $chars = StringReplace($chars, "£", "")
    $chars = StringReplace($chars, "¤", "")
    $chars = StringReplace($chars, "¥", "")
    $chars = StringReplace($chars, "¦", "")
    $chars = StringReplace($chars, "§", "")
    $chars = StringReplace($chars, "¨", "")
    $chars = StringReplace($chars, "©", "")
    $chars = StringReplace($chars, "ª", "")
    $chars = StringReplace($chars, "«", "")
    $chars = StringReplace($chars, "¬", "")
    $chars = StringReplace($chars, "­", "")
    $chars = StringReplace($chars, "®", "")
    $chars = StringReplace($chars, "¯", "")
    $chars = StringReplace($chars, "°", "")
    $chars = StringReplace($chars, "±", "")
    $chars = StringReplace($chars, "²", "")
    $chars = StringReplace($chars, "³", "")
    $chars = StringReplace($chars, "´", "")
    $chars = StringReplace($chars, "µ", "")
    $chars = StringReplace($chars, "¶", "")
    $chars = StringReplace($chars, "·", "")
    $chars = StringReplace($chars, "¸", "")
    $chars = StringReplace($chars, "¹", "")
    $chars = StringReplace($chars, "º", "")
    $chars = StringReplace($chars, "»", "")
    $chars = StringReplace($chars, "¼", "")
    $chars = StringReplace($chars, "½", "")
    $chars = StringReplace($chars, "¾", "")
    $chars = StringReplace($chars, "¿", "")
    $chars = StringReplace($chars, "×", "")
    $chars = StringReplace($chars, "÷", "")
    
    $chars = StringReplace($chars, "0", "")
    $chars = StringReplace($chars, "1", "")
    $chars = StringReplace($chars, "2", "")
    $chars = StringReplace($chars, "3", "")
    $chars = StringReplace($chars, "4", "")
    $chars = StringReplace($chars, "5", "")
    $chars = StringReplace($chars, "6", "")
    $chars = StringReplace($chars, "7", "")
    $chars = StringReplace($chars, "8", "")
    $chars = StringReplace($chars, "9", "")
;-------------------------------------------------------- create array
    ToolTip("Create array")
    $array = StringSplit($chars, " ")
;~  _ArraySort($Array, 0, 0, 0, 0)
;~  _ArrayDisplay($array, "BEFORE : with dupes")
;-------------------------------------------------------- order in array - delete dupes
    ToolTip("delete dupes - sorting...")
    ; Sort from index 1 so the element count in [0] stays in place
    _ArraySort($array, 0, 1)
    
    ToolTip("delete dupes")
    ; $iBase = 1 skips the count in [0]; the result gets a fresh count in [0]
    $array = _ArrayUnique($array, 1, 1)
    
    ToolTip("delete dupes - sorting...")
    _ArraySort($array, 0, 1)
;~  _ArrayRemoveBlanks($Array)
;~  _ArrayDisplay($array, "Returned Word List")
;-------------------------------------------------------- write wordlist
    ToolTip("write wordlist")
    FileDelete("test_sorted.txt")
    $file_sorted = FileOpen("test_sorted.txt", 2)
    
    for $i = 1 to UBound($array) - 1
        FileWrite($file_sorted, $array[$i] & @CRLF)
    next
    
    FileClose($file)
    FileClose($file_sorted)
;-------------------------------------------------------- clean blank lines
ToolTip("clean blank lines")
    $file = FileOpen("test_sorted.txt", 0)

    FileDelete("Wordlist_from_text.txt")
    $file_sorted = FileOpen("Wordlist_from_text.txt", 2)

; Read in lines of text until the EOF is reached
    While 1
        $line = FileReadLine($file)
        If @error = -1 Then ExitLoop
        $line1 = StringStripWS($line,1)
        $line1 = StringStripCR($line1)

        $line1 = StringReplace($line1, "|", "")
        $line1 = StringReplace($line1, "0", "")
        $line1 = StringReplace($line1, "1", "")
        $line1 = StringReplace($line1, "2", "")
        $line1 = StringReplace($line1, "3", "")
        $line1 = StringReplace($line1, "4", "")
        $line1 = StringReplace($line1, "5", "")
        $line1 = StringReplace($line1, "6", "")
        $line1 = StringReplace($line1, "7", "")
        $line1 = StringReplace($line1, "8", "")
        $line1 = StringReplace($line1, "9", "")

        if $line1 <> "" Then
            FileWrite($file_sorted, $line1 & @CRLF)
        EndIf
    Wend


EndIf

ToolTip("")
FileClose($file)
FileClose($file_sorted)
FileDelete("test_sorted.txt")

;~      $mynewarray = dupecheckerthingy($array)
;~      _ArrayDisplay($mynewarray, "AFTER : without dupes")

;~ Func dupecheckerthingy($showmethearray)
;~   Local $tmparray[1] = ['']
;~   For $i = 1 To UBound($showmethearray) - 1
;~       _ArraySearch($tmparray, $showmethearray[$i])
;~       If @error Then _ArrayAdd($tmparray, $showmethearray[$i])
;~   Next
;~   $tmparray[0] = 'Number of elements = ' & UBound($tmparray) - 1
;~   Return $tmparray
;~ EndFunc



; Paste _ArrayUnique() here, exactly as posted earlier in this thread.


I already gave you one that gets most of them. Give me a list of the characters that you need replaced and I'll work on it some more.


George


I ask because I can't figure out how this function works:

$sStr = FileRead($file)
    $chars = StringRegExpReplace($sStr, "\d|\x22|,|.|-|!|£|$|%|&|/|(|)|=|?|^|(|[|]|@|#|§|;|:|_|-|+|*|chr(34)|'||\|\|<|>|»|||»|©||||?|||||||||||?|?|?|||||||||||||?|¡|¢|£|¤|¥|¦|§|¨|©|ª|«|¬|®|¯|°|±|²|³|´|µ|¶|·|¸|¹|º|»|¼|½|¾|¿|×|÷|", "")

thank you,

m.

(sorry for dupes)

In a RegExp, the "|" symbol means OR and \d means any digit. I used \x22 in place of Chr(34) (the double quote); a RegExp cannot contain the literal text "Chr(34)", although a doubled quote ("") should also have worked. It's not a good idea to remove Chr(39) (the single quote) because of words like I'm, you're, etc. Note that characters such as . ? * + ( ) [ ] ^ $ and \ have a special meaning in a RegExp, so outside a character class they must be escaped with a backslash; an unescaped "." matches every character. So what the line does is replace each of the listed characters with an empty string. Here it is with your Chr(34) and the single quote removed, written as a character class ([...]) so that only -, [, ] and \ need escaping:

$chars = StringRegExpReplace($sStr, "[\d\x22,.\-!£$%&/()=?^\[\]@#§;:_+*\\<>»©¡¢¤¥¦¨ª«¬®¯°±²³´µ¶·¸¹º¼½¾¿×÷]", "")

If you must remove the single quotes, then I'll write another short one that you can use to pre-process the text.
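
To make the difference between a long a|b|c alternation and a character class concrete, here is a small sketch. The sample sentence and the exact set of characters are assumptions chosen for illustration, not the poster's full list.

```autoit
; Hypothetical sample text.
Local $sText = "Hello, world! (It's 2008) a+b*c = 100%?"

; One character class instead of a long alternation. Inside [...] most
; metacharacters are literal: here [ ] and \ are escaped, ^ is not first,
; and - is placed last so it is taken literally. \d matches any digit and
; \x22 is the double quote. The apostrophe is deliberately not listed.
Local $sClean = StringRegExpReplace($sText, "[\d\x22,.!?:;()\[\]{}<>=+*/\\%&@#$^_|~-]", "")

; Collapse the gaps left behind by the removed characters.
$sClean = StringStripWS($sClean, 7)
ConsoleWrite($sClean & @CRLF)
```

Written as an alternation, the same pattern would need a backslash in front of every one of . ? * + ( ) [ ] ^ $ and |, which is easy to get wrong; an unescaped * or + directly after a | makes the whole pattern invalid.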


George


#12 ·  Posted (edited)

This seems to generate a word list ok.

#include<array.au3>

$sPath = "C:\Path\somefile.txt" ; <== Enter text file here =====
$sFile = FileRead($sPath) ; given a file name, FileRead() opens and closes the file itself
$sStr = StringReplace(StringStripCR($sFile), @LF, Chr(32))
;ConsoleWrite($sStr & @CRLF)

; Replace with a space: anything that is not a word character (a-z, A-Z, 0-9, _), plus any digit and any underscore.
$sStr = StringRegExpReplace($sStr, "[^\w]|\d|_", Chr(32))
$sStr = StringStripWS($sStr, 7) ; remove all double white spaces.
;ConsoleWrite($sStr & @CRLF)
$aWords = StringSplit($sStr, Chr(32), 0)

$Numbers = "," ; Add comma to start of string so the RegExp can match the 1st word
For $i = 1 To UBound($aWords) - 1
    If StringRegExp($Numbers, "," & $aWords[$i] & ",") = 0 And StringStripWS($aWords[$i], 8) <> "" And _
            StringLen($aWords[$i]) > 2 Then $Numbers &= $aWords[$i] & ","
Next

$Numbers = StringTrimLeft(StringTrimRight($Numbers, 1), 1) ; Remove 1st and last commas.

$Array = StringSplit($Numbers, ",", 0)
ConsoleWrite("Number of words = " & $Array[0] & @CRLF)
_ArrayDelete($Array, 0) ;
_ArraySort($Array, 0, 0)

_ArrayDisplay($Array, "Returned Word List")

It took under 20 seconds to produce a 2080-word list from a 258 KB text file.

Edit: Using this in the For...Next loop instead returns all words in lower case:

If StringRegExp($Numbers, "," & StringLower($aWords[$i]) & ",") = 0 And StringStripWS($aWords[$i], 8) <> "" And _
        StringLen($aWords[$i]) > 2 Then $Numbers &= StringLower($aWords[$i]) & ","

That reduced the word count to 1760.

I noticed that "New Zealand" was split into two words, both lower case.
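
For larger files, a different technique from the string search used above may be worth trying: the Windows Scripting.Dictionary COM object, which looks words up by key and can compare them case-insensitively. This is only a sketch under the assumption that the COM object is available on the machine; the sample words are made up, and nothing in this thread used it.

```autoit
; Sketch using the Windows Scripting.Dictionary COM object (an alternative to
; the string-search approach, not what the scripts in this thread do).
Local $oDict = ObjCreate("Scripting.Dictionary")
If Not IsObj($oDict) Then Exit MsgBox(16, "Error", "Scripting.Dictionary is not available.")
$oDict.CompareMode = 1 ; 1 = text compare, so "Zealand" and "zealand" are the same key

Local $aWords = StringSplit("New Zealand new zealand Wellington", " ")
For $i = 1 To $aWords[0]
    ; Keep the first spelling seen for each word; later case variants are skipped.
    If Not $oDict.Exists($aWords[$i]) Then $oDict.Add($aWords[$i], 1)
Next

Local $aUnique = $oDict.Keys() ; 0-based array of the unique words
ConsoleWrite(UBound($aUnique) & " unique words" & @CRLF)
```

Because the first spelling is kept as the key, proper nouns such as "Zealand" keep their capital letter instead of being forced to lower case. Multi-word names are still split on the space.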

Edited by Malkey


Apologies if the variable name $Numbers in the previous post caused confusion: I copied and modified the script from

http://www.autoitscript.com/forum/index.ph...st&p=609527

Here is the same script as above, modified so that AutoIt variables (words with a leading $) can also be added to the word list.

#include<array.au3>

$sPath = "C:\Path\somefile.au3" ; <== Enter .au3 file here =====
$sFile = FileRead($sPath) ; given a file name, FileRead() opens and closes the file itself
$sStr = StringReplace(StringStripCR($sFile), @LF, Chr(32))
ConsoleWrite($sStr & @CRLF)

; Replace with a space: anything that is not a word character or $, plus any digit and any underscore.
$sStr = StringRegExpReplace($sStr, "[^\w$]|\d|_", Chr(32))
$sStr = StringStripWS($sStr, 7) ; remove all double white spaces.
ConsoleWrite($sStr & @CRLF)
$aWords = StringSplit($sStr, Chr(32), 0)

$sString = "," ; Add comma to start of string so the RegExp can match the 1st word
For $i = 1 To UBound($aWords) - 1
    If StringLeft($aWords[$i], 1) = "$" And StringRegExp($sString, ",\" & $aWords[$i] & ",") = 0 And _ ; Allows preceeding $ sign.
                                StringStripWS($aWords[$i], 8) <> "" And StringLen($aWords[$i]) > 2 Then
        $sString &= $aWords[$i] & ","
    ElseIf StringLeft($aWords[$i], 1) <> "$" And StringRegExp($sString, "," & StringLower($aWords[$i]) & ",") = 0 And _
                                                StringStripWS($aWords[$i], 8) <> "" And StringLen($aWords[$i]) > 2 Then
        $sString &= StringLower($aWords[$i]) & ","
    EndIf
Next

$sString = StringTrimLeft(StringTrimRight($sString, 1), 1) ; Remove 1st and last commas.
$Array = StringSplit($sString, ",", 0)
ConsoleWrite("Number of words = " & $Array[0] & @CRLF)
_ArrayDelete($Array, 0) ;
_ArraySort($Array, 0, 0)

_ArrayDisplay($Array, "Returned Word List")
