Best method for replacing a lot of different words in a string?

WoodGrain · August 19, 2019

Hi All,

I'm just working through some logic. If I have a comma-separated string read from a file that contains, say, 500 unique words, and I want to look up some of those words and replace each with another unique word when it appears in a replacement list, what would be the best/fastest way of doing this? The replacement list might have, say, 250 entries, but an entry would only be swapped in if its word is in the line that was read.

I'm assuming running StringReplace or StringRegExpReplace across them all is not necessarily the best option? I was thinking about using an array?

Any direction would be much appreciated!

I don't have any code to share yet, and this example is a bad one because it's easy to picture it with duplicates, but an example might be:

Local $readFileLine = "doberman, spots, labrador, poodle, white, grey, small, pug"
$readFileLine = StringReplace($readFileLine, "spots", "dalmatian")
$readFileLine = StringReplace($readFileLine, "grey", "wolf")
;etc

The array option I had in mind was along the lines of exploding the string from the file into an array, then for each element in the array lookup against a 2nd array using _ArraySearch and replacing it with the corresponding value in the array, using the above example:

#include <String.au3> ; provides _StringExplode
Local $readFileLine = "doberman, spots, labrador, poodle, white, grey, small, pug"
; Split on comma + space so the elements carry no leading blank
Local $readFileArray = _StringExplode($readFileLine, ", ")
Local $compareArray[2][2] = [["spots", "dalmatian"], ["grey", "wolf"]]
;Then loop through $readFileArray elements and compare each element to all the 1st elements in $compareArray and if there's a match replace $readFileArray element with the 2nd element in $compareArray
; I hope that makes sense, it's late..
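
For readers who want to see the loop described in the comment above spelled out, here is a minimal sketch. It assumes the standard Array.au3 and String.au3 UDFs; the variable names ($aWords, $aCompare, $sResult) are illustrative and not from the original post.

```autoit
#include <Array.au3>  ; _ArraySearch, _ArrayToString
#include <String.au3> ; _StringExplode

Local $sLine = "doberman, spots, labrador, poodle, white, grey, small, pug"
Local $aWords = _StringExplode($sLine, ", ")
Local $aCompare[2][2] = [["spots", "dalmatian"], ["grey", "wolf"]]
Local $iHit

For $i = 0 To UBound($aWords) - 1
    ; Search only column 0 of $aCompare (last argument) for the current word
    $iHit = _ArraySearch($aCompare, $aWords[$i], 0, 0, 0, 0, 1, 0)
    ; _ArraySearch returns -1 when nothing is found
    If $iHit <> -1 Then $aWords[$i] = $aCompare[$iHit][1]
Next

; Join the words back into a single comma separated line
Local $sResult = _ArrayToString($aWords, ", ")
ConsoleWrite($sResult & @CRLF)
```

Note that this does one linear search of the replacement list per word, so the cost grows with (words in line) x (entries in list).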

Thanks!

seadoggie01 · August 19, 2019

That's exactly how I would do it... I can't imagine there is a much better option, as you'll need to check that the string is in the file before replacing it, and that's what StringReplace does. Though if you have a pattern that could match multiple values, you could use StringRegExpReplace, and that could be faster. I guess it really depends on your data.

Gianni · August 19, 2019

I don't think it's necessary to explode the string $readFileLine into an array; on the contrary, working directly on the string may even be faster.
You can proceed in a way similar to this for example,

$readFileLine = "doberman, spots, labrador, poodle, white, grey, small, pug"

; replacements [find][replace]
Local $compareArray[2][2] = [["spots", "dalmatian"],["grey", "wolf"]]

ConsoleWrite("Before replacements: " & $readFileLine & @CRLF)

For $i = 0 To UBound($compareArray) - 1
    $readFileLine = StringReplace($readFileLine, $compareArray[$i][0], $compareArray[$i][1])
Next

ConsoleWrite("After  replacements: " & $readFileLine & @CRLF)

anyway .... there is probably some soulful cat around here, or someone else from his regexp boss gang, that, with a regular expression, can do everything in just one shot...
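
One thing to watch with the plain StringReplace loop above: it replaces substrings, so "grey" inside "greyhound" would also be swapped. A hedged variant of the same loop, restricted to whole words, might look like the sketch below. It assumes the search words and replacements contain no regexp-special sequences beyond what \Q...\E can quote (in particular no "$" or "\" in the replacement text), and note that StringRegExpReplace is case-sensitive by default whereas StringReplace is not.

```autoit
Local $sLine = "doberman, spots, grey, greyhound"
Local $aCompare[2][2] = [["spots", "dalmatian"], ["grey", "wolf"]]

For $i = 0 To UBound($aCompare) - 1
    ; \Q...\E treats the search word literally; \b limits the match to whole words
    $sLine = StringRegExpReplace($sLine, "\b\Q" & $aCompare[$i][0] & "\E\b", $aCompare[$i][1])
Next

ConsoleWrite($sLine & @CRLF) ; doberman, dalmatian, wolf, greyhound
```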

Edited August 19, 2019 by Chimp

FrancescoDiMuro · August 19, 2019

@WoodGrain

Post a more complete example, please

jchd · August 19, 2019

Here you are:

; given this:
Local $sIn = "doberman, spots, labrador, poodle, white, grey, small, pug, greyhound"
; and this (unique replacements)
Local $aRepl = [ _
    ["spots", "dalmatian"], _
    ["chinook", "robust"], _
    ["chihuahua", "minuscule"], _
    ["grey", "wolf"], _
    ["poodle", "very big"], _
    ["husky", "very cold there"] _
]

; build that:
Local $oRepl = ObjCreate("Scripting.Dictionary")
For $i = 0 To UBound($aRepl) - 1
    $oRepl.add($aRepl[$i][0], $aRepl[$i][1])
Next

; apply recipe
Local $sOut = Execute("'" & StringRegExpReplace($sIn, "(\b\w+\b)", "' & ($oRepl.exists('$1') ? $oRepl.item('$1') : '$1') & '") & "'")
ConsoleWrite($sOut & @LF)

WoodGrain · August 20, 2019

19 hours ago, Chimp said:
I don't think it's necessary to explode the string $readFileLine in an array, on the contrary, maybe working directly on the string is even faster.

Ah, interesting. So the variable structure should be whatever best suits the data it's holding. In this case, an array is the best fit for the comparison list, both because it holds multiple related elements and because it's easy to iterate through its data with a loop, while the string doesn't need to become an array for its contents to be modified (assuming even up to 500 words would make little performance difference between a string and a 2nd array).

Thanks

12 hours ago, jchd said:

Here you are:

Wow, ok, let me just say, that looks impressive. Bear with me as I work through this, as this is my first look at ObjCreate. The test and pattern in the StringRegExpReplace make sense, but I'll need to work through the replace... *opens AutoIt help file* (can I just take a moment to say how awesome the help file is with AutoIt, you guys rock!)

Thanks! I love being challenged with new ways of coding like this.

jchd · August 20, 2019

A Scripting.Dictionary is an associative array object (a collection of key-value pairs) natively supported by Windows. Looking up a key to find its associated item value is very fast. Building a Scripting.Dictionary and feeding it key-value pairs is often faster than manipulating AutoIt arrays through loops (although arrays have their own uses as well).
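
As a small illustration of the lookup described here, the sketch below builds a dictionary and queries it. The variable names ($oDict, $sHit, $bMiss) are illustrative only; Add, Exists and Item are standard Scripting.Dictionary methods.

```autoit
Local $oDict = ObjCreate("Scripting.Dictionary")
$oDict.Add("spots", "dalmatian")
$oDict.Add("grey", "wolf")

; Exists() tests for a key, Item() returns the value stored under it
Local $sHit = $oDict.Exists("grey") ? $oDict.Item("grey") : "(no entry)"
Local $bMiss = $oDict.Exists("greyhound")

ConsoleWrite("grey -> " & $sHit & @LF)
ConsoleWrite("greyhound present: " & $bMiss & @LF)
```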

Once we've done that part, the rest of the job is essentially a replacement of words in input by either themselves or the alternative word stored in the dictionary, when key is present.

(\b\w+\b)

is the regexp pattern to capture a word (\b stands for "word boundary", see StringRegExp help). When this word is present as a key in the dictionary, we replace the word by the stored alternative value, else we leave it in place. We do that using the ternary operator <test> ? <true branch> : <false branch>

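The surrounding quotes and & operators turn the output of StringRegExpReplace into one long AutoIt string expression, and the final Execute() call evaluates that expression, concatenating all the string pieces together.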
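
To make that step easier to follow, this sketch prints the intermediate expression instead of executing it. It reuses the pattern and replacement string from the script above on a shortened input; $sExpr is a name introduced here for illustration.

```autoit
; Same pattern and replacement as in the script above, on a two-word input
Local $sExpr = "'" & StringRegExpReplace("spots, pug", "(\b\w+\b)", "' & ($oRepl.exists('$1') ? $oRepl.item('$1') : '$1') & '") & "'"

; Show the expression that Execute() would evaluate
ConsoleWrite($sExpr & @LF)
```

This prints:

'' & ($oRepl.exists('spots') ? $oRepl.item('spots') : 'spots') & ', ' & ($oRepl.exists('pug') ? $oRepl.item('pug') : 'pug') & ''

Because the pieces are wrapped in single quotes, an input line that itself contains an apostrophe would end a string literal early and break the expression, so such input would presumably need its quotes doubled first.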

Keys in the dictionary need to be unique; otherwise a test is required before calling .add
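
A minimal sketch of such a test, assuming the first value seen for a key should win (the duplicate "grey" entry is made up for the example):

```autoit
Local $aRepl[3][2] = [["grey", "wolf"], ["spots", "dalmatian"], ["grey", "husky"]]
Local $oRepl = ObjCreate("Scripting.Dictionary")

For $i = 0 To UBound($aRepl) - 1
    ; Add() fails on an existing key, so only add keys not seen yet
    If Not $oRepl.Exists($aRepl[$i][0]) Then $oRepl.Add($aRepl[$i][0], $aRepl[$i][1])
Next

ConsoleWrite($oRepl.Count & " keys, grey -> " & $oRepl.Item("grey") & @LF) ; 2 keys, grey -> wolf
```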

Edited August 20, 2019 by jchd

WoodGrain · August 20, 2019

An associative array :yes: of course! Makes perfect sense to use that!
I'm guessing you need to be versed in C++ or .NET to be aware of these objects?

Also, learned about ternary operators today :robot: 👍 thanks.

jchd · August 20, 2019

1 minute ago, WoodGrain said:

I'm guessing you need to be versed in C++ or .NET to be aware of these objects?

Not at all. I don't actually use AutoIt nowadays but still try to contribute here.

If you search the forum you'll find a great many Scripting.Dictionary-related posts and code examples.
