Best method for replacing a lot of different words in a string?


Hi All,

I'm just working through some logic. I'm reading a comma-separated string from a file, and it has, say, 500 unique words in it. I want to look up some of those words in a replacement list and swap each one for another unique word. What would be the best/fastest way of doing this? The replacement list might have, say, 250 entries, but an entry should only be applied if its word actually appears in the line that is read in.

I'm assuming running StringReplace or StringRegExpReplace across them all is not necessarily the best option? I was thinking about using an array?

Any direction would be much appreciated!

I don't have any code to share yet, and this example is a bad one because it's easy to picture it with duplicates, but an example might be:

Local $readFileLine = "doberman, spots, labrador, poodle, white, grey, small, pug"
$readFileLine = StringReplace($readFileLine, "spots", "dalmatian")
$readFileLine = StringReplace($readFileLine, "grey", "wolf")
; etc.

The array option I had in mind was along the lines of exploding the string from the file into an array and then, for each element, looking it up in a second array with _ArraySearch and replacing it with the corresponding value. Using the above example:

#include <String.au3> ; provides _StringExplode
Local $readFileLine = "doberman, spots, labrador, poodle, white, grey, small, pug"
Local $readFileArray = _StringExplode($readFileLine, ", ") ; split on ", " so elements carry no leading space
Local $compareArray[2][2] = [["spots", "dalmatian"], ["grey", "wolf"]]
; Then loop through the $readFileArray elements and compare each one to the 1st column of $compareArray; if there's a match, replace the $readFileArray element with the 2nd column of that $compareArray row
; I hope that makes sense, it's late..
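The loop described in the comment above could be sketched roughly as follows. This is only an illustration under a few assumptions: the words are separated by exactly ", ", the names $iHit and the sample data are placeholders, and _ArraySearch is told to search column 0 only through its $iSubItem parameter.

```autoit
#include <Array.au3>  ; _ArraySearch, _ArrayToString
#include <String.au3> ; _StringExplode

Local $readFileLine = "doberman, spots, labrador, poodle, white, grey, small, pug"
Local $compareArray[2][2] = [["spots", "dalmatian"], ["grey", "wolf"]]

; Split on ", " so the elements carry no leading space.
Local $readFileArray = _StringExplode($readFileLine, ", ")

Local $iHit
For $i = 0 To UBound($readFileArray) - 1
    ; Exact (non-partial) search in column 0 of $compareArray; returns -1 when not found.
    $iHit = _ArraySearch($compareArray, $readFileArray[$i], 0, 0, 0, 0, 1, 0)
    If $iHit <> -1 Then $readFileArray[$i] = $compareArray[$iHit][1]
Next

; Join the elements back into one comma-separated line.
ConsoleWrite(_ArrayToString($readFileArray, ", ") & @CRLF)
```

Because each element is compared as a whole, a word such as "greyhound" would not be touched by the "grey" entry. The cost is one linear search of the replacement list per word in the line.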

Thanks!


That's exactly how I would do it... I can't imagine there is a much better option, as you'll need to check that the string is in the file before replacing it, and that's what StringReplace does :) Though if you have a pattern that could match multiple values, you could use StringRegExpReplace, and that could be faster. I guess it really depends on your data.

All my code provided is Public Domain... but it may not work. ;) Use it, change it, break it, whatever you want.


I don't think it's necessary to explode the string $readFileLine into an array; on the contrary, working directly on the string may even be faster.
You can proceed in a way similar to this, for example:

Local $readFileLine = "doberman, spots, labrador, poodle, white, grey, small, pug"

; replacements [find][replace]
Local $compareArray[2][2] = [["spots", "dalmatian"], ["grey", "wolf"]]

ConsoleWrite("Before replacements: " & $readFileLine & @CRLF)

For $i = 0 To UBound($compareArray) - 1
    $readFileLine = StringReplace($readFileLine, $compareArray[$i][0], $compareArray[$i][1])
Next

ConsoleWrite("After  replacements: " & $readFileLine & @CRLF)

anyway .... there is probably some soulful cat around here, or someone else from his regexp boss gang, that, with a regular expression, can do everything in just one shot... :P
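One caveat with a plain StringReplace loop is that it matches substrings, so a "grey" entry would also turn "greyhound" into "wolfhound". A minimal sketch of a whole-word variant, assuming the search words and replacements are plain words (no regexp metacharacters, no "$" or "\" back-reference sequences in the replacement text):

```autoit
Local $readFileLine = "doberman, spots, grey, pug, greyhound"
Local $compareArray[2][2] = [["spots", "dalmatian"], ["grey", "wolf"]]

For $i = 0 To UBound($compareArray) - 1
    ; \b ... \b restricts the match to whole words;
    ; \Q ... \E makes the search word be taken literally.
    $readFileLine = StringRegExpReplace($readFileLine, "\b\Q" & $compareArray[$i][0] & "\E\b", $compareArray[$i][1])
Next

ConsoleWrite($readFileLine & @CRLF)
; prints: doberman, dalmatian, wolf, pug, greyhound
```

Note that StringRegExpReplace is case-sensitive unless the pattern starts with (?i), whereas StringReplace ignores case by default, so the two loops are not exact drop-in replacements for each other.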

Edited by Chimp

 


Here you are:

; given this:
Local $sIn = "doberman, spots, labrador, poodle, white, grey, small, pug, greyhound"
; and this (unique replacements)
Local $aRepl = [ _
    ["spots", "dalmatian"], _
    ["chinook", "robust"], _
    ["chihuahua", "minuscule"], _
    ["grey", "wolf"], _
    ["poodle", "very big"], _
    ["husky", "very cold there"] _
]

; build that:
Local $oRepl = ObjCreate("Scripting.Dictionary")
For $i = 0 To UBound($aRepl) - 1
    $oRepl.add($aRepl[$i][0], $aRepl[$i][1])
Next

; apply recipe
Local $sOut = Execute("'" & StringRegExpReplace($sIn, "(\b\w+\b)", "' & ($oRepl.exists('$1') ? $oRepl.item('$1') : '$1') & '") & "'")
ConsoleWrite($sOut & @LF)

 

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


19 hours ago, Chimp said:

I don't think it's necessary to explode the string $readFileLine into an array; on the contrary, working directly on the string may even be faster.

Ah, interesting. So the variable's structure should suit the data it's holding: here an array fits the comparison list, since it holds multiple related elements and is easy to iterate over in a loop, while the string doesn't need to become an array for its contents to be modified (assuming that even at 500 words there would be little performance difference between working on the string and on a second array).
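Rather than assuming, the difference can be measured. The following is only a rough timing sketch using TimerInit and TimerDiff; the repeat count $iRuns and the sample data are arbitrary placeholders, and real numbers will depend on the actual line length and replacement list.

```autoit
Local $sLine = "doberman, spots, labrador, poodle, white, grey, small, pug"
Local $aPairs[2][2] = [["spots", "dalmatian"], ["grey", "wolf"]]
Local $iRuns = 10000 ; arbitrary repeat count so the elapsed time is measurable
Local $sWork

Local $hTimer = TimerInit()
For $n = 1 To $iRuns
    $sWork = $sLine
    ; The approach under test: one StringReplace per replacement pair.
    For $i = 0 To UBound($aPairs) - 1
        $sWork = StringReplace($sWork, $aPairs[$i][0], $aPairs[$i][1])
    Next
Next
ConsoleWrite("StringReplace loop: " & Round(TimerDiff($hTimer)) & " ms for " & $iRuns & " runs" & @CRLF)
```

Swapping the inner loop for another approach and comparing the printed times gives a like-for-like comparison on the same data.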

Thanks :)

12 hours ago, jchd said:

Here you are:


Wow, OK, let me just say, that looks impressive. Bear with me as I work through this, as this is my first look at ObjCreate. The test and pattern in the StringRegExpReplace make sense, but I'll need to work through the replacement... *opens AutoIt help file* (can I just take a moment to say how awesome the AutoIt help file is? You guys rock!)

Thanks! I love being challenged with new ways of coding like this.

Link to comment
Share on other sites

A Scripting.Dictionary is an associative array (a collection of key-value pairs) exposed as a COM object that ships with Windows.  Looking up a key to find its associated item value is very fast.  Building a Scripting.Dictionary and feeding it key-value pairs is often faster than manipulating AutoIt arrays through loops (although arrays have their own uses as well).

Once that part is done, the rest of the job is essentially replacing each word in the input with either itself or, when the word is present as a key, the alternative word stored in the dictionary.

(\b\w+\b)

is the regexp pattern to capture a word (\b stands for "word boundary", see StringRegExp help).  When this word is present as a key in the dictionary, we replace the word by the stored alternative value, else we leave it in place.  We do that using the ternary operator <test> ? <true branch> : <false branch>

The surrounding quotes and & operators turn the result into one long string-concatenation expression, and the final Execute() call evaluates that expression to produce the output string.
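To see what Execute() actually receives, the generated expression can be printed before it is evaluated. This is a small sketch of that idea on a shortened input; $sExpr is a name introduced here for illustration, and it assumes the input contains no single quote characters, since an unescaped ' in the input would end up inside the generated string literals and break the expression.

```autoit
Local $oRepl = ObjCreate("Scripting.Dictionary")
$oRepl.add("spots", "dalmatian")

Local $sIn = "spots, pug"

; Build the expression text without executing it yet.
Local $sExpr = "'" & StringRegExpReplace($sIn, "(\b\w+\b)", "' & ($oRepl.exists('$1') ? $oRepl.item('$1') : '$1') & '") & "'"

; The printed text is itself AutoIt code: quoted separators joined by & with one ternary per word.
ConsoleWrite($sExpr & @LF)

; Evaluating that text yields the final string.
ConsoleWrite(Execute($sExpr) & @LF)
```

Everything that is not a word (commas, spaces) stays inside the quoted literals, which is why the separators survive unchanged in the output.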

Keys in a dictionary need to be unique, or else a test is required before calling .add
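A minimal sketch of such a test, assuming the first occurrence of a key should win when the replacement list happens to contain duplicates (the duplicate "grey" entry below is made up for the example):

```autoit
; Replacement list with a deliberately duplicated key.
Local $aRepl[3][2] = [["spots", "dalmatian"], ["grey", "wolf"], ["grey", "husky"]]

Local $oRepl = ObjCreate("Scripting.Dictionary")
For $i = 0 To UBound($aRepl) - 1
    ; .Add raises a COM error on an existing key, so check with .Exists first.
    If Not $oRepl.Exists($aRepl[$i][0]) Then $oRepl.Add($aRepl[$i][0], $aRepl[$i][1])
Next

ConsoleWrite($oRepl.Count & " keys, grey -> " & $oRepl.Item("grey") & @LF)
; prints: 2 keys, grey -> wolf
```

Which entry should win on a duplicate is a design choice; skipping later duplicates is just the simplest policy to show here.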

Edited by jchd


1 minute ago, WoodGrain said:

I'm guessing you need to be versed in C++ or .NET to be aware of these objects?

Not at all.  I don't actually use AutoIt nowadays but still try to contribute here.

If you search the forum you'll find a huge number of Scripting.Dictionary-related posts and code examples.

