
StringReplace() Start Position Issue


CrypticKiwi

 

Hello

I'd like to start by saying that AutoIt is one of the most useful tools I've found on the internet and is simply amazing.

One small issue, though.

I'm running a program that calls StringReplace() in a loop millions of times, and efficiency matters: a run takes around 46 minutes, and at that rate it will take me days to finish.

If I pass a start position to StringReplace() instead of a search string, the loop runs much faster. Unfortunately I can't use the start position, because I want to replace 6 characters with 5 characters, and with a start position it overwrites exactly as many characters as the replacement string contains, so the length never changes.

example

$text = StringReplace("this is a line of text ",11, "Large Number")   

$text2 = StringReplace("this is a line of text "," line", "Large Number")

MsgBox(0, "New string is", $text)

MsgBox(0, "New string is", $text2)

I'd like the result to be "this is a Large Number of text" using the start position.

I hope this gets improved in the next version of AutoIt, but until then, does anybody have the source code of the StringReplace() function so I can edit it, or anything similar?

 

 

water

I suggest you have a look at StringRegExpReplace.


My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (2018-06-01 - Version 1.4.9.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2018-09-01 - Version 1.3.4.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
PowerPoint (2017-06-06 - Version 0.0.5.0) - Download - General Help & Support
Excel - Example Scripts - Wiki
Word - Wiki
 
Tutorials:

ADO - Wiki

 

Melba23

CrypticKiwi,

I want to replace 6 characters with 5 characters

This sounds like it could well be a suitable job for a RegEx. Can you let us have a couple of real-life "before/after" pairs so we can work out a suitable pattern?

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind._______My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

iamtheky

You don't want the string replaced starting at that position; you want the word at that position exchanged for your chosen string, which is another beast entirely.

$sText = "this is a line of text"

MsgBox(0, '', _StrReplWordAtPos($sText, "Large Number", 10))

; Replaces the word that starts right after the first $WordAtPosition characters.
Func _StrReplWordAtPos($sString, $sReplacement, $WordAtPosition)
    ; Work on the $sString parameter rather than the global $sText
    Return StringRegExpReplace($sString, '\A.{' & $WordAtPosition & '}(\w+)', StringLeft($sString, $WordAtPosition) & $sReplacement, 1)
EndFunc

 

SadBunny

If OP knows the string to replace is always exactly 5 characters long, then a regex replace will be much much slower than simple string operations.

$sText = "this is a line of text"

$WordAtPosition = 10

For $i = 1 To 5000000
    $text = StringReplace($sText, 11, "Large Number") ; <-- 3.99 sec (OP example, wrong result)
    $text2 = StringReplace($sText, " line", "Large Number") ; <-- 12.45 sec (OP example, correct result but too slow)
    $text3 = StringLeft($sText, 10) & "Large Number" & StringMid($sText, 15) ; <-- 7.80 sec (faster correct result)
    $text4 = StringRegExpReplace($sText, '\A.{' & $WordAtPosition & '}(\w+)', StringLeft($sText, $WordAtPosition) & "Large Number", 1) ; <-- 20.22 sec (correct result)
Next

 


Roses are FF0000, violets are 0000FF... All my base are belong to you.

iamtheky

This one is doubly slow, but any position within that word will result in that word being replaced. So for positions 11 through 15 it replaces "line" with "Large Number" (and then at 16 moves on to replacing "of").

 

#include <array.au3>

$sText = "this is a line of text"

;~ $WordAtPosition = 10
$sReplacement = "Large Number"

For $WordAtPosition = 11 To 16
    $aText = StringSplit($sText, " ", 2)
    $length = 0

    For $i = 0 To UBound($aText) - 1
        $length += StringLen($aText[$i]) + 1
        If $length >= $WordAtPosition Then
            $aText[$i] = $sReplacement
            MsgBox(0, $WordAtPosition, _ArrayToString($aText, " "))
            ExitLoop
        EndIf
    Next
Next

 


mikell

SadBunny,

Agreed. For precisely the OP's requirements, the regex is nearly twice as slow:

$sText = "this is a line of text, this is a line of text" & @crlf & "this is a line of text"

$begin = TimerInit()
For $i = 1 To 5000000
    ; $text3 = StringLeft($sText, 34) & "Large Number" & StringMid($sText, 39)  ; ~12 s
     $text4 = StringRegExpReplace($sText, '(?s).{34}\K(\w+)', "Large Number", 1) ; ~21 s
Next
$diff = TimerDiff($begin)
Msgbox(0,"", $text4 & @crlf & $diff)

 

SadBunny

If only the OP could get back here with some more specific examples of what he does and doesn't need :) (short of the C++ sourcecode of StringReplace :) )

mikell

...Autoit is one of the most useful tools ive found on  the internet and is simply amazing.

So who cares about C++? :D

CrypticKiwi

 

Thanks for the replies, guys.

I'll try to explain what I'm doing in more detail with this example:

For $i = 1 To 70000000
    $Chunk = StringMid($DataFile2, $i, 6)                 ; data file full of random numbers; gets the 6 chars to replace, in this example "996340"
    $GetSearchposition = StringInStr($DataFile, $search)  ; another data file full of random numbers; gets the position for the replacement, in this example 12
    $GetReplaceData = StringMid($DataFile2, $x, 3)        ; what to replace with; changes every loop and is always 3 chars, example "21@"
    ; sample from a text file full of random numbers: replace 996340 with 21@
    $text = StringReplace("5678518345299634062889808313746700781847540610", $GetSearchposition, $GetReplaceData) ; this is fast
    $text2 = StringReplace("5678518345299634062889808313746700781847540610", $Chunk, $GetReplaceData) ; slow
Next

 

The string to search for is different on every loop and is searched for only once, so I think RegEx won't work (I'm not sure how RegEx works, though).

I'm trying to replace 6 chars with 3 chars, so I think it would work if I could manage to delete 3 characters.

Or admit defeat and learn C++ :(
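
For illustration, the "replace 6 chars with 3 chars at a known position" idea can be sketched with plain string functions, along the lines of the StringLeft/StringMid approach suggested earlier in the thread. `_ReplaceAtPos` is a hypothetical helper name, not a built-in function, and the sketch assumes the position has already been found (here with StringInStr):

```autoit
; Hypothetical helper (not part of AutoIt): replace $iCount characters,
; starting at 1-based position $iPos, with $sReplacement of any length.
Func _ReplaceAtPos($sString, $iPos, $iCount, $sReplacement)
    ; Leave the string unchanged if the position is invalid (e.g. StringInStr returned 0)
    If $iPos < 1 Then Return $sString
    ; Everything before the position, then the replacement, then everything after the replaced span
    Return StringLeft($sString, $iPos - 1) & $sReplacement & StringMid($sString, $iPos + $iCount)
EndFunc

Local $sData = "5678518345299634062889808313746700781847540610"
Local $iPos = StringInStr($sData, "996340") ; 12 for this sample string
Local $sResult = _ReplaceAtPos($sData, $iPos, 6, "21@")
ConsoleWrite($sResult & @CRLF) ; 5678518345221@62889808313746700781847540610
```

Unlike StringReplace() with a start position, the number of characters removed here is independent of the length of the replacement. Whether this is fast enough for 70 million iterations would need to be timed on the real data.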

JohnOne

Still unclear. Do you have an example of these files?

Are they always in the same format?

What type of files are we talking about?

However, if you are looping through chunks of 6 char data, then your loop should increment Step 6
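
A minimal sketch of such a loop, assuming the data really is meant to be read as consecutive, non-overlapping 6-character chunks (the sample string is made up for illustration):

```autoit
Local $sData = "996340123456654321" ; made-up sample data, 3 chunks of 6 chars
Local $sChunk = ""

; Step 6 advances one whole chunk per pass instead of one character
For $i = 1 To StringLen($sData) Step 6
    $sChunk = StringMid($sData, $i, 6)
    ConsoleWrite($i & ": " & $sChunk & @CRLF)
Next
```

With Step 6 the loop body runs a sixth as often as a character-by-character loop over the same data.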

EDIT:

You seem to have enough people's interest here, so it's in your best interest to provide a working example of your code and some non-sensitive example files.


Monkey's are, like, natures humans.

Melba23

CrypticKiwi,

I have been investigating further and I believe your argument about using the location being very much faster may be true when you need only find the location once, but rather less so if you need to find the location in a different file on each pass - which is what your code snippet above suggests.  I have also found that using StringSplit seems to be the fastest correct method:

#include <StringConstants.au3>

; Replace 996340 with 21@
$sChunk = "996340"
$sReplaceData = "21@" ; These seem to be common to each example and so can be left outside the loop

$nBegin = TimerInit()
For $i = 1 To 100000
    $iLocation = StringInStr("5678518345299634062889808313746700781847540610", $sChunk) ; But this appears to be unique to each pass
    $sText0 = StringReplace("5678518345299634062889808313746700781847540610", $iLocation, $sReplaceData)
Next
ConsoleWrite($sText0 & @TAB & "- " & TimerDiff($nBegin) & " - Fail" & @CRLF) ; and fails anyway

$nBegin = TimerInit()
For $i = 1 To 100000
    $sText1 = StringReplace("5678518345299634062889808313746700781847540610", $sChunk, $sReplaceData)
Next
ConsoleWrite($sText1 & @TAB & "- " & TimerDiff($nBegin) & " - Good" & @CRLF)

$nBegin = TimerInit()
For $i = 1 To 100000
    $sText2 = StringRegExpReplace("5678518345299634062889808313746700781847540610", "(?U)^(.*)" & $sChunk & "(.*)$", "${1}" & $sReplaceData & "${2}")
Next
ConsoleWrite($sText2 & @TAB & "- " & TimerDiff($nBegin) & " - Good" & @CRLF)

$nBegin = TimerInit()
For $i = 1 To 100000
    $aSplit = StringSplit("5678518345299634062889808313746700781847540610", $sChunk, $STR_ENTIRESPLIT)
    $sText3 = $aSplit[1] & $sReplaceData & $aSplit[2]
Next
ConsoleWrite($sText3 & @TAB & "- " & TimerDiff($nBegin) & " - Best" & @CRLF)

If you can give us a little more clarity on what exactly you are doing, and some example files to play with, we might be able to give you some really good code.

M23

 


SadBunny

Also... Let's do some quick estimates here.

  • It takes my i5 PC about 20 seconds to go through 5 million lines of data with OP's "too slow" method. Let's say OP has a slower machine and it takes him 30 seconds.
  • OP remarks that his run currently takes 46 minutes.
  • That means there are at least roughly 5,000,000 * 2 * 46 = 460,000,000 lines of data he needs to churn through.

OP says he needs to replace 6 chars, so the lines are at least 7 chars long (or he'd just replace the entire line), meaning the file is going to be AT LEAST upwards of 3 gigabytes (460,000,000 * 7 bytes is about 3.2 GB). The lines are probably longer, so this is a very conservative estimate. That's not the kind of file you want to do string replacement in. If this is a recurring problem, then maybe there are better ways to handle or filter your source data.
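
The same back-of-the-envelope estimate, written out as a short script. The 30 seconds per 5 million lines and the 7-byte minimum line length are the assumptions stated above, not measurements of OP's machine or data:

```autoit
Local $iBenchmarkLines = 5000000 ; lines processed in the benchmark loop
Local $iAssumedSeconds = 30      ; assumed time for those lines on OP's machine
Local $iRunMinutes = 46          ; OP's reported run time
Local $iMinLineBytes = 7         ; assumed minimum line length

; 5,000,000 lines per 30 s is 10,000,000 lines per minute, times 46 minutes
Local $iLines = $iBenchmarkLines * (60 / $iAssumedSeconds) * $iRunMinutes
; 460,000,000 lines * 7 bytes = 3,220,000,000 bytes
Local $iMinBytes = $iLines * $iMinLineBytes

ConsoleWrite($iLines & " lines, at least " & Round($iMinBytes / 1000000000, 2) & " GB" & @CRLF)
```

Changing the assumed seconds or the minimum line length shows how sensitive the estimate is to those guesses.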

