leuce Posted January 26, 2016

Hello everyone

I'm processing text files using AutoIt scripts, and in one such process I want to replace all standalone numbers with "00". In other words, where a number is a "word", I want the number replaced with "00". I'm using the regex \b\d+\b. However, this matches at positions that I do not consider "word boundaries". For example, the date 12-Jan-15 is not "three words" in my view of what a "word" is, but AutoIt thinks so. Similarly, although "x; 123" and "y, 234" are two "words" each in my view, "x;123" and "y,234" are only one word each, yet AutoIt treats "x;123" and "y,234" as two "words" each.

The solution to my problem, I think, would be to redefine the meaning of "\b", or to define some other class that uses a different letter in regex. Is there a way to do that? It would simplify my regular expressions later in the script if I could define that earlier in the script.

Thanks
Samuel
UEZ Posted January 26, 2016

1 hour ago, leuce said: I'm processing text files using AutoIt scripts, and in one such process I want to replace all standalone numbers with "00". In other words, where a number is a "word", I want the number replaced with "00".

Your statement is contradictory. Should the result look like this?

"This is a test created on 12-Jan-15 with 1234 chars." -> "This is a test created on 12-Jan-15 with 00 chars."

If not, please give an example.
jchd Posted January 26, 2016

leuce,

It's up to you to define precisely what you consider a "word boundary". In PCRE (the AutoIt regex engine), \b has very precise semantics, and I understand it doesn't fit your need.

Ask yourself what your "word boundary" really means and try it with this test, then with any other text you can come up with:

"Pi is a transcendental value close to 3.1415926 that is as simple as 1+2=3 or rather 1 + 2 = 3."

Is a "stand-alone number" necessarily prefixed and suffixed by a space? Or by another set of characters, or does it depend on other rules, as with the final 3 above, which is followed by a period? Formalize your need and the translation into regex will certainly follow.
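To make the "very precise semantics" concrete: in PCRE, \b matches at any position between a word character (\w: a letter, digit or underscore) and a non-word character, or at the start or end of the string next to a word character. The short sketch below only illustrates that rule on strings from the first post; the sample text and variable name are made up for the illustration.

```autoit
; Illustration only: shows where PCRE's \b matches, using strings from post #1.
; "abc123" is an added sample to show a position where \b does NOT match.
Local $sText = "12-Jan-15 x;123 y, 234 abc123"

; "-", ";", "," and the space are all non-word characters, so every digit run
; next to them is delimited by \b. In "abc123" the "c" and "1" are both word
; characters, so there is no boundary between them and the digits are kept.
ConsoleWrite(StringRegExpReplace($sText, "\b\d+\b", "00") & @LF)
; Prints: 00-Jan-00 x;00 y, 00 abc123
```

This is why the date and "x;123" are split: punctuation counts as a boundary just as whitespace does.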
Skysnake Posted January 26, 2016

Your problem is with regex, not with AutoIt. I agree with the sages above: your example is not clear. Perhaps post a sample of the source text together with the replacements as you would EXPECT the outcome to be?

Also, practice your regex here: https://regex101.com/
Malkey Posted January 26, 2016

Here is my solution, based on my understanding of the problem.

Local $sTest = "334 This is a 1 test created on 12-Jan-15 with 1234 characters (3.456) and x;123.321 and y: 456 and z,789."
ConsoleWrite(StringRegExpReplace($sTest, "(?<=\s|^|\(|;|,|-)\d+(\.\d+)?(?=\s|$|\.|,|\)|-)", "00") & @LF)
#cs
This regex pattern matches "\d+", an integer, or "\d+(\.\d+)?", a decimal number, only if that number is preceded by:
    "\s" a whitespace character (such as a space or a linefeed), or
    "^" the beginning of the string, or
    "\(" an opening bracket, ";" a semicolon, "," a comma, or "-" a dash.
The matching number must also be followed by:
    "\s" a whitespace character (such as a space or a linefeed), or
    "$" the end of the string, or
    "\." a dot, "," a comma, "\)" a closing bracket, or "-" a dash.
Returns:
00 This is a 00 test created on 00-Jan-00 with 00 characters (00) and x;00 and y: 00 and z,00.
#ce
mikell Posted January 27, 2016

On Tuesday, January 26, 2016 at 10:11 AM, leuce said: For example, a date 12-Jan-15 is not "three words", in my view of what a "word" is, but AutoIt thinks so.

Malkey, your code doesn't fit the "explanations" in post #1. It should probably be *something* like this (but the requirements are really imprecise):

$s = "Pi is a transcendental value close to 3.1415926 that is on 12-Jan-15 as simple as 1+2=3 or rather 10 + 20 = 30."
ConsoleWrite(StringRegExpReplace($s, '(?<=^|\s)\d*\.?\d+(?=[.,;\s]|$)', "00"))
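On the original question of defining a boundary once and reusing it: PCRE has no way to redefine \b, but since an AutoIt pattern is an ordinary string, a custom boundary can be kept in variables and concatenated into later patterns. The following is a minimal sketch, assuming a "word" simply means a run of non-whitespace characters (one possible reading of post #1, not a confirmed requirement); the names $sWordStart and $sWordEnd are hypothetical.

```autoit
; Sketch under an assumption: a "word" is delimited by whitespace or by the
; start/end of the string. $sWordStart and $sWordEnd are hypothetical names.
Local Const $sWordStart = "(?<!\S)" ; not preceded by a non-whitespace character
Local Const $sWordEnd = "(?!\S)"    ; not followed by a non-whitespace character

Local $sText = "On 12-Jan-15 x;123 and y,234 but x; 123 and y, 234 total 42"

; Only digit runs that form a whole whitespace-delimited "word" are replaced.
ConsoleWrite(StringRegExpReplace($sText, $sWordStart & "\d+" & $sWordEnd, "00") & @LF)
; Prints: On 12-Jan-15 x;123 and y,234 but x; 00 and y, 00 total 00
```

With this definition a number directly followed by sentence punctuation, such as "42.", is left alone. If trailing punctuation should be allowed, the end boundary could be swapped for a lookahead like the one in the post above, `(?=[.,;\s]|$)`.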