leuce

Please help me find/replace a character

9 posts in this topic

Hello everyone

In the attachment is a plain text file in UTF-8 (without byte order mark) with a troublesome character between <seg> and </seg>.  I'm trying to get an AutoIt script to find that character in a text file and replace it with something else, but it resists all my efforts.  The character is so elusive that I can't even see it in my text editor, and the cursor doesn't stop at it when I use the left/right arrow keys.  However, if I open the file in MS Word, I can see it.  I have no idea how to encode this character so that AutoIt can find and replace it.

I hope someone here can help identify it and tell me how to find/replace it.

In MS Word, the character is called an "optional hyphen" and is displayed in MS Word as "¬", but the character itself is not "¬" (that's just MS Word's way of displaying it).

Thanks

Samuel

PS. I zipped the file to ensure that it doesn't break when I upload it (sorry, I'm not sure if the current forum software will read the attachment as text or binary, so I zipped it to ensure it's binary).

character.zip




I found a text editor that can see the character, namely the clunky old favourite, Babelpad.  Babelpad identified the character as "U+001F INFORMATION SEPARATOR ONE".  So, my question has become: how can I tell AutoIt to find U+001F and replace it with e.g. "ASDF"?  Thanks.


Found it.  Babelpad gives me the hex Unicode code, so I have to use Google to find the decimal Unicode code, and then add it to a variable using ChrW().  In the case of 001F, the decimal code is 31, so I was able to "find" the character in my AutoIt script using ChrW(31).
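
For anyone landing here later, the whole find/replace might look roughly like the sketch below. It is only a sketch under assumptions: the file names are hypothetical, the file is taken to be UTF-8 without a byte order mark as described in the first post, and "ASDF" is the placeholder replacement mentioned earlier.

```autoit
#include <FileConstants.au3>

; Hypothetical file names; adjust them to your own files.
Local $sInPath = @ScriptDir & "\character.txt"
Local $sOutPath = @ScriptDir & "\character_fixed.txt"

; Read the source file as UTF-8 without a byte order mark.
Local $hIn = FileOpen($sInPath, $FO_READ + $FO_UTF8_NOBOM)
If $hIn = -1 Then
    ConsoleWrite("Cannot open " & $sInPath & @CRLF)
    Exit 1
EndIf
Local $sText = FileRead($hIn)
FileClose($hIn)

; ChrW(31) is U+001F; replace every occurrence with "ASDF".
Local $sFixed = StringReplace($sText, ChrW(31), "ASDF")
Local $iCount = @extended ; StringReplace stores the number of replacements here

; Write the result to a separate file in the same encoding.
Local $hOut = FileOpen($sOutPath, $FO_OVERWRITE + $FO_UTF8_NOBOM)
If $hOut = -1 Then
    ConsoleWrite("Cannot write " & $sOutPath & @CRLF)
    Exit 1
EndIf
FileWrite($hOut, $sFixed)
FileClose($hOut)

ConsoleWrite("Replaced " & $iCount & " occurrence(s)" & @CRLF)
```

Writing to a second file rather than overwriting the original keeps the source intact while you check the result.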


Wow! Great! I got the info which I need for my project :)

 

Thanks @leuce! TD :D



leuce,

Both SciTE and Notepad++ show the control character in its abbreviated form (US = Unit separator). You can also list the character codes yourself:

#include <Array.au3>
Local $s = "<seg>" & ChrW(31) & "</seg>"  ; stands in for the contents of your text file
Local $a = StringToASCIIArray($s)
_ArrayDisplay($a)
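
If the file is long, scrolling through the array display gets tedious. A possible variation, offered only as a sketch, prints just the control characters with their position, decimal and hex codes. The test string is an assumption standing in for the real file contents; in a real file, tabs and line breaks would be reported too, since they are also below 32.

```autoit
; Stand-in for the file contents, with U+001F between the tags.
Local $s = "<seg>" & ChrW(31) & "</seg>"
Local $a = StringToASCIIArray($s)

For $i = 0 To UBound($a) - 1
    ; Codes below 32 are control characters.
    If $a[$i] < 32 Then
        ConsoleWrite("Index " & $i & ": decimal " & $a[$i] & ", hex 0x" & Hex($a[$i], 2) & @CRLF)
    EndIf
Next
; For the string above this prints: Index 5: decimal 31, hex 0x1F
```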

 



"Both SciTE and Notepad++ show the control character in its abbreviated form (US = Unit separator)."

Thanks.  Does SciTE or Notepad++ also tell you that the decimal Unicode code is 31?


You just have to look in the AutoIt help: Appendix > ASCII characters > Control characters. There you'll find US: 31 (decimal) / 1F (hex) / 037 (octal) / Unit separator.


Share on other sites

"Found it.  Babelpad gives me the hex Unicode code, so I have to use Google to find the decimal Unicode code"

No, you don't. ChrW will accept a hex number as well as a decimal number; they're just numbers, there's nothing special about them.

ConsoleWrite(ChrW(0x41) & @CRLF) ; decimal 65, letter A
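
Applied to the character from this thread, the same idea can be sketched as follows. Dec() and AscW() are included here only as optional helpers for converting a hex string and for checking the result; nothing in the thread requires them.

```autoit
; 0x1F and 31 are the same number, so both calls produce U+001F.
ConsoleWrite((ChrW(0x1F) == ChrW(31)) & @CRLF) ; prints True

; Dec() converts a hex string, e.g. one copied from a "U+001F" label.
ConsoleWrite(Dec("001F") & @CRLF) ; prints 31

; AscW() goes the other way, from character to code.
ConsoleWrite(AscW(ChrW(0x1F)) & @CRLF) ; prints 31
```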

 

