C# - Text Encoding question


evilertoaster

I have a StringBuilder object that I append some text to and then convert to a string for a File.WriteAllText().

StringBuilder sb = new StringBuilder();
sb.Append("stuff");
//Other sb.Append()'s

File.WriteAllText(@"C:\Something.txt", sb.ToString());

The problem is that one of my appends is this:

sb.Append(Convert.ToChar(151));

Which is supposed to make a "long dash". When the file gets written though, that byte comes out as 63, a question mark.

I've tried various Encoding parameters but none of them seem to work... how do I get it to preserve the character when writing?

Edited by evilertoaster

It's trying to do it in ASCII encoding, which only goes up to 127. You mean the character 151 that is part of ANSI. You should specify somewhere in your code that it should use ANSI.

Something like:

StringBuilder sb = new StringBuilder();
sb.Append("stuff");

sb.Append(Encoding.Default.GetString(new byte[] { 151 }, 0, 1)); // Encoding.Default = ANSI

File.WriteAllText("something.txt",sb.ToString());

Edit: Btw, it has nothing to do with File.WriteAllText, but the problem was in how you converted the int to a character.

Edited by Manadar

So the problem was actually the way I tried appending to the StringBuilder...makes sense I guess, I had kinda always envisioned StringBuilder to use the Default encoding by...default... (go figure).

I was on the same train of thought as Richard, probably... I tried telling File.WriteAllText() to use ANSI encoding but it never worked.

Thanks all.

Actually, check this out.

char longdash = Convert.ToChar(151);
// Value of longdash is: 151 "

StringBuilder sb = new StringBuilder();
sb.Append("stuff");
sb.Append(longdash);

char longdash2 = sb[5];
// Value of longdash2 is: 151 " (proves that it doesn't change by appending -- ruling out that Append is to blame)

sb.Append(Encoding.Default.GetString(new byte[] { 151 }, 0, 1));
char longdash3 = sb[6]; 
// Value of longdash3 is: 8212 -- (longdash) (correct)

File.WriteAllText("something.txt",sb.ToString());

By default, a lot of these things use Unicode UTF-16 encoding. Same goes for StringBuilder and Convert.ToChar. But Convert.ToChar(151) doesn't do any ANSI lookup: it just returns the char with code point 151, which is U+0097, a control character in Unicode and not the long dash. The long dash is U+2014 (8212), and it only sits at byte 151 in the ANSI code page. That's probably why the debugger shows a " for it instead of something else: U+0097 has no proper glyph to display.
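
To check what Convert.ToChar(151) really produces, here is a small sketch. CharInspection and Utf8Hex are made-up names for illustration; the code only relies on Encoding.UTF8, so it does not depend on the machine's ANSI code page.

```csharp
using System;
using System.Text;

public static class CharInspection
{
    // Returns the UTF-8 bytes of the text as space-separated hex, e.g. "C2 97".
    public static string Utf8Hex(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    // Returns the numeric code point of a single UTF-16 char.
    public static int CodePoint(char c)
    {
        return (int)c;
    }
}
```

Called from a Main method, CharInspection.CodePoint(Convert.ToChar(151)) gives 151, CharInspection.Utf8Hex(Convert.ToChar(151).ToString()) gives C2 97, and CharInspection.Utf8Hex("\u2014") gives E2 80 94. So the char stays 151 all the way through; it is just not the long dash.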

Edited by Manadar

I know close to nothing about C#, but aren't Convert.ToChar (and other) overloaded methods?

If you say byte[] then it interprets passed as bytes and if you say Int32[] then it interprets passed as 32-bit integers, and so on...

hum *epiphany*... a char is 16 bits in C# (not 8!)...this changes a few assumptions I made when debugging for correctness.

Regardless, I tested both your examples. The first one has a binary output of "73 74 75 66 66 E2 80 94", the second has "73 74 75 66 66 C2 97 E2 80 94". Neither is what I'm after. I want "73 74 75 66 66 97"... any idea how to get that?
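
One way to get exactly "73 74 75 66 66 97" is to write with an encoding that maps every char below 256 to the byte with the same value. This is a minimal sketch, assuming ISO-8859-1 (Latin-1) is acceptable for the rest of the text; RawByteWriter and WriteAndReadBack are hypothetical names, not something from the posts above.

```csharp
using System;
using System.IO;
using System.Text;

public static class RawByteWriter
{
    // ISO-8859-1 maps chars U+0000..U+00FF to the byte of the same value
    // and has no byte order mark.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Writes "stuff" followed by char 151 to the given path,
    // then reads the file back so the raw bytes can be inspected.
    public static byte[] WriteAndReadBack(string path)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("stuff");
        sb.Append(Convert.ToChar(151)); // U+0097, written as the single byte 0x97

        File.WriteAllText(path, sb.ToString(), Latin1);
        return File.ReadAllBytes(path);
    }
}
```

The design choice here is to leave the StringBuilder code alone and only change the encoding passed to File.WriteAllText. Any char above U+00FF in the text would be written as a question mark with this encoding, so it only fits when the whole output is meant to be single bytes.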


I know close to nothing about C#, but aren't Convert.ToChar (and other) overloaded methods?

If you say byte[] then it interprets passed as bytes and if you say Int32[] then it interprets passed as 32-bit integers, and so on...

Sounds about right, but no array support. It has 18 overloads for me, all taking a single object and converting it to a char.

I've never used this File.WriteAllText function before. However, all you need to do is add the encoding parameter to the call. Use this overload instead. http://msdn.microsoft.com/en-us/library/ms143376.aspx

File.WriteAllText("something.txt", sb.ToString(), Encoding.ANSI);
Edited by Richard Robertson

That doesn't work for me... what's your full code?

StringBuilder sb = new StringBuilder();
sb.Append("stuff");
sb.Append(Convert.ToChar(151));

File.WriteAllText("something.txt", sb.ToString(), Encoding.Default); // There is no Encoding.ANSI

edit: btw, that was exactly what I tried before asking:

I've tried various Encoding parameters but none of them seem to work

Edited by evilertoaster

Oops. Well try using '\x97' instead of the conversion. Character literals are probably better anyway. Use this with Encoding.Unicode as well.

And I know that encoding exists. ^^;
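
A quick check of the literal suggestion, as a sketch (LiteralCheck and its methods are made-up names): '\x97' is the same char as Convert.ToChar(151), and Encoding.Unicode is UTF-16, so it encodes that char as two bytes rather than a single 0x97.

```csharp
using System;
using System.Text;

public static class LiteralCheck
{
    // True if the character literal and the conversion give the same char.
    public static bool SameChar()
    {
        return '\x97' == Convert.ToChar(151);
    }

    // UTF-16 little-endian bytes of the char, as hex separated by dashes.
    public static string Utf16Hex()
    {
        byte[] bytes = Encoding.Unicode.GetBytes('\x97'.ToString());
        return BitConverter.ToString(bytes);
    }
}
```

SameChar() returns true and Utf16Hex() returns 97-00. On top of that, File.WriteAllText with Encoding.Unicode starts the file with the byte order mark FF FE, so this route does not seem to lead to a lone 0x97 byte.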


I know close to nothing about C#, but aren't Convert.ToChar (and other) overloaded methods?

If you say byte[] then it interprets passed as bytes and if you say Int32[] then it interprets passed as 32-bit integers, and so on...

Yes, but I tested a few and they all seem to do more or less the same thing. I've tried 151 from a byte, an int, etc. No special cases. I think the overloads are there more for convenience than for any actual purpose.

Edited by Manadar

I'm basing it off the appendix in the AutoIt Help file, which matches this page http://www.alanwood.net/demos/ansi.html AFAIK. The actual character is somewhat irrelevant, as long as it comes out as the exact byte value I'm telling it to write. This output is being fed into another system that expects a byte value of 0x97 (151).
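
Since only the byte value matters to the receiving system, another option is to skip text encodings for that byte and build the output as bytes. This is a sketch under the assumption that the rest of the text is plain ASCII; ByteOutput, Build and Save are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ByteOutput
{
    // Builds the bytes for "stuff" followed by the raw byte 0x97 (151).
    public static byte[] Build()
    {
        List<byte> bytes = new List<byte>();
        bytes.AddRange(Encoding.ASCII.GetBytes("stuff")); // 73 74 75 66 66
        bytes.Add(0x97);                                  // written as-is, no encoding involved
        return bytes.ToArray();
    }

    // Writes the bytes to disk without any text encoding step.
    public static void Save(string path)
    {
        File.WriteAllBytes(path, Build());
    }
}
```

Calling ByteOutput.Save("something.txt") writes the six bytes 73 74 75 66 66 97. The trade-off is that the data is no longer handled as a string, which may or may not suit the rest of the program.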

Edited by evilertoaster