evilertoaster

C# - Text Encoding question

33 posts in this topic

#1 ·  Posted (edited)

I have a StringBuilder object that I append some text to and then convert to a string for a File.WriteAllText().

StringBuilder sb = new StringBuilder();
sb.Append("stuff");
// Other sb.Append() calls

File.WriteAllText(@"C:\Something.txt", sb.ToString());

The problem is that one of my appends is this:

sb.Append(Convert.ToChar(151));

Which is supposed to make a "long dash". When the file gets written though, that byte comes out as 63, a question mark.

I've tried various Encoding parameters but none of them seem to work... how do I get it to preserve the character when writing?

Edited by evilertoaster




#2 ·  Posted (edited)

It's trying to do it in ASCII encoding, which only goes up to 127. You mean the character 151 that is part of ANSI, so you should state somewhere in your code that it should use ANSI.

Something like:

StringBuilder sb = new StringBuilder();
sb.Append("stuff");

sb.Append(Encoding.Default.GetString(new byte[] { 151 }, 0, 1)); // Encoding.Default = ANSI

File.WriteAllText("something.txt",sb.ToString());

Edit: Btw, it has nothing to do with File.WriteAllText, but the problem was in how you converted the int to a character.

Edited by Manadar


#3 ·  Posted (edited)

I'm pretty sure it's the file's encoding that is the problem. Convert.ToChar(151) shouldn't be a problem at all but I'd just use (char)151 instead.

How are you setting up the "File" object?

Edited by Richard Robertson


So the problem was actually the way I tried appending to the StringBuilder...makes sense I guess, I had kinda always envisioned StringBuilder to use the Default encoding by...default... (go figure).

I was probably on the same train of thought as Richard... I tried telling File.WriteAllText() to use ANSI encoding but it never worked.

Thanks all.


#6 ·  Posted (edited)

Actually, check this out.

char longdash = Convert.ToChar(151);
// Value of longdash is: 151 "

StringBuilder sb = new StringBuilder();
sb.Append("stuff");
sb.Append(longdash);

char longdash2 = sb[5];
// Value of longdash2 is: 151 " (proves that it doesn't change by appending -- ruling out that Append is to blame)

sb.Append(Encoding.Default.GetString(new byte[] { 151 }, 0, 1));
char longdash3 = sb[6]; 
// Value of longdash3 is: 8212 -- (longdash) (correct)

File.WriteAllText("something.txt",sb.ToString());

By default, a lot of these things use Unicode (UTF-16); the same goes for StringBuilder and Convert.ToChar. Convert.ToChar(151) does keep the value 151, but in Unicode that code point (U+0097) is a control character, not the long dash. The long dash only sits at 151 in the ANSI code page; in Unicode it is U+2014 (8212), which is what Encoding.Default.GetString returns for the byte 151. The " shown next to 151 is most likely just how the debugger displays a character it can't print.

Edited by Manadar


I know close to nothing about C#, but aren't Convert.ToChar (and others) overloaded methods?

If you pass a byte[] then it interprets the input as bytes, and if you pass an Int32[] then it interprets the input as 32-bit integers, and so on...




hum *epiphany*... a char is 16 bits in C# (not 8!)...this changes a few assumptions I made when debugging for correctness.

Regardless, I tested both your examples. The first one has a binary output of "73 74 75 66 66 E2 80 94", the second has "73 74 75 66 66 C2 97 E2 80 94". Neither is what I'm after. I want "73 74 75 66 66 97"... any idea how to get that?
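For anyone following along, here is a minimal sketch of where those byte sequences could be coming from. It assumes the file was written as UTF-8, which is what File.WriteAllText uses when no encoding is passed. The class name CodePointDemo and the helper Utf8Hex are made up for illustration and are not part of anyone's code in this thread.

```csharp
using System;
using System.Text;

static class CodePointDemo
{
    // Hypothetical helper: hex dump of the UTF-8 encoding of a string.
    public static string Utf8Hex(string text)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text)).Replace("-", " ");
    }

    static void Main()
    {
        char fromConvert = Convert.ToChar(151); // code point U+0097, a control character
        char longDash = '\u2014';               // the Unicode long dash (8212)

        // The conversion keeps the numeric value; it is the same char as '\x97'.
        Console.WriteLine((int)fromConvert);
        Console.WriteLine(fromConvert == '\x97');

        // UTF-8 needs two bytes for U+0097 and three bytes for U+2014.
        Console.WriteLine(Utf8Hex("stuff" + fromConvert)); // 73 74 75 66 66 C2 97
        Console.WriteLine(Utf8Hex("stuff" + longDash));    // 73 74 75 66 66 E2 80 94
    }
}
```

In other words, a char above 127 never comes out as a single byte in UTF-8, whichever of the two characters is appended.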


On whether Convert.ToChar is overloaded by argument type: sounds about right, but there is no array support. It has 18 overloads for me, each taking a single object and converting it to a char.


#10 ·  Posted (edited)

I've never used this File.WriteAllText function before. However, all you need to do is add the encoding parameter to the call. Use this overload instead. http://msdn.microsoft.com/en-us/library/ms143376.aspx

File.WriteAllText("something.txt", sb.ToString(), Encoding.ANSI);
Edited by Richard Robertson


#11 ·  Posted (edited)

That doesn't work for me... what's your full code?

StringBuilder sb = new StringBuilder();
sb.Append("stuff");
sb.Append(Convert.ToChar(151));

File.WriteAllText("something.txt", sb.ToString(), Encoding.Default); // There is no Encoding.ANSI

Edit: btw, that was exactly what I tried before asking:

I've tried various Encoding parameters but none of them seem to work

Edited by evilertoaster


Oops. Well try using '\x97' instead of the conversion. Character literals are probably better anyway. Use this with Encoding.Unicode as well.

And I know that encoding exists. ^^;


Humm, no go:

StringBuilder sb = new StringBuilder();
sb.Append("stuff" + '\x97');
File.WriteAllText("something.txt", sb.ToString(), Encoding.Default);

Still writes a '?'

And using Encoding.Unicode makes the file... Unicode (2 bytes per char), which is not what I want.


Use UTF8 then.


#15 ·  Posted (edited)

About the Convert.ToChar overloads: yes, but I tested a few and they all seem to do much the same thing. I've tried 151 from a byte, an int, etc. No special cases; I think the overloads are more for convenience than for any actual purpose.

Edited by Manadar


UTF8 is a no go too.

UTF8 results in: "EF BB BF 73 74 75 66 66 C2 97".

Just for the record, I've already tried every available static property of Encoding as a parameter to the File.WriteAllText method, so I'm hesitant to believe the solution lies solely within File.WriteAllText...


Are you sure 151 is a long dash? Wolfram reports it as a black rectangle. http://www.wolframalpha.com/input/?i=ascii+151


#18 ·  Posted (edited)

I'm basing it off the appendix in the AutoIt Help file, which matches this page AFAIK: http://www.alanwood.net/demos/ansi.html. The actual character is somewhat irrelevant, as long as it is written out as the exact byte value I'm telling it to use. This output is being fed into another system that expects a byte value of 0x97 (151).

Edited by evilertoaster
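If only the byte value matters, one option that may work is an encoding that maps each char from 0 to 255 onto the byte with the same value. The sketch below assumes ISO-8859-1 (code page 28591) is acceptable for the rest of the text as well. The class name Latin1WriteDemo, the helper WriteAsLatin1, and the temp-file path are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class Latin1WriteDemo
{
    // Hypothetical helper: writes text as ISO-8859-1 and returns the bytes on disk.
    public static byte[] WriteAsLatin1(string path, string text)
    {
        // Code page 28591 is ISO-8859-1: chars 0..255 map to the same byte value.
        Encoding latin1 = Encoding.GetEncoding(28591);
        File.WriteAllText(path, text, latin1);
        return File.ReadAllBytes(path);
    }

    static void Main()
    {
        var sb = new StringBuilder();
        sb.Append("stuff");
        sb.Append((char)151); // U+0097, same value as Convert.ToChar(151)

        // Assumed location for the demo file; any writable path would do.
        string path = Path.Combine(Path.GetTempPath(), "something.txt");
        byte[] written = WriteAsLatin1(path, sb.ToString());

        Console.WriteLine(BitConverter.ToString(written).Replace("-", " ")); // 73 74 75 66 66 97
    }
}
```

The trade-off is that any char above 255 (including the real long dash, U+2014) has no byte in this encoding and would be written as '?', so this only suits output where every char is meant as a raw byte value.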


Is there a reason you can't just write a byte array to the file rather than a string?
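A rough sketch of what the byte-array route could look like, assuming the text part is plain ASCII and the target system wants a raw 0x97 after it. The class name RawBytesDemo, the helper BuildPayload, and the temp-file path are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class RawBytesDemo
{
    // Hypothetical helper: ASCII text followed by one raw byte, with no text encoding step for it.
    public static byte[] BuildPayload()
    {
        var bytes = new List<byte>();
        bytes.AddRange(Encoding.ASCII.GetBytes("stuff"));
        bytes.Add(0x97); // the exact byte the other system expects
        return bytes.ToArray();
    }

    static void Main()
    {
        // Assumed location for the demo file; any writable path would do.
        string path = Path.Combine(Path.GetTempPath(), "something.txt");
        File.WriteAllBytes(path, BuildPayload());

        byte[] onDisk = File.ReadAllBytes(path);
        Console.WriteLine(BitConverter.ToString(onDisk).Replace("-", " ")); // 73 74 75 66 66 97
    }
}
```

Because File.WriteAllBytes never runs the data through an Encoding, nothing can substitute a '?' or add a byte order mark along the way.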
