Re: Encoding issues with literal strings (C++)



John Fusco <fusco_john@xxxxxxxxx> writes:
They're both the same problem. I'm not sure whether this is a bug,
but gcc is consuming more than two hex digits for the \x escape in the
string literal. In your example:
"espec\xEDficos"

Here gcc reads the escape as 0xEDF, which is out of range for a char.
The value modulo 256, 0xDF, is what shows up in your output. I
confirmed this behavior in gcc 3.4.4.
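
To make the truncation concrete, here is a small sketch of the arithmetic only. The helper name low_byte is made up for illustration, and the code does not claim to reproduce what gcc does internally; it just shows that keeping the low 8 bits of 0xEDF gives 0xDF.

```cpp
#include <cassert>  // for assert-based checks by the caller

// Hypothetical helper: keep only the low 8 bits of a value, which is
// what is left when an oversized escape value is squeezed into one byte.
constexpr unsigned low_byte(unsigned value) {
    return value & 0xFFu;
}

// 0xEDF is 3807 decimal; 3807 modulo 256 is 223, i.e. 0xDF.
static_assert(low_byte(0xEDFu) == 0xDFu, "0xEDF truncates to 0xDF");
```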

I had always thought C used only two digits for \x escapes, but the
standard lets a hexadecimal escape run on through every hex digit that
follows it, so this is conforming behavior rather than a gcc bug. You
can work around it by ending the literal right after the escape and
making it two strings as follows:
"espec\xED""ficos"

This is valid C and C++ syntax. The compiler concatenates the two
adjacent literals and produces the correct characters.
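
As a rough check of the split-literal workaround, the sketch below builds the string and looks at the bytes around the escape. The function names especificos_latin1, byte_at and self_check are invented for this example, and it assumes the intended encoding is ISO-8859-1, where 0xED is the accented i.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the literal is split right after the two hex
// digits, so the following 'f' cannot be absorbed into the \x escape.
inline std::string especificos_latin1() {
    return "espec\xED" "ficos";
}

// Hypothetical helper: byte at position i as an unsigned value
// (plain char may be signed, so go through unsigned char).
inline unsigned byte_at(const std::string& s, std::size_t i) {
    return static_cast<unsigned char>(s[i]);
}

// Sanity check: 5 letters + 1 escaped byte + 5 letters.
inline void self_check() {
    const std::string s = especificos_latin1();
    assert(s.size() == 11);
    assert(byte_at(s, 5) == 0xED);  // the escaped byte
    assert(byte_at(s, 6) == 'f');   // the 'f' survives as its own char
}
```

Calling self_check() should pass silently if the compiler treats the two literals as described above.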

What I would do is keep my sources encoded in UTF-8, and just be
sure to output the HTML with the right "Content-type:...;charset..."
header and META tag.


--
__Pascal Bourguignon__ http://www.informatimago.com/
Our enemies are innovative and resourceful, and so are we. They never
stop thinking about new ways to harm our country and our people, and
neither do we. -- Georges W. Bush