Re: Encoding issues with literal strings (C++)
- From: Carlos Moreno <moreno_at_mochima_dot_com@xxxxxxxxxxxxxx>
- Date: Wed, 20 Dec 2006 18:06:19 -0500
John Fusco wrote:
[...]
> gcc is taking more than two digits to make a string literal. In your example:
> "espec\xEDficos"
> Here gcc is taking the literal as 0xedf, which is out of range. The modulo value of 0xdf is what shows up in your output. [...]
Thank you SO MUCH for noticing and pointing it out!! It wouldn't
have occurred to me in a million years!!! (well, ok, I'm exaggerating,
but still, thanks so much!!!!)
> Again, I always thought C only uses two digits for \x escapes, so this smells like non-conformance to me.
With much horror, I have to confirm that gcc/g++ is correct, as per
the C++ ISO/IEC standard (the 1998 one --- dunno if it's going to
be changed in the next revision, or if it has been already, in the
current draft):
From 2.13.2:
"The escape \ooo consists of the backslash followed by one, two, or
three octal digits [...]. The escape \xhhh consists of the backslash
followed by x followed by one or more hexadecimal digits that are
taken to specify the value of the desired character. There is no
limit to the number of digits in a hexadecimal sequence. A sequence
of octal or hexadecimal digits is terminated by the first character
that is not an octal digit or a hexadecimal digit, respectively."
I'm completely speechless !!!
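To make the quoted rule concrete, here is a minimal sketch (my own illustration, not taken from the standard; the names byte_at, hex_then_g, octal_form and check_escapes are made up for this example). It assumes 8-bit chars and a C++11 or later compiler for static_assert. It shows a hex escape stopping at the first non-hex character, and an octal escape stopping after three digits:

```cpp
#include <cassert>
#include <cstddef>

// Read a byte of a literal as an unsigned value, so bytes above 0x7F
// compare cleanly whether plain char is signed or unsigned.
inline unsigned byte_at(const char* s, std::size_t i) {
    return static_cast<unsigned char>(s[i]);
}

// 'g' is not a hexadecimal digit, so the escape stops after "ED":
// the literal holds the byte 0xED followed by "gar".
const char hex_then_g[] = "\xEDgar";

// 'f' IS a hexadecimal digit, so "\xEDficos" would be read as the single
// escape \xEDF followed by "icos". An octal escape takes at most three
// digits, so \355 (octal for 0xED) ends before the 'f'.
const char octal_form[] = "espec\355ficos";

// sizeof counts the terminating '\0'.
static_assert(sizeof(hex_then_g) == 5, "0xED, 'g', 'a', 'r', NUL");
static_assert(sizeof(octal_form) == 12, "11 characters plus NUL");

// Run-time check of where each escape ended.
void check_escapes() {
    assert(byte_at(hex_then_g, 0) == 0xED);
    assert(hex_then_g[1] == 'g');
    assert(byte_at(octal_form, 5) == 0xED);
    assert(octal_form[6] == 'f');
}
```

The octal spelling is a side effect of the same paragraph of the standard: since \ooo is capped at three digits, it can safely be followed by any letter, including a through f.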
> However, you can work around it by terminating the sequence with whitespace, or you can make it two strings as follows:
> "espec\xED""ficos"
This one would work --- terminating with whitespace is not an option,
since the string is what it is; I cannot choose to put a space or
newline or tab after the i-acute-accent character; I simply can't:
the word is "especifico" (with the acute accent on the first i).
But yeah, relying on the automatic concatenation of literal strings
is definitely an option --- it seems ridiculous that I would have to
do that; but again, that goes with the "I'm speechless" part, how
horrifying I find this feature, which IMHO, is more like a gratuitous
defect of the language (I guess both C and C++ share this "defect").
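For the record, a small sketch of the concatenation workaround (the names word, expected and same_bytes are hypothetical, chosen just for this example). As far as I understand the translation phases, escape sequences are converted before adjacent literals are joined, so the \xED escape cannot reach across the closing quote and swallow the 'f':

```cpp
#include <cassert>
#include <cstring>

// Two adjacent literals: the escape \xED ends with the first literal,
// and the compiler then joins the pieces into a single string.
const char word[] = "espec\xED" "ficos";

// The same bytes written out one by one, for comparison.
const char expected[] = {
    'e', 's', 'p', 'e', 'c', static_cast<char>(0xED),
    'f', 'i', 'c', 'o', 's', '\0'
};

// True when the concatenated literal is byte-for-byte what we wanted.
bool same_bytes() {
    return sizeof(word) == sizeof(expected)
        && std::memcmp(word, expected, sizeof(word)) == 0;
}
```

Calling same_bytes() should return true, and std::strlen(word) should be 11: five letters, the single 0xED byte, and five more letters.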
BTW, it would be a *really* nice feature for the text editors (e.g.,
Kwrite in KDevelop) that they highlight the three-digit sequence, no?
Thanks,
Carlos