Re: Encoding issues with literal strings (C++)



John Fusco wrote:

[...]
gcc is taking more than two digits to make a string literal. In your example:
"espec\xEDficos"

Here gcc is taking the literal as 0xedf, which is out of range. The modulo value of 0xdf is what shows up in your output. [...]


Thank you SO MUCH for noticing and pointing it out!! It wouldn't
have occurred to me in a million years!!! (well, ok, I'm exaggerating,
but still, thanks so much!!!!)


Again, I always thought C only uses two digits for \x escapes, so this smells like non-conformance to me.

With much horror, I have to confirm that gcc/g++ is correct, as per
the C++ ISO/IEC standard (the 1998 one --- dunno if it's going to
be changed in the next revision, or if it has been already, in the
current draft):

From 2.13.2:

"The escape \ooo consists of the backslash followed by one, two, or
three octal digits [...]. The escape \xhhh consists of the backslash
followed by x followed by one or more hexadecimal digits that are
taken to specify the value of the desired character. There is no
limit to the number of digits in a hexadecimal sequence. A sequence
of octal or hexadecimal digits is terminated by the first character
that is not an octal digit or a hexadecimal digit, respectively."
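
To make the quoted rule concrete, here is a minimal sketch (the names octal_text, hex_text and check_escape_lengths are made up for illustration, and an ASCII execution character set is assumed). It only shows where each kind of escape stops. The out-of-range case from this thread is left out on purpose, since compilers diagnose it.

```cpp
#include <cassert>
#include <string>

// Octal: at most three digits are consumed, so this is 'A' (octal 101)
// followed by an ordinary '4'.
const std::string octal_text = "\1014";

// Hex: digits are consumed until the first character that is not a hex
// digit. 'G' is not one, so this is 'A' (0x41) followed by 'G'.
const std::string hex_text = "\x41G";

// Had the next letter been a hex digit (like the 'f' after the escape in
// the string from this thread), it would have been absorbed into the escape.

// Hypothetical self-check: both strings hold exactly two characters.
void check_escape_lengths() {
    assert(octal_text.size() == 2);
    assert(hex_text.size() == 2);
}
```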


I'm completely speechless!!!

However, you can work around it by terminating the sequence with whitespace, or you can make it two strings as follows:
"espec\xED""ficos"

This one would work. Terminating with whitespace is not an option,
since the string is what it is; I cannot choose to put a space or
newline or tab after the i-acute-accent character; I simply can't:
the word is "especifico" (with the acute accent on the first i).

But yeah, relying on the automatic concatenation of literal strings
is definitely an option --- it seems ridiculous that I would have to
do that; but again, that goes with the "I'm speechless" part, how
horrifying I find this feature, which IMHO, is more like a gratuitous
defect of the language (I guess both C and C++ share this "defect").
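
For the record, a small sketch of the split-literal workaround (the names word, word_octal and accented_byte are hypothetical). It works because each escape is interpreted before adjacent literals are joined, so the closing quote ends the hex sequence. The octal spelling is my own addition, not something suggested in this thread: octal escapes stop after three digits, and 0xED is 355 in octal.

```cpp
#include <cassert>
#include <string>

// The hex escape ends at the closing quote; the two literals are then
// concatenated into a single 11-character string.
const std::string word = "espec\xED" "ficos";

// Alternative (an assumption of mine, not from the thread): an octal
// escape takes at most three digits, so the following 'f' is left alone.
const std::string word_octal = "espec\355ficos";

// Returns the byte that holds the accented character (index 5).
unsigned char accented_byte() {
    return static_cast<unsigned char>(word[5]);
}
```

Either spelling should give the same bytes, with 0xED at index 5 followed by a plain 'f'.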

BTW, it would be a *really* nice feature for the text editors (e.g.,
Kwrite in KDevelop) that they highlight the three-digit sequence, no?

Thanks,

Carlos