Re: Encoding issues with literal strings (C++)



Pascal Bourguignon wrote:
John Fusco <fusco_john@xxxxxxxxx> writes:

They're both the same problem. I'm not sure whether this is a bug or not,
but gcc is consuming more than two hex digits for a \x escape in a
string literal. In your example:
"espec\xEDficos"

Here gcc reads the escape as 0xEDF, which is out of range for a char.
That value reduced modulo 256, i.e. 0xDF, is what shows up in your
output. I confirmed this behavior in gcc 3.4.4.

Again, I always thought C used only two digits for \x escapes, so this
smells like non-conformance to me. However, you can work around it by
ending the sequence with whitespace, or by splitting it into two
strings as follows:
"espec\xED""ficos"

This is valid C (and C++) syntax. The compiler concatenates the two
adjacent literals and produces the correct characters.


What I would do is keep my sources encoded in UTF-8 and just make
sure to output the HTML with the right "Content-Type: ...; charset=..."
header and META tag.

Except, who said the output is HTML?

If it were HTML, I'd rather encode it in HTML, not in UTF-8; that
is, I would have written it as espec&iacute;ficos (which is, by the
way, how I write it whenever I need to write an HTML document
containing Spanish text).

I was, in fact, considering having my application decode (at
run time) a literal string containing HTML entities, or some other
encoding; perhaps even URL-encoding, which uses just a % instead of
a \x.

Thanks,

Carlos


