Re: _mbslen vs strlen
From: Chris Vine (chris_at_cvine--nospam--.freeserve.co.uk)
Date: 08/19/05
- In reply to: stork: "_mbslen vs strlen"
- Next in thread: Ben Hutchings: "Re: _mbslen vs strlen"
- Reply: Ben Hutchings: "Re: _mbslen vs strlen"
Date: Fri, 19 Aug 2005 11:24:08 +0100
stork wrote:
> I am porting a C++ Windows application to Linux. For various reasons,
> many foolish, I did not use C++ strings; my C++ classes use functions
> like strlen or wcslen, etc. My Windows application was Unicode,
> but I have read that wchar_t in Linux is not commonly used because it
> is 32 bits and everything in Linux is UTF-8. So I am, as an initial
> step, making my Windows code work with UTF-8 and doing conversions behind
> the scenes when calling the W functions of Windows.
> I changed everything from wchar_t to char and now have to deal with the
> fallout of multibyte strings.
>
> Microsoft has a set of functions like _mbslen for multibyte strings.
> The only reference I have seen to those is in Wine, which says to me
> that such is not the GNU/Linux way. Under GNU, what does strlen
> return? The length in characters, or the length in bytes? I have read
> that checking for the terminating NUL still works under UTF-8 because of
> the way the multibyte characters are mapped, so I think strlen returns
> the length in bytes.
C++ on Linux has wide (wchar_t) characters as well as narrow, single-byte
(char) characters; both are mandated by the standard (although the size of
wchar_t is not; in practice on Linux it is 32 bits, the same size as int). So
in terms of the language you can happily use 32-bit characters and so
accommodate UCS-4.
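A minimal sketch of checking what wchar_t actually is on a given platform (the function names are hypothetical, chosen for illustration). With the 32-bit wchar_t one would expect on Linux, a character outside the Basic Multilingual Plane such as U+1F600 fits in a single wchar_t; with a 16-bit wchar_t, as on Windows, the same literal needs two units (a surrogate pair).

```cpp
#include <climits>
#include <cstddef>
#include <cwchar>

// Width of wchar_t in bits. The standard leaves this implementation-defined;
// on Linux (glibc) it is expected to be 32, on Windows 16.
constexpr std::size_t wchar_bits() { return sizeof(wchar_t) * CHAR_BIT; }

// Number of wchar_t units needed to hold U+1F600: 1 with a 32-bit wchar_t
// (UCS-4), 2 with a 16-bit wchar_t (UTF-16 surrogate pair).
std::size_t wide_units_for_astral_char() {
    return std::wcslen(L"\U0001F600");
}

// A character inside the BMP (U+00E9) takes one wchar_t either way,
// so this should be 5 on both platforms.
std::size_t wide_units_for_hello() {
    return std::wcslen(L"h\u00e9llo");
}
```

Code that assumes one wchar_t per character is therefore only safe on platforms where wchar_t is 32 bits.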
However, on Linux most Unicode-aware GUI libraries use UTF-8 in their
programming interfaces (although internally they may implement their Unicode
support using wide characters), whereas the Windows GUI interfaces use UTF-16
(wchar_t is 16 bits on Windows).
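To make the encoding difference concrete, here is a small sketch showing one character, U+1F600, in the three Unicode encoding forms: 4 bytes in UTF-8, two 16-bit units in UTF-16 (what the Windows W functions take), and one 32-bit unit in UTF-32/UCS-4. It assumes a C++11 or later compiler for the u"" and U"" literals, which postdate this thread; the variable and function names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// U+1F600 in three encoding forms. Escapes are used so the result does not
// depend on the encoding of the source file.
const char* const     utf8_form  = "\xF0\x9F\x98\x80";  // UTF-8: 4 bytes
const char16_t* const utf16_form = u"\U0001F600";       // UTF-16: surrogate pair D83D DE00
const char32_t* const utf32_form = U"\U0001F600";       // UTF-32: one code unit

// Code units in each form, not counting the terminator.
std::size_t utf8_units()  { return std::strlen(utf8_form); }
std::size_t utf16_units() { return std::char_traits<char16_t>::length(utf16_form); }
std::size_t utf32_units() { return std::char_traits<char32_t>::length(utf32_form); }
```

Only the UTF-32 form has one unit per character for every code point, which is why UTF-16 code on Windows cannot assume one wchar_t per character either.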
Unicode-aware libraries for Linux normally provide functions for converting
from one codeset to another; glib, as used by GTK+ and GNOME, provides this,
for example. In view of this, for programs using such libraries it is
usually best to code everything in terms of narrow (UTF-8) characters, so
that you only have to convert (if at all) for input from and output to
things outside the program, but it is not a requirement to do so.
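As an illustration of converting only at the program boundary, the sketch below decodes UTF-8 input into UTF-32 code points. The helper utf8_to_utf32() is hypothetical and hand-written only to show what such a conversion does; in practice you would more likely call a library routine such as glib's g_utf8_to_ucs4() or POSIX iconv(). It rejects malformed lead and continuation bytes but, unlike a production decoder, does not reject overlong forms or surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into UTF-32 code points.
// Throws std::invalid_argument on a bad lead byte, a bad continuation byte,
// or a sequence cut off by the end of the input. Overlong forms and
// surrogates are NOT rejected in this simplified sketch.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

The rest of the program can then keep working with narrow UTF-8 strings and call a conversion like this only where an interface needs 32-bit characters.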
strlen() always returns the number of bytes in a null-terminated string, not
counting the terminating null. This has nothing to do with "GNU" but is
required by the standard. glib has g_utf8_strlen() to give the number of
characters (rather than bytes) in a null-terminated UTF-8 string. Other
libraries will have something similar.
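A minimal sketch of the difference between the two counts, assuming the input is valid UTF-8 (the helper names are hypothetical). strlen() counts every byte, while counting only the bytes that are not continuation bytes (bit pattern 10xxxxxx) gives the number of code points, which for valid input should match what g_utf8_strlen(s, -1) reports. Note that a code point is not always what a user sees as one character (a base letter plus a combining accent is two code points).

```cpp
#include <cstddef>
#include <cstring>

// What strlen() reports: bytes before the terminating NUL.
std::size_t utf8_byte_count(const char* s) {
    return std::strlen(s);
}

// Code points in a NUL-terminated UTF-8 string: every byte except
// continuation bytes (10xxxxxx) starts a new code point.
// Assumes valid UTF-8; no validation is done.
std::size_t utf8_char_count(const char* s) {
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For "h\xC3\xA9llo" (the word "héllo"), utf8_byte_count() gives 6 and utf8_char_count() gives 5, because the é occupies two bytes.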
Chris
-- To reply by e-mail remove the --nospam-- in the address.