Subject: Re: Strings, Was: profile results for new UT_* implementations?
From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Tue Jun 19 2001 - 22:41:37 CDT
Dom Lachowicz wrote:
>
> Quoting Paul Rohr <paul@abisource.com>:
>
> Hi Paul,
>
> I'd like to address your concerns.
>
> 1) Abi never had any string classes (nor anything even remotely resembling
> them). The closest thing that we had was either:
> a. UT_Bytebuf (ick)
> b. Our own management using malloc, free, new, delete, delete[], and all of the
> not-so-wonderful inconsistencies that came with them, not to mention keeping
> track of everything (size, pointers) and ownership of the strings, since to
> this point Abi had no clear ownership model for *any* of its pointers or
> references, save the singleton classes
Actually, I find UT_Bytebuf useful for strings. I use it in the text
importer and exporter so I can have one set of functions regardless of
whether I'm handling 8-bit or 16-bit text. And it'll still work if and
when we have to handle 32-bit text.
> So now we have a nice wrapper class for C strings and UCS2 strings, which is
> nice on the eyes, easier on the programmers, and probably more efficient both
> in terms of time and space. They support a lot of nice, clean operations,
> manage memory for us effectively, etc... We no longer have strcat's in our
> code. This is a good thing.
This must have been discussed at some point, but I'll bring it up since
I've not seen it here yet. I read all of the Unicode mailing lists
and newsgroups I can, and it seems everybody *hates* UCS-2, except
maybe Microsoft (: The rest of the world is coming to grips with
using UTF-8 for interchange and UTF-32 (UCS-4) internally. If you
know anything about surrogates you'll understand why. Many people
believe that with UCS-2 a character always fits into one 16-bit
unit. Some believe that if they pretend surrogates don't exist
they can keep using UCS-2. Neither is true: characters beyond U+FFFF
need a surrogate pair, and many characters take more than one code
point even in UTF-32. The major concern with UTF-32 is that it doubles
the amount of memory needed over UCS-2 ):
What's our position? We're going to have to look into it sooner or
later and it won't be fun.
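
To make the surrogate problem concrete, here is a rough sketch in
standalone C++. The names encode_utf16 and count_code_points are
made up for illustration and are not anything in the Abi tree. It
assumes well-formed input and does no error checking.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: encode one code point (U+0000..U+10FFFF,
// assumed not to be a surrogate itself) as UTF-16 code units.
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp)
{
    std::vector<std::uint16_t> out;
    if (cp < 0x10000) {
        // Fits in a single 16-bit unit.
        out.push_back(static_cast<std::uint16_t>(cp));
    } else {
        // Needs a surrogate pair: 10 high bits, then 10 low bits.
        cp -= 0x10000;
        out.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}

// Hypothetical helper: count code points in well-formed UTF-16 by
// counting every unit that is not a low (trailing) surrogate.
std::size_t count_code_points(const std::vector<std::uint16_t>& units)
{
    std::size_t n = 0;
    for (std::uint16_t u : units) {
        if (u < 0xDC00 || u > 0xDFFF) {
            ++n;
        }
    }
    return n;
}
```

With this, encode_utf16(0x1D11E) (the musical G clef) gives the two
units 0xD834 0xDD1E, so any code that assumes "one 16-bit unit is one
character" miscounts it. The same text stored as UTF-32 is one 32-bit
unit per code point, which is where the extra memory goes.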
Sorry if this has all been thrashed out before.
Andrew Dunbar.
-- http://linguaphile.sourceforge.net
This archive was generated by hypermail 2b25 : Tue Jun 19 2001 - 22:39:42 CDT