Subject: Re: Strings, Was: profile results for new UT_* implementations?
From: Mike Nordell (tamlin@algonet.se)
Date: Wed Jun 20 2001 - 00:51:01 CDT
Dom Lachowicz wrote:
> Abi historically has always used UCS-2 internally to represent strings, and as
> you note, we're beginning to run into problems with that. Dealing with UTF-8 is
> no more pleasant than dealing with UCS-2 in my experience, but perhaps it is
> (much) more common in the programming community as a whole.
I'd say dealing with UTF-8 is _much_ more of a hell:
A discussion Joaquin and I had about this in the back of the cab on the way
to the .dk party concluded that, whatever format a document has on disk,
having it in UCS-2 in memory should be _much_ easier to deal with
(only indexing on unsigned 16-bit chars) than UTF-8 (indexing on... oh, we
can't index). At the time I believe we both felt it was the way to go. At
least I still feel it's reasonable.
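To make the indexing point concrete, here is a minimal sketch. It is not
AbiWord code: the names ucs2_at and utf8_offset_of are made up for
illustration, and the UTF-8 walk assumes well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// UCS-2: every character is exactly one 16-bit unit, so the n-th
// character is simply the n-th element. Constant time.
std::uint16_t ucs2_at(const std::vector<std::uint16_t>& text, std::size_t n) {
    assert(n < text.size());
    return text[n];
}

// UTF-8: a character takes 1 to 4 bytes, so finding the n-th one means
// walking from the start and counting lead bytes. Linear time.
// Returns the byte offset of the n-th character, or std::string::npos
// if the string holds fewer than n + 1 characters.
// Assumption: the input is well-formed UTF-8.
std::size_t utf8_offset_of(const std::string& text, std::size_t n) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(text[i]);
        // Continuation bytes look like 10xxxxxx; anything else starts
        // a new character.
        if ((b & 0xC0) != 0x80) {
            if (seen == n) {
                return i;
            }
            ++seen;
        }
    }
    return std::string::npos;
}
```

With the UCS-2 buffer, "give me character 2" is a plain array access. With
UTF-8 the same request has to scan every byte before it, which is what
"we can't index" amounts to in practice.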
What are the problems? Please don't say we need more than 2^16 chars.
/Mike