From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Wed May 08 2002 - 23:38:14 EDT
--- F J Franklin <F.J.Franklin@sheffield.ac.uk> wrote:
> > > Support is there but incomplete. Byte sequences
> > > longer than 3 bytes will cause problems, and there
> > > isn't a UTF-8 -> UCS-4 conversion yet.
> >
> > Sorry to keep whining about this but it was all in my lost
> > huge Unicode patch over a year ago. UTF-8 sequences can be
> > up to 6 bytes long. We should probably leave it up to iconv
> > anyway since we have to handle things like overlong
> > sequences, illegal sequences etc. iconv should handle this.
> > I think my implementation used the ByteBuf class so that it
> > could handle UCS-2 and UCS-4 properly without worrying about
> > all those null bytes looking like string terminators and
> > stuff.
>
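To make the null-byte point concrete, here is a rough sketch in Python (not AbiWord code; the helper names `ucs4_bytes` and `c_strlen` are made up for illustration). It packs a short string as little-endian UCS-4 and shows where a C-style "stop at the first NUL" scan would give up, which is why a buffer that carries its own length is needed:

```python
import struct


def ucs4_bytes(code_points):
    """Pack code points as little-endian UCS-4, 4 bytes each (hypothetical helper)."""
    return struct.pack(f"<{len(code_points)}I", *code_points)


def c_strlen(buf):
    """Mimic C strlen(): count the bytes before the first NUL byte."""
    end = buf.find(b"\x00")
    return len(buf) if end < 0 else end


raw = ucs4_bytes([ord(c) for c in "Abi"])
true_length = len(raw)       # the real size of the buffer in bytes
naive_length = c_strlen(raw)  # what a NUL-terminated scan would report
```

Every ASCII character becomes one non-zero byte followed by three zero bytes, so the NUL-terminated scan stops after the first byte while the real buffer is four bytes per character.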
> Andrew, Andrew, I know. The reason why only 3-byte sequences
> are handled is that the routine was written to convert Abi's
> internal UCS-2. Now that Abi uses UCS-4 internally I'll add
> the code to handle 6-byte sequences.
>
> In general I support the use of iconv for conversion between
> encodings, but conversion between validated UTF-8 and UCS-4
> is trivial and the [UT_]UTF8String class was designed to
> handle the conversion without resorting to iconv.
Okay, if it's only to be used with validated UTF-8 that can
never contain overlong sequences, wrongly converted UTF-16
surrogates, etc., then of course I agree. But if it's left
as a general interface, you can almost guarantee that sooner
or later people will use it to process strings that do
contain those oddities. Remember, not everybody understands
the intricacies of Unicode as well as some of us do.
Unicode solves a lot of problems, but there's quite a bit of
cruft in there where things can go wrong if you're not
careful.
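
To show the kind of checking I mean, here is a minimal sketch of a strict UTF-8 -> UCS-4 decoder in Python. It is not the [UT_]UTF8String code and not what iconv does internally; the names `sequence_length`, `utf8_to_ucs4` and the `limit` parameter are assumptions made up for this example. It accepts sequences of up to 6 bytes but rejects overlong forms, encoded UTF-16 surrogates, stray or missing continuation bytes, and truncated input. `limit` defaults to U+10FFFF (the highest value UTF-16 can reach) and can be raised to 0x7FFFFFFF to allow the full 6-byte range.

```python
# Smallest code point that genuinely needs a sequence of each length.
# A smaller value encoded at that length is an overlong sequence.
MIN_FOR_LENGTH = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000, 5: 0x200000, 6: 0x4000000}


def sequence_length(lead):
    """Return the sequence length implied by a lead byte, or 0 if it is invalid."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead < 0xE0:
        return 2
    if 0xE0 <= lead < 0xF0:
        return 3
    if 0xF0 <= lead < 0xF8:
        return 4
    if 0xF8 <= lead < 0xFC:
        return 5
    if 0xFC <= lead < 0xFE:
        return 6
    return 0  # stray continuation byte (0x80-0xBF) or 0xFE/0xFF


def utf8_to_ucs4(data, limit=0x10FFFF):
    """Decode a bytes object to a list of code points; raise ValueError on bad input."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        n = sequence_length(lead)
        if n == 0:
            raise ValueError(f"illegal lead byte at offset {i}")
        if i + n > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        # The lead byte carries 7 payload bits for ASCII, otherwise 7 - n bits.
        cp = lead if n == 1 else lead & (0x7F >> n)
        for b in data[i + 1:i + n]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte in sequence at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < MIN_FOR_LENGTH[n]:
            raise ValueError(f"overlong sequence at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"encoded surrogate at offset {i}")
        if cp > limit:
            raise ValueError(f"code point out of range at offset {i}")
        out.append(cp)
        i += n
    return out
```

With this sketch, b"\xC0\x80" (an overlong NUL) and b"\xED\xA0\x80" (the surrogate U+D800 run through a UTF-8 encoder) both raise ValueError instead of quietly producing a code point. A converter that skips these checks is fine on validated input and nowhere else.
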
> Ciao, Frank
>
> ps. BTW, do you know anything about the overheads of using
> various iconv implementations? or their thread-safety, for
> that matter? (Genuinely curious/worried...)
I really like the libiconv implementation. It's very
elegant. I'm not familiar with the Linux/BSD
implementations but I'm sure they're efficient too.
We don't officially support any other iconv, although
people can force AbiWord to build with another system
iconv; that is up to them.
Unfortunately I don't know about thread-safety issues,
but I've been in touch with the libiconv maintainer
before and he seems pretty responsive.
Andrew Dunbar.
> Francis James Franklin
> F.J.Franklin@shef.ac.uk
>
> "No, she really likes me. She told me I look like Britney
> Spears, and why would you say that to somebody you don't
> like?"
>
> --- Elle Woods
>
>
=====
http://linguaphile.sourceforge.net http://www.abisource.com
This archive was generated by hypermail 2.1.4 : Wed May 08 2002 - 23:41:05 EDT