Subject: UCS-2 vs. UCS-4
From: Mike Nordell (tamlin@algonet.se)
Date: Fri Jun 22 2001 - 00:47:42 CDT
Please see this post as more-or-less brainstorming.
It seems that currently none (?) of us use anything larger than UCS-2,
but in the not too distant future we may have to use the full 2^32 range for
character representations (makes me wish for plain ASCII and console mode
again - I sure as hell don't want to keep track of 4 _billion_ chars).
I don't know if this is a problem already, but if it is, what about creating
a factory for encodings? Like:
ASCII_Factory
UTF8_Factory
UCS2_Factory
UCS4_Factory
and let them return objects that handle the chars from a document (or piece
table or whatever, I'm not sure at what level this should be implemented),
exposing them to the outside as what looks like a linked list of "void*"?
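A rough sketch of what such a factory could look like, in Python purely for
illustration. Every name here (Encoding, encoding_factory, can_hold) is made
up for the example and is not existing code. It also assumes UCS-2 can be
approximated by Python's "utf-16-le" codec restricted to code points up to
0xFFFF, and UCS-4 by "utf-32-le".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    """One encoding handler (hypothetical type, for illustration only)."""
    name: str
    codec: str           # Python codec used to produce the bytes
    max_code_point: int  # highest code point this encoding can represent

    def can_hold(self, text: str) -> bool:
        # True if every character fits within this encoding's range.
        return all(ord(ch) <= self.max_code_point for ch in text)

    def encode(self, text: str) -> bytes:
        if not self.can_hold(text):
            raise ValueError(f"{self.name} cannot represent this text")
        return text.encode(self.codec)


# Assumption: UCS-2 is modelled as UTF-16 without surrogate pairs.
_ENCODINGS = {
    "ASCII": Encoding("ASCII", "ascii", 0x7F),
    "UTF-8": Encoding("UTF-8", "utf-8", 0x10FFFF),
    "UCS-2": Encoding("UCS-2", "utf-16-le", 0xFFFF),
    "UCS-4": Encoding("UCS-4", "utf-32-le", 0x10FFFF),
}


def encoding_factory(name: str) -> Encoding:
    """Return the handler for the named encoding."""
    try:
        return _ENCODINGS[name]
    except KeyError:
        raise ValueError(f"unknown encoding: {name}") from None
```

The caller only ever sees the common Encoding interface, so the document
code would not need to know which width is in use underneath.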
My idea was something like:
Start at ASCII. If someone enters a char outside the ASCII range, the
document is "upgraded" to the next level that can handle that kind of char.
When saving, check the highest "level" actually in use, and save using that one.
Example: If a document uses 16-bit chars and someone enters a char that needs
UCS-4, the engine would "upgrade" the full document [1] to UCS-4. When saving,
if those specific characters had since been removed, it would "back down" to
UCS-2.
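The upgrade/back-down idea might be sketched like this, again in Python and
again with invented names (LEVELS, required_level, Document). It assumes
three fixed-width levels, ASCII, UCS-2 and UCS-4; UTF-8 is left out of the
ladder because it is variable-width and can already represent every code
point.

```python
# Levels in upgrade order: (name, highest code point the level can hold).
LEVELS = [("ASCII", 0x7F), ("UCS-2", 0xFFFF), ("UCS-4", 0x10FFFF)]


def required_level(text: str) -> int:
    """Index of the smallest level that can hold every char in text."""
    highest = max(map(ord, text), default=0)
    for index, (_, limit) in enumerate(LEVELS):
        if highest <= limit:
            return index
    raise ValueError("code point outside the known levels")


class Document:
    """Toy document that tracks its in-memory level (hypothetical)."""

    def __init__(self) -> None:
        self.text = ""
        self.level = 0  # start at ASCII

    @property
    def level_name(self) -> str:
        return LEVELS[self.level][0]

    def insert(self, s: str) -> None:
        self.text += s
        # Upgrade only: entering a wider char raises the level.
        self.level = max(self.level, required_level(s))

    def remove(self, s: str) -> None:
        # Deleting text does not lower the in-memory level.
        self.text = self.text.replace(s, "")

    def save_level(self) -> str:
        # On save, rescan and use the highest level actually needed.
        return LEVELS[required_level(self.text)][0]
```

Usage, following the example above:

```python
doc = Document()
doc.insert("hello")        # stays at ASCII
doc.insert("\u20ac")       # euro sign: upgraded to UCS-2
doc.insert("\U00010348")   # outside 16 bits: upgraded to UCS-4
doc.remove("\U00010348")
print(doc.level_name, doc.save_level())
```

Rescanning on save costs one pass over the text; keeping a per-level char
count would avoid that, at the price of more bookkeeping on every edit.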
/Mike
[1] Perhaps it would be possible, even preferable, to keep this on a
"paragraph" level?