I'm not competent to give you a full answer to your question, but here's
what I do know.
None of the core development team are i18n experts. We know folks like
that, and respect the advice they give us enough to try to avoid doing
stupid things, but we know our limits.
Thus, we've tried hard to make sure that:
- the core document structures in memory are Unicode-friendly,
- we take advantage of some of expat's charset awareness,
- strings are isolated for easy translation, and
- menus and other GUI items are localizable.
Beyond that, we're looking for design and coding help from people with
significantly more i18n expertise than we have. The last thing we want to
do is spend a lot of time trying to teach ourselves enough to do it right.
So far, the only confirmation that we're on the right track at all has come
from European translators using Latin-1 characters. To our knowledge,
nobody has yet attempted to tackle anything beyond Latin-1. Issues
specific to double-byte character sets haven't been addressed at all.
In particular, we haven't done anything about the various platform-specific
input methods that languages using those character sets require. There
are probably other problems as well.
If you or anyone you know has the relevant i18n expertise and is interested
in helping us deal with these issues, this mailing list is the appropriate
place to discuss them.
Thanks,
Paul