Subject: Re: RemapGlyph()
From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Thu Jun 21 2001 - 00:16:37 CDT
WJCarpenter wrote:
>
> >> Can somebody please explain the role of GR_Graphics::remapGlyph()?
> >> It converts zero-width characters into "degree" symbols. This is
> >> the cause of Bug 1518. Why do we do this?
>
> Sorry, I neglected to address the part about bug 1518 in my last
> response.
>
> remapGlyphs is not the cause of the problem in bug 1518. It is the
> cause of the *described symptoms* of bug 1518, but the actual bug lies
> elsewhere. If you have a look at the sample file that is attached
> with the bug (that is, look at it in a text editor), you'll see some
> extra junk after each of the characters that gets a degree symbol
> after in AbiWord. It's the extra junk that is being rendered as a
> degree symbol by remapGlyphs, after AbiWord has rendered the 16-bit
> characters in question (though it renders them with the wrong
> diacritical markings in most cases).
>
> I don't know how the sample document was created, but if I carefully
> cut-and-paste copies of the correctly-displayed glyphs and save the
> file (using the 4 Jun nightly bidi Windows build from the web site),
> the original stuff in the sample document still has the extra junk,
> and the pasted copies don't have the extra junk (they also displayed
> properly).
Hehe, contrary to popular belief, not every character in Unicode is
encoded as a single codepoint. Vietnamese is the prime example.
This extra junk is the tone marks, and they are essential.
Vietnamese has 12 vowel letters and 6 tones, making 72 vowel-and-tone
combinations per letter case, not including consonants! Too many for
the upper half of an 8-bit code page. That's where combining
characters come in...
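To make that concrete, here is a small Python sketch (purely an
illustration, not AbiWord code; the helper name split_marks is made up
for this example). It canonically decomposes one Vietnamese letter and
separates the base letter from the combining marks that follow it. A
renderer that draws each codepoint as its own glyph would see those
marks as separate, zero-width "characters".

```python
import unicodedata

def split_marks(text):
    """Return (base_chars, combining_marks) after canonical decomposition.

    Hypothetical helper for illustration only.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for base characters and a
    # non-zero combining class for combining marks.
    base = [c for c in decomposed if unicodedata.combining(c) == 0]
    marks = [c for c in decomposed if unicodedata.combining(c) != 0]
    return base, marks

# U+1EBF LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE
base, marks = split_marks("\u1ebf")
print(base)
print([unicodedata.name(m) for m in marks])

# Normalizing back to NFC recomposes the sequence into one codepoint.
print(len(unicodedata.normalize("NFC", "e\u0302\u0301")))
```

The same visible letter can therefore be one codepoint or three,
depending on how it was typed or normalized.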
> If you open the sample document with MSWindows Notepad or MSWindows
> Wordpad, you'll see little hollow rectangles in the same places that
> AbiWord puts the 0xB0 degree symbol, presumably for the same reason.
> If you open the document with MSWord2000, it will ask you what
> encoding you want. If you pick UTF-8, things look fine (I guess this
> is why the bug report's comment about pasting to MSWord seemed OK). If
> you pick a different encoding, things look various shades of not
> fine.
The file was created by typing into AbiWord with a Vietnamese keyboard
on Windows 2000. It is UTF-8. MSWord and Abi also support Windows
code page 1258 Vietnamese and VISCII encodings. Abi also supports
TCVN which also preserves the "junk" (:
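As a rough illustration of the encoding point, the Python sketch below
takes one letter written as a base vowel plus a combining tone mark
and encodes it both as UTF-8 and as code page 1258. I am assuming
here that a Vietnamese keyboard produces this kind of sequence; I have
not checked the exact bytes in the sample file. The helper name
fits_cp1258 is made up for the example.

```python
import unicodedata

# U+00EA (e with circumflex) followed by U+0301 (combining acute).
tone_split = "\u00ea\u0301"
# The same letter as a single precomposed codepoint, U+1EBF.
precomposed = "\u1ebf"

utf8_bytes = tone_split.encode("utf-8")
cp1258_bytes = tone_split.encode("cp1258")

def fits_cp1258(text):
    """Return True if every codepoint in text exists in code page 1258.

    Hypothetical helper for illustration only.
    """
    try:
        text.encode("cp1258")
        return True
    except UnicodeEncodeError:
        return False

print(utf8_bytes.hex())         # two bytes per codepoint here
print(len(cp1258_bytes))        # one byte per codepoint
print(fits_cp1258(tone_split))
print(fits_cp1258(precomposed))
```

Code page 1258 has a slot for the tone mark itself but not for the
precomposed letter, so the "junk" is the only way it can carry the
tone at all.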
> So, in summary, I think the root cause of bug 1518 is in whatever
> thing created the sample document attached to the bug, or possibly the
> importer code is not doing the right thing.
I've provided more technical details in another post I composed offline.
Andrew.
-- http://linguaphile.sourceforge.net