Subject: Re: RemapGlyph()
From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Wed Jun 20 2001 - 23:53:16 CDT
> ad> Can somebody please explain the role of GR_Graphics::remapGlyph()?
> ad> It converts zero-width characters into "degree" symbols. This is
> ad> the cause of Bug 1518. Why do we do this?
>
> ms> I'm fighting with this too. Symbol fonts have major problems being
> ms> printed on Gtk. I suspect that something even more serious is
> ms> happening with them too.
>
> ms> Twice I've saved a test document with symbol fonts and both have
> ms> turned up as "bogus documents". I suspect we have problems in our
> ms> importers/exporters.
>
> Whoa, Nellie. I think there are three different things in this one
> Q&A.
>
> 1. What does remapGlyphs actually do? I'll come back to that below.
>
> 2. Printing under Gtk. My guess is that this is a specific print
> driver problem. Since remapGlyphs only comes into play when a
> character has a zero-width glyph, it can't be the source of this
> problem unless Gtk printing can somehow do something more appropriate
> with zero-width glyphs.
>
> 3. Bogus documents for documents with symbol fonts. I think you're
> right about the exporter being the problem, though I thought this had
> been fixed at least a couple months ago (I still think that). It was
> first noticed that the *.abw exporter was exporting smart quote
> characters as some other strange thing. My assumption is that other
> non-Latin1 characters would get similar treatment.
My guess is that any or all of these could be due to character
set encodings and font encodings. I understand that the old Symbol
font had an encoding (or code page) all its own. So just as
we have converters for ISO-8859-1 etc., we need one for the Symbol
font. I believe this is also an issue in RTF import/export of
Symbol fonts.
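As a rough illustration of what such a converter could look like, here is a minimal C++ sketch. The function names are hypothetical (not existing AbiWord code), and only a handful of Symbol font positions are filled in; a real table would cover every code point the font defines.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical converter: maps a byte in the Symbol font's own encoding
// to a Unicode code point. Only a few Greek letters are listed here.
char32_t symbolByteToUnicode(unsigned char b) {
    static const std::map<unsigned char, char32_t> table = {
        {0x41, U'\u0391'},  // 'A' -> GREEK CAPITAL LETTER ALPHA
        {0x42, U'\u0392'},  // 'B' -> GREEK CAPITAL LETTER BETA
        {0x44, U'\u0394'},  // 'D' -> GREEK CAPITAL LETTER DELTA
        {0x61, U'\u03B1'},  // 'a' -> GREEK SMALL LETTER ALPHA
        {0x62, U'\u03B2'},  // 'b' -> GREEK SMALL LETTER BETA
        {0x64, U'\u03B4'},  // 'd' -> GREEK SMALL LETTER DELTA
        {0x70, U'\u03C0'},  // 'p' -> GREEK SMALL LETTER PI
    };
    auto it = table.find(b);
    if (it != table.end()) return it->second;
    // Space and digits sit at the same positions as in ASCII.
    if (b == 0x20 || (b >= 0x30 && b <= 0x39)) return static_cast<char32_t>(b);
    // Everything not covered by this partial table is reported as unknown.
    return U'\uFFFD';  // REPLACEMENT CHARACTER
}

// Converts a whole run of Symbol-encoded text, byte by byte.
std::u32string symbolToUnicode(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) out.push_back(symbolByteToUnicode(b));
    return out;
}
```

For example, the bytes "ap1" typed in the Symbol font would become U+03B1 U+03C0 '1' (alpha, pi, one) in the Unicode document, so the exporters never see font-specific byte values.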
> OK, so what is remapGlyphs all about? Here is a description I sent to
> someone about a year ago (so some things in the code base may have
> changed since then).
Ah, thanks for this description (:
> ================================================================
> On some platforms, and for some fonts, there are glyphs missing at
> positions of interest. In particular, the fonts supplied with Abi on
> Unix only have glyphs among the first 256 positions (i.e., 8 bits).
> That means that any Unicode characters >=256 will be measured and
> rendered as zero-width characters. This can be somewhat confusing to
> the average user.
Is it due to X, the fonts, or AbiWord that missing characters are
classed as zero-width characters?
> The most common case of this is in the use of "smart quotes" in
> documents imported from MSWord. Abi's MSWord and RTF importers
> correctly translate the characters to the appropriate Unicode
> character positions, but they are all in the U+20xx range.
The MS western encoding (Windows-1252, also called CP1252) contains
characters that ISO-8859-1 does not; "smart quotes" are the most
obvious. Importing must always pass through iconv/mbtowc since Abi
uses Unicode natively.
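To make the difference concrete, here is a minimal hand-rolled C++ sketch of decoding Windows-1252 bytes to Unicode. In practice one would let iconv do this (e.g. iconv_open() with a source encoding name such as "CP1252"); the function names below are hypothetical, and only some of the 0x80-0x9F slots are filled in.

```cpp
#include <cassert>
#include <string>

// Decode one Windows-1252 byte to a Unicode code point. Only a few
// entries of the 0x80-0x9F block are filled in for this sketch.
char32_t cp1252ToUnicode(unsigned char b) {
    switch (b) {
        case 0x80: return U'\u20AC';  // EURO SIGN
        case 0x85: return U'\u2026';  // HORIZONTAL ELLIPSIS
        case 0x91: return U'\u2018';  // LEFT SINGLE QUOTATION MARK
        case 0x92: return U'\u2019';  // RIGHT SINGLE QUOTATION MARK
        case 0x93: return U'\u201C';  // LEFT DOUBLE QUOTATION MARK
        case 0x94: return U'\u201D';  // RIGHT DOUBLE QUOTATION MARK
        case 0x96: return U'\u2013';  // EN DASH
        case 0x97: return U'\u2014';  // EM DASH
        default: break;
    }
    if (b < 0x80 || b >= 0xA0) {
        // Outside 0x80-0x9F, Windows-1252 agrees with ISO-8859-1, whose
        // byte values are identical to the Unicode code points.
        return static_cast<char32_t>(b);
    }
    return U'\uFFFD';  // slots this sketch does not cover
}

std::u32string decodeCp1252(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) out.push_back(cp1252ToUnicode(b));
    return out;
}
```

Note how the smart quotes land in the U+20xx range, which is exactly where the 256-glyph Unix fonts have nothing to draw.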
> The remapGlyphs feature provides preference values for which
> characters to show instead of invisible characters. The remapping is
> done only for display/printing purposes; the document itself is not
> changed. The default preferences will only do the remapping if the
> character is actually zero-width in the font being used; remappings
> are provided for the four Unicode curly quote characters, and there is
> a default remapping for any other characters that happen to come up
> zero-width.
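The behaviour described above can be modelled in a few lines of C++. This is a hypothetical sketch, not the actual GR_Graphics::remapGlyph() code: the straight-quote targets are assumptions, the degree sign default comes from the Bug 1518 report, and unixFontWidth() stands in for a font with glyphs only in the first 256 positions.

```cpp
#include <cassert>
#include <map>

// Hypothetical font: glyphs exist only below 256, so every other code
// point measures as zero width.
int unixFontWidth(char32_t cp) {
    return cp < 256 ? 10 : 0;  // 10 is an arbitrary advance width
}

// Hypothetical model of the remapping preferences.
struct RemapPrefs {
    std::map<char32_t, char32_t> remaps = {
        {U'\u2018', U'\''}, {U'\u2019', U'\''},  // curly single quotes (assumed targets)
        {U'\u201C', U'"'},  {U'\u201D', U'"'},   // curly double quotes (assumed targets)
    };
    char32_t defaultRemap = U'\u00B0';  // DEGREE SIGN, as seen in Bug 1518
};

// Returns the character to draw; the caller leaves the document text alone.
char32_t displayChar(char32_t cp, int widthInFont, const RemapPrefs& prefs) {
    if (widthInFont != 0) return cp;  // the font has something visible: draw it
    auto it = prefs.remaps.find(cp);
    if (it != prefs.remaps.end()) return it->second;
    return prefs.defaultRemap;  // any other zero-width character
}
```

The last case is where Bug 1518 comes from: a combining acute accent (U+0301) also measures zero width in such a font, so it is drawn as a degree sign even though it is a legitimate zero-width character.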
Now we have a problem. Missing characters and zero-width characters
are not the same thing. Unicode contains many zero-width combining
characters which are fully visible, typically accent marks that
render over the preceding letter. Vietnamese uses these too, even in
8-bit encodings.
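One way to keep the two cases apart is to classify a character before deciding whether to remap it. A minimal C++ sketch, assuming the font layer can report whether a glyph exists (the fontHasGlyph input and all names here are hypothetical); it checks only the main combining-mark blocks, whereas a complete version would consult the Unicode general category (Mn/Me).

```cpp
#include <cassert>

enum class GlyphKind { Normal, Combining, Missing };

// True for the main Unicode blocks of combining marks.
bool isCombiningMark(char32_t cp) {
    return (cp >= 0x0300 && cp <= 0x036F)   // Combining Diacritical Marks
        || (cp >= 0x1AB0 && cp <= 0x1AFF)   // Combining Diacritical Marks Extended
        || (cp >= 0x1DC0 && cp <= 0x1DFF)   // Combining Diacritical Marks Supplement
        || (cp >= 0x20D0 && cp <= 0x20FF)   // Combining Diacritical Marks for Symbols
        || (cp >= 0xFE20 && cp <= 0xFE2F);  // Combining Half Marks
}

// fontHasGlyph is a hypothetical input from the font layer.
GlyphKind classify(char32_t cp, bool fontHasGlyph) {
    if (!fontHasGlyph) return GlyphKind::Missing;       // candidate for transliteration
    if (isCombiningMark(cp)) return GlyphKind::Combining; // zero width, but visible
    return GlyphKind::Normal;
}
```

Only GlyphKind::Missing would then be handed to a remapping or transliteration step; a Vietnamese hook above (U+0309) that the font can draw stays a genuine zero-width character.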
This type of remapping for unsupported characters is known as
"transliteration": we attempt to find the next best thing, such as
a plain "a" in place of an "á". AbiWord has limited code for this in
XAP_EncodingManager::approximate(), which also handles smart quotes.
libiconv has beautiful support for transliteration. I recommend we
distinguish between missing and zero-width characters, and
centralize transliteration.
> AbiWord 0.7.10 had some other distracting character spacing problems,
> so the best way to see the difference is to use a recent Unix build
> and view a document with smart quotes (easiest is to import a simple
> MSWord document, but I have attached a somewhat messy document I've
> been testing with) with the preference value turned on and off.
> Besides being invisible when the preference is turned off, moving the
> cursor with arrow keys does a double step at the position of the
> zero-width character. This makes great sense to programmers but is
> confusing to regular folks.
This makes sense to regular Vietnamese folks too. I think we can
have the best of both worlds.
Andrew Dunbar.
-- http://linguaphile.sourceforge.net
This archive was generated by hypermail 2b25 : Thu Jun 21 2001 - 00:07:45 CDT