From: Dom Lachowicz (domlachowicz@yahoo.com)
Date: Wed Oct 15 2003 - 08:47:57 EDT
Hi Alan,
> Also I made another point: ISCII export substitutes '?'
> for characters outside the character set, regardless of the
> existence of standard conventions for imitating them.
Alan, unfortunately, there are no "standard
conventions" applicable to plaintext. SGML entity
references are meant for exactly that - SGML
documents.
> You missed my point. Consider the xhtml document below. It
> contains a couple entities for which there are pretty
> standard substitutes. (Normal quote for “ the
> consecutive hyphens for —)
<!ENTITY mdash CDATA "—" -- em dash, U+2014 ISOpub -->
<!ENTITY ldquo CDATA "“" -- left double quotation mark, U+201C ISOnum -->
<!ENTITY rdquo CDATA "”" -- right double quotation mark, U+201D ISOnum -->
As you can see, these don't fit into ASCII at all.
They're fairly high in the Unicode table: U+2014 and
U+201C are decimal 8212 and 8220, more than 8000 code
points past the first 128 (ASCII proper) or 256
(Latin-1) that one could reasonably call "ASCII".
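To make the numbers concrete, here is a small Python sketch that prints the code points of the three characters declared above and checks whether each fits in 7-bit ASCII. The names ENTITIES and fits_in_ascii are made up for this illustration and come from no particular tool.

```python
# Illustrative only: the three characters from the entity declarations above.
ENTITIES = {
    "mdash": "\u2014",  # em dash
    "ldquo": "\u201c",  # left double quotation mark
    "rdquo": "\u201d",  # right double quotation mark
}


def fits_in_ascii(ch: str) -> bool:
    """Return True if the single character is in the 7-bit ASCII range."""
    return ord(ch) < 128


for name, ch in ENTITIES.items():
    cp = ord(ch)
    print(f"{name}: U+{cp:04X} (decimal {cp}), fits in ASCII: {fits_in_ascii(ch)}")
```

None of the three comes anywhere near the ASCII range, which is why a plain ASCII export has to either drop them or substitute something.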
So, what are you asking for exactly? Saving these as
SGML entities inside of ASCII is just plain wrong. I'm
not sure that saving them as their rough ASCII
equivalents is ideal behavior, but it seems more
reasonable than the SGML entity suggestion.
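For what it's worth, the "rough ASCII equivalents" idea could look something like the sketch below. This is not how AbiWord's exporter works; the FALLBACKS table and the to_ascii function are hypothetical, and the choice of substitutes (a straight quote, two hyphens) is only the convention Alan mentioned.

```python
# Hypothetical substitution table; extend as needed.
FALLBACKS = {
    "\u2014": "--",  # em dash -> two hyphens
    "\u201c": '"',   # left double quotation mark -> straight quote
    "\u201d": '"',   # right double quotation mark -> straight quote
}


def to_ascii(text: str, fallbacks: dict = FALLBACKS) -> str:
    """Replace non-ASCII characters with a listed substitute, else '?'."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)                       # already ASCII, keep as-is
        else:
            out.append(fallbacks.get(ch, "?"))   # known substitute or '?'
    return "".join(out)


print(to_ascii("\u201cHello\u201d \u2014 world"))
```

Note that the substitution is lossy: once the file is saved, there is no way to tell a converted em dash from two hyphens the author actually typed.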
I'd personally suggest that you save these documents
as UTF-8 encoded. Just about every text editor in the
world worth its salt supports UTF-8 now, and it
preserves your text in its entirety. This sounds like
correct behavior to me.
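A quick Python sketch of the difference, using only the standard str.encode and bytes.decode calls. The sample string is made up; the point is that UTF-8 round-trips the text exactly, while a plain ASCII encoding with replacement loses the characters.

```python
# Sample text containing the three characters under discussion.
text = "\u201cquoted\u201d \u2014 dash"

# UTF-8 preserves everything: decoding gives back the original string.
utf8_bytes = text.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")
print(round_tripped == text)

# ASCII with errors="replace" substitutes '?' for each unencodable character.
ascii_lossy = text.encode("ascii", errors="replace")
print(ascii_lossy)
```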
Best regards,
Dom
This archive was generated by hypermail 2.1.4 : Wed Oct 15 2003 - 09:05:25 EDT