From: Dom Lachowicz (firstname.lastname@example.org)
Date: Wed Oct 15 2003 - 08:47:57 EDT
> Also I made another point: ISCII export substitutes
> for characters outside the character set, regardless
> of the
> existence of standard conventions for imitating
Alan, unfortunately, there are no "standard
conventions" applicable to plaintext. SGML entity
references are meant for exactly that - SGML
> You missed my point. Consider the xhtml document
> below. It
> contains a couple entities for which there are
> standard substitutes. (Normal quote for “ the
> consecutive hyphens for —)
<!ENTITY mdash CDATA "—" -- em dash, U+2014 ISOpub -->
<!ENTITY ldquo CDATA "“" -- left double quotation mark, U+201C ISOnum -->
<!ENTITY rdquo CDATA "”" -- right double quotation mark, U+201D ISOnum -->
As you can see, these don't fit into ASCII at all.
They sit fairly high in the Unicode table: U+2014 is
decimal 8212, about 8000 entries past the first 128
(ASCII proper) or 256 (Latin-1) code points that one
could reasonably call "ASCII".
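To make the numbers concrete, here is a small Python sketch (not anything from the AbiWord sources) that prints the code point of each of the three characters and checks whether it can be encoded as 7-bit ASCII. The names `chars`, `code_points` and `fits_ascii` are made up for this illustration.

```python
# The three characters from the entity declarations above.
chars = {"mdash": "\u2014", "ldquo": "\u201c", "rdquo": "\u201d"}

# Decimal code point of each character.
code_points = {name: ord(ch) for name, ch in chars.items()}


def fits_ascii(ch):
    """Return True if the character can be encoded as 7-bit ASCII."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


for name, ch in chars.items():
    print(f"{name}: U+{ord(ch):04X} = {ord(ch)}, ascii={fits_ascii(ch)}")
```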
So, what are you asking for exactly? Saving these as
SGML entities inside of ASCII is just plain wrong. I'm
not sure that saving them as their rough ASCII
equivalents is ideal behavior, but it seems more
reasonable than the SGML entity suggestion.
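For comparison, the two options could be sketched in Python roughly as follows. This is only an illustration under my own assumptions: the `ASCII_FALLBACKS` table and both function names are hypothetical, and this is not how the exporter is actually implemented.

```python
# Hypothetical fallback table: code point -> rough ASCII equivalent.
ASCII_FALLBACKS = {
    0x2014: "--",  # em dash -> two hyphens
    0x201C: '"',   # left double quotation mark -> plain quote
    0x201D: '"',   # right double quotation mark -> plain quote
}


def to_ascii_approx(text):
    """Substitute rough ASCII equivalents; anything unmapped becomes '?'."""
    substituted = text.translate(ASCII_FALLBACKS)
    return substituted.encode("ascii", "replace").decode("ascii")


def to_ascii_charrefs(text):
    """Write non-ASCII characters as numeric character references instead."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = "\u201cHello\u201d \u2014 world"
print(to_ascii_approx(sample))    # "Hello" -- world
print(to_ascii_charrefs(sample))  # &#8220;Hello&#8221; &#8212; world
```

The second form is only meaningful to something that parses markup, which is why it looks out of place in a plain text file.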
I'd personally suggest that you save these documents
as UTF-8 encoded. Just about every text editor in the
world worth its salt supports UTF-8 now, and it
preserves your text in its entirety. This sounds like
correct behavior to me.
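A quick way to convince yourself that UTF-8 loses nothing is a round trip, sketched here in Python with a made-up sample string:

```python
# Sample text containing the curly quotes and the em dash.
text = "\u201cHello\u201d \u2014 world"

# Encode to UTF-8 bytes, as a text file on disk would hold them.
data = text.encode("utf-8")

# Decoding gives back exactly the original string.
roundtrip = data.decode("utf-8")

print(len(text), "characters,", len(data), "bytes")
print(roundtrip == text)
```

Each of the three special characters takes three bytes in UTF-8, while the plain ASCII characters still take one byte each.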
This archive was generated by hypermail 2.1.4 : Wed Oct 15 2003 - 09:05:25 EDT