Subject: why properties are strings instead of enums
From: Paul Rohr (paul@abisource.com)
Date: Tue Jan 23 2001 - 17:09:43 CST
At 01:41 PM 1/15/01 +1100, Martin Sevior wrote:
>On Sun, 14 Jan 2001, John L. Clark wrote:
>> Is our interface to document properties really done solely by lookup tables
>> of strings? If so, why is it not done instead like a database, with IDs
>> for properties and their values, which map to strings when necessary for
>> writing? I'm still poring over our piecetable and surrounding
>> structure code, and it will be a while until I am at all comfortable
>> with it, so forgive my naivety.
>
>These are good questions. Yes, our interface to doc properties is all
>through strings.
>
>Dom recently committed code to do binary searches on string properties but
>we have also considered using enums for doc properties too. I'm not sure
>why the abi designers went with const strings over enums. enums would
>certainly be much faster. I guess they thought that using const strings
>would be more robust and perhaps more easily interfaced to XML parsers.

I don't want to open up a flame war, but here's the history...

Perhaps we've seen too many network protocols in our lives, but Jeff and I
made that design decision. (IIRC, Eric was the first to propose converting
to enums, but he neither wrote the necessary code nor convinced us to do so.)

I think the real reason we've been happy with the strings all along is that
they're self-documenting, robust, and quite scalable. Adding a new property
requires new code only on the edges -- all of the core property-handling
logic down in the piece table and our importer/exporter stays the same.

Admittedly, this design favors our native file format at the expense of
others, but that's a feature, not a bug. ;-) It sure does wonders for our
ability to support forwards and backwards compatibility in our file format.
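
To make the "new code only on the edges" point concrete, here is a minimal sketch in Python. It is not AbiWord's actual code: the helpers `parse_props` and `serialize_props`, the "name:value; name:value" syntax, and the `text-glow` property are all assumptions made up for illustration. The core only carries name/value strings, so a property written by a newer version survives a round trip through code that has never heard of it.

```python
def parse_props(text):
    """Split a 'name:value; name:value' string into a dict of strings."""
    props = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the first colon only, so values may contain colons.
        name, _, value = item.partition(":")
        props[name.strip()] = value.strip()
    return props


def serialize_props(props):
    """Write the properties back out, sorted by name for stable output."""
    return "; ".join(f"{name}:{value}" for name, value in sorted(props.items()))


# "text-glow" is a hypothetical property this "old" core knows nothing about.
saved_by_newer_version = "font-weight:bold; text-glow:soft"
props = parse_props(saved_by_newer_version)

# Only edge code (a layout engine, say) interprets the names it cares about.
is_bold = props.get("font-weight") == "bold"

# The core writes everything back, including the name it did not understand.
round_tripped = serialize_props(props)
print(round_tripped)
```

Nothing in `parse_props` or `serialize_props` changes when a property is added; only the edge code that assigns meaning to a name does.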

The two major weaknesses of this decision are:

1. The properties aren't documented well enough (outside of the property
   parsing logic in the bowels of the code). This is a bug.

2. Any form of tokenization would be faster. In fact, that's why Jeff did
   so much work to condense attrprops down in the piece table. However (to
   borrow a page from Thomas' book), I haven't seen any profiling results
   since then which suggest that's where we really need more speed.

By contrast, if we introduced a translation layer to enums (or whatever),
I'm not sure that the gains would be all that worthwhile. It introduces a
level of API complexity which just feels wrong. Consider the code you've
seen to handle subsequent versions of the following:

- binary vs. text file formats
- binary vs. text network protocols

In both cases, any performance gains tend to be outweighed by the complexity
and brittleness of the code needed to implement them.
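
A rough illustration of what such a translation layer might look like, again as a hypothetical Python sketch rather than anything in the tree: `PropId`, `NAME_TO_ID`, `to_enum`, `to_strings`, and the `text-glow` property are invented names. Every new property needs a table entry, and the layer needs a side channel for names it does not recognize if they are to survive a save.

```python
from enum import Enum


class PropId(Enum):
    """Hypothetical token IDs; each new property needs a new member."""
    FONT_WEIGHT = 1
    FONT_SIZE = 2


# Both tables must be kept in sync with the enum above.
NAME_TO_ID = {"font-weight": PropId.FONT_WEIGHT, "font-size": PropId.FONT_SIZE}
ID_TO_NAME = {prop_id: name for name, prop_id in NAME_TO_ID.items()}


def to_enum(props):
    """Translate string-keyed properties to enum keys.

    Names missing from the table cannot be tokenized, so they are carried
    separately as strings to avoid dropping them.
    """
    known, unknown = {}, {}
    for name, value in props.items():
        if name in NAME_TO_ID:
            known[NAME_TO_ID[name]] = value
        else:
            unknown[name] = value
    return known, unknown


def to_strings(known, unknown):
    """Translate back to strings for writing, re-attaching unknown names."""
    props = {ID_TO_NAME[prop_id]: value for prop_id, value in known.items()}
    props.update(unknown)
    return props


# "text-glow" stands in for a property added by a newer version.
known, unknown = to_enum({"font-weight": "bold", "text-glow": "soft"})
restored = to_strings(known, unknown)
```

The lookups on `known` are cheaper than string compares, but the `unknown` dictionary shows the cost: the string path does not go away, so there are now two paths to maintain.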
Paul