Dear BaseX Team,
I think I already asked about this a couple of months ago: we'd like to have XML catalog support when importing files into BaseX.
Or do you have another solution for the following issues?
1. A customer uses XHTML documents enriched with semantic tagging that is not XHTML markup. The documents nevertheless refer to an XHTML DTD, and we need that DTD to resolve entities.
Apart from the fact that it is a nuisance to retrieve the DTDs from the W3C (and requests will be blocked if made too often), the major advantage of catalogs would be that we could point BaseX to a locally stored, modified XHTML DTD that allows the additional semantic tags.
=> Alternative approach: patch the data to use standard XHTML markup (such as <span class='supersemantic'>). This doesn't seem feasible because too many tools further down the processing chain depend on these custom tags.
=> Another approach: resolve all entities prior to import and discard the DOCTYPE declaration. We can do that, but it seems second-best compared to using catalogs with unmodified data.
=> Patch the data to refer to another DOCTYPE stored somewhere on their servers. This is clearly advisable, since these documents aren't XHTML documents, but it requires installing the necessary infrastructure. Catalogs would still be desirable in this scenario.
2. Some of their XML files refer to DTDs such as E:\book\book.dtd. We need to resolve these to a DTD stored on the server where the import takes place. Again, we could patch the data to refer to a DTD stored centrally on one of their corporate HTTP servers. Still, locally stored DTDs are desirable to speed up parsing.
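For illustration, a minimal sketch of the kind of resolution meant in both cases. It uses the JDK's built-in javax.xml.catalog API (Java 9 and later) instead of the Apache XML Commons Resolver, and it is not BaseX code or a BaseX API. The catalog entries, the file names dtd/xhtml1-custom.dtd and dtd/book.dtd, the `brand` entity and the class and method names are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class CatalogSketch {

    static final String XHTML_PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Strict//EN";
    static final String XHTML_SYSTEM_ID = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";

    // Hypothetical catalog: the XHTML public ID goes to a locally modified DTD,
    // and the hard-coded Windows path goes to a DTD on the import server.
    static final String CATALOG =
          "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
        + "  <public publicId=\"" + XHTML_PUBLIC_ID + "\" uri=\"dtd/xhtml1-custom.dtd\"/>\n"
        + "  <system systemId=\"E:\\book\\book.dtd\" uri=\"dtd/book.dtd\"/>\n"
        + "</catalog>\n";

    // Stand-in for the modified XHTML DTD: it only declares one hypothetical entity.
    static final String CUSTOM_DTD = "<!ENTITY brand \"ACME Publishing\">\n";

    /** Writes the catalog and the custom DTD to a temp directory and returns a resolver. */
    static CatalogResolver buildResolver() {
        try {
            Path dir = Files.createTempDirectory("catalog-sketch");
            Files.createDirectories(dir.resolve("dtd"));
            Files.write(dir.resolve("dtd/xhtml1-custom.dtd"),
                        CUSTOM_DTD.getBytes(StandardCharsets.UTF_8));
            Path catalog = dir.resolve("catalog.xml");
            Files.write(catalog, CATALOG.getBytes(StandardCharsets.UTF_8));
            // RESOLVE=continue: unmatched IDs yield null, so the parser falls back
            // to its default lookup instead of throwing a CatalogException.
            CatalogFeatures features = CatalogFeatures.builder()
                .with(CatalogFeatures.Feature.RESOLVE, "continue")
                .build();
            return CatalogManager.catalogResolver(features, catalog.toUri());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the URI the catalog maps (publicId, systemId) to, or null if unmapped. */
    static String lookup(EntityResolver resolver, String publicId, String systemId) {
        try {
            InputSource source = resolver.resolveEntity(publicId, systemId);
            return source == null ? null : source.getSystemId();
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses xml with the resolver installed and returns all character content. */
    static String parseText(EntityResolver resolver, String xml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(resolver); // external DTD subset comes via the catalog
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        CatalogResolver resolver = buildResolver();
        // Issue 2: the Windows path is redirected to the server-local copy.
        System.out.println(lookup(resolver, null, "E:\\book\\book.dtd"));
        // Issue 1: &brand; is expanded from the local DTD instead of the W3C copy.
        String doc = "<!DOCTYPE html PUBLIC \"" + XHTML_PUBLIC_ID + "\" \""
            + XHTML_SYSTEM_ID + "\"><html><p>&brand;</p></html>";
        System.out.println(parseText(resolver, doc)); // prints "ACME Publishing"
    }
}
```

RESOLVE=continue is chosen here so that identifiers missing from the catalog still fall back to the parser's normal lookup; the JDK default, strict, would reject them instead.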
I remember that when I proposed using the Apache XML Commons Resolver [1, 2], Christian replied that you would put it on the wish list. I'm trying to get this issue prioritized.
(N.B.: it's the same customer who paid a member of your team to solve the base-uri issue...)
Gerrit
[1] http://xml.apache.org/commons/
[2] mirrored at, e.g., http://apache.easy-webs.de//xml/commons/xml-commons-resolver-1.2.tar.gz