In case this is helpful, here are examples of code I've written to use an XML catalog with xslt:transform(). These examples were slightly modified to put into an email so there might be some typos.
Version 1:
In this example the XML document "file.xml" might be coming from a zip file or other location so temporarily writing the XML to disk was necessary.
The location of catalog.xml and DTD are relative to .basexhome. The location of the XSLT is relative to the XQuery file.
declare option db:catfile 'src/schemas/catalog.xml';
declare function local:parse-xml($xml as xs:string) as document-node() {
  let $file := file:create-temp-file('parse-xml-', '.xml')
  return (
    (: write the XML to a temporary file so that doc() can parse it with the DTD :)
    file:write-text($file, $xml),
    (# db:intparse false #) (# db:dtd true #) (# db:chop false #) { doc($file) },
    file:delete($file)
  )
};
"file.xml" => file:read-text() => local:parse-xml() => xslt:transform-text(file:resolve-path('xslt/stylesheet.xsl'))
Version 2:
If the XSLT needs access to entities defined in the DTD using the function unparsed-entity-uri() then the above example does not work. In this case, the DOCTYPE is modified using a regular expression to insert a SYSTEM DTD location so that the unparsed XML can be provided to xslt:transform-text().
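As a minimal illustration of the DOCTYPE rewrite described above, the following sketch uses a deliberately simplified pattern that only handles double-quoted identifiers. The sample document, its public and system identifiers, and the pattern itself are assumptions made up for this illustration; the function used in practice follows below.

```xquery
(: Hypothetical sample input: a DOCTYPE with PUBLIC and system identifiers. :)
let $sample := '<!DOCTYPE doc PUBLIC "-//EXAMPLE//DTD Doc//EN" "http://example.org/doc.dtd"><doc/>'
return replace(
  $sample,
  (: simplified pattern: PUBLIC "..." "..." or SYSTEM "..." :)
  '(PUBLIC\s+"[^"]*"\s+"[^"]*")|(SYSTEM\s+"[^"]*")',
  'SYSTEM "src/schemas/my.dtd"'
)
(: returns: <!DOCTYPE doc SYSTEM "src/schemas/my.dtd"><doc/> :)
```

The result is still a string, which is why it can be passed directly to xslt:transform-text() without being parsed by BaseX first.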
declare function local:preprocess-xml($xml as xs:string, $dtd-path as xs:string) as xs:string {
  replace(
    $xml,
    '(PUBLIC\s["''][\sa-zA-Z0-9-''()+,./:=?;!*#@$_%]*["'']\s["''][a-zA-Z0-9_/:.\-]*[/\\]?[a-zA-Z0-9_.-]+.dtd["''])|(SYSTEM\s["''][a-zA-Z0-9_/:.\-]*[/\\]?[a-zA-Z0-9_.-]+.dtd["''])',
    'SYSTEM "' || $dtd-path || '"',
    'i'
  )
};
"file.xml" => file:read-text() => local:preprocess-xml("src/schemas/my.dtd") => xslt:transform-text(file:resolve-path('xslt/stylesheet.xsl'))
I'm using xslt:transform-text() because I want the transformed XML to have the serialization options and DOCTYPE that are specified in the XSLT, but if those things are not important to you then xslt:transform() would work equally well.
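To make the difference between the two functions concrete, here is a small sketch. The sample input, the inline stylesheet, and the system identifier out.dtd are hypothetical and exist only for this illustration.

```xquery
(: Hypothetical input and stylesheet, defined inline to keep the sketch self-contained. :)
let $xml := <doc><p>Hello</p></doc>
let $xsl :=
  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" doctype-system="out.dtd"/>
    <xsl:template match="/">
      <out><xsl:value-of select="doc/p"/></out>
    </xsl:template>
  </xsl:stylesheet>
return (
  (: returns a node; serialization is then controlled by the XQuery side :)
  xslt:transform($xml, $xsl),
  (: returns a string as serialized by the XSLT processor,
     so the xsl:output settings are applied by the stylesheet :)
  xslt:transform-text($xml, $xsl)
)
```

Comparing the two items in the result should show whether the DOCTYPE and indentation requested in xsl:output matter for a given use case.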
These examples just show what has worked for me, and there might be better alternatives.
Kind regards, Vincent
_____________________________________________
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.lizzi@taylorandfrancis.com
Information Classification: General
From: Lizzi, Vincent
Sent: Friday, November 5, 2021 4:54 PM
To: Christian Grün <christian.gruen@gmail.com>; Imsieke, Gerrit, le-tex <gerrit.imsieke@le-tex.de>
Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: RE: [basex-talk] specifying the processor for xslt:transform()
Hello Christian, Gerrit, Liam, Graydon,
Is it possible to use a different XML Catalog Resolver with BaseX? I'm referring specifically to the new XML resolver that Norm Tovey-Walsh presented today at Declarative Amsterdam. The presentation recording is at https://www.youtube.com/watch?v=LBuqQG8io8k&ab_channel=DeclarativeAmster... and the resolver is available at https://xmlresolver.org/ and https://github.com/xmlresolver/xmlresolver/.
I haven't yet had a chance to try Norm's new XML resolver or the BaseX 10 snapshot.
However, I have also run into the limitation Gerrit mentioned about xslt:transform() not using an XML Catalog, and have used workarounds to preprocess the XML before calling xslt:transform().
Regarding useful options, the two things that I usually want to configure (apart from the contents of catalog.xml) are the location of the catalog.xml file(s) and the logging verbosity. Being able to configure the catalog in a map parameter or a startup parameter seems like a useful addition to the existing methods (pragma, option, .basex, etc.).
Kind regards, Vincent
Information Classification: General
From: BaseX-Talk <basex-talk-bounces@mailman.uni-konstanz.de> On Behalf Of Christian Grün
Sent: Friday, November 5, 2021 8:28 AM
To: Imsieke, Gerrit, le-tex <gerrit.imsieke@le-tex.de>
Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] specifying the processor for xslt:transform()
With BaseX 10, which will be based on JDK 11, we'll switch to the built-in JDK Catalog Resolver [1], which tends to get good reviews, and which allows for a much cleaner and more consistent integration. Debugging should be easier as well, as errors will always be reported back if the catalog resolution fails.
We think about replacing the CATFILE option...
1. Option: CATFILE: path/to/catalog.xml
2. or XQuery: fetch:xml('file.xml', map { 'catfile': 'path/to/catalog.xml' })
...with a new CATALOG option that takes multiple keys and values:
1. Option: CATALOG: files=path/to/catalog.xml,resolve=strict,prefer=public,defer=false
2. or XQuery: fetch:xml('file.xml', map { 'catalog': map { 'files': 'path/to/catalog.xml', 'resolve': 'strict', 'prefer': 'public', 'defer': false() }})
An alternative would be to completely drop the catalog options and assign all catalog options via system properties at startup:
java -Djavax.xml.catalog.files=path/to/catalog.xml .... BaseX
I'd love to get your feedback on these ideas, and your experiences with an early BaseX 10 snapshot [2]! Christian
[1] https://docs.oracle.com/en/java/javase/11/core/xml-catalog-api1.html#GUID-96D2C9AC-641A-4BDB-BB08-9FA04358A6F4
[2] https://files.basex.org/releases/latest-10/
On Fri, Nov 5, 2021 at 9:03 AM Imsieke, Gerrit, le-tex <gerrit.imsieke@le-tex.de> wrote:
On 05.11.2021 03:03, Liam R. E. Quin wrote:
On Thu, 2021-11-04 at 18:43 -0400, Graydon Saunders wrote:
Related to this, setting the catalog for use by xslt:transform() is defeating me.
The only ways I have found to debug these are (1) with strace -f, to make sure the file is being read, and (2) with a CatalogManager.properties file:
[[
verbosity=65535
# relative-catalogs=false
prefer = public
catalogs=mycataloguefile.xml
]]
Likely you need entries in the catalog file starting with file:///
If you are uploading queries to a BaseX server, remember that it's the server that needs to have had CLASSPATH set when starting, and that relative URIs like "catalog.xml" might be looked up in the server's directory.
Liam
Liam and Christian have thankfully added support for resolving include/import URIs and doc(...) URIs approx 2 years ago [1]. A thing that I recently found was lacking is resolution of system identifiers that occur in documents. That is, if there is a reference to a DTD in a document that is read during the transformation, the catalog resolution does not apply to the public or system identifiers.
Is this the issue that you are encountering, Graydon?
Your first argument to xslt:transform is db:open('acme_content')[1]. Does this document have a DOCTYPE declaration? I'd have guessed that the DOCTYPE declaration was stripped away when the documents were loaded into the DB, that is, parsing with the DTD only happened during import. But maybe this is different if you use the internal parser.
Gerrit
[1] https://github.com/BaseXdb/basex/issues/1719