I’ve also been struggling with catalogs in BaseX and some other programs.

To amplify what Vincent already wrote: I discovered that I could enable catalog support for DTDs and character entity files in XTF (which uses a rather old version of Saxon) without any source code modifications by:
[1] running it under Java 11 or 12
[2] passing the catalog file in the startup command of my startup script with:
-Djavax.xml.catalog.files=\"file://$home/WEB-INF/uvateip4-catalog.xml\"
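
For anyone who wants to check what a catalog actually maps a system identifier to before wiring it into a startup script, here is a minimal sketch using the javax.xml.catalog API that ships with Java 9 and later. The class and method names (CatalogCheck, lookupSystemId, writeSampleCatalog) are made up for illustration, and the sample catalog just mirrors the rewriteSystem entry from Daniel's message further down; it is not what XTF or BaseX do internally.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.Catalog;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;

// Hypothetical helper, not part of XTF, BaseX or Saxon.
final class CatalogCheck {

    /** Returns the URI the catalog maps systemId to, or null if no entry matches. */
    static String lookupSystemId(URI catalogFile, String systemId) {
        Catalog catalog = CatalogManager.catalog(CatalogFeatures.defaults(), catalogFile);
        return catalog.matchSystem(systemId);
    }

    /** Writes a throwaway catalog with the rewriteSystem entry quoted in this thread. */
    static Path writeSampleCatalog() {
        String xml =
            "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
          + "  <rewriteSystem systemIdStartString=\"http://www.blahblahblah.info/dtd/\"\n"
          + "                 rewritePrefix=\"file:///C:/temp/catalog/dtd/\"/>\n"
          + "</catalog>\n";
        try {
            Path file = Files.createTempFile("catalog", ".xml");
            Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path catalog = writeSampleCatalog();
        // Prints the rewritten file: URI for the DTD, if the catalog entry matches.
        System.out.println(lookupSystemId(catalog.toUri(),
            "http://www.blahblahblah.info/dtd/dokument.dtd"));
    }
}
```

This only tells you what the catalog itself says; whether a given program ever consults the catalog for a given kind of URL is a separate question.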


I was hoping this would also fix another issue I had with the same program:
I have a number of XInclude files which are all specified with http: URLs.
The servers those pointed to were all reconfigured to redirect those links to https:.
Opening those http: links in a browser follows the redirect, but when the Java
parser is set to resolve XInclude links, it does not follow the HTTP redirect and inserts the
HTML redirect message instead of the XML fragment at the redirect location.
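
As far as I know, this matches how Java's HttpURLConnection behaves: it follows redirects by default, but not when the redirect switches protocol (http: to https:). Where you control how the URL is opened, the redirect can be followed by hand. The sketch below is only that, a sketch: RedirectFollower, resolveLocation and open are hypothetical names, it assumes http:/https: URLs only, and I have not tried plugging it into the XInclude processing itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper; assumes the URI uses the http or https scheme.
final class RedirectFollower {

    /** Resolves a Location header (absolute or relative) against the URI that was requested. */
    static URI resolveLocation(URI requested, String location) {
        return requested.resolve(location);
    }

    /** Opens the resource, following up to maxHops redirects, including http -> https. */
    static InputStream open(URI start, int maxHops) throws IOException {
        URI current = start;
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // handle every redirect ourselves
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            boolean redirect = location != null
                && (status == 301 || status == 302 || status == 303
                    || status == 307 || status == 308);
            if (!redirect) {
                return conn.getInputStream(); // throws IOException for error responses
            }
            conn.disconnect();
            current = resolveLocation(current, location);
        }
        throw new IOException("Too many redirects starting from " + start);
    }
}
```

Something like RedirectFollower.open(URI.create(href), 5) would then return the stream at the final location rather than the redirect page.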

I had hoped that could be fixed by using a catalog to redirect, with the same method as above,
but no luck: it seems that there are different resolution pathways for DTDs and character entities,
xsl:import and xsl:include, XInclude links, and probably other sorts of URLs in documents.
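
To make the DTD pathway concrete: in plain JAXP the hook for system identifiers is the parser's EntityResolver, and the resolver returned by CatalogManager.catalogResolver() implements that interface (it also implements URIResolver, which is the JAXP hook used for xsl:import and xsl:include). Below is a minimal sketch, assuming Java 9+; DtdViaCatalog and its methods are hypothetical names, the temp files stand in for a real catalog and DTD, and the "marker" entity is something I added only so the result shows the local DTD was read.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not taken from XTF, BaseX or Saxon.
final class DtdViaCatalog {

    /** Parses xml with a catalog-backed EntityResolver so system identifiers go through the catalog. */
    static Document parseWithCatalog(String xml, Path catalogFile) {
        try {
            CatalogResolver resolver =
                CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogFile.toUri());
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(resolver); // the DTD / entity pathway
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sets up a local DTD plus catalog in a temp directory and parses a document that names a remote DTD. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("catalog-demo");
            Files.write(dir.resolve("dokument.dtd"),
                ("<!ELEMENT dokument (doknr)>\n"
               + "<!ELEMENT doknr (#PCDATA)>\n"
               + "<!ENTITY marker \"resolved-locally\">\n").getBytes(StandardCharsets.UTF_8));

            String prefix = dir.toUri().toString();
            if (!prefix.endsWith("/")) {
                prefix += "/";
            }
            Path catalog = dir.resolve("catalog.xml");
            Files.write(catalog,
                ("<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
               + "  <rewriteSystem systemIdStartString=\"http://www.blahblahblah.info/dtd/\"\n"
               + "                 rewritePrefix=\"" + prefix + "\"/>\n"
               + "</catalog>\n").getBytes(StandardCharsets.UTF_8));

            String xml =
                "<!DOCTYPE dokument SYSTEM \"http://www.blahblahblah.info/dtd/dokument.dtd\">"
              + "<dokument><doknr>&marker;</doknr></dokument>";
            // The text content comes from the entity declared in the local DTD.
            return parseWithCatalog(xml, catalog).getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

None of this says anything about XInclude hrefs, which is exactly the pathway I have not been able to redirect.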

I’m trying to follow the Saxon documentation on this to see if there is a different fix for this specific issue.

In the meantime, I’m fixing up the result by replacing the HTML redirect messages with the correct XML fragment, rewriting the URL that the XInclude processor adds in the @base attribute:


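    <!-- Matches the HTML redirect page that was included in place of the XML fragment. -->
    <xsl:template match="*:html">
        <!-- Output the original base URL (handy for checking). -->
        <p><xsl:value-of select="@*:base"/></p>
        <!-- Switch the scheme to https: (anchored so only the leading scheme is replaced). -->
        <xsl:variable name="href" select="replace(@*:base, '^http:', 'https:')"/>
        <!-- Output the rewritten URL. -->
        <p><xsl:value-of select="$href"/></p>
        <!-- Fetch the XML fragment from the redirect target. -->
        <xsl:copy-of copy-namespaces="no" select="document($href)"/>
    </xsl:template>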


I’ve been trying to move some of the indexing I’m doing in XTF to BaseX, and I’m not clear on exactly what works, both for XInclude processing and for resolving entities defined in DTDs, with or without catalogs. (More on this perhaps later: some issues I’ve only just noticed today, so I’m not sure it’s not user error!)


— Steve M.



On Jun 2, 2022, at 4:10 PM, Lizzi, Vincent <Vincent.Lizzi@taylorandfrancis.com> wrote:

Hi Daniel and Gerrit,
 
If you are able to use Java version 11 or higher, it might be of use to try the XML Catalog support that comes built in with Java. This ticket comment has some details and an example for configuring Java and BaseX to use the same XML Catalog:
 
 
I’m not sure if this is relevant for your situation, but I’ve read somewhere (although I can’t put my hands on the source right now) that Saxon uses the XML Catalog for resolving URIs only in certain contexts. For example, a DTD DOCTYPE can be resolved using an XML Catalog, but the function fn:json-doc() does not use an XML Catalog.
 
Cheers,
Vincent
 
_____________________________________________
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
 

From: BaseX-Talk <basex-talk-bounces@mailman.uni-konstanz.de> On Behalf Of Zimmel, Daniel
Sent: Thursday, June 2, 2022 10:57 AM
To: 'Imsieke, Gerrit, le-tex' <gerrit.imsieke@le-tex.de>; basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] XML Catalog and xslt:transform()
 
I see, thanks Gerrit and Christian for the insight. This *does* sound wickedly unfunny.

OK if I actually do not need to be able to parse the DTD wouldn't the simple workaround be:

fetch:xml('file:///C:/temp/catalog/dokument.xml')
=> xslt:transform('transform.xsl')

At least this is what works here, resulting in a new document node and trashing the DTD declaration.

Daniel

-----Original Message-----
From: BaseX-Talk <basex-talk-bounces@mailman.uni-konstanz.de> On Behalf Of Imsieke, Gerrit, le-tex
Sent: Thursday, June 2, 2022 4:40 PM
To: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] XML Catalog and xslt:transform()

As a workaround, you might be able to read the documents using doc() in XQuery (this might work with the help of the catalog, in contrast to
doc() from within XSLT/Saxon) and pass them to xslt:transform() in some way. “Some way” isn’t easy, either, since xslt:transform() still relies on JAXP, and you can’t pass arbitrary XDM items such as whole documents or maps as stylesheet parameters (or can you? $params as map(*)? doesn’t rule this out, but I doubt that a parameter may have another map as value and arrive safely at the stylesheet). So you might need to wrap all inputs in a single top-level element, which of course prevents you from letting the XSLT stylesheet decide which resource to load dynamically, and you might need to change matching patterns.
But switching to XDM and implementing XPath 3.1’s fn:transform() function, which would allow this, was too much of a stretch for Christian at the time we paid BaseX GmbH to implement xslt:transform-report(). I think this will need another significant investment, and Christian needs to find time to implement it.

Gerrit

On 02.06.2022 16:24, Imsieke, Gerrit, le-tex wrote:
> Hi Daniel,
> 
> I think the catalog in xslt:transform() is only used for XSLT 
> imports/includes and maybe for reading documents with doc(), and only 
> for Saxon. The catalog is probably *not* used for mapping system 
> identifiers in the documents accessed this way. We should document 
> this better once we find out what is/isn’t supported.
> 
> The background is that we desperately needed to use catalogs for 
> mapping import/include URIs, and we paid Liam to implement this. He 
> succeeded with a little help from Christian, but it was not an easy 
> feat because include/import URI resolution is different from doc() URI 
> resolution in Saxon which in turn is different from system identifier 
> resolution (that is probably done by the XML parser, not by Saxon).
> 
> So I think we need to pay Liam and Christian again so that they work 
> out how to pass the catalog to the XML parser that is invoked by 
> Saxon. This definitely isn’t a fun task.
> 
> Gerrit
> 
> On 02.06.2022 14:44, Zimmel, Daniel wrote:
>> Hi,
>>
>> after reading https://docs.basex.org/wiki/Catalog_Resolver and 
>> digging in the list archives 
>> (https://mailman.uni-konstanz.de/pipermail/basex-talk/2019-March/014199.html)
>> I still have trouble understanding catalog files.
>>
>> Is this supposed to work with xslt:transform() and BaseX GUI 9.7.2?
>> The default option (DTD = false) is ignored by xslt:transform() 
>> because the function is definitely requesting the external DTD.
>> This prevents transforming XML with DTD declarations that are not 
>> available (if I understand correctly, a problem that the DTD option 
>> is trying to solve in general).
>>
>> When I try to solve this via catalog files (actually I do not need 
>> the DTD), I do not have success.
>> Here are my mini examples:
>>
>> Saxon HE 10.3 resides in the lib folder
>>
>> .basex setting:
>> # Local Options
>> SERIALIZER = indent=no
>> DTD = true
>>
>> XML in local folder "C:/temp/catalog":
>> <!DOCTYPE dokument
>>    SYSTEM "http://www.blahblahblah.info/dtd/dokument.dtd">
>> <dokument>
>>    <doknr>01</doknr>
>> </dokument>
>>
>> catalog.xml in local folder "C:/temp/catalog":
>> <catalog prefer="system" 
>> xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
>>    <rewriteSystem
>> systemIdStartString="http://www.blahblahblah.info/dtd/" 
>> rewritePrefix="file:///C:/temp/catalog/dtd/"/>
>> </catalog>
>>
>> dokument.dtd in local folder "C:/temp/catalog/dtd":
>> <!ELEMENT dokument (doknr)>
>> <!ELEMENT  doknr (#PCDATA)>
>>
>> XQuery query.xq in local folder "C:/temp/catalog":
>> (# db:catfile catalog.xml #) {
>>    xslt:transform('dokument.xml', 'transform.xsl') }
>>
>>
>> With or without pragma, this always results in a 
>> java.net.UnknownHostException (because the system ID is not available, 
>> that's true), but I would be expecting this would resolve to 
>> "file:///C:/temp/catalog/dtd/dokument.dtd"
>>
>> Not working in GUI nor via CCL.
>>
>> What am I getting wrong?
>>
>> Thanks, Daniel
>>
> 

-- 
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@le-tex.de, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer / Managing Directors:
Gerrit Imsieke, Svea Jelonek, Thomas Schmidt