OK: one answer to my own question is that, instead of trying to resolve the entities when creating the database, I can use base-uri() of the (in-database) document to find the original file, and then parse it as text using file:read-text-lines().
for $LINE in file:read-text-lines( 'legacy/uvaBook/tei/PoeScap.xml' )
where starts-with( $LINE, "<!ENTITY" ) and contains( $LINE, "SYSTEM" )
let $TOK := tokenize( $LINE )
where $TOK[2] != "%"
return element ENTITY { attribute ID { $TOK[2] }, translate( $TOK[4], '"', '' ) }
This gives me something like:
<ENTITY ID="PoeAltit">uva-lib:488578</ENTITY>
<ENTITY ID="PoeAlcov">uva-lib:488579</ENTITY>
<ENTITY ID="PoeAlspi">uva-lib:488580</ENTITY>
which I could feed to a lookup function.
Or maybe a MAP would be more direct:
map:merge(
  for $LINE in file:read-text-lines( 'legacy/uvaBook/tei/PoeScap.xml' )
  where starts-with( $LINE, "<!ENTITY" ) and contains( $LINE, "SYSTEM" )
  let $TOK := tokenize( $LINE )
  where $TOK[2] != "%"
  return map:entry( $TOK[2], translate( $TOK[4], '"', '' ) )
)
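A minimal sketch of how the map could then be used for lookups. The function name local:entity-map and the two sample declaration lines are hypothetical, made up for illustration. The sketch assumes each declaration sits on a single line in the form <!ENTITY name SYSTEM "id" ...>, as the tokenizing above already does.

```xquery
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Hypothetical helper: builds a map from entity name to system identifier,
   given the text lines of a file. Parameter entities (second token "%")
   are skipped, as in the query above. :)
declare function local:entity-map( $lines as xs:string* ) as map(*) {
  map:merge(
    for $line in $lines
    where starts-with( $line, '<!ENTITY' ) and contains( $line, 'SYSTEM' )
    let $tok := tokenize( $line )
    where $tok[2] != '%'
    return map:entry( $tok[2], translate( $tok[4], '"', '' ) )
  )
};

(: Made-up sample lines; in practice they would come from
   file:read-text-lines() on the original TEI file. :)
let $lines := (
  '<!ENTITY PoeAltit SYSTEM "uva-lib:488578" NDATA jpeg>',
  '<!ENTITY PoeAlcov SYSTEM "uva-lib:488579" NDATA jpeg>'
)
let $entities := local:entity-map( $lines )
(: A map can be called like a function to look up a key. :)
return $entities( 'PoeAltit' )
```

Passing the lines in as a parameter keeps the function independent of where the text comes from, so the same function would work on lines read from disk.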
Perhaps there is a way to cache (or precompute) these maps, to avoid having to read and parse the text again?
— Steve.
On Jul 24, 2019, at 3:19 PM, Majewski, Steven Dennis (sdm7g) sdm7g@virginia.edu wrote:
I have a corpus of TEI files with figures and page images encoded as external entities. It appears that even when choosing “Parse DTDs and entities”, this info is lost when parsing the files into the database. And in any case, unparsed-entity-uri() is an XSLT-only function.
It would appear that I need to transform the files first, replacing @entity attributes with @url attributes while the unparsed entity values are still available, before creating the database, or else generate another database to map entity names to values later.
Are there any better ways to handle this case?
Is there any way to do these transforms on the fly before parsing the files into the database?
The only thing that comes to mind is to set up a local SaxonServlet to do the transforms, and load from URLs instead of file paths. (I’ve been doing something similar for a different case, and running into memory errors that I don’t see when loading from a directory when creating a database. Increasing memory didn’t help much, but inserting a ‘flush’ command between the ‘add’ commands seemed to work.)
— Steve Majewski