I have a corpus of TEI files with figures and page images encoded as external entities. It appears that even with “Parse DTDs and entities” selected, this information is lost when the files are parsed into the database. And in any case, unparsed-entity-uri() is an XSLT-only function.
It would appear that I need to transform the files first, replacing @entity attributes with @url attributes while the unparsed entity values are still available, before creating the database, or else generate another database to map entity names to values later.
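To make the pre-processing step above concrete, here is a minimal sketch in Python rather than XSLT, so it does not use unparsed-entity-uri(). It assumes the unparsed entities are declared in each file's internal DTD subset (declarations in an external DTD are not read), and that the attributes are named entity and url as described above. The function names and the sample document are hypothetical. Note that ElementTree drops the DOCTYPE and any comments on output, and namespaced files would need extra handling.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin
from xml.parsers import expat

# Hypothetical sample: one figure pointing at an unparsed (NDATA) entity.
SAMPLE = b"""<!DOCTYPE TEI.2 [
<!NOTATION jpeg SYSTEM "image/jpeg">
<!ENTITY fig1 SYSTEM "images/fig1.jpg" NDATA jpeg>
]>
<TEI.2><text><body><figure entity="fig1"/></body></text></TEI.2>"""


def unparsed_entities(xml_bytes, base_uri=""):
    """Map each unparsed entity name to its system identifier.

    Only declarations in the internal DTD subset are seen (assumption).
    """
    entities = {}

    def on_decl(name, base, system_id, public_id, notation_name):
        # Resolve against base_uri when one is given; otherwise keep as written.
        entities[name] = urljoin(base_uri, system_id)

    parser = expat.ParserCreate()
    parser.UnparsedEntityDeclHandler = on_decl
    parser.Parse(xml_bytes, True)
    return entities


def entity_attrs_to_urls(xml_bytes, base_uri=""):
    """Replace entity="name" with url="system-id" wherever the name is declared."""
    entities = unparsed_entities(xml_bytes, base_uri)
    root = ET.fromstring(xml_bytes)
    for elem in root.iter():
        name = elem.attrib.get("entity")
        if name in entities:
            del elem.attrib["entity"]
            elem.set("url", entities[name])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(entity_attrs_to_urls(SAMPLE))
```

Running something like this over the corpus before import would leave the image locations in ordinary attributes, so nothing depends on the DTD once the files are in the database.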
Are there any better ways to handle this case?
Is there any way to do these transforms on the fly before parsing the files into the database?
The only thing that comes to mind is to set up a local SaxonServlet to do the transforms and load from URLs instead of file paths. (I’ve been doing something similar for a different case and running into memory errors that I don’t see when loading from a directory while creating a database. Increasing memory didn’t help much, but inserting a ‘flush’ command between the ‘add’ commands seemed to work.)
— Steve Majewski