This is the kind of replacement that the parser will perform if the XHTML entity declarations are being made available to it /during import/.
So first, make sure that you mirrored more than just the file http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd. In fact, you need the whole directory mirrored under your local dir /var/www/xxx/content/dtds/ (or at least the files containing the entity declarations that you want to support, such as xhtml-lat1.ent).
The interesting entity nbsp is declared in xhtml-lat1.ent, which is included in the XHTML1 DTD by virtue of

<!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "xhtml-lat1.ent">
%HTMLlat1;
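If you would rather script the mirroring than fetch the files by hand, something along these lines might do. This is only a sketch in Python: the names mirror_targets and mirror are made up for illustration, the file list assumes you only need the strict DTD plus the three entity sets it includes, and the W3C server may throttle or refuse automated requests for DTD files.

```python
import os
import urllib.request

# Directory on the W3C server that holds the XHTML 1.0 DTDs and entity sets.
BASE_URL = "http://www.w3.org/TR/xhtml1/DTD/"

# Assumption: the strict DTD plus the three entity files it pulls in.
FILES = [
    "xhtml1-strict.dtd",
    "xhtml-lat1.ent",
    "xhtml-symbol.ent",
    "xhtml-special.ent",
]


def mirror_targets(dest_dir):
    """Return (url, local_path) pairs for every file to mirror."""
    return [(BASE_URL + name, os.path.join(dest_dir, name)) for name in FILES]


def mirror(dest_dir):
    """Download each file into dest_dir (needs network access)."""
    os.makedirs(dest_dir, exist_ok=True)
    for url, path in mirror_targets(dest_dir):
        urllib.request.urlretrieve(url, path)
```

Calling mirror("/var/www/xxx/content/dtds") would then place the files where the catalog's rewritePrefix expects them.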
Now you have a choice: either replace the named entities with numeric character references prior to import, using your scripting language of choice, as Pascal suggests, or patch every XML file that you import with a DOCTYPE declaration that contains a so-called internal DTD subset:
<?xml version="1.0"?>
<!DOCTYPE content [
  <!ENTITY % HTMLlat1 SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
  %HTMLlat1;
]>
<data>
  <remarks>
    <remark id='1'>
      <name>Test Remark</name>
      <content>
        <p>This contains &aacute;&nbsp;non-breaking space!</p>
      </content>
    </remark>
  </remarks>
</data>
You might as well have written

<!ENTITY % HTMLlat1 SYSTEM "file:///var/www/xxx/content/dtds/xhtml-lat1.ent">

which renders the catalog unnecessary.
A catalog is useful if you want to keep your XML files exchangeable with other people, who wouldn't know that they should resolve anything below /var/www/xxx/content/dtds to the HTML entities.
But if you generate the XML files for the sole purpose of uploading them to BaseX (and querying the hell out of these files afterwards, of course), you may go for local paths. They will be resolved anyway.
So the following XML will be importable (call it tmp/local.xml):

<?xml version="1.0"?>
<!DOCTYPE content [
  <!ENTITY % HTMLlat1 SYSTEM "file:///c:/cygwin/usr/share/xml/xhtml1/xhtml-lat1.ent">
  %HTMLlat1;
]>
<foo>
  <remarks>
    <remark id='1'>
      <name>Test Remark</name>
      <content>
        <p>This contains &aacute;&nbsp;non-breaking space!</p>
      </content>
    </remark>
  </remarks>
</foo>
- The path reflects the entity file location on my machine.
- I added another entity, aacute.
- I replaced the top level element data with foo in order to illustrate that we don't have to declare the entity resolution for every top-level element that may come around. This will make sense if there is always the same element below which HTML entities are allowed.
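Patching every file by hand gets tedious, so the DOCTYPE could be prepended by a small script before import. Here is a minimal Python sketch, assuming the input has no DOCTYPE of its own. The function name add_entity_subset is hypothetical, and the DOCTYPE name content is simply carried over from the example above.

```python
import re

# Matches an XML declaration at the very start of the document, if present.
_XML_DECL = re.compile(r"^<\?xml.*?\?>", re.DOTALL)


def add_entity_subset(xml_text, ent_uri, doctype_name="content"):
    """Return xml_text with an internal DTD subset that pulls in ent_uri."""
    if "<!DOCTYPE" in xml_text:
        # Assumption: merging with an existing DOCTYPE is out of scope here.
        raise ValueError("document already has a DOCTYPE declaration")
    doctype = (
        f"<!DOCTYPE {doctype_name} [\n"
        f'  <!ENTITY % HTMLlat1 SYSTEM "{ent_uri}">\n'
        "  %HTMLlat1;\n"
        "]>\n"
    )
    match = _XML_DECL.match(xml_text)
    if match:
        # Keep the XML declaration first; the DOCTYPE must come after it.
        head = xml_text[:match.end()]
        rest = xml_text[match.end():].lstrip()
        return head + "\n" + doctype + rest
    return doctype + xml_text.lstrip()
```

Called with the entity file's URI, e.g. add_entity_subset(text, "file:///c:/cygwin/usr/share/xml/xhtml1/xhtml-lat1.ent"), it should yield a document shaped like tmp/local.xml.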
Then:
basex -c "create database test; open test; add tmp/local.xml"
(notice: no "SET CATFILE path/to/catalog.xml;")
basex -q "collection('test')/*" > result.xml
<foo>
  <remarks>
    <remark id="1">
      <name>Test Remark</name>
      <content>
        <p>This contains á non-breaking space!</p>
      </content>
    </remark>
  </remarks>
</foo>
Look how the entities have been replaced with Unicode characters proper!
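For completeness, the other route (replacing named entities with numeric character references before import) could look roughly like this in Python. A sketch only: the function name to_numeric_refs is made up, the lookup relies on the HTML 4 entity table in Python's html.entities module, and the regex makes no attempt to skip CDATA sections or comments.

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser knows; these are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# A named entity reference such as &nbsp; or &aacute;
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_refs(text):
    """Replace known HTML named entities with numeric character references."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Leave XML's own entities and unknown names exactly as they are.
            return match.group(0)
        return "&#%d;" % name2codepoint[name]
    return _ENTITY.sub(repl, text)
```

With this, "This contains &aacute;&nbsp;non-breaking space!" becomes "This contains &#225;&#160;non-breaking space!", which parses without any DTD.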
-Gerrit
On 28.01.2011 20:19, Pascal Heus wrote:
Charles: Can you replace &nbsp; with &#160;? *P
On 1/28/11 9:07 AM, Charles F. Munat wrote:
I'm having trouble using XHTML generated by TinyMCE in a BaseX database.
I get the error: The entity "nbsp" was referenced, but not declared.
OK, I understand that it doesn't parse entities automatically. I need to add a DTD.
So I have a setup like this:
In a file called catalog.xml:
<?xml version="1.0"?>
<catalog prefer="system" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD/"
                 rewritePrefix="file:///var/www/xxx/content/dtds/" />
</catalog>
In /var/www/xxx/content/dtds/ I have a file called xhtml1-strict.dtd which is exactly what it says it is (downloaded from the W3C).
Now I need to add the DOCTYPE to my database. But how?
I create this database via a web interface using Vaadin (a bunch of forms, tables, etc.). Then I just do XQuery inserts, deletes, etc. There are no XML documents. It's all done piecemeal.
My root node is <data>. Some of my nodes have a <content> node that contains XHTML:
<data>
  <remarks>
    <remark id='1'>
      <name>Test Remark</name>
      <content>
        <p>This contains a&nbsp;non-breaking space!</p>
      </content>
    </remark>
  </remarks>
</data>
How do I add the DTD? How do I indicate that it should be applied to the contents of the <content> node only? How can I do that without adding namespaces to all the XHTML tags?
Can anyone provide a brief example?
By the way, when I connect to the database I do this:
val session = new ClientSession("localhost", 1984, "admin", "x")
session.setOutputStream(System.out)
session.execute("SET CATFILE /var/www/xxx/catalog.xml")
session.execute("SET CHOP off")
session.execute("SET INTPARSE on")
session.execute("SET ENTITY on")
session.execute("SET PATHINDEX on")
session.execute("SET TEXTINDEX on")
session.execute("SET ATTRINDEX on")
session.execute("SET FTINDEX on")
session.execute("SET WILDCARDS on")
session.execute("SET DIACRITICS on")
session.execute("INFO")
Thanks!
Chas.
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk