I'm having trouble understanding document/collection semantics with xquery update.
If I create a document and add another document to the same path, that path becomes a collection with two document nodes. However, it seems to be impossible to manipulate these document nodes individually.
See the following console output:
create db test
Database 'test' created in 137.25 ms.
open test
Database 'test' was opened in 2.15 ms.
add to test.xml <root1/>
Path "test.xml" added in 19.44 ms.
add to test.xml <root2/>
Path "test.xml" added in 2.57 ms.
list test
Input Path  Type  Content-Type     Size
---------------------------------------
test.xml    xml   application/xml  2
test.xml    xml   application/xml  2
2 Resources.
xquery collection('test/test.xml')
<root1/><root2/>
Query executed in 30.84 ms.
xquery doc('test/test.xml')
Stopped at line 1, column 20: [BXDB0006] Database path 'test/test.xml' must point to a single document.
xquery import module namespace functx = "http://www.functx.com"; collection('test/test.xml') ! (document-uri(.), functx:node-kind(.), .)
test/test.xml document-node<root1/> test/test.xml document-node<root2/>
Query executed in 29.54 ms.
xquery delete node collection('test/test.xml')[2]
Query executed in 1.18 ms.
xquery import module namespace functx = "http://www.functx.com"; collection('test/test.xml') ! (document-uri(.), functx:node-kind(.), .)
test/test.xml document-node<root1/> test/test.xml document-node<root2/>
Query executed in 43.83 ms.
xquery delete node db:open('test','test.xml')[2]
Query executed in 2.15 ms.
xquery import module namespace functx = "http://www.functx.com"; collection('test/test.xml') ! (document-uri(.), functx:node-kind(.), .)
test/test.xml document-node<root1/> test/test.xml document-node<root2/>
Query executed in 36.77 ms.
Note that it doesn't seem possible to delete one of the document nodes, even using db:replace():
xquery db:replace('test','test.xml',collection('test/test.xml')[1])
Stopped at line 1, column 60: [BXDB0006] Database path '%' must point to a single document.
I don't understand this error message, unless db:replace is internally using the path of the document-node rather than the document-node itself.
delete test.xml
2 resource(s) deleted in 36.35 ms.
But I can delete both items at the same path.
The way I discovered this is that I accidentally added a document to an existing path using db:add(). However, I was unable to correct the mistake and remove the documents without completely deleting the path and re-adding its members. I don't see any way to do this with a single xquery update PUL.
It seems to me that this is a bug. Either xquery update expressions like 'delete node' should work on document nodes, or adding document-nodes to a path which has a document-node should be illegal and raise an error when attempted.
Hi Francis,
thanks for your thorough analysis.
I'm having trouble understanding document/collection semantics with xquery update. ...
yes, you stumbled upon a sensitive issue: both XQuery and its Update extension have no semantics regarding databases. If a document node is specified in a delete expression, it is simply ignored [1].
Our own database functions have been derived from the BaseX-specific database commands. As such, their semantics differ from XQuery Update. As you correctly indicated, it is currently impossible with these commands to delete one of several documents with the same name. We regard this behavior as intermediate, as future versions of BaseX will no longer allow more than one document with the same name [2].
A general suggestion is to use distinct paths for all stored documents, or use the (slower) replace command/function to enforce distinct document paths – as long as you need to address single documents in your database.
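To find paths that already hold more than one document, a query along the following lines might help. It is a minimal sketch, assuming that db:list() returns one entry per stored resource, so that a duplicated path shows up more than once (as it does in the LIST output of the example session). The database name 'test' is taken from that session.

```xquery
(: Report every path in database 'test' that occurs more than once. :)
let $paths := db:list('test')
for $path in distinct-values($paths)
let $count := count($paths[. = $path])
where $count > 1
return concat($path, ': ', $count, ' documents')
```

An empty result would mean that all stored paths are distinct.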
Hope this helps, feel free to ask for more, Christian
[1] http://www.w3.org/TR/xquery-update-10/#id-delete
[2] https://github.com/BaseXdb/basex/issues/429
I suspected that XQuery's database-indifference had something to do with this.
It's not clear to me what clause of the XQuery Update specification you link disallows deleting a document node. Is it this one?
"If any node in $tlist has no parent, it is removed from $tlist (and is thus ignored in the following step)."
If so, could this be resolved by having a document-node() have some kind of db:database-node node type as a parent? It would also allow us to use db:*() functions by passing around database nodes directly instead of as string names. It would also allow us to determine the database of a path without using string manipulation. I currently have to use these functions a lot:
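(: Splits a document URI such as "db/path/doc.xml" at the first '/'
   into the database name and the remaining path. :)
declare function local:uri-db-and-path($docuri as xs:string) as xs:string+ {
  let $pathsep := '/'
  return (
    fn:substring-before($docuri, $pathsep),
    fn:substring-after($docuri, $pathsep)
  )
};
(: Same as above, starting from a document node instead of a URI string. :)
declare function local:doc-db-and-path($doc as document-node()) as xs:string+ {
  local:uri-db-and-path(document-uri($doc))
};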
I think the github issue you link to rather underestimates how broken this is. This is really a bug and not a nice-to-have. As I said, I added a document to an existing path by accident. As it is right now, you can either replace the document with db:replace() (which isn't what I wanted to do), or silently add a new document to the same path with db:add(), but there's no way to add to a path and get an error if a document exists at the path (what I wanted). A mistaken db:add() leaves the database in a state that is *very difficult to detect and fix* (I think *impossible* to fix atomically with a single Pending Update List, which can be a problem in a multi-user environment), and it is made all the more difficult because of the XQuery Update limitation on document-nodes.
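The "add, but fail if the path is taken" behavior described above could be approximated in user code. The following is only a sketch: local:add-new and the error name local:exists are hypothetical names made up for illustration, and it assumes that db:exists($db, $path) reports whether a resource is already stored at the given path.

```xquery
(: Hypothetical helper: add $doc at $path only if nothing is stored there yet;
   otherwise raise an error instead of silently creating a duplicate. :)
declare updating function local:add-new(
  $db   as xs:string,
  $doc  as node(),
  $path as xs:string
) {
  if (db:exists($db, $path))
  then error(xs:QName('local:exists'), concat('Path already in use: ', $path))
  else db:add($db, $doc, $path)
};

local:add-new('test', <root2/>, 'test.xml')
```

The check and the add happen in the same query, but whether that is sufficient under concurrent writers depends on how the server isolates updating queries.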
I also don't understand why that db:replace() expression doesn't work. Reminder:
xquery db:replace('test','test.xml', <root/>)
Stopped at line 1, column 60: [BXDB0006] Database path '%' must point to a single document.
So it seems db:replace() needs a destination path that resolves to a single document to work. I'm not sure why this would be since the destination path will be replaced anyway.
Thank you very much for your prompt reply and enlightening links.
Hi Francis,
It's not clear to me what clause of the XQuery Update specification you link disallows deleting a document node. Is it this one?
"If any node in $tlist has no parent, it is removed from $tlist (and is thus ignored in the following step)."
Exactly. It's true that we could start reflecting on the consequences of an additional super node. But this would most likely introduce numerous other side effects to the existing query and storage architecture.
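The effect of that clause can be observed without any database, using a transform expression. This is a small sketch in plain XQuery Update; the element names are arbitrary.

```xquery
(: The copied <a> element has no parent, so the delete targets a
   parentless node and is ignored. :)
copy $c := <a><b/></a>
modify delete node $c
return $c
```

The query should return the copy unchanged, whereas `delete node $c/b` would remove the child, because `<b/>` does have a parent. A document node stored in a database is in the same position as `$c` here.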
However, I see some clear advantage if the nodes to be deleted, replaced, etc, could be directly specified. We could think about extending/rewriting the existing db functions (provided that we maintain a consistent solution, and that we manage to avoid new incompatibilities with previous versions of the db module).
I think the github issue you link to rather underestimates how broken this is. This is really a bug and not a nice-to-have.
The "nice2have" tag might have been misleading. I claim, however, that it’s not a bug that multiple documents with the same name can be added to a database. The opposite is true: we have quite a number of users who appreciate the fact that the addition of database resources is lightning fast in BaseX. It’s rather the inconsistent behavior of accompanying functions that causes the problems. Besides that, it will be difficult to keep up the same performance if we start indexing all resource paths.
I also don't understand why that db:replace() expression doesn't work. Reminder:
xquery db:replace('test','test.xml', <root/>)
Stopped at line 1, column 60: [BXDB0006] Database path '%' must point to a single document.
..and I was surprised about the error message ;) I have updated the error feedback, and the behavior of db:replace(). Now, all documents will be deleted that match the specified path, and a single document will be added. Feel free to check out the latest stable snapshot [1].
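With the changed behavior described above, the call from the original report might be enough to collapse duplicates into one document. This is an untested sketch: it assumes that db:replace() accepts a document node taken from the same database as its input, and that the first item of the collection is the document to keep.

```xquery
(: Keep the first document stored at 'test.xml' and drop any others
   that share the same path. :)
db:replace('test', 'test.xml', collection('test/test.xml')[1])
```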
Christian
Great, thank you! I got the notification from github as well. That removes a big pain point.
BaseX is a great system, and now I see it has a great community as well!
..thanks, too, for your elaborate feedback!
___________________________
Francis Avila
basex-talk@mailman.uni-konstanz.de