Dear Christian,
"Checks if the specified resource exists and if it is an XML document". That being the case, I would think it would return false if my path argument actually contained two document-nodes.
Do your documents have the same name?
In the case I was testing, yes. I used the db:add function twice (for example):
db:add('testing',document{<a/>},'parent/doc.xml'),
db:add('testing',document{<b/>},'parent/doc.xml')
If I then call the following, the result is true:
db:is-xml('testing','parent/doc.xml')
However, this isn't an XML document. It is a collection of XML documents.
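To make the ambiguity concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not BaseX code; the names `entries`, `add`, `open_path` and `is_xml` are made up for the illustration.

```python
# Toy model only: a database as a plain list of (path, content) entries.
# This is not how BaseX is implemented; it mirrors the observed behaviour.
entries = []


def add(path, content):
    # Adding never checks for an existing entry, so a path can repeat.
    entries.append((path, content))


def open_path(path):
    # Return every entry stored at the path itself or anywhere below it.
    prefix = path.rstrip("/") + "/"
    return [c for p, c in entries if p == path or p.startswith(prefix)]


def is_xml(path):
    # True as soon as at least one entry matches the path.
    return len(open_path(path)) > 0


add("parent/doc.xml", "<a/>")
add("parent/doc.xml", "<b/>")
```

In this model `is_xml("parent/doc.xml")` is true even though `open_path("parent/doc.xml")` yields two documents, which is exactly the situation I ran into.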
It isn't that I disagree with treating paths as collections, as that is very useful as well. My issue is that it should be possible to give a unique name to a document and to be explicit about when that is your intention.
In terms of what would have to be changed, I think the following would suffice:
First, document-nodes already have an identifier assigned to them (db:node-id can be used to see this), so it seems to me that it becomes a matter of maintaining a map from document names to these identifiers. Although the XQuery recommendation would seem to allow multiple names per document-node, I don't see much value in that, and it would probably require new functions. To keep it simple, my recommendation focuses only on what could be done in the existing functions, rather than adding new ones. Since my issue centers on the ambiguity of a path referring to either a collection or a single document, I will focus on the functions that deal with these (and only the XML-related ones).
- db:open($db as item(), $path as xs:string) as document-node()*: I think this can remain unchanged. If the path is a document, the result would technically be a single document-node, but that is already true.
- db:add($db as item(), $input as item(), $path as xs:string) as empty-sequence(): This should remain unchanged, as it shouldn't be required to assign a name to a document in order to add a document-node to an existing collection. However, one must be able to distinguish whether the path identifies a collection or a document within a collection. Further, as laid out in the XQuery recommendation, if the path is a document there should be a relation of that name to the existing collection names. I would think the cleanest approach would be to add an overloaded signature:
  db:add($db as item(), $input as item(), $path as xs:string, $doc_name as xs:boolean) as empty-sequence()
  The $doc_name parameter is true if the path is intended to identify a document name, and false if it identifies a collection. The default value is false, so that nothing changes in how the function currently works. As with the existing path, the delimiter character ('/') is significant in that it represents hierarchy. Therefore, the following:
  db:add('db', document{<a/>}, 'level_1/level_2/my_doc', true())
  adds the document-node to the database named 'db' with the document name 'level_1/level_2/my_doc'. This document-node is also available under the collections 'level_1' and 'level_1/level_2' (as it is currently implemented). If the $doc_name parameter is true and the supplied path already exists, an error should be raised.
- db:rename($db as item(), $path as xs:string, $newpath as xs:string) as empty-sequence(): This will raise an error if the rename results in a document name conflict. For example, if I have the documents 'A/doc_1.xml' and 'B/doc_1.xml' and invoke db:rename('db','A','B'), the change would not be allowed, since it would result in two document-nodes with the name 'B/doc_1.xml'. Renaming a document name to an existing collection name could simply remove the document name from the document-node (i.e. unmap it).
- db:replace($db as item(), $path as xs:string, $input as item()) as empty-sequence(): This should work as it already does, since it raises an error if the path refers to more than one document-node. Basically, you can replace a collection as long as it contains only one document-node. If the path is a document, no further check would be required, since you know it contains a single document-node.
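The rules in the list above can be sketched as a small Python model. This is only an illustration of the proposed semantics, not BaseX code and not its Java API: the class `Database`, the exception `NameConflict` and all method names are hypothetical. Cases the proposal does not cover (for example, adding an unnamed document at a path that is already a document name) are not handled, and raising an error when replace finds no document is an assumption of the sketch.

```python
class NameConflict(Exception):
    """Raised when an operation would give two documents the same name."""


class Database:
    """Hypothetical model of the proposal; not BaseX code."""

    def __init__(self):
        self.docs = {}    # node id -> [path, content]
        self.names = {}   # unique document name -> node id
        self._next_id = 0

    def _ids_at(self, path):
        # Node ids stored at the path itself or anywhere below it.
        prefix = path.rstrip("/") + "/"
        return [i for i, (p, _) in self.docs.items()
                if p == path or p.startswith(prefix)]

    def open(self, path):
        return [self.docs[i][1] for i in self._ids_at(path)]

    def add(self, content, path, doc_name=False):
        # doc_name=True claims the path as a unique name, so it must be unused.
        if doc_name and self._ids_at(path):
            raise NameConflict(path)
        node_id = self._next_id
        self._next_id += 1
        self.docs[node_id] = [path, content]
        if doc_name:
            self.names[path] = node_id
        return node_id

    def rename(self, path, newpath):
        prefix = path.rstrip("/") + "/"
        moved = {}
        for i in self._ids_at(path):
            old = self.docs[i][0]
            if old == path:
                moved[i] = newpath
            else:
                moved[i] = newpath.rstrip("/") + "/" + old[len(prefix):]
        # Validate every move first so a conflict leaves the data untouched.
        plan = []
        for i, target in moved.items():
            named = self.names.get(self.docs[i][0]) == i
            others = [j for j in self._ids_at(target) if j not in moved]
            if named and self.names.get(target) in others:
                raise NameConflict(target)
            # A name that lands on an existing collection is dropped (unmapped).
            plan.append((i, target, named and not others))
        for i, target, keep_name in plan:
            old = self.docs[i][0]
            if self.names.get(old) == i:
                del self.names[old]
            self.docs[i][0] = target
            if keep_name:
                self.names[target] = i

    def replace(self, path, content):
        # A document name always resolves to exactly one node.
        ids = [self.names[path]] if path in self.names else self._ids_at(path)
        if len(ids) != 1:
            # Assumption: a missing path is an error here, like an ambiguous one.
            raise ValueError("path must refer to exactly one document: " + path)
        self.docs[ids[0]][1] = content
```

With this model, adding 'A/doc_1.xml' and 'B/doc_1.xml' as named documents and then renaming 'A' to 'B' raises `NameConflict`, while adding a second document under an already used name is rejected at add time.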
I would think that should cover it, but I am sure this is not exhaustive. Basically, we simply need a way to provide a unique name within the scope of a database to a single document-node. I would think that this would also allow you to implement the fn:doc and fn:document-uri functions better.
I hope this helps to clarify.
Jack
P.S. My assumption is also that the changes above would be applied to the other APIs (e.g. Java) as well.
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Wednesday, February 08, 2012 6:25 AM
To: J Gager
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Collections and Documents
Dear Jack,
My confusion mainly arises from the documentation for the Database Module in the XQuery portal (http://docs.basex.org/wiki/Database_Module). Throughout this page, the examples provided for the functions seem to indicate that it is possible to provide a single name which maps to a single document-node.
We have added an introductory paragraph, "Commonalities", on that page that is supposed to explain the $db variable, but it may well be that it isn't really noticed, or that it is misleading.
When I find more time, I can provide more detailed recommendations for the above wiki page.
That would be great; you'll probably be more efficient at rephrasing the relevant snippets than we are (maybe only one or two sentences need to be replaced).
"Checks if the specified resource exists and if it is an XML document". That being the case, I would think it would return false if my path argument actually contained two document-nodes.
Do your documents have the same name?
In fact, it would seem from some quick tests that I am even able to store a binary resource and an XML document under the same path (which I would expect with folders but not with documents).
True, that's currently possible (but may be prohibited in future versions).
I hope this is useful. I still think that having a true document-node to document mapping would be useful, as it would allow one to use the handy database module functions such as add, delete, rename, and replace confidently.
What would have to be changed in your opinion to end up with a true document-node to document mapping?
Thanks, Christian