Hello all,
There seem to be two unrelated bugs here. The good news is that one seems to be fixed already: the unexpected 12 4 12 4 4 result no longer appears with the latest 8.0 snapshot. Instead, regardless of how I set up $doc (via doc() or directly via document {}), the result is 12 12 12 12 20. If you don't want to update to the latest snapshot, you could also use fetch:xml(), which does not seem to be affected by this bug.
So at least we have a consistent result now, but I agree that it is in fact incorrect. Something seems to go wrong when a parent step is followed by a child step. I opened a bug report at https://github.com/BaseXdb/basex/issues/1001.
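The incorrect counts fit a simple pattern: 12 is the 3 aff elements containing "Japan" times the 4 article-id elements, and 20 is all 5 aff elements times 4. The following Python sketch (not BaseX code, and only a guess at the cause) models the parent and child steps with and without the duplicate elimination that the / operator is supposed to perform on node sequences. The document is an abbreviated copy of the test data, and the helper names dedupe and count_article_ids are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the test document from the original report
# (aff text shortened; the "..." comments are left out).
XML = """<article><front><article-meta>
<aff id="aff1">Kathmandu, Nepal</aff>
<aff id="aff2">Nagasaki, Japan</aff>
<aff id="aff3">Tokyo, Japan</aff>
<aff id="aff5">Tokyo, Japan</aff>
<aff id="aff6">Seoul, Korea</aff>
<article-id pub-id-type="pmc">2570825</article-id>
<article-id pub-id-type="pmid">18325280</article-id>
<article-id pub-id-type="publisher-id">07-0473</article-id>
<article-id pub-id-type="doi">10.3201/eid1403.070473</article-id>
</article-meta></front></article>"""

root = ET.fromstring(XML)
# ElementTree has no parent pointers, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def dedupe(nodes):
    """Keep the first occurrence of each node, compared by identity."""
    seen, out = set(), []
    for n in nodes:
        if id(n) not in seen:
            seen.add(id(n))
            out.append(n)
    return out


def count_article_ids(affs, dedupe_steps):
    """Model affs/../article-id, optionally de-duplicating after each step."""
    parents = [parent_of[a] for a in affs]  # parent step
    if dedupe_steps:
        parents = dedupe(parents)
    ids = [i for p in parents for i in p.findall("article-id")]  # child step
    if dedupe_steps:
        ids = dedupe(ids)
    return len(ids)


all_affs = list(root.iter("aff"))
japan_affs = [a for a in all_affs if "Japan" in (a.text or "")]

print(count_article_ids(japan_affs, dedupe_steps=True))   # 4
print(count_article_ids(japan_affs, dedupe_steps=False))  # 12
print(count_article_ids(all_affs, dedupe_steps=False))    # 20
```

With duplicate elimination the count is 4, as expected; without it, the sketch reproduces the 12 and 20 seen in the query results.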
For the time being, rewriting the query seems the only option. I would suggest using
let $path1 := $doc/child::article/child::front
              /child::article-meta[child::aff[contains(.,"Japan")]]
              /child::article-id
which does produce the correct result. In fact, I would argue it is even a bit more elegant: you don't have to repeat article-meta, and the child::aff step really acts as a filter, so writing it as a predicate is easier to read. At least to me; this is, of course, also a matter of taste.
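To illustrate why the predicate form sidesteps the problem, here is a rough Python equivalent. It is only an illustration using the standard xml.etree.ElementTree module, not what BaseX does internally, and the document is abbreviated from the original report. The selection never leaves article-meta, so no parent step is needed and each article-id is reached only once.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the test document (aff text shortened).
XML = """<article><front><article-meta>
<aff id="aff1">Kathmandu, Nepal</aff>
<aff id="aff2">Nagasaki, Japan</aff>
<aff id="aff3">Tokyo, Japan</aff>
<aff id="aff5">Tokyo, Japan</aff>
<aff id="aff6">Seoul, Korea</aff>
<article-id pub-id-type="pmc">2570825</article-id>
<article-id pub-id-type="pmid">18325280</article-id>
<article-id pub-id-type="publisher-id">07-0473</article-id>
<article-id pub-id-type="doi">10.3201/eid1403.070473</article-id>
</article-meta></front></article>"""

root = ET.fromstring(XML)  # root is the <article> element

# Counterpart of article-meta[aff[contains(., "Japan")]]:
# keep each article-meta that has at least one matching aff child.
metas = [
    m for m in root.findall("front/article-meta")
    if any("Japan" in "".join(a.itertext()) for a in m.findall("aff"))
]

# Counterpart of the final /article-id child step.
article_ids = [i for m in metas for i in m.findall("article-id")]

print(len(article_ids))  # 4
```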
Best regards, Dirk
On 09/27/2014 12:57 PM, Lars Johnsen wrote:
Hi Michael -
I got the same results with my 7.9 version. What is a bit surprising (hopefully I am not introducing any noise into your problem) is that if the last child step is removed from the paths and instead appended to the path variables in the return clause, the result becomes 4 4 4 4 4. So each $pathN without the article-id step, that is, the path up to and including the parent step, appears to return a single copy of the parent article-meta:
let $path1 := $doc/child::article/child::front
              /child::article-meta
              /child::aff[contains(.,"Japan")]
              /parent::article-meta,
    $path2 := $doc/descendant::aff[contains(.,"Japan")]
              /parent::article-meta,
    $path3 := $doc/article/front/article-meta/aff[contains(.,'Japan')]
              /..,
    $path4 := $doc//aff[contains(.,'Japan')]/..,
    $path5 := $doc//aff/..
return (count($path1/child::article-id), count($path2/article-id), count($path3/article-id), count($path4/article-id), count($path5/article-id))
Best Lars
2014-09-26 20:35 GMT+02:00 C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>:
Consider the following XML document:
<article>
  <front>
    <article-meta>
      <aff id="aff1">Tropical and Infectious Disease Hospital, Kathmandu, Nepal</aff>
      <aff id="aff2">Nagasaki University, Nagasaki, Japan</aff>
      <aff id="aff3">Department of Radiology, Kyorin University Faculty of Medicine, Tokyo, Japan</aff>
      <aff id="aff5">Pentax Company Limited, Tokyo, Japan</aff>
      <aff id="aff6">National Research Laboratory of Molecular Complex Control, Yonsei University, Seoul, Korea</aff>
      <!--* ... *-->
      <article-id pub-id-type="pmc">2570825</article-id>
      <article-id pub-id-type="pmid">18325280</article-id>
      <article-id pub-id-type="publisher-id">07-0473</article-id>
      <article-id pub-id-type="doi">10.3201/eid1403.070473</article-id>
    </article-meta>
  </front>
  <!--* ... *-->
</article>
For convenience in trying to understand this problem, a copy of this document has been placed at [1].
When I issue the following search against this document, I get unexpected results.
let $doc := doc('http://blackmesatech.com/2014/LIS590DML/data/testdata.xml')
let $path1 := $doc/child::article/child::front
              /child::article-meta
              /child::aff[contains(.,"Japan")]
              /parent::article-meta/child::article-id,
    $path2 := $doc/descendant::aff[contains(.,"Japan")]
              /parent::article-meta/child::article-id,
    $path3 := $doc/article/front/article-meta/aff[contains(.,'Japan')]
              /../article-id,
    $path4 := $doc//aff[contains(.,'Japan')]/../article-id,
    $path5 := $doc//aff/../article-id
return (count($path1), count($path2), count($path3), count($path4), count($path5))
What I expect is that path1, path2, path3, path4, and path5 should all return the same results, namely the set of four article-id elements in the document. So the sequence of counts returned should be 4 4 4 4 4.
What I am finding is that path1 and path3 are returning 12 results, with each article-id present three times in the result (once, apparently, for every aff element containing the string 'Japan'). Paths 2, 4, and 5 are all returning 4 results each, as I had expected them to. So the sequence of counts actually returned is 12 4 12 4 4.
In BaseX 7.6, for what it's worth, this query returns the sequence 12 12 12 12 20, which seems suggestive.
Interestingly, if I initialize the variable $doc with a direct element constructor, along the lines of
let $doc := document { <article>...</article> }
then all counts come out as expected in 7.6, but in 7.9 the result continues to be 12 4 12 4 4.
Is this an error in the handling of the / operator, or am I missing some subtle point?
Many thanks.
[1] http://blackmesatech.com/2014/LIS590DML/data/testdata.xml
- C. M. Sperberg-McQueen, Black Mesa Technologies LLC
- http://www.blackmesatech.com
- http://cmsmcq.com/mib
- http://balisage.net