Hi Michael -
I got the same results with my 7.9 version. What is a bit surprising (hopefully I am not introducing any noise into your problem) is that if the final child step is removed from each path and applied to the path variables in the return clause instead, the result becomes 4 4 4 4 4. So each $pathN without the article-id step appears to return a single copy of the parent article-meta:
let $path1 := $doc/child::article/child::front
                  /child::article-meta
                  /child::aff[contains(.,"Japan")]
                  /parent::article-meta,
    $path2 := $doc/descendant::aff[contains(.,"Japan")]
                  /parent::article-meta,
    $path3 := $doc/article/front/article-meta/aff[contains(.,'Japan')]
                  /..,
    $path4 := $doc//aff[contains(.,'Japan')]/..,
    $path5 := $doc//aff/..
return (count($path1/child::article-id),
        count($path2/article-id),
        count($path3/article-id),
        count($path4/article-id),
        count($path5/article-id))
Best,
Lars
2014-09-26 20:35 GMT+02:00 C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>:
Consider the following XML document:
<article>
  <front>
    <article-meta>
      <aff id="aff1">Tropical and Infectious Disease Hospital, Kathmandu, Nepal</aff>
      <aff id="aff2">Nagasaki University, Nagasaki, Japan</aff>
      <aff id="aff3">Department of Radiology, Kyorin University Faculty of Medicine, Tokyo, Japan</aff>
      <aff id="aff5">Pentax Company Limited, Tokyo, Japan</aff>
      <aff id="aff6">National Research Laboratory of Molecular Complex Control, Yonsei University, Seoul, Korea</aff>
      <!--* ... *-->
      <article-id pub-id-type="pmc">2570825</article-id>
      <article-id pub-id-type="pmid">18325280</article-id>
      <article-id pub-id-type="publisher-id">07-0473</article-id>
      <article-id pub-id-type="doi">10.3201/eid1403.070473</article-id>
    </article-meta>
  </front>
  <!--* ... *-->
</article>
For convenience in trying to understand this problem, a copy of this document has been placed at [1].
When I issue the following search against this document, I get unexpected results.
let $doc := doc('http://blackmesatech.com/2014/LIS590DML/data/testdata.xml')
let $path1 := $doc/child::article/child::front
                  /child::article-meta
                  /child::aff[contains(.,"Japan")]
                  /parent::article-meta/child::article-id,
    $path2 := $doc/descendant::aff[contains(.,"Japan")]
                  /parent::article-meta/child::article-id,
    $path3 := $doc/article/front/article-meta/aff[contains(.,'Japan')]
                  /../article-id,
    $path4 := $doc//aff[contains(.,'Japan')]/../article-id,
    $path5 := $doc//aff/../article-id
return (count($path1), count($path2), count($path3),
        count($path4), count($path5))
What I expect is that path1, path2, path3, path4, and path5 should all return the same results, namely the set of four article-id elements in the document. So the sequence of counts returned should be 4 4 4 4 4.
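That expectation rests on the rule that a path expression whose steps all return nodes delivers its result in document order with duplicates eliminated by node identity, whereas a for expression simply concatenates what each iteration returns. A minimal sketch of the difference, assuming a cut-down document invented for illustration (two aff elements and two article-id elements, not the test data above) and hypothetical variable names:

```xquery
(: Illustrative only: a made-up miniature document, not the real test data. :)
let $doc := document {
  <article>
    <front>
      <article-meta>
        <aff>Nagasaki University, Nagasaki, Japan</aff>
        <aff>Pentax Company Limited, Tokyo, Japan</aff>
        <article-id>1</article-id>
        <article-id>2</article-id>
      </article-meta>
    </front>
  </article>
}
(: Path expression: both aff elements share one parent, so the
   duplicate parent node should be eliminated before the last step. :)
let $by-path := $doc//aff[contains(., 'Japan')]/../article-id
(: for expression: no duplicate elimination across iterations,
   so each article-id is returned once per matching aff. :)
let $by-for := for $a in $doc//aff[contains(., 'Japan')]
               return $a/../article-id
return (count($by-path), count($by-for))
```

As I read the specification, a conforming processor should return 2 4 here: the path expression yields each article-id once, while the for expression yields each one once per matching aff. Scaled up to three matching aff elements and four article-id elements, the same arithmetic gives 4 versus 12.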
What I am finding is that path1 and path3 are returning 12 results, with each article-id present three times in the result (once, apparently, for every aff element containing the string 'Japan'). Paths 2, 4, and 5 are all returning 4 results each, as I had expected them to. So the sequence of counts actually returned is 12 4 12 4 4.
In BaseX 7.6, for what it's worth, this query returns the sequence 12 12 12 12 20, which seems suggestive.
Interestingly, if I initialize the variable $doc with a direct element constructor, along the lines of
let $doc := document { <article>...</article> }
then all counts come out as expected in 7.6, but in 7.9 the result continues to be 12 4 12 4 4.
Is this an error in the handling of the / operator, or am I missing some subtle point?
Many thanks.
[1] http://blackmesatech.com/2014/LIS590DML/data/testdata.xml
- C. M. Sperberg-McQueen, Black Mesa Technologies LLC
- http://www.blackmesatech.com
- http://cmsmcq.com/mib
- http://balisage.net