Consider the following XML document:
<article>
  <front>
    <article-meta>
      <aff id="aff1">Tropical and Infectious Disease Hospital, Kathmandu, Nepal</aff>
      <aff id="aff2">Nagasaki University, Nagasaki, Japan</aff>
      <aff id="aff3">Department of Radiology, Kyorin University Faculty of Medicine, Tokyo, Japan</aff>
      <aff id="aff5">Pentax Company Limited, Tokyo, Japan</aff>
      <aff id="aff6">National Research Laboratory of Molecular Complex Control, Yonsei University, Seoul, Korea</aff>
      <!--* ... *-->
      <article-id pub-id-type="pmc">2570825</article-id>
      <article-id pub-id-type="pmid">18325280</article-id>
      <article-id pub-id-type="publisher-id">07-0473</article-id>
      <article-id pub-id-type="doi">10.3201/eid1403.070473</article-id>
    </article-meta>
  </front>
  <!--* ... *-->
</article>
For convenience in trying to understand this problem, a copy of this document has been placed at [1].
When I issue the following search against this document, I get unexpected results.
let $doc := doc('http://blackmesatech.com/2014/LIS590DML/data/testdata.xml')
let $path1 := $doc/child::article/child::front/child::article-meta
                /child::aff[contains(., "Japan")]
                /parent::article-meta/child::article-id,
    $path2 := $doc/descendant::aff[contains(., "Japan")]
                /parent::article-meta/child::article-id,
    $path3 := $doc/article/front/article-meta/aff[contains(., 'Japan')]
                /../article-id,
    $path4 := $doc//aff[contains(., 'Japan')]/../article-id,
    $path5 := $doc//aff/../article-id
return (count($path1), count($path2), count($path3), count($path4), count($path5))
What I expect is that path1, path2, path3, path4, and path5 should all return the same results, namely the set of four article-id elements in the document. So the sequence of counts returned should be 4 4 4 4 4.
What I am finding is that path1 and path3 are returning 12 results, with each article-id present three times in the result (once, apparently, for every aff element containing the string 'Japan'). Paths 2, 4, and 5 are all returning 4 results each, as I had expected them to. So the sequence of counts actually returned is 12 4 12 4 4.
In BaseX 7.6, for what it's worth, this query returns the sequence 12 12 12 12 20, which seems suggestive.
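The expected count of 4 follows from the XPath rule that, when E2 returns nodes, E1/E2 eliminates duplicate nodes (by node identity) and returns the rest in document order. Below is a minimal Python sketch of that rule using only xml.etree.ElementTree; it models the semantics and is neither XQuery nor BaseX's implementation. `slash`, `naive_slash` and the step functions are hypothetical names. `naive_slash` is a guessed faulty variant that skips deduplication: on this document it yields 12 for the Japan paths and 20 for //aff/../article-id, the same counts as the 7.6 result. That fits the idea of a missing deduplication step but does not prove it.

```python
import xml.etree.ElementTree as ET

# The sample document from the message above (elided comments left out).
SAMPLE = """<article><front><article-meta>
  <aff id="aff1">Tropical and Infectious Disease Hospital, Kathmandu, Nepal</aff>
  <aff id="aff2">Nagasaki University, Nagasaki, Japan</aff>
  <aff id="aff3">Department of Radiology, Kyorin University Faculty of Medicine, Tokyo, Japan</aff>
  <aff id="aff5">Pentax Company Limited, Tokyo, Japan</aff>
  <aff id="aff6">National Research Laboratory of Molecular Complex Control, Yonsei University, Seoul, Korea</aff>
  <article-id pub-id-type="pmc">2570825</article-id>
  <article-id pub-id-type="pmid">18325280</article-id>
  <article-id pub-id-type="publisher-id">07-0473</article-id>
  <article-id pub-id-type="doi">10.3201/eid1403.070473</article-id>
</article-meta></front></article>"""

root = ET.fromstring(SAMPLE)
# ElementTree has no parent axis, so record document order and parent links.
order = {e: i for i, e in enumerate(root.iter())}
parent = {c: p for p in root.iter() for c in p}

def slash(nodes, step):
    """E1/E2 on nodes: apply step to every context node, then remove
    duplicate nodes (by identity) and sort them into document order."""
    hits = {n for ctx in nodes for n in step(ctx)}
    return sorted(hits, key=order.__getitem__)

def naive_slash(nodes, step):
    """Hypothetical faulty variant: concatenates results, keeps duplicates."""
    return [n for ctx in nodes for n in step(ctx)]

def japan_aff(ctx):   # child::aff[contains(., "Japan")]
    return [c for c in ctx if c.tag == "aff" and "Japan" in "".join(c.itertext())]

def any_aff(ctx):     # child::aff
    return [c for c in ctx if c.tag == "aff"]

def up(ctx):          # parent::node(), i.e. ..
    return [parent[ctx]] if ctx in parent else []

def article_id(ctx):  # child::article-id
    return [c for c in ctx if c.tag == "article-id"]

meta = [root.find("front/article-meta")]
correct = slash(slash(slash(meta, japan_aff), up), article_id)
buggy = naive_slash(naive_slash(naive_slash(meta, japan_aff), up), article_id)
buggy_all = naive_slash(naive_slash(naive_slash(meta, any_aff), up), article_id)
print(len(correct), len(buggy), len(buggy_all))  # 4 12 20
```

Each of the three Japan affiliations leads back to the same article-meta, so only the deduplicating version collapses the result to the four article-id elements.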
Interestingly, if I initialize the variable $doc with a direct element constructor, along the lines of
let $doc := document { <article>...</article> }
then all counts come out as expected in 7.6, but in 7.9 the result continues to be 12 4 12 4 4.
Is this an error in the handling of the / operator, or am I missing some subtle point?
Many thanks.
[1] http://blackmesatech.com/2014/LIS590DML/data/testdata.xml
Hi Michael -
I got the same results with my 7.9 version. What is a bit surprising (hopefully I am not introducing any noise into your problem) is that if the final child step is cut off the paths and applied to the path variables in the return clause instead, the result becomes 4 4 4 4 4. So each $pathN, truncated after the parent step, appears to return a single copy of the parent article-meta:
let $path1 := $doc/child::article/child::front/child::article-meta
                /child::aff[contains(., "Japan")]/parent::article-meta,
    $path2 := $doc/descendant::aff[contains(., "Japan")]/parent::article-meta,
    $path3 := $doc/article/front/article-meta/aff[contains(., 'Japan')]/..,
    $path4 := $doc//aff[contains(., 'Japan')]/..,
    $path5 := $doc//aff/..
return (count($path1/child::article-id), count($path2/article-id),
        count($path3/article-id), count($path4/article-id),
        count($path5/article-id))
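A path is evaluated left to right, so E1/E2/E3 means (E1/E2)/E3; binding E1/E2 to a variable and applying /E3 afterwards should give the same answer as the one-piece path. The sketch below (Python with xml.etree.ElementTree, modelling the XPath semantics rather than BaseX; `distinct_in_doc_order` is a hypothetical helper) mirrors the truncated $path4: the three Japan affiliations share one parent, so the intermediate result holds a single article-meta and the final step finds 4 article-id elements.

```python
import xml.etree.ElementTree as ET

# The sample document from the first message (elided comments left out).
SAMPLE = """<article><front><article-meta>
  <aff id="aff1">Tropical and Infectious Disease Hospital, Kathmandu, Nepal</aff>
  <aff id="aff2">Nagasaki University, Nagasaki, Japan</aff>
  <aff id="aff3">Department of Radiology, Kyorin University Faculty of Medicine, Tokyo, Japan</aff>
  <aff id="aff5">Pentax Company Limited, Tokyo, Japan</aff>
  <aff id="aff6">National Research Laboratory of Molecular Complex Control, Yonsei University, Seoul, Korea</aff>
  <article-id pub-id-type="pmc">2570825</article-id>
  <article-id pub-id-type="pmid">18325280</article-id>
  <article-id pub-id-type="publisher-id">07-0473</article-id>
  <article-id pub-id-type="doi">10.3201/eid1403.070473</article-id>
</article-meta></front></article>"""

root = ET.fromstring(SAMPLE)
order = {e: i for i, e in enumerate(root.iter())}   # document order
parent = {c: p for p in root.iter() for c in p}     # child -> parent links

def distinct_in_doc_order(nodes):
    """What XPath does with the node results of '/': drop duplicates, then sort."""
    return sorted(set(nodes), key=order.__getitem__)

# $path4 := $doc//aff[contains(., 'Japan')]/..   (the truncated form)
japan = [a for a in root.iter("aff") if "Japan" in "".join(a.itertext())]
path4 = distinct_in_doc_order(parent[a] for a in japan)

# count($path4/article-id), evaluated afterwards in the return clause
ids = distinct_in_doc_order(c for m in path4 for c in m if c.tag == "article-id")
print(len(japan), len(path4), len(ids))  # 3 1 4
```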
Best Lars
On 2014-09-26 20:35 GMT+02:00, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
- C. M. Sperberg-McQueen, Black Mesa Technologies LLC
- http://www.blackmesatech.com
- http://cmsmcq.com/mib
- http://balisage.net
Hello all,
there seem to be two unrelated bugs here. The good news is that one of them seems to be fixed already: the unexpected 12 4 12 4 4 result no longer appears with the latest 8.0 snapshot. Instead, regardless of how I set up $doc (either with doc() or directly via document{}), the result is 12 12 12 12 20. If you don't want to update to the latest snapshot, you could also use fetch:xml(), which does not seem to be affected by this bug either.
So at least we now have a consistent result, but I agree that it is in fact incorrect. Something seems to go wrong when a parent step is followed by a child step. I opened a bug report at https://github.com/BaseXdb/basex/issues/1001.
For the time being, rewriting the query seems to be the only option. I would suggest using
let $path1 := $doc/child::article/child::front
                /child::article-meta[child::aff[contains(., "Japan")]]
                /child::article-id
which does produce the correct result. In fact, I would even argue it is a bit more elegant: you don't have to repeat article-meta, and since the child::aff step acts as a filter, expressing it as a predicate is easier to read, at least to me, but this is of course also a matter of taste.
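The predicate form avoids the round trip through the parent axis altogether. A rough Python model of it (plain xml.etree.ElementTree, not XQuery; `mentions_japan` is a hypothetical helper) keeps each article-meta at most once and then takes its children, so no duplicates can arise even without a deduplication pass.

```python
import xml.etree.ElementTree as ET

# The sample document from the first message (elided comments left out).
SAMPLE = """<article><front><article-meta>
  <aff id="aff1">Tropical and Infectious Disease Hospital, Kathmandu, Nepal</aff>
  <aff id="aff2">Nagasaki University, Nagasaki, Japan</aff>
  <aff id="aff3">Department of Radiology, Kyorin University Faculty of Medicine, Tokyo, Japan</aff>
  <aff id="aff5">Pentax Company Limited, Tokyo, Japan</aff>
  <aff id="aff6">National Research Laboratory of Molecular Complex Control, Yonsei University, Seoul, Korea</aff>
  <article-id pub-id-type="pmc">2570825</article-id>
  <article-id pub-id-type="pmid">18325280</article-id>
  <article-id pub-id-type="publisher-id">07-0473</article-id>
  <article-id pub-id-type="doi">10.3201/eid1403.070473</article-id>
</article-meta></front></article>"""

root = ET.fromstring(SAMPLE)

def mentions_japan(aff):
    # contains(., "Japan") on the element's string value
    return "Japan" in "".join(aff.itertext())

# child::article-meta[child::aff[contains(., "Japan")]]: keep a node if any child matches
metas = [m for m in root.findall("front/article-meta")
         if any(c.tag == "aff" and mentions_japan(c) for c in m)]

# .../child::article-id: distinct parents have distinct children, so no duplicates
ids = [c for m in metas for c in m if c.tag == "article-id"]
print(len(metas), len(ids))  # 1 4
print([c.text for c in ids])  # ['2570825', '18325280', '07-0473', '10.3201/eid1403.070473']
```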
Best regards, Dirk
On 09/27/2014 12:57 PM, Lars Johnsen wrote:
Dear Michael, dear Lars,
in my absence, the problem was fixed by Leo. You are invited to check out the latest snapshot [1]!
Thanks for reporting the bug, Christian
[1] http://files.basex.org/releases/latest
On Mon, Sep 29, 2014 at 10:36 AM, Dirk Kirsten dk@basex.org wrote:
-- Dirk Kirsten, BaseX GmbH, http://basex.org
|-- Registered office: Blarerstrasse 56, 78462 Konstanz
|-- Register court: Freiburg, HRB: 708285; Managing directors:
|   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
`-- Phone: 0049 7531 28 28 676, Fax: 0049 7531 20 05 22
On Sep 29, 2014, at 2:36 AM, Dirk Kirsten wrote:
... For the time being, rewriting the query seems to be the only option. I would suggest using
let $path1 := $doc/child::article/child::front
                /child::article-meta[child::aff[contains(., "Japan")]]
                /child::article-id
which does produce the correct result. In fact, I would even argue it is a bit more elegant: you don't have to repeat article-meta, and since the child::aff step acts as a filter, expressing it as a predicate is easier to read, at least to me, but this is of course also a matter of taste.
You are quite right, and my taste agrees with yours.
The unexpected results turned up in the evaluation of a solution to a homework problem offered by a student who is just learning XPath; I'll be happy when the students have learned enough XPath that we are able to talk, in class, about rewriting queries for clarity or to work around unexpected results.
basex-talk@mailman.uni-konstanz.de