Bonjour Fabrice,
Thanks for the suggestion. I did try that (sending a query for each document), and it does work … sort of. Performance-wise it's really slow, even with the database fully optimized.
As for writing my process in XQuery, that's a good question. Honestly, I don't know; I am quite new to XQuery and lack the expertise.
I’ll try to give more detail about what I am trying to achieve.
In my database I have a series of XML documents which, greatly simplified, look like this:
<notif id="name1" ts="2016-01-01T08:01:05.000">
  <flag>0</flag>
</notif>
<notif id="name1" ts="2016-01-01T08:01:10.000">
  <flag>0</flag>
</notif>
<notif id="name1" ts="2016-01-01T08:01:15.000">
  <flag>0</flag>
</notif>
...
<notif id="name1" ts="2016-01-01T08:01:20.000">
  <flag>1</flag>
</notif>
<notif id="name1" ts="2016-01-01T08:01:25.000">
  <flag>0</flag>
</notif>
<notif id="name1" ts="2016-01-01T08:01:30.000">
  <flag>0</flag>
</notif>
<notif id="name1" ts="2016-01-01T08:01:35.000">
  <flag>0</flag>
</notif>
...
<notif id="name1" ts="2016-01-01T08:01:40.000">
  <flag>1</flag>
</notif>
What I need to get is:
- The first XML document (first as in smallest @ts value),
- then the next document with <flag>1</flag> (again, "next" in @ts order),
- then the next document with <flag>0</flag>,
- and so on…
In the example above, those would be the documents at 08:01:05, 08:01:20, 08:01:25 and 08:01:40 (highlighted in red in my original message).
Roughly only 1 out of 1000 documents has <flag>1</flag>.
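Put differently, the selection rule is: keep the first document, then alternately look for the next <flag>1</flag> and the next <flag>0</flag>, always moving forward in @ts order. Here is a minimal sketch of just that rule in plain Java over an in-memory list of flag values (assuming the documents are already sorted by @ts; no BaseX involved):

```java
import java.util.ArrayList;
import java.util.List;

public class AlternatingSelect {
    // Keeps the first element, then alternates between looking for
    // flag == 1 and flag == 0, always moving forward in @ts order.
    // Returns the indexes of the kept documents.
    static List<Integer> select(List<Integer> flags) {
        List<Integer> kept = new ArrayList<>();
        if (flags.isEmpty()) return kept;
        kept.add(0);                 // first document, smallest @ts
        int want = 1;                // next, we want a flag=1 document
        for (int i = 1; i < flags.size(); i++) {
            if (flags.get(i) == want) {
                kept.add(i);
                want = 1 - want;     // flip: 1 -> 0 -> 1 -> ...
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Flag sequence of the example above: 0 0 0 1 0 0 0 1
        List<Integer> kept = select(List.of(0, 0, 0, 1, 0, 0, 0, 1));
        System.out.println(kept);    // prints [0, 3, 4, 7]
    }
}
```

This is a single forward pass, which is why the "scan everything and keep what I need" approach can be fast — each document is looked at exactly once.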
I tried several approaches, but the fastest one I found is to iterate through all documents with a very simple XQuery and keep only the ones I need:
for $d in collection('1234567')/* where $d/@name = 'name1' return $d
Another approach was to first select all documents with <flag>1</flag>:
for $d in collection('1234567')/* where $d/@name = 'name1' and $d/flag = 1 return $d
then, for each of those, get the next document:
(for $d in collection('1234567')/* where $d/@name = 'name1' and $d/flag = 0 and $d/@ts > '[ts of previous document]' return $d)[1]
Or select the first document:
(for $d in collection('1234567')/* where $d/@name = 'name1' return $d)[1]
then query the next:
(for $d in collection('1234567')/* where $d/@name = 'name1' and $d/flag = 1 and $d/@ts > '[ts of previous document]' return $d)[1]
and the next:
(for $d in collection('1234567')/* where $d/@name = 'name1' and $d/flag = 0 and $d/@ts > '[ts of previous document]' return $d)[1]
And so on.
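For what it's worth, that query-per-step approach boils down to a small driver loop: remember the @ts of the last kept document, flip the flag value you are looking for, and build the next query from those two values. A sketch of just the query-building part in Java (the collection name and attribute names are copied from the examples above; actually sending the query through a Session is omitted):

```java
public class NextQueryBuilder {
    // Builds the "next document" query from the @ts of the last kept
    // document and the flag value we are now looking for, mirroring
    // the query examples above.
    static String nextQuery(String lastTs, int wantFlag) {
        return "(for $d in collection('1234567')/* "
             + "where $d/@name = 'name1' and $d/flag = " + wantFlag
             + " and $d/@ts > '" + lastTs + "' return $d)[1]";
    }

    public static void main(String[] args) {
        String lastTs = "2016-01-01T08:01:05.000"; // @ts of the first document
        int want = 1;                              // after the first doc, look for flag=1
        System.out.println(nextQuery(lastTs, want));
        // After each result: lastTs = @ts of the returned doc; want = 1 - want;
        // then build and send the next query, until no document comes back.
    }
}
```

The downside, as noted, is one full query round-trip per kept document, which is why this loses to the single-scan approach despite touching far fewer documents.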
But none of those is as fast as the first one, and then I hit this OutOfMemoryError issue.
So if there is a way to rewrite that whole process in XQuery, that could be an option worth trying. Or, if there is a more efficient way to write the query
(for $d in collection('1234567')/* where $d/@name = 'name1' and $d/flag = 0 and $d/@ts > '[ts of previous document]' return $d)[1]
that could also solve my problem.
Regards
Simon
On 22 September 2017 at 09:53, Fabrice ETANCHAUD <fetanchaud@pch.cerfrance.fr> wrote:
Bonjour Simon,
I would send a query for each document,
externalizing the loop in java.
A question: could your process be written in XQuery? That way you might not face the memory overflow.
Best regards,
Fabrice Etanchaud
CERFrance Poitou-Charentes
*From:* basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] *On behalf of* Simon Chatelain *Sent:* Friday, 22 September 2017 09:34 *To:* BaseX *Subject:* [basex-talk] OutOfMemoryError at Query#more()
Hello,
I am facing an issue while retrieving a large number of XML documents from a BaseX collection. Each document (as an XML file) is around 10 KB, and in the problematic case I must retrieve around 70,000 of them.
I am using Session#query(String query) then Query#more() and Query#next() to iterate through the result of my query.
try (final Query query = l_Session.query("query")) {
  while (query.more()) {
    String xml = query.next();
  }
}
If there are more than a certain number of XML documents in the result of my query, I get an OutOfMemoryError (full stack trace in the attached file) when executing query.more().
I did the test with BaseX 8.6.6 and 8.6.7, on Java 8, with VM argument -Xmx1024m.
Increasing the Xmx value is not a solution, as I don't know how much data I will have to retrieve in the future. What I need is a reliable way of executing such queries and iterating through the results without blowing up the heap.
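One generic pattern that keeps client memory bounded regardless of result size (a sketch of the idea only, not a BaseX-specific feature) is to page through the results in fixed-size batches keyed on @ts: issue one query per batch, remember only the last timestamp seen, and ask for the next batch strictly after it. The batching logic, illustrated in plain Java over an in-memory sorted list standing in for the query results:

```java
import java.util.ArrayList;
import java.util.List;

public class TsPagination {
    // Returns at most 'limit' values strictly greater than 'afterTs' —
    // the client-side equivalent of a query restricted by
    // $d/@ts > $afterTs, truncated to the first 'limit' hits.
    static List<String> nextBatch(List<String> sortedTs, String afterTs, int limit) {
        List<String> batch = new ArrayList<>();
        for (String ts : sortedTs) {
            if (ts.compareTo(afterTs) > 0) {
                batch.add(ts);
                if (batch.size() == limit) break;
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        List<String> all = List.of("t1", "t2", "t3", "t4", "t5");
        String last = "";                        // sorts before any real timestamp
        int total = 0;
        while (true) {                           // one query round-trip per batch
            List<String> batch = nextBatch(all, last, 2);
            if (batch.isEmpty()) break;
            total += batch.size();               // process the batch, then drop it
            last = batch.get(batch.size() - 1);  // remember only the last ts seen
        }
        System.out.println(total);               // prints 5
    }
}
```

Memory use is then proportional to the batch size, not to the total result size, at the cost of one round-trip per batch.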
I also tried using QueryProcessor and QueryProcessor#iter() instead of Session#query(String query). But is it safe to use, given that my application is multithreaded and each thread has its own session to query or add elements from/to multiple collections?
Moreover, for now all accesses to BaseX go through a session, so my application can run with an embedded BaseX or with a BaseX server. If I start using QueryProcessor, then it will be embedded BaseX only, right?
I also attached a simple example showing the problem.
Any advice would be much appreciated.
Thanks,
Simon