Hi Mansi,
> Once the above query works and doesn't go Out Of Memory, I also need the corresponding file names:
Sorry, I skipped this one. Here is one way to do it:
declare option output:item-separator "&#10;";
for $db in db:open('....')
let $path := db:path($db)
for $name in $db//E/@name
return $path || out:tab() || $name
I was surprised to hear that you are getting OOM errors on the command line, because the query you mentioned should then be evaluated in a streaming fashion (i.e., it should require very low and constant memory).
Could you try the above query? If it fails, could you possibly send me the query plan? On the command line, it can be retrieved via the -x flag.
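For example (a sketch, assuming the query has been saved to a file, here called query.xq for illustration):

basex -x query.xq

This runs the query and additionally prints the compiled query plan.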
I just remembered that you have been using xquery:eval, right? My guess is that the error occurs in combination with this function, because it may require all results to be cached before they are sent back to the client. Do you think you could alternatively put your queries into files, or do you need more flexibility?
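To illustrate why (a minimal sketch; 'your_db' is a placeholder database name, not from your setup):

(: with xquery:eval, the complete result sequence may be cached
   in memory before it is handed back to the caller: :)
xquery:eval("db:open('your_db')//E/@name/string()")

Running the same expression directly from a query file avoids that extra caching step.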
Christian
On Thu, Nov 6, 2014 at 8:58 PM, Mansi Sheth mansi.sheth@gmail.com wrote:
Briefly explaining: I am trying to extract these values per XML file (where the .xml file serves as the ID), to map each file to its corresponding values.
Imagine you have hundreds of customers, and each customer uses/needs thousands of different "@name" values. These "@name" values would be similar across customers, but some customers would use certain values and other customers different ones. I am trying to collect all this information and find which "@name" is used by the most customers, and so on and so forth. There are a few such use cases, this one being the most generic.
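For instance, counting how many files use each "@name" could be sketched in XQuery like this (assuming the collection name from Fabrice's example below, and that each file stands for one customer):

let $uses :=
  for $doc in db:open('your_collection_name')
  for $n in distinct-values($doc//E/@name)
  return <use name="{ $n }"/>
for $use in $uses
group by $name := $use/@name/string()
order by count($use) descending
return $name || out:tab() || count($use)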
On Thu, Nov 6, 2014 at 11:23 AM, Fabrice Etanchaud fetanchaud@questel.com wrote:
The solution depends on how you will use your extraction.
May I ask what your extraction is for?
Best regards,
Fabrice
From: Mansi Sheth [mailto:mansi.sheth@gmail.com]
Sent: Thursday, November 6, 2014 17:11
To: Fabrice Etanchaud
Cc: Christian Grün; BaseX
Subject: Re: [basex-talk] Out Of Memory
Interesting idea. I thought of using db partitioning but didn't pursue it further, mainly because of the thought process below.
Currently, I am ingesting ~3000 XML files, storing ~50 XML files per DB, and this will grow quickly. So the approach below would lead to ~3000 more files (and counting), increasing I/O operations considerably for further pre-processing.
However, I don't really care if the process takes a few minutes or even a few hours (as long as it's not day(s) ;)). Given the situation and my options, I will surely try this.
The database is currently indexed at the attribute level, as that's what I will be querying the most. Do you think I should do anything differently?
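For single-value lookups, the attribute index should keep queries fast; a sketch (with 'your_db' and 'some-value' as hypothetical placeholders):

(: index-backed lookup of all E elements whose @name equals 'some-value',
   instead of scanning every document: :)
db:attribute('your_db', 'some-value', 'name')/parent::E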
Thanks,
- Mansi
On Thu, Nov 6, 2014 at 10:48 AM, Fabrice Etanchaud fetanchaud@questel.com wrote:
Hi Mansi,
Here you have a natural partition of your data: the files you ingested.
So my first suggestion would be to query your data on a file basis:
for $doc in db:open('your_collection_name')
let $file-name := db:path($doc)
return
  file:write(
    $file-name,
    <names> {
      for $name in $doc//E/@name/data()
      return <name>{ $name }</name>
    } </names>
  )
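With this approach, each ingested document yields one small <names> file, so memory use stays bounded by the size of a single document, and the extracted lists can be post-processed independently.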
Is it for indexing?
Hope it helps,
Best regards,
Fabrice Etanchaud
Questel/Orbit
From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On Behalf Of Mansi Sheth
Sent: Thursday, November 6, 2014 16:33
To: Christian Grün
Cc: BaseX
Subject: Re: [basex-talk] Out Of Memory
This would need a lot of details, so bear with me below:
Briefly my XML files look like:
<A name="">
  <B name="">
    <E name=""/>
  </B>
  <C name="">
    <E name=""/>
  </C>
  <D name="">
    <E name=""/>
  </D>
</A>
<A> can contain <B>, <C> or <D>, and B, C or D can contain E. We have thousands of such XML files (currently 3000 in my test data set), averaging 50 MB in size. It's tons of data! Currently, my database is ~18 GB in size.
Query: /A/*//E/@name/string()
This query was going OOM within a few minutes.
I tried a few ways of whitelisting with a contains clause to truncate the result set. That didn't help either, so now I am out of ideas. This is with the JVM given 10 GB of dedicated memory.
Once the above query works and doesn't go Out Of Memory, I also need the corresponding file names:
XYZ.xml //E/@name
PQR.xml //E/@name
Let me know if you need more details to appreciate the issue.
- Mansi
On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün christian.gruen@gmail.com wrote:
Hi Mansi,
I think we need more information on the queries that are causing the problems.
Best, Christian
On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sheth@gmail.com wrote:
Hello,
I have a use case where I have to extract lots of information from each XML file in each DB: something like the attribute values of most of the nodes in an XML file. Such queries go Out Of Memory with the exception below. I am giving it ~12 GB of RAM on an i7 processor. Well, I can't complain here, since I am most definitely asking for loads of data, but is there any way I can retrieve this kind of data successfully?
mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp
BaseX 8.0 beta b45c1e2 [Server]
Server was started (port: 1984)
HTTP Server was started (port: 8984)
Exception in thread "qtp2068921630-18" java.lang.OutOfMemoryError: Java heap space
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
        at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
        at java.lang.Thread.run(Thread.java:744)
--
- Mansi