For my uses, "string()" seems to be extremely slow when processing big data; you should try without it.
Best regards
Florent
On Tue, Dec 30, 2014 at 2:38 PM, Mansi Sheth mansi.sheth@gmail.com wrote:
Hello,
Wanted to get back to this email chain and share my experience.
I got this running beautifully (including all post-processing of results), using the command below:
curl -ig 'http://localhost:8984/rest?run=get_query.xq&n=/Archives/*/descendant::D/...)' | cut -d: -f1 | cut -d. -f1-3 | sort | uniq -c | sort -n -r
I am using a BaseX 8.0 beta 763cc93 build, running on an i7 2.7 GHz MBP and giving 8 GB to the basexhttp process. It took around 34 min on 41 GB of data. I think a lot of the time went into post-processing (sorting) the result set, rather than actually extracting the results from the BaseX DB.
When I tried a similar query on a much smaller database (3 GB) on a much more powerful Amazon instance, giving 20 GB of RAM to the basexhttp process, I got results, including post-processing, within 4 minutes.
Thanks for all your inputs guys,
Keep BaseXing... !!!
- Mansi
On Fri, Nov 7, 2014 at 12:25 PM, Mansi Sheth mansi.sheth@gmail.com wrote:
This email chain is extremely helpful. Thanks a ton, guys. Certainly some of the most helpful folks here :)
I have to try a lot of these suggestions but currently I am being pulled into something else, so I have to pause for the time being.
Will get back to this email thread after trying a few things, and will share my relevant observations.
- Mansi
On Fri, Nov 7, 2014 at 3:48 AM, Fabrice Etanchaud <fetanchaud@questel.com> wrote:
Hi Mansi,
From what I can see, for each pqr value you could use db:attribute-range to retrieve all the file names, then group by/count to obtain statistics.
You could also create a new collection from an extraction of only the data you need, changing @name into an element, and use full-text fuzzy matching.
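A minimal sketch of the db:attribute-range suggestion above, assuming a database named 'mydb' and the prefix 'pqr' (both placeholders). db:attribute-range scans the attribute index for values in a range, so the upper bound 'pqs' is used here as a rough "everything starting with pqr" cutoff:

```xquery
(: sketch: 'mydb' and 'pqr' are assumed names; db:path returns the
   file path of the document each matching attribute belongs to :)
for $a in db:attribute-range('mydb', 'pqr', 'pqs', 'name')
let $file := db:path($a)
group by $file
return $file || ',' || count($a)
```

This keeps the counting inside BaseX, so only one line per file crosses the wire instead of every attribute value.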
Hoping it helps
Best regards
Fabrice
*From:* basex-talk-bounces@mailman.uni-konstanz.de [mailto: basex-talk-bounces@mailman.uni-konstanz.de] *On behalf of* Mansi Sheth *Sent:* Thursday, November 6, 2014 20:55 *To:* Christian Grün *Cc:* BaseX *Subject:* Re: [basex-talk] Out Of Memory
I would be doing tons of post-processing. I never use the UI; I either use REST through cURL or the command line.
I would basically need data in below format:
XML File Name, @name
I am trying to whitelist, picking up values only for starts-with(@name, "pqr"), where "pqr" is a list of 150-odd values.
My file names are essentially some IDs/keys, which I would need to map further to some values using SQLite, and maybe group by them, etc.
So basically, I am trying to visualize some data based on which XML files it exists in. So yes, count(<query>) would be fine, but it won't serve much purpose, since I still need the value "pqr".
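One way the whitelist described above could be expressed in XQuery, as a hedged sketch: 'mydb' and the three prefixes are placeholders for the real database name and the ~150 values. Filtering inside the query means only matching values are serialized:

```xquery
(: sketch: keep only @name values that start with a whitelisted prefix :)
let $prefixes := ('pqr', 'abc', 'xyz')
for $name in db:open('mydb')/A/*//E/@name
where some $p in $prefixes satisfies starts-with($name, $p)
return string($name)
```

Note that a `some ... satisfies` scan over 150 prefixes runs per attribute; if that is too slow, the index-based db:attribute-range approach may be preferable.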
- Mansi
On Thu, Nov 6, 2014 at 11:19 AM, Christian Grün <christian.gruen@gmail.com> wrote:
Query: /A/*//E/@name/string()
In the GUI, all results will be cached, so you could think about switching to command line.
Do you really need to output all results, or do you do some further processing with the intermediate results?
For example, the query "count(/A/*//E/@name/string())" will probably run without getting stuck.
This query was going OOM within a few minutes.
I tried a few ways of whitelisting with a contains clause to truncate the result set. That didn't help either, so now I am out of ideas. This is giving the JVM 10 GB of dedicated memory.
Once the above query works and doesn't go Out Of Memory, I will also need the corresponding file names:
XYZ.xml, //E/@name
PQR.xml, //E/@name
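The "file name plus value" pairing requested above could look like the following sketch, assuming a database named 'mydb' (a placeholder) and using BaseX's db:path to recover the source document of each node:

```xquery
(: sketch: emit one "file name, value" line per attribute;
   db:path gives the path of the document a node belongs to :)
for $name in db:open('mydb')/A/*//E/@name
return db:path($name) || ',' || string($name)
```

Run from the command line rather than the GUI, this output can be streamed straight into the sort/uniq post-processing pipeline instead of being cached in memory.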
Let me know if you need more details to understand the issue.
- Mansi
On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün <christian.gruen@gmail.com> wrote:
Hi Mansi,
I think we need more information on the queries that are causing the problems.
Best, Christian
On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth <mansi.sheth@gmail.com> wrote:
Hello,
I have a use case where I have to extract lots of information from each XML in each DB, something like the attribute values of most of the nodes in an XML. Such queries go Out Of Memory with the below exception. I am giving it ~12 GB of RAM on an i7 processor. Well, I can't complain here, since I am most definitely asking for loads of data, but is there any way I can get these kinds of data successfully?
mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp
BaseX 8.0 beta b45c1e2 [Server]
Server was started (port: 1984)
HTTP Server was started (port: 8984)
Exception in thread "qtp2068921630-18" java.lang.OutOfMemoryError: Java heap space
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
        at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
        at java.lang.Thread.run(Thread.java:744)
--
- Mansi