For my uses, "string()" seems to be extremely slow when processing big data; you should try without it.
Best regards
Florent
On Tue, Dec 30, 2014 at 2:38 PM, Mansi Sheth mansi.sheth@gmail.com wrote:
Hello,
Wanted to get back to this email chain and share my experience.
I got this running beautifully (including all post-processing of results), using the command below:
curl -ig 'http://localhost:8984/rest?run=get_query.xq&n=/Archives/*/descendant::D/...)' | cut -d: -f1 | cut -d. -f1-3 | sort | uniq -c | sort -n -r
I am using a BaseX 8.0 beta 763cc93 build, running on an i7 2.7 GHz MBP and giving 8 GB to the basexhttp process. It took around 34 min on 41 GB of data. I think a lot of the time went into post-processing (sorting) the result set, rather than actually extracting the results from the BaseX DB.
When I tried a similar query on a much smaller database (3 GB) on a much more powerful Amazon instance, giving 20 GB of RAM to the basexhttp process, I got results, including post-processing, within 4 minutes.
Thanks for all your inputs guys,
Keep BaseXing... !!!
- Mansi
On Fri, Nov 7, 2014 at 12:25 PM, Mansi Sheth mansi.sheth@gmail.com wrote:
This email chain is extremely helpful. Thanks a ton, guys. Certainly some of the most helpful folks here :)
I have to try a lot of these suggestions but currently I am being pulled into something else, so I have to pause for the time being.
Will get back to this email thread after trying a few things, and will share my relevant observations.
- Mansi
On Fri, Nov 7, 2014 at 3:48 AM, Fabrice Etanchaud <fetanchaud@questel.com> wrote:
Hi Mansi,
From what I can see, for each pqr value you could use db:attribute-range to retrieve all the file names, then group by/count to obtain statistics.
You could also create a new collection from an extraction of only the data you need, changing @name into an element, and use full-text fuzzy matching.
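A minimal sketch of the db:attribute-range suggestion above, assuming a database named 'mydb' and the prefix 'pqr' (both placeholders). db:attribute-range scans the attribute index for values in a range, so the upper bound 'pqs' is used here as a rough "everything starting with pqr" cutoff:

```xquery
(: sketch: 'mydb' and 'pqr' are assumed names; db:path returns the
   file path of the document each matching attribute belongs to :)
for $a in db:attribute-range('mydb', 'pqr', 'pqs', 'name')
let $file := db:path($a)
group by $file
return $file || ',' || count($a)
```

This keeps the counting inside BaseX, so only one line per file crosses the wire instead of every attribute value.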
Hoping it helps
Best regards
Fabrice
*From:* basex-talk-bounces@mailman.uni-konstanz.de [mailto: basex-talk-bounces@mailman.uni-konstanz.de] *On behalf of* Mansi Sheth *Sent:* Thursday, November 6, 2014 20:55 *To:* Christian Grün *Cc:* BaseX *Subject:* Re: [basex-talk] Out Of Memory
I would be doing tons of post-processing. I never use the UI; I either use REST through cURL or the command line.
I would basically need data in below format:
XML File Name, @name
I am trying to whitelist, picking up values only for starts-with(@name, "pqr"), where "pqr" is a list of 150-odd values.
My file names are essentially some IDs/keys, which I would need to map further to some values using SQLite, and maybe group by them, etc.
So basically, I am trying to visualize some data based on which XML files it exists in. So yes, count(<query>) would be fine, but it won't serve much purpose, since I still need the value "pqr".
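One way the whitelist described above could be expressed in XQuery, as a hedged sketch: 'mydb' and the three prefixes are placeholders for the real database name and the ~150 values. Filtering inside the query means only matching values are serialized:

```xquery
(: sketch: keep only @name values that start with a whitelisted prefix :)
let $prefixes := ('pqr', 'abc', 'xyz')
for $name in db:open('mydb')/A/*//E/@name
where some $p in $prefixes satisfies starts-with($name, $p)
return string($name)
```

Note that a `some ... satisfies` scan over 150 prefixes runs per attribute; if that is too slow, the index-based db:attribute-range approach may be preferable.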
- Mansi
On Thu, Nov 6, 2014 at 11:19 AM, Christian Grün <christian.gruen@gmail.com> wrote:
Query: /A/*//E/@name/string()
In the GUI, all results will be cached, so you could think about switching to command line.
Do you really need to output all results, or do you do some further processing with the intermediate results?
For example, the query "count(/A/*//E/@name/string())" will probably run without getting stuck.
This query was going OOM within a few minutes.
I tried a few ways of whitelisting with a contains clause to truncate the result set. That didn't help either, so now I am out of ideas. This is giving the JVM 10 GB of dedicated memory.
Once the above query works and doesn't go Out Of Memory, I will also need the corresponding file names:
XYZ.xml, //E/@name
PQR.xml, //E/@name
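The "file name plus value" pairing requested above could look like the following sketch, assuming a database named 'mydb' (a placeholder) and using BaseX's db:path to recover the source document of each node:

```xquery
(: sketch: emit one "file name, value" line per attribute;
   db:path gives the path of the document a node belongs to :)
for $name in db:open('mydb')/A/*//E/@name
return db:path($name) || ',' || string($name)
```

Run from the command line rather than the GUI, this output can be streamed straight into the sort/uniq post-processing pipeline instead of being cached in memory.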
Let me know if you need more details to understand the issue.
- Mansi
On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün <christian.gruen@gmail.com> wrote:
Hi Mansi,
I think we need more information on the queries that are causing the problems.
Best, Christian
On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth <mansi.sheth@gmail.com> wrote:
Hello,
I have a use case where I have to extract lots of information from each XML in each DB, something like the attribute values of most of the nodes in an XML. Such queries go Out Of Memory with the below exception. I am giving it ~12 GB of RAM on an i7 processor. Well, I can't complain here, since I am most definitely asking for loads of data, but is there any way I can get these kinds of data successfully?
mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp
BaseX 8.0 beta b45c1e2 [Server]
Server was started (port: 1984)
HTTP Server was started (port: 8984)
Exception in thread "qtp2068921630-18" java.lang.OutOfMemoryError: Java heap space
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
        at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
        at java.lang.Thread.run(Thread.java:744)
--
- Mansi