Hi Christian,
Thanks for answering.
In my case, no. All the documents are different: similar, but different. Referring to my example, the list of values inside the notifications is not always the same, the values themselves are not always the same, and the @ts attribute values in particular are almost all different. Only the @nid attribute is limited to about 9 different values.
Note: I forgot to mention this in my first message: I am using the latest official release of BaseX (v8.0.2).
Regards
Simon
On Fri, Mar 13, 2015 at 2:09 PM, Christian Grün christian.gruen@gmail.com wrote:
Hi Simon,
Thanks for your example code. I did some performance tests with the example you provided, and I could observe a similar pattern when adding the same document again and again. It didn't occur when adding different documents, so I guess the culprit here is the internal index ID lists, which get pretty huge if the documents always contain the same contents.
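To make the guess above concrete, here is a toy model in Java of a value index that maps each indexed value to the list of node IDs carrying it. It is only an illustration: the class `ToyValueIndex` and its methods are hypothetical, and it does not reproduce BaseX's actual index structures. It only shows that repeated identical values make a single ID list grow with the number of documents, whereas distinct values keep every list short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy value index (hypothetical, NOT BaseX's implementation): value -> IDs of nodes with that value. */
final class ToyValueIndex {
  private final Map<String, List<Integer>> idsByValue = new HashMap<>();
  private int nextId;

  /** Registers one more node carrying the given value. */
  void add(String value) {
    idsByValue.computeIfAbsent(value, k -> new ArrayList<>()).add(nextId++);
  }

  /** Length of the longest ID list, i.e. the largest number of nodes sharing one value. */
  int longestIdList() {
    return idsByValue.values().stream().mapToInt(List::size).max().orElse(0);
  }

  /** Number of distinct values seen so far. */
  int distinctValues() {
    return idsByValue.size();
  }

  public static void main(String[] args) {
    ToyValueIndex same = new ToyValueIndex();
    ToyValueIndex different = new ToyValueIndex();
    for (int i = 0; i < 100_000; i++) {
      same.add("value1");          // the same document added again and again
      different.add("value" + i);  // a different value in every document
    }
    System.out.println("same contents:      longest ID list = " + same.longestIdList());
    System.out.println("different contents: longest ID list = " + different.longestIdList());
  }
}
```

In this model, any per-insert work that depends on the length of an ID list would get slower as identical documents pile up. Whether that is what happens inside BaseX is exactly the open question here.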
I would be interested to hear if you have done your experiment with the same document or different instances?
Best, Christian
On Fri, Mar 13, 2015 at 10:54 AM, Simon Chatelain schatela@gmail.com wrote:
Hello,
First, let me give you the context: I have a never-ending stream of XML elements coming in that I want to store and then make available through a REST interface. BaseX thus seems to be a well-suited candidate. To be on the safe side, I must be able to sustain an insertion rate of about 200 elements per second.
The XML elements I have to store are of the type:
<notification ts="2015-03-13T10.44.25.123" nid="type-of-data">
  <name-1>value1</name-1>
  <name-2>value2</name-2>
  <name-3>value3</name-3>
  <name-4>value4</name-4>
  ….
</notification>
So quite simple and small. I will mainly retrieve data by selecting notifications of a specific @nid between two @ts values, thus I need an attribute index.
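The retrieval described above could be sketched as follows. This is a minimal sketch, not a tested BaseX setup: the class `NotificationQuery`, its methods and the database name `mydb` are hypothetical. It only builds the XQuery string, assuming each notification is stored as its own document so that `notification` is the root element. Because the @ts values have a fixed-width format, a plain string comparison orders them chronologically.

```java
/** Builds an XQuery string selecting notifications by @nid and @ts range (hypothetical helper). */
final class NotificationQuery {

  /** Quotes a value as an XQuery string literal, escaping '&' and the quote character. */
  static String quote(String value) {
    return "\"" + value.replace("&", "&amp;").replace("\"", "\"\"") + "\"";
  }

  /** Query for all notifications of one @nid whose @ts lies in [from, to] (string comparison). */
  static String between(String db, String nid, String from, String to) {
    return "collection(" + quote(db) + ")/notification"
        + "[@nid = " + quote(nid) + "]"
        + "[@ts >= " + quote(from) + " and @ts <= " + quote(to) + "]";
  }

  public static void main(String[] args) {
    // "mydb" is a placeholder database name.
    System.out.println(between("mydb", "type-of-data",
        "2015-03-13T10.00.00.000", "2015-03-13T11.00.00.000"));
  }
}
```

The resulting string could be executed with the XQuery command, in the same way the other commands in this message are executed against the context. Whether the optimizer actually rewrites these predicates into attribute index lookups would need to be checked in the query info.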
For now, I am using an embedded BaseX DB to test the insertion of elements.
Here is how I configure my DB:
Context m_Context = new Context();
new Set(MainOptions.AUTOFLUSH, false).execute(m_Context);
new Set(MainOptions.ADDCACHE, false).execute(m_Context);
new Set(MainOptions.INTPARSE, true).execute(m_Context);
new Set(MainOptions.STRIPNS, true).execute(m_Context);
new Set(MainOptions.UPDINDEX, true).execute(m_Context);
new Set(MainOptions.TEXTINDEX, false).execute(m_Context);
new Set(MainOptions.ATTRINDEX, true).execute(m_Context);
new CreateDB(_SourceId).execute(m_Context);
And this is how I insert the elements:
try {
  String l_XmlRepresentation = _Notification.getXmlRepresentation();
  if (l_XmlRepresentation.isEmpty()) {
    return;
  }
  ByteArrayInputStream l_InputStream =
      new ByteArrayInputStream(l_XmlRepresentation.getBytes(m_Charset));
  Add add = new Add(_Notification.getSourceId());
  add.setInput(l_InputStream);
  add.execute(m_Context);
  if (_CurrentNotification % 10000 == 0) {
    // flush every 10000 notifications
    new Flush().execute(m_Context);
  }
} catch (BaseXException ex) {
  s_Logger.log(Level.SEVERE, null, ex);
}
The performance I get is as follows:
Size  10'000, Speed: 1'292
Size  20'000, Speed:   625
Size  30'000, Speed:   361
Size  40'000, Speed:   248
Size  50'000, Speed:   184
Size  60'000, Speed:   148
Size  70'000, Speed:   123
Size  80'000, Speed:   104
Size  90'000, Speed:    91
Size 100'000, Speed:    77
Size 110'000, Speed:    69
Size 120'000, Speed:    61
Size 130'000, Speed:    56
Size 140'000, Speed:    46
Where “Size” is the number of elements in the collection and “Speed” is the average insertion speed [in elements per second] over the last 10000 elements.
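For reference, a per-window average of this kind can be computed roughly as below. This is a minimal sketch, not the measurement code actually used for the figures above: the class `ThroughputMeter` and its methods are hypothetical, and timestamps are passed in explicitly so that the arithmetic is easy to check.

```java
/** Reports the average insertion rate over each completed window of N elements (hypothetical helper). */
final class ThroughputMeter {
  private final int windowSize;
  private long count;
  private long windowStartNanos;

  ThroughputMeter(int windowSize, long startNanos) {
    this.windowSize = windowSize;
    this.windowStartNanos = startNanos;
  }

  /**
   * Call once per inserted element with the current time in nanoseconds.
   * Returns elements per second for the window that just completed, or -1 if the window is still open.
   */
  double record(long nowNanos) {
    count++;
    if (count % windowSize != 0) {
      return -1;
    }
    long elapsed = nowNanos - windowStartNanos;
    windowStartNanos = nowNanos;
    return elapsed > 0 ? windowSize * 1e9 / elapsed : Double.POSITIVE_INFINITY;
  }

  /** Total number of elements recorded so far. */
  long size() {
    return count;
  }

  public static void main(String[] args) {
    ThroughputMeter meter = new ThroughputMeter(10_000, System.nanoTime());
    for (int i = 0; i < 30_000; i++) {
      // The real insertion (e.g. the Add command) would go here.
      double rate = meter.record(System.nanoTime());
      if (rate >= 0) {
        System.out.printf("Size %d, Speed: %.0f%n", meter.size(), rate);
      }
    }
  }
}
```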
My question is: does this performance seem normal, or am I doing something wrong? Note that with UPDINDEX = false, I get a steady insertion rate of 10000 elements per second.
Thanks a lot
Simon