Hi Simon,
Thanks for your example code. I ran some performance tests with it and observed a similar pattern when adding the same document again and again. It didn't occur when adding different documents, so I guess the culprits here are the internal index ID lists, which get pretty huge if the documents always have the same contents.
I would be interested to hear whether you ran your experiment with the same document or with different instances.
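In case it helps to rule this out, here is a minimal sketch of how distinct test documents could be generated, so that no two added notifications share the same attribute and element values. The class name NotificationSamples, the method build, and the way the values are varied are assumptions made up for illustration; only the overall element shape is taken from your example.

```java
import java.util.Locale;

/** Hypothetical helper: builds test notifications whose contents differ per call. */
public final class NotificationSamples {
    private NotificationSamples() {}

    /** Returns a notification whose @ts, @nid and element values depend on i. */
    public static String build(int i) {
        // Treat i as a millisecond offset; the pattern mimics the ts format
        // used in this thread (dots as separators). Unique for i < 3,600,000.
        String ts = String.format(Locale.ROOT, "2015-03-13T10.%02d.%02d.%03d",
                (i / 60_000) % 60, (i / 1_000) % 60, i % 1_000);
        // Assumption: ten different nid values are enough for a test run.
        return "<notification ts=\"" + ts + "\" nid=\"type-" + (i % 10) + "\">"
                + "<name-1>value" + i + "</name-1>"
                + "<name-2>value" + (i + 1) + "</name-2>"
                + "</notification>";
    }
}
```

Feeding build(i) with an increasing counter into the insertion loop, instead of one fixed string, should show whether the slowdown depends on repeated contents.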
Best, Christian
On Fri, Mar 13, 2015 at 10:54 AM, Simon Chatelain schatela@gmail.com wrote:
Hello,
First let me give you the context: I have a never-ending stream of XML elements coming in that I want to store and then make available through a REST interface. Thus BaseX seems to be a well-suited candidate. To be on the safe side, I must be able to sustain an insertion rate of about 200 elements per second.
The XML elements I have to store are of the type:
<notification ts="2015-03-13T10.44.25.123" nid="type-of-data">
  <name-1>value1</name-1>
  <name-2>value2</name-2>
  <name-3>value3</name-3>
  <name-4>value4</name-4>
  ….
</notification>
So quite simple and small. I will mainly retrieve data by selecting notifications of a specific @nid between two @ts values, thus I need an attribute index.
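To make the intended retrieval concrete, here is a minimal sketch of how such a query string could be built. The class NotificationQuery and its method names are hypothetical. Since the @ts values in the example use dots rather than colons, they are compared as plain strings here, which only orders them chronologically as long as every value has the same fixed-width, zero-padded layout. Whether the attribute index is actually used for this expression is left to the query optimizer.

```java
/** Hypothetical helper: builds the XQuery for one @nid within a @ts range. */
public final class NotificationQuery {
    private NotificationQuery() {}

    /** Quotes a value as an XQuery string literal (escapes '&' and '"'). */
    private static String literal(String s) {
        return "\"" + s.replace("&", "&amp;").replace("\"", "\"\"") + "\"";
    }

    /** Selects notifications with the given nid and fromTs <= @ts <= toTs. */
    public static String between(String nid, String fromTs, String toTs) {
        return "//notification[@nid = " + literal(nid) + "]"
                + "[@ts >= " + literal(fromTs)
                + " and @ts <= " + literal(toTs) + "]";
    }
}
```

With the embedded setup, the resulting string could presumably be run with something like new XQuery(query).execute(m_Context).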
For now I am using an embedded BaseX DB to test the insertion of elements.
Here is how I configure my DB:
Context m_Context = new Context();
new Set(MainOptions.AUTOFLUSH, false).execute(m_Context);
new Set(MainOptions.ADDCACHE, false).execute(m_Context);
new Set(MainOptions.INTPARSE, true).execute(m_Context);
new Set(MainOptions.STRIPNS, true).execute(m_Context);
new Set(MainOptions.UPDINDEX, true).execute(m_Context);
new Set(MainOptions.TEXTINDEX, false).execute(m_Context);
new Set(MainOptions.ATTRINDEX, true).execute(m_Context);
new CreateDB(_SourceId).execute(m_Context);
And this is how I insert the elements:
try {
    String l_XmlRepresentation = _Notification.getXmlRepresentation();
    if (l_XmlRepresentation.isEmpty()) {
        return;
    }
    ByteArrayInputStream l_InputStream =
        new ByteArrayInputStream(l_XmlRepresentation.getBytes(m_Charset));
    Add add = new Add(_Notification.getSourceId());
    add.setInput(l_InputStream);
    add.execute(m_Context);
    if (_CurrentNotification % 10000 == 0) {
        // flush every 10000 notifications
        new Flush().execute(m_Context);
    }
} catch (BaseXException ex) {
    s_Logger.log(Level.SEVERE, null, ex);
}
The performance I get is as follows:
Size       Speed
10'000     1'292
20'000       625
30'000       361
40'000       248
50'000       184
60'000       148
70'000       123
80'000       104
90'000        91
100'000       77
110'000       69
120'000       61
130'000       56
140'000       46
Where “Size” is the number of elements in the collection and “Speed” is the average insertion speed [in elements per second] over the last 10000 elements.
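For clarity, this is roughly how such a per-window figure can be computed. It is a minimal sketch and not my actual measurement code: the class InsertRateMeter and its method record are made-up names, and the timestamps are passed in by the caller (for example from System.nanoTime()).

```java
/** Hypothetical helper: average insertion rate over fixed-size windows. */
public final class InsertRateMeter {
    private final int window;       // number of insertions per window
    private long count;             // insertions recorded so far
    private long windowStartNanos;  // timestamp at which the window began

    public InsertRateMeter(int window, long startNanos) {
        this.window = window;
        this.windowStartNanos = startNanos;
    }

    /**
     * Records one insertion. Returns the average rate in elements per second
     * when a window has just been completed, and -1 otherwise.
     */
    public double record(long nowNanos) {
        count++;
        if (count % window != 0) {
            return -1;
        }
        double seconds = (nowNanos - windowStartNanos) / 1e9;
        windowStartNanos = nowNanos;  // the next window starts now
        return window / seconds;
    }
}
```

With a window of 10000, record would be called once after every add.execute(...), and each non-negative return value corresponds to one row of the table.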
My question is: does this performance seem normal, or am I doing something wrong? With UPDINDEX = false, I get a steady insertion rate of 10000 elements per second.
Thanks a lot
Simon