On Fri, Mar 13, 2015 at 2:09 PM, Christian Grün <christian.gruen@gmail.com> wrote:

Hi Simon,

Thanks for your example code. I ran some performance tests with it
and could observe a similar pattern when adding the same document
again and again. It didn't occur when adding different documents, so
I guess the culprits here are the internal index ID lists, which get
pretty huge if the documents always contain the same contents.

I would be interested to hear whether you ran your experiment with
the same document or with different instances.
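
To tell the two cases apart, a test can feed in documents whose
contents differ on every insert. The following is only a minimal
sketch in plain Java: the class name, the counter-based timestamp and
the "valueN-M" texts are assumptions made for illustration, not part
of your code or of BaseX.

```java
import java.util.Locale;

public class NotificationSamples {
    // Builds one notification whose timestamp and values depend on seq,
    // so that no two generated documents have the same contents.
    public static String build(long seq, int fields) {
        // Hypothetical timestamp: a fixed date plus seq counted in milliseconds.
        long ms = seq % 1000;
        long s = (seq / 1000) % 60;
        long m = (seq / 60000) % 60;
        long h = (seq / 3600000) % 24;
        StringBuilder sb = new StringBuilder();
        sb.append(String.format(Locale.ROOT,
            "<notification ts=\"2015-03-13T%02d.%02d.%02d.%03d\" nid=\"type-of-data\">",
            h, m, s, ms));
        for (int i = 1; i <= fields; i++) {
            sb.append("<name-").append(i).append('>')
              .append("value").append(seq).append('-').append(i)
              .append("</name-").append(i).append('>');
        }
        return sb.append("</notification>").toString();
    }

    public static void main(String[] args) {
        System.out.println(build(0, 2));
        System.out.println(build(1, 2));
    }
}
```

Each generated string can then be passed to the insertion code in
place of a fixed document.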

Best,
Christian

On Fri, Mar 13, 2015 at 10:54 AM, Simon Chatelain <schatela@gmail.com> wrote:
> Hello,
>
> First let me give you the context: I have a never-ending stream of XML
> elements coming in that I want to store and then make available through a
> REST interface.
> BaseX therefore seems to be a well-suited candidate. To be on the safe side,
> I must be able to sustain an insertion rate of about 200 elements per second.
>
> The XML elements I have to store are of the type:
>
> <notification ts="2015-03-13T10.44.25.123" nid="type-of-data">
>   <name-1>value1</name-1>
>   <name-2>value2</name-2>
>   <name-3>value3</name-3>
>   <name-4>value4</name-4>
>   ….
> </notification>
>
> So quite simple and small.
> I will mainly retrieve data by selecting notifications of a specific @nid
> between two @ts values, thus I need an attribute index.
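
The selection described above can be illustrated in plain Java. This
is only a sketch of the selection logic, not of how BaseX evaluates
such a query or uses its attribute index; the class and method names
are hypothetical. It assumes that every @ts value uses the fixed-width
format shown above, in which case plain string comparison orders the
values chronologically.

```java
import java.util.ArrayList;
import java.util.List;

public class TsRangeFilter {
    // True if from <= ts <= to under plain string comparison.
    // This matches chronological order only for fixed-width,
    // zero-padded timestamps such as 2015-03-13T10.44.25.123.
    public static boolean inRange(String ts, String from, String to) {
        return ts.compareTo(from) >= 0 && ts.compareTo(to) <= 0;
    }

    // Each entry is a {nid, ts} pair; returns the ts values of all
    // notifications with the given nid inside the closed range.
    public static List<String> select(List<String[]> notifications,
                                      String nid, String from, String to) {
        List<String> hits = new ArrayList<>();
        for (String[] n : notifications) {
            if (n[0].equals(nid) && inRange(n[1], from, to)) {
                hits.add(n[1]);
            }
        }
        return hits;
    }
}
```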
>
> For now I am using an embedded BaseX DB to test the insertion of elements.
>
> Here is how I configure my DB:
>
> Context m_Context = new Context();
> new Set(MainOptions.AUTOFLUSH, false).execute(m_Context);
> new Set(MainOptions.ADDCACHE, false).execute(m_Context);
> new Set(MainOptions.INTPARSE, true).execute(m_Context);
> new Set(MainOptions.STRIPNS, true).execute(m_Context);
> new Set(MainOptions.UPDINDEX, true).execute(m_Context);
> new Set(MainOptions.TEXTINDEX, false).execute(m_Context);
> new Set(MainOptions.ATTRINDEX, true).execute(m_Context);
> new CreateDB(_SourceId).execute(m_Context);
>
> And this is how I insert the elements:
>
> try {
>   String l_XmlRepresentation = _Notification.getXmlRepresentation();
>   if (l_XmlRepresentation.isEmpty()) {
>     return;
>   }
>   ByteArrayInputStream l_InputStream =
>       new ByteArrayInputStream(l_XmlRepresentation.getBytes(m_Charset));
>   Add add = new Add(_Notification.getSourceId());
>   add.setInput(l_InputStream);
>   add.execute(m_Context);
>   // flush every 10000 notifications
>   if (_CurrentNotification % 10000 == 0) {
>     new Flush().execute(m_Context);
>   }
> }
> catch (BaseXException ex) {
>   s_Logger.log(Level.SEVERE, null, ex);
> }
>
>
> The performance I get is as follows:
>
> Size 10'000, Speed: 1'292
> Size 20'000, Speed: 625
> Size 30'000, Speed: 361
> Size 40'000, Speed: 248
> Size 50'000, Speed: 184
> Size 60'000, Speed: 148
> Size 70'000, Speed: 123
> Size 80'000, Speed: 104
> Size 90'000, Speed: 91
> Size 100'000, Speed: 77
> Size 110'000, Speed: 69
> Size 120'000, Speed: 61
> Size 130'000, Speed: 56
> Size 140'000, Speed: 46
>
> Here “Size” is the number of elements in the collection and “Speed” is
> the average insertion speed (in elements per second) over the last 10000
> elements.
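
For reference, a figure of this kind can be measured roughly as
follows. This is a sketch in plain Java with hypothetical names, not
the code that produced the numbers above. Timestamps are passed in
explicitly (for example from System.nanoTime()) so that the
arithmetic is easy to check.

```java
public class InsertRateMeter {
    private final int batchSize;
    private long count;
    private long batchStartNanos;

    public InsertRateMeter(int batchSize, long startNanos) {
        this.batchSize = batchSize;
        this.batchStartNanos = startNanos;
    }

    // Records one insertion that finished at nowNanos. Returns the average
    // speed (elements per second) of the batch just completed, or -1 while
    // the current batch is still incomplete.
    public double record(long nowNanos) {
        count++;
        if (count % batchSize != 0) {
            return -1;
        }
        double seconds = (nowNanos - batchStartNanos) / 1e9;
        batchStartNanos = nowNanos;
        return batchSize / seconds;
    }

    // Converts a reported speed back into the time one batch took.
    public static double secondsPerBatch(int batchSize, double speed) {
        return batchSize / speed;
    }

    public static void main(String[] args) {
        InsertRateMeter meter = new InsertRateMeter(10000, System.nanoTime());
        for (int i = 0; i < 20000; i++) {
            // The real insertion would happen here.
            double speed = meter.record(System.nanoTime());
            if (speed >= 0) {
                System.out.println("Speed: " + speed);
            }
        }
    }
}
```

Read the other way round, the table says that a batch of 10000
elements took about 7.7 seconds at a speed of 1292 and about 217
seconds at a speed of 46.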
>
> My question is: does this performance seem normal, or am I doing something
> wrong? With UPDINDEX = false, I get a steady insertion rate of 10000
> elements per second.
>
> Thanks a lot
>
> Simon
>