Hi Christian,

Thanks for answering.

In my case, no. All the documents are different: similar, but different. Referring to my example, the list of values inside the notifications is not always the same, and the values themselves are not always the same either. In particular, the @ts attribute values are almost all different. Only the @nid attribute is restricted to about 9 different values.

Note: I forgot to mention in my first message that I am using the latest official release of BaseX (v8.0.2).

Regards

Simon




On Fri, Mar 13, 2015 at 2:09 PM, Christian Grün <christian.gruen@gmail.com> wrote:
Hi Simon,

Thanks for your example code. I did some performance tests with the
example you provided, and I could observe a similar pattern when
adding the same document again and again. It didn't occur when adding
different documents, so I guess the culprits here are the internal
index ID lists that get pretty huge if the documents always contain
the same contents.
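
To illustrate the effect described above, here is a toy model of a value index that maps each attribute value to the list of node IDs carrying it. It is only a sketch: the class ToyValueIndex and its methods are hypothetical and do not reflect the actual BaseX index structures. It merely shows that adding the same value repeatedly produces one ever-growing ID list, while distinct values produce many short lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model, NOT the BaseX implementation.
public class ToyValueIndex {
    // Maps an attribute value to the IDs of all nodes carrying that value.
    private final Map<String, List<Integer>> idLists = new TreeMap<>();
    private int nextId = 0;

    /** Registers one attribute value and returns the node ID assigned to it. */
    public int add(String value) {
        int id = nextId++;
        idLists.computeIfAbsent(value, k -> new ArrayList<>()).add(id);
        return id;
    }

    /** Number of distinct values currently indexed. */
    public int distinctValues() {
        return idLists.size();
    }

    /** Length of the longest ID list over all indexed values. */
    public int longestIdList() {
        int max = 0;
        for (List<Integer> ids : idLists.values()) {
            max = Math.max(max, ids.size());
        }
        return max;
    }

    public static void main(String[] args) {
        ToyValueIndex same = new ToyValueIndex();
        ToyValueIndex different = new ToyValueIndex();
        for (int i = 0; i < 10000; i++) {
            same.add("2015-03-13T10.44.25.123"); // identical value every time
            different.add("ts-" + i);            // a new value every time
        }
        System.out.println("same:      " + same.distinctValues()
                + " value(s), longest ID list " + same.longestIdList());
        System.out.println("different: " + different.distinctValues()
                + " value(s), longest ID list " + different.longestIdList());
    }
}
```

In this model an attribute with very few distinct values would also end up with long ID lists, even when the documents as a whole differ. Whether that matters for a real database depends on how its index is updated.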

I would be interested to hear if you have done your experiment with
the same document or different instances?

Best,
Christian


On Fri, Mar 13, 2015 at 10:54 AM, Simon Chatelain <schatela@gmail.com> wrote:
> Hello,
>
> First let me give you the context: I have a never-ending stream of XML
> elements coming in that I want to store and then make available through a
> REST interface.
> BaseX therefore seems to be a well-suited candidate. To be on the safe side, I
> must be able to sustain an insertion rate of about 200 elements per second.
>
> The XML elements I have to store are of the type:
>
> <notification ts="2015-03-13T10.44.25.123" nid="type-of-data">
>     <name-1>value1</name-1>
>     <name-2>value2</name-2>
>     <name-3>value3</name-3>
>     <name-4>value4</name-4>
>     ...
> </notification>
>
> So quite simple and small.
> I will mainly retrieve data by selecting notifications of a specific @nid
> between two @ts values, thus I need an attribute index.
>
> For now I am using an embedded BaseX DB to test the insertion of elements.
>
> Here is how I configure my DB:
>
> Context m_Context = new Context();
> new Set(MainOptions.AUTOFLUSH, false).execute(m_Context);
> new Set(MainOptions.ADDCACHE, false).execute(m_Context);
> new Set(MainOptions.INTPARSE, true).execute(m_Context);
> new Set(MainOptions.STRIPNS, true).execute(m_Context);
> new Set(MainOptions.UPDINDEX, true).execute(m_Context);
> new Set(MainOptions.TEXTINDEX, false).execute(m_Context);
> new Set(MainOptions.ATTRINDEX, true).execute(m_Context);
> new CreateDB(_SourceId).execute(m_Context);
>
> And this is how I insert the elements:
>
> try {
>     String l_XmlRepresentation = _Notification.getXmlRepresentation();
>     if (l_XmlRepresentation.isEmpty()) {
>         return;
>     }
>     ByteArrayInputStream l_InputStream =
>         new ByteArrayInputStream(l_XmlRepresentation.getBytes(m_Charset));
>     Add add = new Add(_Notification.getSourceId());
>     add.setInput(l_InputStream);
>     add.execute(m_Context);
>     // flush every 10000 notifications
>     if (_CurrentNotification % 10000 == 0) {
>         new Flush().execute(m_Context);
>     }
> }
> catch (BaseXException ex) {
>     s_Logger.log(Level.SEVERE, null, ex);
> }
>
>
> The performance I get is as follows:
>
> Size 10'000, Speed: 1'292
> Size 20'000, Speed: 625
> Size 30'000, Speed: 361
> Size 40'000, Speed: 248
> Size 50'000, Speed: 184
> Size 60'000, Speed: 148
> Size 70'000, Speed: 123
> Size 80'000, Speed: 104
> Size 90'000, Speed: 91
> Size 100'000, Speed: 77
> Size 110'000, Speed: 69
> Size 120'000, Speed: 61
> Size 130'000, Speed: 56
> Size 140'000, Speed: 46
>
> Where "Size" is the number of elements in the collection and "Speed" is
> the average insertion speed (in elements per second) over the last 10000
> elements.
>
> My question is: does this performance seem normal, or am I doing something
> wrong? Note that with UPDINDEX = false, I get a steady insertion rate of
> 10000 elements per second.
>
> Thanks a lot
>
> Simon
>
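
For reference, a "Speed" figure like the one in the table above (average elements per second over the last 10000 insertions) could be computed along the following lines. This is only a sketch under assumptions: the class RateMeter and its methods are hypothetical and are not taken from the original test code. Timestamps are passed in explicitly so that the arithmetic is easy to check.

```java
// Hypothetical helper, not part of the original test code.
public class RateMeter {
    private final int windowSize;   // number of insertions per measurement window
    private long count = 0;         // insertions recorded so far
    private long windowStartNanos;  // timestamp at which the current window began

    public RateMeter(int windowSize, long startNanos) {
        this.windowSize = windowSize;
        this.windowStartNanos = startNanos;
    }

    /**
     * Records one insertion. Returns the average rate (elements per second)
     * of the window that has just been completed, or -1 if the current
     * window is not complete yet.
     */
    public double record(long nowNanos) {
        count++;
        if (count % windowSize != 0) {
            return -1;
        }
        double seconds = (nowNanos - windowStartNanos) / 1e9;
        windowStartNanos = nowNanos;
        return windowSize / seconds;
    }

    public long count() {
        return count;
    }

    public static void main(String[] args) {
        RateMeter meter = new RateMeter(10000, System.nanoTime());
        for (int i = 0; i < 30000; i++) {
            // The actual insertion (e.g. the Add command) would go here.
            double rate = meter.record(System.nanoTime());
            if (rate >= 0) {
                System.out.printf("Size %d, Speed: %.0f%n", meter.count(), rate);
            }
        }
    }
}
```

Measuring per window rather than from the start of the run is what makes a gradual slowdown visible, since a cumulative average would hide it behind the fast early insertions.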