I don't know what causes the gradual slowdown. My assumption was that it was the "optimize" which would cause the index to be built, so I didn't expect a slowdown at all during "add" calls, especially when autoflush is false.
I add documents with the following paths:
/f/f/e/ffe0f5be2aa14e81050f759c8f9c3eb7.xml
The XML file name is a hash of the contents, and the file is placed under directories named after the first characters of that hash, so that the export spreads the files evenly across a file system tree rather than putting a million docs into one directory.
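As a rough sketch of that layout: the code below assumes the hash is an MD5 of the content (the 32 hex digits suggest it, but that is a guess) and that the first three hex digits become the three directory levels, as in the path above. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashPath {
  /** Hex digest of the content. MD5 is an assumption, not confirmed in this thread. */
  static String md5Hex(String content) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
      StringBuilder sb = new StringBuilder();
      for (byte b : digest) sb.append(String.format("%02x", b & 0xff));
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Turns a hex hash into /h0/h1/h2/hash.xml, one directory per leading digit. */
  static String pathForHash(String hash) {
    return "/" + hash.charAt(0) + "/" + hash.charAt(1) + "/" + hash.charAt(2)
        + "/" + hash + ".xml";
  }

  /** Convenience: hash the content, then build the path. */
  static String pathForContent(String content) {
    return pathForHash(md5Hex(content));
  }

  public static void main(String[] args) {
    System.out.println(pathForHash("ffe0f5be2aa14e81050f759c8f9c3eb7"));
  }
}
```

With 16 possible values per level, three levels give 4096 leaf directories, so a million documents come to roughly 250 files per directory.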
The document content is nothing special, wrapped in a special tag:
<narthex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="20412518" mod="2014-09-23T11:11:51.007+02:00"> <record> <priref>20412518</priref> <current_location>FTA</current_location> <current_location.type/> <description>Ingang op de binnenplaats van de zuidvleugel</description> <collection>Fotocollectie</collection> <production.date.start>1925-08-06</production.date.start> <reproduction.format/>
<reproduction.reference>2186abf4-7108-f9b8-ffbb-902881afe836</reproduction.reference> <creator.role>Fotograaf</creator.role> <object_number>9.387</object_number> <monument.label/> <monument.zipcode/> <monument.name>Kasteel Hoensbroek</monument.name> <monument.record_number>284330</monument.record_number> <reproduction.date/> <reproduction.notes>Oude filepath: 0009\009387.jpg</reproduction.notes> <reproduction.type/> <reproduction.creator/> <rights.type>Copyright</rights.type> <technique>Neg.zw</technique> <creator>Scheepens, W.C.L.A.</creator> <order_number>avh04-2008</order_number> <input.date>2008-04-01</input.date> <edit.date>2011-05-03</edit.date> <edit.date>2008-04-28</edit.date> <monument.historical_address/> <content.subject.type value="SUBJECT" option="SUBJECT"> <text language="0">subject</text> <text language="1">onderwerp</text> <text language="2">sujet</text> <text language="3">Thema</text> <text language="4">موضوع</text> <text language="6">θέμα</text> </content.subject.type> <content.subject.type value="SUBJECT" option="SUBJECT"> <text language="0">subject</text> <text language="1">onderwerp</text> <text language="2">sujet</text> <text language="3">Thema</text> <text language="4">موضوع</text> <text language="6">θέμα</text> </content.subject.type> <content.subject>Kasteel</content.subject> <content.subject>Binnenplaats</content.subject> <monument.province>Limburg</monument.province> <monument.place>Hoensbroek</monument.place> <monument.number/> <monument.county/> <monument.country>Nederland</monument.country> <monument.house_number>18</monument.house_number> <monument.street>Klinkertstraat</monument.street> <monument.house_number.addition/> <monument.complex_number/> <monument.number.x_coordinates/> <monument.number.y_coordinates/> <monument.geographical_keyword/> <monument.complex_number.x_coordinates/> <monument.complex_number.y_coordinates/> <creator.date_of_birth/> <creator.date_of_death/> <input.name>a.vanhoute</input.name> 
<edit.name>RCEadmin</edit.name> <edit.name>a.vanhoute</edit.name> <creator.history/> <record_type value="OBJECT" option="OBJECT"> <text language="0">single object</text> <text language="2">objet individuel</text> <text language="3">Einzelnes Objekt</text> </record_type> <edit.time>03:10:32</edit.time> <edit.time>11:17:08</edit.time> <input.time>09:58:28</input.time> <input.source>document>photographs</input.source> <edit.source>collect>photograph</edit.source> <edit.source>document>photographs</edit.source> </record> </narthex>
On Tue, Sep 23, 2014 at 11:36 AM, Christian Grün <christian.gruen@gmail.com> wrote:
I set up to use the 8.0-SNAPSHOT and used the internal parser as well.
In your example you're not really giving much of a challenge to the index, since every doc is just <a/>.
If I get it right, you assume the slowdown is due to the index structures?
With respect to ADD, I'm not seeing a significant performance difference:
Please give us more info on the data you are adding. Could you provide us with a sample document?
docs     8.0-SNAPSHOT   7.9
10000    9250ms         8229ms
20000    7626ms         7587ms
30000    7885ms         7973ms
40000    8111ms         8282ms
50000    8365ms         8717ms
60000    8784ms         9294ms
70000    9270ms         10105ms
80000    9692ms         10669ms
90000    10158ms        11301ms
100000   10612ms        11835ms
110000   11018ms        12413ms
120000   11478ms        13000ms
130000   11940ms        13577ms
140000   12505ms        14331ms
150000   13047ms        14488ms
160000   13536ms        15025ms
170000   14055ms        15463ms
180000   14371ms        15815ms
190000   14883ms        16153ms
200000   15330ms        16314ms
210000   15888ms        16562ms
220000   16398ms        17186ms
230000   16878ms        17862ms
240000   17038ms        18340ms
250000   17453ms        18790ms
260000   17965ms        19313ms
270000   18317ms        19850ms
280000   18832ms        20225ms
290000   19373ms        20650ms
300000   19735ms        21062ms
310000   20062ms        21595ms
320000   20675ms        22022ms
330000   21113ms        22414ms
340000   21754ms        22925ms
350000   22887ms        23514ms
360000   22810ms        23762ms
370000   22985ms        24360ms
380000   23506ms        25028ms
390000   23856ms        25446ms
400000   24338ms        25700ms
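Per-batch figures like those above can be collected with a small harness along these lines. This is only a sketch, not the code that produced the numbers: `BatchTimer` and `timeBatches` are hypothetical names, and the workload in `main` is a stand-in for the real add call.

```java
import java.util.function.IntConsumer;

class BatchTimer {
  /**
   * Runs action for i = 1..total and returns the elapsed milliseconds of
   * each full batch of batchSize calls. A trailing partial batch is not reported.
   */
  static long[] timeBatches(int total, int batchSize, IntConsumer action) {
    long[] millis = new long[total / batchSize];
    long start = System.nanoTime();
    for (int i = 1; i <= total; i++) {
      action.accept(i);
      if (i % batchSize == 0) {
        long now = System.nanoTime();
        millis[i / batchSize - 1] = (now - start) / 1_000_000;
        start = now; // the next batch is timed from here
      }
    }
    return millis;
  }

  public static void main(String[] args) {
    int batchSize = 10_000;
    // Stand-in workload; with BaseX this is where the add of one document would go.
    long[] millis = timeBatches(400_000, batchSize, i -> Integer.toString(i).hashCode());
    for (int b = 0; b < millis.length; b++) {
      System.out.println((b + 1) * batchSize + ": " + millis[b] + "ms");
    }
  }
}
```

Resetting the start time after every batch means each line shows the cost of that batch alone, so a steady rise from line to line points at per-add cost growing with database size rather than at a one-off pause.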
- Gerald de Jong
On Thu, Sep 18, 2014 at 6:57 PM, Christian Grün <christian.gruen@gmail.com> wrote:
Perhaps you can give me a hint as to why inserts slow down.
I didn't have time to check out 7.9, but I have done some testing with 8.0, and I didn't notice a real slowdown. This is my Java testing script (1 million documents are added in just 17 seconds; I'm using the internal BaseX parser to speed up the import):
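import org.basex.core.*;
import org.basex.core.cmd.*;
import org.basex.util.*;

// measures the elapsed time of the whole run
Performance p = new Performance();
Context ctx = new Context();
new CreateDB("db").execute(ctx);
// no flushing after each update; use the internal XML parser
new Set(MainOptions.AUTOFLUSH, false).execute(ctx);
new Set(MainOptions.INTPARSE, true).execute(ctx);
// add 1 million minimal documents
for(int i = 0; i < 1000000; i++) {
  new Add("db", "<a/>").execute(ctx);
}
ctx.close();
System.out.println(p);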
Hope this helps, Christian
-- Delving BV, Vasteland 8, Rotterdam http://www.delving.eu http://twitter.com/fluxe skype: beautifulcode +31629339805