I'm finding that adding documents to a database starts at about 1ms per document, but gradually gets slower (5ms after about 700,000 documents). I'm doing this with autoflush off, and I've tried periodic flush and optimize commands, but they have no effect.
Is this expected behavior? Is there any way to keep it speedy?
Hi Gerald,
Yes, we are aware that database inserts slow down over time. I would be interested in your experience with the latest snapshot of BaseX [1], which has an improved document index [2]. In some cases, the insertion of new files may get slower, but the replacement of existing files will be sped up a lot with this index.
Thanks in advance, Christian
[1] http://files.basex.org/releases/latest/
[2] https://github.com/BaseXdb/basex/issues/804
Hi Christian,
Perhaps you can give me a hint as to why inserts slow down. I was imagining that most of the indexing work would happen in the Optimize afterwards. It also sounds like adding documents one by one is a lot slower than importing a single file that contains the same documents, right? Somehow this doesn't add up in my mind, so I must be missing something.
I will try to find the time to try out the latest snapshot, but from what I read I guess you're not expecting greater Add speeds, just a faster Replace.
Gerald
> Perhaps you can give me a hint as to why inserts slow down.
I didn't have time to check out 7.9, but I have done some testing with 8.0, and I didn't notice a real slow-down. This is my Java test script (1 million documents are added in just 17 seconds; I'm using the internal BaseX parser to speed up the import):
Performance p = new Performance();
Context ctx = new Context();

new CreateDB("db").execute(ctx);
new Set(MainOptions.AUTOFLUSH, false).execute(ctx);
new Set(MainOptions.INTPARSE, true).execute(ctx);
for(int i = 0; i < 1000000; i++) {
  new Add("db", "<a/>").execute(ctx);
}
ctx.close();
System.out.println(p);
Hope this helps, Christian
Hi Christian,
I set up the 8.0-SNAPSHOT and used the internal parser as well. In your example you're not really giving the index much of a challenge, since every doc is just <a/>.
With respect to ADD, I'm not seeing a significant performance difference:
docs      8.0-SNAPSHOT   7.9
------    ------------   -------
10000     9250ms         8229ms
20000     7626ms         7587ms
30000     7885ms         7973ms
40000     8111ms         8282ms
50000     8365ms         8717ms
60000     8784ms         9294ms
70000     9270ms         10105ms
80000     9692ms         10669ms
90000     10158ms        11301ms
100000    10612ms        11835ms
110000    11018ms        12413ms
120000    11478ms        13000ms
130000    11940ms        13577ms
140000    12505ms        14331ms
150000    13047ms        14488ms
160000    13536ms        15025ms
170000    14055ms        15463ms
180000    14371ms        15815ms
190000    14883ms        16153ms
200000    15330ms        16314ms
210000    15888ms        16562ms
220000    16398ms        17186ms
230000    16878ms        17862ms
240000    17038ms        18340ms
250000    17453ms        18790ms
260000    17965ms        19313ms
270000    18317ms        19850ms
280000    18832ms        20225ms
290000    19373ms        20650ms
300000    19735ms        21062ms
310000    20062ms        21595ms
320000    20675ms        22022ms
330000    21113ms        22414ms
340000    21754ms        22925ms
350000    22887ms        23514ms
360000    22810ms        23762ms
370000    22985ms        24360ms
380000    23506ms        25028ms
390000    23856ms        25446ms
400000    24338ms        25700ms
- Gerald de Jong
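A minimal sketch of a timing harness that would produce numbers of this shape, assuming the same Commands API as the script above (the database name, document content, and batch size are illustrative, not the actual code used):

import org.basex.core.Context;
import org.basex.core.MainOptions;
import org.basex.core.cmd.Add;
import org.basex.core.cmd.CreateDB;
import org.basex.core.cmd.Set;

public class AddTimer {
  public static void main(String[] args) throws Exception {
    Context ctx = new Context();
    new CreateDB("timing").execute(ctx);
    new Set(MainOptions.AUTOFLUSH, false).execute(ctx);
    new Set(MainOptions.INTPARSE, true).execute(ctx);

    long start = System.currentTimeMillis();
    for (int i = 1; i <= 400000; i++) {
      // a document closer to the real records than <a/> (illustrative content)
      String doc = "<narthex xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" id=\"" + i + "\">"
          + "<record><priref>" + i + "</priref></record></narthex>";
      new Add("doc" + i + ".xml", doc).execute(ctx);
      // report the elapsed time after every 10,000 documents
      if (i % 10000 == 0) {
        System.out.println(i + ": " + (System.currentTimeMillis() - start) + "ms");
      }
    }
    ctx.close();
  }
}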
> I set up the 8.0-SNAPSHOT and used the internal parser as well. In your example you're not really giving the index much of a challenge, since every doc is just <a/>.
If I get it right, you assume the slowdown is due to the index structures?
> With respect to ADD, I'm not seeing a significant performance difference:
Please give us more info on the data you are adding. Could you provide us with a sample document?
I don't know what causes the gradual slowdown. My assumption was that the "optimize" afterwards is what builds the index, so I didn't expect a slowdown at all during "add" calls, especially when autoflush is false.
I add documents with the following paths:
/f/f/e/ffe0f5be2aa14e81050f759c8f9c3eb7.xml
The XML file name is a hash of the contents, and it is placed in a path so that an export spreads the files out nicely into a file system tree, rather than putting a million docs into one directory.
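A sketch of how such a path could be derived, assuming the hash is an MD5 hex digest of the document (the actual hash function is not stated in the thread):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class HashPath {
  // Build a path like /f/f/e/ffe0f5be2aa14e81050f759c8f9c3eb7.xml from the document content.
  static String pathFor(String xml) throws Exception {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    byte[] digest = md5.digest(xml.getBytes(StandardCharsets.UTF_8));
    StringBuilder hex = new StringBuilder();
    for (byte b : digest) hex.append(String.format("%02x", b));
    String h = hex.toString();
    // the first three hex characters become nested directories, spreading files across the tree
    return "/" + h.charAt(0) + "/" + h.charAt(1) + "/" + h.charAt(2) + "/" + h + ".xml";
  }

  public static void main(String[] args) throws Exception {
    System.out.println(pathFor("<narthex><record/></narthex>"));
  }
}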
The document content is nothing special, wrapped in a special tag:
<narthex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         id="20412518"
         mod="2014-09-23T11:11:51.007+02:00">
  <record>
    <priref>20412518</priref>
    <current_location>FTA</current_location>
    <current_location.type/>
    <description>Ingang op de binnenplaats van de zuidvleugel</description>
    <collection>Fotocollectie</collection>
    <production.date.start>1925-08-06</production.date.start>
    <reproduction.format/>
    <reproduction.reference>2186abf4-7108-f9b8-ffbb-902881afe836</reproduction.reference>
    <creator.role>Fotograaf</creator.role>
    <object_number>9.387</object_number>
    <monument.label/>
    <monument.zipcode/>
    <monument.name>Kasteel Hoensbroek</monument.name>
    <monument.record_number>284330</monument.record_number>
    <reproduction.date/>
    <reproduction.notes>Oude filepath: 0009\009387.jpg</reproduction.notes>
    <reproduction.type/>
    <reproduction.creator/>
    <rights.type>Copyright</rights.type>
    <technique>Neg.zw</technique>
    <creator>Scheepens, W.C.L.A.</creator>
    <order_number>avh04-2008</order_number>
    <input.date>2008-04-01</input.date>
    <edit.date>2011-05-03</edit.date>
    <edit.date>2008-04-28</edit.date>
    <monument.historical_address/>
    <content.subject.type value="SUBJECT" option="SUBJECT">
      <text language="0">subject</text>
      <text language="1">onderwerp</text>
      <text language="2">sujet</text>
      <text language="3">Thema</text>
      <text language="4">موضوع</text>
      <text language="6">θέμα</text>
    </content.subject.type>
    <content.subject.type value="SUBJECT" option="SUBJECT">
      <text language="0">subject</text>
      <text language="1">onderwerp</text>
      <text language="2">sujet</text>
      <text language="3">Thema</text>
      <text language="4">موضوع</text>
      <text language="6">θέμα</text>
    </content.subject.type>
    <content.subject>Kasteel</content.subject>
    <content.subject>Binnenplaats</content.subject>
    <monument.province>Limburg</monument.province>
    <monument.place>Hoensbroek</monument.place>
    <monument.number/>
    <monument.county/>
    <monument.country>Nederland</monument.country>
    <monument.house_number>18</monument.house_number>
    <monument.street>Klinkertstraat</monument.street>
    <monument.house_number.addition/>
    <monument.complex_number/>
    <monument.number.x_coordinates/>
    <monument.number.y_coordinates/>
    <monument.geographical_keyword/>
    <monument.complex_number.x_coordinates/>
    <monument.complex_number.y_coordinates/>
    <creator.date_of_birth/>
    <creator.date_of_death/>
    <input.name>a.vanhoute</input.name>
    <edit.name>RCEadmin</edit.name>
    <edit.name>a.vanhoute</edit.name>
    <creator.history/>
    <record_type value="OBJECT" option="OBJECT">
      <text language="0">single object</text>
      <text language="2">objet individuel</text>
      <text language="3">Einzelnes Objekt</text>
    </record_type>
    <edit.time>03:10:32</edit.time>
    <edit.time>11:17:08</edit.time>
    <input.time>09:58:28</input.time>
    <input.source>document>photographs</input.source>
    <edit.source>collect>photograph</edit.source>
    <edit.source>document>photographs</edit.source>
  </record>
</narthex>
Thanks for the document. The declaration of the (unused) namespace in the root element seems to be the cause of the decreasing performance (I noticed that the time for adding documents stays constant after removing the declaration). I'll do some profiling to find out if this can be sped up without too much effort (it may take a while, though, because I'll be on leave from tomorrow).
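A rough sketch of how this effect could be reproduced, using the same Commands API as the earlier script; the two test documents and the counts are illustrative, not the actual profiling code:

import org.basex.core.Context;
import org.basex.core.MainOptions;
import org.basex.core.cmd.Add;
import org.basex.core.cmd.Close;
import org.basex.core.cmd.CreateDB;
import org.basex.core.cmd.DropDB;
import org.basex.core.cmd.Set;

public class NamespaceEffect {
  // Add the same document n times into a fresh database and return the elapsed milliseconds.
  static long time(String doc, int n) throws Exception {
    Context ctx = new Context();
    new CreateDB("nstest").execute(ctx);
    new Set(MainOptions.AUTOFLUSH, false).execute(ctx);
    new Set(MainOptions.INTPARSE, true).execute(ctx);
    long start = System.currentTimeMillis();
    for (int i = 0; i < n; i++) new Add("d" + i + ".xml", doc).execute(ctx);
    long ms = System.currentTimeMillis() - start;
    new Close().execute(ctx);
    new DropDB("nstest").execute(ctx);
    ctx.close();
    return ms;
  }

  public static void main(String[] args) throws Exception {
    String withNs = "<narthex xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"><record/></narthex>";
    String without = "<narthex><record/></narthex>";
    System.out.println("with namespace:    " + time(withNs, 100000) + "ms");
    System.out.println("without namespace: " + time(without, 100000) + "ms");
  }
}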
WOW, really... the namespace? Because it's unused, or is it always going to be slow when there are namespaces?
I'm completely surprised by this! You're right, the add time is constant without the namespace.
This namespace happens to be unnecessary, but others won't be. I'm so curious how this can be the cause.
> This namespace happens to be unnecessary, but others won't be. I'm so curious how this can be the cause.
Unfortunately, the intricacies of namespaces have been keeping us XML implementers busy for a long time; the XPath and storage algorithms would be much simpler, if not trivial, without the notion of namespaces. That's why it would take quite a while to explain the reasons for this, and as your input document only contains one namespace, I'm not surprised that you are surprised ;) To put it in a nutshell: it's usually easy to optimize individual namespace issues, but it's difficult to optimize all the cases that occur in practice.
But I'll keep track of your use case.
The other case I'm testing has five necessary namespaces. :(
10000: 6462ms
20000: 7592ms
30000: 8689ms
40000: 9417ms
50000: 9566ms
60000: 10368ms
70000: 10963ms
80000: 12167ms
Is there any direction you can suggest to look for a workaround?
Maybe a general question: is insertion really a bottleneck in your scenario? How much data do you want to store in a single database? You could, e.g., store your data in multiple databases, which can then all be queried by a single XQuery expression.
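A sketch of that approach, assuming the Commands API used earlier and BaseX's db:open() function; the database names and the round-robin split are illustrative, not a recommendation from the thread:

import org.basex.core.Context;
import org.basex.core.cmd.Add;
import org.basex.core.cmd.CreateDB;
import org.basex.core.cmd.Open;
import org.basex.core.cmd.XQuery;

public class MultiDb {
  public static void main(String[] args) throws Exception {
    Context ctx = new Context();
    int parts = 4;

    // create the partition databases
    for (int p = 0; p < parts; p++) new CreateDB("records" + p).execute(ctx);

    // distribute documents round-robin over the partitions
    String[] docs = {
      "<narthex><record><priref>1</priref></record></narthex>",
      "<narthex><record><priref>2</priref></record></narthex>"
    };
    for (int i = 0; i < docs.length; i++) {
      new Open("records" + (i % parts)).execute(ctx);
      new Add("doc" + i + ".xml", docs[i]).execute(ctx);
    }

    // one XQuery expression over all partitions
    String query =
      "for $db in ('records0', 'records1', 'records2', 'records3') " +
      "return count(db:open($db)//record)";
    System.out.println(new XQuery(query).execute(ctx));

    ctx.close();
  }
}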
On Tue, Sep 23, 2014 at 1:50 PM, Gerald de Jong gerald@delving.eu wrote:
The other case I'm testing has five necessary namespaces. :(
10000: 6462ms 20000: 7592ms 30000: 8689ms 40000: 9417ms 50000: 9566ms 60000: 10368ms 70000: 10963ms 80000: 12167ms
Is there any direction you can suggest to look for a workaround?
On Tue, Sep 23, 2014 at 1:43 PM, Christian Grün christian.gruen@gmail.com wrote:
This namespace happens to be unnecessary, but others won't be. I'm so curious how this can be the thing.
Unfortunately, the intricacies of namespaces have been keeping us XML implementers busy for a long time, and the XPath and storage algorithms would be much simpler, if not trivial, without the notion of namespaces. This is why it would take quite a while to explain what are the reasons for that, and as your input document only contains one namespaces, I'm not surprised that you are surprised ;) To put it in a nutshell: it's usually easy to optimize single namespaces issues, but it's difficult to optimize all cases that happen in practice.
But I'll keep track of your use case.
On Tue, Sep 23, 2014 at 1:30 PM, Gerald de Jong gerald@delving.eu wrote:
On Tue, Sep 23, 2014 at 1:20 PM, Gerald de Jong gerald@delving.eu wrote:
WOW, really... the namespace? Because it's unused, or is it always going to slow when there are namespaces?
On Tue, Sep 23, 2014 at 1:13 PM, Christian Grün christian.gruen@gmail.com wrote:
Thanks for the document. The declaration of the (unused) namespace in the root element seems to be the cause for the decreasing performance (I noticed that the time for adding documents stays constant after removing the declaration). I'll do some profiling in order to find out if this can be sped up without too much effort (it may take a while, though, because I'll be on leave for a while from tomorrow).
On Tue, Sep 23, 2014 at 12:25 PM, Gerald de Jong gerald@delving.eu wrote:
I don't know what causes the gradual slowdown. My assumption was that it was the "optimize" which would cause the index to be built, so I didn't expect a slowdown at all during "add" calls, especially when autoflush is false.
I add documents with the following paths:
/f/f/e/ffe0f5be2aa14e81050f759c8f9c3eb7.xml
The xml file name is a hash of the contents, and it is placed in a path such that the export spreads out the files nicely into a file system tree, rather than putting a million docs into one directory.
The document content is nothing special, wrapped in a special tag:
<narthex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="20412518" mod="2014-09-23T11:11:51.007+02:00">
  <record>
    <priref>20412518</priref>
    <current_location>FTA</current_location>
    <current_location.type/>
    <description>Ingang op de binnenplaats van de zuidvleugel</description>
    <collection>Fotocollectie</collection>
    <production.date.start>1925-08-06</production.date.start>
    <reproduction.format/>
    <reproduction.reference>2186abf4-7108-f9b8-ffbb-902881afe836</reproduction.reference>
    <creator.role>Fotograaf</creator.role>
    <object_number>9.387</object_number>
    <monument.label/>
    <monument.zipcode/>
    <monument.name>Kasteel Hoensbroek</monument.name>
    <monument.record_number>284330</monument.record_number>
    <reproduction.date/>
    <reproduction.notes>Oude filepath: 0009\009387.jpg</reproduction.notes>
    <reproduction.type/>
    <reproduction.creator/>
    <rights.type>Copyright</rights.type>
    <technique>Neg.zw</technique>
    <creator>Scheepens, W.C.L.A.</creator>
    <order_number>avh04-2008</order_number>
    <input.date>2008-04-01</input.date>
    <edit.date>2011-05-03</edit.date>
    <edit.date>2008-04-28</edit.date>
    <monument.historical_address/>
    <content.subject.type value="SUBJECT" option="SUBJECT">
      <text language="0">subject</text>
      <text language="1">onderwerp</text>
      <text language="2">sujet</text>
      <text language="3">Thema</text>
      <text language="4">موضوع</text>
      <text language="6">θέμα</text>
    </content.subject.type>
    <content.subject.type value="SUBJECT" option="SUBJECT">
      <text language="0">subject</text>
      <text language="1">onderwerp</text>
      <text language="2">sujet</text>
      <text language="3">Thema</text>
      <text language="4">موضوع</text>
      <text language="6">θέμα</text>
    </content.subject.type>
    <content.subject>Kasteel</content.subject>
    <content.subject>Binnenplaats</content.subject>
    <monument.province>Limburg</monument.province>
    <monument.place>Hoensbroek</monument.place>
    <monument.number/>
    <monument.county/>
    <monument.country>Nederland</monument.country>
    <monument.house_number>18</monument.house_number>
    <monument.street>Klinkertstraat</monument.street>
    <monument.house_number.addition/>
    <monument.complex_number/>
    <monument.number.x_coordinates/>
    <monument.number.y_coordinates/>
    <monument.geographical_keyword/>
    <monument.complex_number.x_coordinates/>
    <monument.complex_number.y_coordinates/>
    <creator.date_of_birth/>
    <creator.date_of_death/>
    <input.name>a.vanhoute</input.name>
    <edit.name>RCEadmin</edit.name>
    <edit.name>a.vanhoute</edit.name>
    <creator.history/>
    <record_type value="OBJECT" option="OBJECT">
      <text language="0">single object</text>
      <text language="2">objet individuel</text>
      <text language="3">Einzelnes Objekt</text>
    </record_type>
    <edit.time>03:10:32</edit.time>
    <edit.time>11:17:08</edit.time>
    <input.time>09:58:28</input.time>
    <input.source>document>photographs</input.source>
    <edit.source>collect>photograph</edit.source>
    <edit.source>document>photographs</edit.source>
  </record>
</narthex>
-- Delving BV, Vasteland 8, Rotterdam http://www.delving.eu http://twitter.com/fluxe skype: beautifulcode +31629339805
Considering that the dataset I just mentioned involves 1.2 million add commands, it does become a bit of an annoyance with large datasets like this. We can have some patience for insertion, even with such a slowdown, so I wouldn't call it a bottleneck exactly.
Can you point me to an example of querying multiple databases? I could try splitting the big datasets up.
The big problem I have right now is the IllegalMonitorStateException that freezes the basexserver. After this happens I even have to kill -9 the process.
On Tue, Sep 23, 2014 at 1:55 PM, Christian Grün christian.gruen@gmail.com wrote:
Maybe a general question: Is the insertion really a bottleneck in your scenario? How much data do you want to store in a single database? You could e.g. store your data in multiple databases, which can then all be queried by a single XQuery expression.
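As a sketch of what such a query could look like (not from the mail; the database names and the predicate are invented), using the Java command API from the earlier test and db:open() to address several databases in one expression:

import org.basex.core.Context;
import org.basex.core.cmd.XQuery;

public final class MultiDbQuery {
  public static void main(String[] args) throws Exception {
    Context ctx = new Context();
    // One FLWOR iterating over several (hypothetical) databases.
    String query =
      "count(" +
      "  for $db in ('narthex-1', 'narthex-2', 'narthex-3')" +
      "  return db:open($db)//record[monument.place = 'Hoensbroek']" +
      ")";
    System.out.println(new XQuery(query).execute(ctx));
    ctx.close();
  }
}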
On Tue, Sep 23, 2014 at 1:50 PM, Gerald de Jong gerald@delving.eu wrote:
The other case I'm testing has five necessary namespaces. :(
10000: 6462ms
20000: 7592ms
30000: 8689ms
40000: 9417ms
50000: 9566ms
60000: 10368ms
70000: 10963ms
80000: 12167ms
Is there any direction you can suggest to look for a workaround?
On Tue, Sep 23, 2014 at 1:43 PM, Christian Grün <christian.gruen@gmail.com> wrote:
> This namespace happens to be unnecessary, but others won't be. I'm so curious how this can be the thing.
Unfortunately, the intricacies of namespaces have been keeping us XML implementers busy for a long time, and the XPath and storage algorithms would be much simpler, if not trivial, without the notion of namespaces. This is why it would take quite a while to explain the reasons for that, and as your input document only contains one namespace, I'm not surprised that you are surprised ;) To put it in a nutshell: it's usually easy to optimize single namespace issues, but it's difficult to optimize all cases that happen in practice.
But I'll keep track of your use case.
On Tue, Sep 23, 2014 at 1:30 PM, Gerald de Jong gerald@delving.eu wrote:
On Tue, Sep 23, 2014 at 1:20 PM, Gerald de Jong gerald@delving.eu wrote:
WOW, really... the namespace? Because it's unused, or is it always going to be slow when there are namespaces?
On Tue, Sep 23, 2014 at 1:13 PM, Christian Grün christian.gruen@gmail.com wrote:
-- Delving BV, Vasteland 8, Rotterdam http://www.delving.eu http://twitter.com/fluxe skype: beautifulcode +31629339805
A philosophical question, perhaps, or one that might be easily answered by someone with a lot more BaseX experience than me:
Would it make more sense to store one big "file" in BaseX corresponding to, say, the 1.2 million records, rather than storing 1.2 million cleverly named xml documents as I'm doing now? I suppose add would then become insert (after - for speed), but would that maybe overcome the namespace-related performance issue and even be faster in general?
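For what it's worth, a hedged sketch (names invented, not from the thread) of what appending a record to one big stored document could look like via XQuery Update, run through the same Java command API; it only shows the syntax and makes no claim about which variant is faster:

import org.basex.core.Context;
import org.basex.core.cmd.XQuery;

public final class AppendRecord {
  public static void main(String[] args) throws Exception {
    Context ctx = new Context();
    String rec = "<narthex id='20412518'><record/></narthex>";
    // Assumes a database 'narthex-big' whose document has a <narthex-collection> root
    // that already contains at least one child ("insert ... after" needs a target node).
    new XQuery("insert node " + rec +
        " after db:open('narthex-big')/narthex-collection/*[last()]").execute(ctx);
    // The other standard form would be: insert node ... as last into /narthex-collection.
    ctx.close();
  }
}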
On Tue, Sep 23, 2014 at 2:05 PM, Gerald de Jong gerald@delving.eu wrote:
-- Delving BV, Vasteland 8, Rotterdam http://www.delving.eu http://twitter.com/fluxe skype: beautifulcode +31629339805
Hi Gerald,
not sure but take into account that, AFAIK, there are limitations on the size (number of nodes) that can be kept in a single DB.
M.
On 23/09/2014 15:32, Gerald de Jong wrote:
I've been looking around and I can't find what those limitations are. I'll scan the 7.9 book tonight, maybe it's there.
Alternatively, maybe it would make sense to store, say, 100,000 documents per database, and then query over multiple when necessary.
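A possible shape for that splitting, again only a sketch with made-up names: create a fresh database every 100,000 documents and keep adding to the one that is currently open, mirroring the command pattern from the earlier Java test:

import org.basex.core.Context;
import org.basex.core.MainOptions;
import org.basex.core.cmd.Add;
import org.basex.core.cmd.CreateDB;
import org.basex.core.cmd.Set;

public final class ChunkedAdd {
  static final int CHUNK = 100000; // documents per database

  public static void main(String[] args) throws Exception {
    Context ctx = new Context();
    new Set(MainOptions.AUTOFLUSH, false).execute(ctx);
    new Set(MainOptions.INTPARSE, true).execute(ctx);
    for(int i = 0; i < 1200000; i++) {
      if(i % CHUNK == 0) {
        // Starting a new chunk: CREATE DB also opens it, so the following ADDs go there.
        new CreateDB("narthex-" + (i / CHUNK)).execute(ctx);
      }
      new Add("doc" + i + ".xml", "<narthex id='" + i + "'><record/></narthex>").execute(ctx);
    }
    ctx.close();
  }
}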
On Tue, Sep 23, 2014 at 3:40 PM, Marco Lettere marco.lettere@dedalus.eu wrote:
-- Delving BV, Vasteland 8, Rotterdam http://www.delving.eu http://twitter.com/fluxe skype: beautifulcode +31629339805
I see on http://stackoverflow.com/questions/25113900/inserting-millions-of-xml-files-...
The limit on the number of stored XML documents is 2^29, which is 536,870,912. The limit for XML nodes is 2^31, which is 2,147,483,648 (although this includes all nodes, including attributes, texts, etc.).
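If it helps, a rough way to see how close a database is to those limits (database name invented; counting every node walks the whole database, so it is not cheap):

import org.basex.core.Context;
import org.basex.core.cmd.XQuery;

public final class SizeCheck {
  public static void main(String[] args) throws Exception {
    Context ctx = new Context();
    String query =
      "let $db := db:open('narthex-1') " +
      "return concat(count($db), ' documents, ', " +
      "              count($db//node()) + count($db//@*), ' descendant nodes')";
    System.out.println(new XQuery(query).execute(ctx));
    ctx.close();
  }
}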
On Tue, Sep 23, 2014 at 3:50 PM, Gerald de Jong gerald@delving.eu wrote:
I've been looking around and I can't find what those limitations are. I'll scan the 7.9 book tonight, maybe it's there.
Alternatively, maybe it would make sense to store, say, 100,000 documents per database, and then query over multiple when necessary.
On Tue, Sep 23, 2014 at 3:40 PM, Marco Lettere marco.lettere@dedalus.eu wrote:
Hi Gerald, not sure but take into account that, AFAIK, there are limitations on the size (number of nodes) that can be kept in a single DB. M.
On 23/09/2014 15:32, Gerald de Jong wrote:
A philosophical question, perhaps, or one that might be easily answered by someone with a lot more BaseX experience than me:
Would it make more sense to store one big "file" in BaseX corresponding to the, say, 1.2 million records, rather than storing 1.2 million cleverly named xml documents as i'm doing now? I suppose add would then become insert (after - for speed), but would that maybe overcome the namespace-related performance issue and even be faster in general?
On Tue, Sep 23, 2014 at 2:05 PM, Gerald de Jong gerald@delving.eu wrote:
Considering that the dataset I just mentioned involves 1.2 million add commands, it does become a bit of annoyance with some large datasets like this. We can have some patience for insertion, even with such a slowdown, so I wouldn't say bottleneck exactly.
Can you point me to an example of querying multiple databases? I could try splitting the big datasets up.
The big problem I have right now is the IllegalMonitorStateException that freezes the basexserver. After this happens I have to kill -9 the process even.
On Tue, Sep 23, 2014 at 1:55 PM, Christian Grün < christian.gruen@gmail.com> wrote:
Maybe a general question: Is the insertion really a bottleneck in your scenario? How many data do you want to store in a single database? You could e.g. store your data in multiple databases, which can then all be queried by a single XQuery expression.
On Tue, Sep 23, 2014 at 1:50 PM, Gerald de Jong gerald@delving.eu wrote:
The other case I'm testing has five necessary namespaces. :(
10000: 6462ms 20000: 7592ms 30000: 8689ms 40000: 9417ms 50000: 9566ms 60000: 10368ms 70000: 10963ms 80000: 12167ms
Is there any direction you can suggest to look for a workaround?
On Tue, Sep 23, 2014 at 1:43 PM, Christian Grün <
christian.gruen@gmail.com>
wrote:
> This namespace happens to be unnecessary, but others won't be.
I'm so
> curious how this can be the thing.
Unfortunately, the intricacies of namespaces have been keeping us XML implementers busy for a long time, and the XPath and storage algorithms would be much simpler, if not trivial, without the notion of namespaces. This is why it would take quite a while to explain
what
are the reasons for that, and as your input document only contains
one
namespaces, I'm not surprised that you are surprised ;) To put it in
a
nutshell: it's usually easy to optimize single namespaces issues, but it's difficult to optimize all cases that happen in practice.
But I'll keep track of your use case.
Hi all,
Yes, there can be up to 2^31 nodes, and up to 2^29 files (around 536 million)
http://docs.basex.org/wiki/Statistics
But whatever the strategy – many documents versus one big XML file – you will encounter the same limitation on the number of nodes.
In my experience, the document strategy is close to NoSQL document stores like Couchbase or MongoDB.
If you update your collection per document, you can use the replace command instead of XQuery Update and be free of pending update list limitations.
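As a concrete sketch of that per-document replace (the database name 'narthex' is an assumption; the path follows the hashed layout Gerald described): on the command line this is the REPLACE command, and from XQuery it is db:replace, which swaps the stored document at the given path in one step:

(: replace one stored document, addressed by its database path :)
db:replace(
  'narthex',
  'f/f/e/ffe0f5be2aa14e81050f759c8f9c3eb7.xml',
  <narthex id="20412518"><record><priref>20412518</priref></record></narthex>
)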
Christian, from what I read in the last exchanges, the document index is now a persistent data structure? Could you tell us whether document paths are indexed, and whether this index is incremental or has to be rebuilt with the optimize command?
If so, using the document strategy could be a real benefit, because you do not have to reindex the attribute or text index in order to update an entire document's content. (If you store your documents in a single big document, you have to maintain metadata in each root element in order to access them directly, and so you have to reindex after each update query.)
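For comparison, a sketch of the aggregated "big document" strategy described above (database and element names are hypothetical): each sub-record carries its key as metadata on its root element, and updating it means an XQuery Update replace inside the big document, which brings exactly the reindexing cost mentioned here:

(: replace one sub-record inside the single big document, located via its id attribute :)
let $new :=
  <record id="20412518">
    <object_number>9.387</object_number>
  </record>
for $old in db:open('big-docs')//record[@id = '20412518']
return replace node $old with $new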
Here is my use case: 80 million documents partitioned into a few collections, with about 400,000 documents inserted or replaced each week. Because of the previous limitation of the document list, I had to use the XQuery Update strategy, aggregating documents into big documents. In the end I spend more time updating than reindexing, because I have to update all the sub-documents of each collection at once in order to use the indexes.
The new document data structure is very good news!
Best regards, Fabrice Etanchaud Questel/Orbit
From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On behalf of Marco Lettere Sent: Tuesday, September 23, 2014 15:40 To: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Adding documents slows over time
Hi Gerald, not sure but take into account that, AFAIK, there are limitations on the size (number of nodes) that can be kept in a single DB. M.
Hi Fabrice,
If you update your collection per document, you can use the replace command instead of XQuery Update and be free of pending update list limitations.
I would be interested to know what limitations you have observed so far.
Christian, from what I read in the last exchanges, the document index is now a persistent data structure?
Exactly. After it has been requested for the first time, it will additionally be stored on disk and updated incrementally. I would be interested to have your feedback on the latest snapshot.
Christian
-----Original Message----- From: Fabrice Etanchaud Sent: Tuesday, September 23, 2014 18:00 To: 'Christian Grün' Subject: RE: [basex-talk] Adding documents slows over time
Dear Christian,
In our old tests, we found that in a collection with several million documents, opening the collection or replacing a document was very slow.
In the latest snapshot, could you tell us how to use the index on document names? Given 10,000,000 documents named $i.xml, each containing <xml>{$i}</xml>, we found that the text index is 470x faster than the document index:
Compiling:
- pre-evaluating (7000001 to 7001000)
Query: for $i in 7000001 to 7001000 return db:open('docs', xs:string($i) || '.xml')
Optimized Query: for $i_0 in (7000001 to 7001000) return db:open("docs", fn:concat($i_0 cast as xs:string, ".xml"))
Result:
- Hit(s): 1000 Items
- Updated: 0 Items
- Printed: 19500 Bytes
- Read Locking: local [docs]
- Write Locking: none
Timing:
- Parsing: 0.91 ms
- Compiling: 0.24 ms
- Evaluating: 68514.39 ms
- Printing: 1.61 ms
- Total Time: 68517.16 ms
Compiling:
- pre-evaluating (7000001 to 7001000)
Query: for $i in 7000001 to 7001000 return db:text('docs', xs:string($i))/root()
Optimized Query: for $i_0 in (7000001 to 7001000) return db:text("docs", $i_0 cast as xs:string)/fn:root()
Result:
- Hit(s): 1000 Items
- Updated: 0 Items
- Printed: 19500 Bytes
- Read Locking: local [docs]
- Write Locking: none
Timing:
- Parsing: 2.62 ms
- Compiling: 0.23 ms
- Evaluating: 143.72 ms
- Printing: 1.59 ms
- Total Time: 148.16 ms
In the latest snapshot, could you tell us how to use the index on document names?
The index should be created automatically after having run your first path-based query; subsequent queries should give you better results.
Dear Christian, By path-based query, do you mean db:open or collection calls? These requests are very slow; it's as if the document list were not indexed at all.
Best regards,
By path-based query, do you mean db:open or collection calls?
It should apply to both.
These requests are very slow; it's as if the document list were not indexed at all.
Have you already tried 8.0? If yes, you should find a "doc.basex" file in your database directory after running your query.
Gerald,
I'm glad to tell you that the latest snapshot [1] contains some additional optimizations for adding documents with namespaces. It should now be irrelevant whether your added document has a namespace on top or not.
I'll be offline for some days (and I hope I didn't introduce a bad bug with the latest commit ;).
Have fun, Christian
[1] http://files.basex.org/releases/latest
Wonderful, Christian! Thanks. I will try it out.