Hello guys,
We came across the following error when we imported hundreds of thousands of xml documents.
>
Traceback (most recent call last):
  File "load.py", line 24, in <module>
    main(sys.argv)
  File "load.py", line 21, in main
    loadzip(session, path)
  File "load.py", line 12, in loadzip
    loadxml(session, info.filename, stream.read())
  File "load.py", line 6, in loadxml
    session.replace(path, xml)
  File "/home/mwp/loader/BaseXClient.py", line 118, in replace
    self.sendInput(12, path, content)
  File "/home/mwp/loader/BaseXClient.py", line 205, in sendInput
    raise IOError(self.info())
IOError: Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 7.2
Java: Sun Microsystems Inc., 1.6.0_23
OS: Linux, amd64
Stack Trace:
java.lang.ArrayIndexOutOfBoundsException: 2147483647
org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:401)
org.basex.io.random.TableDiskAccess.insert(TableDiskAccess.java:278)
org.basex.data.Data.insert(Data.java:970)
org.basex.data.Data.insert(Data.java:744)
org.basex.core.cmd.Add.run(Add.java:125)
org.basex.core.Command.run(Command.java:345)
org.basex.core.Command.run(Command.java:115)
org.basex.core.cmd.Replace.run(Replace.java:64)
org.basex.core.Command.run(Command.java:345)
org.basex.core.Command.exec(Command.java:324)
org.basex.core.Command.execute(Command.java:76)
org.basex.core.Command.execute(Command.java:88)
org.basex.server.ClientListener.execute(ClientListener.java:368)
org.basex.server.ClientListener.replace(ClientListener.java:347)
org.basex.server.ClientListener.run(ClientListener.java:150)
>
Our documents all have the same structure and most of them were imported successfully, so a problem in the data itself seems unlikely.
Can I ask what this error actually indicates and what caused it?
Our data is under contract, so it is difficult to provide it directly. But if you have any questions about the data, don't hesitate to ask us.
Best regards, Kento
Hi Tarui,
From the backtrace, it looks like BaseX itself, not the Python client, got into a broken state.
I would suggest running the same action with the basex command-line client to see whether the problem occurs there too. What type of action are you trying to perform? Is it adding documents to the database? Could you create another sample with publicly usable data that shows what you are actually trying to do (even if it does not crash)?
With best regards
Jan
*Ing. Jan Vlčinský* TamTam Research s.r.o. Slunečnicová 338/3, 734 01 Karviná Ráj, Czech Republic tel: +420-597 602 024; mob: +420-608 979 040 skype: janvlcinsky; GoogleTalk: jan.vlcinsky@gmail.com http://cz.linkedin.com/in/vlcinsky
On 23 April 2012 06:32, Tarui, Kento ktarui@qualcomm.com wrote:
*snip*
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Hi Jan,
Thank you very much for your reply. We were simply trying to add a large batch of XML files to a database.
I'll create some samples for you. In the meantime, here is the script that raised the IOError.
>>
import BaseXClient
import sys
import zipfile

def loadxml(session, path, xml):
    session.replace(path, xml)

def loadzip(session, path):
    archive = zipfile.ZipFile(path)
    for info in archive.infolist():
        stream = archive.open(info)
        loadxml(session, info.filename, stream.read())
        stream.close()
    archive.close()

def main(argv):
    session = BaseXClient.Session('localhost', 1984, 'admin', 'admin')
    session.execute("open {0}".format(argv[1]))
    for path in argv[2:]:
        print path
        loadzip(session, path)
    session.close()

main(sys.argv)
>>
Thanks, Kento
From: jan.vlcinsky@gmail.com [mailto:jan.vlcinsky@gmail.com] On Behalf Of Jan Vlčinský (TamTam Research) Sent: Monday, April 23, 2012 4:55 PM To: Tarui, Kento Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseXClient.py raised IOError
*snip*
I most often see ArrayIndexOutOfBoundsException when the same database is being accessed from two instances at once (for instance, both by the BaseX GUI and by a BaseX server instance). Is there any chance that could be the case here?
On 04/22/2012 11:32 PM, Tarui, Kento wrote:
java.lang.ArrayIndexOutOfBoundsException: 2147483647
org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:401)
org.basex.io.random.TableDiskAccess.insert(TableDiskAccess.java:278)
org.basex.data.Data.insert(Data.java:970)
org.basex.data.Data.insert(Data.java:744)
org.basex.core.cmd.Add.run(Add.java:125)
org.basex.core.Command.run(Command.java:345)
org.basex.core.Command.run(Command.java:115)
org.basex.core.cmd.Replace.run(Replace.java:64)
org.basex.core.Command.run(Command.java:345)
org.basex.core.Command.exec(Command.java:324)
org.basex.core.Command.execute(Command.java:76)
org.basex.core.Command.execute(Command.java:88)
org.basex.server.ClientListener.execute(ClientListener.java:368)
org.basex.server.ClientListener.replace(ClientListener.java:347)
org.basex.server.ClientListener.run(ClientListener.java:150)
Hi Kento,
what's the total size of your XML documents? As Mattijs indicated, it may be that you have reached the id limit of 2^31 entries. In this case, you can distribute your data to multiple database instances, all of which can be queried by a single XQuery.
Our documentation contains some statistics on large databases that have been created with BaseX [1].
Hope this helps, your feedback is welcome, Christian
[1] http://docs.basex.org/wiki/Statistics
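The idea of spreading documents over several databases and still querying them together could be sketched roughly as follows. This is only an illustration, not something taken from the BaseX documentation: the `docs_` prefix, the batch size, the `//title` path and both helper names are hypothetical, and the generated query assumes that the `db:open` function of the BaseX Database Module is available in the installed version.

```python
def shard_name(doc_index, docs_per_db, prefix="docs_"):
    """Return the (hypothetical) database name for the n-th document."""
    return "{0}{1}".format(prefix, doc_index // docs_per_db)


def union_query(db_names, path_expr):
    """Build one XQuery string that evaluates path_expr in every database."""
    names = ", ".join('"{0}"'.format(name) for name in db_names)
    return "for $db in ({0}) return db:open($db){1}".format(names, path_expr)


if __name__ == "__main__":
    # 250,000 documents at 100,000 per database end up in three databases.
    names = sorted({shard_name(i, 100000) for i in range(0, 250000, 50000)})
    print(names)
    print(union_query(names, "//title"))
```

Assigning documents by a running index keeps each database at a predictable size; hashing the document path would work as well if documents arrive in no particular order.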
On Mon, Apr 23, 2012 at 5:30 PM, Mattijs Ugen m.ugen@student.utwente.nl wrote:
Is it a coincidence that this value is exactly 2^31 - 1 (Integer.MAX_VALUE)?
java.lang.ArrayIndexOutOfBoundsException: 2147483647
*snip*
Mattijs
Hi Christian,
Thank you very much for the reply.
It makes sense if the node limit is the problem, although we couldn't check the actual database info because the database was broken by then :-) Our data consists of hundreds of thousands of XML documents, and each of them has several thousand nodes. The error occurred while we were adding them to a database.
Thanks, Kento
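A quick back-of-the-envelope check shows how figures of this order can approach the limit discussed in this thread. The sketch below assumes that the limit applies to the total number of nodes in one database; the document and node counts are illustrative only, since the real numbers were not given.

```python
INT_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE, the index shown in the exception


def estimated_nodes(documents, nodes_per_document):
    """Rough total node count for a collection of similarly sized documents."""
    return documents * nodes_per_document


if __name__ == "__main__":
    # Illustrative figures only; the real counts were not given in the thread.
    for docs, nodes in [(300000, 3000), (500000, 5000)]:
        total = estimated_nodes(docs, nodes)
        print(docs, nodes, total, total > INT_MAX)
```

With 300,000 documents of 3,000 nodes each the total stays well below the limit, while 500,000 documents of 5,000 nodes each would exceed it.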
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Tuesday, April 24, 2012 1:37 AM To: Tarui, Kento Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseXClient.py raised IOError
*snip*
Kento san,
> Our data consists of hundreds of thousands of XML documents, and each of them has several thousand nodes. The error occurred while we added it to a database.
thanks for the feedback. In this case, you probably need to store your data in several database instances. As "databases" are pretty light-weight data structures in BaseX [1], we may rename them to "collections", and add another indirection in a future version to increase the maximum number of supported documents (nodes) per database.
Hope this helps, Christian
A short PS regarding your import script: the "add" command will be faster than "replace" in most cases. Also, when doing bulk updates, it is recommended to turn off the "autoflush" option [2] to improve performance.
[1] http://docs.basex.org/wiki/Databases
[2] http://docs.basex.org/wiki/Options#AUTOFLUSH
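Applied to the import script from this thread, the two suggestions might look roughly like the sketch below. It is written against assumptions rather than a verified API: the session object is assumed to offer `execute(command)` and `add(path, content)` the way `BaseXClient.Session` is believed to, and the explicit `flush` command at the end is assumed to exist in the installed BaseX version. The function name `bulk_add` is hypothetical.

```python
import zipfile


def bulk_add(session, db_name, archive_paths):
    """Add every file in the given zip archives to one database.

    `session` is assumed to offer execute(command) and add(path, content);
    check this against the BaseXClient.py that ships with your version.
    """
    session.execute("open {0}".format(db_name))
    # Defer writing buffers to disk while the bulk load is running.
    session.execute("set autoflush false")
    try:
        for archive_path in archive_paths:
            with zipfile.ZipFile(archive_path) as archive:
                for info in archive.infolist():
                    # "add" instead of "replace": no lookup of an existing document.
                    session.add(info.filename, archive.read(info))
    finally:
        # Assumed FLUSH command: write the buffers once, even after an error.
        session.execute("flush")
```

A call such as `bulk_add(session, "mydb", ["part1.zip", "part2.zip"])` (hypothetical names) would replace the `main` loop of the original script. Note that "add" does not overwrite a document that already exists under the same path, so it only suits an initial load.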
Hi Christian,
Thank you so much for the feedback. Yes, we should try using several separate "collections" instead.
Your advice for our script is very informative, too. Thanks again.
Regards, Kento
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Tuesday, April 24, 2012 11:40 AM To: Tarui, Kento Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseXClient.py raised IOError
*snip*