Hello,
Unfortunately I am unable to use BaseX (6.5) as I keep getting data corruption. I run BaseX on a dedicated server.
The xml-datafiles I insert into the database have the following layout:

<EEGData bla="x" >
  <signal name="x">
    <timeslot start="2008-07-24T14:22:16">-53.2942688282 -37.5091841347 -35.6337285276 (many more values)</timeslot>
    <!-- more timeslot nodes-->
  </signal>
  <!-- more signals -->
</EEGData>
Running "list" on the client returns the following:

Name     Documents  Size          Path
---------------------------------------------------
fullxml  3789       522153128560  /path/to/xml

This database was created by running the command:

java -cp basex.jar org.basex.BaseXClient -c "create db fullxml /path/to/xml/files/"
The problems start when I execute the "list fullxml" command, which returns:
list fullxml
Possible bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 6.5
Java: Sun Microsystems Inc., 1.6.0_17
OS: Linux, amd64
Stack Trace:
java.lang.ArrayIndexOutOfBoundsException: 48
  org.basex.util.Compress.pull(Compress.java:143)
  org.basex.util.Compress.unpack(Compress.java:103)
  org.basex.data.DiskData.txt(DiskData.java:206)
  org.basex.data.DiskData.text(DiskData.java:173)
  org.basex.core.cmd.ListDB.run(ListDB.java:48)
  org.basex.core.Command.run(Command.java:292)
  org.basex.core.Command.exec(Command.java:274)
  org.basex.core.Command.execute(Command.java:67)
  org.basex.server.ServerProcess.run(ServerProcess.java:172)
I randomly queried some XML documents and they looked correct, but after some searching I found a few problematic ones. When queried, their text nodes contain garbage like this:

<timeslot start="2008-07-25T11:21:46">HLAD IIrHEHAHIimimO Ga 1YI G@OXba O GW O O OI1R? O</timeslot>
Weirdly enough, the attributes are correct.
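For anyone who wants to scan for affected documents, here is a minimal sketch of a sanity check on the text of a timeslot node. It assumes that a healthy node contains nothing but whitespace-separated signed decimal numbers, as in the layout above; the function name looks_like_samples is made up for this example and is not part of BaseX.

```python
import re

# Assumption: valid timeslot text is a run of whitespace-separated signed
# decimals such as "-53.2942688282". Exponent notation is not accepted here.
_NUMBER = re.compile(r"[+-]?\d+(\.\d+)?")


def looks_like_samples(text):
    """Return True if every token in the text parses as a plain decimal number."""
    tokens = text.split()
    # An empty node is treated as suspicious as well.
    return bool(tokens) and all(_NUMBER.fullmatch(token) for token in tokens)
```

Applied to the serialized text of each timeslot, this would flag the garbled node shown above while accepting the numeric samples.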
info storage 230000000 230000010

PRE        DIS   SIZ  ATS  NS  KIND  CONTENT
--------------------------------------------------------------------------------------------
230000000  1     1    1    0   ATTR  start="2006-09-05T15:21:22"
230000001  2     1    1    0   TEXT  4EpfL AchD KIAEAEA ABAH IIGRmI I 9mr0LHIKDIIII OIKA
230000002  1698  3    2    0   ELEM  timeslot
230000003  1     1    1    0   ATTR  start="2006-09-05T15:21:23"
230000004  2     1    1    0   TEXT  yoio i o gcyg@hBP i gA kd o RrP kbad n gJua a3kd gA o
230000005  1701  3    2    0   ELEM  timeslot
230000006  1     1    1    0   ATTR  start="2006-09-05T15:21:24"
230000007  2     1    1    0   TEXT  4EUAHIO rRjRNJNJNLATO N iYOIO KcHADIKTO NIGXLANAIAD
230000008  1704  3    2    0   ELEM  timeslot
230000009  1     1    1    0   ATTR  start="2006-09-05T15:21:25"
230000010  2     1    1    0   TEXT  mIIKEAH GXDA ACIIIO GaIimO
Retrieving all timeslot nodes within this document is not possible and results in an error:

Error: Possible bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 6.5
Java: Sun Microsystems Inc., 1.6.0_17
OS: Linux, amd64
Stack Trace:
java.lang.ArrayIndexOutOfBoundsException: 32
  org.basex.util.Compress.pull(Compress.java:127)
  org.basex.util.Compress.unpack(Compress.java:112)
  org.basex.data.DiskData.txt(DiskData.java:206)
  org.basex.data.DiskData.text(DiskData.java:173)
  org.basex.data.Serializer.node(Serializer.java:298)
  org.basex.data.Serializer.node(Serializer.java:259)
  org.basex.query.item.DBNode.serialize(DBNode.java:110)
  org.basex.query.item.FElem.serialize(FElem.java:245)
  org.basex.core.cmd.AQuery.query(AQuery.java:97)
  org.basex.core.cmd.XQuery.run(XQuery.java:22)
  org.basex.core.Command.run(Command.java:292)
  org.basex.core.Command.exec(Command.java:274)
  org.basex.core.Command.execute(Command.java:67)
  org.basex.server.ServerProcess.run(ServerProcess.java:172)
Furthermore, I cannot delete the problematic documents:
delete a0008788.edf.xml
Possible bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 6.5
Java: Sun Microsystems Inc., 1.6.0_17
OS: Linux, amd64
Stack Trace:
java.lang.ArrayIndexOutOfBoundsException: 32
  org.basex.util.Compress.pull(Compress.java:143)
  org.basex.util.Compress.unpack(Compress.java:107)
  org.basex.data.DiskData.txt(DiskData.java:206)
  org.basex.data.DiskData.text(DiskData.java:173)
  org.basex.core.cmd.Delete.run(Delete.java:36)
  org.basex.core.Command.run(Command.java:292)
  org.basex.core.Command.exec(Command.java:274)
  org.basex.core.Command.execute(Command.java:67)
  org.basex.server.ServerProcess.run(ServerProcess.java:172)
When running "info storage" on a document given a certain query, I also get an exception:
info storage db:open("fullxml/a0008788.edf.xml")/EEGData/signal[position()=1]/timeslot[position()=1]

Possible bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 6.5
Java: Sun Microsystems Inc., 1.6.0_17
OS: Linux, amd64
Stack Trace:
java.lang.NullPointerException
  org.basex.data.DataPaths.find(DataPaths.java:93)
  org.basex.data.DataPaths.doc(DataPaths.java:79)
  org.basex.data.Data.doc(Data.java:213)
  org.basex.query.func.FNDb.open(FNDb.java:88)
  org.basex.query.func.FNDb.iter(FNDb.java:52)
  org.basex.query.QueryContext.iter(QueryContext.java:304)
  org.basex.query.expr.ParseExpr.value(ParseExpr.java:73)
  org.basex.query.func.Fun.comp(Fun.java:47)
  org.basex.query.path.Path.comp(Path.java:40)
  org.basex.query.QueryContext.compile(QueryContext.java:206)
  org.basex.query.QueryProcessor.compile(QueryProcessor.java:82)
  org.basex.query.QueryProcessor.execute(QueryProcessor.java:103)
  org.basex.query.QueryProcessor.queryNodes(QueryProcessor.java:182)
  org.basex.core.cmd.AQuery.queryNodes(AQuery.java:148)
  org.basex.core.cmd.InfoStorage.run(InfoStorage.java:41)
  org.basex.core.Command.run(Command.java:292)
  org.basex.core.Command.exec(Command.java:274)
  org.basex.core.Command.execute(Command.java:67)
  org.basex.server.ServerProcess.run(ServerProcess.java:172)
The XML document itself is valid and I can import and query it in BaseX on my local machine without a problem. I even dropped the entire database and recreated it, with the same problem on the same file(s). It seems that after a certain amount of data, the database gets corrupted; all documents inserted after this point contain garbage. I also tried to create the database with "intparse" set to ON, but I get the same problem.
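If the observation holds that every document inserted after a certain point is garbled, the first affected document could be located with a binary search instead of checking all 3789 documents. The sketch below assumes exactly that (clean documents first, corrupted ones after); first_corrupted and is_corrupted are hypothetical names, and is_corrupted stands for whatever per-document check is available, for instance a query that fails or returns garbage.

```python
def first_corrupted(count, is_corrupted):
    """Return the index of the first corrupted document, or count if none is.

    Assumption: documents are numbered 0..count-1 in insertion order, and once
    one document is corrupted, all later ones are corrupted too.
    """
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        if is_corrupted(mid):
            hi = mid          # the boundary is at mid or earlier
        else:
            lo = mid + 1      # the boundary is after mid
    return lo
```

With 3789 documents this needs about a dozen checks. If the corruption turns out not to be contiguous, the result only gives one boundary and the assumption has to be dropped.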
Regards,
Elmer
Hi Elmer,
thanks for the report. Could you provide us with the original data? This way, we might be able to find out if this problem is related to the data compression algorithms, or the client/server architecture (…if we manage to reproduce the problem).
Cheers,
Christian
___________________________
Christian Grün
Uni KN, Box 188
78457 Konstanz, Germany
http://www.inf.uni-konstanz.de/~gruen
On Thu, Feb 10, 2011 at 1:28 PM, e.e.h.lastdrager@student.utwente.nl wrote:
Hi Christian,
The dataset is over 450GB and it contains privacy-sensitive data, both of which make sharing difficult.
I am, however, currently trying to insert the data again, this time by running

java -cp basex.jar org.basex.BaseXServer -c "create db fullxml /xml/"

to circumvent the client/server layer.
Regards, Elmer
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Thursday, February 10, 2011 1:39 PM To: Lastdrager, E.E.H. (Elmer, Student B-TI,M-CSC) Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] data corruption
Yes, 450 GB is quite a lot for one database instance. I'm still wondering about the compression issue, though: I would have expected a warning that the data might be too large to be stored in one database instance.

I'd recommend splitting the data into several snippets; this might ease the process. As for querying the data, you can address several databases with a single XQuery expression.
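As a rough illustration of this suggestion, the sketch below groups input files into batches below a chosen size (one batch per database) and builds a single XQuery expression over several databases. The size limit, the database names eeg1/eeg2 and both function names are made up for this example; only db:open() and the timeslot element come from this thread, and the function may be named differently in other BaseX versions.

```python
def plan_batches(files, max_bytes):
    """Greedily group (path, size) pairs so that each batch stays below max_bytes.

    A single file larger than max_bytes still ends up in a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for path, size in files:
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


def query_over(databases, path):
    """Build one XQuery expression that applies a path to several databases."""
    opened = ", ".join('db:open("%s")' % name for name in databases)
    return "(%s)%s" % (opened, path)
```

For example, query_over(["eeg1", "eeg2"], "//timeslot") yields (db:open("eeg1"), db:open("eeg2"))//timeslot, which can be sent as one query once each batch has been created as its own database.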
Good luck, keep us updated, Christian
basex-talk@mailman.uni-konstanz.de