Hello Christian,
Thanks for your answer. I managed to solve the problem using the
latest snapshot, but there are a few issues and notes I want to share.
First, it seems impossible to change the parser options, both in
7.7.2 and in 7.8 beta (at least I saw no change in behaviour).
I'm running BaseX using the bin/basexhttp script. If I change the
INTPARSE or DTD option using bin/basexclient, the options are restored
to their defaults when the server is restarted; I'm not sure whether
this is the desired behaviour. But even without a restart, I can't get
the XML files in question parsed in 7.7.2.
The second note is that the latest snapshot has some serious
concurrency issues which 7.7.2 doesn't have.
I am using a Node.js environment to PUT around 10,000 XML files into
the database. If I start those PUT requests all at once (I have no idea
how Node queues them internally or whether it fires them all at once
on the network), I get the following exception after a few successful
PUTs with the latest snapshot:
Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 7.8 beta 4cfa54c
Java: Oracle Corporation, 1.7.0_25
OS: Linux, amd64
Stack Trace:
java.lang.RuntimeException: Data Access out of bounds:
- pre value: 1950001
- #used blocks: 7618
- #total locks: 7618
- access: 7617 (7618 > 7617]
at org.basex.util.Util.notExpected(Util.java:53)
at org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:508)
at org.basex.io.random.TableDiskAccess.read5(TableDiskAccess.java:216)
at org.basex.data.Data.textOff(Data.java:422)
at org.basex.data.DiskData.text(DiskData.java:234)
at org.basex.core.cmd.List.listDB(List.java:132)
at org.basex.core.cmd.List.run(List.java:50)
at org.basex.core.Command.run(Command.java:329)
at org.basex.http.rest.RESTCmd.run(RESTCmd.java:93)
at org.basex.http.rest.RESTCmd.run(RESTCmd.java:82)
at org.basex.http.rest.RESTRetrieve.run0(RESTRetrieve.java:51)
at org.basex.http.rest.RESTCmd.run(RESTCmd.java:61)
at org.basex.core.Command.run(Command.java:329)
at org.basex.core.Command.execute(Command.java:94)
at org.basex.core.Command.execute(Command.java:117)
at org.basex.http.rest.RESTServlet.run(RESTServlet.java:21)
at org.basex.http.BaseXServlet.service(BaseXServlet.java:58)
....
Sometimes the collection is not even accessible via GET afterwards
(other collections are).
However, PUTting the XML files one by one, waiting for each response
before sending the next request, works fine.
Since 7.7.2 doesn't have this issue, could this be a regression?
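A minimal sketch of the one-by-one workaround, generalized to a bounded number of requests in flight. It is TypeScript for Node.js 18+ (for the global fetch). The helper names runWithLimit and putDocument are hypothetical, and the server URL http://localhost:8984, the /rest/<db>/<path> PUT endpoint and the admin/admin credentials are assumptions about a default basexhttp setup, not details taken from this thread. Limiting concurrency only sidesteps the exception on the client side; it does not fix whatever goes wrong in the snapshot.

```typescript
// Sketch: store XML files through the BaseX REST API with bounded concurrency.
// Assumptions (not from this thread): Node.js 18+ for the global fetch,
// basexhttp listening on http://localhost:8984, REST PUT on /rest/<db>/<path>.

/** Runs `task` over `items`, keeping at most `limit` tasks in flight. */
async function runWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item that has not been started yet
  // Each worker claims the next unstarted item and awaits it before claiming
  // another, so at most `limit` tasks run at once. Node runs this code on a
  // single thread, so no index can be claimed twice. If a task throws, the
  // returned promise rejects while the other workers keep going.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  };
  const workers = Array.from({ length: Math.min(limit, items.length) }, worker);
  await Promise.all(workers);
  return results;
}

/** Hypothetical helper: stores one document with a REST PUT request. */
async function putDocument(
  baseUrl: string, // assumption, e.g. "http://localhost:8984"
  db: string,
  name: string,
  xml: string,
  basicAuth: string, // base64 of "user:password"
): Promise<number> {
  const url = `${baseUrl}/rest/${encodeURIComponent(db)}/${encodeURIComponent(name)}`;
  const res = await fetch(url, {
    method: "PUT",
    headers: { Authorization: `Basic ${basicAuth}`, "Content-Type": "application/xml" },
    body: xml,
  });
  if (!res.ok) {
    throw new Error(`PUT ${name} failed with HTTP ${res.status}`);
  }
  return res.status;
}

// Usage (hypothetical database "tei" and file list `files`):
// const auth = Buffer.from("admin:admin").toString("base64");
// await runWithLimit(files, 1, (f) => putDocument("http://localhost:8984", "tei", f.name, f.xml, auth));
```

With a limit of 1, each PUT waits for the previous response, which is the sequential approach that worked above; a small limit such as 4 could be tried once the server-side issue is resolved.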
Best,
Martin
On 28.01.2014 23:59, Christian Grün wrote:
An update: I noticed that external entity references were resolved by
the parser even if DTD parsing was switched off, leading to long
waiting times. The issue is resolved in the very latest snapshot, both
with the internal and Java’s default parser. If you still want to
parse all entities, simply activate DTD parsing.
On Tue, Jan 28, 2014 at 6:44 PM, Christian Grün
<christian.gruen@gmail.com> wrote:
Hi Martin,
thanks for your feedback. The problem should be solved with Version
7.8 of BaseX. The official version will be out soon, but you are
invited to check out the latest stable snapshot [1].
If you want to use BaseX 7.7.2, you can also switch to Java’s default
parser (via SET INTPARSE false, or by deactivating "Use internal XML
parser" in the "Database" → "New…" dialog and the "Parsing" tab).
Hope this helps,
Christian
[1] http://files.basex.org/releases/latest/
On Tue, Jan 28, 2014 at 6:36 PM, Martin Reckziegel
<reckziegel@informatik.uni-leipzig.de> wrote:
Hello everybody,
I'm using BaseX 7.7.2 in a university-based project. I'm trying to store TEI
XML files in the database, but storing certain valid files fails with an error.
Using a REST PUT request to store a file that starts like this:
<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main DTD Driver File//EN"
"http://www.tei-c.org/Guidelines/DTD/tei2.dtd" [
<!ENTITY % TEI.XML "INCLUDE">
<!ENTITY % PersProse PUBLIC "-//Perseus P4//DTD Perseus Prose//EN"
"http://www.perseus.tufts.edu/DTD/1.0/PersProse.dtd" >
%PersProse;
]>
<TEI.2>
<teiHeader type="text" status="new">
....
results in this error:
"tlg0003.xml.xml" (Line 5): ']' expected, '<' found.
(Line 5 is %PersProse;)
I have no idea how to interpret the error, since none of the mentioned
characters appear in that line. Maybe it is the result of some internal
replacement?
Deleting line 5 makes the error go away, but of course that does not solve my
problem, since I don't want to alter the files.
The problematic files are all valid, at least according to
http://www.validome.org/xml/validate/ and http://validator.w3.org/check, so I
wonder why BaseX rejects them.
Kind regards,
Martin Reckziegel
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk