Hi Christian,
One idea from me: could it be a buffer problem (an overlap issue)? QUERY_1 produces around 2.1 GB of data. With the same query, the angle bracket is always missing on the same entity at the same (byte) offset in the result. Maybe an output buffer on the server side or an input buffer on the client side has a bug?
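To test the buffer idea, a small check like the following could show whether the byte offset of the missing bracket sits on or near a multiple of a typical buffer size. It is only a sketch: the class name `BufferBoundaryCheck` and the candidate sizes are assumptions chosen for illustration, not values taken from the BaseX code.

```java
final class BufferBoundaryCheck {

    // Candidate buffer sizes in bytes. These are guesses for illustration,
    // not sizes known to be used by the BaseX server or client.
    static final int[] CANDIDATE_SIZES = {1 << 12, 1 << 13, 1 << 16};

    /** Returns the distance in bytes from offset to the nearest multiple of bufferSize. */
    static long distanceToBoundary(long offset, int bufferSize) {
        long remainder = offset % bufferSize;
        return Math.min(remainder, bufferSize - remainder);
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java BufferBoundaryCheck <byte offset of the missing bracket>");
            return;
        }
        long offset = Long.parseLong(args[0]);
        for (int size : CANDIDATE_SIZES) {
            System.out.printf("buffer size %d: %d bytes from the nearest boundary%n",
                    size, distanceToBoundary(offset, size));
        }
    }
}
```

A distance of 0 or 1 for one of the sizes would support the buffer theory; large distances for all of them would not rule it out, since the real buffer size may differ.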
What do you mean by "serialization API"? For connecting and querying I use the Java API:
- org.basex.server.ClientSession
- org.basex.server.ClientQuery
- while(query.more())
- query.next()
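To find the first broken item on the client side, each string returned by query.next() could be run through the JDK's own XML parser while counting the bytes seen so far. The sketch below assumes that every returned item is one complete <pv> element; it takes a plain Iterator<String> as a stand-in for the query loop, so it runs without BaseX. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class ResultItemChecker {

    /**
     * Returns the UTF-8 byte offset at which the first malformed item starts,
     * or -1 if every item is well-formed. Offsets count item bytes only;
     * any separators the client adds between items are not included.
     */
    static long firstMalformedOffset(Iterator<String> items) {
        DocumentBuilder builder;
        try {
            builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());

        long offset = 0;
        while (items.hasNext()) {
            String item = items.next();
            try {
                builder.parse(new InputSource(new StringReader(item)));
            } catch (SAXException e) {
                return offset; // this item is not well-formed XML
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            offset += item.getBytes(StandardCharsets.UTF_8).length;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stand-in data; in the real client these strings would come from query.next().
        Iterator<String> items = Arrays.asList(
                "<pv><utid>abc</utid><values/></pv>",
                "<pv><utid>xyz</utid><values><value<</values></pv>").iterator();
        System.out.println(firstMalformedOffset(items));
    }
}
```

If the reported offset is identical across runs of QUERY_1, that would confirm the corruption is deterministic and give a concrete position to compare against buffer sizes.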
regards Sven
On 03.11.2011 13:54, Christian Grün wrote:
Hi Sven,
I'm not sure what to think about this issue. This could also be a serialization problem, as angle brackets (<, >) are not stored in the database at all. Which API do you use to serialize the results, and have you tried another API as well?
Christian
On Thu, Nov 3, 2011 at 1:38 PM, Sven Rega <sven.rega@gmx.de> wrote:
Hi Christian,
I tested the import with both the SAX parser and the internal parser, and the results are corrupt in both cases. Process: import data from one large XML file (2.5 GB) into an empty database.
My data:
<root>
  <article ID="xys">
    <values>
      ....
      <value ..>.....</value>
      <value ..>.....</value>
      <value ID="abcdefg" refcontent="other article">&t;b>System requirements: </b></value>
    </values>
  </article>
</root>
QUERY_1: for $n in //article[(@ID='Article' or @ID='Other')]/values return <pv><utid>{data($n/../@ID)}</utid>{$n}</pv>
QUERY_2: for $n in //article[(@ID='Article' or @ID='Other') and ID="abc"]/values return <pv><utid>{data($n/../@ID)}</utid>{$n}</pv>
If I run QUERY_1, I get one corrupt result, which looks like this:
<pv>
  <utid>abc</utid>
  ..
  <value ...> ......<value     <----- here the closing ">" is missing
  ..
</pv>
But if I use QUERY_2, which grabs only the one specific article with the corrupt data, I get a correct result, without the missing closing ">".
So the DB contains all the correct information, but the query over all articles (QUERY_1) returns corrupt data.
Do you have any other ideas about the cause of this problem/bug? Do you know which parts of the code involved in this problem changed between versions 6.7.1 and 7.0.1?
regards Sven
On 02.11.2011 18:20, Christian Grün wrote:
Hi Sven,
Currently I am trying BaseX version 7.0.1. Speed/performance is no longer the problem (it is very fast now :-) )
..always nice to hear..
but I get a problem with my XML data, and I think it is a heisenbug.
..not that nice to hear ;)..
What am I doing wrong? Or is this a bug? Why was this problem never seen in BaseX version 6.7.1?
Since version 7.0, we have used our internal XML parser as the default parser. It would be interesting to hear what happens if you switch back to the Java XML parser (see [1] for details).
Best, Christian