Hi Christian,
I tested the import with both the SAX parser and the internal parser, and the results are corrupt in both cases. Process: import the data from one large XML file (2.5 GB) into one empty database.
My data: <root> <article ID="xys"> <values> .... <value ..>.....</value> <value ..>.....</value> <value ID="abcdefg" refcontent="other article">&lt;b>System requirements: </b></value> </values> </article> </root>
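Before suspecting the importer, it may help to confirm that the 2.5 GB input file is well-formed on its own. The following is a minimal sketch, written in Python rather than in BaseX's own Java, that streams a file through the standard-library SAX parser (so memory use stays flat) and either counts the <article> elements or reports the first well-formedness error. The names check_well_formed and ArticleCounter are hypothetical helpers for illustration, not part of BaseX.

```python
import io
import xml.sax


class ArticleCounter(xml.sax.ContentHandler):
    """Counts <article> elements while the parser streams through the input."""

    def __init__(self):
        super().__init__()
        self.articles = 0

    def startElement(self, name, attrs):
        if name == "article":
            self.articles += 1


def check_well_formed(source):
    """Stream-parse `source` (a file path or binary file object) with SAX.

    Returns (True, article_count) if the document is well-formed, or
    (False, error_message) at the first well-formedness error. Nothing is
    kept in memory except the counter, so multi-GB files are fine.
    """
    handler = ArticleCounter()
    try:
        xml.sax.parse(source, handler)
    except xml.sax.SAXParseException as err:
        return False, (f"line {err.getLineNumber()}, "
                       f"column {err.getColumnNumber()}: {err.getMessage()}")
    return True, handler.articles


if __name__ == "__main__":
    # Shortened version of the sample data above (escaped <b> markup as text).
    sample = (b'<root><article ID="xys"><values>'
              b'<value ID="abcdefg" refcontent="other article">'
              b'&lt;b>System requirements: &lt;/b></value>'
              b'</values></article></root>')
    print(check_well_formed(io.BytesIO(sample)))  # prints (True, 1)
```

For the real file, check_well_formed("path/to/file.xml") can be called with a path instead of a BytesIO object; the path here is a placeholder.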
QUERY_1: for $n in //article[(@ID='Article' or @ID='Other')]/values return <pv><utid>{data($n/../@ID)}</utid>{$n}</pv>
QUERY_2: for $n in //article[(@ID='Article' or @ID='Other') and ID="abc"]/values return <pv><utid>{data($n/../@ID)}</utid>{$n}</pv>
If I run QUERY_1, I get one corrupt result, which looks like this: <pv> <utid>abc</utid> .. <value ...> ...... <value <----- here the closing ">" is missing .. </pv>
But if I use QUERY_2, which selects only the one article with the corrupt data, I get a correct result, and the closing ">" is not missing.
So the database contains all the correct information, but QUERY_1, which runs over all articles, returns corrupt data.
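To narrow down which results of QUERY_1 are damaged, one could save its serialized output and check every <pv> fragment separately. The sketch below assumes each result is serialized as <pv><utid>...</utid>...</pv>, exactly as QUERY_1 builds it, and that <pv> elements are never nested; find_corrupt_results is a hypothetical helper, not a BaseX function. A truncated start tag such as the "<value" above makes its fragment fail to parse, and the utid of that fragment is reported.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: <pv> elements are not nested, so a non-greedy match
# from <pv> to the next </pv> isolates exactly one query result.
PV_PATTERN = re.compile(r"<pv>.*?</pv>", re.DOTALL)
UTID_PATTERN = re.compile(r"<utid>(.*?)</utid>", re.DOTALL)


def find_corrupt_results(serialized):
    """Return the utid of every <pv> fragment that is not well-formed XML."""
    corrupt = []
    for match in PV_PATTERN.finditer(serialized):
        fragment = match.group(0)
        try:
            ET.fromstring(fragment)
        except ET.ParseError:
            # The fragment cannot be parsed, so pull the utid out textually.
            utid = UTID_PATTERN.search(fragment)
            corrupt.append(utid.group(1) if utid else "<unknown>")
    return corrupt


if __name__ == "__main__":
    output = (
        '<pv><utid>ok</utid><values><value ID="a">x</value></values></pv>\n'
        '<pv><utid>abc</utid><values><value ID="b">x</value>'
        '<value </values></pv>'
    )
    print(find_corrupt_results(output))  # prints ['abc']
```

Comparing the utids reported for the QUERY_1 output with the result of QUERY_2 for the same article would show whether the damage only appears when many articles are serialized together.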
Do you have any other ideas about the cause of this problem? Do you know which parts of the code that changed between versions 6.7.1 and 7.0.1 could be involved?
Regards, Sven
On 02.11.2011 18:20, Christian Grün wrote:
Hi Sven,
> I am currently trying BaseX version 7.0.1. Speed is no longer a problem (it is very fast now :-) )
..always nice to hear..
> but I have a problem with my XML data, and I think it is a heisenbug.
..not that nice to hear ;)..
> What am I doing wrong? Or is this a bug? Why was this problem never seen in BaseX version 6.7.1?
Since version 7.0, we use our internal XML parser as the default parser. It would be interesting to hear what happens if you switch back to the Java XML parser (see [1] for details).
Best, Christian