Hello all,
I have a huge extraction of XML documents in a single file, without a root element, something along these lines:
<a>
<b/>
</a>
<a>
<b/>
</a>
...
Other than by editing it to add a root element, is there a clever way of creating a BaseX DB from this?
Thanks for any pointers,
C.
________________________________
Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The Netherlands, Registration No. 33156677, Registered in The Netherlands.
Hello Constantine,
as this is not a well-formed XML document, there is no way to add it to BaseX as it stands. You could write an XQuery that reads the file as text and splits it into well-formed XML documents (you could also do this with an external text-processing tool such as sed).
As you mentioned, adding a root node is the easiest option.
Cheers Dirk
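The root-node option can be done without opening the file in an editor. Below is a minimal Python sketch, not taken from the thread: the function name `wrap_with_root` and the element name `root` are made up for illustration, and it assumes the extraction contains no XML declarations or DOCTYPE lines between the records (those would not be allowed inside a wrapper element).

```python
import shutil


def wrap_with_root(src_path, dst_path, root="root", chunk_size=1 << 20):
    """Copy src_path to dst_path, surrounded by a single wrapper element.

    The input is streamed in chunks, so the file is never held in memory.
    Assumption: the input has no XML declaration or DOCTYPE inside it.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(f"<{root}>\n".encode("utf-8"))   # opening wrapper tag
        shutil.copyfileobj(src, dst, chunk_size)   # stream the records through
        dst.write(f"\n</{root}>\n".encode("utf-8"))  # closing wrapper tag
```

The resulting file has exactly one root element, so it should be loadable like any other XML document.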
On Tue, May 19, 2015 at 11:32 AM, Hondros, Constantine (ELS-AMS) <C.Hondros@elsevier.com> wrote:
Hello all,
I have a huge extraction of XML documents in a single file, without a root element, something along these lines:
<a>
<b/>
</a>
<a>
<b/>
</a>
…
Other than by editing it to add a root element, is there a clever way of creating a Basex DB from this?
Thanks for any pointers,
C.
Thanks, I suspected I would have to tokenise on an end element.
Cheers, Constantine
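Tokenising on an end element could look roughly like the Python sketch below. It is only an illustration: `iter_records` is a hypothetical helper, and it assumes the record element (`a` in the sample) never nests inside itself and that the literal end tag does not occur inside comments or CDATA sections.

```python
def iter_records(stream, end_tag="</a>", chunk_size=1 << 16):
    """Yield one record string per occurrence of end_tag in a text stream.

    The stream is read in chunks; only the text of the record currently
    being assembled is buffered.
    """
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        while True:
            pos = buffer.find(end_tag)
            if pos < 0:
                break  # record is incomplete, read more input
            cut = pos + len(end_tag)
            record = buffer[:cut].strip()
            buffer = buffer[cut:]
            if record:
                yield record
    if buffer.strip():
        # Anything left over is not terminated by end_tag.
        raise ValueError("trailing content after the last end tag")
```

Each yielded string is then a candidate stand-alone document that can be parsed or stored individually.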
From: Dirk Kirsten [mailto:dk@basex.org] Sent: 19 May 2015 11:46 To: Hondros, Constantine (ELS-AMS) Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Create DB from large XML extraction without root element?
Hello Constantine,
as this is not a well-formed XML document, there is no way to add it to BaseX as it stands. You could write an XQuery that reads the file as text and splits it into well-formed XML documents (you could also do this with an external text-processing tool such as sed).
As you mentioned, adding a root node is the easiest option.
Cheers
Dirk
--
Dirk Kirsten, BaseX GmbH, http://basex.org
|-- Firmensitz: Blarerstrasse 56, 78462 Konstanz
|-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
|   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
`-- Phone: 0049 7531 28 28 676, Fax: 0049 7531 20 05 22
Hi Constantine,
you could use some Java code (see the attached file).
By the way, have you already created the database with CHOP=true?
Cheers, Christian
On Tue, May 19, 2015 at 11:58 AM, Hondros, Constantine (ELS-AMS) C.Hondros@elsevier.com wrote:
Thanks, I suspected I would have to tokenise on an end element.
Cheers,
Constantine
Hi Christian,
Thanks, that's very helpful. And no, I haven't yet recreated the database with CHOP=true - I'll let you know the resulting DB size when I have.
Incidentally, it might be of interest what I am doing: I am trying to analyse a single DB I created containing some 30 million scientific references. I seem to be at the upper limit of what my hardware + BaseX can reasonably achieve. For example, attempting to splice a new DB with a couple of million entries using XQuery alone causes BaseX to hang. About the only operation that succeeds is file:append in a for loop (collecting the results in a variable and then extracting a well-formed document also hangs).
So I am in work-around mode, but it is always a pleasure to be hacking with BaseX: I have yet to find an XML-related problem that I can't solve somehow with BaseX.
C.
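The append-as-you-go workaround described above writes each result out as soon as it is produced instead of collecting everything first. The same idea can be sketched outside XQuery. The Python below is a hedged illustration, not the method used in the thread: `write_batches`, the `part-{}.xml` style path pattern and the wrapper element name are all hypothetical. It takes any iterable of record strings and writes them into several smaller well-formed files, keeping at most one batch in memory.

```python
from itertools import islice


def write_batches(records, path_pattern, batch_size, root="root"):
    """Write records into numbered files, each wrapped in one root element.

    records      -- iterable of XML fragments as strings
    path_pattern -- str.format pattern with one slot, e.g. "part-{}.xml"
    batch_size   -- maximum number of records per output file
    Returns the list of paths written.
    """
    iterator = iter(records)
    paths = []
    index = 0
    while True:
        batch = list(islice(iterator, batch_size))  # at most one batch in memory
        if not batch:
            break
        path = path_pattern.format(index)
        with open(path, "w", encoding="utf-8") as out:
            out.write(f"<{root}>\n")
            for record in batch:
                out.write(record)
                out.write("\n")
            out.write(f"</{root}>\n")
        paths.append(path)
        index += 1
    return paths
```

Smaller files of this kind could then be added to a database one at a time, which may be gentler on memory than one huge import; whether that helps on a given setup would need to be measured.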
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: 19 May 2015 12:14 To: Hondros, Constantine (ELS-AMS) Cc: Dirk Kirsten; basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Create DB from large XML extraction without root element?
Hi Constantine,
you could use some Java code (see the attached file).
By the way, have you already created the database with CHOP=true?
Cheers, Christian
basex-talk@mailman.uni-konstanz.de