Create DB from large XML extraction without root element?

Hondros, Constantine (ELS-AMS)

19 May 2015

5:32 a.m.

Hello all, I have a huge extraction of XML documents in a single file, without a root element, something along these lines:

<a> </a> <a> </a> ...

Other than by editing it to add a root element, is there a clever way of creating a BaseX DB from this?

Thanks for any pointers, C.

________________________________

Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The Netherlands, Registration No. 33156677, Registered in The Netherlands.

Dirk Kirsten

19 May 2015

5:45 a.m.

Hello Constantine,

as this is not a well-formed XML document (it has no single root element), it cannot be added to BaseX as is. You could write an XQuery that reads the file as text and chops it into well-formed XML documents (you could also do this with an external text-processing tool such as sed).

Of course, as you mentioned, adding a root node is the easiest option.
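For a file that is too large to edit comfortably, the effect of a root node can also be obtained at read time, without touching the file. The following is a minimal sketch in plain Java (JDK classes only, no BaseX API), assuming the extraction is UTF-8 and contains no XML declaration or DOCTYPE. The class name `RootWrapper`, the root name `root`, and the element name `a` are illustrative choices, not anything from this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collections;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class RootWrapper {

    /** Presents rootless XML content as if it were enclosed in a single root element. */
    static InputStream wrapWithRoot(InputStream fragments, String rootName) {
        InputStream open = new ByteArrayInputStream(
                ("<" + rootName + ">").getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream(
                ("</" + rootName + ">").getBytes(StandardCharsets.UTF_8));
        // The three streams are read one after another, so the large input
        // is never loaded into memory as a whole.
        return new SequenceInputStream(
                Collections.enumeration(Arrays.asList(open, fragments, close)));
    }

    /** Streams the XML through a SAX parser and counts elements with the given name. */
    static int countElements(InputStream xml, String name) {
        final int[] count = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(xml, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (qName.equals(name)) {
                        count[0]++;
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the wrapped input", e);
        }
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the rootless extraction (hypothetical file name)
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(countElements(wrapWithRoot(in, "root"), "a"));
        }
    }
}
```

The SAX pass here only checks that the wrapped stream parses and counts the `a` elements. The same wrapped stream could be passed to any consumer that accepts an `InputStream`.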

Cheers Dirk

-- Dirk Kirsten, BaseX GmbH, http://basex.org |-- Firmensitz: Blarerstrasse 56, 78462 Konstanz |-- Registergericht Freiburg, HRB: 708285, Geschäftsführer: | Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle `-- Phone: 0049 7531 28 28 676, Fax: 0049 7531 20 05 22

Hondros, Constantine (ELS-AMS)

5:58 a.m.

Thanks, I suspected I would have to tokenise on an end element.
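Tokenising on an end element could look roughly like the sketch below. It is plain Java and makes strong assumptions: the top-level element name is known (`a` here), elements of that name are never nested, and the end tag never appears inside comments or CDATA sections. The class name `EndTagSplitter` and the callback parameter are hypothetical.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Scanner;
import java.util.function.Consumer;
import java.util.regex.Pattern;

class EndTagSplitter {

    /**
     * Cuts rootless XML text after each end tag of the given element and
     * passes every resulting fragment to the sink, one at a time.
     */
    static void split(Reader input, String elementName, Consumer<String> sink) {
        String endTag = "</" + elementName + ">";
        try (Scanner scanner = new Scanner(input)) {
            // The end tag acts as the delimiter, so it is re-appended below.
            scanner.useDelimiter(Pattern.quote(endTag));
            while (scanner.hasNext()) {
                String piece = scanner.next();
                if (piece.trim().isEmpty()) {
                    continue; // whitespace between or after elements
                }
                // Drop leading whitespace only; element content stays unchanged.
                sink.accept(piece.replaceFirst("^\\s+", "") + endTag);
            }
        }
    }

    public static void main(String[] args) {
        split(new StringReader("<a> </a> <a> </a> "), "a", System.out::println);
    }
}
```

Each fragment is handed to the callback as soon as it is complete, so only one fragment needs to be held in memory at a time. Any non-whitespace text after the last end tag would wrongly receive an end tag, so truncated input needs a separate check.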

Cheers, Constantine

Christian Grün

6:14 a.m.

Hi Constantine,

you could use some Java code (see the attached file).

By the way, have you already created the database with CHOP=true?

Cheers, Christian

Hondros, Constantine (ELS-AMS)

6:49 a.m.

Hi Christian,

Thanks, that's very helpful. And no, I haven't yet recreated the database with CHOP=true - I'll let you know the resulting DB size when I have.

Incidentally, it might be of interest what I am doing: I am trying to analyse a single DB I created containing some 30 million scientific references. I seem to be at the upper limit of what my hardware + BaseX can reasonably achieve. For example, attempting to splice a new DB with a couple of million entries using XQuery alone causes BaseX to hang. About the only operation I can perform successfully is file:append in a for loop (collecting the results in a variable and extracting a well-formed document also hangs).

So I am in work-around mode, but it is always a pleasure to be hacking with BaseX. I have yet to find an XML-related problem that I can't solve somehow with BaseX.

basex-talk@mailman.uni-konstanz.de

4 comments

3 participants

tags (0)

participants (3)

Christian Grün
Dirk Kirsten
Hondros, Constantine (ELS-AMS)