Hello,
I have a question regarding the export function of BaseX. I used BaseX to load the Reuters Corpus Volume 1, a corpus of around 800k files, nearly 3 GB in size when unzipped. Is there any way to export some parts of these files (selected with XPath), which have been parsed into a BaseX database, to CSV? I am aware that this kind of question would probably fall under paid support. But if there is a simple solution, it would be very nice if you could post it. It would help me a lot!
Best regards,
Andreas
Hi Andreas,
I am not sure if this is the most elegant solution, and there may be better options, but it should work.
To solve this without programming in languages other than XQuery you might try the following:
1) Run BaseX from the command line with the serialization method set to text [2, 3]; this omits the XML markup and writes only the text content:
$ java -cp basex-6.8-SNAPSHOT.jar org.basex.BaseX -s"method=text" query.xq -o output.txt
2) Make sure a file query.xq exists and has contents similar to those in [1]. The query makes sure each output line contains a given set of values.
3) The output will then be written to output.txt.
This should give something similar to the desired result. Feel free to ask for more help if needed!
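For illustration, a query.xq along the lines of step 2 might look like the sketch below. It is not the contents of [1]. The database name "rcv1" and the paths newsitem/@itemid, newsitem/@date and newsitem/title are assumptions and need to be replaced with the names used in the actual database. The inner string-join builds one comma-separated line per document, and the outer one joins the lines with newlines so that the text output contains one record per line.

```xquery
(: query.xq: a minimal sketch, not the query from [1].
   Assumed names: database "rcv1", element newsitem with the
   attributes itemid and date and a child element title. :)
string-join(
  for $item in collection("rcv1")//newsitem
  return string-join(
    (
      string($item/@itemid),
      string($item/@date),
      (: collapse line breaks and runs of spaces inside the title :)
      normalize-space($item/title[1])
    ),
    ","
  ),
  "&#10;"
)
```

Because the whole result is a single string, it does not depend on how the serializer separates the items of a sequence. Fields that contain commas are not escaped here.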
Kind regards,
Michael
[1] https://gist.github.com/dbdcf1e6121f397e828a
[2] http://docs.basex.org/wiki/Serialization
[3] http://www.w3.org/TR/xslt-xquery-serialization/#text-output
On 26.09.2011 at 04:27, Andreas Karpf wrote:
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Hi Andreas,
The best way to produce CSV is to use XQuery directly and format the output as required. Here is a simple example (Leo, a member of our team, will give you a more elaborate reply in a while):
    declare option output:format "no";
    for $rows in 1 to 2
    return (
      let $cols := ('1', '2', '3')
      return string-join($cols, ','),
      (: newline after each row :)
      '&#xa;'
    )
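If field values can themselves contain commas or double quotes, each field needs to be quoted before the row is joined. The sketch below shows one way to do that, using the common CSV convention of wrapping a field in double quotes and doubling any embedded double quote. The function local:csv-field and the inline row elements are hypothetical and only serve as sample data; they are not part of BaseX.

```xquery
(: Hypothetical helper, not a BaseX function: wraps one value in
   double quotes and doubles any double quote inside it. :)
declare function local:csv-field($value as xs:string?) as xs:string {
  concat('"', replace(string($value), '"', '""'), '"')
};

(: Inline sample rows, used here instead of a real database. :)
let $rows := (
  <row><a>1</a><b>plain</b></row>,
  <row><a>2</a><b>has, comma and "quotes"</b></row>
)
return string-join(
  for $row in $rows
  return string-join(
    for $field in ($row/a, $row/b)
    return local:csv-field($field),
    ","
  ),
  "&#10;"
)
```

Run with the text output method, this should yield the two lines `"1","plain"` and `"2","has, comma and ""quotes"""`.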
As you mentioned our professional offerings: if you don't feel comfortable writing XQuery, or if you believe that we might be faster, we'll be glad to find an individual solution that suits all your needs. Feel free to write to support@basex.org.
Christian
On Mon, Sep 26, 2011 at 4:27 AM, Andreas Karpf andreas.karpf@gmail.com wrote: