Re: [basex-talk] Load LibreOffice- and Word-documents?

28 Jan 2020


Hi Ben,
This will be problematic, at least with real-world .docx files. Depending
on the edit history of the document, the text in there can be split
across numerous tags without any regard to word boundaries. As BaseX has
no means to ignore inline elements in the index, this will always be a
rather slow process, and formulating the XQuery will be complicated.
That is, unless you clean up the .docx XML beforehand.
Omar
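A minimal Python sketch of the clean-up step suggested above, assuming the goal is plain paragraph text for full-text search. It reads `word/document.xml` from the .docx package and concatenates all `w:t` fragments inside each `w:p`, so a word that the edit history split across several runs (`w:r`) is reassembled. The function names and the `<doc><p>` output shape are hypothetical choices for illustration; tabs, line breaks, text boxes, fields and footnotes are not handled.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace used by .docx files
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS


def paragraph_texts(document_xml: bytes) -> list[str]:
    """Join the w:t fragments of each w:p into one string per paragraph."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for p in root.iter(W + "p"):
        # One word may be spread over several runs, each with its own w:t;
        # concatenating them in document order restores the word.
        paragraphs.append("".join(t.text or "" for t in p.iter(W + "t")))
    return paragraphs


def docx_paragraphs(path: str) -> list[str]:
    """Read the body part of a .docx (ZIP) package and flatten it."""
    with zipfile.ZipFile(path) as zf:
        return paragraph_texts(zf.read("word/document.xml"))


def to_clean_xml(paragraphs: list[str]) -> str:
    """Build a simple <doc><p>...</p></doc> tree without inline markup."""
    doc = ET.Element("doc")
    for text in paragraphs:
        ET.SubElement(doc, "p").text = text
    return ET.tostring(doc, encoding="unicode")


# A word split across two runs, as Word often produces it
sample = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Full-te</w:t></w:r><w:r><w:t>xt search</w:t></w:r></w:p>"
    "</w:body></w:document>"
).encode()
print(paragraph_texts(sample))  # ['Full-text search']
```

The cleaned XML could then be loaded into BaseX in place of the original part, so that full-text queries see whole words rather than run fragments.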
On 28.01.2020 at 14:01, Ben Engbers wrote:
...
Hi,
While we were discussing possible use cases for BaseX, a colleague asked
me if it is also possible to load LibreOffice and Word documents into
BaseX and then perform full-text analysis on them. In essence, both are
XML files, so it should be possible.
Does anybody have experience with this?
Ben
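Strictly speaking, .docx and .odt files are ZIP packages that contain XML parts, with the document body in `word/document.xml` (Word) or `content.xml` (LibreOffice/ODF). A small Python sketch, assuming only these two formats, that picks the right part by file extension and returns its raw XML so it could be stored in BaseX; the function names are hypothetical.

```python
import os
import zipfile

# Document body part inside each ZIP package (OOXML and ODF conventions)
MAIN_PART = {
    ".docx": "word/document.xml",
    ".odt": "content.xml",
}


def main_part_name(filename: str) -> str:
    """Return the name of the XML part that holds the document body."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in MAIN_PART:
        raise ValueError(f"unsupported file type: {filename}")
    return MAIN_PART[ext]


def read_main_part(path: str) -> bytes:
    """Extract the raw body XML from a .docx or .odt file."""
    with zipfile.ZipFile(path) as zf:
        return zf.read(main_part_name(path))
```

Note that the extracted XML still carries the inline markup (`w:r` runs in Word, `text:span` in ODF), so the word-splitting problem described in the reply above remains.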
