Re: [basex-talk] Whitespace handling on ingest

22 Feb 2013


Hi Wendell,
the CHOP option was introduced at a very early stage of BaseX, and I’m
not sure we would add it today. We could add one or more additional
options to normalize whitespace or remove PIs/comments from the
input, but the wish list, and the exception list, would probably
continue to grow, so I believe that it would be more convenient to
have a general pre-processing step that takes care of all the
normalization steps. I’m not sure, however, what the best approach
is for doing this within BaseX. If it’s possible to cache files on disk
before adding them to the database, I would recommend XQuery or BaseX
command scripts, XProc or anything else to prepare the data and delete
the cached copies afterwards.
Comments are welcome,
Christian
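
To make the "general pre-processing step" idea concrete, here is a minimal sketch of the "anything else" route, written in Python rather than XQuery or XProc. It is only an illustration under stated assumptions, not something BaseX provides: the names `collapse` and `preprocess` are hypothetical, and the sketch relies on the fact that the default parser in Python's `xml.etree.ElementTree` skips comments and processing instructions.

```python
import re
import xml.etree.ElementTree as ET

# XML whitespace characters: space, tab, LF, CR.
_WS_RUN = re.compile(r"[ \t\n\r]+")


def collapse(text):
    """Collapse each run of XML whitespace into one space; None stays None."""
    return None if text is None else _WS_RUN.sub(" ", text)


def preprocess(xml_source):
    """Hypothetical helper: return a normalized, serialized copy of a document."""
    # The default ElementTree parser skips comments and processing
    # instructions, so they are already gone once the tree is built.
    root = ET.fromstring(xml_source)
    for elem in root.iter():
        # Text before the first child, and text following the end tag.
        elem.text = collapse(elem.text)
        elem.tail = collapse(elem.tail)
    return ET.tostring(root, encoding="unicode")


sample = "<doc><?pi x?><!-- note --><p>one \t\n two</p>\n\n<p>three</p></doc>"
print(preprocess(sample))
```

The normalized output could then be written to a temporary file, added to the database, and deleted. Note that this sketch collapses whitespace everywhere, including inside elements marked `xml:space="preserve"`, and that ElementTree may rewrite namespace prefixes on serialization, so a real pipeline would probably need an exception list of its own.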
___________________________
On Wed, Feb 20, 2013 at 5:35 PM, Wendell Piez <wapiez@wendellpiez.com> wrote:
...
Hi,
I see the 'CHOP' option, turned on by default, for trimming leading
and trailing whitespace and eliminating empty text nodes.
What about going further? Is there a good way to normalize whitespace
entirely, collapsing any runs of tab-LF-space into single spaces in my
data?
I think I mentioned earlier the idea of specifying an XSLT
transformation to filter data on ingest (for a similar requirement,
namely removing all comments and PIs). That might be going too far but
any hints you can give me (or pointers to docs) about functionality to
address this sort of thing in general would be welcome.
Thanks!
Wendell
--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
