Hi Wendell,
the CHOP option was introduced at a very early stage of BaseX, and I’m not sure we would add it today. We could add one or more options to normalize whitespace or to remove PIs/comments from the input, but the wish list, and the list of exceptions, would probably keep growing, so I believe it would be more convenient to have a general pre-processing step that takes care of all the normalization. I’m not sure, however, what the best approach is to do this within BaseX. If it’s possible to cache files on disk before adding them to the database, I would recommend using XQuery or BaseX command scripts, XProc or anything else to prepare the data, and deleting the cached files afterwards.
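To illustrate what such an external pre-processing step could look like, here is a minimal sketch in Python (standard library only). It is not a BaseX feature, and the function name `normalize_xml` is made up for this example. It assumes the document fits in memory, that collapsing whitespace everywhere is acceptable (it does not honor `xml:space="preserve"`), and that it is fine for ElementTree to rewrite namespace prefixes on output.

```python
import re
import xml.etree.ElementTree as ET

# Runs of space, tab, CR and LF are collapsed into a single space.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def _collapse(text):
    """Collapse whitespace runs; leave None or empty strings untouched."""
    return _WS_RUN.sub(" ", text) if text else text


def normalize_xml(source: str) -> str:
    """Return `source` with comments/PIs dropped and whitespace collapsed.

    Hypothetical helper: ElementTree's default parser does not keep
    comments or processing instructions in the tree, so they are gone
    after parsing; the loop then normalizes every text and tail node.
    """
    root = ET.fromstring(source)
    for elem in root.iter():
        elem.text = _collapse(elem.text)
        elem.tail = _collapse(elem.tail)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "<doc><!-- note --><?pi x?><p>Hello\n\t  world</p></doc>"
    print(normalize_xml(sample))
```

The normalized output would be written to a temporary file, added to the database, and removed afterwards. With CHOP left on, any whitespace-only text nodes that remain should still be dropped on import.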
Comments are welcome,
Christian
___________________________
On Wed, Feb 20, 2013 at 5:35 PM, Wendell Piez <wapiez@wendellpiez.com> wrote:
Hi,
I see the 'CHOP' option, turned on by default, for trimming leading and trailing whitespace and eliminating empty text nodes.
What about going further? Is there a good way to normalize whitespace entirely, collapsing any runs of tab-LF-space into single spaces in my data?
I think I mentioned earlier the idea of specifying an XSLT transformation to filter data on ingest (for a similar requirement, namely removing all comments and PIs). That might be going too far but any hints you can give me (or pointers to docs) about functionality to address this sort of thing in general would be welcome.
Thanks! Wendell
--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk