Dear BaseX team,
If you are interested in further boosting the power of BaseX as a resource
monitoring tool, you might consider the small but useful extension
described below.
Currently we have various functions for parsing non-XML formats into node trees:
json:parse($text ...)
csv:parse($text ...)
html:parse($text ...)
These functions expect as input the text to be parsed, not the URI from
which to retrieve the text. Of course, it is trivial to combine retrieval
and parsing, using fn:unparsed-text(), like so:
unparsed-text('foo.json') ! json:parse(.)
However, the resulting document does not have a document URI, and it would
be cumbersome to associate it with one. So how about adding three functions:
json:doc($uri ...)
csv:doc($uri ...)
html:doc($uri ...)
Two advantages: first, the document URI is available; second, sheer elegance.
As an example of this elegance, consider the task of creating a list of all
.csv files in a directory tree that have inconsistent record lengths. Using
csv:doc, the solution is a simple expression, rather than a program:
file:list($dir, true(), "*.csv") ! concat($dir, '/', .) !
csv:doc(.) [1 lt count(//record/count(*) => distinct-values())] / document-uri(.)
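For comparison, here is a rough sketch of how I would write the same listing with the functions available today. It assumes the default CSV conversion, which produces record elements, and a variable $dir that falls back to the current directory; the variable names are only illustrative. Because the parsed document carries no URI, the path has to be kept in a variable next to it:

```xquery
(: Sketch only: list the CSV files below $dir whose records differ in length. :)
declare variable $dir external := '.';

for $path in file:list($dir, true(), '*.csv')
(: build the file path by hand, since the parsed tree will not remember it :)
let $uri := concat($dir, '/', $path)
let $doc := csv:parse(file:read-text($uri))
(: more than one distinct entry count means inconsistent record lengths :)
where count(distinct-values($doc//record/count(*))) gt 1
return $uri
```

This works, but the path and the document have to be carried along as a pair, which is exactly the bookkeeping a csv:doc function would make unnecessary.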
Cheers,
Hans-Jürgen