Hello Hans-Juergen,
This looks very interesting, I will give it a try! Combining resources and documents in the same query is a great idea. Can it treat stdin as one of those resources? I suppose on unix you can use the special /dev/stdin path.
Regards, Iwan
On Wed, 23 Jun 2021, 00:42 Hans-Juergen Rennau, hrennau@yahoo.de wrote:
Good evening, Iwan,
I think your ideas are very interesting, and they are about unleashing a considerable potential. Your basexsed sketch could also be summarized like this: look, here's an expression E (or a query file Q); and there's a selection of resources R - why should I say more in order to have the expression (or query) applied to the selected resources? And such aggregated evaluation makes perfect sense, as the data model of XQuery takes care that single resources and sets of resources are equivalent from the point of view of evaluation and result construction. Adding to this BaseX's fabulous ability to treat non-XML (JSON, CSV, HTML) as node trees, we get another factor with which to multiply the usefulness.
Now let me address this "selection of resources". At first sight this seems to have nothing to do with XPath or XQuery, so why give second thoughts to *how* to select resources - let's just repeat the expressions and constructs we are used to from time-honoured tools. (And I repeat that by doing so, following your sketch, something very useful and powerful can emerge!) However, here we might also consider a radical alternative: draw the resources into the realm of XPath tree navigation - extend XPath so as to support navigation of resource trees *as well as* resource contents (node trees). An airplane that can swim, a flying fish!
That is what Foxpath [1] does, which is simply an extended version of XPath 3.0. In a nutshell: there are two path operators, / and \ - one for node tree navigation, one for resource tree navigation, and you can mix both types of navigation within an expression seamlessly - e.g. navigate the resource tree and then drill into resource contents, or use node tree navigation in predicates of resource tree navigation. (By default, / is for resources, \ for nodes, but you can swap them using option -b.)
Taking this road, you do not say any more - here's an expression E, and there's a selection of resources R - you say: here's an expression. Full stop. Any selection of resources is a part of it.
For example: in order to get a list of all XML resources in a BaseX installation, excluding the contents of folders "webapp" and "data", I can use the following fox:
fox "/prog*86*/basex//(*.xml, *.xsd, *.xsl*)[not((ancestor~::webapp, ancestor~::data))]"
=>
/Program Files (x86)/BaseX/etc/factbook.xml
/Program Files (x86)/BaseX/etc/w3-catalog.xml
/Program Files (x86)/BaseX/repo/http-www.functx.com-1.0/cxan.xml
/Program Files (x86)/BaseX/repo/http-www.functx.com-1.0/expath-pkg.xml
/Program Files (x86)/BaseX/repo/http-www.functx.com-1.0/functx/functx.xsl
If I want a list of their root element names, I just *continue the navigation into* the resources:
fox "/prog*86*/basex//(*.xml, *.xsd, *.xsl*)[not((ancestor~::webapp, ancestor~::data))]*\concat(bfname(), ' - ', name())"
=>
factbook.xml - mondial
w3-catalog.xml - catalog
cxan.xml - package
expath-pkg.xml - package
functx.xsl - xsl:stylesheet
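For readers without Foxpath at hand, the two-step pattern of the last example (select resources, then continue into their contents) can be approximated in plain Python. This is only a rough sketch and not how Foxpath works internally: the function name root_names, the suffix set standing in for the *.xml, *.xsd, *.xsl* patterns, and the case-insensitive folder matching are assumptions made for illustration. Namespaced roots come out in ElementTree's {uri}local form rather than as xsl:stylesheet.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed stand-ins for the name patterns and excluded folders of the fox above.
XML_SUFFIXES = {".xml", ".xsd", ".xsl", ".xslt"}
EXCLUDED_DIRS = {"webapp", "data"}


def root_names(base):
    """Yield 'file name - root element name' for XML resources below base."""
    base = Path(base)
    for path in sorted(base.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in XML_SUFFIXES:
            continue
        # Resource-level predicate: skip files below an excluded folder.
        folders = {part.lower() for part in path.relative_to(base).parts[:-1]}
        if folders & EXCLUDED_DIRS:
            continue
        # Continue the navigation into the resource: parse it, read the root.
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue  # not well-formed XML, so there is no node tree to enter
        yield f"{path.name} - {root.tag}"
```

Calling list(root_names("/some/basex/folder")) (a hypothetical path) returns one string per resource. The sketch needs a loop, a filter and a parse step where the fox needs a single path expression.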
The next example shows the elegance of remaining within one single, unified expression language, no matter if dealing with nodes or resources - enjoying the arrow operator (counting XSLTs in an oXygen installation):
fox "/programme/oxyg*//*.xsl => count()" => 2342
Here's node tree navigation in a predicate of resource tree navigation - selecting folders containing XSLTs using version 1:
fox "/programme/oxyg*//*.xsl[*@version\number() lt 2]/.."
=>
/programme/Oxygen XML Editor 19/frameworks/dita/DITA-OT/plugins/com.oxygenxml.webhelp.classic/xsl
/programme/Oxygen XML Editor 19/frameworks/dita/DITA-OT/plugins/com.oxygenxml.webhelp.classic/xsl/dita/mobile
/programme/Oxygen XML Editor 19/frameworks/dita/DITA-OT/plugins/com.oxygenxml.webhelp.classic/xsl/dita/original
/programme/Oxygen XML Editor 19/frameworks/dita/DITA-OT/plugins/com.oxygenxml.webhelp.classic/xsl/docbook
/programme/Oxygen XML Editor 19/frameworks/dita/DITA-OT/plugins/com.oxygenxml.webhelp.classic/xsl/docbook/desktop
/programme/Oxygen XML Editor 19/frameworks/dita/DITA-OT/plugins/com.oxygenxml.webhelp.classic/xsl/docbook/mobile
/programme/Oxygen XML Editor 19/frameworks/dita/DITA-OT/plugins/com.oxygenxml.webhelp.common/xsl/dita
...
Very important is the automated parsing of non-XML (e.g. JSON) into node trees, making it accessible to navigation. Here's a fox selecting OpenAPI specifications (JSON) defining at least 200 endpoints:
fox "/projects/bhub/download/bhub-20210621//*.json[*\paths[count(*) ge 200]]"
=>
/projects/bhub/download/bhub-20210621/C4C/edmx-openapi.businesspartner.json
/projects/bhub/download/bhub-20210621/S4HANAOPAPI/edmx-openapi.OP_API_GRMASTERDATA_SRV_0001.json
/projects/bhub/download/bhub-20210621/S4HANAOPAPIUTL/edmx-openapi.UT_ERP_ISU_UMC_0001.json
/projects/bhub/download/bhub-20210621/SAPS4HANACloud/edmx-openapi.API_GRMASTERDATA_SRV.json
/projects/bhub/download/bhub-20210621/SAPS4HANACloud/edmx-openapi.C_TRIALBALANCE_CDS.json
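As a point of comparison, the same selection can be spelled out in plain Python. This is a sketch under assumptions, not a description of Foxpath: the function name specs_with_many_paths and its minimum parameter are hypothetical, and the member count of the top-level "paths" object is used as a stand-in for the count(*) test in the fox.

```python
import json
from pathlib import Path


def specs_with_many_paths(base, minimum=200):
    """Return JSON files below base whose top-level 'paths' object
    has at least `minimum` members."""
    hits = []
    for path in sorted(Path(base).rglob("*.json")):
        try:
            doc = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable, or not valid JSON
        paths = doc.get("paths") if isinstance(doc, dict) else None
        # Content-level predicate applied while still selecting resources.
        if isinstance(paths, dict) and len(paths) >= minimum:
            hits.append(path)
    return hits
```

Here the resource selection and the look into the contents are two separate layers of code, whereas the fox keeps both in one path expression.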
These examples were meant to suggest the efficiency and elegance obtained when taking a unified view of navigation at both levels - resource contents and resource trees (like file systems).
Wrapping up - the basexsed idea appears to me very good, but how about carrying things one step further, similar to what Foxpath did?
Kind regards, Hans-Jürgen
[1] https://github.com/hrennau/foxpath (Tip: doc/foxpath-into.pdf offers a gentle introduction.)
On Monday, 21 June 2021, 14:59:54 CEST, Iwan Briquemont <tracnar@gmail.com> wrote:
Hi,
BaseX works very well to do some quick queries or updates over files (rather than using it as a DB). I usually use the GUI for that but I think it could be quicker to have a simple CLI tool for the terminal like sed to do things like:
Have a fancy XPath tool:
(-e is -q in basex)
echo "<root><ele>1</ele><ele>2</ele></root>" | basexsed -e 'root/ele'
1 2

Work over multiple documents quickly (somewhat equivalent to "for $path in $args let $doc := doc($path) return xquery:eval($expr, map { '': $doc })"):
basexsed -e 'some/xquery' xmldocuments/*.xml # Output the result

Allow multiple documents on stdin too, so you can chain expressions:
echo "<doc1/><doc2/><doc1/>" | basexsed -e 'doc1' | basexsed -e 'doc1/x'

Do updating expressions, by default outputting the changes:
basexsed -e 'delete node //n' xmldocuments/*.xml # Outputs the docs without the n nodes
(-i is -u in basex)
basexsed -i -e 'delete node //n' xmldocuments/*.xml # Changes the files in place

Automatically handle other formats like json, csv, html:
basexsed -e 'map:keys(.)' jsondocuments/*.json # Outputs as json as well
AFAIK everything is already there in the basex command, so the main change would be to support multiple files/documents in a streaming fashion, and then a few smaller things like using stdin as the context item if no file is given and allowing the same arguments as in standard tools like sed. To make it nicer you could also add syntax highlighting to the console, and some shortcuts for longer XQuery parameters like serialization indentation.
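The file-or-stdin behaviour described above could look roughly like the following Python sketch. It is not BaseX and not a proposal for how basexsed should be built: ElementTree supports only a small XPath subset, and the names evaluate and main as well as the argument layout are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def evaluate(expr, sources):
    """Apply a path expression to every source and collect the text of
    the matched elements. Sources are file names or open file objects."""
    results = []
    for source in sources:
        root = ET.parse(source).getroot()
        # Wrap the root so that the path starts at document level,
        # which lets 'root/ele' match as in the first example.
        document = ET.Element("document")
        document.append(root)
        results.extend(el.text or "" for el in document.findall(expr))
    return results


def main(argv):
    # Assumed layout: script.py EXPR [FILE...]; stdin is the context
    # when no file is given.
    expr, files = argv[1], argv[2:]
    for line in evaluate(expr, files or [sys.stdin]):
        print(line)
```

A script would end with main(sys.argv). Several concatenated documents on stdin, as in the chaining example, would need extra splitting logic that is left out here, and nothing is streamed: every document is parsed in full.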
This would provide an alternative to tools like Xidel ( https://www.videlibri.de/xidel.html ) or jq ( https://stedolan.github.io/jq/ ).
Is this something you would be interested in implementing? Or have some people already done something similar?
Regards, Iwan