Is it possible to do faceted browsing with BaseX?
I think the answer is no. However, I may have been fooled into trying at first because my first test case was far simpler than my other use cases.
The simple case was getting a count of documents per publisher from an EAD collection and sorting by count, largest first. Here's my working code:
declare function local:normalize( $s ) {
  translate( replace( lower-case($s), '^(us-)+', '' ), '-', '' )
};

declare variable $orgs := doc('ead-inst/ead-inst.xml');
declare variable $orgcodes :=
  collection('published')/ead/eadheader/eadid/@mainagencycode
    ! local:normalize(.) => distinct-values();

declare function local:countpubfacets( $c ) {
  for $x in (
    for $ead in $c
    let $ORG := local:normalize($ead/*:ead/*:eadheader/*:eadid/@mainagencycode)
    group by $ORG
    order by $ORG
    let $inst :=
      if ($ORG != "") then (
        $orgs/list/inst[@prefix=$ORG],
        $orgs/list/inst[lower-case(@oclc) = $ORG]
      )
    return array{ count($ead), $ORG, $inst/@orgcode/string(), $inst/string() }
  )
  order by $x(1) descending
  return $x
};

local:countpubfacets( collection('published'))
In this case: (1) the number of unique @mainagencycode values is less than 100, and (2) there is only one location for those codes. Performance is acceptable (or at least it seems to be in my tests).
My other, unsuccessful attempts have been at ranking //subject or //persname values. In those cases there are many thousands of unique subjects and names, and the subject and persname elements can occur in multiple locations in the file.
Attempts to search with a method similar to the one above (as well as a couple of other variations I've tried), even on a smaller subset of categories, take entirely too much time. Often I have to kill the search before it manages to complete.
I have tried looking at index:facets() ( https://docs.basex.org/wiki/Index_Module#index:facets ), which has only reinforced my notion that it's not possible.
So for now, I'm resigned to deferring that functionality and exploring building a specialized index alongside the BaseX indexes: either using Solr and querying the Solr index from BaseX, or else building some other index structure as a separate DB in BaseX alongside my document DB.
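To make the second option concrete, here is a rough, untested sketch of what I mean by a separate index DB: precompute the subject counts once and store them as a small document in their own database. The database name 'facets' and the path 'subjects.xml' are just placeholders I made up, and I'm assuming db:create() from the Database Module is the right tool for this. I have not measured whether a single pass like this completes in reasonable time on my data.

```xquery
(: Sketch only: precompute subject facet counts into a separate database.
   'facets' and 'subjects.xml' are hypothetical names. :)
let $facets :=
  <facets>{
    for $doc in collection('published')
    (: one key per distinct subject per document, wherever it occurs in the file :)
    for $key in distinct-values(
      $doc//*:subject ! normalize-space(lower-case(.))
    )[. != '']
    group by $key
    (: after grouping, $doc holds every document that contains this subject :)
    let $n := count($doc)
    order by $n descending
    return <facet name="{$key}" count="{$n}"/>
  }</facets>
return db:create('facets', $facets, 'subjects.xml')
```

The top facets could then be read back with something like collection('facets')/facets/facet[position() <= 20], without touching the document DB. The obvious drawback is that the facet DB has to be rebuilt whenever the published collection changes, and the counts only cover the whole collection, not an arbitrary search result.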
Eager to hear any tips or feedback on this problem or alternate solutions, and also general info about the BaseX index structure and what useful information can be obtained by introspection with the Index Module functions.
Aside from the faceting, search by //subject (or other fields) performs quite acceptably, even when chaining several filters together with =>:
declare function eadsearch:findBySubj( $ctx, $subj as xs:string?, $opt ) {
  if ( $subj )
  then $ctx/*[ft:contains( .//subject, ft:tokenize($subj), $opt )]
  else $ctx
};
— Steve M.