BaseX-Talk March 2022

basex-talk@mailman.uni-konstanz.de

29 participants
35 discussions

BaseX 9.6: The Summer Edition
by Christian Grün 28 Nov '24

Dear all,

We provide you with a new and fresh version of BaseX, our open source XML framework, database system and XQuery 3.1 processor:

https://basex.org/

Apart from our main focus (query rewritings and optimizations), we have added the following enhancements:

XQUERY: MODULES, FEATURES
- Archive Module, archive:write: stream large archives to file
- SQL Module: support for more SQL types
- Full-Text Module, ft:thesaurus: perform Thesaurus queries
- Fulltext, fuzzy search: specify Levenshtein limit
- UNROLLLIMIT option: control limit for unrolling loops

XQUERY: JAVA BINDINGS
- Java objects of unknown type are wrapped into function items
- results of constructor calls are returned as function items
- the standard package "java.lang." has become optional
- array arguments can be specified with the middle dot notation
- conversion can be controlled with the WRAPJAVA option
- better support for XQuery arrays and maps

WEB APPLICATIONS
- RESTXQ: Server-Timing HTTP headers are attached to the response

For a more comprehensive list of added and updated features, look into our documentation (docs.basex.org) and check out the GitHub issues (github.com/BaseXdb/basex/issues).

Have fun,
Your BaseX Team

11 43

recursively used variables
by Rob Stapper 12 Aug '24

Hi,

The code [1] below, also sent as an attachment, generates an error message: “Static variable depends on itself: $Q{http://www.w3.org/2005/xquery-local-functions}test”.

I use these variables to refer to the private functions in my modules so that I can easily refer to them in an inheritance situation. It’s not a big problem for me, but I was wondering whether the error is justified or whether the code should work.

[1]===========================================
declare variable $local:test := local:test#1 ;

declare %private function local:test( $i)
{ if ( $i > 0)
  then $local:test( $i - 1)
} ;

$local:test( 10)
===========================================

Kind regards,
Rob Stapper
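A possible way around the cycle, sketched under the assumption that the function may call itself by name: as far as the XQuery 3.1 specification goes, a variable that depends on itself (directly or through a function body it references) is a static error (err:XQST0054), so the message appears to be justified. In the variant below the function body no longer mentions $local:test, which removes the circular dependency while the variable can still be handed around as a function item. Whether this fits the inheritance use case described above is not something the message settles.

```xquery
(: The variable still holds a reference to the function ... :)
declare variable $local:test := local:test#1;

(: ... but the function recurses by name, so it does not depend on the variable. :)
declare %private function local:test($i) {
  if ($i > 0) then local:test($i - 1) else ()
};

$local:test(10)
```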

4 7

Add a comment to a backup?
by Jonathan Robie 13 May '22

I have been making backups before doing particularly complex things to my treebanks, and I find myself writing down information about what stage of processing a given backup corresponds to. "after replacing subtrees for missing compounds" I wish I could associate these strings with backups in BaseX so I can more easily know which one I would restore if something went wrong. Jonathan

4 8

User group meeting in Prague?
by Imsieke, Gerrit, le-tex 10 Apr '22

Fellow BaseX Users!

You might have heard that XML Prague 2022 will take place in June. (Unless the then-prevalent Greek letter makes it impossible even in that time of year, of course.)

I asked Christian whether the BaseX team will organize a user group meeting after it had not happened for years now. Christian didn’t seem to be very fond of organizing such a meeting. I asked him whether he would be available to present new features, the roadmap, and for a Q&A session if the users themselves organized such a meeting. He agreed, and therefore I hereby ask the list members whether anyone will join me in organizing this.

The plan looks as follows: We will apply for one or two 90-minute slots via the CFP process (https://www.xmlprague.cz/cfp/). We don’t need to have a fixed schedule yet by Dec. 20 (end of CFP date as currently announced – it will be extended anyway).

Christian was so kind as to create a new repo, user-group, on GitHub. We will use one or more of its Wiki pages [1] in order to plan the event. The page will eventually evolve into an agenda if you agree.

Looking forward to meeting many of you in Prague in June. And another organizing volunteer (or other volunteers), please come forward. Maybe we can also deal with it in the Wiki [2].

Gerrit

[1] https://github.com/BaseXdb/user-group/wiki/2022-06-XML-Prague
[2] https://github.com/BaseXdb/user-group/wiki/Members

3 3

keeping relative paths in the document-uri?
by Graydon Saunders 31 Mar '22

Hello -- An approach that uses db:create and file:descendant doesn't seem to work; the file paths have all been truncated to the name part of the path in the document-uri property of the individual documents in the DB. I'm using 9.6.4 with Oracle Java 1.8.0_211 on a Windows 10 machine. (I have no control over any of the environment.) Is there a way to keep the full path (relative or absolute) in the document uri property? I'm trying to inspect files with identical names in different directories, so I need the directory name to tell the files apart. Thanks! Graydon
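One approach that may help here, sketched with assumed names: db:create accepts a third argument with the target database paths, one per input, so the directory part can be passed explicitly instead of being reduced to the file name. The root directory 'C:/data/trees/' and the database name 'inspect' are hypothetical placeholders, and the sketch uses file:list (which returns paths relative to the root) rather than the function mentioned in the message.

```xquery
(: Hypothetical root directory; adjust to the real setup. :)
let $root := 'C:/data/trees/'
(: Relative paths of all XML files below $root; true() recurses into subdirectories. :)
let $relative := file:list($root, true(), '*.xml')
return db:create(
  'inspect',
  (: inputs: full paths of the files to be parsed :)
  $relative ! ($root || .),
  (: database paths: relative paths, with Windows backslashes normalized to '/' :)
  $relative ! replace(., '\\', '/')
)
```

With the directory kept in the database path, documents that share a file name should be distinguishable, for example via db:path on any of their nodes.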

3 2

Simple delete bug
by Jonathan Robie 30 Mar '22

Expressions like this fail silently without raising an error in BaseXGUI:

delete //m/@n

Of course, it should be:

delete nodes //m/@n

But there should be an error to remind me ;->

Jonathan
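A likely explanation for the silence, offered as a reading of the grammar rather than a statement about BaseX internals: "delete" is not a reserved word, so without "node" or "nodes" the expression is a valid path that starts at a child element named delete. The small document below is made up for illustration.

```xquery
(: Made-up sample document, used only for this illustration. :)
let $doc := document { <root><m n="1"/></root> }
return (
  (: Parsed as child::delete//m/@n, which is empty unless an element
     named "delete" exists below the context node. :)
  count($doc/(delete //m/@n)),
  (: The attribute the update was meant to target does exist. :)
  count($doc//m/@n)
)
```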

2 2

Serialization ... one attribute per line ...
by Jonathan Robie 29 Mar '22

I have a set of syntax trees that I would like to serialize with one attribute per line, like this: https://github.com/biblicalhumanities/greek-new-testament/blob/master/synta… The current serialization looks more like this: https://github.com/biblicalhumanities/greek-new-testament/blob/master/synta… Is there a way to get there using just serialization parameters, or do I need to do something fancier? Jonathan

1 0

fn:count() performance
by Josselin Morvan 29 Mar '22

Hi everyone,

We are experiencing a small issue with the count() function, and I wanted to know if you have any ideas for reducing the response time of the server.

We have a database containing almost 5,000 descriptions of expert reports from the 18th century. As a report can take place over several years, we want to list all the years mentioned in the reports and then count, for each year, how many reports we have. Our goal with this query is to produce a filter by year for our web application. The first part of the query is quite fast, but the second part is not. Here is a simplified sample of our code:

xquery version "3.1";
declare default element namespace "xpr" ;

(: to create the xprDB ;)
db:create('xpr', 'https://raw.githubusercontent.com/anrExperts/data/master/db/xpr.xml') :)

let $years := fn:distinct-values(db:open('xpr')/xpr/expertises/expertise/description/sessions/date[@when castable as xs:date]/fn:year-from-date(@when))
for $year in $years
return $year || ' : ' (:this first part of the query is quite fast 0.09sec on my old computer:)
  || fn:count(db:open('xpr')/xpr/expertises/expertise[description/sessions/date[fn:matches(@when, xs:string($year))]]) (:it takes 5sec to execute the second part of the query:)

Do you see anything we are doing wrong, or anything we can improve to reduce the server response time?

We thank you in advance!

Best,
Josselin
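One option worth trying, given as a sketch rather than a measured fix: the query above re-scans every expertise and runs a regular expression once per year, whereas a "group by" clause can collect the counts in a single pass. Note one assumed difference in behaviour: this version only considers @when values that are castable as xs:date, while fn:matches also hits partial dates that merely contain the year as a substring.

```xquery
xquery version "3.1";
declare default element namespace "xpr";

(: Single pass: pair each expertise with the distinct years of its sessions,
   then group the pairs by year. :)
for $expertise in db:open('xpr')/xpr/expertises/expertise
for $year in distinct-values(
  $expertise/description/sessions/date[@when castable as xs:date]
    ! year-from-date(xs:date(@when))
)
group by $year
order by $year
(: After grouping, $expertise holds all reports that mention this year. :)
return $year || ' : ' || count($expertise)
```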

2 2

(no subject)
by Charles Bearden 29 Mar '22

I have loaded all the Pubmed baseline XML records into a series of 20 BaseX databases, 55 or 56 of the baseline files per database. Each database is about 12GB in size and has between 520 and 530 million nodes in 55 or 56 documents. Text, Token, and Attribute indices are enabled, but with 6GB RAM allocated to the Java VM it would not create a full text index. Each of the 55 or 56 documents has 30000 article records in it under a root PubmedArticleSet element.

I typically use basexgui for interactive work and basex for scripted loads & queries, and I allocate 6G to the Java VM in each case:

BASEX_JVM="-Xmx6g $BASEX_JVM"

I'm exploring ways to search the data in a moderately performant way, starting with the relatively simple lookup by PubMed ID:

/PubmedArticleSet/PubmedArticle[MedlineCitation/PMID[text()=$pmid]]

I generated a series of five index XML files that pair PubMed IDs with database names, like so:

<index>
  <entry>
    <dbname>pmed_baseline_a</dbname>
    <pmid>579614</pmid>
  </entry>
  …
</index>

Each file contains entries for four of the 20 BaseX databases. I loaded the five files into a single database. My hope was that I could quickly look up the name of the database that contains a record by that record's PMID, then open that collection and quickly obtain the record, but it isn't working the way I had hoped.

If I query the index database by PMID, I get the answer in 156ms:

let $pmid := '22345065'
let $icoll := collection('idx_pmed_baseline')
let $pmid_lookup := $icoll/index/entry/pmid
let $entry := $pmid_lookup[text()=$pmid]
let $dbname := $entry/parent::entry/dbname/text()
return $dbname
(: returns 'pmed_baseline_s' :)

If I open and query that collection by XPath for that PMID, I also get the answer quickly, in about 420ms:

let $coll := collection('pmed_baseline_s')
let $pmid := '22345065'
let $wanted := $coll/PubmedArticleSet/PubmedArticle[MedlineCitation/PMID/text()=$pmid]
return $wanted
(: returns desired XML record :)

But if I combine the code to run in a single execution, it takes about 40s:

let $pmid := '22345065'
let $icoll := collection('idx_pmed_baseline')
let $pmid_lookup := $icoll/index/entry/pmid
let $entry := $pmid_lookup[text()=$pmid]
let $dbname := $entry/parent::entry/dbname/text()
let $coll := collection($dbname)
let $wanted := $coll/PubmedArticleSet/PubmedArticle[MedlineCitation/PMID/text()=$pmid]
return $wanted

I feel like I must be doing some simple thing wrong. The only differences I see between the two separate steps and the single-execution version are that I'm passing the db name to the collection() function in a variable instead of as a string literal, and that I'm running the whole thing in a single execution.

Note that before each execution of an XQuery, I exited basexgui and restarted it to avoid any caching effect, in memory at least. The VM I'm running on is modest (spinning drives in RAID 1, four modest AMD CPU cores, dynamic memory growth up to 32GB). But these factors would not explain the difference in speed between the two steps in separate executions and both steps in a single execution.

Can anyone point out what I'm doing wrong? And is there a better way to go about this?

Many thanks & all the best,
Chuck
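A plausible cause, stated as an assumption rather than a diagnosis: when the database name is only known at runtime, the optimizer may not be able to rewrite the path to a text-index lookup at compile time, so the 12GB database is scanned sequentially. The sketch below sidesteps that by calling the text index directly with db:text in both steps; it assumes the text index is up to date in both databases, as described above.

```xquery
let $pmid := '22345065'
(: Step 1: index lookup in the small lookup database, then walk up to the entry. :)
let $dbname := db:text('idx_pmed_baseline', $pmid)
  /parent::pmid/parent::entry/dbname/string()
(: Step 2: index lookup in the database found in step 1. The parent steps
   keep only text nodes that sit in MedlineCitation/PMID, since the same
   number may also occur elsewhere in a record. :)
return db:text($dbname, $pmid)
  /parent::PMID/parent::MedlineCitation/parent::PubmedArticle
```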

2 5

attribute-range()
by Johannes Bauer 29 Mar '22

Hello,

when I use the db:attribute-range function with the attribute name filter I always get all attributes:

db:attribute-range("db", "1", "1000", "id")

I would expect that it only returns the "id" attributes. It seems the attribute name does not matter at all. The results are always the same even for a bogus attribute name.

Regards
Johannes
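Until the name argument behaves as expected, a workaround that should give the intended result is to use the three-argument form and filter on the attribute name in a predicate. This is only a sketch; "db" and the range bounds are taken from the call above.

```xquery
(: Range lookup without the name argument, followed by an explicit name filter. :)
db:attribute-range("db", "1", "1000")[name() = "id"]
```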

2 1
