BaseX-Talk

basex-talk@mailman.uni-konstanz.de

4 participants
5268 discussions

StratML Query Service?
by Owen Ambur 07 Feb '22

07 Feb '22

I'm looking for a partner(s) to develop a StratML query service(s) for potential hosting at https://aboutthem.info/ The basic requirements are outlined at https://stratml.us/carmel/iso/SMLTASwStyle.xml#_15446932-208f-11e6-a80e-733… There are more than 5K files in the StratML collection <https://stratml.us/drybridge/index.htm> that can be used for demonstration/MVP <https://en.wikipedia.org/wiki/Minimum_viable_product> purposes and all of their URLs are listed in sitemap format at https://stratml.us/sitemap.xml Jorge Sanchez put a lot of effort into a prototype based on BaseX, which is briefly documented at https://stratml.us/#Vionta However, it appears he may have found it too difficult to accomplish and given up on the project. For additional background on the bigger picture, see https://connectedcommunity.net/ Please let me know if you or anyone you may know might be interested in partnering with me to develop and maintain such a service. My RMDs <https://www.irs.gov/retirement-plans/plan-participant-employee/retirement-t…> are kicking in this year and I can't think of a better way to spend some of them. Owen https://www.linkedin.com/in/owenambur/

1 0

Download As Binary for Docs in Databases?
by Eliot Kimber 07 Feb '22

07 Feb '22

I’d like to be able to create Zip files from large docs stored in a database (i.e., Oxygen validation reports, which can be 10s of MBs as raw XML). Looking at the Archive module it’s not immediately clear how to go about it (or how best to go about it)—it would make the most sense to construct the Zip as a stream and return the stream, as opposed to constructing the Zip in memory (or on disk) and then streaming it from there. Or am I overthinking it or missing some obvious solution? Thanks, E. _____________________________________________ Eliot Kimber Sr Staff Content Engineer O: 512 554 9368 M: 512 554 9368 servicenow.com<https://www.servicenow.com> LinkedIn<https://www.linkedin.com/company/servicenow> | Twitter<https://twitter.com/servicenow> | YouTube<https://www.youtube.com/user/servicenowinc> | Facebook<https://www.facebook.com/servicenow>

2 3

Doc Loaded by db:create() has invalid URL according to base-uri()
by Eliot Kimber 04 Feb '22

04 Feb '22

I think this is a bug: I’m using db:create() to initialize a database with files from the file system. The OS is linux (but I see the same issue with macOS). The docs all load without error, but then when processing the loaded content with code that uses base-uri() to get the base URI of a document, I get this failure: URI '/rome/product/human-resources/task/predict-assignment-group[1].dita' is invalid Which is correct—the square brackets must be escaped. However, I didn’t create the URI, BaseX did when it loaded the data, which suggests that there’s a URI escaping action that isn’t being performed when the data is loaded: I would expect the URI to be “predict-assignment-group%5B1%5D.dita” as returned by base-uri() (and as stored in the <resource> entry for the document). If I do: collection('/rome/product/human-resources/task/predict-assignment-group[1].dita')/* I also get invalid collection URI: '/rome/product/human-resources/task/predict-assignment-group[1].dita' Using db:dir to examine the <resource> entry for the file I see: <resource raw="false" content-type="application/xml" modified-date="2022-02-03T22:44:29.107Z">predict-assignment-group[1].dita</resource> So the brackets are not escaped there. I can use the db:* functions to correct the paths (or just delete the nodes) so not a hard-stop problem. Cheers, E. _____________________________________________ Eliot Kimber Sr Staff Content Engineer O: 512 554 9368 M: 512 554 9368 servicenow.com<https://www.servicenow.com> LinkedIn<https://www.linkedin.com/company/servicenow> | Twitter<https://twitter.com/servicenow> | YouTube<https://www.youtube.com/user/servicenowinc> | Facebook<https://www.facebook.com/servicenow>

4 5

Re: [basex-talk] BaseX-Talk Digest, Vol 130, Issue 4
by Rob Stapper 04 Feb '22

04 Feb '22

Hi Christian, Referring to an old issue of 10-10-2020. Is there any change the solution to the issue as mentioned below is part BaseX 10? I would realy like to address every module-functions through a public variable even in a cyclic module-construction. Another point that I would very much be concidered is that the "at"-clause of the import-command could also handle database locations. I would like my functionality to be stored as close as possible to its relevant dataStructures. Is that something tat cn be considered? Love to hear from you, Rob Stapper > Op 09-10-2020 12:00 schreef basex-talk-request(a)mailman.uni-konstanz.de: > > > Send BaseX-Talk mailing list submissions to > basex-talk(a)mailman.uni-konstanz.de > > To subscribe or unsubscribe via the World Wide Web, visit > https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk > or, via email, send a message with subject or body 'help' to > basex-talk-request(a)mailman.uni-konstanz.de > > You can reach the person managing the list at > basex-talk-owner(a)mailman.uni-konstanz.de > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of BaseX-Talk digest..." > > > Today's Topics: > > 1. recursively used variables (Rob Stapper) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 8 Oct 2020 14:17:48 +0200 > From: Rob Stapper <r.stapper(a)lijbrandt.nl> > To: "basex-talk(a)mailman.uni-konstanz.de" > <basex-talk(a)mailman.uni-konstanz.de> > Subject: [basex-talk] recursively used variables > Message-ID: <48217f7a-0960-11eb-811f-00505699b758(a)smtp.kpnmail.nl> > Content-Type: text/plain; charset="utf-8" > > Hi, > > The code[1] below and send as attachment generates a error message: ?Static variable depends on itself: $Q{http://www.w3.org/2005/xquery-local-functions}test?. > I use these variables to refer to my private functions in my modules so I can easyly refer to them in a inheritance situation. > It?s not a big problem for me but I was wondering if the error-triggering is justified or that it should work. > > [1]=========================================== > declare variable $local:test := local:test#1 ; > declare %private function local:test( $i) { if ( $i > 0) then $local:test( $i - 1) } ; > > $local:test( 10) > =========================================== > > Kind regards, > > Rob Stapper > > > Sent from Mail for Windows 10 > > > > -- > This email has been checked for viruses by Avast antivirus software. > https://www.avast.com/antivirus >

2 1

Optimizing Lookup from Custom Indexes
by Eliot Kimber 03 Feb '22

03 Feb '22

In my content set (DITA maps and topics) I construct an index that maps each map or topic to the names of the root maps that ultimately use that topic. My index structure is: <doc-to-bundle-index> <doc-to-bundle-index-entry key="product/customer-communities/reference/gamification-components-badges.dita"> <filename>gamification-components-badges.dita</filename> <bundles> <bundle>No-bundle-found</bundle> </bundles> </doc-to-bundle-index-entry> </doc-to-bundle-index> I then want to get, for all the topics, the bundle names for each topic, grouped by bundle name (i.e., construct a map of bundle names to topics in that bundle). (This is in the service of a report that relates Oxygen map validation reports to the documents associated with the incidents in the report, grouped by bundle.) I have 10K topics in my test set. Getting the set of topic elements and the index keys for each topic is fast: about 0.1 seconds total. However, using the keys to do a lookup of the bundles for each topic takes about 2 minutes, i.e.: let $bundlesForDocs as xs:string* := for $key in $keysForDocs return $dtbIndex/doc-to-bundle-index-entry[@key eq $key]/bundles/bundle ! string(.) return $bundlesForDocs (I would really be building a map of bundles-to-docs but I used this loop just to gather timing info and take map construction out of the equation, not that I would expect map construction itself to be slow.) An obvious solution would be to capture the bundle-to-document mapping at the time I construct the index, which I will do. But my larger question is: Am I doing anything wrong or inefficient in this initial approach that is making this lookup of index entries by key slower than it should be? Or is this just an inherently slow operation that I should just not try to do if at all possible? That is, is there a way to either construct the content of the index or configure BaseX that will make this kind of bulk lookup faster? Or am I thinking about this particular use case all wrong? Thanks, Eliot _____________________________________________ Eliot Kimber Sr Staff Content Engineer O: 512 554 9368 M: 512 554 9368 servicenow.com<https://www.servicenow.com> LinkedIn<https://www.linkedin.com/company/servicenow> | Twitter<https://twitter.com/servicenow> | YouTube<https://www.youtube.com/user/servicenowinc> | Facebook<https://www.facebook.com/servicenow>

2 2

Issue with HTTP client and authentication
by Tim Thompson 03 Feb '22

03 Feb '22

Hello, I'm trying to post a SPARQL query to an endpoint using Digest authentication with the HTTP client. The query works fine using curl: curl --digest --user user:pass -X POST -d@'test.rq' \ -H "Content-type: application/sparql-query" \ 'http://example.org/sparql' But the equivalent request in BaseX fails with 401 Unauthorized: let $endpoint := "http://example.org/sparql" let $user := "user" let $pass := "pass" let $type := "application/sparql-query" let $response := ( http:send-request( <http:request method="POST" href="{$endpoint}" username="{$user}" password="{$pass}" auth-method="Digest" send-authorization="true"> <http:header name="Content-Type" value="{$type}; charset=utf-8"/> <http:body media-type="{$type}">{ ``[ select * where {?s ?p ?o} limit 1 ]`` }</http:body> </http:request> ) ) return $response Any ideas about what might be causing the BaseX HTTP client to be denied here? Thanks in advance, Tim -- Tim A. Thompson Metadata Librarian Yale University Library

5 20

Techniques for Unit Testing Updating Operations
by Eliot Kimber 02 Feb '22

02 Feb '22

I’m setting up unit tests for my code that creates various custom indexes. I have content on the file system that serves as my known input. With that data I then need to create the content database and run the processes that create the various indexes over the content database. Thus I need to create the databases, populate them, then verify the populated result. As you can’t create a database and query it in the same XQuery, I don’t see a way to use %unit:before-module to initialize my databases before running unit tests in the same module. The solution seems to be to use a BaseX script to do the database initialization, which seems easy enough: # Run unit tests with fresh database # Make sure the databases exist so we can then drop them check pce-test-01 check _lrc_pce-test-01_link_records # Drop them drop db pce-test-01 drop db _lrc_pce-test-01_link_records # Now create them fresh check pce-test-01 check _lrc_pce-test-01_link_records # # Now run the tests that use these databases in the # required order # test ./test-db-from-git.xqy test ./test-link-record-keeping.xqy However, in running this from the BaseX GUI, it appears that the test commands are trying to find files relative to the location of the basexgui command rather than relative to the script being run: Resource "/Users/eliot.kimber/apps/basex/bin/test-db-from-git.xqy" not found. I don’t see anything in the commands documentation that suggests a way to parameterize the values passed to commands. Am I missing a way to have this kind of setup script be portable from within the GUI or is there a better/different way to initialize the databases for unit tests? Thanks, E. _____________________________________________ Eliot Kimber Sr Staff Content Engineer O: 512 554 9368 M: 512 554 9368 servicenow.com<https://www.servicenow.com> LinkedIn<https://www.linkedin.com/company/servicenow> | Twitter<https://twitter.com/servicenow> | YouTube<https://www.youtube.com/user/servicenowinc> | Facebook<https://www.facebook.com/servicenow>

3 12

Bug (?) with dynamic namespace constructor
by Hans-Juergen Rennau 31 Jan '22

31 Jan '22

Dear BaseX people, is this a bug: basex "<foo>{namespace {''}{'bar'}}</foo>"=>[XQDY0102] Duplicate namespace declaration: ''.? This works as expected:basex "<foo>{namespace {'xxx'}{'bar'}}</foo>"=><foo xmlns:xxx="bar"/> With kind regards,Hans-Jürgen

2 4

Techniques for Parallelizing Updating Operations?
by Eliot Kimber 30 Jan '22

30 Jan '22

I’ve worked out how to optimize my process that indexes DITA topics based on what top-level maps they are ultimately used from (turned out I needed to first index the maps in ref count order from least to most, which meant I could then just look up the top-level maps used by any direct-reference maps that reference a given topic—with that in place each topic only requires a single index lookup). However, on my laptop these lookups still take about 0.1 second/topic so for 1000s of topics it’s a long time (relatively speaking). But the topic index process is 100% parallelizable, so I would be able to have at least 2 or 3 ingestion threads going on my 4-CPU server machine. Note that my ingestion process is two-phased: Phase 1: Construct an XQuery map with the index details for the input topics (the topics already exist in the database, only the index is new). Phrase 2: Persist the map to the database as XML elements. I do the map construction in order to both take advantage of map:merge() and because it’s the only way I can do indexing of the DITA maps and topics in one transaction: build the doc-to-root-map for the DITA maps and then use that data to build the doc-to-root-map entries for all the topics, then persist the lot to the database for future use. This is in the context of a one-time mass load of content from a new git work tree. Subsequent changes to the content database will be on individual files and the index can be easily updated incrementally. So I’m just trying to optimize the startup time so that it doesn’t take two hours to load and index our typical content set. I can also try to optimize the low-level operations, although they’re pretty simple so I don’t see much opportunity for significant improvement, but I also haven’t had time to try different options and measure them. I must also say how useful the built-in unit testing framework is—that’s really made this work easier. Cheers, Eliot _____________________________________________ Eliot Kimber Sr Staff Content Engineer O: 512 554 9368 M: 512 554 9368 servicenow.com<https://www.servicenow.com> LinkedIn<https://www.linkedin.com/company/servicenow> | Twitter<https://twitter.com/servicenow> | YouTube<https://www.youtube.com/user/servicenowinc> | Facebook<https://www.facebook.com/servicenow>

1 0

Managing Interactions with Long-Running Processes
by Eliot Kimber 26 Jan '22

26 Jan '22

I’m making good progress on our BaseX-based validation dashboard application. The basic process here is we use Oxygen’s scripting support to do DITA map validation and then ingest the result into a database (along with the content documents that were validated) and provide reports on the validation result. The practical challenge here seems to be running the Oxygen process successfully from BaseX—because our content is so huge it can take 10s of minutes for the Oxygen process to run. I set the command time-out to be much longer than the process should run but running it from the HTTP app’s query panel it eventually failed with an error that wasn’t a time out (my code had earlier reported legit errors so I know errors will be properly reported). As soon as the Oxygen process ends I want to ingest the resulting XML file, which is why I started with doing it from within BaseX. But I’m wondering if this is a bad idea and I should really be doing it with i.e., a shell script run via cron or some such? I was trying to keep everything in BaseX as much as possible just to keep it simple. Any general tips or cautions for this time of integration of BaseX with the outside world? Thanks, E. _____________________________________________ Eliot Kimber Sr Staff Content Engineer O: 512 554 9368 M: 512 554 9368 servicenow.com<https://www.servicenow.com> LinkedIn<https://www.linkedin.com/company/servicenow> | Twitter<https://twitter.com/servicenow> | YouTube<https://www.youtube.com/user/servicenowinc> | Facebook<https://www.facebook.com/servicenow>

2 2

Jump to page:

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

BaseX-Talk