Feature Preview: Caching Module

Christian Grün

5 May 2022

6:10 a.m.

Hi there,

BaseX 10 will come with a new Caching Module [1], which will allow you to store XQuery values (atomic items, nodes, sequences, maps, arrays; anything except function items) in a main-memory cache. Caches are persistent: their contents will be written to disk at shutdown time and retrieved from disk in a new or restarted BaseX instance when accessed for the first time.

The cache size is only limited by the available RAM. We started off with an XML format for representing caches in files, but we switched to a binary format to speed up the processing of large caches. It’s also possible to put large XML documents in the cache, but the classic database representation will give you better results in most cases.

A first snapshot is available [2].

In addition, we’ll soon introduce new database functions, which will enable you to store XQuery values (…including maps) in a database.

Have fun, Christian

[1] https://docs.basex.org/wiki/Caching_Module
[2] https://files.basex.org/releases/latest-10/

Omar Siam

5 May 2022

7:13 a.m.

A very interesting feature for me.

I have to admit that, after I posted my last explanation, I found out that I already cache data heavily for some search requests in my TEI/XML snippet API. Even so, I hit a floor of an almost constant 2 s for retrieving data using that cache.

I probably will do a writeup of that part and try the new caching module. Maybe that is faster.

Is there a cache for XQuery code? I work with small snippets and can most of the time choose if something is a literal or passed as an input variable.

Best regards

-- Mag. Ing. Omar Siam Austrian Center for Digital Humanities and Cultural Heritage Österreichische Akademie der Wissenschaften | Austrian Academy of Sciences Stellvertretende Behindertenvertrauensperson | Deputy representative for disabled persons Wohllebengasse 12-14, 1040 Wien, Österreich | Vienna, Austria T: +43 1 51581-7295 omar.siam@oeaw.ac.at | www.oeaw.ac.at/acdh

Christian Grün

7:40 a.m.

> Is there a cache for XQuery code? I work with small snippets and can most of the time choose if something is a literal or passed as an input variable.

If you want to resort to the string representation of a query, you could store it in the cache as well, and evaluate it later on, e.g. as follows:

'name-of-query' => cache:get() => xquery:eval()

Query strings could also be organized in an XQuery map:

(: run cached query :)
let $map := cache:get('queries')
let $query := $map?name-of-query
return xquery:eval($query)

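(: register query; assumes cache:put takes the key first, then the value :)
let $query := '123 + 456'
let $map := (cache:get('queries'), map { })[1]  (: empty map if nothing is stored yet :)
return cache:put('queries', map:put($map, 'name-of-query', $query))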

Liam R. E. Quin

3:03 p.m.

On Thu, 2022-05-05 at 12:10 +0200, Christian Grün wrote:

> contents will be written to disk at shutdown time

What happens on a crash (e.g. power failure)?

E.g., for the little test/experiment site I have at www.fromoldbooks.org (and www.fromoldbooks.org/Search/), there's a framework I wrote that calls out to BaseX and keeps a cached result in a separate file, one per query. The cache is deleted entirely on any update, which is not optimal but was easy :) and a background process repopulates it based on popular queries. The cache can get up to a few gigabytes sometimes.

These days for most of the queries, BaseX is faster than the framework, so the site would speed up without it :) but there are a few that can be slow, and the cache reduces server load when someone's bot goes crazy.
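The setup described above (one cache file per query, everything dropped on any update) can be sketched roughly as follows. This is a Python illustration written for this thread, not the framework's actual code: the class name QueryFileCache, the run_query callback and the SHA-256 file naming are all assumptions, and the background repopulation step is left out.

```python
import hashlib
from pathlib import Path


class QueryFileCache:
    """One cache file per query string; any update clears the whole cache."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, query):
        # Hash the query text so that any query maps to a safe file name.
        digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.cache"

    def get(self, query, run_query):
        """Return the cached result; call run_query(query) only on a miss."""
        path = self._path(query)
        if path.exists():
            return path.read_text(encoding="utf-8")
        result = run_query(query)
        path.write_text(result, encoding="utf-8")
        return result

    def invalidate_all(self):
        """To be called after any database update: drop every cached result."""
        for path in self.directory.glob("*.cache"):
            path.unlink()
```

Clearing everything is the simplest invalidation rule that can never serve stale results; the price is a cold cache after every update, which is what the background repopulation compensates for.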

Philosophy question: Is a cache so different from a view in SQL, a constructed dynamic table?

-- Liam Quin, https://www.delightfulcomputing.com/ Available for XML/Document/Information Architecture/XSLT/ XSL/XQuery/Web/Text Processing/A11Y training, work & consulting. Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org

Christian Grün

6 May 2022

10:25 a.m.

Hi Liam,

> What happens on a crash (e.g. power failure)?

If BaseX is shut down gracefully, the data will be stored; otherwise, it may indeed be lost. If the cached data is important, it’s advisable to call cache:write after each update.

In the documentation, I mentioned that the cache will automatically be written to disk at shutdown time. Based on some more feedback I got, I imagine there can be cases in which you simply want to create a temporary cache without making it persistent. I think I’ll change this, and I will only serialize the cache if a cache file already exists on disk (as a result of a previous explicit cache:write call).
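The persistence policy proposed here can be illustrated with a small Python sketch. It is not the BaseX implementation: the class Store, its methods and the use of pickle as a stand-in for the binary format are assumptions made for the example. The store is loaded lazily on first access, is written only by an explicit write(), and is re-serialized at shutdown only if a file from an earlier write() already exists.

```python
import pickle
from pathlib import Path


class Store:
    """In-memory key-value store that becomes persistent only after write()."""

    def __init__(self, path):
        self.path = Path(path)
        self._data = None  # loaded lazily on first access

    def _entries(self):
        # Read the file on first access, or start empty if there is none.
        if self._data is None:
            if self.path.exists():
                with self.path.open("rb") as f:
                    self._data = pickle.load(f)
            else:
                self._data = {}
        return self._data

    def get(self, key):
        return self._entries().get(key)

    def put(self, key, value):
        self._entries()[key] = value

    def write(self):
        """Explicitly serialize the current entries to disk."""
        with self.path.open("wb") as f:
            pickle.dump(self._entries(), f)

    def shutdown(self):
        """Graceful shutdown: re-serialize only if a file already exists."""
        if self._data is not None and self.path.exists():
            self.write()
```

A store that was never written stays temporary and leaves no file behind. After a crash, only the state of the last write() or graceful shutdown survives, which is why calling write() after each important update is the safe choice.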

> E.g., for the little test/experiment site I have at www.fromoldbooks.org (and www.fromoldbooks.org/Search/), there's a framework I wrote that calls out to BaseX and keeps a cached result in a separate file, one per query. […]

Reading your reply and the one from Omar, I wonder whether »Cache« is really the best term to describe what the module offers. It’s basically a main-memory key-value store that can be made persistent, similar to e.g. Redis. Suggestions for a better name are welcome.

All the best, Christian

Graydon

11:09 a.m.

On Fri, May 06, 2022 at 04:25:44PM +0200, Christian Grün scripsit:

> Reading your reply and the one from Omar, I wonder whether »Cache« is really the best term to describe what the module offers. It’s basically a main-memory key-value store that can be made persistent, similar to e.g. Redis. Suggestions for a better name are welcome.

The temptation to call it the dynamic static context is great, but perhaps "session-persistent context"? SPC for short?

-- Graydon Saunders | graydonish@gmail.com Þæs oferéode, ðisses swá mæg. -- Deor ("That passed, so may this.")

Andy Bunce

7 May 2022

11:30 a.m.

"There are only two hard things in Computer Science: cache invalidation and naming things." [1]

This topic combines the two ;) As for alternative names I offer: the exotic: entrepot [2], the industrial: kvstore, the bohemian: stash. Personally I think cache is fine.

I have sometimes thought of using BaseX and Redis together, managed by docker-compose. I like the data structures [3] and the concept of 'variables' that self-destruct after some time [4].

/Andy

[1] https://www.martinfowler.com/bliki/TwoHardThings.html
[2] https://en.wikipedia.org/wiki/Entrep%C3%B4t
[3] https://redis.io/docs/about/
[4] https://redis.io/commands/expire/

Christian Grün

10 May 2022

7:26 a.m.

;·) About time for BohemiaX.

Similar to Redis, we could either work with expiry dates or limit the cache to a maximum number of entries (and drop the ones with the oldest access time).
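To make the two options concrete, here is a small Python sketch. It is not BaseX or Redis code: the class BoundedCache and its parameters are hypothetical, and it combines both ideas in one class only for brevity. Each entry can carry an expiry time, and once the maximum number of entries is exceeded, the entry with the oldest access time is dropped.

```python
import time
from collections import OrderedDict


class BoundedCache:
    """Key-value cache with an optional time-to-live and a size limit."""

    def __init__(self, max_entries, ttl_seconds=None, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable, so that expiry can be tested
        # key -> (value, expires_at); least recently accessed entry comes first
        self._entries = OrderedDict()

    def put(self, key, value):
        expires_at = None
        if self.ttl_seconds is not None:
            expires_at = self.clock() + self.ttl_seconds
        self._entries[key] = (value, expires_at)
        self._entries.move_to_end(key)  # a write counts as an access
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # drop the oldest access

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self.clock() >= expires_at:
            del self._entries[key]  # expired: behave as if it was never stored
            return None
        self._entries.move_to_end(key)  # mark as most recently accessed
        return value
```

Expired entries are removed lazily, when they are next requested; a real implementation might also sweep them periodically so that unused entries do not occupy memory until the size limit is reached.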

Andy Bunce

1:03 p.m.

Maybe keep it simple and focused on the minimum required for XQuery use :)

...

From a quick test, Jedis seems to work fine from custom/lib [1].

/Andy

[1] https://gist.github.com/apb2006/9563707df4d8f7dd536d9cd3ea70046f

Christian Grün

17 Jun 2022

12:01 p.m.

The naming of the upcoming module caused confusion more than once in our beta tests, so we decided to go for a light version of the industrial proposal and name it »Store Module«! BaseX 9.7.3 will be released this month and will contain a preview version.

Everyone’s feedback is welcome. Christian

basex-talk@mailman.uni-konstanz.de
