Thanks for the tips.
This is an internal server, so bots shouldn't be a concern, but the details of cache updating will certainly be important; that's a detail I have not yet attended to.
Today we implemented asynchronous fetching of the previews with in-browser JavaScript, which we got to work, but we still need to tune it and understand what the server-load implications are.
We generate a report of all the DITA tables in a given set of content (i.e., a given publication or set of publications). This can be several thousand tables, so even at 20ms per table it's still a long wait: 3,000 tables at 20ms each is a full minute.
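For concreteness, here's a simplified sketch of the shape of the browser-side code; the /preview endpoint, element IDs, and concurrency cap are illustrative stand-ins, not our actual names:

    // Simplified sketch: fetch previews asynchronously with a bounded
    // number of in-flight requests so thousands of tables don't hit
    // the server all at once. Endpoint and element IDs are illustrative.
    async function loadPreviews(tableIds: string[], maxInFlight = 4): Promise<void> {
      let next = 0;
      // Each worker pulls the next table ID until the queue is drained.
      const worker = async (): Promise<void> => {
        while (next < tableIds.length) {
          const id = tableIds[next++];
          try {
            const resp = await fetch(`/preview/${encodeURIComponent(id)}`);
            if (!resp.ok) continue; // skip failures; retry logic could go here
            const html = await resp.text();
            const cell = document.getElementById(`preview-${id}`);
            if (cell) cell.innerHTML = html;
          } catch {
            // network error: leave the placeholder as-is
          }
        }
      };
      // Run maxInFlight workers in parallel and wait for all to finish.
      await Promise.all(Array.from({ length: maxInFlight }, worker));
    }

With a cap of four in-flight requests, a few thousand previews arrive in waves rather than as one burst, and raising or lowering the cap gives us a direct handle on the server-load question.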
We’ll see how this approach works.
Cheers,
E.
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368 M: 512 554 9368
https://www.servicenow.com
LinkedIn: https://www.linkedin.com/company/servicenow | Twitter: https://twitter.com/servicenow | YouTube: https://www.youtube.com/user/servicenowinc | Facebook: https://www.facebook.com/servicenow
From: Liam R. E. Quin <liam@fromoldbooks.org>
Date: Thursday, August 10, 2023 at 1:03 PM
To: Eliot Kimber <eliot.kimber@servicenow.com>, Christian Grün <christian.gruen@gmail.com>
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] How best to cache an intermediate result in the context of a larger query?
On Thu, 2023-08-10 at 16:00 +0000, Eliot Kimber wrote:
This REST endpoint is called from server-side code that also checks for a cached preview and just returns it (avoiding the overhead of the REST call); otherwise it calls the endpoint.
I do something similar for fromoldbooks.org (using memcached for the front page, as the site sometimes gets.. a little busy :) )
A couple of things to watch for...
* write the new cache file to a temp file and then rename it; that way, another process can't start reading an incomplete cache file (this and the next two tips are sketched in code after the list)
* I check the load average (by opening /proc/loadavg on a Linux server; it's a text file maintained by the kernel) and, if it's too high, sleep for a while to slow down crawlers, then return failure.
* updating the cache I handle in the front-end code, and I return the result before updating the cache, to shave a few ms off "time to first paint". Time to first paint affects your position in Google search results, if that matters to you.
* if your pages are public, crawler bots will pre-populate the cache, possibly with nonsensical parameters, so it can make sense to reject those early on. E.g., an incoming search at fromoldbooks.org with 30 keywords isn't from a human, as the UI doesn't support more than 3, so I don't need to store 2^30 cached pages when a bot tries every combination.
* you can use the Google Search Console (I think that's the right place) to tell the Google bot about parameters that don't affect the result, so it shouldn't try every possible value.
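For what it's worth, the first three points might look roughly like this in a Node/TypeScript handler; the cache directory, load threshold, and renderPage() are placeholders, not my actual code:

    import * as fs from "node:fs";
    import * as os from "node:os";
    import * as path from "node:path";

    const CACHE_DIR = "/var/cache/previews"; // placeholder location

    // Tip 1: write the cache entry to a temp file, then rename it into
    // place. rename() is atomic on the same filesystem, so a concurrent
    // reader never sees a half-written file.
    function writeCache(key: string, body: string): void {
      const finalPath = path.join(CACHE_DIR, key);
      const tmpPath = `${finalPath}.tmp-${process.pid}`;
      fs.writeFileSync(tmpPath, body);
      fs.renameSync(tmpPath, finalPath);
    }

    // Tip 2: check the load average (os.loadavg() reports the same
    // numbers as /proc/loadavg) and back off when it's too high.
    function overloaded(threshold = 8): boolean {
      return os.loadavg()[0] > threshold; // threshold is a placeholder
    }

    // Stand-in for whatever actually renders the page or preview.
    async function renderPage(key: string): Promise<string> {
      return `<p>rendered content for ${key}</p>`;
    }

    // Tip 3: on a cache miss, send the response first and update the
    // cache afterwards, keeping the write off the time-to-first-paint
    // critical path. `key` is assumed validated/sanitized upstream.
    async function handle(key: string, send: (body: string) => void): Promise<void> {
      const cached = path.join(CACHE_DIR, key);
      if (fs.existsSync(cached)) {
        send(fs.readFileSync(cached, "utf8")); // cache hit: cheap path
        return;
      }
      if (overloaded()) {
        // Sleep to slow aggressive crawlers down, then return failure.
        await new Promise((resolve) => setTimeout(resolve, 5000));
        send("Server busy; please try again shortly.");
        return;
      }
      const body = await renderPage(key);
      send(body);            // respond first...
      writeCache(key, body); // ...then populate the cache
    }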
liam
--
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org