Hi,
I am reaching out for suggestions on improving performance.
I am using BaseX to store and analyze around 350,000 to 500,000 XML documents.
The size of each document varies from a few KB to 5 MB. Each day around 10k
documents get added or patched.
I have the following questions:
1) What is the optimal size or number of documents in a DB? Earlier I had 1
DB with different collections, but inserts were too slow: it took more than 30s
just to replace a document. So I split it up by category into around
30 DBs. Inserts are fine now, but if there are too many documents in a
category, patching that DB slows down, and querying across all DBs also gets
slower. Is there an optimal number of DBs? Can I create many DBs, for example 1 for
every 10K XMLs? I read through
https://www.mail-archive.com/basex-talk@mailman.uni-konstanz.de/msg06310.ht…,
does having hundreds of DBs cause query performance degradation? Is there any better
solution?
2) Query performance has degraded with more documents in a DB. I also
noticed that the token/attribute index makes little difference to query
performance (the queries are just XML attribute queries).
Running "Optimize" after inserts to recreate the index takes too much time and
memory. I am not running it now, since my tests showed no significant improvement
with the index compared to without it. Any suggestions for improving this?
3) Is it possible to just run queries against specific XMLs? I will have a
pre-filter based on user selection and queries need to be run against only
those XMLs. There are a number of filters users can apply and every time it
can result in a different set of XMLs against which analysis has to be
performed (Hence not feasible to create so many collections). Right now, I
am querying against all XMLs even though I am interested only in a subset
of XMLs and doing post filtering. I did go through
https://mailman.uni-konstanz.de/pipermail/basex-talk/2010-July/000495.html,
but again, having a regex to include all the file paths of interest (sometimes
the entire set of documents) will slow it down.
Thank you,
Deepak
Hi,
My databases are corrupted in a strange way. Everything worked yesterday
and I have not upgraded my system (automatic updates are NOT set on my OS).
In the WebDAV connector, all DB names except 6 appear as a date,
examples: 2023-06-14T07:37:56.294Z, 2023-12-12T09:56:02.722Z.
In the console, I get this error:
[qtp289639718-19] INFO com.bradmcevoy.http.HttpManager - PROPFIND ::
http://localhost:8972/webdav/ - http://localhost:8972/webdav/
bx_1 | Unparseable date: "app-pub-templates"
bx_1 | Unparseable date: "app-pubs"
bx_1 | Unparseable date: "app-tests"
bx_1 | Unparseable date: "ar-eg"
bx_1 | Unparseable date: "as-in"
bx_1 | Unparseable date: "az-az"
bx_1 | Unparseable date: "be-by"
bx_1 | Unparseable date: "bg-bg"
bx_1 | Unparseable date: "bn-bd"
...
It seems that the names and dates of the DBs have been interchanged.
I tried restoring the DBs from my backups (both newer and older ones). I
also tried restarting the server. Nothing changed. I am using BaseX 10.7
(beta) and have been for a few months. I could update BaseX to the
official release, but I would prefer to upgrade with healthy DBs to avoid
adding a layer of complexity to the issue. I have not had a similar problem
in a decade of using BaseX, so I am a bit clueless about what else to try.
Thanks in advance for your help,
--
France Baril
Architecte documentaire / Documentation architect
france.baril(a)architextus.com
Hello,
Is it part of the spec that numbers in the “basic” JSON representation (of 7+ digits) be serialized using scientific notation? For example:
let $direct := <json type="object"><n type="number">1339029</n></json>
let $basic := <fn:map><fn:number key="n">1339029</fn:number></fn:map>
let $result := ($direct, $basic) ! serialize(., map {
"method": "json", "json": map {
"format": if (position() eq 1) {"direct"} else {"basic"}, "indent": "yes"
}
})
return $result
…produces two different results:
{
"n":1339029
}
{
"n":1.339029E6
}
I usually prefer working with the “basic” format, but the automatic conversion to scientific notation is inconvenient because the value is not easily castable as an xs:integer.
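To illustrate the casting problem, here is a minimal sketch. It assumes the value has been read back as the string "1.339029E6" (the variable name is only for illustration). Casting that string directly to xs:integer raises an error, so the sketch goes through xs:double first:

```xquery
(: Assumed input: the scientific-notation string taken from the "basic" output above. :)
let $serialized := "1.339029E6"
(: xs:integer($serialized) would fail, because the lexical form is not a valid integer.
   Casting to xs:double first and then to xs:integer works for whole numbers. :)
return xs:integer(xs:double($serialized))
```

This workaround is only safe for whole numbers that fit exactly into an xs:double.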
Thanks in advance,
Tim
--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library
www.linkedin.com/in/timathompson
timathom(a)protonmail.com<mailto:timothy.thompson@yale.edu>
I’m searching for short phrases where I may want to respect order or not and where the phrases may cross element boundaries.
For example, I have the phrase “Amazon Alexa Spoke” and I want to find any DITA topic whose title text includes “Amazon Alexa Spoke” in that order, or maybe I want those words in any order, depending on my search requirements.
When I run this query against my database I find occurrences where all three words are in the same parent element, i.e.:
<title>Create a connection record for the <ph>Amazon Alexa spoke</ph>
</title>
<title>Create a credential record for the <ph>Amazon Alexa spoke</ph>
</title>
<title>Set up the <ph>Amazon Alexa spoke</ph>
</title>
But I do not find it where one of the words is not in the same parent:
This title is *not* found (even though this is the one I actually want to have found):
<title><ph id="alexa">Amazon Alexa</ph> Spoke</title>
Reading the docs on ft:search(), it is clear that it is searching on text nodes:
“Returns all text nodes from the full-text index…”
So I think the behavior here is as documented.
Short of creating a separate database that removes the subelements within <title> elements, is there a way to use full text indexing to do the search I want? In particular, I want to be able to turn the ordered/unordered check on or off.
If I always wanted ordered I could just use a regular expression match—it wouldn’t be that efficient but efficiency is not a concern in this particular case (but I can see where it would be in a more general search support situation).
Or am I missing a more obvious solution to this requirement?
Note that in this case I don’t care about finding different word forms—for this particular search I only care about exact word matches.
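One possible approach, sketched here under assumptions, is to apply "contains text" to the string value of each title instead of calling ft:search(), so that words from child elements such as <ph> are seen as one run of text. The function name local:title-matches and the $ordered switch are hypothetical, and this form may not be rewritten to use the full-text index, so it could be slow on large databases:

```xquery
(: Hypothetical helper: tests the string value of a title, so text from
   child elements such as <ph> is matched together with the surrounding text. :)
declare function local:title-matches(
  $title   as element(title),
  $phrase  as xs:string,
  $ordered as xs:boolean
) as xs:boolean {
  let $text := string($title)
  return if ($ordered)
    (: all words must occur, in the order given in $phrase :)
    then $text contains text { $phrase } all words ordered
    (: all words must occur, in any order :)
    else $text contains text { $phrase } all words
};

(: Sample titles copied from the examples above. :)
let $titles := (
  <title>Set up the <ph>Amazon Alexa spoke</ph></title>,
  <title><ph id="alexa">Amazon Alexa</ph> Spoke</title>
)
return $titles[local:title-matches(., "Amazon Alexa Spoke", true())]
```

Passing false() as the third argument turns the order check off. Note that "ordered" does not require the words to be adjacent.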
Cheers,
E.
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com<https://www.servicenow.com>
LinkedIn<https://www.linkedin.com/company/servicenow> | Twitter<https://twitter.com/servicenow> | YouTube<https://www.youtube.com/user/servicenowinc> | Facebook<https://www.facebook.com/servicenow>
Hi,
I just discovered that the code samples on the BaseX wiki don't seem to be
working properly. I noticed it a couple of days ago and thought it was
temporary, but the problem is still there.
Regards,
Johan
I’m generating CSV data that includes URLs with multiple query parameters, so they have “&somekey” in them. These get serialized as “&amp;somekey” where I want “&somekey”.
My CSV XML looks like this:
<record>
<AppID>sn_admin_center</AppID>
<DocsURL>=HYPERLINK(https://docs.servicenow.com/csh?topicname=admin-center-intro&amp;version=vancouver)</DocsURL>
</record>
I’m then doing:
let $report := csv:serialize($csv, map{})
let $doWrite := file:write('/Users/eliot.kimber/temp/apps-to-topics.csv', $report)
To write the CSV file.
The resulting file looks like:
sn_admin_center,”=HYPERLINK(“"https://docs.servicenow.com/csh?topicname=admin-center-intro&amp;version=va…")"
Note that the “&” is still escaped.
Reviewing the docs for the CSV module and the serialization options, I don’t see any option that looks like it would control how escaping is handled.
Is there a way to do what I want?
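One thing that may be worth trying is sketched below. It rests on an unconfirmed assumption: that the escaping is introduced when file:write serializes the string, and not by csv:serialize itself. Under that assumption, file:write-text from the File Module writes the string as plain text. The sample record is a shortened version of the one above:

```xquery
(: Shortened sample input; the real $csv would come from the report code. :)
let $csv :=
  <csv>
    <record>
      <AppID>sn_admin_center</AppID>
      <DocsURL>https://docs.servicenow.com/csh?topicname=admin-center-intro&amp;version=vancouver</DocsURL>
    </record>
  </csv>
(: csv:serialize returns the CSV as a string. :)
let $report := csv:serialize($csv, map { })
(: file:write-text writes the string as-is, without XML serialization. :)
return file:write-text('/Users/eliot.kimber/temp/apps-to-topics.csv', $report)
```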
Thanks,
E.
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
Hi Christian,
Thanks. You had me a bit worried there.
Just a thought: shouldn’t util:if just be part of the regular BaseX syntax? Together with fn:for-each, the fn:fold functions, and hof:fold-left1, it belongs to the group of process-flow control functions.
Best,
Rob
From: basex-talk-request(a)mailman.uni-konstanz.de
Sent: Friday, November 24, 2023 12:00 PM
To: basex-talk(a)mailman.uni-konstanz.de
Subject: BaseX-Talk Digest, Vol 167, Issue 9
Today's Topics:
1. Re: BaseX 10.7 and util:if (Christian Grün)
2. Debugging XML catalog with BaseX 10.7 (Andy Bunce)
----------------------------------------------------------------------
Message: 1
Date: Thu, 23 Nov 2023 15:30:47 +0100
From: Christian Grün <christian.gruen(a)gmail.com>
To: Andy Bunce <bunce.andy(a)gmail.com>
Cc: BaseX <basex-talk(a)mailman.uni-konstanz.de>
Subject: Re: [basex-talk] BaseX 10.7 and util:if
Message-ID:
<CAP94bnN+BFAv6cRqvRwiDMLSw6hqN7pU12PcgUeN54XF-BwoAw(a)mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi Andy, hi Rob,
util:if is back again.
Just give me a note if there are other functions in the Utility Module (or
in the HOF Module) that you rely on.
Best,
Christian
On Mon, Nov 20, 2023 at 1:15 PM Andy Bunce <bunce.andy(a)gmail.com> wrote:
> Hi,
> The documentation for the utility module [1] says certain functions will
> be removed in version 11 because they will be in XPath 4.0.
> However, util:if has been removed from the documentation and I think there
> is no equivalent function in this case.
> util:if is an alternative syntax for the BaseX Ternary operator [2].
>
> ( I have used it because it was easier than trying to get my XQuery parser
> EBNF to accept $ok ?? 1 !! 0 etc)
>
> Is util:if to be removed?
>
> /Andy
> [1] https://docs.basex.org/wiki/Utility_Module#util:if
> [2] https://docs.basex.org/wiki/XQuery_Extensions#Ternary_If
>
Hi,
I am following the guide at [1] with the aim of loading some XML with a
variety of DTD doctypes.
I have a catalog.xml set (which works in another app) and in lib/custom I
have
- saxon-he-10.9.jar
- xmlresolver-5.2.2-data.jar
- xmlresolver-5.2.2.jar
- xmlresolver.properties
It is not working as I hoped. Is there a way to get debug info from
xmlresolver to a file without going to the Java level?
The guide says "You must configure your environment with an appropriate
logging backend." [2]
I guess this is about SLF4J <http://www.slf4j.org/> but I have no
experience here.
/Andy
[1] https://docs.basex.org/wiki/Catalog_Resolver
[2] https://xmlresolver.org/ch06#xml.catalog.defaultLoggerLogLevel
Hi,
The documentation for the utility module [1] says certain functions will be
removed in version 11 because they will be in XPath 4.0.
However, util:if has been removed from the documentation and I think there
is no equivalent function in this case.
util:if is an alternative syntax for the BaseX Ternary operator [2].
( I have used it because it was easier than trying to get my XQuery parser
EBNF to accept $ok ?? 1 !! 0 etc)
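For comparison, here is a small sketch of the three forms side by side. The second and third are BaseX-specific, and util:if is only available in versions that still ship it. The variable $ok is just for illustration:

```xquery
let $ok := true()
return (
  (: standard XQuery conditional :)
  if ($ok) then 1 else 0,
  (: BaseX ternary operator [2] :)
  $ok ?? 1 !! 0,
  (: function form from the Utility Module [1] :)
  util:if($ok, 1, 0)
)
```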
Is util:if to be removed?
/Andy
[1] https://docs.basex.org/wiki/Utility_Module#util:if
[2] https://docs.basex.org/wiki/XQuery_Extensions#Ternary_If