Send BaseX-Talk mailing list submissions to
basex-talk@mailman.uni-konstanz.de
To subscribe or unsubscribe via the World Wide Web, visit
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
or, via email, send a message with subject or body 'help' to
basex-talk-request@mailman.uni-konstanz.de
You can reach the person managing the list at
basex-talk-owner@mailman.uni-konstanz.de
When replying, please edit your Subject line so it is more specific
than "Re: Contents of BaseX-Talk digest..."
Today's Topics:
1. Re: Different interpretation of regex in eXist, Saxon and
BaseX (Omar Siam)
2. Re: BaseX insert/delete node performance (Christian Grün)
3. Transaction management in BaseX 8.6.4 (Marc Coenegracht)
4. Re: Transaction management in BaseX 8.6.4 (Christian Grün)
5. Re: Transaction management in BaseX 8.6.4 (Christian Grün)
----------------------------------------------------------------------
Message: 1
Date: Wed, 8 Aug 2018 12:58:39 +0200
From: Omar Siam <Omar.Siam@oeaw.ac.at>
To: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Different interpretation of regex in eXist,
Saxon and BaseX
Message-ID: <91b47b0e-70a1-1336-ced6-e12eaa804cde@oeaw.ac.at>
Content-Type: text/plain; charset=utf-8; format=flowed
Hi
I think the problem is this: there are numerous implementations of regular
expressions which share a common subset but differ in the more
advanced features.
With the Java regular expression implementation you can use greedy matching
and some other features. The XSLT and XQuery implementations, following the
standards, do not allow this and so misinterpret the regular
expression. See here: https://www.w3.org/TR/xpath-functions-31/#regex-syntax
You can tell Saxon to use a different regexp engine such as the standard
Java one:
https://www.saxonica.com/html/documentation/functions/fn/matches.html
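To illustrate the engine difference, the RFC 3986 Appendix B pattern can be run unchanged through Java's own java.util.regex. This is a minimal sketch added for illustration, not code from the thread; the class UriParts and its parse method are hypothetical names. Every group in the pattern is optional, so it also matches the empty string. In the XPath/XQuery 3.1 function library, fn:replace, fn:tokenize and fn:analyze-string raise FORX0003 for a pattern that matches a zero-length string, whereas the Java engine has no such rule.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UriParts {
    // RFC 3986, Appendix B; the "\?" escape is written "\\?" in a Java string literal.
    static final Pattern URI_RE = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Returns {scheme, authority, path, query, fragment} = {$2, $4, $5, $7, $9}.
    // Absent components are null. Returns null if the input does not match as a
    // whole (e.g. a line break after '#', since '.' does not match line terminators).
    static String[] parse(String s) {
        Matcher m = URI_RE.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("http://www.ics.uci.edu/pub/ietf/uri/#Related")));
        // All groups are optional, so even "" matches (with an empty path, $5).
        System.out.println(Arrays.toString(parse("")));
    }
}
```

The first line printed is [http, www.ics.uci.edu, /pub/ietf/uri/, null, Related], i.e. $2, $4, $5, $7 and $9 from the RFC example, with the absent query component returned as null.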
Best regards
Omar
On 07.08.2018 at 21:38, Andreas Mixich wrote:
> Hi
>
> [rfc3986](https://tools.ietf.org/html/rfc3986#appendix-B) defines a nice
> regular expression, which groups any URI, including URN, by URI component.
>
> Interesting about this regex is the use of the '?' quantifier, which
> makes every preceding group/component optional, thus matching either a
> URI or any other(!) string, since anything that does not match one of
> the special groups goes into a catch-all group (no. 5), which keeps
> either the path or the full, arbitrary string. This is negligible,
> since the input to this regex is guaranteed to be of the right type
> (a/@href/string()).
>
> Here is the relevant part from the RFC.
>
> Appendix B
>
>    ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
>     12            3  4          5       6  7        8 9
>
> The numbers in the second line above are only to assist
> readability; they indicate the reference points for each
> subexpression (i.e., each paired parenthesis). We refer to the
> value matched for subexpression <n> as $<n>. For example, matching
> the above expression to
>
> http://www.ics.uci.edu/pub/ietf/uri/#Related
>
> results in the following subexpression matches:
>
> $1 = http:
> $2 = http
> $3 = //www.ics.uci.edu
> $4 = www.ics.uci.edu
> $5 = /pub/ietf/uri/
> $6 = <undefined>
> $7 = <undefined>
> $8 = #Related
> $9 = Related
>
> where <undefined> indicates that the component is not present,
> as is the case for the query component in the above example.
> Therefore, we can determine the value of the five components as
>
> scheme = $2
> authority = $4
> path = $5
> query = $7
> fragment = $9
>
> Going in the opposite direction, we can recreate a URI reference
> from its components by using the algorithm of Section 5.3.
>
>
> I tested this regex with Saxon, eXist and BaseX. eXist successfully
> parsed all the test cases I threw at it into the right groups; Saxon
> and BaseX did not. The failure is:
>
> [FORX0003] Pattern matches empty string.
>
> That baffled me, since all three processors use Java underneath,
> and since the definition of the '?' quantifier, when used like this,
> seems to be:
>
> Makes the preceding item optional. Greedy, so the optional item
> is included in the match if possible.
>
> This means that *if* any of the group's contents match, they should be
> included, rather than producing an empty string.
>
> Why is it like that? And what can I do about it? I found no other
> URI-parsing regex that componentizes this way and is compatible with
> XQuery.
>
> See the attached test case.
>
------------------------------
Message: 2
Date: Wed, 8 Aug 2018 19:16:51 +0200
From: Christian Grün <christian.gruen@gmail.com>
To: BIRKNER Michael <Michael.BIRKNER@akwien.at>
Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] BaseX insert/delete node performance
Message-ID:
<CAP94bnPj9-qHXKu6bbv_6FAiX=JQ28R8etd_oH31j=-=tPL+UQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Michael,
Welcome to the list.
One thing you could try immediately is to call OPTIMIZE (possibly with
the ALL flag), or db:optimize(..., true()), and see whether performance
improves. Obviously, this doesn't make sense after every single update
operation, but it could be called before a larger batch of updates is
performed.
> The problem is that in my case, I have to do about 150000 inserts and
deletes, so it would take too much time.
If you define all the insert expressions (or at least more than just 1 or
10) in a single XQuery expression (via a FLWOR expression), you will
benefit from various bulk optimizations. Did you try that already?
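A minimal sketch of this advice for a Java client like Michael's, assuming the updates are available as (node id, title, author) triples: instead of sending one query per insert, it generates one FLWOR expression per batch. The class, its fields, the second record and the batch size are hypothetical; sending the generated strings to the server (for example through a BaseX client session) is not shown. The db:open-id call and the element names are taken from Michael's example query, and values are escaped for XQuery string literals ('&' as &amp;, '"' doubled).

```java
import java.util.ArrayList;
import java.util.List;

public class BulkInsert {
    // Hypothetical input: the target node id plus the values to insert.
    static final class Insert {
        final long targetId;
        final String title;
        final String author;

        Insert(long targetId, String title, String author) {
            this.targetId = targetId;
            this.title = title;
            this.author = author;
        }
    }

    // Quotes a value as an XQuery string literal: '&' becomes &amp;, '"' is doubled.
    static String lit(String s) {
        return "\"" + s.replace("&", "&amp;").replace("\"", "\"\"") + "\"";
    }

    // Builds one FLWOR query per batch instead of one query per insert.
    static List<String> buildQueries(String db, List<Insert> items, int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        List<String> queries = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            List<Insert> batch = items.subList(from, Math.min(from + batchSize, items.size()));
            StringBuilder q = new StringBuilder("for $u in (\n");
            for (int i = 0; i < batch.size(); i++) {
                Insert r = batch.get(i);
                q.append("  map { 'id': ").append(r.targetId)
                 .append(", 'title': ").append(lit(r.title))
                 .append(", 'author': ").append(lit(r.author))
                 .append(i + 1 < batch.size() ? " },\n" : " }\n");
            }
            q.append(")\nreturn insert node <related_record><title>{ $u?title }</title>")
             .append("<author>{ $u?author }</author></related_record>\n")
             .append("  into db:open-id(").append(lit(db)).append(", $u?id)");
            queries.add(q.toString());
        }
        return queries;
    }

    public static void main(String[] args) {
        List<Insert> items = List.of(
            new Insert(7947561, "Test title", "Joe Lastname"),
            new Insert(7947562, "Another title", "Jane Doe")); // second record is invented
        buildQueries("Database_Name", items, 1000).forEach(System.out::println);
    }
}
```

The batch size is the tuning knob here: larger batches mean fewer queries and more room for bulk optimizations, at the cost of larger query strings.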
Best,
Christian
BIRKNER Michael <Michael.BIRKNER@akwien.at> wrote on Wed, 8 Aug 2018,
08:36:
> Hello,
>
>
> I asked this question in StackOverflow concerning some performance
> problems I experienced when inserting nodes into a BaseX database:
>
> https://stackoverflow.com/questions/51595210/basex-inserting-nodes-performance-problems
>
> I already made some progress, especially when it comes to querying all
> data I need for the updates. I work a lot with the indexes now.
>
> But I still have problems with inserting (and also deleting) nodes. It
> doesn't matter whether I insert/delete nodes via a Java program or in the editor
> of the BaseX GUI: both are quite slow. Inserting just one node in the GUI
> with an XQuery like this one takes up to 3 seconds:
>
> insert node <related_record><title>Test title</title><author>Joe Lastname</author></related_record>
> into db:open-id('Database_Name', 7947561)
>
> Deleting a node with the following command takes up to 7 seconds:
> delete node db:open-id('Database_Name', 88085737)
>
> The problem is that in my case, I have to do about 150000 inserts and
> deletes, so it would take too much time.
>
> Maybe my database is just too big to be performant? Or are some settings
> wrong? I'm very new to BaseX (and XML databases in general), so maybe there
> are just some errors I don't see. Here is some information on my
> database, copied from the info screen of the BaseX GUI:
>
>
> Database Properties
> NAME: Database_Name
> SIZE: 2568 MB
> NODES: 135607105
> DOCUMENTS: 1
> BINARIES: 0
> TIMESTAMP: 2018-08-07T07:05:56.000Z
> UPTODATE: true
>
> Resource Properties
> INPUTPATH: /path/to/file.xml
> INPUTSIZE: 1774 MB
> INPUTDATE: 2018-07-24T14:32:58.000Z
>
> Indexes
> TEXTINDEX: true
> ATTRINDEX: true
> TOKENINDEX: false
> FTINDEX: false
> TEXTINCLUDE:
> ATTRINCLUDE:
> TOKENINCLUDE:
> FTINCLUDE:
> LANGUAGE: English
> STEMMING: false
> CASESENS: false
> DIACRITICS: false
> STOPWORDS:
> UPDINDEX: true
> AUTOOPTIMIZE: false
> MAXCATS: 100
> MAXLEN: 96
> SPLITSIZE: 0
>
> Best regards,
> Michael
>
>
------------------------------
Message: 3
Date: Wed, 8 Aug 2018 22:31:42 +0200 (CEST)
From: Marc Coenegracht <marc@crosseyed.nl>
To: "basex-talk@mailman.uni-konstanz.de"
<basex-talk@mailman.uni-konstanz.de>
Subject: [basex-talk] Transaction management in BaseX 8.6.4
Message-ID: <alpine.DEB.2.20.1808082127520.14917@errorquark>
Content-Type: text/plain; format=flowed; charset=US-ASCII
Hi,
A CMS occasionally recreates some existing databases of a production site.
The databases are deleted and then recreated with the new content within
a few seconds.
What happens if a read operation is taking place during this process? Can
it cause problems with the recreation of the DB or with the BaseX server
instance?
Of course it is possible to update the databases instead, but this process
is a lot simpler and probably faster too.
All operations are executed by running XQuery scripts via REST on the
BaseX HTTP server.
Marc
------------------------------
Message: 4
Date: Wed, 8 Aug 2018 23:51:47 +0200
From: Christian Grün <christian.gruen@gmail.com>
To: marc@crosseyed.nl
Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Transaction management in BaseX 8.6.4
Message-ID:
<CAP94bnPC4p_=vLYnpFXUTEJM6e-oCXJ=p+BY6aWuPKM+_goc1A@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Hi Marc,
As one XQuery expression is one transaction, the best approach is to
define your operations in a single query. If you call db:create, an
existing database will be overwritten, and the function allows you to
specify some initial input.
Hope this helps,
Christian
------------------------------
Message: 5
Date: Thu, 9 Aug 2018 08:28:59 +0200
From: Christian Grün <christian.gruen@gmail.com>
To: Marc Coenegracht <marc@crosseyed.nl>, BaseX
<basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Transaction management in BaseX 8.6.4
Message-ID:
<CAP94bnMUF0n_nbsQ4J6x2Lg=J8hM4TDJybJsKdGbRyFkHvGqSg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi Marc (cc to the list),
If the database replacement is defined as a single REST operation, you
won't encounter any problems; other transactions will need to wait until
your database has been fully created.
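The guarantee described here can be pictured with a per-database read/write lock. The Java sketch below is only an analogy: it does not use BaseX, BaseX's internal locking is not shown, and ReplaceDemo and its methods are hypothetical names. The writer holds the write lock for the whole drop-and-recreate step, so a reader that arrives in the middle blocks until the step is finished and then sees the complete new content, never the empty intermediate state.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReplaceDemo {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private List<String> docs = List.of("old-1", "old-2"); // stand-in for database content

    // Writer: drop and recreate the content while holding the write lock.
    void replace(List<String> fresh, CountDownLatch holding) throws InterruptedException {
        lock.writeLock().lock();
        try {
            holding.countDown();  // signal that the write lock is now held
            docs = List.of();     // intermediate state: database dropped
            Thread.sleep(100);    // simulate the time needed to rebuild
            docs = fresh;         // new content is in place
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reader: blocks while a replacement holds the write lock.
    List<String> read() {
        lock.readLock().lock();
        try {
            return docs;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Starts a replacement, then reads while it is still in progress.
    static List<String> readDuringReplace(List<String> fresh) {
        ReplaceDemo db = new ReplaceDemo();
        CountDownLatch holding = new CountDownLatch(1);
        Thread writer = new Thread(() -> {
            try {
                db.replace(fresh, holding);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        try {
            holding.await();               // the writer is mid-replacement
            List<String> seen = db.read(); // waits until the write lock is released
            writer.join();
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readDuringReplace(List.of("new-1", "new-2", "new-3")));
    }
}
```

Running main prints [new-1, new-2, new-3]: the read waits out the simulated rebuild instead of observing the empty list.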
Best,
Christian
Marc Coenegracht <marc@crosseyed.nl> wrote on Thu, 9 Aug 2018, 00:21:
> Hi Christian,
>
> Thanks for the quick answer.
>
> Defining the operations in a single query would be preferable but isn't
> possible, since the read operation is simply triggered by a website
> visitor, and the creation of the new DB (indeed overwriting the old DB
> with db:create) is performed by an action of the CMS admin.
>
> So, these operations can happen at the exact same moment. The
> inconvenience at the front-end will be minimal, I'm just wondering if
> these concurrent operations can cause problems with BaseX or the
> db:create operation.
>
> best,
> Marc
>
End of BaseX-Talk Digest, Vol 104, Issue 15
*******************************************