Hello BaseX-Community,
I have been working with BaseX for a while now and I like it very much. I think it is a great piece of software ... and it's Open Source ... WOW!!!
But every now and then I run into small, cumbersome issues that could perhaps be solved with some new functionality. That is why I would like to propose the following features for BaseX:
* Description for database backups: Right now, the titles of database backups are like "dbname-2021-02-02-17-45-44". It would be great if one could add a description to such a backup. The reason is: While developing I often update databases step by step and in between these steps, I make backups of the whole database, so in case of errors or if I have to look up something, I could revert to a former state of the database. With a meaningful description, it would be much easier to pick the right backup.
* RegEx in index searches: It would be great to have the possibility to search for RegEx patterns in the index, e. g. like this: db:text("db-name", "REGEX-PATTERN-HERE") or db:attribute("db-name", "REGEX-PATTERN-HERE")
* Index searches with a path: I sometimes had the need to do an index search within only a certain path, e. g.: db:text('db-name', '/PATH/WITHIN/DB', 'search-word') I don't know if this is even possible to realize, but if yes, it would sometimes be very useful to me.
* Selective indexing based on attribute values: It would be great to have selective indexing based on attributes and their values. Imagine an XML structure like this:

  <items>
    <item name="id">ABC</item>
    <item name="title">XYZ</item>
    <item name="person">Jane Doe</item>
    ...
  </items>

  If one would like to index e. g. only the ID values, a functionality to do selective indexing just for "<item>" nodes with attribute "name" that contains the value "id" would be useful. This would allow to have a comparatively small index for very large XML data sets, which could have impact on query performance.
* Human readable execution times in GUI: Maybe a small change but - at least in my case - it would make developing performant xQueries much easier: Having the "Timing" section in the Info-View of the GUI display human readable times. Right now, the values are displayed only in milliseconds like 175713.28 ms. But an additional display in a more human readable format, e. g. hh:mm:ss.ms would sometimes be very useful.
I hope at least some of the mentioned features do make sense and will be considered for implementation :-)
Best regards, Michael
Hi Michael,
Thanks for your suggestions, always appreciated.
> - *Description for database backups*: Right now, the titles of database backups are like "dbname-2021-02-02-17-45-44". It would be great if one could add a description to such a backup. The reason is: While developing I often update databases step by step and in between these steps, I make backups of the whole database, so in case of errors or if I have to look up something, I could revert to a former state of the database. With a meaningful description, it would be much easier to pick the right backup.

Backups are nothing more than zipped versions of single database directories. We could think about adding something like an info.txt file. If you like, you can already do that manually after the backup has been created (but I guess that’s not very convenient).
> - *RegEx in index searches*: It would be great to have the possibility to search for RegEx patterns in the index, e. g. like this: db:text("db-name", "REGEX-PATTERN-HERE") or db:attribute("db-name", "REGEX-PATTERN-HERE")

A challenging one, as regular expressions can be arbitrarily complex and expensive to evaluate. For a start, you can e.g. use index:texts and retrieve all strings that start with the specified substring…

  let $terms := index:texts('db-name', 'German')
  return db:text('db-name', $terms)

…or even use regex (but that might be a bit slow):

  let $terms := index:texts('factbook')[matches(., 'REGEX-PATTERN-HERE')]
  return db:text('factbook', $terms)
> - *Index searches with a path*: I sometimes had the need to do an index search within only a certain path, e. g.: db:text('db-name', '/PATH/WITHIN/DB', 'search-word') I don't know if this is even possible to realize, but if yes, it would sometimes be very useful to me.

Generic string arguments with paths are challenging, as we need to consider things like namespaces. The easiest thing should be to manually reverse your path…

  db:text('db-name', 'search-word')
    /parent::DB
    /parent::WITHIN
    /parent::PATH[parent::document-node()]

…or, more dynamic:

  let $path := '/PATH/WITHIN/DB'
  let $reverse-path := (
    let $steps := reverse(tokenize($path, '/')[.])
    return string-join(
      for $step in $steps
      return 'parent::' || $step,
      '/'
    ) || '[parent::document-node()]'
  )
  return xquery:eval(
    "db:text('db-name', 'search-word')/" || $reverse-path
  )
> - *Selective indexing based on attribute values*: It would be great to have selective indexing based on attributes and their values. Imagine an XML structure like this:
>
>   <items>
>     <item name="id">ABC</item>
>     <item name="title">XYZ</item>
>     <item name="person">Jane Doe</item>
>     ...
>   </items>
>
> If one would like to index e. g. only the ID values, a functionality to do selective indexing just for "<item>" nodes with attribute "name" that contains the value "id" would be useful. This would allow to have a comparatively small index for very large XML data sets, which could have impact on query performance.

As the requirements for custom filters are manifold, it’s difficult to define general rules. One common solution is to create an additional index database that contains only the relevant contents, or contents that have been modified in some way to make them easier to search [1].
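
A minimal sketch of such an index database for the example above, assuming the main database is called 'items' and the index database 'items-ids' (both names are made up for illustration). It stores only the ID values together with the node ID of the original element. The sketch uses db:open and db:open-id as they are named in BaseX 9; other versions may name these functions differently:

```xquery
(: Hypothetical database names: 'items' holds the data, 'items-ids' the index. :)
let $entries :=
  for $item in db:open('items')//item[@name = 'id']
  (: Keep only the ID value plus a reference to the original node. :)
  return <id ref="{ db:node-id($item) }">{ string($item) }</id>
return db:create('items-ids', <ids>{ $entries }</ids>, 'ids.xml')
```

A lookup could then go through the text index of the small database and resolve the stored references in the main one:

```xquery
(: Find 'ABC' in the index database, then open the original <item> nodes. :)
for $ref in db:text('items-ids', 'ABC')/parent::id/@ref
return db:open-id('items', xs:integer($ref))
```

The index database has to be rebuilt or updated whenever the main database changes.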
> - *Human readable execution times in GUI*: Maybe a small change but - at least in my case - it would make developing performant xQueries much easier: Having the "Timing" section in the Info-View of the GUI display human readable times. Right now, the values are displayed only in milliseconds like 175713.28 ms. But an additional display in a more human readable format, e. g. hh:mm:ss.ms would sometimes be very useful.

Sounds reasonable and doable. I’ll think about it.
Best, Christian
[1] https://docs.basex.org/wiki/Indexes#Custom_Index_Structures
On Thu, Feb 04, 2021 at 05:21:24PM +0100, Christian Grün scripsit:
> > - *Human readable execution times in GUI*: Maybe a small change but - at least in my case - it would make developing performant xQueries much easier: Having the "Timing" section in the Info-View of the GUI display human readable times. Right now, the values are displayed only in milliseconds like 175713.28 ms. But an additional display in a more human readable format, e. g. hh:mm:ss.ms would sometimes be very useful.
>
> Sounds reasonable and doable. I’ll think about it.

Please leave the time formatting switchable! I'd strongly prefer straight milliseconds to a human-readable format.
Execution times will now be shown in the MM:SS.mm or HH:MM:SS.mm format, but only…
• in the header of the info view, and • if the measured time exceeds 60 seconds.
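
To illustrate the format described above, here is a small XQuery sketch. It is not the BaseX implementation; the function name local:format-ms is made up, and it assumes that "mm" stands for hundredths of a second and that the fraction is truncated rather than rounded:

```xquery
(: Hypothetical helper: formats a duration given in milliseconds. :)
declare function local:format-ms($ms as xs:double) as xs:string {
  if ($ms < 60000) then
    (: Up to 60 seconds: keep the plain millisecond display. :)
    $ms || ' ms'
  else
    (: Work in whole hundredths of a second to avoid floating-point drift. :)
    let $total-cs := xs:integer(floor($ms div 10))
    let $cs := $total-cs mod 100
    let $total-s := $total-cs idiv 100
    let $s := $total-s mod 60
    let $m := ($total-s idiv 60) mod 60
    let $h := $total-s idiv 3600
    let $pad := function($n) { format-integer($n, '00') }
    return string-join((
      (: The hour part is only shown if it is non-zero. :)
      if ($h > 0) then $pad($h) else (),
      $pad($m),
      $pad($s)
    ), ':') || '.' || $pad($cs)
};

local:format-ms(175713.28)
```

With the value from the original request, 175713.28 ms, this sketch yields 02:55.71.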
A new snapshot is available [1].
Have fun, Christian
[1] https://files.basex.org/releases/latest/
On Thu, Feb 4, 2021 at 6:15 PM Graydon graydonish@gmail.com wrote:
> Please leave the time formatting switchable! I'd strongly prefer straight milliseconds to a human-readable format.
Hi Christian,
thank you very much for considering my feature requests. Also, thank you for all your tips and hints (I already try to use them in my queries) and the already realized implementation of the execution time display.
I understand that some of my suggestions could be difficult to realize or could lead to expensive calculations. If it is not possible to implement them, I still will continue to be a very happy user of BaseX.
Best regards, Michael
-------- Original Message --------
From: Christian Grün <christian.gruen@gmail.com>
To: Graydon Saunders <graydonish@gmail.com>
Cc: BIRKNER Michael <Michael.BIRKNER@akwien.at>, basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Feature requests: Backups, Indexes, Execution Times
Date: Sat, 06 Feb 2021 11:12:01 +0100
Execution times will now be shown in the MM:SS.mm or HH:MM:SS.mm format, but only…
• in the header of the info view, and • if the measured time exceeds 60 seconds.
A new snapshot is available [1].
Have fun, Christian
[1] https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ffiles.base...