feature request: opening database at arbitrary file path


Eric Levy

19 Feb 2022

4:48 p.m.

I recently learned that BaseX offers no true support for access to a database given directly by its path.

In contrast, opening a database by file path is a well-established pattern in embedded databases, seen for example in SQLite, which permits use of any database given by an arbitrary path, without any dependence on external configuration or other data elsewhere on the system.

I wish to submit as a feature request the suggestion that BaseX include similar support.


Christian Grün

20 Feb

10:36 a.m.

Hi Eric,

While it’s possible to use BaseX in an embedded way, it’s actually more than that:

• The existence of a persistent home directory is a focal aspect of BaseX; many features rely on it [1].
• Users, jobs, database logs and other information are kept in this directory, and are not limited to a single database instance.
• The implementation of XQuery allows you to access and update multiple databases, and many features would need to be limited if we introduced the concept of a single database.

Maybe we can help you out by learning more about your use case: How do you currently embed BaseX?

Best, Christian

[1] https://docs.basex.org/wiki/Configuration


Eric Levy

8:28 p.m.

I am not sure what more information to give, as I have already drawn an analogy to a pattern that has become widespread for using SQLite.

Essentially, an application would operate on a project directory, which would follow a particular hierarchy. Among the items in the hierarchy would be a database. The application would need to open the database, perform queries, and then finally close the database, as part of a larger operation.

The project hierarchy would be represented in the application logic. For example, the application might place the database in a location called "database" at the top level of the project folder. As such, if the location of the project were given as "/path/to/project", then the application would need to open a database given at "/path/to/project/database". In this sense, the database would be simply an equal member of the file tree, just as would be a log file placed at "/path/to/project/logfile". If the file tree "/path/to/project" were copied to another host, and the new location were "/srv/proj/", then the database would be opened from "/srv/proj/database" when the application were invoked on the new host.

As projects would be just collections of files, there would be no limit to the number of such projects that may reside on the same system, and there would be no need to register them in any system configuration. They would be just files, like any other files.
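As a point of comparison, the SQLite pattern referred to above can be sketched in a few lines of Python using the standard `sqlite3` module. This is only an illustration of the project-directory idea, not anything BaseX provides; the file name "database", the `runs` table and the helper names are assumptions made up for the example.

```python
import sqlite3
from pathlib import Path


def open_project_db(project_dir):
    """Open the database at <project_dir>/database (hypothetical layout)."""
    # The database location is derived only from the project path;
    # nothing is registered in any system-wide configuration.
    return sqlite3.connect(str(Path(project_dir) / "database"))


def record_run(project_dir, note):
    """Open the project database, add one row, and return the row count."""
    conn = open_project_db(project_dir)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("CREATE TABLE IF NOT EXISTS runs (note TEXT)")
            conn.execute("INSERT INTO runs (note) VALUES (?)", (note,))
        return conn.execute("SELECT COUNT(*) FROM runs").fetchone()[0]
    finally:
        conn.close()
```

Calling `record_run("/path/to/project", "nightly")` would create or reuse "/path/to/project/database", and the same call with "/srv/proj" would operate on "/srv/proj/database" after the tree is copied to another host.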


Christian Grün

21 Feb

4:16 a.m.

...

I have already drawn an analogy to a pattern that has become widespread for using SQLite.

Maybe it helps to compare BaseX with other open-source SQL databases, e.g. PostgreSQL: While it’s possible to use it embedded, specifying a simple path as target won’t suffice.

You’ve probably seen that a single BaseX database is a directory with multiple files.

...

if the location of the project were given as "/path/to/project", then the application would need to open a database given at "/path/to/project/database".

What about assigning "/path/to/project/database" to DBPATH and storing your single database as a sub-directory of that folder?

...

I am not sure what more information to give,

I was wondering if you use Java, the command line or something else to communicate with BaseX.

But if I understand you correctly, you haven’t embedded BaseX yet, but you would like to do so?
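The DBPATH suggestion above could be prototyped roughly as follows. This is a hedged sketch, not a confirmed recipe: it assumes the `basex` start script is on the PATH, that the script passes the `BASEX_JVM` environment variable to the JVM, and that global options such as DBPATH can be set through system properties with the `org.basex.` prefix. All three should be checked against the Configuration page [1] for the installed version. The function name and the "database" folder name are hypothetical.

```python
import os
from pathlib import Path


def basex_invocation(project_dir, query_file):
    """Build the command and environment for a per-project DBPATH.

    Assumptions (verify against the BaseX documentation):
    - a `basex` start script is on the PATH and honours BASEX_JVM;
    - DBPATH can be set via the system property org.basex.DBPATH.
    """
    db_path = Path(project_dir) / "database"
    env = dict(os.environ)
    # Note: a path containing spaces would need extra quoting here.
    env["BASEX_JVM"] = f"-Dorg.basex.DBPATH={db_path}"
    return ["basex", str(query_file)], env
```

The result could then be passed to `subprocess.run(cmd, env=env, check=True)`. The sketch only redirects where databases are stored; other files that BaseX keeps in its home directory are not affected by it.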

Tamara Marnell

2:59 p.m.

Hi Eric,

There is currently no limit to the number of projects in a system (besides storage and memory, of course). BaseX databases can point anywhere they need to in the file system, so there's no reason for the database files themselves to live in different child folders.

For example, on our Ubuntu server the databases are written to /opt/basex/data, while the files the databases represent live in /home/[user]/public_html/repos/[various children]/files. When we create new repositories, we put the files in a new folder and create the BaseX database like: CREATE DB new_name /home/[user]/public_html/repos/new_repo/files. We haven't used BaseX in more than one project yet, but if we did we'd just make more databases pointing at directories under a new /home/[user].

The only reason I can think of that storing the BaseX database files in different subfolders might be handy is if you need each of the sub-projects to be portable, but IME trying to copy BaseX database files into a new system doesn't work well. It's better to get all of your scripts in place for creating the databases, and just re-run them when the project is ready in its new home.

-Tamara


-- Tamara Marnell Program Manager, Systems Orbis Cascade Alliance (orbiscascade.org https://www.orbiscascade.org/) Pronouns: she/her/hers

Eric Levy

3:35 p.m.

...

There is currently no limit to the number of projects in a system (besides storage and memory, of course). BaseX databases can point anywhere they need to in the file system, so there's no reason for the database files themselves to live in different child folders.

For example, on our Ubuntu server the databases are written to /opt/basex/data, while the files the databases represent live in /home/[user]/public_html/repos/[various children]/files. When we create new repositories, we put the files in a new folder and create the BaseX database like: CREATE DB new_name /home/[user]/public_html/repos/new_repo/files. We haven't used BaseX in more than one project yet, but if we did we'd just make more databases pointing at directories under a new /home/[user].

I think this explanation misses the point of my earlier one. If I have a project located at /srv/proj/foo, then make a copy to /srv/proj/bar, I would wish to have two independent copies of the same database data. If I then perform operations on the original copy, I would wish the new copy to retain the same project state from the time of the copy. The new copy of the project should change state only when operations are performed on that copy. In this sense, opening a database from a file path is the same as opening an image or text document. Operations on the open file affect the open file, independent of past or future copies, and of other files on the system.

In your model, all copies of my project would access the same database, on the same system. Further, when moved to a new system, a project copy would lose all the data from the database.
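For reference, the copy semantics described above are what SQLite gives when the project tree is copied while no connection is open. The following Python sketch shows that behaviour with the standard `sqlite3` and `shutil` modules; it says nothing about BaseX itself, and the `items` table, the "database" file name and the function names are assumptions for illustration.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def add_item(project_dir, name):
    """Insert one row into the database stored inside the project folder."""
    conn = sqlite3.connect(str(Path(project_dir) / "database"))
    try:
        with conn:
            conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT)")
            conn.execute("INSERT INTO items (name) VALUES (?)", (name,))
    finally:
        conn.close()


def list_items(project_dir):
    """Return all item names from the project's own database."""
    conn = sqlite3.connect(str(Path(project_dir) / "database"))
    try:
        rows = conn.execute("SELECT name FROM items ORDER BY rowid")
        return [row[0] for row in rows]
    finally:
        conn.close()


def demo():
    """Copy project 'foo' to 'bar', then change only 'foo'."""
    with tempfile.TemporaryDirectory() as root:
        foo, bar = Path(root) / "foo", Path(root) / "bar"
        foo.mkdir()
        add_item(foo, "first")
        shutil.copytree(foo, bar)  # copy while no connection is open
        add_item(foo, "second")    # affects only the original
        return list_items(foo), list_items(bar)
```

Here `demo()` returns the items of the original and of the copy, and the copy keeps the state it had when the tree was copied.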

...

The only reason I can think of that storing the BaseX database files in different subfolders might be handy is if you need each of the sub-projects to be portable, but IME trying to copy BaseX database files into a new system doesn't work well. It's better to get all of your scripts in place for creating the databases, and just re-run them when the project is ready in its new home.

It seems not to work well for BaseX, but I am suggesting the limitation is currently coming from the design of BaseX. The request is to generalize the design for full support of embedded databases. Many embedded databases follow a design compatible with my use case. The files of each particular database may be given by a user path, and these files may be copied, resulting in a new independent and working copy of the database, given at the target path for the copy.

Eric Levy

3:52 p.m.

...

Maybe it helps to compare BaseX with other open-source SQL databases, e.g. PostgreSQL: While it’s possible to use it embedded, specifying a simple path as target won’t suffice.

SQL databases are most useful for data that may be represented naturally using a relational model.

XML and JSON databases are needed when the data consists of documents already occurring in such formats, which must be stored without loss of their structure and must be queried and manipulated according to the relationships captured in that structure.

...

You’ve probably seen that a single BaseX database is a directory with multiple files.

Multiple files versus one would not be an issue.

...

...
if the location of the project were given as "/path/to/project", then the application would need to open a database given at "/path/to/project/database".

What about assigning "/path/to/project/database" to DBPATH and storing your single database as a sub-directory of that folder?

I could follow this design in principle, but I worry it is prone to breakage. There are ancillary issues too, such as the side effect of new files appearing under the user home directory.

...

I was wondering if you use Java, the command line or something else to communicate with BaseX.

But if I understand you correctly, you haven’t embedded BaseX yet, but you would like to do so?

The plan had been to prototype using a Bash script and query files, before resolving whether a more sophisticated approach would be warranted.

Christian Grün

4:02 p.m.

...

...
Maybe it helps to compare BaseX with other open-source SQL databases, e.g. PostgreSQL: While it’s possible to use it embedded, specifying a simple path as target won’t suffice.

SQL databases are most useful for data that may be represented naturally using a relational model.

Sure. I simply wanted to illustrate that you cannot embed PostgreSQL in your project as easily as you can embed SQLite.

I suppose BaseX is not the right tool for you, as you seem to be looking for something more lightweight.

Eric Levy

5:08 p.m.

...

...
Maybe it helps to compare BaseX with other open-source SQL databases, e.g. PostgreSQL: While it’s possible to use it embedded, specifying a simple path as target won’t suffice.

SQL databases are most useful for data that may be represented naturally using a relational model.

Sure. I simply wanted to illustrate that you cannot embed PostgreSQL in your project as easily as you can embed SQLite.

I may have misunderstood the direction of your comment.

Broadly, as you suggest, the databases mentioned may be classified according to two orthogonal traits: whether they support server or embedded modes, and which data or document model they use.

I had mentioned SQLite because, even though it has a different data model from that of BaseX, I thought it might be useful to compare in terms of how it resolves the location of the database. Among the use cases for which SQLite has become successful is one in which a database occurs in a portable project directory. Such flexibility makes it a credible choice for managing relational data in contexts for which PostgreSQL and other server-only database systems would be inappropriate.

Presently, BaseX offers limited support for embedded use. It seems, at least in principle, a feasible path is available to strengthen the support for the embedded case by supporting a mode of opening a database from a file path independent of data external to the path. Such support, of course, may be added without imposing any constraints on existing support for either embedded or server modes.

...

I suppose BaseX is not the right tool for you, as you seem to be looking for something more lightweight.

It may not be the right tool, at least at present. I believe the limitation is confined to the issue I have raised, about opening a database at a user-supplied location. In terms of essential operational characteristics, I think the application is suitably lightweight, as it seems to handle invocation-per-query use cases with adequate efficiency on the target systems.

I think the request is sound, as it would leverage much of the already available functionality for embedded invocation, but for a much more general set of use cases.

Of course, how well the request would align with broader project objectives is outside the scope of my information.

Tamara Marnell

6:50 p.m.

Hi Eric,

As you say, "XML and JSON databases are needed when the data consists of documents already occurring in such formats." BaseX databases are intended to be representations of physical files, so what operations are you performing on the databases that are not easily reproducible from the XML? You say if you move your project to a new system, you'll "lose all data from the database"--is your goal to ship out the databases as files for read-only access, without including the source documents? I'm not a BaseX dev, but I don't think that's its intended use.

Re: "In your model, all copies of my project would access the same database, on the same system." One folder of databases isn't the same as one database. Your project copies can each have their own database within BaseX; they just need to have distinct names like "db-prod" for documents stored in your production folder, "db-dev" for documents stored in your development folder, etc. You can construct your queries to access the database you want at the time by passing an external variable like:

declare variable $d as xs:string external;
for $result in db:open($d)/etc.

In that way you can access data representing files from anywhere on the machine, so it doesn't matter where the encrypted BaseX databases are stored. You can also copy databases into new ones for manipulation or preservation of a "state" with COPY original_db copied_db. If your changes in original_db get messed up and you want to restore, DROP DB original_db; COPY copied_db original_db.
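Since the plan mentioned earlier in the thread was to prototype with scripts and query files, the approach above might be driven from a small wrapper like the one below. It is a sketch under assumptions: it assumes a `basex` command on the PATH whose `-b` option binds an external variable and whose `-c` option runs a command string, which should be verified against the command-line documentation of the installed version. The function names and the query file name are hypothetical; the COPY and DROP DB commands are the ones quoted above.

```python
def query_command(db_name, query_file):
    """Command line that binds the external variable $d to a database name.

    Assumes `basex -b<name>=<value> <query file>`; check the BaseX
    command-line documentation before relying on it.
    """
    return ["basex", f"-bd={db_name}", str(query_file)]


def snapshot_command(original, snapshot):
    """Command line that preserves the current state under a new name."""
    return ["basex", f"-cCOPY {original} {snapshot}"]


def restore_command(original, snapshot):
    """Command line that replaces the original with the saved snapshot."""
    script = f"DROP DB {original}; COPY {snapshot} {original}"
    return ["basex", f"-c{script}"]
```

Each list could be handed to `subprocess.run(..., check=True)`. For example, `query_command("db-dev", "report.xq")` selects the development database for a query file that declares `$d` as in the snippet above.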

Assuming every distribution of your project will contain the original files, or work based off of user-supplied files, there is no limitation to putting all databases under one directory. Your instructions for installation only need to include specifying that directory in the .basex file, or in a custom file of definitions if you want to set it programmatically; and running a script that constructs the BaseX database on that computer from the documents. Whenever I move our project to a new machine--which I did frequently during development--our scripts rebuilt our databases of ~42K documents, including full-text and custom indexes, in about five minutes.

-Tamara


Imsieke, Gerrit, le-tex

7:16 p.m.

Or to say it more bluntly: Eric, start using BaseX productively before making feature requests that seem out-of-sync with BaseX’s architecture to core developers and to long-time users alike. The fact that no one has raised such a request in the last decade – no one has come forth to say “wow, if I could specify each database’s own directory, I could do things with less friction” – speaks volumes. This is not a request that a BaseX user would make, even if they thought about ways to embed the database engine in an application.


Eric Levy

8:01 p.m.

On Mon, 2022-02-21 at 15:50 -0800, Tamara Marnell wrote:

...

Assuming every distribution of your project will contain the original files, or work based off of user-supplied files, there is no limitation to putting all databases under one directory. Your instructions for installation only need to include specifying that directory in the .basex file, or in a custom file of definitions if you want to set it programmatically; and running a script that constructs the BaseX database on that computer from the documents. Whenever I move our project to a new machine--which I did frequently during development--our scripts rebuilt our databases of ~42K documents, including full-text and custom indexes, in about five minutes.

I think you're not understanding the use, as your general framing is quite far from what I have tried to explain. I'm not sure it would be helpful for me to address each point you raised in detail.

I feel my explanations are generally clear, especially in light of the context I offered, and I don't want to keep repeating myself. Perhaps the SQLite documentation, especially the section "Appropriate Uses For SQLite" [1], would provide a more robust or accessible representation of embedded database applications than has emerged so far in the discussion.

[1] https://www.sqlite.org/whentouse.html

On Tue, 2022-02-22 at 01:16 +0100, Imsieke, Gerrit, le-tex wrote:

...

Or to say it more bluntly: Eric, start using BaseX productively before making feature requests that seem out-of-sync with BaseX’s architecture to core developers and to long-time users alike. The fact that no one has raised such a request in the last decade – no one has come forth to say “wow, if I could specify each database’s own directory, I could do things with less friction” – speaks volumes. This is not a request that a BaseX user would make, even if they thought about ways to embed the database engine in an application.

I made a request, and explained the use case, drawing from examples of established design patterns. The request may not be one you see as aligned with the project objectives, which is fine, but the analysis strikes me as rather circular.

Imsieke, Gerrit, le-tex

8:31 p.m.

What is circular in “don’t waste our precious time”?

Pardon my French, but this is really a non-issue, or an issue that needs to wait in line after the request to allow more than 256 namespaces [1].

[1] https://github.com/BaseXdb/basex/issues/902


-- Gerrit Imsieke Geschäftsführer / Managing Director le-tex publishing services GmbH Weissenfelser Str. 84, 04229 Leipzig, Germany Phone +49 341 355356 110, Fax +49 341 355356 510 gerrit.imsieke@le-tex.de, http://www.le-tex.de Registergericht / Commercial Register: Amtsgericht Leipzig Registernummer / Registration Number: HRB 24930 Geschäftsführer / Managing Directors: Gerrit Imsieke, Svea Jelonek, Thomas Schmidt

Eric Levy

8:45 p.m.

...

What is circular in “don’t waste our precious time”?

It's circular to conclude that a feature has no practical application generally because among the current users of a product that does not have the feature, no users are using the feature.

The current user base of any product is naturally constrained to those whose requirements align with the features available at any particular time. The user base of a product often changes, becoming broader and larger, as the product matures, and its features develop.

Supporting the feature might broaden the user base to include those whose requirements match the broader use cases proposed, but do not match the narrower use cases supported so far.

Tamara Marnell

22 Feb

11:58 a.m.

Good morning (in Oregon), Eric,

You've brought up SQLite many times as an example of how you want BaseX to behave. The "embeddable" nature of SQLite is not a feature the developer decided to add to an existing project, but the very purpose of its existence. SQLite was designed by a US military contractor specifically to be included in applications on naval ships without requiring the installation of a MySQL or PostgreSQL server.

In contrast, BaseX uses a server/client architecture like MySQL and PostgreSQL. It needs to be installed and configured system-wide and started as a service, so there's no point in allowing users to define multiple different DBPATHs. This isn't an unusual or limiting structure, just different from what you've envisioned for your particular application.

For example, anyone who wants to put WordPress on a web server needs to first install and configure the MySQL server package on the system. WordPress doesn't ship with embedded databases, just SQL statements to build the necessary tables. Similarly, to transfer an existing WordPress website to a new server, the database needs to be dumped as SQL and reconstructed from that source, the same way I reconstruct BaseX databases from the XML documents when I move the project around. And if the server admin wants to run other applications that use MySQL databases, they will all live in the same central service, not from files read from user-specified directories. I've rarely encountered projects that use SQLite, but I often find ones that require MySQL, so this workflow is not a significant problem for users.

In this thread, we've offered suggestions of ways you could structure your application to use BaseX, and how each of your concerns could be addressed. You can either consider them and experiment, or you can find a different tool you could use to fully embed your data and access it without a system-wide service.

-Tamara

On Mon, Feb 21, 2022 at 5:01 PM Eric Levy contact@ericlevy.name wrote:

...

On Mon, 2022-02-21 at 15:50 -0800, Tamara Marnell wrote:

...
Assuming every distribution of your project will contain the original files, or work based off of user-supplied files, there is no limitation to putting all databases under one directory. Your instructions for installation only need to include specifying that directory in the .basex file, or in a custom file of definitions if you want to set it programmatically; and running a script that constructs the BaseX database on that computer from the documents. Whenever I moved our project to a new machine--which I did frequently during development--our scripts rebuilt our databases of ~42K documents, including full-text and custom indexes, in about five minutes.

I think you're not understanding the use, as your general framing is quite far from what I have tried to explain. I'm not sure it would be helpful for me to address each point you raised in detail.

I feel my explanations are generally clear, especially in light of the context I offered, and I don't want to keep repeating myself. Perhaps the SQLite documentation, especially the section "Appropriate Uses For SQLite" [1], would provide a more robust or accessible representation of embedded database applications than has emerged so far in the discussion.

[1] https://www.sqlite.org/whentouse.html

On Tue, 2022-02-22 at 01:16 +0100, Imsieke, Gerrit, le-tex wrote:

...
Or to say it more bluntly: Eric, start using BaseX productively before making feature requests that seem out-of-sync with BaseX’s architecture to core developers and to long-time users alike. The fact that no one has raised such a request in the last decade – no one has come forth to say “wow, if I could specify each database’s own directory, I could do things with less friction” – speaks volumes. This is not a request that a BaseX user would make, even if they thought about ways to embed the database engine in an application.

I made a request and explained the use case, drawing from examples of established design patterns. The request may not be one you see as aligned with the project objectives, which is fine, but the analysis strikes me as rather circular.

--
Tamara Marnell
Program Manager, Systems
Orbis Cascade Alliance (https://www.orbiscascade.org/)
Pronouns: she/her/hers

Liam R. E. Quin

21 Feb

7:01 p.m.

On Mon, 2022-02-21 at 17:08 -0500, Eric Levy wrote:

...

...
...
Presently, BaseX offers limited support for embedded use. It seems, at least in principle, a feasible path is available to strengthen the support for the embedded case by supporting a mode of opening a database from a file path independent of data external to the path.

Just to be clear, there are two common uses for the term "path" and i'm not sure which you mean... (1) a filename, e.g. /usrpk/dp/projects/basex/db107 (2) a list of directories (folders) to search for something (DBPATH)

You can open a database at a given file location easily in BaseX from within XQuery using db:open() and for that matter you can search a list of directories.

Your wording is abstract enough that i'm not sure i've understood what you're trying to do, though.

Liam

-- Liam Quin, https://www.delightfulcomputing.com/ Available for XML/Document/Information Architecture/XSLT/ XSL/XQuery/Web/Text Processing/A11Y training, work & consulting. Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org

Eric Levy

8 p.m.

On Mon, 2022-02-21 at 19:01 -0500, Liam R. E. Quin wrote:

...

On Mon, 2022-02-21 at 17:08 -0500, Eric Levy wrote:

...
Presently, BaseX offers limited support for embedded use. It seems, at least in principle, a feasible path is available to strengthen the support for the embedded case by supporting a mode of opening a database from a file path independent of data external to the path.

Just to be clear, there are two common uses for the term "path" and i'm not sure which you mean... (1) a filename, e.g. /usrpk/dp/projects/basex/db107 (2) a list of directories (folders) to search for something (DBPATH)

You can open a database at a given file location easily in BaseX from within XQuery using db:open() and for that matter you can search a list of directories.

Your wording is abstract enough that i'm not sure i've understood what you're trying to do, though.

Liam

I am using "path" in the sense of (1), as in a file location, not in sense (2), as in "search path".

The "open" function is documented as follows:

db:open($db as xs:string, $path as xs:string) as document-node()*

Opens the database $db and returns all document nodes. The document nodes to be returned can be filtered with the $path argument.

The database is given as a name, not a path. The variable named "path" is described as a path within the database for filtering nodes, not as a file path within the system file tree.

My use would need a function that could be called like this:

db:open_from_path("/home/user/path/to/database/in/filesystem")

Do I misunderstand?

Liam R. E. Quin

11:33 p.m.

On Mon, 2022-02-21 at 20:00 -0500, Eric Levy wrote:

...

My use would need a function that could be called like this:

db:open_from_path("/home/user/path/to/database/in/filesystem")

For that you would want to set DBPATH in a separate BaseX instance, i think. Once a BaseX server is running it doesn't want to switch database directory.

You can have multiple databases in the same folder (identified by DBPATH) though. Each database is contained in its own subdirectory.
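To make the name-versus-path distinction concrete, here is a minimal Python sketch that splits a database directory path into the DBPATH (its parent folder) and the database name (its last component), then builds a standalone invocation. It only assembles the argument list and does not run BaseX. Several things here are assumptions rather than anything confirmed in this thread: that options can be preset through `org.basex.`-prefixed Java system properties, that `BaseX.jar` is where the jar lives, that `org.basex.BaseX` is the standalone entry point, and that a bare query string is accepted as the last argument. The helper names are hypothetical.

```python
from pathlib import PurePosixPath


def split_database_path(path: str) -> tuple[str, str]:
    """Split a database directory path into (DBPATH, database name)."""
    p = PurePosixPath(path)
    if not p.name:
        raise ValueError(f"no database directory name in {path!r}")
    # The parent folder plays the role of DBPATH; the last component is the
    # database name, since each database lives in its own subdirectory.
    return str(p.parent), p.name


def standalone_command(path: str, jar: str = "BaseX.jar") -> list[str]:
    """Build (but do not run) a standalone BaseX call for one database path."""
    dbpath, name = split_database_path(path)
    return [
        "java",
        f"-Dorg.basex.DBPATH={dbpath}",  # assumption: system-property form of DBPATH
        "-cp", jar,                      # assumption: location of the BaseX jar
        "org.basex.BaseX",               # assumption: standalone main class
        f'db:open("{name}")',            # the database is still opened by name
    ]
```

For `/home/user/projects/db107` this yields a DBPATH of `/home/user/projects` and the name `db107`. The separate-instance cost remains: each distinct parent folder means a separate BaseX process.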

The standalone basex can be given a DBPATH option too. Or you could use symbolic links to map from database name to other directories.
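The symbolic-link idea can be sketched in a few lines of Python, assuming a POSIX file system. The helper name `link_database` is hypothetical, and whether BaseX handles a linked database directory reliably is untested here; the sketch only shows the file-system side of mapping a database name inside DBPATH to a directory stored elsewhere.

```python
import os
import tempfile
from pathlib import Path


def link_database(dbpath: Path, name: str, target: Path) -> Path:
    """Create dbpath/name as a symbolic link to an existing database directory."""
    if not target.is_dir():
        raise NotADirectoryError(str(target))
    dbpath.mkdir(parents=True, exist_ok=True)
    link = dbpath / name
    if link.is_symlink():
        link.unlink()  # replace a stale link; a real directory is never removed
    link.symlink_to(target, target_is_directory=True)
    return link


if __name__ == "__main__":
    # Demonstration in a throwaway directory; all paths are made up.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        target = root / "elsewhere" / "db107"
        target.mkdir(parents=True)
        link = link_database(root / "data", "db107", target)
        print(link, "->", os.readlink(link))
```

If a real directory already exists under that name, `symlink_to` raises `FileExistsError`, so an existing database is not overwritten by accident.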

Eric Levy

11:44 p.m.

On Mon, 2022-02-21 at 23:33 -0500, Liam R. E. Quin wrote:

...

On Mon, 2022-02-21 at 20:00 -0500, Eric Levy wrote:

...
My use would need a function that could be called like this:

db:open_from_path("/home/user/path/to/database/in/filesystem")

For that you would want to set DBPATH in a separate BaseX instance, i think. Once a BaseX server is running it doesn't want to switch database directory.

You can have multiple databases in the same folder (identified by DBPATH) though. Each database is contained in its own subdirectory.

The standalone basex can be given a DBPATH option too. Or you could use symbolic links to map from database name to other directories.

Yes, I think the suggestion was given previously for using DBPATH. I need to consider how easy it is to accept. It is a workaround, and gives me some hesitation about robustness and stability.

It seems also that BaseX writes files under the user home directory, which unfortunately is not an agreeable side effect for my application.

Bridger Dyson-Smith

22 Feb

3:25 a.m.

Hi Eric,

On Mon, Feb 21, 2022, 11:44 PM Eric Levy contact@ericlevy.name wrote:

...

On Mon, 2022-02-21 at 23:33 -0500, Liam R. E. Quin wrote:

...
On Mon, 2022-02-21 at 20:00 -0500, Eric Levy wrote:

...
My use would need a function that could be called like this:

db:open_from_path("/home/user/path/to/database/in/filesystem")

For that you would want to set DBPATH in a separate BaseX instance, i think. Once a BaseX server is running it doesn't want to switch database directory.

You can have multiple databases in the same folder (identified by DBPATH) though. Each database is contained in its own subdirectory.

The standalone basex can be given a DBPATH option too. Or you could use symbolic links to map from database name to other directories.

Yes, I think the suggestion was given previously for using DBPATH. I need to consider how easy it is to accept. It is a workaround, and gives me some hesitation about robustness and stability.

It seems also that BaseX writes files under the user home directory, which unfortunately is not an agreeable side effect for my application.

BaseX would write files to the specified directory, based on configuration. Perhaps the docs would help clarify [1].

HTH! Best, Bridger

[1] https://docs.basex.org/wiki/Configuration#Home_Directory

...

Eric Levy

4:12 a.m.

On Tue, 2022-02-22 at 03:25 -0500, Bridger Dyson-Smith wrote:

...

BaseX would write files to the specified directory, based on configuration. Perhaps the docs would help clarify [1].

HTH! Best, Bridger

[1] https://docs.basex.org/wiki/Configuration#Home_Directory

Ok, right. If DBPATH is set, then use of ~/basex is suppressed.
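One way this might be arranged per project is sketched below in Python. It writes a `.basex` file containing a single `DBPATH = ...` line into a project directory, so that a BaseX process started from that directory could pick it up. This rests on assumptions: that the home-directory lookup described on the Configuration page checks the working directory for a `.basex` file, that a file holding only DBPATH is accepted, and that BaseX may add further options to it on first start. The function name and the `data` subdirectory are hypothetical.

```python
import tempfile
from pathlib import Path


def write_basex_config(home: Path, data_dir: str = "data") -> Path:
    """Write a minimal .basex file in `home` pointing DBPATH at home/data_dir."""
    home.mkdir(parents=True, exist_ok=True)
    dbpath = (home / data_dir).resolve()
    dbpath.mkdir(exist_ok=True)
    config = home / ".basex"
    # Assumption: one "KEY = value" line is enough for BaseX to honor DBPATH.
    config.write_text(f"DBPATH = {dbpath}\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Demonstration in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = write_basex_config(Path(tmp) / "project")
        print(cfg.read_text(encoding="utf-8"), end="")
```

Other locations, such as the repository or web paths, are not covered by this file and might still fall back to defaults.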

Christian Grün

7:51 a.m.

...

I feel my explanations are generally clear, especially in light of the context I offered, and I don't want to keep repeating myself.

Sorry, Eric. I assume no one here wanted to steal your valuable time. But if you are seeking help, it’s sometimes helpful to help others first.

...

The plan had been to prototype using a Bash script and query files, […]

That piece of information was helpful, for example. If you use Bash, you will treat BaseX as a standalone application. You’ll definitely have more freedom if you use Java: For example, you can prevent BaseX from reading the configuration from disk [1]. See [2] for some more code examples.

Have fun! Christian

[1] https://github.com/BaseXdb/basex/blob/da1e55d0214e44c1532f121c282021db50a9aa...

[2] https://docs.basex.org/wiki/Java_Examples

Eric Levy

4:43 p.m.

...

...
I feel my explanations are generally clear, especially in light of the context I offered, and I don't want to keep repeating myself.

Sorry, Eric. I assume no one here wanted to steal your valuable time. But if you are seeking help, it’s sometimes helpful to help others first.

I was trying to indicate that to clarify the issues in the preceding comments would have been simply to repeat my previous explanation. The other contributor's comments were deeply rooted in particular assumptions that stood in stark contrast to the scenario I had described. I would not want the discussion to begin to follow a circular path.

...

...
The plan had been to prototype using a Bash script and query files, […]

That piece of information was helpful, for example. If you use Bash, you will treat BaseX as a standalone application. You’ll definitely have more freedom if you use Java: For example, you can prevent BaseX from reading the configuration from disk [1]. See [2] for some more code examples.

Perhaps it has not been apparent to me which pieces would need further explanation. To me, "embedded" and "standalone" carry the same implication in the context of a database, and of course, I mentioned the former term more than a few times.

...

...
Yes, I think the suggestion was given previously for using DBPATH. I need to consider how easy it is to accept. It is a workaround, and gives me some hesitation about robustness and stability.

You've come to BaseX with a predetermined view of how you'll use it, so it's not surprising if you start out by fighting it.

I think it's a rather distorted interpretation.

...

You've brought up SQLite many times as an example of how you want BaseX to behave. The "embeddable" nature of SQLite is not a feature the developer decided to add to an existing project, but the very purpose of its existence. SQLite was designed by a US military contractor specifically to be included in applications on naval ships without requiring the installation of a MySQL or PostgreSQL server.

In contrast, BaseX uses a server/client architecture like MySQL and PostgreSQL

Right. SQLite, MySQL, and PostgreSQL are explicitly framed as having either a server-based or an embedded design. In contrast, BaseX actually supports both modes.

Broadly, however, it seems that BaseX carries a variety of behaviors from the server design into standalone invocations, limiting its general usefulness in embedded use cases. The application flow required for use without a server is already developed. The request I would suggest is simply adding a bit of additional flexibility to the finer features in order to solidify support for embedded cases.

Liam R. E. Quin

10:57 a.m.

On Mon, 2022-02-21 at 23:44 -0500, Eric Levy wrote:

...

Yes, I think the suggestion was given previously for using DBPATH. I need to consider how easy it is to accept. It is a workaround, and gives me some hesitation about robustness and stability.

You've come to BaseX with a predetermined view of how you'll use it, so it's not surprising if you start out by fighting it.

...

It seems also that BaseX writes files under the user home directory, which unfortunately is not an agreeable side effect for my application.

BaseX determines an application home directory and puts files there; the default is the directory containing the BaseX distribution.

See the BaseX documentation wiki under Configuration: https://docs.basex.org/wiki/Configuration

A downside of this is that if you upgrade to a new BaseX version you have to migrate the configuration files; i've been bitten that way a couple of times and would prefer the files in ~/.config/basex but this way is cross-platform.

Please, be very specific in saying what you're trying to do. It might indeed be that some other XQuery engine - or something else entirely - will do a better job for you, or that there's a better configuration for BaseX for you, but we don't have enough information from you to suggest such things.

basex-talk@mailman.uni-konstanz.de

participants (6)

Bridger Dyson-Smith
Christian Grün
Eric Levy
Imsieke, Gerrit, le-tex
Liam R. E. Quin
Tamara Marnell