Greetings!
I have a directory of files that I use to create a database, using the directory as my starting point.
On Debian, the files appear in this order:
01-matthew.xml
02-mark.xml
03-luke.xml
04-john.xml
If I create an empty database and then add the files one by one, in that order, then //title[@type="main"]
returns:
ΚΑΤΑ ΜΑΘΘΑΙΟΝ
ΚΑΤΑ ΜΑΡΚΟΝ
ΚΑΤΑ ΙΩΑΝΗΝ
ΚΑΤΑ ΙΩΑΝΗΝ
On the other hand, if I point BaseX at the directory and let it add the XML files while creating the database, then //title[@type="main"]
returns:
ΚΑΤΑ ΙΩΑΝΗΝ
ΠΡΟΣ ΕΦΕΣΙΟΥΣ
ΚΑΤΑ ΜΑΘΘΑΙΟΝ
ΙΩΑΝΟΥ Α
Question (sorry for the long setup): Is the order I get from manually added files just an artifact of the manual addition? That is, can I not expect users on other systems to see the same load order that I do?
Asking because I'm writing documentation along with example queries for several datasets, and documenting random results isn't likely to be helpful to readers.
I am aware I can use db:list(database-name) to obtain and sort on the original file names, but that takes me away from the file content.
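For reference, a minimal sketch of a file-name-based ordering that still returns file content: iterate over the documents and order them by their database path. It assumes a database named 'gospels' (a hypothetical name) whose document paths keep the numeric prefixes shown above, and it uses db:open as available in BaseX 9.

```xquery
(: Assumption: 'gospels' is a hypothetical database name. :)
for $doc in db:open('gospels')
(: db:path returns the document's path inside the database,
   for example 01-matthew.xml :)
order by db:path($doc)
return $doc//title[@type = 'main']
```

This only works as long as the file names themselves sort in the intended order.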
Ah, BaseX 9.7.1 on Debian.
You know, the files don't have internal sort attributes. ntorder = "digit" would be enough to sort on that attribute and return the book names in the expected order.
I'm sure there are better solutions, suggestions appreciated!
Patrick
On Wed, May 04, 2022 at 04:29:04PM -0400, Patrick Durusau scripsit:
You know, the files don't have internal sort attributes. ntorder = "digit" would be enough to sort on that attribute and return the book names in the expected order.
I'm sure there are better solutions, suggestions appreciated!
In principle, it's preferable to put the metadata in the content as semantically meaningful things; perhaps <sort-title>matthew</sort-title> for this use case.
If there's no naturally occurring ordering in the semantically meaningful thing, you can write an ordering function or just map sort titles to something that has the ordering you want.
let $sequenceMap as map(xs:string, xs:integer) := map { 'matthew': 10, 'mark': 20, 'luke': 30, 'john': 40 }
lets you do something like
order by $sequenceMap(sort-title)
$sequenceMap ought to be
declare variable $namespace:sequenceMap as map(xs:string, xs:integer) := map { 'matthew': 10, 'mark': 20, 'luke': 30, 'john': 40 };
in the function module if you're doing this for real; some single place to have the ordering.
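Put together, the map-based ordering might look like the following sketch. The inline <book> elements are hypothetical sample data standing in for the real files, and <sort-title> is the suggested metadata element, which the actual documents do not yet contain.

```xquery
(: Hypothetical sample data, deliberately out of order. :)
let $books := (
  <book><sort-title>john</sort-title><title type="main">ΚΑΤΑ ΙΩΑΝΗΝ</title></book>,
  <book><sort-title>matthew</sort-title><title type="main">ΚΑΤΑ ΜΑΘΘΑΙΟΝ</title></book>,
  <book><sort-title>luke</sort-title><title type="main">ΚΑΤΑ ΛΟΥΚΑΝ</title></book>,
  <book><sort-title>mark</sort-title><title type="main">ΚΑΤΑ ΜΑΡΚΟΝ</title></book>
)
(: The single place that defines the ordering. :)
let $sequenceMap as map(xs:string, xs:integer) := map {
  'matthew': 10, 'mark': 20, 'luke': 30, 'john': 40
}
for $book in $books
(: Look up the numeric rank for this book's sort title. :)
order by $sequenceMap(string($book/sort-title))
return string($book/title[@type = 'main'])
```

With this sample data the titles come back in the order Matthew, Mark, Luke, John, regardless of the order in which the books were loaded.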
If you're going to sort differently for different languages, it turns into a map of per-language maps, each in the style of $sequenceMap.
order by $sequenceMap(@xml:lang)(sort-title)
It keeps the ordering in one place, you can adjust it independently of the content, and the "we sort on this" label in the content can have some human meaning. You can also add a wrapper layer outside language if you want to support variation in which books appear, and in what order, for different use cases:
order by $sequenceMap($heretics)(@xml:lang)(sort-title)
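A small sketch of the per-language variant, under stated assumptions: the language codes, the German sort titles, and the inline <book> elements are all hypothetical and serve only to show the nested lookup.

```xquery
(: Hypothetical sample data; language codes and sort titles are illustrative. :)
let $books := (
  <book xml:lang="en"><sort-title>mark</sort-title></book>,
  <book xml:lang="en"><sort-title>matthew</sort-title></book>,
  <book xml:lang="de"><sort-title>markus</sort-title></book>,
  <book xml:lang="de"><sort-title>matthaeus</sort-title></book>
)
(: Outer key: language; inner map: sort title to rank. :)
let $sequenceMap as map(xs:string, map(xs:string, xs:integer)) := map {
  'en': map { 'matthew': 10, 'mark': 20 },
  'de': map { 'matthaeus': 10, 'markus': 20 }
}
for $book in $books
let $lang := string($book/@xml:lang)
(: Group by language first, then apply that language's ordering. :)
order by $lang, $sequenceMap($lang)(string($book/sort-title))
return $lang || ': ' || $book/sort-title
```

A further outer layer, such as the $heretics key above, would be one more level of nesting handled the same way.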
Thanks to both you and Christian!
Yes to adding the metadata; it would be cleaner and would let me avoid the more advanced XQuery aspects at this point in the introduction.
Hope you are looking forward to a great weekend!
Patrick
Hi Patrick,
The order in which files are imported depends on both your operating system and file system; we don’t change the order of the retrieved paths.
A custom order can be enforced by passing each path individually to db:create. The following example is slightly more complex: it sorts file names (9.xml, 10.xml) numerically:
let $root := file:base-dir() || 'x/'
let $paths := (
  for $file in file:list($root)
  order by number(replace($file, '\..*', ''))
  return $root || $file
)
return db:create('db', $paths)

Hope this helps,
Christian
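For file names with a numeric prefix followed by a name, such as 01-matthew.xml, the same idea could be adapted roughly as follows. This is a sketch only: the directory 'x/' and database name 'db' are carried over from the example above, and the assumption is that every file name starts with digits.

```xquery
let $root := file:base-dir() || 'x/'
let $paths :=
  (: List only the XML files directly in the directory. :)
  for $file in file:list($root, false(), '*.xml')
  (: Sort on the leading digits; names without a numeric prefix
     yield NaN here, so this assumes every name has one. :)
  order by number(replace($file, '^(\d+).*$', '$1'))
  return $root || $file
return db:create('db', $paths)
```

Sorting on the numeric value rather than the string keeps 9-… ahead of 10-… even when the prefixes are not zero-padded.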
-- Patrick Durusau patrick@durusau.net Technical Advisory Board, OASIS (TAB) Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300 Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
Another Word For It (blog): http://tm.durusau.net Homepage: http://www.durusau.net Twitter: patrickDurusau
basex-talk@mailman.uni-konstanz.de