I am experiencing unexpected behavior with a database I am working with in BaseX 8.3.1. The database is a collection of information about trials in the late Roman Republic (see http://tlrr.blackmesatech.com/ for more information), and while the upper-level elements have only element content, most of the actual data values are mixed content.
I reloaded the data the other day, having run some cleanup processes on it to regularize the whitespace and make the XML source more readable. In one trial record, for example, the information about the defendant looks like this:
<defGrp> <defendant> <namelist> <person-entry> <person pid="pSulpicius58Ser.Galba" ix="2" form="Sulpicius (+58), Ser. Galba" >Ser. Sulpicius Galba (58)</person> cos. 144 spoke <i>pro se</i> (<i>ORF</i> 19.II, III)</person-entry> </namelist> </defendant> </defGrp>
(This says that the defendant in the case was one Servius Sulpicius Galba, whose biography is given as the 58th entry under "Galba" in the Pauly/Wissowa Reallexikon, that this man was consul in 144 BC, that he spoke on his own behalf, and that the extant fragments of his speech are printed in the collection Oratorum Romanorum Fragmenta (ORF) as items 19.II and 19.III.)
After a little research, I learned (I think) how to make the default settings for the database have the value CHOP = false (I call db:create($dbname, (), (), map { "chop": false() }) to create the db), and also (redundantly, I hope) to specify CHOP = false as an option on the db:add() and db:replace() calls I am using to reload records in the database.
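For concreteness, the calls have roughly the following shape. This is a sketch rather than my actual code: the database name 'tlrr', the file path, and the document path are placeholders invented for illustration, and the option map is the only part that matters here.

```xquery
(: Query 1: create an empty database with whitespace chopping turned off.
   'tlrr' is a placeholder database name. :)
db:create('tlrr', (), (), map { 'chop': false() })
```

```xquery
(: Query 2, run separately once the database exists: reload one record.
   Both paths below are placeholders, not the real layout. :)
let $opts := map { 'chop': false() }
let $doc  := doc('/data/tlrr/trial-001.xml')
return db:replace('tlrr', 'trials/trial-001.xml', $doc, $opts)
```

db:add('tlrr', $doc, 'trials/trial-001.xml', $opts) takes the same option map; note that it puts the input before the target path, the reverse of db:replace().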
When the web front end retrieves the individual trial record whose defendant information is shown above, I get a result that looks essentially like what is shown above. When a different query retrieves just portions of the trial record, using the expression
<trial id="{$e/@id}" tlrr1="{$e/@tlrr1}" doc="{document-uri(root($e))}">{ $e/date, $e/ccGrp, $e/defGrp(: /defendant :), $e/ppGrp(: /prosecutor :), $e/partiesGrp, $e/advGrp }</trial>
the defendant information looks like this, according to both Safari and Opera:
<defGrp> <defendant> <namelist> <person-entry> <person pid="pSulpicius58Ser.Galba" ix="2" form="Sulpicius (+58), Ser. Galba">Ser. Sulpicius Galba (58)</person>cos. 144 spoke<i>pro se</i>(<i>ORF</i>19.II, III)</person-entry> </namelist> </defendant> </defGrp>
Note that within the person-entry element, the whitespace adjacent to the 'person' and 'i' elements has disappeared.
It looks almost as if some queries were stripping whitespace as part of the query, or as part of returning a result. To confuse me even further, dynamic queries using the dba application on the server return data with the whitespace chopped.
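One check that might separate the two possibilities (whitespace already missing from the stored nodes, versus whitespace lost when the result is built or serialized) is to ask about the text nodes directly, so that the answer does not depend on how the output is rendered. A minimal sketch, assuming the database is named 'tlrr' (a placeholder) and using the pid from the record shown above:

```xquery
(: For each text node inside the person-entry in question, report whether
   it starts and/or ends with whitespace. The result is a sequence of small
   elements carrying only booleans and lengths, so serialization settings
   should not affect the answer. :)
for $t in db:open('tlrr')//person-entry[person/@pid = 'pSulpicius58Ser.Galba']/text()
return
  <text-node length="{ string-length($t) }"
             leading-ws="{ matches($t, '^\s') }"
             trailing-ws="{ matches($t, '\s$') }"/>
```

If the stored nodes report leading and trailing whitespace, the loss would seem to be happening somewhere after retrieval; if they do not, the whitespace was presumably already gone when the documents were loaded.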
Is there something obvious I am overlooking or doing wrong?
Actually, I guess I have two questions: first, I'd like to figure out why BaseX is currently behaving as it does; second, I'd like to make it behave differently.
I realize now that the documents I just updated all had xml:space="preserve" on their root elements, because I couldn't make this work last time I tried, either. I would much much rather avoid resorting to that again, if I can, since it feels like a hack and it complicates processing of the data.
I will try to construct a minimum repeatable example that illustrates the problem, but I have not done so yet.
thanks for any help anyone on the list can provide,
Michael