Dear Michael,
As you correctly guessed, if you want to preserve whitespace, you will need to set the CHOP option to false. I remember there has been discussion around this option on this list more than once. It turned out that changing the default to 'false' would cause quite a lot of surprises, because the visualizations, the database layout etc. have been tailored to work best without superfluous whitespace. But whitespace chopping is surely not what you would expect when working with full-text [1].
By the way, I never stopped wondering why only 'preserve' and 'default' are allowed as values for the xml:space attribute. As one of the renowned editors of the spec, can you tell why a 'strip' value was omitted back then?
Please note that 'chop' in combination with db:create will only take effect if you specify actual input with this command [2]. If you want to deactivate whitespace chopping globally, you can specify this option in the .basex configuration file or (if you are working with RESTXQ, REST, etc.) add it to the web.xml file.
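A minimal sketch of what "specifying actual input" could look like, reusing the database name, path and URI from your message. It assumes that the input is handed to db:create as a URI string, so that it is parsed by the command itself with the given options; it is an illustration, not a tested fix for your setup:

```xquery
(: Sketch only: the option map is applied while db:create parses the input. :)
let $uri := 'http://tlrr.blackmesatech.com/2016/06/ZAA.xml'
return db:create(
  'tlrr1-alpha',            (: database name taken from the thread :)
  $uri,                     (: input given as a URI string, not as a parsed node :)
  'trials/ZAA.xml',         (: target path taken from the thread :)
  map { 'chop': false() }   (: do not chop whitespace for this input :)
)
```

Note that this recreates the database, so it only makes sense if the existing contents can be rebuilt.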
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Full-Text#Mixed_Content
[2] http://docs.basex.org/wiki/Database_Module#db:create
On Fri, Jun 17, 2016 at 8:31 PM, C. M. Sperberg-McQueen cmsmcq@blackmesatech.com wrote:
On Jun 16, 2016, at 10:32 PM, C. M. Sperberg-McQueen wrote:
I am experiencing unexpected behavior with a database I am working with in BaseX 8.3.1. ...
I will try to construct a minimum repeatable example that illustrates the problem, but I have not done so yet.
One attempt to reduce things to the essentials is:
- I've placed one input document at
http://tlrr.blackmesatech.com/2016/06/ZAA.xml
- Running curl http://tlrr.blackmesatech.com/2016/06/ZAA.xml | grep person
shows whitespace in the data, as shown in (A) below. As can be seen, I've added xml:space to one 'person-entry' element as an experiment.
- Running the following updating query in the Queries interface in the
database server produces a 'Query successful' message.
(: load a single document :)
let $options-map := map { "chop": false(), "intparse": true() }
let $host := "http://tlrr.blackmesatech.com",
    $path := "trials/ZAA.xml",
    $uri := concat($host, '/2016/06/ZAA.xml'),
    $doc := doc($uri)
return db:replace('tlrr1-alpha', $path, $doc, $options-map)
- Running the command
curl --user ... http://modeleditions.blackmesatech.com/BaseX831/rest/tlrr1-alpha/trials/ZAA.... | grep person
with a userid assigned read-only access to the database produces the results shown in (B) below, which shows that whitespace is being stripped, despite (a) the database having been created with
db:create('tlrr1-alpha',(),(), map { "chop" : false() })
and the use of map { "chop" : false() } in the update query shown above.
Trying this with "chop" : false(), "chop" : 'false', "chop" : 0 does not change the result. Nor does "chop" : true(), which I tried just in case I was reading the documentation wrong. Including "intparse" : true() also has no visible effect.
As the URI of the REST interface suggests, the server is running BaseX 8.3.1. The documentation says the XML parsing options were added to db:replace in version 7.9.
I'm close to my wits' end. Why is the CHOP option not working as advertised? Or what am I doing wrong in trying to set it?
Michael
p.s. the earlier report that some queries returned stripped text nodes and others returned unstripped text nodes appears to be irreproducible. Perhaps it was caused by stale caches.
......
(A) output of curl http://tlrr.blackmesatech.com/2016/06/ZAA.xml | grep person
Note white space after 'person' elements and elsewhere.
<person-entry> <person pid="pSulpicius58Ser.Galba" form="Sulpicius (+58), Ser. Galba">Ser. Sulpicius Galba (58)</person> cos. 144 spoke <i>pro se</i> (<i>ORF</i> 19.II, III)</person-entry> <person-entry xml:space="preserve"> <person pid="pFulvius95Q.Nobilior" ix="3" form="Fulvius (+95), Q. Nobilior">Q. Fulvius Nobilior (95)</person> cos. 153, cens. 136</person-entry> <person-entry> <person pid="pCornelius91L.Cethegus" form="Cornelius (+91), L. Cethegus">L. Cornelius Cethegus (91)</person> </person-entry> <person-entry> <person pid="pPorcius9M.Cato" ix="4" form="Porcius (++9), M. Cato">M. Porcius Cato (9)</person> cos. 195, cens. 184 (<i>ORF</i> 8.LI)</person-entry> <person-entry> <person pid="pScribonius18L.Libo" ix="4" form="Scribonius (+18), L. Libo">L. Scribonius Libo (18)</person> tr. pl. 149 (<i>promulgator</i>)</person-entry>
(B) output of curl --user ... http://modeleditions.blackmesatech.com/BaseX831/rest/tlrr1-alpha/trials/ZAA.... | grep person
Note absence of whitespace after 'person' elements, except in the entry for Q. Fulvius Nobilior.
<person-entry> <person pid="pSulpicius58Ser.Galba" ix="2" form="Sulpicius (+58), Ser. Galba">Ser. Sulpicius Galba (58)</person>cos. 144 spoke<i>pro se</i>(<i>ORF</i>19.II, III)</person-entry> <person-entry xml:space="preserve"> <person pid="pFulvius95Q.Nobilior" ix="3" form="Fulvius (+95), Q. Nobilior">Q. Fulvius Nobilior (95)</person> cos. 153, cens. 136</person-entry> <person-entry> <person pid="pCornelius91L.Cethegus" ix="4" form="Cornelius (+91), L. Cethegus">L. Cornelius Cethegus (91)</person> </person-entry> <person-entry> <person pid="pPorcius9M.Cato" ix="4" form="Porcius (++9), M. Cato">M. Porcius Cato (9)</person>cos. 195, cens. 184 (<i>ORF</i>8.LI)</person-entry> <person-entry> <person pid="pScribonius18L.Libo" ix="4" form="Scribonius (+18), L. Libo">L. Scribonius Libo (18)</person>tr. pl. 149 (<i>promulgator</i>)</person-entry>
--
- C. M. Sperberg-McQueen, Black Mesa Technologies LLC
- http://www.blackmesatech.com
- http://cmsmcq.com/mib
- http://balisage.net