Mm, the documentation says:
"Chops all leading and trailing whitespaces from text nodes while building a database, and discards empty text nodes. By default, this option is set to true, as it often reduces the database size by up to 50%. It can also be turned off on command line via -w."
The text states clearly that chopping affects only text nodes stored into a database.
At any rate, the problem remains whether or not I use option -w, and whether or not I use the prolog option db:chop. (Side remark: it would be a serious issue if the prolog option were required, as this would imply that standard-conformant behaviour could only be achieved by making the code unportable.)
Kind regards, Hans-Juergen
-------------------------------------------- Dirk Kirsten dk@basex.org wrote on Thu, 20.3.2014:
Subject: Re: [basex-talk] Bug (?) - trailing whitespace in text nodes
To: "Hans-Juergen Rennau" hrennau@yahoo.de, "basex-talk@mailman.uni-konstanz.de" basex-talk@mailman.uni-konstanz.de
Date: Thursday, 20 March 2014, 18:28
Dear Hans-Jürgen,
I am not quite sure whether it is intended that the CHOP option is also applied to text nodes of documents that are not stored in a database. At least the wording in the documentation ("while building a database") does not suggest it, although I think it actually makes sense. Christian will have to answer whether this is the intended behavior.
However, you can set the chop option to false within your XQuery by declaring

declare option db:chop "false";

in the query prolog, and this should also affect files read from the file system. At least, it works for me.
Cheers, Dirk
On 20/03/14 17:47, Hans-Juergen Rennau wrote:
My understanding is that it only affects database documents, and I used file input. Nevertheless, I also tried option -w, but always got the same result.
Kind regards, Hans-Jürgen
Hans-Juergen Rennau hrennau@yahoo.de wrote at 17:45 on Thursday, 20 March 2014:
Dear Dirk, thank you.
But this is strange: I ran the query using a file as input, not a database document.
I tried two versions:
BaseX 7.8.2 beta f505185 [Standalone]
BaseX 8.0 beta 606f18b [Standalone]
OS is Windows 7.
I always get this result:
<para>xxx<emphasis role="italic">abc</emphasis>yyy.</para>
Using a different XQuery processor, I get this result, as expected:
<para>xxx <emphasis role="italic">abc</emphasis> yyy.</para>
Kind regards,
Hans-Jürgen
Dirk Kirsten dk@basex.org wrote at 17:10 on Thursday, 20 March 2014:
Dear Hans-Jürgen,
When running local:edit() on an in-memory node, I get the expected and correct result, including whitespace.
I guess that you ran this query on a database node and that the XML documents were parsed with CHOP set to true (which is the default). This would explain the behavior, and it would be as expected. If you do not want this, you might consider setting the CHOP option (see …) to false.
Cheers,
Dirk
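To illustrate the effect Dirk describes, here is a minimal Python sketch that mimics chopping at parse time. It assumes the rule quoted from the documentation (strip leading and trailing whitespace from every text node, then discard text nodes that end up empty) and uses the standard xml.dom.minidom module; the names chop and XML_WS are made up, and this is not BaseX's actual parser code. Applied to the input document from the original report, it produces the output that was reported with BaseX.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# XML whitespace characters; str.strip() without arguments would also
# remove other Unicode whitespace such as a no-break space.
XML_WS = " \t\r\n"


def chop(node):
    """Strip leading/trailing whitespace from all text nodes below `node`
    and remove text nodes that end up empty (sketch of the documented rule)."""
    for child in list(node.childNodes):  # iterate over a copy: we may remove children
        if child.nodeType == Node.TEXT_NODE:
            child.data = child.data.strip(XML_WS)
            if not child.data:
                node.removeChild(child)
        else:
            chop(child)
    return node


SOURCE = '<para>xxx <emphasis role="italic">abc</emphasis> yyy.</para>'
chopped = chop(minidom.parseString(SOURCE))
print(chopped.documentElement.toxml())
# <para>xxx<emphasis role="italic">abc</emphasis>yyy.</para>
```

Any query that runs afterwards only ever sees the chopped text nodes, so even a perfect identity transformation cannot restore the blanks.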
On 20/03/14 16:57, Hans-Juergen Rennau wrote:
Dear BaseX team,
I think I observed a bug concerning trailing whitespace in text nodes.
Please consider this input document:
<para>xxx <emphasis role="italic">abc</emphasis> yyy.</para>
[Note the blanks between xxx and <emphasis>.]
The result of the following "null-transformation"
=======================
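declare function local:edit($n as node()) as node()? {
  typeswitch ($n)
    (: rebuild the document node from its rebuilt children :)
    case document-node() return
      document { for $c in $n/node() return local:edit($c) }
    (: rebuild the element with the same name, attributes and children :)
    case element() return
      element { node-name($n) } {
        for $ac in $n/(@*, node()) return local:edit($ac)
      }
    (: attributes, text nodes, comments, PIs: returned unchanged :)
    default return $n
};
local:edit(.)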
=======================
should be identical to the input, but what I get is this:
<para>xxx<emphasis role="italic">abc</emphasis>yyy.</para>
The blanks after "xxx" and before "yyy." are gone!
When transforming mixed content like DocBook, this can have awkward consequences.
Kind regards,
Hans-Jürgen
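For comparison, here is a rough Python analogue of the null transformation above, written with the standard xml.dom.minidom module. The port and the names SOURCE and edit are illustrative only and not part of the original report. Because minidom keeps text nodes exactly as the parser delivered them, an identity copy of this kind returns the input unchanged, which is the result the report expects.

```python
from xml.dom import minidom
from xml.dom.minidom import Document, Node

# Input document from the bug report.
SOURCE = '<para>xxx <emphasis role="italic">abc</emphasis> yyy.</para>'


def edit(node, out):
    """Rebuild `node` inside the Document `out`, mirroring the XQuery
    typeswitch in local:edit (hypothetical port for illustration)."""
    if node.nodeType == Node.DOCUMENT_NODE:
        # document-node(): rebuild every child and attach it to `out`
        for child in node.childNodes:
            out.appendChild(edit(child, out))
        return out
    if node.nodeType == Node.ELEMENT_NODE:
        # element(): same name, same attributes, rebuilt children
        copy = out.createElement(node.tagName)
        for name, value in node.attributes.items():
            copy.setAttribute(name, value)
        for child in node.childNodes:
            copy.appendChild(edit(child, out))
        return copy
    # default: text nodes, comments etc. are copied as they are
    return out.importNode(node, False)


result = edit(minidom.parseString(SOURCE), Document())
print(result.documentElement.toxml())
# <para>xxx <emphasis role="italic">abc</emphasis> yyy.</para>
```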
_______________________________________________
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
-- Dirk Kirsten, BaseX GmbH, http://basex.org
|-- Registered office: Blarerstrasse 56, 78462 Konstanz
|-- Register court: Freiburg, HRB: 708285, Managing directors:
|   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
`-- Phone: 0049 7531 28 28 676, Fax: 0049 7531 20 05 22
Hi Hans-Jürgen,
"Chops all leading and trailing whitespaces from text nodes while building a database, and discards empty text nodes. By default, this option is set to true, as it often reduces the database size by up to 50%. It can also be turned off on command line via -w."
The text states clearly that chopping affects only text nodes stored into a database.
Just another indication that we continuously need to improve our documentation (we are looking for volunteers!). The chop option (one of the features we introduced at a very early stage and which are hard to get rid of again) also applies to the -i flag, which I assume you used to specify the input. When using -w...
basex -wi input.xml .
<para>xxx <emphasis role="italic">abc</emphasis> yyy.</para>
...I get the correct result.
(Side remark: it would be a serious issue if the prolog option were required, as this would imply that standard conformant behaviour could only be achieved by making the code unportable.)
Side answer: The situation is not ideal, but BaseX-specific prolog options at least won't cause any compatibility issues, because the option declaration will simply be ignored by other processors.
How did you proceed? Christian
HURRA! -wi fixes the problem! Thank you very much, Christian, and Dirk, too.
I had not understood that I must combine -w with -i (as in -wi); what I had tried was -i ... -w.
Now I know how I can always avoid the problem (which tends to be necessary when dealing with mixed content, where embedded markup is of course usually preceded and followed by whitespace).
Problem solved, file closed, BaseX top.
Kind regards, Hans-Jürgen
Trailing remark - of course your side answer is true, I had not thought of that: options do not render the code unportable. Thanks for the reminder!
-------------------------------------------- Christian Grün christian.gruen@gmail.com wrote on Thu, 20.3.2014:
Subject: Re: [basex-talk] Bug (?) - trailing whitespace in text nodes
To: "Hans-Juergen Rennau" hrennau@yahoo.de
CC: "basex-talk@mailman.uni-konstanz.de" basex-talk@mailman.uni-konstanz.de, "Dirk Kirsten" dk@basex.org
Date: Thursday, 20 March 2014, 22:38
HURRA! -wi fixes the problem! Thank you very much, Christian, and Dirk, too.
Perfect. It's helpful to know that BaseX interprets all command-line flags from left to right. This way, flags that have been activated for a first command/query/etc. can later be turned off again in a single basex call.
Have a good evening, Christian
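The left-to-right rule explains why "-i input.xml -w" did not help: the file had already been parsed, and chopped, by the time -w was read. The Python toy model below illustrates that ordering. It is a hypothetical simplification (only -w and -i are modelled, and the function name run is made up), not BaseX's actual option parser.

```python
def run(args):
    """Return (file, chop_setting) for every file parsed via -i, processing
    flags strictly from left to right (toy model, not BaseX's parser)."""
    chop = True          # assumed default, matching the documented CHOP = true
    parsed = []
    pos = 0
    while pos < len(args):
        arg = args[pos]
        if arg.startswith("-"):
            for flag in arg[1:]:      # "-wi" is read as "-w" followed by "-i"
                if flag == "w":
                    chop = False      # only affects input parsed later on
                elif flag == "i":
                    pos += 1          # -i consumes the next argument
                    parsed.append((args[pos], chop))
        pos += 1                      # other arguments (e.g. the query ".") are skipped
    return parsed


print(run(["-i", "input.xml", "-w", "."]))  # [('input.xml', True)]
print(run(["-wi", "input.xml", "."]))       # [('input.xml', False)]
```

In this model, the order of the flags, not their mere presence, decides which setting the parsed input sees.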
I have added an issue on the effects of (XML) parsing options; you are invited to leave comments:
https://github.com/BaseXdb/basex/issues/905
On Thu, Mar 20, 2014 at 11:15 PM, Christian Grün christian.gruen@gmail.com wrote:
basex-talk@mailman.uni-konstanz.de