Greetings!
I've been tasked with using BaseX to produce:
*****
<wg class="cl" rule="S-IO" cltype="VerbElided">
  <wg rule="NpaNp" role="s">
    <wg type="group" appositioncontainer="true" rule="Np-Appos">
      <w ref="PHM 1:1!1"
         after=" "
         class="noun"
         gbiType="proper"
         xml:id="n57001001001"
         lemma="Παῦλος"
         normalized="Παῦλος"
         strong="3972"
         number="singular"
         gender="masculine"
         case="nominative"
         gloss="Paul"
         domain="093001"
         ln="93.294a"
         morph="N-NSM"
         unicode="Παῦλος">Παῦλος</w>
*****
The indenting is easy enough and I can even make it deeper if required but is there a command for serialization that will properly format the attributes?
My personal suspicion is that inserting \n when each attribute is serialized (and not on the last one) is the easier route but I promised to investigate the command line.
Have I overlooked something in the very fine manual?
Hope everyone is having a great week!
Patrick
Hi Patrick,
There’s currently no serialization parameter to control the custom indentation of attributes.
If I get you correctly, you’d like to get attributes indented if the string length of the element name and the attributes exceed a specific maximum length?
Best, Christian
On Mon, Feb 13, 2023 at 9:10 PM Patrick Durusau patrick@durusau.net wrote:
-- Patrick Durusau patrick@durusau.net Technical Advisory Board, OASIS (TAB) Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300 Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
Another Word For It (blog): http://tm.durusau.net Homepage: http://www.durusau.net Twitter: patrickDurusau
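For context, here is a minimal sketch of what the standard 'indent' serialization parameter does on its own: nested elements are placed on their own lines, but the attributes of each element stay on the line of the start tag. The sample is abbreviated from Patrick's post, and the variable names $sample and $out are chosen for this illustration only.

```xquery
(: Sketch: only the standard 'indent' parameter is set.
   Elements are indented; attributes are not broken across lines. :)
declare variable $sample :=
  <wg class="cl" rule="S-IO" cltype="VerbElided">
    <wg rule="NpaNp" role="s">
      <w ref="PHM 1:1!1" class="noun" lemma="Παῦλος">Παῦλος</w>
    </wg>
  </wg>;

(: serialize() returns the XML as a string :)
declare variable $out := serialize($sample, map { 'indent': true() });

$out
```

This is the behaviour the question starts from: the element indentation is fine, but a long attribute list still ends up on a single line.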
A call from the backbench: I think it would be interesting to have such a serialization option! The esthetic aspect of XML can be important, depending on context. What we get without such an option looks like a heap of information. <wg class="cl" rule="S-IO" cltype="VerbElided"> <wg rule="NpaNp" role="s"> <wg type="group" appositioncontainer="true" rule="Np-Appos"> <w ref="PHM 1:1!1" after=" " class="noun" gbiType="proper" xml:id="n57001001001" lemma="Παῦλος" normalized="Παῦλος" strong="3972" number="singular" gender="masculine" case="nominative" gloss="Paul" domain="093001" ln="93.294a" morph="N-NSM" unicode="Παῦλος">Παῦλος</w> </wg> </wg></wg>
In reply to Christian Grün christian.gruen@gmail.com, Tuesday, February 14, 2023, 07:30:46 CET.
Christian,
Ah, no, it isn't a length of element name + attribute but the ability to align attributes for an element as you see in my post for the <w element. Each key/value is followed by a line return.
In the meantime, the current version of tidy has been added to the workflow to produce the desired results.
But it would be great to have it native to BaseX!
Thanks!
Patrick
Hi Patrick
I noticed that the attributes for the wg element had not been aligned, so I was wondering if you were thinking of a more advanced rule.
Or would you possibly like to supply the names of the elements for which the alignment should take place?
Best, Christian
In reply to Patrick Durusau patrick@durusau.net, Wed, Feb 15, 2023, 03:51.
Hi Christian,
Currently, I am using HTML tidy to reformat the XML output. It gives me the formatting I need, which is Git-diff friendly.
Jonathan
$ nodes % tidy --version
HTML Tidy for Apple macOS version 5.6.0
$ nodes % tidy -config tidy.config 03-luke.xml
Sample Output:
<?xml version="1.0"?>
<Sentences>
    <Sentence ref="LUK 1:1!1-1:4!8">
        <Trees>
            <Tree>
                <Node Cat="S"
                      Head="0"
                      nodeId="420010010010421">
                    <Node Cat="CL"
                          Start="0"
                          End="41"
                          Rule="ClCl"
                          Head="0"
                          nodeId="420010010010420">
                        <Node Cat="CL"
                              Start="0"
                              End="33"
                              Rule="ClCl"
                              Head="0"
                              nodeId="420010010010340">
                            <Node Cat="CL"
                                  Start="0"
                                  End="31"
                                  Rule="ClCl2"
                                  Head="1"
                                  nodeId="420010010010320">
                                <Node Cat="CL"
                                      Start="0"
                                      End="22"
                                      Rule="sub-CL"
                                      nodeId="420010010010230">
                                    <Node xml:id="n42001001001"
                                          ref="LUK 1:1!1"
                                          Cat="conj"
                                          Start="0"
                                          End="0"
                                          StrongNumber="1895"
                                          UnicodeLemma="ἐπειδήπερ"
                                          FunctionalTag="CONJ"
                                          Type=""
                                          morphId="42001001001"
                                          NormalizedForm="Ἐπειδήπερ"
                                          Unicode="Ἐπειδήπερ"
                                          FormalTag="CONJ"
tidy.config
add-xml-decl: true
drop-empty-paras: false
fix-backslash: false
fix-bad-comments: false
fix-uri: false
input-xml: true
join-styles: false
literal-attributes: true
lower-literals: false
output-xml: true
preserve-entities: true
quote-ampersand: false
quote-marks: false
quote-nbsp: false
indent: auto
indent-attributes: true
indent-spaces: 4
tab-size: 4
vertical-space: true
wrap: 150
char-encoding: utf8
input-encoding: utf8
newline: CRLF
output-encoding: utf8
quiet: true
In reply to Christian Grün christian.gruen@gmail.com, Wed, Feb 15, 2023 at 3:06 AM.
Hi Jonathan,
Thanks for sharing your tidy settings.
With the given configuration, all attributes except for the first are returned in a separate line…
<wg class="cl"
    rule="S-IO"
    cltype="VerbElided">
In Patrick’s example, some attributes were returned in a single line (possibly depending on the expected string length). Maybe it was generated via Saxon (just a guess):
<wg class="cl" rule="S-IO" cltype="VerbElided">
Do you have a preference which representation would be required, or do you think the details are not that relevant?
We could possibly add a custom serialization parameter similar to tidy’s 'indent-attributes' option, and it would probably be easier to ignore the expected string length.
All the best, Christian
Hi Christian,
I prefer to be able to require one attribute per line. This is important for Git diffs, which are the main reason we care.
Jonathan
In reply to Christian Grün christian.gruen@gmail.com, Wed, Feb 15, 2023 at 11:31 AM.
Hi Jonathan,
I think we can offer you a solution soon. I have created a GitHub issue to document the progress [1].
All the best, Christian
[1] https://github.com/BaseXdb/basex/issues/2174
In reply to Jonathan Robie jonathan.robie@gmail.com, Wed, Feb 15, 2023, 19:05.
Hi Jonathan, hi Patrick,
The new serialization parameter 'indent-attributes' is already available [1]:
(: provided globally :)
declare option output:indent 'yes';
declare option output:indent-attributes 'yes';
<e a='a' b='b' c='c'/>
(: provided locally :)
serialize(
  <e a='a' b='b' c='c'/>,
  map { 'indent-attributes': true(), 'indent': true() }
)
Result:
<e a="a"
   b="b"
   c="c"/>
Thank you to Gunther Rademacher, who contributed the code solution!
A new stable snapshot is available [2]. The serialization parameter may officially be supported with XQuery 4 [3].
Hope this helps, Christian
[1] https://docs.basex.org/wiki/Serialization
[2] https://files.basex.org/releases/latest/
[3] https://github.com/qt4cg/qtspecs/issues/358#issuecomment-1436595401
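Applied to the <w> element from Patrick's first post, the local form of the call might look like the sketch below. It assumes a BaseX build that already includes the 'indent-attributes' parameter (older versions are expected to reject the unknown parameter); the variable names $w and $out are illustrative and not part of any API.

```xquery
(: Sketch: serialize Patrick's <w> element with both parameters set,
   following the "provided locally" form shown in the announcement. :)
declare variable $w :=
  <w ref="PHM 1:1!1" after=" " class="noun" gbiType="proper"
     xml:id="n57001001001" lemma="Παῦλος" normalized="Παῦλος"
     strong="3972" number="singular" gender="masculine"
     case="nominative" gloss="Paul" domain="093001" ln="93.294a"
     morph="N-NSM" unicode="Παῦλος">Παῦλος</w>;

(: 'indent' switches on indentation in general;
   'indent-attributes' additionally breaks the attribute list :)
declare variable $out := serialize($w, map {
  'indent': true(),
  'indent-attributes': true()
});

$out
```

Going by the result shown for the <e> example, each attribute after the first should end up on its own line, which is the one-attribute-per-line layout requested for readable Git diffs.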
Christian,
Thanks indeed to both you and Gunther Rademacher!
Excellent work!
Patrick
In reply to Christian Grün, 2/21/23 06:41.
basex-talk@mailman.uni-konstanz.de