Hi Christian --

There's no query!  This is about loading the files into a DB with the GUI.

I've attached two files.

If I load them as Database->New with "input format" HTML, the comments go away.

If I load them the same way but with "lexical" as a TagSoup parser option, the comments still go away.  As I understand it, "lexical" is the TagSoup option that keeps comments from going away (and retains the DOCTYPE in the example that has one).

If I use

java -jar /usr/share/java/tagsoup.jar --lexical --files *html

from the command line, the comments do NOT go away, so I don't think it's a TagSoup problem, at least not with 1.2.1.

Thanks!
Graydon

On Fri, Aug 18, 2017 at 9:07 AM, Christian Grün <christian.gruen@gmail.com> wrote:
Hi Graydon,

A little example query and input file would be great (the smaller, the better).

Thanks in advance,
Christian



On Fri, Aug 18, 2017 at 2:40 PM, Graydon Saunders <graydonish@gmail.com> wrote:
> Hello --
>
> So I have a pile of near-XML HTML with semantically significant comments to
> deal with.  (I must have been sinning much more than I realized!)
>
> Using BaseX866-20170818.124137, BaseX will parse the content but all the
> comments go away.  This is with passing the "lexical" option on the parser
> tab where it asks for TagSoup options, which I understand from
> https://github.com/orbeon/tagsoup/blob/master/trunk/README to pass through
> comments (and DOCTYPE declarations).
>
> How do I parse HTML and keep the comments?
>
> Thanks!
> Graydon