"MS" == Michael Seiferle <michael.seiferle@uni-konstanz.de> writes:
MS> Hi,
MS> if Tagsoup [1] is present in the classpath (it comes with our Zip
MS> packages e.g.), BaseX will allow (the "poor, nasty and brutish" [1])
MS> HTML input.
Well all I know is that http://docs.basex.org/wiki/Parsers should mention what to do to read HTML, and on my machine there is

$ apt-cache search tagsoup-java
libtagsoup-java - SAX-compliant parser for real-life HTML
libtagsoup-java-doc - API Documentation for TagSoup

Mainly it is tags like <img ...> without the closing /> that throw BaseX off track.
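
To illustrate the <img> problem, here is a minimal sketch in Python. It does not use BaseX or TagSoup; the standard-library XML parser merely stands in for any strict XML parser, and is_well_formed is a helper name made up for this example. It shows that an HTML-style <img> tag is rejected while the self-closed form is accepted:

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup.

    Hypothetical helper for illustration only; not part of BaseX.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML-style void tag: <img> is never closed, so </p> arrives too early.
html_style = '<p><img src="logo.png"></p>'
# XML-style: the tag closes itself with />.
xml_style = '<p><img src="logo.png"/></p>'

print(is_well_formed(html_style))
print(is_well_formed(xml_style))
```

A lenient parser such as TagSoup is meant to repair the first form before it reaches the XML layer.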