Dear team,
While running the query below on the collection, I am getting a tail
recursion issue.
db:exists("3860_Master_Voice_JULY-21_Unbilled","SSBS_9038343_01131130904_02JUL21.xml")
I am attaching a query plan and database info.
Regards,
Chandrasekhar
I’ve worked out how to add a Xerces grammar cache to the XML parser. For current code from GitHub, I did this in SAXParser:
public void parse() throws IOException {
  final InputSource is = inputSource();
  final SAXSource saxs = new SAXSource(is);
  XMLReader reader = null;
  try {
    reader = saxs.getXMLReader();
    if(reader == null) {
      reader = XmlParser.reader(options.get(MainOptions.DTD), options.get(MainOptions.XINCLUDE));
    }
    final EntityResolver er = Resolver.entities(options);
    if(er != null) reader.setEntityResolver(er);

    saxh = new SAXHandler(builder, options.get(MainOptions.STRIPWS),
      options.get(MainOptions.STRIPNS));
    reader.setDTDHandler(saxh);
    reader.setContentHandler(saxh);
    reader.setProperty("http://xml.org/sax/properties/lexical-handler", saxh);
    reader.setErrorHandler(saxh);

    if(options.get(MainOptions.DTD)) {
      // FEATURE_GRAMMAR_POOL is the Xerces grammar-pool property URI,
      // i.e. "http://apache.org/xml/properties/internal/grammar-pool"
      final XMLGrammarPool pool = getGrammarPool();
      try {
        reader.setProperty(FEATURE_GRAMMAR_POOL, pool);
      } catch(final NoClassDefFoundError | SAXNotRecognizedException | SAXNotSupportedException e) {
        // ignore: the grammar cache is an optional optimization
      }
    }
    reader.parse(is);
    ...
Where getGrammarPool() is simply:
private static XMLGrammarPool getGrammarPool() {
  XMLGrammarPool pool = grammarPool;
  if(pool == null) {
    pool = new XMLGrammarPoolImpl();
    grammarPool = pool;  // lazily initialize the shared static pool
  }
  return pool;
}
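If the pool is shared across concurrently running builders, a thread-safe version of this lazy initialization could look like the following sketch (the GRAMMAR_POOL field name is my own, suggested by the commented-out .set(pool)):

// requires: import java.util.concurrent.atomic.AtomicReference;
private static final AtomicReference<XMLGrammarPool> GRAMMAR_POOL = new AtomicReference<>();

private static XMLGrammarPool getGrammarPool() {
  // publish the pool exactly once; compareAndSet keeps whichever
  // instance won the race, so all threads share the same cache
  GRAMMAR_POOL.compareAndSet(null, new XMLGrammarPoolImpl());
  return GRAMMAR_POOL.get();
}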
When I use the grammar cache to parse DITA docs (setting parse DTDs to true and specifying an XML catalog from DITA Open Toolkit) I see the expected speedup.
For example, on a small set of about 2400 maps and topics, no-DTD parsing takes 7 seconds, grammar-cache parsing takes 20 seconds, and no-grammar-cache DTD parsing takes about 2.5 minutes. So roughly an 8x improvement (which is what I’ve measured using the same grammar cache with Saxon, for example).
However, using the grammar cache causes some kind of extreme memory leak, and I have no idea what is causing it.
Without the grammar cache, parsing these topics requires only a few hundred megabytes of memory, with or without DTDs, but with the grammar cache, memory usage starts at about 1GB and grows from there. Parsing my full set of 40K maps and topics, memory grows by about 1GB every 30 seconds until it eventually exceeds even the 14GB I allocated in my last test. The initial 1GB could be explained by the cache itself, which holds the parsed grammars.
Using the debugger, I can see that the grammar cache itself is static once it’s populated with grammars (for my set it ends up loading 10 parsed grammars), so the grammar cache itself doesn’t seem to be the problem.
I’m trying to use VisualVM to profile the memory but this is not something I have done before and I’m not sure what classes I should be focusing on.
So my questions:
1. Any idea why the simple addition of the grammar cache would cause this kind of memory leak?
2. Any guidance on what classes I should focus on to find the culprit?
My reason for using the grammar cache is to have all the default attributes populated in the database without requiring two hours to load my content (which is what I’ve measured in the past for DTD-aware parsing of my 40K content set).
Another solution, specific to DITA, would be to use a custom SAX parser that injects the default attributes based on static configuration (for a given set of DITA grammars we know what the defaults will be for every element type and can easily generate the configuration a SAX parser would use). But the current code doesn't seem to provide an easy way to swap in a custom SAX handler, and I'm not really in a position to add that level of sophistication to the code.
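For what that alternative could look like, here is a rough sketch of a SAX filter that injects statically configured default attributes; all of the names are hypothetical, and the defaults table would be generated offline from the DITA grammars:

import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Injects pre-computed default attributes without reading the DTD. */
public class DefaultAttributeFilter extends XMLFilterImpl {
  /** Element name -> (attribute name -> default value), generated offline. */
  private final Map<String, Map<String, String>> defaults;

  public DefaultAttributeFilter(final Map<String, Map<String, String>> defaults) {
    this.defaults = defaults;
  }

  @Override
  public void startElement(final String uri, final String localName, final String qName,
      final Attributes atts) throws SAXException {
    final Map<String, String> defs = defaults.get(qName);
    if(defs == null) {
      super.startElement(uri, localName, qName, atts);
      return;
    }
    final AttributesImpl merged = new AttributesImpl(atts);
    for(final Map.Entry<String, String> def : defs.entrySet()) {
      // only add the default if the attribute is not already present
      if(atts.getIndex(def.getKey()) < 0) {
        merged.addAttribute("", def.getKey(), def.getKey(), "CDATA", def.getValue());
      }
    }
    super.startElement(uri, localName, qName, merged);
  }
}

The filter would wrap the plain, non-validating XMLReader via setParent(reader), so parsing stays DTD-free while the default attributes still end up in the database.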
Modulo this memory issue, the grammar cache is a nice, simple solution to the DTD parsing requirement, and it generalizes to any content set that uses a consistent set of DTDs or XSDs across the documents to be parsed.
Thanks,
Eliot
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com
Greetings!
I have the following in the prolog of an XQuery:
declare namespace output =
'http://www.w3.org/2010/xslt-xquery-serialization';
declare option output:method 'basex';
declare option output:basex 'indent=yes, newline=\n';
I'm following the CSV example at https://docs.basex.org/wiki/Serialization,
reasoning that the same form should work for the basex method.
Sadly, BaseX 10.4 on Debian reports:
[XQST0109] Unknown option 'basex
Is it because 'basex' isn't an option in the
'http://www.w3.org/2010/xslt-xquery-serialization' namespace?
OK, so I changed the namespace to http://tm.durusau.net and it runs!
Sadly, it does not apply the requested formatting.
Some guidance on serialization please?
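For reference, a prolog that uses only the standard serialization parameters (a minimal sketch, assuming indented XML output is the goal rather than the BaseX-specific method) would be:

declare namespace output = 'http://www.w3.org/2010/xslt-xquery-serialization';
declare option output:method 'xml';
declare option output:indent 'yes';

<test>{ (1 to 3) ! <item n="{ . }"/> }</test>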
Thanks!
Patrick
--
Patrick Durusau
patrick(a)durusau.net
Technical Advisory Board, OASIS (TAB)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau
Dear all,
my scenario is a RESTXQ endpoint that:
- downloads resources and stores them in a temporary directory,
- does this with fork-join in order to reduce latency,
- compresses them into a zip archive and returns the archive data.
I've noticed that the archive often arrives empty. After some investigation
I've found that query [1] is not predictable: it is often optimized to
"count(0)".
With [2] I can manage to produce results from time to time, but not
consistently.
[3] seems to be the safest solution.
The behavior is the same with 9.x and 10.
Since I don't feel very comfortable with this, can someone tell me
whether I'm doing it wrong, whether there is a reliable solution, or
whether I should abandon fork-join altogether?
Thanks a lot.
Regards,
Marco.
[1]
let $ops := (
  for $i in (1 to 5)
  let $url := "http://www.google.com"
  return function() {
    file:write(file:create-temp-file("ttt", string($i)), fetch:content-type($url))
  }
)
let $download := xquery:fork-join($ops)
return count($ops)
[2]
let $ops := (
  for $i in (1 to 5)
  let $url := "http://www.google.com"
  return function() {
    (file:write(file:create-temp-file("ttt", string($i)), fetch:content-type($url)), 1)
  }
)
let $download := xquery:fork-join($ops)
return count($ops)
[3]
let $ops := xquery:fork-join(
  for $i in (1 to 5)
  let $url := "http://www.google.com"
  return function() {
    (1,
     file:write(file:create-temp-file("ttt", string($i)), fetch:content-type($url)))
  }
)
return count($ops)
Dear all,
when using the BaseX XQuery server (9.7.3), we see that logs such as [1]
are produced for every query execution.
Since we have a scenario that involves polling, and thus a lot of query
executions, the log files grow to hundreds of MBs during a day.
This makes them unmanageable and even causes the DBA UI to crash with OOM
exceptions when opening the logs page.
Is there a way to disable this logging (without affecting other logs
such as HTTP)?
Is it really necessary to log server-side query executions at this
granularity by default? Maybe it could be made optional?
Thanks for any support.
Regards,
Marco.
[1]
12:22:27.849 10.0.4.15:50424 admin OK 0.02 CLOSE[0]
12:22:27.848 10.0.4.15:50434 admin OK 1.40 FULL[0]
12:22:27.848 10.0.4.15:50424 admin OK 0.83 FULL[0]
12:22:27.847 10.0.4.15:50420 admin OK 0.05 CLOSE[0]
12:22:27.847 10.0.4.15:50412 admin OK 0.04 CLOSE[0]
12:22:27.847 10.0.4.15:50434 admin OK 0.05 BIND[0]
db=infrastructures as xs:string
12:22:27.846 10.0.4.15:50410 admin OK 0.22 CLOSE[0]
12:22:27.846 10.0.4.15:50434 admin OK 0.06 BIND[0]
id=ontheroad-lxd as xs:string
12:22:27.846 10.0.4.15:50424 admin OK 0.05 BIND[0]
db=infrastructures as xs:string
12:22:27.846 10.0.4.15:50420 admin OK 0.62 FULL[0]
12:22:27.846 10.0.4.15:50412 admin OK 0.90 FULL[0]
12:22:27.846 10.0.4.15:50424 admin OK 0.06 BIND[0]
id=ontheroad-lxd as xs:string
12:22:27.845 10.0.4.15:50380 admin OK 0.02 CLOSE[0]
12:22:27.845 10.0.4.15:50366 admin OK 0.01 CLOSE[0]
12:22:27.845 10.0.4.15:50402 admin OK 0.02 CLOSE[0]
12:22:27.845 10.0.4.15:50420 admin OK 0.04 BIND[0]
db=infrastructures as xs:string
12:22:27.845 10.0.4.15:50386 admin OK 0.02 CLOSE[0]
12:22:27.845 10.0.4.15:50410 admin OK 0.88 FULL[0]
12:22:27.845 10.0.4.15:50420 admin OK 0.06 BIND[0]
id=ontheroad-lxd as xs:string
12:22:27.845 10.0.4.15:50412 admin OK 0.04 BIND[0]
db=infrastructures as xs:string
12:22:27.845 10.0.4.15:50434 admin OK 0.04 QUERY[0] declare
variable $id external; declare variable $db external;
db:open($db)[json/id = $id]
Hello all,
I am trying to store HTML documents in BaseX. I set up a local instance of
BaseX on my computer using Docker, and I imported this file into it:
https://pastebin.com/HJdJgLv9
On my local BaseX instance, the document is imported and
"/html/body/article" does return the <article> node as expected.
On my remote/production BaseX instance (using the same Dockerfile and
image), the document is imported but the <article> tag is "stripped" (even
though its contents / child nodes remain in the imported document).
"/html/body/article" is empty.
If I copy over the .basex files from my local database to my remote
database, then the documents are complete like on my local instance. I also
tried to import the documents again on my local instance, and the <article>
tag gets stripped too (and the child nodes remain).
What am I doing wrong when importing my documents? What did I do to import
them properly in my current local instance? I tried a lot of options but I
just can't figure out why this happens (I fiddled a lot with it).
I used the following options when importing my documents, as per the
documentation:
SET PARSER html
SET HTMLPARSER
method=xml,nons=true,nocdata=true,nodefaults=true,nobogons=true,nocolons=true,ignorable=true
SET CREATEFILTER *.html
I also use SET FTINDEX true, but I don't think it has an impact here
anyway.
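One way to narrow this down might be to run the same options through the HTML Module interactively and see which of them removes <article>; toggling nobogons in particular may be worth a try, since TagSoup's built-in schema predates HTML5 and may treat <article> as an unknown element. A sketch, assuming the html:parse option names mirror the HTMLPARSER string above and using a hypothetical file path:

let $input := file:read-text('/path/to/page.html')
return html:parse($input, map {
  'nons': true(), 'nocdata': true(), 'nodefaults': true(),
  'nobogons': true(), 'nocolons': true(), 'ignorable': true()
})//*:article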
Thank you very much!
- Tim
Hi Christian,
See attachment.
When running, function ‘local:test1a’ triggers an error: ‘[XPTY0004] 2 arguments supplied, 3 expected: $f’.
It looks like the arity of a function is incorrectly determined when called with a partial parameter set.
Function ‘local:test1b’ is a workaround.
Can you please have a look at this?
Thanx in advance,
Rob Stapper
Hello,
We got an error from BaseX. We connect to the BaseX Java server from our
application, which is developed in C#.
Thank you for your support.
Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk(a)mailman.uni-konstanz.de
Version: BaseX 10.3
Java: Oracle Corporation, 19.0.1
OS: Windows Server 2019, amd64
Stack Trace:
java.lang.ArrayIndexOutOfBoundsException: Index 4304 out of bounds for
length 4096
at org.basex.io.random.TableDiskAccess.read1(TableDiskAccess.java:160)
at org.basex.data.Data.kind(Data.java:312)
at org.basex.query.up.atomic.AtomicUpdateCache.adjustDistances(AtomicUpdateCache.java:320)
at org.basex.query.up.atomic.AtomicUpdateCache.execute(AtomicUpdateCache.java:276)
at org.basex.query.up.DataUpdates.apply(DataUpdates.java:167)
at org.basex.query.up.ContextModifier.apply(ContextModifier.java:120)
at org.basex.query.up.Updates.apply(Updates.java:179)
at org.basex.query.QueryContext.update(QueryContext.java:660)
at org.basex.query.QueryContext.lambda$4(QueryContext.java:354)
at org.basex.query.QueryContext.run(QueryContext.java:763)
at org.basex.query.QueryContext.iter(QueryContext.java:354)
at org.basex.query.QueryProcessor.iter(QueryProcessor.java:95)
at org.basex.server.ServerQuery.execute(ServerQuery.java:125)
at org.basex.server.ClientListener.query(ClientListener.java:397)
at org.basex.server.ClientListener.run(ClientListener.java:104)
--
Veysel Karslı
Managing Partner & Co-founder
M +90 543 238 8328
E veysel.karsli(a)codease.io
CodEase Teknoloji
Barbaros mh. Begonya sk. No:1/2
Ataşehir / İstanbul
www.codease.io