Dear team,
While running the query below on the collection, I am getting a tail
recursion issue.
db:exists("3860_Master_Voice_JULY-21_Unbilled","SSBS_9038343_01131130904_02JUL21.xml")
I am attaching a query plan and database info.
Regards,
Chandrasekhar
I’ve worked out how to add a Xerces grammar cache to the XML parser. For current code from GitHub, I did this in SAXParser:
public void parse() throws IOException {
  final InputSource is = inputSource();
  final SAXSource saxs = new SAXSource(is);
  XMLReader reader = null;
  try {
    reader = saxs.getXMLReader();
    if(reader == null) {
      reader = XmlParser.reader(options.get(MainOptions.DTD), options.get(MainOptions.XINCLUDE));
    }
    final EntityResolver er = Resolver.entities(options);
    if(er != null) reader.setEntityResolver(er);

    saxh = new SAXHandler(builder, options.get(MainOptions.STRIPWS),
      options.get(MainOptions.STRIPNS));
    reader.setDTDHandler(saxh);
    reader.setContentHandler(saxh);
    reader.setProperty("http://xml.org/sax/properties/lexical-handler", saxh);
    reader.setErrorHandler(saxh);

    if(options.get(MainOptions.DTD)) {
      // FEATURE_GRAMMAR_POOL is the Xerces grammar-pool property URI,
      // i.e. "http://apache.org/xml/properties/internal/grammar-pool"
      final XMLGrammarPool pool = getGrammarPool();
      try {
        reader.setProperty(FEATURE_GRAMMAR_POOL, pool);
      } catch(final NoClassDefFoundError | SAXNotRecognizedException | SAXNotSupportedException e) {
        // ignore: the grammar cache is an optional optimization
      }
    }
    reader.parse(is);
    ...
Where getGrammarPool() is simply:
private static XMLGrammarPool getGrammarPool() {
  XMLGrammarPool pool = grammarPool;
  if(pool == null) {
    pool = new XMLGrammarPoolImpl();
    grammarPool = pool;  // lazily initialize the shared static pool
  }
  return pool;
}
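If the pool is shared across concurrently running builders, a thread-safe version of this lazy initialization could look like the following sketch (the GRAMMAR_POOL field name is my own, suggested by the commented-out .set(pool)):

// requires: import java.util.concurrent.atomic.AtomicReference;
private static final AtomicReference<XMLGrammarPool> GRAMMAR_POOL = new AtomicReference<>();

private static XMLGrammarPool getGrammarPool() {
  // publish the pool exactly once; compareAndSet keeps whichever
  // instance won the race, so all threads share the same cache
  GRAMMAR_POOL.compareAndSet(null, new XMLGrammarPoolImpl());
  return GRAMMAR_POOL.get();
}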
When I use the grammar cache to parse DITA docs (setting parse DTDs to true and specifying an XML catalog from DITA Open Toolkit) I see the expected speedup.
For example, on a small set of about 2400 maps and topics, no-DTD parsing takes 7 seconds, grammar-cache parsing takes 20 seconds, and no-grammar-cache DTD parsing takes about 2.5 minutes. So roughly an 8x improvement (which is what I’ve measured using the same grammar cache with Saxon, for example).
However, using the grammar cache causes some kind of extreme memory leak, and I have no idea what is causing it.
Without the grammar cache, parsing these topics requires only a few hundred megabytes of memory, with or without DTDs, but with the grammar cache, memory usage starts at about 1GB and grows from there. Parsing my full set of 40K maps and topics, memory grows by about 1GB every 30 seconds until it eventually exceeds even the 14GB I allocated in my last test. The initial 1GB could be explained by the cache itself, which holds the parsed grammars.
Using the debugger, I can see that the grammar cache itself is static once it’s populated with grammars (for my set it ends up loading 10 parsed grammars), so the grammar cache itself doesn’t seem to be the problem.
I’m trying to use VisualVM to profile the memory but this is not something I have done before and I’m not sure what classes I should be focusing on.
So my questions:
1. Any idea why the simple addition of the grammar cache would cause this kind of memory leak?
2. Any guidance on what classes I should focus on to find the culprit?
My reason for using the grammar cache is to have all the default attributes populated in the database without requiring two hours to load my content (which is what I’ve measured in the past for DTD-aware parsing of my 40K content set).
Another solution, specific to DITA, would be to use a custom SAX parser that injects the default attributes based on static configuration (for a given set of DITA grammars we know what the defaults will be for every element type and can easily generate the configuration a SAX parser would use). But the current code doesn't seem to provide an easy way to swap in a custom SAX handler, and I'm not really in a position to add that level of sophistication to the code.
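For what that alternative could look like, here is a rough sketch of a SAX filter that injects statically configured default attributes; all of the names are hypothetical, and the defaults table would be generated offline from the DITA grammars:

import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Injects pre-computed default attributes without reading the DTD. */
public class DefaultAttributeFilter extends XMLFilterImpl {
  /** Element name -> (attribute name -> default value), generated offline. */
  private final Map<String, Map<String, String>> defaults;

  public DefaultAttributeFilter(final Map<String, Map<String, String>> defaults) {
    this.defaults = defaults;
  }

  @Override
  public void startElement(final String uri, final String localName, final String qName,
      final Attributes atts) throws SAXException {
    final Map<String, String> defs = defaults.get(qName);
    if(defs == null) {
      super.startElement(uri, localName, qName, atts);
      return;
    }
    final AttributesImpl merged = new AttributesImpl(atts);
    for(final Map.Entry<String, String> def : defs.entrySet()) {
      // only add the default if the attribute is not already present
      if(atts.getIndex(def.getKey()) < 0) {
        merged.addAttribute("", def.getKey(), def.getKey(), "CDATA", def.getValue());
      }
    }
    super.startElement(uri, localName, qName, merged);
  }
}

The filter would wrap the plain, non-validating XMLReader via setParent(reader), so parsing stays DTD-free while the default attributes still end up in the database.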
Modulo this memory issue, the grammar cache is a nice, simple solution to the DTD parsing requirement, and it generalizes to any content set that uses a consistent set of DTDs or XSDs across the documents to be parsed.
Thanks,
Eliot
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com
Greetings!
I have the following in the prolog of an XQuery:
declare namespace output =
'http://www.w3.org/2010/xslt-xquery-serialization';
declare option output:method 'basex';
declare option output:basex 'indent=yes, newline=\n';
I'm following the CSV example at https://docs.basex.org/wiki/Serialization,
reasoning that the same form should work for the basex method.
Sadly, BaseX 10.4 on Debian reports:
[XQST0109] Unknown option 'basex
Is it because 'basex' isn't an option in the
'http://www.w3.org/2010/xslt-xquery-serialization' namespace?
OK, so I changed the namespace to http://tm.durusau.net and it runs!
Sadly, it does not apply the requested formatting.
Some guidance on serialization please?
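For reference, a prolog that uses only the standard serialization parameters (a minimal sketch, assuming indented XML output is the goal rather than the BaseX-specific method) would be:

declare namespace output = 'http://www.w3.org/2010/xslt-xquery-serialization';
declare option output:method 'xml';
declare option output:indent 'yes';

<test>{ (1 to 3) ! <item n="{ . }"/> }</test>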
Thanks!
Patrick
--
Patrick Durusau
patrick(a)durusau.net
Technical Advisory Board, OASIS (TAB)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau
Dear all,
my scenario is a RESTXQ endpoint that:
- downloads resources and stores them in a temporary directory,
- does this with fork-join in order to reduce latency,
- compresses them into a zip archive and returns the archive data.
I've noticed that the archive often arrives empty. After some investigation
I've found that query [1] is not predictable: it is often optimized to
"count(0)".
With [2] I can manage to produce results from time to time, but not
consistently.
[3] seems to be the safest solution.
The behavior is the same with 9.x and 10.
Since I don't feel very comfortable with this, can someone tell me
whether I'm doing it wrong, whether there is a reliable solution, or
whether I should abandon fork-join altogether?
Thanks a lot.
Regards,
Marco.
[1]
let $ops := (
  for $i in (1 to 5)
  let $url := "http://www.google.com"
  return function() {
    file:write(file:create-temp-file("ttt", string($i)), fetch:content-type($url))
  }
)
let $download := xquery:fork-join($ops)
return count($ops)
[2]
let $ops := (
  for $i in (1 to 5)
  let $url := "http://www.google.com"
  return function() {
    (file:write(file:create-temp-file("ttt", string($i)), fetch:content-type($url)), 1)
  }
)
let $download := xquery:fork-join($ops)
return count($ops)
[3]
let $ops := xquery:fork-join(
  for $i in (1 to 5)
  let $url := "http://www.google.com"
  return function() {
    (1,
     file:write(file:create-temp-file("ttt", string($i)), fetch:content-type($url)))
  }
)
return count($ops)
Dear all,
when using the BaseX XQuery server (9.7.3), we see that logs such as [1]
are produced for every query execution.
Since we have a scenario that involves polling, and thus a lot of query
executions, the log files grow to hundreds of MBs during a day.
This makes them unmanageable and even causes the DBA UI to crash with OOM
exceptions when opening the logs page.
Is there a way to disable this logging (without affecting other logs
such as HTTP)?
Is it really necessary to log server-side query executions at this
granularity by default? Maybe it could be made optional?
Thanks for any support.
Regards,
Marco.
[1]
12:22:27.849 10.0.4.15:50424 admin OK 0.02 CLOSE[0]
12:22:27.848 10.0.4.15:50434 admin OK 1.40 FULL[0]
12:22:27.848 10.0.4.15:50424 admin OK 0.83 FULL[0]
12:22:27.847 10.0.4.15:50420 admin OK 0.05 CLOSE[0]
12:22:27.847 10.0.4.15:50412 admin OK 0.04 CLOSE[0]
12:22:27.847 10.0.4.15:50434 admin OK 0.05 BIND[0]
db=infrastructures as xs:string
12:22:27.846 10.0.4.15:50410 admin OK 0.22 CLOSE[0]
12:22:27.846 10.0.4.15:50434 admin OK 0.06 BIND[0]
id=ontheroad-lxd as xs:string
12:22:27.846 10.0.4.15:50424 admin OK 0.05 BIND[0]
db=infrastructures as xs:string
12:22:27.846 10.0.4.15:50420 admin OK 0.62 FULL[0]
12:22:27.846 10.0.4.15:50412 admin OK 0.90 FULL[0]
12:22:27.846 10.0.4.15:50424 admin OK 0.06 BIND[0]
id=ontheroad-lxd as xs:string
12:22:27.845 10.0.4.15:50380 admin OK 0.02 CLOSE[0]
12:22:27.845 10.0.4.15:50366 admin OK 0.01 CLOSE[0]
12:22:27.845 10.0.4.15:50402 admin OK 0.02 CLOSE[0]
12:22:27.845 10.0.4.15:50420 admin OK 0.04 BIND[0]
db=infrastructures as xs:string
12:22:27.845 10.0.4.15:50386 admin OK 0.02 CLOSE[0]
12:22:27.845 10.0.4.15:50410 admin OK 0.88 FULL[0]
12:22:27.845 10.0.4.15:50420 admin OK 0.06 BIND[0]
id=ontheroad-lxd as xs:string
12:22:27.845 10.0.4.15:50412 admin OK 0.04 BIND[0]
db=infrastructures as xs:string
12:22:27.845 10.0.4.15:50434 admin OK 0.04 QUERY[0] declare
variable $id external; declare variable $db external;
db:open($db)[json/id = $id]
Hello all,
I am trying to store HTML documents in BaseX. I set up a local instance of
BaseX on my computer using Docker, and I imported this file into it:
https://pastebin.com/HJdJgLv9
On my local BaseX instance, the document is imported and
"/html/body/article" does return the <article> node as expected.
On my remote/production BaseX instance (using the same Dockerfile and
image), the document is imported but the <article> tag is "stripped" (even
though its contents / child nodes remain in the imported document).
"/html/body/article" is empty.
If I copy over the .basex files from my local database to my remote
database, then the documents are complete like on my local instance. I also
tried to import the documents again on my local instance, and the <article>
tag gets stripped too (and the child nodes remain).
What am I doing wrong when importing my documents? What did I do to import
them properly in my current local instance? I tried a lot of options but I
just can't figure out why this happens (I fiddled a lot with it).
I used the following options when importing my documents, as per the
documentation:
SET PARSER html
SET HTMLPARSER
method=xml,nons=true,nocdata=true,nodefaults=true,nobogons=true,nocolons=true,ignorable=true
SET CREATEFILTER *.html
I also use SET FTINDEX true, but I don't think it has an impact here
anyway.
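One way to narrow this down might be to run the same options through the HTML Module interactively and see which of them removes <article>; toggling nobogons in particular may be worth a try, since TagSoup's built-in schema predates HTML5 and may treat <article> as an unknown element. A sketch, assuming the html:parse option names mirror the HTMLPARSER string above and using a hypothetical file path:

let $input := file:read-text('/path/to/page.html')
return html:parse($input, map {
  'nons': true(), 'nocdata': true(), 'nodefaults': true(),
  'nobogons': true(), 'nocolons': true(), 'ignorable': true()
})//*:article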
Thank you very much!
- Tim
Hi Christian,
See attachment.
When running, function ‘local:test1a’ triggers an error: ‘[XPTY0004] 2 arguments supplied, 3 expected: $f’.
It looks like the arity of a function is incorrectly determined when called with a partial parameter set.
Function ‘local:test1b’ is a workaround.
Can you please have a look at this?
Thanx in advance,
Rob Stapper
Hello,
We got an error from BaseX. We connect to the BaseX Java server from our
application, which is developed in C#.
Thank you for your support.
Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk(a)mailman.uni-konstanz.de
Version: BaseX 10.3
Java: Oracle Corporation, 19.0.1
OS: Windows Server 2019, amd64
Stack Trace:
java.lang.ArrayIndexOutOfBoundsException: Index 4304 out of bounds for
length 4096
at org.basex.io.random.TableDiskAccess.read1(TableDiskAccess.java:160)
at org.basex.data.Data.kind(Data.java:312)
at org.basex.query.up.atomic.AtomicUpdateCache.adjustDistances(AtomicUpdateCache.java:320)
at org.basex.query.up.atomic.AtomicUpdateCache.execute(AtomicUpdateCache.java:276)
at org.basex.query.up.DataUpdates.apply(DataUpdates.java:167)
at org.basex.query.up.ContextModifier.apply(ContextModifier.java:120)
at org.basex.query.up.Updates.apply(Updates.java:179)
at org.basex.query.QueryContext.update(QueryContext.java:660)
at org.basex.query.QueryContext.lambda$4(QueryContext.java:354)
at org.basex.query.QueryContext.run(QueryContext.java:763)
at org.basex.query.QueryContext.iter(QueryContext.java:354)
at org.basex.query.QueryProcessor.iter(QueryProcessor.java:95)
at org.basex.server.ServerQuery.execute(ServerQuery.java:125)
at org.basex.server.ClientListener.query(ClientListener.java:397)
at org.basex.server.ClientListener.run(ClientListener.java:104)
--
Veysel Karslı
Managing Partner & Co-founder
M +90 543 238 8328
E veysel.karsli(a)codease.io
CodEase Teknoloji
Barbaros mh. Begonya sk. No:1/2
Ataşehir / İstanbul
www.codease.io