Hi all,
The following query works:
XQUERY //b4a[shipNameNorm/text() contains text 'mount.+' using wildcards]/shipNameNorm
<shipNameNorm>mountstewartelphinstone</shipNameNorm>
<shipNameNorm>mountstewartelphinstone</shipNameNorm>
<shipNameNorm>mountstewartelphinstone</shipNameNorm>
Query executed in 22.31 ms.
But with a space in the same query, the server hangs with Java using all the CPU, and after a long time I get an out-of-memory error:
21:24:01.510 [127.0.0.1:35052] XQUERY //b4a[shipNameNorm/text() contains text 'mount .+' using wildcards]/shipNameNorm
Error: Out of Main Memory.
The following hints might help you:
- increase Java's heap size with the flag -Xmx<size>
- choose the internal XML parser in the GUI or via 'set intparse on'
- deactivate the text and attribute indexes
18607.03 ms
To double check, I have recreated the db and reindexed (as it had been created under an earlier version). This time I don't get an out of memory error, but this:
XQUERY //b4a[shipNameNorm/text() contains text 'mount .+' using wildcards]/shipNameNorm
[XQST0054] Circular variable definition?
So this still appears to me to be a bug. The db is about 1.3 GB, with 26 documents and 48959283 nodes. Indexes:
  Path Summary: ON
  Text Index: ON
  Attribute Index: ON
  Full-Text Index: ON (wildcards)
Many thanks for any guidance.
Cheers, Sandra
Hi Sandra,
On 25.08.2010 14:27, Sandra Maria Silcot wrote:
21:24:01.510 [127.0.0.1:35052] XQUERY //b4a[shipNameNorm/text() contains text 'mount .+' using wildcards]/shipNameNorm
Error: Out of Main Memory.
The following hints might help you:
- increase Java's heap size with the flag -Xmx<size>
- choose the internal XML parser in the GUI or via 'set intparse on'
- deactivate the text and attribute indexes
18607.03 ms
This sounds like a bug in the trie-based index that supports wildcards...
To double check, I have recreated the db and reindexed (as it had been created under an earlier version). This time I don't get an out of memory error, but this:
XQUERY //b4a[shipNameNorm/text() contains text 'mount .+' using wildcards]/shipNameNorm
[XQST0054] Circular variable definition?
This only means that the stack is now full rather than the heap. Can you reproduce the error? If so, could you provide us with the stack trace? As you seem to be using the client/server architecture, you will probably have to use the local API to get it:
import org.basex.core.Context;
import org.basex.core.cmd.Close;
import org.basex.core.cmd.Open;
import org.basex.core.cmd.XQuery;

public class WildcardsBug {
  static final String DB_NAME = ...;

  public static void main(final String[] args) throws Exception {
    final Context ctx = new Context();
    new Open(DB_NAME).execute(ctx);
    System.out.println(new XQuery("//b4a[shipNameNorm/text() "
        + "contains text 'mount .+' using wildcards]/shipNameNorm").execute(ctx));
    new Close().execute(ctx);
  }
}
A small, executable example would be even better, but as the DB size suggests, it's probably not trivial to create one.
Thank you for reporting this bug and in advance for helping us fix it, Cheers Leo
Hi again,
On 25.08.2010 16:04, Leonard Wörteler wrote:
As you seem to use the client/server architecture you probably have to use the local API to get it:
import org.basex.core.Context;
import org.basex.core.cmd.Close;
import org.basex.core.cmd.Open;
import org.basex.core.cmd.XQuery;

public class WildcardsBug {
  static final String DB_NAME = ...;

  public static void main(final String[] args) throws Exception {
    final Context ctx = new Context();
    new Open(DB_NAME).execute(ctx);
    System.out.println(new XQuery("//b4a[shipNameNorm/text() "
        + "contains text 'mount .+' using wildcards]/shipNameNorm").execute(ctx));
    new Close().execute(ctx);
  }
}
...as my code could also swallow the stack trace, please try this instead:
import org.basex.core.Context;
import org.basex.core.cmd.Close;
import org.basex.core.cmd.Open;
import org.basex.query.QueryProcessor;

public class WildcardsBug {
  static final String DB_NAME = ...;

  public static void main(final String[] args) throws Exception {
    final Context ctx = new Context();
    new Open(DB_NAME).execute(ctx);
    new QueryProcessor("//b4a[shipNameNorm/text() contains text "
        + "'mount .+' using wildcards]/shipNameNorm", ctx).execute();
    new Close().execute(ctx);
  }
}
Sorry for the inconvenience...
"All problems in computer science can be solved by another level of indirection" -- David Wheeler
"...except for the problem of too many layers of indirection." -- Kevlin Henney
Leo
Hello,
I'm just starting to play with the full-text support and am considering replacing some legacy text searching capabilities with queries. To do so, however, I need a few features that I haven't been able to find in the specs. Is it possible to return the absolute text offset in characters (either from the start of the document or the start of the result node) for each match? Along with that, is it possible to return every match, even if that means returning the same node more than once (for example, if it has more than one occurrence)?
Even if XQuery Full Text doesn't work for this particular need, it's still very cool and I really like the implementation and look forward to finding other uses.
Thanks,
Dave
Hi Dave,
On 25.08.2010 17:43, Dave Glick wrote:
Is it possible to return the absolute text offset in characters (either from the start of the document or the start of the result node) for each match? Along with that is it possible to return every match, even if it results in returning the same node more than once (if it has more than one occurrence for example)?
Well, I don't think there's an *official* way to get the full-text positions out of BaseX yet. They are only used during query processing and in the GUI, for highlighting the matches.
But if I got your previous mails right, you know your way around the BaseX codebase pretty well, so here's the Cheater's Guide:
The GUI gets the positions by setting the hidden static property org.basex.core.Prop.gui to true. That lets the FTPosData object be propagated to the resulting Nodes. It's accessible via Nodes.ftpos. The interface isn't as nice as it could be, but you get everything you asked for.
Please note that serializing the result will yield control characters used for highlighting in the GUI. This can be avoided by discarding the full-text positions, as in

final Nodes res = qp.queryNodes();
final Nodes copy = new Nodes(res.nodes, res.data);

and serializing the copy after that.
As these are internals of BaseX, the solution described above may stop working at any time in the future. When we find the time we will implement a simpler interface for this, but we're not really short of ToDos...
I hope this helps in some way...
Even if XQuery Full Text doesn't work for this particular need, it's still very cool and I really like the implementation and look forward to finding other uses.
That's nice to hear!
Cheers Leo
An addition to Leo's answer (thanks anyway):
The search string 'mount .+' will trigger two keyword searches for "mount" and ".+". As the second search is very expensive (it will return the complete index), it's most likely the reason for the exception.
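To illustrate why the second keyword is so expensive, here is a minimal, self-contained sketch. It is not BaseX code: the class WildcardTokenSketch, the tiny INDEX term list and the lookup method are all made up for illustration, and treating each wildcard token as a Java regular expression is only an approximation of the full-text wildcard syntax. It splits the search string at whitespace and counts how many index terms each token matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Toy illustration only; this is NOT how BaseX implements its index.
class WildcardTokenSketch {
  // Hypothetical index: the distinct terms of a very small full-text index.
  static final List<String> INDEX = Arrays.asList(
      "mount", "mountstewartelphinstone", "stewart", "elphinstone");

  // Returns all index terms that are fully matched by one wildcard token.
  // Assumption: the token is interpreted as a Java regex, which covers
  // simple wildcards such as ".", ".*" and ".+".
  static List<String> lookup(final String token) {
    final Pattern pattern = Pattern.compile(token);
    final List<String> hits = new ArrayList<>();
    for (final String term : INDEX) {
      if (pattern.matcher(term).matches()) hits.add(term);
    }
    return hits;
  }

  public static void main(final String[] args) {
    // The space splits the search string into two independent keywords.
    for (final String token : "mount .+".split("\\s+")) {
      System.out.println("'" + token + "' matches " + lookup(token).size()
          + " of " + INDEX.size() + " index terms");
    }
  }
}
```

In this toy index, "mount" matches a single term, while ".+" matches every term, which mirrors the "complete index" result described above.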
To find all texts that have "mount" as a word, followed by another word, the query could be rewritten as follows (the example uses 'exercise' as the keyword; substitute 'mount'):
...[text() contains text 'exercise'][text() contains text ftnot 'exercise' at end]
If ".+" is used as the only search term, index access will be skipped and sequential execution will be chosen. I've now rewritten the code to skip index access whenever a single keyword starts with a dot.
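The rule "skip index access whenever a single keyword starts with a dot" can be sketched roughly as below. This is only an assumed shape of such a check, not the actual BaseX change: the class IndexAccessSketch and the method useIndex are hypothetical names, and splitting keywords at whitespace is a simplification.

```java
// Hypothetical sketch of the heuristic; not taken from the BaseX sources.
class IndexAccessSketch {
  // Returns false as soon as one keyword starts with a dot, because such a
  // keyword would match (nearly) the whole index; true otherwise.
  static boolean useIndex(final String searchString) {
    for (final String token : searchString.trim().split("\\s+")) {
      if (token.startsWith(".")) return false;
    }
    return true;
  }

  public static void main(final String[] args) {
    for (final String query : new String[] { "mount.+", "mount .+", ".+" }) {
      System.out.println("'" + query + "' -> "
          + (useIndex(query) ? "index access" : "sequential execution"));
    }
  }
}
```

Under these assumptions, 'mount.+' still uses the index, while 'mount .+' and '.+' fall back to sequential execution.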
Hope this helps, Christian
Christian,
Thanks for that information; it makes sense now. I have rewritten my query accordingly.
Best wishes,
Sandra
ps: Thanks to Leo also.
basex-talk@mailman.uni-konstanz.de