Hi
[RFC 3986](https://tools.ietf.org/html/rfc3986#appendix-B) defines a nice regular expression that splits any URI, including URNs, into its components.
What is interesting about this regex is its use of the '?' quantifier, which makes each preceding group (component) optional. The regex therefore matches either a URI or any other(!) string: anything that does not match one of the specific groups goes into a catch-all group (no. 5), which holds either the path or the whole arbitrary string. This is negligible here, since the input to this regex is guaranteed to be of the right type (a/@href/string()).
Here is the relevant part from the RFC.
Appendix B
      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
       12            3  4          5       6  7        8 9
The numbers in the second line above are only to assist readability; they indicate the reference points for each subexpression (i.e., each paired parenthesis). We refer to the value matched for subexpression <n> as $<n>. For example, matching the above expression to
http://www.ics.uci.edu/pub/ietf/uri/#Related
results in the following subexpression matches:
      $1 = http:
      $2 = http
      $3 = //www.ics.uci.edu
      $4 = www.ics.uci.edu
      $5 = /pub/ietf/uri/
      $6 = <undefined>
      $7 = <undefined>
      $8 = #Related
      $9 = Related
where <undefined> indicates that the component is not present, as is the case for the query component in the above example. Therefore, we can determine the value of the five components as
      scheme    = $2
      authority = $4
      path      = $5
      query     = $7
      fragment  = $9
Going in the opposite direction, we can recreate a URI reference from its components by using the algorithm of Section 5.3.
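All three processors run on Java, so the Appendix B split can be sketched with java.util.regex as a reference point. This is a minimal sketch under that assumption. The class and method names are hypothetical and come from neither the RFC nor any processor. Note that the literal '?' that starts the query must be escaped, which is written `\\?` inside a Java string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Rfc3986Split {
    // RFC 3986, Appendix B; "\\?" is the escaped literal '?' that starts the query.
    static final Pattern URI_REF =
        Pattern.compile("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** Returns {scheme, authority, path, query, fragment}; absent components are null. */
    static String[] split(String uri) {
        Matcher m = URI_REF.matcher(uri);
        if (!m.matches()) { // only possible if a line terminator follows the first '#'
            throw new IllegalArgumentException("no match: " + uri);
        }
        return new String[] { m.group(2), m.group(4), m.group(5), m.group(7), m.group(9) };
    }

    public static void main(String[] args) {
        String[] names = { "scheme", "authority", "path", "query", "fragment" };
        String[] c = split("http://www.ics.uci.edu/pub/ietf/uri/#Related");
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " = " + c[i]); // query prints as null
        }
    }
}
```

URNs go through the same pattern. For example, `urn:isbn:0451450523` yields the scheme `urn`, no authority, and the path `isbn:0451450523`.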
I tested this regex with Saxon, eXist and BaseX. eXist parsed all the test cases I threw at it into the right groups; Saxon and BaseX did not. The error is:
[FORX0003] Pattern matches empty string..
That baffled me, since all three processors run on Java, and since the definition of the '?' quantifier, when used like this, seems to be:
Makes the preceding item optional. Greedy, so the optional item is included in the match if possible.
This means that *if* any of the group's contents match, they should be included rather than producing an empty string.
Why is that? And what can I do about it? I have found no other URI-parsing regex that splits a URI into components this way and is compatible with XQuery.
See the attached test case.
Hi Andreas -
Wow, that is a pretty nice regex :). I'm not nearly caffeinated enough right now to pick it apart, so I can only ask a question, not provide any answers or help. Unless I'm reading the spec and Walmsley's coverage wrong, isn't '?' a reluctant quantifier, i.e., given two choices, it will always match the shorter one? Or does the hash/octothorpe give the '?' quantifier extra significance?
In any event, thank you for the neat brain teaser! Best, Bridger
On Tue, Aug 7, 2018 at 3:38 PM Andreas Mixich mixich.andreas@gmail.com wrote:
-- Goody Bye, Minden jót (all the best), Mit freundlichen Grüßen (kind regards), Andreas Mixich
On Tue, 2018-08-07 at 21:31 -0400, Bridger Dyson-Smith wrote:
> isn't the '?' a reluctant quantifier - given two choices it will always match the shorter choice?
b? matches zero or one "b".
b* matches zero or more "b" using the longest match possible
b+ matches one or more "b" using the longest match possible
b*? matches zero or more "b" using the shortest match possible.
b+? matches one or more "b" using the shortest match possible.
See https://www.w3.org/TR/xpath-functions-31/#regex-syntax for examples and more text.
? inside a character class matches a ? so that [#?] matches either "#" or "?".
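Java's java.util.regex uses the same syntax for these quantifiers, so a small sketch can make the difference visible. The class name is hypothetical. A leading `a` is added to each pattern because a reluctant quantifier on its own would simply match the empty string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    /** Returns the text of the first match of regex in input, or null if there is none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "abbbc";
        System.out.println(firstMatch("ab?", s));       // greedy ?: "ab"
        System.out.println(firstMatch("ab??", s));      // reluctant ?: "a"
        System.out.println(firstMatch("ab*", s));       // greedy *: "abbb"
        System.out.println(firstMatch("ab*?", s));      // reluctant *: "a"
        System.out.println(firstMatch("ab+", s));       // greedy +: "abbb"
        System.out.println(firstMatch("ab+?", s));      // reluctant +: "ab"
        System.out.println(firstMatch("[#?]", "x?y"));  // '?' inside a class is literal: "?"
    }
}
```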
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
This can indeed match the empty string. Adding spaces for clarity:
    ^ --------------- start of string
    (([^:/?#]+):)? -- optional because of ?
    (//([^/?#]*))? -- optional because of ?
    ([^?#]*) -------- can match the empty string because of *
    (\?([^#]*))? ---- optional because of ?
    (#(.*))? -------- optional because of ?
[no $ to match the end of the string included]
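The breakdown above can be checked directly with a minimal Java sketch. The class and method names are hypothetical, and java.util.regex accepts the same pattern. The sketch confirms that the whole pattern matches the empty string and that an arbitrary non-URI string just lands in the path group.

In XPath and XQuery Functions 3.1, fn:replace and fn:tokenize raise FORX0003 for any pattern that matches a zero-length string, regardless of greediness. That would be consistent with the error Saxon and BaseX report.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchCheck {
    static final Pattern URI_REF =
        Pattern.compile("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** True if p finds a match in the empty string, i.e. the pattern matches "". */
    static boolean matchesEmpty(Pattern p) {
        return p.matcher("").find();
    }

    public static void main(String[] args) {
        System.out.println(matchesEmpty(URI_REF)); // true: every piece may match zero characters
        Matcher m = URI_REF.matcher("not a URI at all");
        if (m.matches()) {
            // No ':' and no "//", so scheme and authority are absent; group 5 takes everything.
            System.out.println("scheme=" + m.group(2) + ", path=" + m.group(5));
        }
    }
}
```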
It's actually hard to construct a string that isn't a valid URI according to the specs, and harder still to determine this from reading the specs.
In XQuery I'd just do something like xs:anyURI($string) and let the XQuery engine work it out, using try/catch if necessary. It rarely makes sense to be more restrictive than, say, fn:doc() or Web browsers.
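The same "let the parser decide, catch the error" idea is sketched here in Java, for consistency with the other examples, rather than in XQuery. java.net.URI parses the string and throws URISyntaxException on invalid input. This is only an analogy: java.net.URI implements RFC 2396 (as amended by RFC 2732), not RFC 3986, so edge cases can differ. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ParseWithLibrary {
    /** Parses s, or returns null if java.net.URI rejects it (cf. try/catch around xs:anyURI). */
    static URI tryParse(String s) {
        try {
            return new URI(s);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        URI u = tryParse("http://www.ics.uci.edu/pub/ietf/uri/#Related");
        if (u != null) {
            // The parser exposes the same five components as the Appendix B groups.
            System.out.println(u.getScheme() + " | " + u.getAuthority() + " | " + u.getPath()
                + " | " + u.getQuery() + " | " + u.getFragment());
        }
        System.out.println(tryParse("http://exa mple.com/") == null); // true: a space is illegal
    }
}
```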
Liam
Bridger Dyson-Smith wrote:
> wow, that is a pretty nice regex :).
Indeed, I found that, too! :-)
> coverage wrong, isn't the '?' a reluctant quantifier - given two choices it will always match the shorter choice? Or does the hash/octothorp give extra significance to the '?' quantifier?
I have found https://www.regular-expressions.info/reference.html to be a brilliant and very complete reference. It even covers [XSD](https://www.regular-expressions.info/xml.html) and [XPath](https://www.regular-expressions.info/xpath.html) regular expressions.
And while this may sound like an advertisement (it is not), the site *is* just *that* good: for a small tip of around 5 dollars, you can download the whole website as a formatted PDF. It is the best regex reference I have read so far. The author really knows this stuff and is very passionate about it.
Now, if you go to https://www.regular-expressions.info/floatingpoint.html, you will see a problem very similar to ours.
And since I am already in recommendation mode: http://regex101.com. Just saying... Sadly, it has no XPath coverage. Oh, and also http://rexegg.com, which is less of a reference and more tutorial/anecdotal.
Hi
I think the problem is this: there are numerous implementations of regular expressions that share a common subset but differ in the more advanced features.
With the Java regular expression implementation you can use greedy quantifiers and some other things. The XSL and XQuery implementations, according to the standards, do not allow this and so misinterpret the regular expression. See here: https://www.w3.org/TR/xpath-functions-31/#regex-syntax
You can tell Saxon to use a different regexp engine such as the standard Java one: https://www.saxonica.com/html/documentation/functions/fn/matches.html
Best regards
Omar
Omar Siam wrote:
> Using the java regular expression implementation you can use greedy and some other things. The XSL and XQuery implementation according to the standards does not allow this and so misinterpretes the regular expression. See here:
I checked https://www.w3.org/TR/xpath-functions-31/#regex-syntax and also https://www.w3.org/TR/xmlschema-2/#regexs, but did not find any mention of greediness. But then, I am not sure whether I understood the following from the latter document:
A ·regular expression· R is a sequence of characters that denote a set of strings L(R). When used to constrain a ·lexical space·, a regular expression R asserts that only strings in L(R) are valid literals for values of that type.
For all ·atom·s S and non-negative integers n, m such that n <= m, valid ·piece·s R are:

| R | Denoting the set of strings L(R) containing |
|---|---|
| S? | the empty string, and all strings in L(S). |
Now I am not quite sure what L(S) means.
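In the XML Schema notation, L(S) is the set of strings that the regular expression S denotes as a whole. So L(S?) is that set plus the empty string. This defines which strings match, but says nothing about which substring a search picks or how groups are filled. That is where greedy versus reluctant matters.

Here is a minimal Java sketch with S = ab, showing both aspects. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalLanguage {
    /** Membership test: is the whole input in L(regex)? */
    static boolean inLanguage(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    /** Length of the match found at the start of input (lookingAt), or -1 if none. */
    static int prefixMatchLength(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        for (String in : new String[] { "", "ab", "a", "abab" }) {
            System.out.println("\"" + in + "\" in L((ab)?): " + inLanguage("(ab)?", in));
        }
        // (ab)? and (ab)?? denote the same language but choose different prefix matches:
        System.out.println(prefixMatchLength("(ab)?", "abab"));  // 2: greedy takes "ab"
        System.out.println(prefixMatchLength("(ab)??", "abab")); // 0: reluctant takes ""
    }
}
```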
> You can tell Saxon to use a different regexp engine such as the standard Java one: https://www.saxonica.com/html/documentation/functions/fn/matches.html
The hint is much appreciated, though BaseX is my actual development target. I only mentioned Saxon and eXist because I cross-checked them and found the result interesting enough to bring to the list (and I still hope that Christian chimes in and finds a good reason to do it the other way around from how it is now).
Hi!
My point was that greediness is *not* part of the XQuery RegExp standard. Java, on the other hand, has this feature: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#greed... and others. And I don't know about Perl, PHP, Python and so on.
What I want to stress is this: a beautiful RegExp from the internet may or may not work with a particular RegExp implementation.
Nevertheless, as Saxon is well integrated with BaseX, you can use it for some RegExp work. However, getting data to and from Saxon may not be possible, depending on the size of what you want to process. As far as I know, Saxon always works on an in-memory representation of the data, and that is not an option with, for example, a 2.5 GB XML file.
Best regards
Omar
In https://www.w3.org/TR/xpath-functions-31/#regex-syntax you won't find the words "greedy" or "greediness" because the term used is "reluctant quantifiers." See section 5.6.1.2.
On 8/9/18, 11:59 AM, "BaseX-Talk on behalf of Omar Siam" <basex-talk-bounces@mailman.uni-konstanz.de on behalf of Omar.Siam@oeaw.ac.at> wrote:
Sorry, I got that wrong. I meant that XQuery has greedy (the default) and reluctant quantifiers, but not possessive ones.
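For contrast, Java's own engine (the one Saxon's 'j' flag switches to) does support possessive quantifiers (`*+`, `++`, `?+`), which never give back characters once matched. Below is a minimal sketch with a hypothetical class name, comparing the three kinds on the same input.

```java
import java.util.regex.Pattern;

public class PossessiveDemo {
    public static void main(String[] args) {
        String s = "bbb";
        // Greedy: b* first takes "bbb", then gives one "b" back so the final b can match.
        System.out.println(Pattern.matches("b*b", s));  // true
        // Reluctant: b*? takes as little as possible and grows until the final b matches.
        System.out.println(Pattern.matches("b*?b", s)); // true
        // Possessive: b*+ keeps all three and never backtracks, so the final b has nothing left.
        System.out.println(Pattern.matches("b*+b", s)); // false
    }
}
```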
Thanks, Omar, for the hint to the 'j' flag in Saxon. Sounds enticing; I think we can include it in BaseX as well.
Omar Siam <Omar.Siam@oeaw.ac.at> wrote on Wed, 8 Aug 2018, 12:58:
> You can tell Saxon to use a different regexp engine such as the standard Java one: https://www.saxonica.com/html/documentation/functions/fn/matches.html
On 09.08.2018 at 16:35, Christian Grün wrote:
> Thanks, Omar, for the hint to the 'j' flag in Saxon. Sounds enticing; I think we can include it in BaseX as well.
Very good news! Thanks a lot!
+1 for the Java flag as this enables \b for word boundaries as mentioned here [1]
/Andy
[1] https://stackoverflow.com/questions/25446314/in-saxon-9-he-java-xml-parser-word-boundaries-b-in-regular-expressions-are-n/25464233#25464233
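The XPath/XQuery regex dialect has no `\b` escape, which is what makes the Java engine attractive for this. Below is a minimal Java sketch of a whole-word count using `\b`. The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    /** Counts occurrences of word in text that stand alone as whole words. */
    static int countWord(String word, String text) {
        // Pattern.quote escapes any regex metacharacters in the word itself.
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // "concat" and "catalog" contain "cat", but not between two word boundaries.
        System.out.println(countWord("cat", "cat concat cat's catalog")); // 2
    }
}
```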
> +1 for the Java flag as this enables \b for word boundaries as mentioned here [1]
True, I have missed that one more than once as well.
I’ve just added support for Java’s default parser [1,2]. Apart from 'j' (which, unlike in Saxon, doesn’t need to be prefixed with a semicolon), '!' is available as an alternative. As it’s not officially documented in Saxon, just keep this one a secret :)
A new snapshot will be available later tonight.
[1] https://github.com/BaseXdb/basex/issues/1608
[2] http://docs.basex.org/wiki/XQuery_Extensions#Regular_expressions
On Thu, Aug 9, 2018 at 7:02 PM, Christian Grün wrote:
> A new snapshot will be available later tonight.
…which is now.
Great! I believe the "!" option is best ignored...:)
Note: On the Java platform, this can also be achieved using the flag "!"; this was never formally supported and is likely to be withdrawn in a future Saxon version. [1]
/Andy
[1] https://www.saxonica.com/html/documentation/functions/fn/matches.html
On 09.08.2018 at 21:57, Andy Bunce wrote:
> Great! I believe the "!" option is best ignored...:)
I wonder why Saxon had it there in the first place!?
On 09.08.2018 at 19:54, Christian Grün wrote:
> A new snapshot will be available later tonight.
> …which is now.
I installed the new snapshot and all went fine, but later I stumbled upon the following issue:
    Error: Improper use? Potential bug? Your feedback is welcome:
    Contact: basex-talk@mailman.uni-konstanz.de
    Version: BaseX 9.1 beta
    Java: Oracle Corporation, 9.0.1
    OS: Windows 10, amd64
    Stack Trace:
    java.util.regex.PatternSyntaxException: Unmatched closing ')' near index 13
    ([^:]*)://)?(?:([^:@]*)(?::([^@]*))?@)?(?:([^/:]*))?(?::([0-9]*))?/(/[^?#]*(?=.*?/)/)?([^?#]*)?(?:?([^#]*))?(?:#(.*))?/
                 ^
        at java.base/java.util.regex.Pattern.error(Unknown Source)
        at java.base/java.util.regex.Pattern.compile(Unknown Source)
        at java.base/java.util.regex.Pattern.<init>(Unknown Source)
        at java.base/java.util.regex.Pattern.compile(Unknown Source)
        at org.basex.query.util.regex.parse.RegExParser.parse(RegExParser.java:61)
        ...
To me it looks like the Java regex error bypasses BaseX's error handling. The full error log and a test case are attached.
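For anyone wrapping java.util.regex themselves: Pattern.compile reports syntax errors by throwing PatternSyntaxException. That is an unchecked exception (a subclass of IllegalArgumentException), so a caller has to catch it explicitly to turn it into a regular error message. Below is a minimal sketch with hypothetical class and method names. It is not BaseX's actual fix.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CompileCheck {
    /** Returns null if regex compiles, otherwise Java's description of the syntax error. */
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            // getIndex() is the approximate error position, or -1 if it is unknown.
            return e.getDescription() + " (index " + e.getIndex() + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(compileError("^(([^:/?#]+):)?")); // null: the pattern is valid
        System.out.println(compileError("([^:]*)://)?"));    // reports the unmatched ')'
    }
}
```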
> Installed the new snapshot, all went fine, but later I stumbled upon the following issue:
Confirmed and fixed, thanks (the new snapshot is available in around 5 min).
> I wonder why Saxon had it there, in the first place!?
Feel free to ask Michael Kay.
basex-talk@mailman.uni-konstanz.de