On Tue, 2018-08-07 at 21:31 -0400, Bridger Dyson-Smith wrote:
> isn't the '?' a reluctant quantifier - given two choices it will always match the shorter choice?
b? matches zero or one "b".
b* matches zero or more "b" using the longest match possible.
b+ matches one or more "b" using the longest match possible.
b*? matches zero or more "b" using the shortest match possible.
b+? matches one or more "b" using the shortest match possible.
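A minimal sketch of the five cases above, using Python's re module rather than the XPath regex flavour (an assumption made only so the example is easy to run; the two flavours agree on these quantifiers). The last line shows that "shortest match possible" still means the shortest match that lets the whole expression succeed.

```python
import re

text = "bbb"

# re.match anchors each pattern at the start of the string.
optional = re.match(r"b?", text).group()         # zero or one "b"
greedy_star = re.match(r"b*", text).group()      # longest match
greedy_plus = re.match(r"b+", text).group()      # longest match
reluctant_star = re.match(r"b*?", text).group()  # shortest match
reluctant_plus = re.match(r"b+?", text).group()  # shortest match

# A reluctant quantifier still grows as far as needed for the rest to match.
forced = re.match(r"b*?c", "bbbc").group()

for name, value in [
    ("b?", optional),
    ("b*", greedy_star),
    ("b+", greedy_plus),
    ("b*?", reluctant_star),
    ("b+?", reluctant_plus),
    ("b*?c", forced),
]:
    print(f"{name:5} -> {value!r}")
```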
See https://www.w3.org/TR/xpath-functions-31/#regex-syntax for examples and more text.
A ? inside a character class matches a literal "?", so [#?] matches either "#" or "?".
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
This can indeed match the empty string. Adding spaces for clarity:
^                   -- start of string
(([^:/?#]+):)?      -- optional because of ?
(//([^/?#]*))?      -- optional because of ?
([^?#]*)            -- can match the empty string because of *
(\?([^#]*))?        -- optional because of ?
(#(.*))?            -- optional because of ?
[no $ to match the end of the string included]
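To make the point concrete, here is a small sketch in Python (again Python's re, not the XPath flavour). It uses the expression as given in RFC 3986, Appendix B, with the "?" before the query part escaped. The names URI_RE and split_uri are made up for this illustration.

```python
import re

# The expression from RFC 3986, Appendix B. Every component is either
# optional (trailing ?) or may be empty (*), and there is no trailing $.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

def split_uri(s):
    """Split s into its five components; a component that is absent is None."""
    m = URI_RE.match(s)  # never None: the pattern can match zero characters
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

print(split_uri(""))
print(split_uri("not a uri at all"))
print(split_uri("http://example.org/a/b?x=1#top"))
```

Both the empty string and "not a uri at all" are accepted: everything simply ends up in the path component. The expression splits a reference into parts; it does not validate it.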
It's actually hard to construct a string that isn't a valid URI according to the specs, and harder still to determine this from reading the specs.
In XQuery I'd just do something like xs:anyURI($string) and let the XQuery engine work it out; use try/catch if necessary. It's rare that it makes sense to be more restrictive than, say, fn:doc() or Web browsers.
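The same "let the library decide, and catch the error" pattern can be sketched outside XQuery. This is only a Python analogue using urllib.parse.urlsplit, not the xs:anyURI cast itself, and the two will not agree on every input. The function name looks_like_uri is hypothetical.

```python
from urllib.parse import urlsplit

def looks_like_uri(s):
    """Return False only when the standard library refuses to parse s."""
    try:
        urlsplit(s)
    except ValueError:
        # urlsplit raises ValueError for a few malformed inputs,
        # such as an unbalanced "[" in the host part.
        return False
    return True

for candidate in ["http://example.org/", "not a uri at all", "http://[::1"]:
    print(repr(candidate), looks_like_uri(candidate))
```

As with the XQuery approach, very little is rejected, which matches the observation above that it is hard to construct a string that is not a valid URI.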
Liam