Hello,
I'm doing some work matching between XML documents: one set has no characters outside the basic ASCII range, while the other has a mix of Ø, Ö and lots of others. Some values are in UPPER case and some in Mixed case. I need to match a "James" in one file to "JAMES" in another and so on. To do the comparisons I've been looking at BaseX's support for collations.
Following the example in the documentation like this works perfectly:
declare default collation 'http://basex.org/collation?strength=primary';
"Straße" = "Strasse",
"Jérome" = "Jerome",
"James" = "JAMES"
But it doesn't work when testing attribute (or node) values in a statement like this:
declare default collation 'http://basex.org/collation?strength=primary';
let $doc := doc('
  <root>
    <test name="Straße">Straße</test>
    <test name="Strasse">Strasse</test>
  </root>
')
return count($doc/root/test[@name = "Strasse"])
I would expect that to return a count of 2 but it returns a count of 1.
I can get round this by calling fn:compare() like this but it feels like a hack:
declare default collation 'http://basex.org/collation?strength=primary';
let $doc := doc('
  <root>
    <test name="Straße">Straße</test>
    <test name="Strasse">Strasse</test>
  </root>
')
return count($doc/root/test[0 = fn:compare(@name, "Strasse")])
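For reference, here is a sketch of the same workaround with the collation passed explicitly as the third argument of fn:compare(), so that it does not depend on the default collation declaration at all. The variable name $coll is just my own label, and I build the test document with a document constructor instead of doc() to keep the query self-contained. I am assuming the explicit form behaves the same as the default-collation form here:

```xquery
(: Hypothetical variable name; the URI is the one used in the queries above. :)
declare variable $coll := 'http://basex.org/collation?strength=primary';

(: Build the test document inline with a document constructor. :)
let $doc := document {
  <root>
    <test name="Straße">Straße</test>
    <test name="Strasse">Strasse</test>
  </root>
}
(: fn:compare() returns 0 when the two strings are equal under the given collation. :)
return count($doc/root/test[fn:compare(@name, "Strasse", $coll) = 0])
```

This makes the predicate rather verbose, which is part of why it still feels like a hack to me.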
Is this behaviour intended? I can see that ignoring the collation for = might help query speed and index use, but I couldn't find it stated in the documentation. My quick read of the specification suggested that eq, gt, lt etc. should compare strings the same way fn:compare does, default collation included.
I think that I'm probably doing this completely the wrong way and I should be using some of the other features of Full-Text but I'm not sure. If anyone can point me in the right direction I will be very grateful.
Many thanks, James