I've been searching in the BaseX docs online and can't seem to find reference to what I'm seeing. It may be common knowledge in the XML community but I'm not highly skilled in XML.
Here's my issue:
I have XML files that I'm loading into my database, and the case (capital vs. lower-case) of the tag names (XML elements, I think?) is inconsistent. For instance, I have a tag named "lyrics", and in the files (whose source is not necessarily under my control) it is sometimes <lyrics> and sometimes <Lyrics>. By someone's error it could of course also be <lYrIcS> or any other odd combination. Is there a way, either at database creation, via some option, or at query time, to say that "lyrics" = "Lyrics" = "LyRics", etc., i.e. to be case-insensitive for the XML tag names?
Thanks!
David
There are functions like lower-case() and upper-case(), so you could do something like this:
//*[lower-case(name(.)) = "lyrics"]
However, I guess this could decrease performance, presumably because the function has to be evaluated for every element in the database. It might therefore be reasonable to normalize the names in a preprocessing step, if that fits your use case.
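To make the predicate concrete, here is a minimal, self-contained sketch. The sample document and its element names are made up for illustration. It uses local-name() instead of name(), on the assumption that a namespace prefix, if one ever appears, should not take part in the comparison:

```xquery
(: Hypothetical sample data; only the mixed-case "lyrics" names matter. :)
let $doc :=
  <songs>
    <song><lyrics>first</lyrics></song>
    <song><Lyrics>second</Lyrics></song>
    <song><lYrIcS>third</lYrIcS></song>
    <song><title>no lyrics here</title></song>
  </songs>
(: Lower-case each element's local name before comparing it. :)
return $doc//*[lower-case(local-name(.)) = "lyrics"]
```

This selects the three lyrics variants and skips the title element. Against a real database you would drop the let clause and start the path at the database root instead.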
cheers, Dirk
On 06/16/2012 03:51 PM, David Leigh wrote (original message quoted in full; see above):
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Hi David,
for preprocessing you could use something like
for $i in //* return rename node $i as lower-case(name($i))
to rename all elements to their lower-case equivalent. If you only want to process the lyrics elements in all their case variations, add Dirk's predicate to `//*`.
Sadly, the full-text index, which offers case insensitivity, doesn't include tag names, so you can't use it for this either.
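Combining the two might look like the sketch below. It is only an illustration: the sample document is hypothetical, and it uses the copy/modify/return expression from XQuery Update so that it runs without a database. In a real database you would use the plain `for ... return rename node ...` form shown above.

```xquery
(: Work on an in-memory copy of hypothetical sample data. :)
copy $c :=
  <songs>
    <song><Lyrics>first</Lyrics></song>
    <song><lYrIcS>second</lYrIcS></song>
    <song><Title>left untouched</Title></song>
  </songs>
modify (
  (: Rename only elements whose name is a case variant of "lyrics". :)
  for $i in $c//*[lower-case(name(.)) = "lyrics"]
  return rename node $i as "lyrics"
)
return $c
```

Both variants come back as lyrics elements, while Title keeps its original spelling because the predicate does not match it.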
Kind regards from Lake Constance, Germany, Jens