Perhaps a proposal below.
27.06.2017 21:49 Christian Grün wrote:
It is currently not possible to work with different languages in a single database. This is mostly because all normalized tokens will end up in the same internal index, and it would be a lot of effort to diversify this software behavior.
How does BaseX behave if the database content is in many different languages and is correctly marked with xml:lang attributes? Does the full-text index consider this information and apply full-text indexing only to elements with a matching language?
As a simple illustration (untested): will the following code create a full-text index only for the Russian text, or for both the Russian and the English text?

  db:create(
    'db-ft-ru',
    <texts>
      <text xml:lang="ru">something in Russian</text>
      <text xml:lang="en">something in English</text>
    </texts>,
    'texts',
    map { 'ftindex': true(), 'language': 'ru' }
  )
If BaseX does create the full-text index for both languages (the English entries would then be useless, since they are normalized with Russian rules), I would propose simply filtering elements by their xml:lang attribute, according to the language given in the options map alongside ftindex. This should be simpler to implement than the diversification Christian describes.
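Until something like this exists, the same filtering can be approximated on the user side. The sketch below is untested and only a workaround, not the in-index filtering proposed above: it assumes that one database per language is acceptable, and the 'db-ft-' naming scheme and the 'texts' path are made up for the illustration. It uses the standard fn:lang() function to select elements by their xml:lang value.

```xquery
(: Hypothetical workaround: one database per language, so that each
   full-text index is built with the matching 'language' option. :)
let $texts :=
  <texts>
    <text xml:lang="ru">something in Russian</text>
    <text xml:lang="en">something in English</text>
  </texts>
(: Collect the distinct language codes as plain strings. :)
for $lang in distinct-values($texts/text/@xml:lang ! string(.))
return db:create(
  'db-ft-' || $lang,                              (: assumed naming scheme :)
  <texts>{ $texts/text[lang($lang)] }</texts>,    (: keep only this language :)
  'texts',
  map { 'ftindex': true(), 'language': $lang }
)
```

Each resulting database can then be queried on its own. The obvious drawback is that a single query can no longer span all languages without addressing several databases.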
Best regards,
Kristian K