Hi Christian,
I am coming back to some previously discussed questions:
Quoting Christian Grün christian.gruen@gmail.com:
[...]
> To give more information, I'll have to look at the actual data; do you think you can provide me with a little document that exemplifies your observation?
As I am not sure whether the behavior has something to do with my actual data, I didn't create an example but put a sample of my collection, consisting of 4 smaller documents, online: http://oldphras.unibas.ch/test.tgz
//*[text() contains text ('Kopf' ftand 'Sand' ftand 'stecken') using stemming using language "de"][self::*:p or self::*:l]
gives 3 hits (in Wille, Suttner, and Cervantes)
//*[text() contains text ('Kopf' ftand 'Sand' ftand 'stecken') using stemming using language "de" distance at most 10 words][self::*:p or self::*:l]
gives 2 hits (in Wille and Suttner)
//*[text() contains text "Kopf Sand stecken" all words using stemming using language "de" distance at most 10 words][self::*:p or self::*:l]
gives 3 hits (in Wille, Suttner, and Cervantes); the "distance" option seems to be ignored.
The second question is about "ftand" and "ftor".
//*[text() contains text ('Kopf' ftand 'Sand' ftand 'stecken') using stemming using language "de" distance at most 10 words][self::*:p or self::*:l]
gives 2 hits (in Wille and Suttner)
//*[text() contains text ('Nase' ftand 'Sand' ftand 'stecken') using stemming using language "de" distance at most 10 words][self::*:p or self::*:l]
gives 1 hit (in Müllenhoff)
Therefore, for
//*[text() contains text ( ('Nase' ftor 'Kopf') ftand 'Sand' ftand 'stecken') using stemming using language "de" distance at most 10 words][self::*:p or self::*:l]
I would expect to get all 3 hits, but actually get only 1 (the one in Wille). It makes no difference whether I put ('Nase' ftor 'Kopf') or ('Kopf' ftor 'Nase'). Additionally, the highlighting is strange.
In the end, I would like to search for something like this to speed up annotating the data:
( Nase | Kopf | Hals ) & ( Sand | Schlinge ) & ( ziehen | stecken )
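In XQuery Full Text syntax, I imagine this would look roughly like the sketch below. It is untested and only reuses the options from my queries above; writing each group of alternatives as `{...} any` is my assumption, and given the ftor behavior described above it may well show the same problem:

```xquery
(: each {...} any group is meant to match if at least one of its words occurs :)
//*[text() contains text (
      {'Nase', 'Kopf', 'Hals'} any
      ftand {'Sand', 'Schlinge'} any
      ftand {'ziehen', 'stecken'} any
    ) using stemming using language "de" distance at most 10 words]
   [self::*:p or self::*:l]
```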
The third question is about the full-text index itself. When applying fuzzy search or using wildcards, the full-text index is not applied, which results in a timeout on my website. In the GUI, I need 341859.09 ms for applying
> Currently, the choice has to be made between efficient fuzzy or wildcard matching (the latter being based on a Trie index structure).
So I can have fuzzy OR stemming and wildcard. For searching that's OK: I copied the collection and created the other index for the copy. But as I want to update the collection after searching, I would have to update both collections and re-index them after updating one. Is this correct?
Best regards
Cerstin