Hi Christian,
Yes, that seems to make it work correctly. Maybe the wiki needs to be updated to make clearer what "diacritics true" does? Apologies for the misunderstanding on my part.
All The Best, Chris
On Tue, Aug 19, 2014 at 1:38 PM, Christian Grün christian.gruen@gmail.com wrote:
Hi Chris,
DIACRITICS: true
It seems you set the diacritics option to true, which is equivalent to "diacritics sensitive" (the option is meant to be read as "consider diacritics: yes, please!"). Could you try rebuilding the index with the diacritics option disabled?
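For reference, a rebuild along these lines could look roughly as follows in BaseX command syntax. This is only a sketch: it assumes the database is the "edil" database named in the properties quoted below, and that the commands are run from the console or the GUI command field. The idea is that the full-text index is built with the DIACRITICS value that is active at creation time, so the option is switched off before the index is recreated.

```
# Sketch only; "edil" is the database name taken from this thread.
OPEN edil
# Build the index without diacritics sensitivity.
SET DIACRITICS false
CREATE INDEX FULLTEXT
# Check the result; the Indexes section should then list DIACRITICS: false.
INFO DB
```

After the rebuild, a query that uses "using diacritics insensitive" should match the index options, which is presumably what allows the optimizer to pick the full-text index.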
Christian
On Tue, Aug 19, 2014 at 2:19 PM, Christopher Yocum cyocum@gmail.com wrote:
Hi Christian,
I hope you had a good weekend!
Otherwise, no, this doesn't help as it doesn't choose to use the full-text index on my content :(. This is what I am getting at the moment:
Compiling:
- pre-evaluating fn:collection("edil")
- simplifying descendant-or-self step(s)
- converting descendant::*:entry to child steps
- simplifying descendant-or-self step(s)
- removing context expression (.)
- rewriting where clause(s)
- simplifying flwor expression
Query: declare variable $term as xs:string external := 'athgabāi.*'; declare variable $col as xs:string external := 'edil'; <results>{subsequence(ft:mark(for $x in collection($col)//entry where $x//text() contains text {$term} using diacritics insensitive using wildcards return $x), 1, 5000)}</results>
Optimized Query: element results { (fn:subsequence(ft:mark((db:open-pre("edil",0), db:open-pre("edil",155748), ...)/*:sample/*:entry[descendant::text() contains text "athgabāi.*" using wildcards using language 'English']), 1, 5000)) }
I tried this as well with the same results:
Compiling:
- pre-evaluating fn:collection("edil")
- simplifying descendant-or-self step(s)
- converting descendant::*:entry to child steps
- removing context expression (.)
- rewriting where clause(s)
- simplifying flwor expression
Query: declare variable $term as xs:string external := 'athgabāi.*'; declare variable $col as xs:string external := 'edil'; <results>{subsequence(ft:mark(for $x in collection($col)//entry where $x/descendant::*[text() contains text 'athgabāi.*' using diacritics insensitive using wildcards] return $x), 1, 5000)}</results>
Optimized Query: element results { (fn:subsequence(ft:mark((db:open-pre("edil",0), db:open-pre("edil",155748), ...)/*:sample/*:entry[descendant::*[text() contains text "athgabāi.*" using wildcards using language 'English']]), 1, 5000)) }
These are the options set on the database:
Database Properties
  Name: edil
  Size: 194 MB
  Nodes: 7951662
  Documents: 19
  Binaries: 0
  Timestamp: 2014-08-15-17-00-29
Resource Properties
  Input Path: /home/cyocum/temp/edil_src/xml_src
  Input Size: 87 MB
  Timestamp: 2014-08-15-16-46-31
  Encoding: UTF-8
  CHOP: true
Indexes
  Up-to-date: true
  TEXTINDEX: true
  ATTRINDEX: true
  FTINDEX: true
  LANGUAGE:
  STEMMING: false
  CASESENS: false
  DIACRITICS: true
  STOPWORDS:
  UPDINDEX: false
  MAXCATS: 100
  MAXLEN: 96
I hope this helps.
All the best, Chris