Hi all,
Here at the Massachusetts Historical Society we're exploring using BaseX for some new transcriptions of newspapers from around the American Revolution, as collected by a fellow named Harbottle Dorr.
I'm having trouble figuring out how exactly to enable fulltext XQuery searches. We have it working, but quite a few keywords are not being indexed. From the command line tool, we do:
create database dorr_index
create index fulltext
Then we have a PHP tool (using the class found on the BaseX site) that calls $session->execute to open the collection and $session->add to add the files. But when I then run
info db
from the command line tool, it shows:
Database Properties
 Name: dorr_index
 Size: 1390 KB
 Nodes: 32411
 Resources: 24
 Timestamp: 30.11.2012 08:00:11

Resource Properties
 Timestamp: 30.11.2012 08:03:21
 Encoding: UTF-8
 Whitespace Chopping: ON

Indexes
 Up-to-date: false
 Path Summary: ON
 Text Index: OFF
 Attribute Index: OFF
 Full-Text Index: OFF
If I then run the command
create index fulltext
again, info db shows the full-text index as ON, but Up-to-date is still false, and our search results are still missing many hits. Our XQuery is:
for $i in collection(PATH)//item
let $s := $i/mhs:sort
where $i contains text ("QUERY" using stemming)
return ...
Are there any steps missing? Also, any tutorial out there to follow?
Thanks very much in advance!
-Bill B -- Bill Beck, Web Development Specialist Massachusetts Historical Society 1154 Boylston Street, Boston, MA 02215 Tel: 617-646-0505, Fax: 617-859-0074 www.masshist.org - America's Oldest Historical Society - Founded 1791
In Death Lamented: The Tradition of Anglo-American Mourning Jewelry opens on 28 September 2012 and runs through 31 January 2013. The exhibition will be on display Monday through Saturday from 10 AM to 4 PM. More information is available at www.masshist.org/events/.
Hi Bill,
> I'm having trouble figuring out how exactly to enable fulltext XQuery searches. We have it working, but quite a few keywords are not being indexed. [...]
As you already indicated, the full-text index gets lost when performing updates. If you set FTINDEX to TRUE before creating your database, your full-text index will always be recreated when calling OPTIMIZE:
set ftindex true
create db dorr_index
add ...
optimize
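The effect of the order of these steps can be illustrated with a toy Python model. This is a hypothetical sketch, not BaseX code: `ToyDatabase`, its methods and the file name are made up, and the real index structures work differently. The model only assumes what is described in this thread, namely that updates drop the full-text index and that OPTIMIZE rebuilds it when FTINDEX was enabled.

```python
from __future__ import annotations

from collections import defaultdict


class ToyDatabase:
    """Hypothetical toy model of a database with an optional full-text index.

    Not BaseX code: it only mimics the behaviour described in the thread
    (index lost on updates, rebuilt by OPTIMIZE when FTINDEX is set).
    """

    def __init__(self, ftindex: bool = False) -> None:
        self.ftindex = ftindex  # analogue of SET FTINDEX true before CREATE DB
        self.docs: dict[str, str] = {}
        self.index: dict[str, set[str]] | None = None
        if ftindex:
            self._build_index()

    def _build_index(self) -> None:
        # Map each lower-cased, whitespace-separated token to the documents containing it.
        index: defaultdict[str, set[str]] = defaultdict(set)
        for name, text in self.docs.items():
            for token in text.lower().split():
                index[token].add(name)
        self.index = dict(index)

    def create_index(self) -> None:
        # Analogue of CREATE INDEX FULLTEXT: covers only what is stored right now.
        self._build_index()

    def add(self, name: str, text: str) -> None:
        # Analogue of ADD: the toy model drops the index instead of updating it.
        self.docs[name] = text
        self.index = None

    def optimize(self) -> None:
        # Analogue of OPTIMIZE: rebuild the index only if FTINDEX was enabled.
        if self.ftindex:
            self._build_index()

    def has_index(self) -> bool:
        # Analogue of "Full-Text Index: ON/OFF" in the INFO DB output.
        return self.index is not None


if __name__ == "__main__":
    # Original order: create the index first, then add documents.
    late = ToyDatabase()
    late.create_index()
    late.add("dorr-01.xml", "Boston Tea Party")  # hypothetical file name
    print(late.has_index())  # False

    # Suggested order: enable FTINDEX, add documents, then optimize.
    early = ToyDatabase(ftindex=True)
    early.add("dorr-01.xml", "Boston Tea Party")
    early.optimize()
    print(early.index["tea"])  # {'dorr-01.xml'}
```

In the model, an index built on an empty database covers nothing that is added later. Only a rebuild after the last ADD makes every document searchable through the index.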
> for $i in collection(PATH)//item
> let $s := $i/mhs:sort
> where $i contains text ("QUERY" using stemming)
> return ...
I know too little about your XML structure, so it's difficult to tell whether (or what) something goes wrong. Does the behavior change when you follow the steps suggested above?
Cheers, Christian
On 11/30/2012 02:14 PM, Bill Beck wrote:
> [Activating Fulltext index]
> Are there any steps missing? Also, any tutorial out there to follow?
Usually this is controlled by the UPDINDEX [1] option (for text and attribute values), but currently the full-text index cannot be updated efficiently (at least I think so). However, a rewrite of the index structure is planned [2].
Still, I'm not sure what the real problem is, as "QUERY" shouldn't be a stop word that is excluded from the index.
But shouldn't it be "where $i/text() contains text 'QUERY' using stemming"? After all, $i is bound to an item element, I guess, and you have to query text values (or perhaps attribute values), not an element.
[1] http://docs.basex.org/wiki/Options#UPDINDEX
[2] https://github.com/BaseXdb/basex/issues/169
On 11/30/2012 02:53 PM, Johannes.Lichtenberger wrote:
[...]
Ok, my last sentence (that you have to use $i/text() instead of the element $i) is wrong ;-)
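Why the original "where $i contains text ..." is fine: in XQuery, full-text matching on an element node works on roughly its string value, which includes the text of all descendants. By contrast, $i/text() selects only the element's direct text children, so words inside child elements would be missed. The small Python model below uses the standard library's ElementTree to show the difference for mixed content. The helper names and the sample <item> are hypothetical, and contains_word is a crude stand-in for "contains text" with no stemming, stop words or real tokenizer.

```python
import xml.etree.ElementTree as ET


def string_value(elem: ET.Element) -> str:
    # Like XQuery's string($i): all descendant text, in document order.
    return "".join(elem.itertext())


def direct_text(elem: ET.Element) -> str:
    # Rough analogue of $i/text(): only text nodes that are direct children
    # (the element's leading text plus the tail text after each child).
    parts = [elem.text or ""]
    parts.extend(child.tail or "" for child in elem)
    return "".join(parts)


def contains_word(text: str, word: str) -> bool:
    # Crude stand-in for "contains text": case-insensitive whole-token match
    # on whitespace, with no stemming, stop words or punctuation handling.
    return word.lower() in text.lower().split()


if __name__ == "__main__":
    # Hypothetical <item> with mixed content, as a transcription might contain.
    item = ET.fromstring("<item>Letter on the <b>Stamp</b> Act</item>")
    print(contains_word(string_value(item), "stamp"))  # True
    print(contains_word(direct_text(item), "stamp"))   # False
```

In this model, only the string-value variant finds "stamp", because that word sits inside the child <b> element rather than directly in <item>.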