At
https://words.fromoldbooks.org/Search/
a search for henry shows lots of matches, and so does a search for henry i, but henry ii and henry vii are missing, and so is henry viii.
I can search for viii and find Henry VIII and also Charles VIII, but I can't search for Charles VIII either.
I can search for the king’s feet, and for Henry V, but not Charles II.
It looks like words ending in ii are invisible.
I'm using ft:search("wobo", $term)/ancestor-or-self::p
Might this be related to stemming? I have that turned off.
This is used to create the db (from the Perl API):
# create query instance
t("drop db wobo");
t("create db wobo");
t("open wobo");
$session->send("set chop false");
$session->send("set ftindex true");
$session->send("set updindex true");
$session->send("set autooptimize true");
txf("/home/liam/w/Search/wobo.xml");
t("create index attribute");
t("create index text");
t("create index fulltext");
t("optimize");
t("close");
t("quit");
where t() just does a $session->execute() on its argument after printing a trace line, and txf does a delete followed by an add.
I can probably make a test case available if needed.
Liam
On Mon, 2020-09-28 at 19:32 -0400, Liam R. E. Quin wrote:
> At https://words.fromoldbooks.org/Search/
> a search for henry shows lots of matches, and so does a search for henry i, but henry ii and henry vii are missing, and so is henry viii.
Actually it turns out (1) Henry VIII doesn't occur :) although the others do... and (2) in each case the roman numerals are surrounded by markup, <sc>III</sc> or whatever.
So maybe it's behaving as expected!
I'll remove the sc markup and see. Sorry for the noise.
Liam
Hi Liam,
Maybe you could translate those <sc> tag contents into the corresponding Unicode symbols. At least I would hope that text searching algorithms already deal with that kind of expansion, so that they match vii with Ⅶ and ⅶ.
Best regards,
Kristian Kankainen
One fine day, Mon, 28.09.2020 at 21:54, Liam R. E. Quin wrote:
> Actually it turns out (1) Henry VIII doesn't occur :) although the others do... and (2) in each case the roman numerals are surrounded by markup, <sc>III</sc> or whatever.
> So maybe it's behaving as expected!
> I'll remove the sc markup and see. Sorry for the noise.
> Liam
Hi Liam,
Did you find out why II et al. was ignored? Feel free to provide me with a little test case.
Cheers, Christian
On Tue, Sep 29, 2020 at 3:55 AM Liam R. E. Quin liam@fromoldbooks.org wrote:
> Actually it turns out (1) Henry VIII doesn't occur :) although the others do... and (2) in each case the roman numerals are surrounded by markup, <sc>III</sc> or whatever.
> So maybe it's behaving as expected!
> I'll remove the sc markup and see. Sorry for the noise.
> Liam
-- Liam Quin, https://www.delightfulcomputing.com/ Available for XML/Document/Information Architecture/XSLT/ XSL/XQuery/Web/Text Processing/A11Y training, work & consulting. Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org
On Mon, 2020-10-05 at 15:15 +0200, Christian Grün wrote:
> Hi Liam,
> Did you find out why II et al. was ignored? Feel free to provide me with a little test case.
The markup in the surrogate files in the database turned out to be
Edward <sc>II</sc>
Changing it to Edward II made it work.
Henry VIII was the same.
The query was, essentially,
let $term := "Henry VIII"
return ft:search("wobo", $term)/ancestor-or-self::p
Liam
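
A possible explanation for the behaviour above, offered as an assumption rather than something confirmed in the thread: the full-text index covers individual text nodes, and in Edward <sc>II</sc> the words "Edward " and "II" sit in two different text nodes, so a phrase looked up per text node cannot span them. The sketch below uses a made-up inline <p> element and the standard XQuery Full Text "contains text" operator instead of ft:search, so that it runs without the wobo database.

```xquery
(: Hypothetical fragment mirroring the markup described in the thread. :)
let $p := <p>Edward <sc>II</sc> was king.</p>
return (
  (: Phrase tested against each text node separately: "Edward " and "II"
     are different text nodes, so no single node holds the whole phrase. :)
  count($p//text()[. contains text "Edward II"]),
  (: Phrase tested against the string value of the whole paragraph,
     which is "Edward II was king." :)
  count($p[. contains text "Edward II"])
)
```

If that reading is right, the first count should be 0 and the second 1. Searching on the element rather than on its text nodes may not be answered from the full-text index, so removing the <sc> markup from the indexed files, as was done here, keeps ft:search usable.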
basex-talk@mailman.uni-konstanz.de