Hi Lars,
I think I can confirm the observed behavior: in certain circumstances, the index properties (stemming etc.) won't be applied to the optimized full-text query when using RESTXQ.
I'll check out how this can be fixed.
Thanks, Christian
On Mon, May 18, 2015 at 6:46 PM, Lars Johnsen yoonsen@gmail.com wrote:
A last update, which may illuminate a little. After reindexing the database using Norwegian (snowball), stemming, and keeping diacritics, RESTXQ processes neither the special characters (it treats them as the closest ASCII character) nor inflected forms.
The words "mannen" (=the man, definite) and "spaserer" (=walks, present tense) produce no output, while the bare stems "mann" and "spaser" display the full result. REST, in contrast, behaves as expected.
Cheers Lars
2015-05-18 15:28 GMT+02:00 Lars Johnsen yoonsen@gmail.com:
As an update, after rebuilding database with
text index, full text index (no language, no stemming, keep diacritics)
restarting server:
BaseX 8.1.1 [Server]
Server was started (port: 29084)
[main] INFO org.eclipse.jetty.server.AbstractConnector - Started SelectChannelConnector@0.0.0.0:8984
HTTP Server was started (port: 8984)
RESTXQ: Norwegian characters are converted when using the full-text index; switching to the text index takes forever.
REST: Full-text works as expected, and the text index works as expected (same as running in the GUI for both).
It looks as if the index structure is treated differently.
2015-05-18 15:07 GMT+02:00 Lars Johnsen yoonsen@gmail.com:
The full text query is blisteringly fast for both. The text index query is fast only for REST queries and does not seem to be used for queries in RESTXQ. I am rebuilding the whole database now to see how it goes, and will restart everything for a new assessment.
2015-05-18 15:00 GMT+02:00 Christian Grün christian.gruen@gmail.com:
> However, when using text index instead of full text the results are the same for both, except that RESTXQ takes almost forever
What about the original query: Has it been slow as well, or do you think this is a new problem?
2015-05-18 14:28 GMT+02:00 Christian Grün christian.gruen@gmail.com:
It could be that your URL is decoded incorrectly. What happens if you run the following function with REST and RESTXQ and "føre" as word?

  declare
    %rest:path("/test/encoding/{$word}")
  function page:test-encoding($word) {
    string-to-codepoints($word)
  };

Thanks, Christian
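
For reference, a minimal standalone sketch of what the encoding test above is meant to reveal. It can be pasted into the GUI without any RESTXQ setup. The hard-coded $word is an assumption for illustration; in the real test the value comes from the URL path. The codepoint values in the comments are plain Unicode facts, not output captured from the setup discussed in this thread.

```xquery
(: Hypothetical standalone check; $word is hard-coded here for illustration. :)
let $word := "føre"
let $cps := string-to-codepoints($word)
return (
  $cps,          (: 102 248 114 101 for an intact "føre" :)
  $cps = 248,    (: true if 'ø' (U+00F8) is still present :)
  $cps = 111     (: true if an 'o' (U+006F) is present, e.g. after 'ø' was folded to 'o' :)
)
```

If the RESTXQ variant of the test returned 111 where REST returns 248, that would point to the URL decoding step; identical codepoints in both would point to the query or index handling instead.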
string-to-codepoints()
> REST output (2 first lines):
> føre
> fø - re 219
>
> RESTXQ
> føre
> fo - re 123
>
> The first word quoted is "føre" in both cases and is what the scripts see,
> so the full text is given the same in both cases. Could it be that within
> RESTXQ the full text index is treated differently?
>
> I will work closer on a self contained example, but thought this might
> point to something.
>
> Cheers
> Lars
>
>
> 2015-05-18 13:44 GMT+02:00 Lars Johnsen yoonsen@gmail.com:
>>
>> Hi Christian - and thanks for fast response. Latest version 8.1.1 is in use
>> (same behaviour as previous). Let me see if I can make a self contained
>> example.
>>
>> best,
>> Lars
>>
>> 2015-05-18 13:40 GMT+02:00 Christian Grün christian.gruen@gmail.com:
>>>
>>> Hi Lars,
>>>
>>> hm, that's difficult to tell. All I can say is that this sounds
>>> unusual, so I'm coming up with my standard questions: Do you think you
>>> could build us a little example that allows us to reproduce the
>>> problem? Have you tried the latest version of BaseX?
>>>
>>> Best,
>>> Christian
>>>
>>>
>>> On Mon, May 18, 2015 at 1:35 PM, Lars Johnsen yoonsen@gmail.com wrote:
>>> >
>>> > I am running a web script in two identical versions (identical as in
>>> > "cut and paste"), one via RESTXQ and one via REST. The response is
>>> > different, and I wondered what may be the trouble.
>>> >
>>> > For example the output (the URLs only work locally) for
>>> > http://ljohnsen:8984/hyphens/mellom
>>> > is the same as
>>> > http://ljohnsen:8984/rest?run=hyphen-show.xq&word=mellom
>>> >
>>> > which is a set of hyphenation data:
>>> > mellom
>>> > mel - lom 17005
>>> > Mel - lom 144
>>> > mel - lom. 50
>>> >
>>> > but if "mellom" is exchanged with "nasjonalbiblioteket" only the REST
>>> > version shows any result, which then is the same as I get experimenting
>>> > in the GUI.
>>> >
>>> > The actual script is added below. It runs in both versions
>>> > (identical apart from the rest and restxq interfaces) and uses full text
>>> > search, but results differ when run under the REST-regime.
>>> >
>>> > All the best
>>> > Lars G Johnsen
>>> > National Library of Norway
>>> >
>>> > module namespace page = 'http://basex.org/modules/web-page';
>>> >
>>> > declare
>>> >   %rest:path("/hyphens/{$word}")
>>> >   %output:method("html")
>>> > function page:show-hyphens($word) {
>>> >   let $db := db:open('hyphen-data')
>>> >   let $hyphens :=
>>> >     for $hyp in $db/hyphens/hyphens[full contains text {$word}]
>>> >     group by $first := $hyp/first, $second := $hyp/second
>>> >     let $count := count($hyp)
>>> >     order by xs:int($count) descending
>>> >     return element p {
>>> >       attribute freq {$count},
>>> >       $first, " - ", $second, $count
>>> >     }
>>> >
>>> >   let $total := sum($hyphens//@freq)
>>> >   let $div := element div {
>>> >     element p {$word},
>>> >     for $hyp in $hyphens
>>> >     return element div {
>>> >       attribute class {"hyph"},
>>> >       attribute style {"font-size:", 1 + round(xs:int($hyp//@freq/data()) div $total, 1) || "em"},
>>> >       $hyp
>>> >     }
>>> >   }
>>> >   return
>>> >     <html encoding="UTF-8">
>>> >       <head>
>>> >         <meta http-equiv="Content-Type" content="text/html" charset="UTF-8" />
>>> >         <title>Orddelinger</title>
>>> >       </head>
>>> >       <body>{$div}
>>> >       </body>
>>> >     </html>
>>> > };