Thanks for that. New version works nicely with full text indexing - and very fast too!
Noticed that the text index seems to work differently between RESTXQ (not utilized?) and REST - judging from the response time.
Thanks again for the efforts
Lars
2015-05-19 13:29 GMT+02:00 Christian Grün christian.gruen@gmail.com:
I'll check out how this can be fixed.
So I checked out how to fix it, and I fixed it [1]. Feel free to try the latest snapshot [2]!
Christian

[1] https://github.com/BaseXdb/basex/issues/1144
[2] http://files.basex.org/releases/latest
On Mon, May 18, 2015 at 6:46 PM, Lars Johnsen yoonsen@gmail.com wrote:
A last update, which may illuminate a little. After reindexing the database using Norwegian (snowball), stemming, and keeping diacritics, RESTXQ processes neither the special characters (treats them as closest ASCII), nor inflected forms.

The words "mannen" (=the man, definite) and "spaserer" (=walks, present tense) result in no output, while using the naked stems "mann" and "spaser" the full result is displayed. In contrast to REST, which behaves as expected.

Cheers
Lars
2015-05-18 15:28 GMT+02:00 Lars Johnsen yoonsen@gmail.com:
As an update, after rebuilding the database with text index and full text index (no language, no stemming, keep diacritics), and restarting the server:

BaseX 8.1.1 [Server]
Server was started (port: 29084)
[main] INFO org.eclipse.jetty.server.AbstractConnector - Started SelectChannelConnector@0.0.0.0:8984
HTTP Server was started (port: 8984)

RESTXQ: Norwegian characters are converted using full text index, changing to text index takes forever.
REST: Full-text works as expected, and text index works as expected (same as running in GUI for both).

It looks as if the index structure is treated differently.
2015-05-18 15:07 GMT+02:00 Lars Johnsen yoonsen@gmail.com:
The full text query is blisteringly fast for both, the text index query is fast only for REST queries and seems not to be used with queries in RESTXQ. I am rebuilding the whole database now to see how it goes, and will restart everything for a new assessment.
2015-05-18 15:00 GMT+02:00 Christian Grün <christian.gruen@gmail.com>:
> However, when using text index instead of full text the results are
> the same for both, except that RESTXQ takes almost forever

What about the original query: Has it been slow as well, or do you think this is a new problem?
> 2015-05-18 14:28 GMT+02:00 Christian Grün <christian.gruen@gmail.com>:
>>
>> It could be that your URL is decoded in a wrong way. What happens if
>> you run the following function with REST and RESTXQ and "føre" as
>> word?
>>
>> declare
>>   %rest:path("/test/encoding/{$word}")
>> function page:test-encoding($word) {
>>   string-to-codepoints($word)
>> };
>>
>> Thanks,
>> Christian
>>
>>
>> string-to-codepoints()
>> > REST output (2 first lines):
>> > føre
>> > fø - re 219
>> >
>> > RESTXQ
>> > føre
>> > fo - re 123
>> >
>> > The first word quoted is "føre" in both cases and is what the
>> > scripts see, so the full text is given the same in both cases.
>> > Could it be that within RESTXQ the full text index is treated
>> > differently?
>> >
>> > I will work closer on a self contained example, but thought this
>> > might point to something.
>> >
>> > Cheers
>> > Lars
>> >
>> >
>> > 2015-05-18 13:44 GMT+02:00 Lars Johnsen yoonsen@gmail.com:
>> >>
>> >> Hi Christian - and thanks for fast response. Latest version 8.1.1
>> >> is in use (same behaviour as previous). Let me see if I can make a
>> >> self contained example.
>> >>
>> >> best,
>> >> Lars
>> >>
>> >> 2015-05-18 13:40 GMT+02:00 Christian Grün christian.gruen@gmail.com:
>> >>>
>> >>> Hi Lars,
>> >>>
>> >>> hm, that's difficult to tell. All I can say is that this sounds
>> >>> unusual, so I'm coming up with my standard questions: Do you
>> >>> think you could build us a little example that allows us to
>> >>> reproduce the problem? Have you tried the latest version of BaseX?
>> >>>
>> >>> Best,
>> >>> Christian
>> >>>
>> >>>
>> >>> On Mon, May 18, 2015 at 1:35 PM, Lars Johnsen <yoonsen@gmail.com>
>> >>> wrote:
>> >>> >
>> >>> > I am running a web script in two identical versions (identical
>> >>> > as in "cut and paste"), one via RESTXQ and one via REST. The
>> >>> > response is different, and I wondered what may be the trouble.
>> >>> >
>> >>> > For example the output (the URLs only work locally) for
>> >>> > http://ljohnsen:8984/hyphens/mellom
>> >>> > is the same as
>> >>> > http://ljohnsen:8984/rest?run=hyphen-show.xq&word=mellom
>> >>> >
>> >>> > which is a set of hyphenation data:
>> >>> > mellom
>> >>> > mel - lom 17005
>> >>> > Mel - lom 144
>> >>> > mel - lom. 50
>> >>> >
>> >>> > but if "mellom" is exchanged with "nasjonalbiblioteket" only
>> >>> > the REST version shows any result, which then is the same as I
>> >>> > get experimenting in the GUI.
>> >>> >
>> >>> > The actual script is added below, and which runs in both
>> >>> > versions (identical apart from the rest and restxq interfaces),
>> >>> > it uses full text search, but results differ when run under the
>> >>> > REST-regime.
>> >>> >
>> >>> > All the best
>> >>> > Lars G Johnsen
>> >>> > National Library of Norway
>> >>> >
>> >>> > module namespace page = 'http://basex.org/modules/web-page';
>> >>> >
>> >>> > declare
>> >>> >   %rest:path("/hyphens/{$word}")
>> >>> >   %output:method("html")
>> >>> >
>> >>> > function page:show-hyphens($word) {
>> >>> >   let $db := db:open('hyphen-data')
>> >>> >   let $hyphens := for $hyp in $db/hyphens/hyphens[full contains text {$word}]
>> >>> >     group by $first := $hyp/first, $second := $hyp/second
>> >>> >     let $count := count($hyp)
>> >>> >     order by xs:int($count) descending
>> >>> >     return element p {
>> >>> >       attribute freq {$count},
>> >>> >       $first, " - ", $second, $count
>> >>> >     }
>> >>> >
>> >>> >   let $total := sum($hyphens//@freq)
>> >>> >   let $div := element div {
>> >>> >     element p {$word},
>> >>> >     for $hyp in $hyphens
>> >>> >     return element div {
>> >>> >       attribute class {"hyph"},
>> >>> >       attribute style {"font-size:", 1 +round(xs:int($hyp//@freq/data()) div $total,1) || "em"},
>> >>> >       $hyp
>> >>> >
>> >>> >     }
>> >>> >   }
>> >>> >   return
>> >>> >     <html encoding="UTF-8">
>> >>> >       <head>
>> >>> >         <meta http-equiv="Content-Type" content="text/html" charset="UTF-8" />
>> >>> >         <title>Orddelinger</title>
>> >>> >       </head>
>> >>> >       <body>{$div}
>> >>> >       </body>
>> >>> >     </html>
>> >>> >
>> >>> > };