I'll check out how this can be fixed.
So I checked out how to fix it, and I fixed it [1]. Feel free to try the latest snapshot [2]!
Christian

[1] https://github.com/BaseXdb/basex/issues/1144
[2] http://files.basex.org/releases/latest
On Mon, May 18, 2015 at 6:46 PM, Lars Johnsen yoonsen@gmail.com wrote:
A last update, which may illuminate a little. After reindexing the database using Norwegian (Snowball), stemming, and keeping diacritics, RESTXQ handles neither the special characters (it treats them as the closest ASCII character) nor inflected forms.
The words "mannen" (= the man, definite) and "spaserer" (= walks, present tense) produce no output, while the bare stems "mann" and "spaser" display the full result. REST, in contrast, behaves as expected.
Cheers
Lars
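One way to narrow this down would be to state the match options in the query itself rather than relying on the index settings. The following is only a sketch: the database name and path are taken from the script quoted further down, while the default value for $word and the assumption that 'no' is the language code matching the Norwegian index are mine. Whether the full-text index is still applied with these options depends on how the index was built.

```xquery
(: Sketch only. 'hyphen-data' and the path come from the script in this thread.
   The default word and the language code 'no' are assumptions. :)
declare variable $word as xs:string external := 'mannen';

for $hyp in db:open('hyphen-data')/hyphens/hyphens[
  full contains text { $word }
    using stemming
    using language 'no'
    using diacritics sensitive
]
return $hyp
```

If REST and RESTXQ return the same nodes for this query, the difference would seem to lie in how the options are picked up and not in the index data.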
2015-05-18 15:28 GMT+02:00 Lars Johnsen yoonsen@gmail.com:
As an update, after rebuilding database with
text index, full text index (no language, no stemming, keep diacritics)
restarting server:
BaseX 8.1.1 [Server]
Server was started (port: 29084)
[main] INFO org.eclipse.jetty.server.AbstractConnector - Started SelectChannelConnector@0.0.0.0:8984
HTTP Server was started (port: 8984)
RESTXQ: Norwegian characters are converted when the full-text index is used; switching to the text index takes forever.
REST: Full-text works as expected, and the text index works as expected (same as running in the GUI for both).
It looks as if the index structure is treated differently.
2015-05-18 15:07 GMT+02:00 Lars Johnsen yoonsen@gmail.com:
The full text query is blisteringly fast for both, the text index query is fast only for REST queries and seems not to be used with queries in RESTXQ. I am rebuilding the whole database now to see how it goes, and will restart everything for a new assessment.
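To check whether the text index itself is slow or is simply not being chosen by the optimizer, the index can be addressed directly with db:text(). This is a minimal sketch, assuming the text index exists on 'hyphen-data'; the text-index query used in this thread was not posted, so the default word and the step to the parent element are illustrative only.

```xquery
(: Sketch only. Looks up exact text matches via the text index.
   The default value and the parent step are assumptions for illustration. :)
declare variable $word as xs:string external := 'mellom';

for $text in db:text('hyphen-data', $word)
return $text/parent::*
```

If this is fast under RESTXQ as well, the slowdown would point to the query not being rewritten for index access there.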
2015-05-18 15:00 GMT+02:00 Christian Grün christian.gruen@gmail.com:
> However, when using text index instead of full text the results are the same for both, except that RESTXQ takes almost forever
What about the original query: Has it been slow as well, or do you think this is a new problem?
2015-05-18 14:28 GMT+02:00 Christian Grün christian.gruen@gmail.com:
>
> It could be that your URL is decoded in a wrong way.. What happens if
> you run the following function with REST and RESTXQ and "føre" as
> word?
>
> declare
>   %rest:path("/test/encoding/{$word}")
> function page:test-encoding($word) {
>   string-to-codepoints($word)
> };
>
> Thanks,
> Christian
>
> > string-to-codepoints()
> > REST output (2 first lines):
> > føre
> > fø - re 219
> >
> > RESTXQ
> > føre
> > fo - re 123
> >
> > The first word quoted is "føre" in both cases and is what the
> > scripts see, so the full text is given the same in both cases.
> > Could it be that within RESTXQ the full text index is treated
> > differently?
> >
> > I will work closer on a self contained example, but thought this
> > might point to something.
> >
> > Cheers
> > Lars
> >
> >
> > 2015-05-18 13:44 GMT+02:00 Lars Johnsen yoonsen@gmail.com:
> >>
> >> Hi Christian - and thanks for fast response. Latest version 8.1.1
> >> is in use (same behaviour as previous). Let me see if I can make
> >> a self contained example.
> >>
> >> best,
> >> Lars
> >>
> >> 2015-05-18 13:40 GMT+02:00 Christian Grün
> >> christian.gruen@gmail.com:
> >>>
> >>> Hi Lars,
> >>>
> >>> hm, that's difficult to tell. All I can say is that this sounds
> >>> unusual, so I'm coming up with my standard questions: Do you
> >>> think you could build us a little example that allows us to
> >>> reproduce the problem? Have you tried the latest version of BaseX?
> >>>
> >>> Best,
> >>> Christian
> >>>
> >>>
> >>> On Mon, May 18, 2015 at 1:35 PM, Lars Johnsen yoonsen@gmail.com
> >>> wrote:
> >>> >
> >>> > I am running a web script in two identical versions (identical
> >>> > as in "cut and paste"), one via RESTXQ and one via REST. The
> >>> > response is different, and I wondered what may be the trouble.
> >>> >
> >>> > For example the output (the URLs only work locally) for
> >>> > http://ljohnsen:8984/hyphens/mellom
> >>> > is the same as
> >>> > http://ljohnsen:8984/rest?run=hyphen-show.xq&word=mellom
> >>> >
> >>> > which is a set of hyphenation data:
> >>> > mellom
> >>> > mel - lom 17005
> >>> > Mel - lom 144
> >>> > mel - lom. 50
> >>> >
> >>> > but if "mellom" is exchanged with "nasjonalbiblioteket" only
> >>> > the REST version shows any result, which then is the same as I
> >>> > get experimenting in the GUI.
> >>> >
> >>> > The actual script is added below, and which runs in both
> >>> > versions (identical apart from the rest and restxq interfaces),
> >>> > it uses full text search, but results differ when run under the
> >>> > REST-regime.
> >>> >
> >>> > All the best
> >>> > Lars G Johnsen
> >>> > National Library of Norway
> >>> >
> >>> > module namespace page = 'http://basex.org/modules/web-page';
> >>> >
> >>> > declare
> >>> >   %rest:path("/hyphens/{$word}")
> >>> >   %output:method("html")
> >>> >
> >>> > function page:show-hyphens($word) {
> >>> >   let $db := db:open('hyphen-data')
> >>> >   let $hyphens := for $hyp in $db/hyphens/hyphens[full contains text {$word}]
> >>> >                   group by $first := $hyp/first, $second := $hyp/second
> >>> >                   let $count := count($hyp)
> >>> >                   order by xs:int($count) descending
> >>> >                   return element p {
> >>> >                     attribute freq {$count},
> >>> >                     $first, " - ", $second, $count
> >>> >                   }
> >>> >
> >>> >   let $total := sum($hyphens//@freq)
> >>> >   let $div := element div {
> >>> >     element p {$word},
> >>> >     for $hyp in $hyphens
> >>> >     return element div {
> >>> >       attribute class {"hyph"},
> >>> >       attribute style {"font-size:", 1 + round(xs:int($hyp//@freq/data()) div $total, 1) || "em"},
> >>> >       $hyp
> >>> >
> >>> >     }
> >>> >   }
> >>> >   return
> >>> >     <html encoding="UTF-8">
> >>> >       <head>
> >>> >         <meta http-equiv="Content-Type" content="text/html" charset="UTF-8" />
> >>> >         <title>Orddelinger</title>
> >>> >       </head>
> >>> >       <body>{$div}
> >>> >       </body>
> >>> >     </html>
> >>> >
> >>> > };