The data is quite large, around 2 GB, which causes problems when creating the full-text (FT) index, so I have divided it into 2 parts. To give a better picture of what I have done, here is an example.
Let's say I have articles on Physics in one single DB. I divided them into 2 DBs, one for the short descriptions and one for the long descriptions. So now I have 2 DBs:
PhysicsSD PhysicsLD
With this split, I no longer get an out-of-memory (OOM) error while creating the FT index. (Wow!!)
But now, I am facing another issue...
Say I want to search for the words "emf" and "waves" in the Physics DBs. I would do this:
for $dbname in ('physicsSD', 'physicsLD')
for $x in doc($dbname)//Doc[
  SD/Info/text() contains text {"emf waves"} all words or
  LD/Info/Para/text() contains text {"emf waves"} all words
]
order by xs:integer($x/Details/Year) descending
return $x/Doc
This query retrieves the data in approx. 83000 ms (83 sec).
But when the search is executed on the INDIVIDUAL DBs, the total time is far lower: only 4500 ms (4.5 sec) for BOTH!
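For comparison, a sketch of what a combined query with statically named databases might look like. It assumes that the SD elements live only in 'physicsSD' and the LD elements only in 'physicsLD', as the split described above suggests, and it returns the matching Doc elements themselves. Whether the FT index is actually applied here is something to confirm in the query plan, not something this sketch guarantees.

```xquery
(: Sketch only: database names are written as string literals, :)
(: so the optimizer can see at compile time which DB is addressed. :)
for $x in (
  doc('physicsSD')//Doc[SD/Info/text() contains text {"emf waves"} all words],
  doc('physicsLD')//Doc[LD/Info/Para/text() contains text {"emf waves"} all words]
)
order by xs:integer($x/Details/Year) descending
return $x
```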
Hello John,
Sounds to me like the FT index isn't hit in the long-running query. Could you please check for both queries if the index is actually used? You can do so by looking at the query plan in the info view.
Could you then please provide the query plan of the misbehaving query? Also, it would be best to have at least some of your data, so that we can reproduce the problem. You can send the data to us directly, so that we don't spam the list, or in case the data should not be public.
Cheers, Dirk
On 09/02/2013 06:34 AM, John Best wrote:
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Hi John, thanks Dirk,
for $dbname in ('physicsSD', 'physicsLD')
for $x in doc($dbname)//Doc[
  SD/Info/text() contains text {"emf waves"} all words or
  LD/Info/Para/text() contains text {"emf waves"} all words
]
Due to the dynamic choice of the addressed databases (the name passed to doc() is only known at runtime), the full-text index will not be utilized. The ft:search function can be used instead [1].
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Full-Text_Module#ft:search
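A minimal sketch of how the ft:search variant might look. It assumes a BaseX version in which ft:search accepts an options map with a 'mode' entry (older releases used a different options syntax, see [1]), and it assumes Doc elements are not nested. Since ft:search returns the matching text nodes, the sketch filters them by the same paths as the original predicate and then steps up to the enclosing Doc; it returns that Doc element itself.

```xquery
for $dbname in ('physicsSD', 'physicsLD')
(: index-based lookup: text nodes containing both "emf" and "waves" :)
let $hits := ft:search($dbname, 'emf waves', map { 'mode': 'all words' })
(: keep only hits below SD/Info or LD/Info/Para, then move up to Doc; :)
(: the path expression removes duplicate Doc nodes :)
for $x in $hits[
  parent::Info/parent::SD or
  parent::Para/parent::Info/parent::LD
]/ancestor::Doc
order by xs:integer($x/Details/Year) descending
return $x
```

The order by clause still sorts across both databases, because it applies to the whole tuple stream of the FLWOR expression.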