I am getting conflicting timing results when I change the form of the search. I have a product database that is full-text indexed.
The product database looks like this:
<RECORDS>
<RECORD>
...
<PROP NAME="TAXONOMY_ID">
<PVAL>...</PVAL>
..
</RECORD>
<RECORD>
...
<PROP NAME="TAXONOMY_ID">
<PVAL>...</PVAL>
..
</RECORD>
</RECORDS>
If I run the following script, it takes 62175 ms. But if I comment out the evaluation of $test1, uncomment $test2, and run it again, it takes 1233 ms. The test1 and test2 scripts are the same, except that the first is evaluated in a FLWOR expression and the second is evaluated sequentially.
Is there any language feature that will allow me to run test1 faster?
let $pd := fn:doc ("product")
let $test1 := for $tid in ("1535","1491")
let $c := $pd/RECORDS/RECORD[PROP[@NAME eq
"TAXONOMY_ID"]/PVAL contains text {$tid}]
return $c
(:
let $exan1 := $pd//RECORD[PROP[@NAME eq "TAXONOMY_ID"]/PVAL contains text
"1395"]
let $exan2 := $pd//RECORD[PROP[@NAME eq "TAXONOMY_ID"]/PVAL contains text
"1491"]
let $test2 := ($exan1,$exan2)
:)
return
$test1
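A possible single-predicate formulation, as a sketch: XQuery Full Text accepts a sequence of search strings with the "any" option, which matches when any one of the strings is found. This folds the loop into one predicate with constant terms. Whether the optimizer then uses the full-text index, and whether the query ends up faster, is an assumption to verify with the query info. Unlike the FLWOR version, the result comes back in document order, and a record matching both IDs appears only once.

```xquery
(: Sketch only: same database and element names as in the script above. :)
let $pd := fn:doc("product")
return $pd/RECORDS/RECORD[
  (: "any" matches if at least one of the listed terms occurs in PVAL :)
  PROP[@NAME eq "TAXONOMY_ID"]/PVAL contains text { ("1535", "1491") } any
]
```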
Thanks
Erol Akarsu
I tried to delete some nodes with the "delete nodes .." XQuery Update command. It did what was requested.
I checked the database size, and it is still the same as before. I know the deleted nodes held a lot of data, and I expected the database indexes to be adjusted accordingly.
Then I exported the database as an XML file and recreated the DB. Now I can see it has its real size.
My question is: why does the DB not adjust its indexes when nodes are deleted?
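For context, a sketch of one way to reclaim the space without an export and re-import: BaseX documents an OPTIMIZE ALL command, and a matching db:optimize function in its Database Module, that rebuild all database structures and remove orphaned data. The database name below is a placeholder, and the call assumes a BaseX version that ships the Database Module.

```xquery
(: "product" is a placeholder database name; substitute the real one. :)
(: The second argument true() requests a full rebuild, like OPTIMIZE ALL. :)
db:optimize("product", true())
```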
Thanks
Erol Akarsu
On Fri, Apr 29, 2011 at 9:23 AM, Erol Akarsu <eakarsu@gmail.com> wrote:
Hi,
Have we thought about clustering BaseX servers so that we can partition XML documents?
Here, I am only interested in global partitioning:
Let's say we have an XML document like this. I would like to partition the RECORDS content so that each host has an equal number of RECORD elements. Then we need to aggregate the results from the individual hosts.
Can we implement this simple clustering framework with BaseX?
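The partitioning step alone could be sketched in XQuery as follows, assuming an XQuery 3.0 processor (for "group by"). The host count, the id attributes and the PARTITION wrapper are made up for illustration. Shipping each partition to a server and merging the per-host results is not shown.

```xquery
(: Hypothetical number of hosts; not taken from the message. :)
declare variable $hosts := 3;

(: Made-up sample input; the id attributes only make the records distinguishable. :)
let $records :=
  <RECORDS>
    <RECORD id="1"/>
    <RECORD id="2"/>
    <RECORD id="3"/>
    <RECORD id="4"/>
    <RECORD id="5"/>
  </RECORDS>
(: Round-robin assignment: record positions 1, 2, 3, ... map to hosts 0, 1, 2, 0, ... :)
for $record at $pos in $records/RECORD
group by $host := ($pos - 1) mod $hosts
order by $host
return
  <PARTITION host="{ $host }" size="{ count($record) }">{ $record }</PARTITION>
```

Round-robin assignment keeps the partition sizes within one record of each other, which matches the "equal number of RECORD elements" requirement as closely as the record count allows.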
<RECORDS>
<RECORD>
.....
</RECORD>
<RECORD>
.....
</RECORD>
</RECORDS>
On Mon, Apr 11, 2011 at 12:35 PM, Erol Akarsu <eakarsu@gmail.com> wrote:
Ok,
I was able to run full text search with another XML Database.
I am primarily interested in how BaseX copes with the big Wikipedia XML file.
Actually, creating the Wikipedia database is fine. But when I add full-text search and indexes for it, it always throws an out-of-memory error.
I have set -Xmx to 6GB, and that is still not enough to generate the indexes for Wikipedia.
Can you help me generate the indexes on a machine that has 6GB for the BaseX process?
Thanks
Erol Akarsu
On Sun, Apr 10, 2011 at 4:07 AM, Andreas Weiler <andreas.weiler@uni-konstanz.de> wrote:
Hi,
the following query could work for you:
declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
for $i in doc("enwiki-latest-pages-articles")//w:sitename
return $i[. contains text "Wikipedia"]/..
-- Andreas
On 09.04.2011 at 20:43, Erol Akarsu wrote:
Hi
I am having difficulty running full-text operators. This script returns the siteinfo element shown below:
declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
let $d := fn:doc ("enwiki-latest-pages-articles")
return ($d//w:siteinfo)[1]
But return $d//w:siteinfo[w:sitename contains text 'Wikipedia'] does NOT return the same node.
Why does the "contains text" full-text operator behave incorrectly? I remember it working fine. I just dropped and recreated the database and turned on all indexes. Can you help me?
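A sanity check, sketched under assumptions: the query below runs both forms of the predicate against a small in-memory fragment instead of the database, so no index is involved. The fragment is a made-up stand-in for the real dump. If both expressions find the siteinfo element here but the predicate form returns nothing against the database, the difference probably comes from the index-based rewrite reported in the compile log of this message ("removing path with no index results") rather than from the operator itself.

```xquery
declare namespace w = "http://www.mediawiki.org/xml/export-0.5/";

(: Made-up in-memory stand-in for the database; no index is used here. :)
let $d := document {
  <w:mediawiki>
    <w:siteinfo>
      <w:sitename>Wikipedia</w:sitename>
    </w:siteinfo>
  </w:mediawiki>
}
return (
  (: predicate form from the question :)
  count($d//w:siteinfo[w:sitename contains text 'Wikipedia']),
  (: form from the reply: filter the sitename, then step to its parent :)
  count(for $i in $d//w:sitename return $i[. contains text "Wikipedia"]/..)
)
```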
Query info is here:
Query: declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
Compiling:
- pre-evaluating fn:doc("enwiki-latest-pages-articles")
- adding text() step
- optimizing descendant-or-self step(s)
- removing path with no index results
- pre-evaluating (())[1]
- binding static variable $res
- pre-evaluating fn:doc("enwiki-latest-pages-articles")
- binding static variable $d
- adding text() step
- optimizing descendant-or-self step(s)
- removing path with no index results
- simplifying flwor expression
Result: ()
Timing:
- Parsing: 0.46 ms
- Compiling: 0.42 ms
- Evaluating: 0.17 ms
- Printing: 0.1 ms
- Total Time: 1.15 ms
Query plan:
<sequence size="0"/>
<siteinfo xmlns="http://www.mediawiki.org/xml/export-0.5/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<sitename>Wikipedia</sitename>
<base>http://en.wikipedia.org/wiki/Main_Page</base>
<generator>MediaWiki 1.17wmf1</generator>
<case>first-letter</case>
<namespaces>
<namespace key="-2" case="first-letter">Media</namespace>
<namespace key="-1" case="first-letter">Special</namespace>
<namespace key="0" case="first-letter"/>
<namespace key="1" case="first-letter">Talk</namespace>
<namespace key="2" case="first-letter">User</namespace>
<namespace key="3" case="first-letter">User talk</namespace>
<namespace key="4" case="first-letter">Wikipedia</namespace>
<namespace key="5" case="first-letter">Wikipedia talk</namespace>
<namespace key="6" case="first-letter">File</namespace>
<namespace key="7" case="first-letter">File talk</namespace>
<namespace key="8" case="first-letter">MediaWiki</namespace>
<namespace key="9" case="first-letter">MediaWiki talk</namespace>
<namespace key="10" case="first-letter">Template</namespace>
<namespace key="11" case="first-letter">Template talk</namespace>
<namespace key="12" case="first-letter">Help</namespace>
<namespace key="13" case="first-letter">Help talk</namespace>
<namespace key="14" case="first-letter">Category</namespace>
<namespace key="15" case="first-letter">Category talk</namespace>
<namespace key="100" case="first-letter">Portal</namespace>
<namespace key="101" case="first-letter">Portal talk</namespace>
<namespace key="108" case="first-letter">Book</namespace>
<namespace key="109" case="first-letter">Book talk</namespace>
</namespaces>
</siteinfo>
On Mon, Apr 4, 2011 at 7:31 AM, Erol Akarsu <eakarsu@gmail.com> wrote:
I imported the Wikipedia XML into BaseX and tried to search it.
But searching takes a long time.
I tried to search for one element that is the first child of the whole document, and it took 52 sec.
I know the XML file is very big (31GB). How can I optimize the search?
declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
let $d := fn:doc ("enwiki-latest-pages-articles")//w:siteinfo
return $d
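One thing that might be worth trying, as a sketch: replace the descendant step with explicit child steps, so the path names each level instead of searching the whole tree. This assumes the root element of the dump is w:mediawiki, which the messages do not show, and it is not confirmed that this avoids the long compile time.

```xquery
declare namespace w = "http://www.mediawiki.org/xml/export-0.5/";

(: Assumption: w:mediawiki is the root element of the imported dump. :)
(: Child steps only; no descendant-or-self (//) traversal is requested. :)
fn:doc("enwiki-latest-pages-articles")/w:mediawiki/w:siteinfo
```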
Database info:
> open enwiki-latest-pages-articles
Database 'enwiki-latest-pages-articles' opened in 778.49 ms.
> info database
Database Properties
Name: enwiki-latest-pages-articles
Size: 23356 MB
Nodes: 228090153
Height: 6
Database Creation
Path: /mnt/hgfs/C/tmp/enwiki-latest-pages-articles.xml
Time Stamp: 03.04.2011 12:29:15
Input Size: 30025 MB
Encoding: UTF-8
Documents: 1
Whitespace Chopping: ON
Entity Parsing: OFF
Indexes
Up-to-date: true
Path Summary: ON
Text Index: ON
Attribute Index: ON
Full-Text Index: OFF
>
Timing info:
Query: declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
Compiling:
- pre-evaluating fn:doc("enwiki-latest-pages-articles")
- optimizing descendant-or-self step(s)
- binding static variable $d
- removing variable $d
- simplifying flwor expression
Result: element siteinfo { ... }
Timing:
- Parsing: 1.4 ms
- Compiling: 52599.0 ms
- Evaluating: 0.28 ms
- Printing: 0.62 ms
- Total Time: 52601.32 ms
Query plan:
<DBNode name="enwiki-latest-pages-articles" pre="5"/>
Result of query:
<siteinfo xmlns="http://www.mediawiki.org/xml/export-0.5/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<sitename>Wikipedia</sitename>
<base>http://en.wikipedia.org/wiki/Main_Page</base>
<generator>MediaWiki 1.17wmf1</generator>
<case>first-letter</case>
<namespaces>
<namespace key="-2" case="first-letter">Media</namespace>
<namespace key="-1" case="first-letter">Special</namespace>
<namespace key="0" case="first-letter"/>
<namespace key="1" case="first-letter">Talk</namespace>
<namespace key="2" case="first-letter">User</namespace>
<namespace key="3" case="first-letter">User talk</namespace>
<namespace key="4" case="first-letter">Wikipedia</namespace>
<namespace key="5" case="first-letter">Wikipedia talk</namespace>
<namespace key="6" case="first-letter">File</namespace>
<namespace key="7" case="first-letter">File talk</namespace>
<namespace key="8" case="first-letter">MediaWiki</namespace>
<namespace key="9" case="first-letter">MediaWiki talk</namespace>
<namespace key="10" case="first-letter">Template</namespace>
<namespace key="11" case="first-letter">Template talk</namespace>
<namespace key="12" case="first-letter">Help</namespace>
<namespace key="13" case="first-letter">Help talk</namespace>
<namespace key="14" case="first-letter">Category</namespace>
<namespace key="15" case="first-letter">Category talk</namespace>
<namespace key="100" case="first-letter">Portal</namespace>
<namespace key="101" case="first-letter">Portal talk</namespace>
<namespace key="108" case="first-letter">Book</namespace>
<namespace key="109" case="first-letter">Book talk</namespace>
</namespaces>
</siteinfo>
Thanks
Erol Akarsu
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk