Is there a way the BaseX server can send us JSON output instead of XML?
Thanks
On Wed, May 11, 2011 at 10:28 AM, Erol Akarsu <eakarsu@gmail.com> wrote:
I am filtering one big XML database (about 800 MB) that is already stored in BaseX.
When I write the filtered data to a file, I get an out-of-memory error.
I believe BaseX builds the complete filtered result document in memory before writing it, and that is where the memory error occurs.
Can BaseX write results block by block?
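For illustration, one workaround sketch: if the filtered result is a flat sequence of RECORD elements, it can be exported in positional batches, one run per batch, so that only one batch is materialized at a time. The database name, filter predicate, and batch size below are hypothetical:

declare variable $batch as xs:integer external;

(: hypothetical batch export: run with $batch = 1, 2, 3, ... and
   append each result to the output file; only 10000 records are
   serialized per run :)
let $records := doc("mydb")//RECORD[name]
return subsequence($records, ($batch - 1) * 10000 + 1, 10000)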
Thanks
On Wed, May 4, 2011 at 9:38 AM, Erol Akarsu <eakarsu@gmail.com> wrote:
I tried to delete some nodes with the "delete nodes ..." XQuery Update command. It actually did what was requested.
But the database size is still the same as before. The deleted nodes contained a lot of data, and I expected the database and its indexes to be adjusted accordingly.
Then I exported the database as an XML file and recreated the database; the new one has the expected size.
My question is: why does the database not adjust its indexes when nodes are deleted?
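For illustration, a console sketch of one way to refresh the database after updates, assuming the OPTIMIZE command (which rebuilds the index structures; whether disk space is also reclaimed may depend on the BaseX version). The database name and predicate are hypothetical:

> OPEN mydb
> XQUERY delete nodes //RECORD[@obsolete = 'true']
> OPTIMIZE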
Thanks
Erol Akarsu
On Fri, Apr 29, 2011 at 9:23 AM, Erol Akarsu <eakarsu@gmail.com> wrote:
Hi,
Has any thought been given to clustering BaseX servers so that we can partition XML documents?
Here, I am only interested in global partitioning:
Say we have an XML document like the one below. I would like to partition the RECORDS content so that each host holds an equal number of RECORD elements; the results of the individual hosts then need to be aggregated. Can we implement this simple clustering framework with BaseX?
<RECORDS>
  <RECORD> ..... </RECORD>
  <RECORD> ..... </RECORD>
</RECORDS>
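As a sketch of the global partitioning step (variable names are hypothetical; each host would run the same query with its own $host value, and an aggregation layer would merge the per-host results):

declare variable $host  as xs:integer external;  (: this host, 1 .. $hosts :)
declare variable $hosts as xs:integer external;  (: total number of hosts :)

(: round-robin partition: host $host keeps every $hosts-th RECORD :)
for $r at $pos in doc("records")//RECORD
where ($pos - 1) mod $hosts = $host - 1
return $r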
On Mon, Apr 11, 2011 at 12:35 PM, Erol Akarsu <eakarsu@gmail.com> wrote:
Ok,
I was able to run full-text search with another XML database.
I am primarily interested in how BaseX handles a big XML file such as the Wikipedia dump.
Creating the database from the Wikipedia dump works fine, but when I add full-text search and its indexes, it always throws an out-of-memory exception.
I have raised -Xmx to 6 GB, and that is still not enough to generate the indexes for Wikipedia.
Can you help me generate the indexes on a machine that gives the BaseX process 6 GB?
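For concreteness, a sketch of the invocation to try (the jar path is an assumption; org.basex.BaseX is the standalone console, and CREATE INDEX FULLTEXT builds only the full-text index, so the memory-hungry step runs on its own):

java -Xmx6g -cp basex.jar org.basex.BaseX
> OPEN enwiki-latest-pages-articles
> CREATE INDEX FULLTEXT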
Thanks
Erol Akarsu
On Sun, Apr 10, 2011 at 4:07 AM, Andreas Weiler <andreas.weiler@uni-konstanz.de> wrote:
Hi,
the following query could work for you:
declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
for $i in doc("enwiki-latest-pages-articles")//w:sitename
return $i[. contains text "Wikipedia"]/..
-- Andreas
On 09.04.2011 at 20:43, Erol Akarsu wrote:
Hi
I am having difficulty running the full-text operators. This script returns the siteinfo element shown below:

declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
let $d := fn:doc("enwiki-latest-pages-articles")
return ($d//w:siteinfo)[1]

But

return $d//w:siteinfo[w:sitename contains text 'Wikipedia']

does NOT return the same node. Why does the "contains text" full-text operator behave incorrectly? I remember it working fine. I just dropped and recreated the database and turned on all indexes. Can you help me? The query info is here:
Query: declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
Compiling:
- pre-evaluating fn:doc("enwiki-latest-pages-articles")
- adding text() step
- optimizing descendant-or-self step(s)
- removing path with no index results
- pre-evaluating (())[1]
- binding static variable $res
- pre-evaluating fn:doc("enwiki-latest-pages-articles")
- binding static variable $d
- adding text() step
- optimizing descendant-or-self step(s)
- removing path with no index results
- simplifying flwor expression
Result: ()
Timing:
- Parsing: 0.46 ms
- Compiling: 0.42 ms
- Evaluating: 0.17 ms
- Printing: 0.1 ms
- Total Time: 1.15 ms
Query plan:
<sequence size="0"/>
<siteinfo xmlns="http://www.mediawiki.org/xml/export-0.5/"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <sitename>Wikipedia</sitename>
  <base>http://en.wikipedia.org/wiki/Main_Page</base>
  <generator>MediaWiki 1.17wmf1</generator>
  <case>first-letter</case>
  <namespaces>
    <namespace key="-2" case="first-letter">Media</namespace>
    <namespace key="-1" case="first-letter">Special</namespace>
    <namespace key="0" case="first-letter"/>
    <namespace key="1" case="first-letter">Talk</namespace>
    <namespace key="2" case="first-letter">User</namespace>
    <namespace key="3" case="first-letter">User talk</namespace>
    <namespace key="4" case="first-letter">Wikipedia</namespace>
    <namespace key="5" case="first-letter">Wikipedia talk</namespace>
    <namespace key="6" case="first-letter">File</namespace>
    <namespace key="7" case="first-letter">File talk</namespace>
    <namespace key="8" case="first-letter">MediaWiki</namespace>
    <namespace key="9" case="first-letter">MediaWiki talk</namespace>
    <namespace key="10" case="first-letter">Template</namespace>
    <namespace key="11" case="first-letter">Template talk</namespace>
    <namespace key="12" case="first-letter">Help</namespace>
    <namespace key="13" case="first-letter">Help talk</namespace>
    <namespace key="14" case="first-letter">Category</namespace>
    <namespace key="15" case="first-letter">Category talk</namespace>
    <namespace key="100" case="first-letter">Portal</namespace>
    <namespace key="101" case="first-letter">Portal talk</namespace>
    <namespace key="108" case="first-letter">Book</namespace>
    <namespace key="109" case="first-letter">Book talk</namespace>
  </namespaces>
</siteinfo>
On Mon, Apr 4, 2011 at 7:31 AM, Erol Akarsu <eakarsu@gmail.com> wrote:
I imported the Wikipedia XML into BaseX and tried to search it.
But searching takes a long time.
I tried to retrieve one element that is the first child of the whole document, and it took 52 seconds. I know the XML file is very big (31 GB). How can I optimize the search?
declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
let $d := fn:doc("enwiki-latest-pages-articles")//w:siteinfo
return $d
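One variant worth timing, as a sketch: it assumes the document root is w:mediawiki, as in MediaWiki export files, so the engine may take direct child steps instead of scanning all 228 million nodes with a descendant step:

declare namespace w = "http://www.mediawiki.org/xml/export-0.5/";
doc("enwiki-latest-pages-articles")/w:mediawiki/w:siteinfo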
Database info:
> open enwiki-latest-pages-articles
Database 'enwiki-latest-pages-articles' opened in 778.49 ms.
> info database
Database Properties
  Name: enwiki-latest-pages-articles
  Size: 23356 MB
  Nodes: 228090153
  Height: 6
Database Creation
  Path: /mnt/hgfs/C/tmp/enwiki-latest-pages-articles.xml
  Time Stamp: 03.04.2011 12:29:15
  Input Size: 30025 MB
  Encoding: UTF-8
  Documents: 1
  Whitespace Chopping: ON
  Entity Parsing: OFF
Indexes
  Up-to-date: true
  Path Summary: ON
  Text Index: ON
  Attribute Index: ON
  Full-Text Index: OFF
Timing info:
Query: declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
Compiling:
- pre-evaluating fn:doc("enwiki-latest-pages-articles")
- optimizing descendant-or-self step(s)
- binding static variable $d
- removing variable $d
- simplifying flwor expression
Result: element siteinfo { ... }
Timing:
- Parsing: 1.4 ms
- Compiling: 52599.0 ms
- Evaluating: 0.28 ms
- Printing: 0.62 ms
- Total Time: 52601.32 ms
Query plan:
<DBNode name="enwiki-latest-pages-articles" pre="5"/>
Result of query:
<siteinfo xmlns="http://www.mediawiki.org/xml/export-0.5/"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <sitename>Wikipedia</sitename>
  <base>http://en.wikipedia.org/wiki/Main_Page</base>
  <generator>MediaWiki 1.17wmf1</generator>
  <case>first-letter</case>
  <namespaces>
    <namespace key="-2" case="first-letter">Media</namespace>
    <namespace key="-1" case="first-letter">Special</namespace>
    <namespace key="0" case="first-letter"/>
    <namespace key="1" case="first-letter">Talk</namespace>
    <namespace key="2" case="first-letter">User</namespace>
    <namespace key="3" case="first-letter">User talk</namespace>
    <namespace key="4" case="first-letter">Wikipedia</namespace>
    <namespace key="5" case="first-letter">Wikipedia talk</namespace>
    <namespace key="6" case="first-letter">File</namespace>
    <namespace key="7" case="first-letter">File talk</namespace>
    <namespace key="8" case="first-letter">MediaWiki</namespace>
    <namespace key="9" case="first-letter">MediaWiki talk</namespace>
    <namespace key="10" case="first-letter">Template</namespace>
    <namespace key="11" case="first-letter">Template talk</namespace>
    <namespace key="12" case="first-letter">Help</namespace>
    <namespace key="13" case="first-letter">Help talk</namespace>
    <namespace key="14" case="first-letter">Category</namespace>
    <namespace key="15" case="first-letter">Category talk</namespace>
    <namespace key="100" case="first-letter">Portal</namespace>
    <namespace key="101" case="first-letter">Portal talk</namespace>
    <namespace key="108" case="first-letter">Book</namespace>
    <namespace key="109" case="first-letter">Book talk</namespace>
  </namespaces>
</siteinfo>
Thanks
Erol Akarsu
> Is there a way the BaseX server can send us JSON output instead of XML?
Good point; it has been added to our issue list (feel free to extend it):
https://github.com/BaseXdb/basex/issues/14
It is currently being discussed by the W3C whether the official standard will include support for JSON.
Christian
Christian,
Do we have a timeline for when JSON output can be added to the BaseX server?
Thanks
Erol Akarsu
> Do we have a timeline for when JSON output can be added to the BaseX server?
Dear Erol,
sorry for the delayed feedback. We first try to fix those GitHub issues that are marked as bugs. The prioritization of the remaining features is mainly influenced by the interests of our paying customers and the number of users that vote for certain features.
Hope this helps, Christian
Hi Erol,
I had exactly the same question some time ago. As far as I know, BaseX has no "native" JSON output support. I was advised to use either a generic XSLT transformation that converts XML into JSON, or a Java library that supports this and can be called as an imported function (I do not remember which one; try searching the archive or finding your own - there used to be instructions on how to use functions from jar archives as XQuery functions, but I always have trouble finding them).
My real-life project ended up with me writing my own XQuery that generates text output. Any generic approach makes assumptions, and you quickly find that some parts of your system do not fit them. My task was to generate JSON output usable by the SIMILE framework, so I had to follow the JSON structure that SIMILE expected.
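A minimal sketch of that hand-rolled style (the element names are hypothetical, and it assumes the text values contain no quotes or backslashes that would need escaping):

declare function local:record-json($r as element(record)) as xs:string {
  (: build one JSON object per record by plain string concatenation :)
  concat('{"name":"', $r/name, '","price":', $r/price, '}')
};

concat("[",
  string-join(for $r in doc("catalog")//record
              return local:record-json($r), ","),
"]")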
Good luck
Jan
Ing. Jan Vlčinský
CAD programy
Slunečnicová 338/3, 734 01 Karviná Ráj, Czech Republic
tel: +420-597 602 024; mob: +420-608 979 040
skype: janvlcinsky; GoogleTalk: jan.vlcinsky@gmail.com
http://cz.linkedin.com/in/vlcinsky
2011/5/13 Erol Akarsu <eakarsu@gmail.com>:
> Is there a way the BaseX server can send us JSON output instead of XML?
Hi Jan, Hi Erol,
Maybe you could give https://github.com/douglascrockford/JSON-java a try; it looks both clean & simple :-)
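An untested sketch of calling it from XQuery via the BaseX Java bindings (it assumes the JSON-java classes are on the classpath; instance methods receive the object as their first argument):

declare namespace json-xml = "java:org.json.XML";
declare namespace json-obj = "java:org.json.JSONObject";

(: convert an XML snippet to a JSON string via org.json.XML :)
let $object := json-xml:toJSONObject("<item><name>Pen</name></item>")
return json-obj:toString($object)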
Kind regards,
Michael
Jan,
Thanks for sharing your experience.
I am thinking of using BaseX as the database for a product/services catalog and displaying the items in an HTML document. I think jQuery will be more comfortable with JSON input, which is also lightweight.
I have not compared the complexity of parsing XML and JSON. Do you have any experience?
Thanks
Erol Akarsu
Hi Erol and others.
1. A description of how to use existing Java functions as XQuery functions is here: http://docs.basex.org/wiki/Java_Bindings
2. An old thread about JSON is here: https://mailman.uni-konstanz.de/pipermail/basex-talk/2011-January/001076.htm...
3. I compared different formats for load and parsing time, and I was really surprised how efficient the JSON format is in this regard. So providing the data in JSON format is likely to be more efficient for your (Erol's) use case.
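For reference, the minimal calling pattern from that page looks like this (java.lang.Math is used purely for illustration):

declare namespace math = "java:java.lang.Math";
math:sqrt(2)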
And thanks to Michael for pointing to the JSON implementation for Java.
Jan