Hi all.
I've got another question about attribute indexing. The function index:attributes() returns entries like
... <entry count="1">id152429</entry> ...
but how can I get the objects in my database that contain these indexed values?
Markin Alex.
Hi Alex,
> but how can I get objects in my data base that contain these indexes?
Just use ordinary XPath/XQuery expressions, such as:
db:open('db')//*[@* = 'id152429']
You can also directly retrieve text and attribute nodes, e.g. as follows [1]:
db:attribute('db', 'id152429')
Hope this helps, Christian
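To go from the values listed by index:attributes() to the elements that carry them, the two functions above can be chained. A minimal sketch, assuming a database named 'db' (a placeholder) with an up-to-date attribute index; on a large index this returns many nodes, so in practice you would restrict the loop:

```xquery
(: 'db' is a placeholder database name; it needs an attribute index. :)
for $entry in index:attributes('db')
(: the string value of each index entry is the indexed attribute value :)
let $value := string($entry)
(: db:attribute() fetches the matching attribute nodes via the index,
   and '..' steps up to the elements that own them :)
return db:attribute('db', $value)/..
```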
Thanks, but I don't think this helps.
The problem is that I need a function like
declare function t:getIdByName($arg) {
  let $x := collection($dbName)/module//object[@name = $arg]
  return if (empty($x/@id)) then "-1" else $x/@id
};
But as I understand it, BaseX doesn't perform runtime query optimization, which is why this variant takes too long. I thought indexes could help to solve this problem.
> declare function t:getIdByName($arg) { let $x := collection($dbName)/module//object[@name = $arg] return if( empty($x/@id) ) then "-1" else $x/@id };
Your query will be optimized if $dbName is globally declared (which I guess it is anyway), and if $arg is declared as xs:string:
declare variable $dbName := '...';
declare function local:getIdByName($arg as xs:string) {
  let $x := collection($dbName)/module//object[@name = $arg]
  return if (empty($x/@id)) then "-1" else $x/@id
};
You can also use db:attribute() to access the index explicitly (if available):
declare variable $dbName := '...';
declare function local:getIdByName($arg) {
  let $x := db:attribute($dbName, $arg, 'name')/parent::object[ancestor::module]
  return if (empty($x/@id)) then "-1" else $x/@id
};
Hope this helps, Christian
I tried it this way; here is the query I ran:
declare namespace lang = "http://s3r.ru/ns/code-language";
declare namespace t = "none";
declare variable $dbName := "test2";
declare function t:getIdByName($arg as xs:string) {
  let $x := db:attribute($dbName, $arg, 'name')/parent::object[ancestor::module]
  return if (empty($x/@id)) then "-1" else $x/@id
};
for $x in collection($dbName)/module//flow/argument/atom[@lang:tp = "function_call_name"]/@name
let $id := t:getIdByName($x)
return $id
The collection 'test2' contains 100 documents (a very small collection). The query took 2017156.38 ms (about 34 minutes). This is too slow :(
...that's indeed slow. Some questions that might help:
-- did you try both queries that I sent you?
-- did you have a look at the query plan (e.g. in the Info View), and was the index chosen? If not, did you try to optimize your database, or rebuild the index structures?
-- feel free to send us your test data (you may send it directly to us); we'll have a look at it soon.
-- Yes, both variants are slow.
-- I looked at the query plan but didn't find any information about indexing.
The database was created as follows:
$ ./basex -Vc"SET ATTRINDEX true;SET UPDINDEX true;CREATE DB test2"
# loop, adding documents
$ ./basex -Vc"OPEN test2;CREATE INDEX ATTRIBUTE;OPTIMIZE"
Then I ran a query that inserts an 'id' attribute; no 'name' attribute was changed. But running the query
for $i in index:attributes('test2', 'name') return $i
is still fast (and the query plan shows no information about indexing).
Then I ran the same queries against a backup of the database taken before the attributes were inserted:
declare namespace lang = "http://s3r.ru/ns/code-language";
declare namespace t = "none";
declare variable $dbName := "test2";
declare function t:getIdByName($arg as xs:string) {
  let $x := collection($dbName)/module//object[@name = $arg]
  return if (empty($x/@id)) then "-1" else $x/@id
};
for $x in collection($dbName)/module//flow/argument/atom[@lang:tp = "function_call_name"]/@name
let $id := t:getIdByName($x)
return $id
It was slow (1958006.4 ms, about 33 minutes), but the query plan contained these lines:
Compiling:
...
- applying attribute index
...
Attached you can find the XML files I put in the collection. This is the script I used for updating the attributes:
declare namespace lang = "http://s3r.ru/ns/code-language";
declare namespace t = "none";
for $x in collection('test2')/module//object[@lang:tp = "function_body"]
return insert node (attribute { 'id' } { generate-id($x) }) into $x
Hi Alex,
Thanks for the reproducible test files. It seems that your main loop returns many duplicates. The attached query should be processed much faster (on my machine, it takes 500 ms). I didn't work with updindex=true, though.
> $ ./basex -Vc"SET ATTRINDEX true;SET UPDINDEX true;CREATE DB test2"
> # loop, adding documents
Assuming that all files to be imported are located in a local directory: did you try to add all documents in one go?
basex -Vc"CREATE DB test2 /path/to/back"
> $ ./basex -Vc"OPEN test2;CREATE INDEX ATTRIBUTE;OPTIMIZE"
By the way, there is no need to create the attribute index explicitly, as OPTIMIZE does this automatically. You could also try running "OPTIMIZE ALL", which will rebuild the "updindex" data structures.
Christian
...and a little update: you should get similar performance with updindex=true.
The main bottleneck of your query was that ~20,000 index requests were performed, most of which requested a particular index entry with a large result set. With distinct-values(), there are still around 1,500 index requests, but the expensive index entry from the first query is requested only once, so the query runs much faster (in this case, an index request takes around 0.3 ms on average).
Hope this helps, Christian
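The attached query is not reproduced in the thread. A minimal sketch of the distinct-values() idea, assuming the 'test2' database and the element names from Alex's query above (the <call> result element is made up for illustration), might look like this:

```xquery
declare namespace lang = "http://s3r.ru/ns/code-language";
declare variable $dbName := "test2";

(: Each distinct call name triggers one index request,
   instead of one request per call site. :)
for $name in distinct-values(
  collection($dbName)/module//flow/argument/atom[@lang:tp = "function_call_name"]/@name
)
let $x := db:attribute($dbName, string($name), 'name')/parent::object[ancestor::module]
(: <call> is a hypothetical result element; several ids are space-separated :)
return <call name="{ $name }" id="{ if (empty($x/@id)) then '-1' else $x/@id }"/>
```

Keeping the name next to the ids means the result can still be joined back to the individual call sites afterwards.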
Thank you very much for the answer.
That does not solve my problem, but you gave me a few ideas. It would be really great if you made it possible to get a node via its index entry, for example like here: http://www.sedna.org/progguide/ProgGuidesu5.html#x9-310002.2.2.
Best wishes, Markin Alex.
Well, my problem is the following: parse the source code, assign ids to function implementations, and connect these ids with the function calls. The difficulty is that different functions can have the same name (function overloading), so using distinct-values() excludes overloaded functions.
Now I understand that I was most likely going the wrong way altogether.
Markin Alex.
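Grouping instead of deduplicating keeps the overloads: one lookup per distinct name still returns every object with that name, and each group remembers how many calls it came from. A self-contained sketch with made-up sample data (element and attribute names mirror the queries in this thread, the values are hypothetical); choosing the right overload for a particular call, e.g. by comparing argument lists, is not covered here:

```xquery
declare namespace lang = "http://s3r.ru/ns/code-language";

(: Hypothetical sample data: two overloads of f, one g, three calls. :)
declare variable $module :=
  <module>
    <object name="f" id="f1"/>
    <object name="f" id="f2"/>
    <object name="g" id="g1"/>
    <flow><argument><atom lang:tp="function_call_name" name="f"/></argument></flow>
    <flow><argument><atom lang:tp="function_call_name" name="f"/></argument></flow>
    <flow><argument><atom lang:tp="function_call_name" name="g"/></argument></flow>
  </module>;

for $call in $module//flow/argument/atom[@lang:tp = "function_call_name"]
let $name := string($call/@name)
(: after grouping, $call holds all calls that share this name :)
group by $name
(: one lookup per distinct name; against the database this would be
   db:attribute($dbName, $name, 'name')/parent::object[ancestor::module] :)
let $ids := $module//object[@name = $name]/@id/string()
order by $name
return <calls name="{ $name }" count="{ count($call) }" ids="{ $ids }"/>
```

With this data, both objects named f come back in a single group for the two calls to f, so no overload is lost while each name is still looked up only once.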
2012/6/8 Christian Grün christian.gruen@gmail.com
> That does not solve my problem but, you gave me few ideas. Really, it would be great, if you make the possibility of getting node by index for example like here.
I wonder what that would look like. And next: what's the problem you haven't been able to solve yet?
Christian
basex-talk@mailman.uni-konstanz.de