Greetings,
I need to generate a list of all the distinct values within an index.
Is there a way to retrieve a sequence of all values within an attribute index?
Regards Alex tech.jahtoe.com bafila.jahtoe.com
Is there a way to retrieve a sequence of all values within an attribute index?
Yep: http://docs.basex.org/wiki/Index_Module#index:attributes
Any way to retrieve the index for a specific attribute name?
This returns the whole index:
xquery version "3.0";
declare namespace file = "http://expath.org/ns/file";
declare namespace index = "http://basex.org/modules/index";

let $periods := index:attributes("13F")
return file:write('/var/www/appusec3.jahtoe.com/xml/periods.xml', $periods)
<entry count="57">000032</entry>
<entry count="48">000033</entry>
<entry count="48">000034</entry>
<entry count="38">000035</entry>
<entry count="40">000036</entry>
<entry count="35">000037</entry>
I specified four attributes to be included in the index:
<commands>
  <set option='attrinclude'>accno,filingDate,form13FFileNumber,periodOfReport</set>
  <create-db name='13F'/>
  <info-db/>
</commands>
I only need the index values for periodOfReport.
Regards Alex tech.jahtoe.com bafila.jahtoe.com
On Mon, Jul 11, 2016 at 5:42 PM, Christian Grün christian.gruen@gmail.com wrote:
any way to retrieve the index for a specific attribute name?
Nope, sorry. The index itself has no information on the location of the text and attribute values. You’ll have to use distinct-values:
distinct-values(//@periodOfReport)
If the number of distinct values is smaller than MAXCATS [1], the path index will be utilized to speed up your query [2]. You can set MAXCATS to a much larger value, but this might increase the time required to open a database.
Hope this helps Christian
[1] http://docs.basex.org/wiki/Options#MAXCATS
[2] http://docs.basex.org/wiki/Indexes#Path_Index
Okay thanks
I wrote the following query, which returns 59 distinct periods from an 8 GB database. It's quite slow, but it works:
let $periods := distinct-values(db:open("13F")//data/@periodOfReport)
let $transform :=
  <periods>{
    for $period in $periods
    return <period>{$period}</period>
  }</periods>
return file:write('/var/www/appusec3.jahtoe.com/xml/periods.xml', $transform)
Regards Alex tech.jahtoe.com bafila.jahtoe.com
On Mon, Jul 11, 2016 at 6:21 PM, Christian Grün christian.gruen@gmail.com wrote:
I wrote the following query which returns 59 distinct periods from an 8 GB db. It's quite slow but it works
Ah, well… I guess that all the values are numeric? In that case, only the min and max values will be stored in the statistics (and that won’t help you in fact). Bad luck. You can call index:facets("13F") to get more insight… Maybe we can fix that in future and store distinct numbers as well.
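As a rough illustration of the index:facets call mentioned above, the sketch below narrows the statistics down to the periodOfReport entries. The "flat" argument and the @name attribute on the returned nodes are assumptions about the function's output, so check them against the Index Module documentation for your BaseX version.

```xquery
declare namespace index = "http://basex.org/modules/index";

(: "13F" is the database name used in this thread. :)
let $facets := index:facets("13F", "flat")
(: Assumption: each facet node has a @name attribute holding the
   element or attribute name. If it does not, this returns nothing. :)
return $facets//*[@name = "periodOfReport"]
```

If the returned node only lists a minimum and a maximum, the distinct values are not in the statistics and distinct-values has to scan the data.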
I have a similar situation in which I want to get all distinct values of a specific attribute. I’ve tried two different approaches: group by and distinct-values. On small or medium-sized databases, group by tends to be faster. On large databases, however, both approaches time out for me. I’m looking for a way to optimize this query:
distinct-values(for $db in db:list() return distinct-values(db:open($db)//@sec-type))
Thanks, Vincent
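For comparison, the group by variant of the query above might look roughly like the sketch below. It is only an assumption about what that variant looked like, since the original group by query was not posted. Like the distinct-values version, it walks every @sec-type attribute in every database, so it is not expected to avoid the timeout on its own.

```xquery
declare namespace db = "http://basex.org/modules/db";

(: Collect @sec-type attributes from all databases and
   return each distinct string value once. :)
for $attr in db:list() ! db:open(.)//@sec-type
group by $value := string($attr)
order by $value
return $value
```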
On Mon, Jul 11, 2016 at 7:17 PM, Lizzi, Vincent <Vincent.Lizzi@taylorandfrancis.com> wrote:
I have a similar situation in which I want to get all distinct values of a specific attribute. I’ve tried using 2 different approaches: group and distinct-values. On small or medium size databases group tends to be faster. When trying to get distinct values of a specific attribute from large databases however both approaches are timing out for me. I’m looking for a way to optimize this query:
distinct-values(for $db in db:list() return distinct-values(db:open($db)//@sec-type))
With the logic currently available, it looks possible, given an attribute index on sec-type, to add a prefix to the attribute value before insertion into the database, e.g. sec-type="type:13F" with the prefix type:, and then use index:attributes("dbname", "type:") to get a distinct list of types, albeit with a prefix that would need extra handling when using or querying that data.
Regards Alex tech.jahtoe.com bafila.jahtoe.com
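A minimal sketch of that prefix workaround, assuming the values were stored as sec-type="type:..." and the database has an attribute index. The database name "dbname" and the prefix "type:" are placeholders, and index:attributes is used here because the values live in the attribute index rather than the text index.

```xquery
declare namespace index = "http://basex.org/modules/index";

(: Hypothetical names: database "dbname", value prefix "type:". :)
for $entry in index:attributes("dbname", "type:")
(: Each index entry holds one distinct value; strip the prefix again. :)
return substring-after(string($entry), "type:")
```

The prefix filter would also match any other indexed attribute whose value happens to start with "type:", so the prefix should be chosen to be unique to sec-type.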
Hi Vincent, hi Alex,
I am glad to report that with BaseX 8.6 the distinct values of numeric elements and attributes will also be stored in the index. You are invited to check out the latest stable snapshot [1].
Cheers, Christian
[1] http://files.basex.org/releases/latest/
Hi Christian,
Thank you!!! I will have a look.
Vincent
From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Sunday, October 09, 2016 4:49 PM To: Alex Muir alex.g.muir@gmail.com Cc: Lizzi, Vincent Vincent.Lizzi@taylorandfrancis.com; BaseX basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] retrieve a sequence of all values within an attribute index
basex-talk@mailman.uni-konstanz.de