Hi Wiard,
> But what if I want what is in the 'idno'? <idno type="jlb">012</idno>
> Or: <vg:placeLet>London</vg:placeLet>
This should work:
  declare namespace ns = "http://www.tei-c.org/ns/1.0"; //ns:idno
... and this:
  declare namespace ns = "http://www.vangoghletters.org/ns/"; //ns:placeLet
Concerning your last email, I guess you already figured it out yourself, right? Otherwise, just let me know.
Have a nice day, regards, Lukas
On Fri, May 20, 2011 at 10:26 AM, Wiard Vasen wiard.vasen@gmail.com wrote:
Hi Lukas,
//*:fileDesc works and gives the whole content of the file back, starting with the term fileDesc in 'brackets'.
When I use //*:sourceDesc or //*:note, I get much the same result.
But what if I want what is in the 'idno'? <idno type="jlb">012</idno>
Or: <vg:placeLet>London</vg:placeLet>
How do I get these results?
Thanks in advance.
Regards,
Wiard
2011/5/20 Wiard Vasen wiard.vasen@gmail.com
Hey Lukas,
thanks a lot for your help! I will have a look at your example.
Regards,
Wiard
2011/5/20 Lukas Kircher lukaskircher1@googlemail.com
Hi Wiard,
being neither Christian nor Andreas, I'll nevertheless take a shot :)
It's kind of hard to give a hint knowing so little about your case, but I think your problem might be related to namespaces.
As the root element 'TEI' declares a namespace, all descendants lie in this namespace (unless they declare another one or are linked to a prefix).
Short example: if you want to query a letDesc element, you have to specify the namespace it is linked to. Just declare the prefix at the beginning of your query.
  declare namespace ns = "http://www.vangoghletters.org/ns/"; //ns:letDesc
You can also use namespace wildcards to access nodes without a specific prefix, for example the 'fileDesc' element:
  //*:fileDesc
Hope this helps you a little - don't hesitate to ask for more.
I also included a link to a short discussion [1]. I'm afraid I can't provide you with a better tutorial about namespaces - they are a rare thing.
Regards, Lukas
[1] http://www.stylusstudio.com/xquerytalk/200608/001654.html
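To make the namespace point concrete, here is a minimal sketch. The inline document is a made-up stand-in for one letter: only the two namespace URIs and the element names idno and placeLet come from this thread, and the prefixes tei and vg are arbitrary choices.

```xquery
declare namespace tei = "http://www.tei-c.org/ns/1.0";
declare namespace vg  = "http://www.vangoghletters.org/ns/";

(: Hypothetical miniature letter; the real files contain far more. :)
let $doc := document {
  <TEI xmlns="http://www.tei-c.org/ns/1.0"
       xmlns:vg="http://www.vangoghletters.org/ns/">
    <idno type="jlb">012</idno>
    <vg:placeLet>London</vg:placeLet>
  </TEI>
}
return (
  count($doc//idno),             (: 0, an unprefixed name test looks in no namespace :)
  $doc//tei:idno/string(),       (: "012" :)
  $doc//tei:idno/@type/string(), (: "jlb" :)
  $doc//vg:placeLet/string(),    (: "London" :)
  $doc//*:placeLet/string()      (: "London", via the namespace wildcard :)
)
```

The prefix used in the query does not have to match the one used in the file; only the namespace URI has to be the same.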
2011/5/20 Wiard Vasen wiard.vasen@gmail.com
Dear Christian, Andreas,
I have trouble querying the following text. Could you have a look at it and show me how I can pose different queries on the file?
I would be grateful if you did!
Thanks in advance.
Kind regards,
Wiard
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Hi Lukas,
Thanks a lot for your help!
Regards,
Wiard
Hi Lukas,
Having a thousand XML files of the Van Gogh Letters, I want to query them like this:
Given the letters from Arles, how many letters have the term 'Gauguin'? Or: Given the letters from Arles, how many letters have the term 'Gauguin AND Pissarro'?
Here is an earlier solution from Leonard on the BaseX mailing list, which might be useful to you.
  let $range := 1 to 640
  for $doc in collection('brievenvangogh')
  let $uri := base-uri($doc),
      $num := substring($uri, string-length($uri) - 6, 3)
  where $num castable as xs:integer and xs:integer($num) = $range
  return <document uri='{$uri}'>{
    for $n score $s in $doc//*[text() contains text 'gauguin']
    return <hit score='{$s}'>{ $n }</hit>
  }</document>
Can you see from the query, printed in blue, whether it gives the tf/idf score? When creating the database, I checked the tf/idf scoring option in the full-text search settings.
Regards,
Wiard
Wiard,
> Given the letters from Arles, how many letters have the term 'Gauguin'? Or: Given the letters from Arles, how many letters have the term 'Gauguin AND Pissarro'?
It depends on the input data. Did you manage to write a query that pre-selects the letters from Arles?
> Can you see from the query, printed in blue, whether it gives the tf/idf score? While making the database I checked the tf/idf score in the full text search option.
You'll have to check the query info to see if the tf/idf scoring is used. If the compilation steps include something like "Applying full-text index", you can be sure that tf/idf is used for scoring. Otherwise, the default scoring (which often yields better results for XML documents) is used.
Hope this helps, Christian
Hi Lukas,
Yes, I have a query pre-selecting the letters from Arles.
I do this by entering the numbers of the letters which were written in Arles. Letters in this range are from Arles.
  let $range := 577 to 771
  for $doc in collection('brievenvangogh')
  let $uri := base-uri($doc),
      $num := substring($uri, string-length($uri) - 6, 3)
  where $num castable as xs:integer and xs:integer($num) = $range
  return <document uri='{$uri}'>{
    for $n score $s in $doc//*[text() contains text 'gauguin']
    return <hit score='{$s}'>{ $n }</hit>
  }</document>
I wonder whether this line of the query could be changed to do what I need:
  for $n score $s in $doc//*[text() contains text 'gauguin']
Something like: for $n score $s in $doc//*[text() contains text 'gauguin' AND contains text 'pissarro']
Regards,
Wiard
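For readers following along, the pre-selection step can be tried without a database. This is only a sketch: the file names are hypothetical, and the one assumption taken from the query above is that each URI ends in a three-digit letter number followed by ".xml".

```xquery
(: Hypothetical file names; only the three digits before ".xml" matter. :)
let $range := 577 to 771
for $uri in ('brievenvangogh/let012.xml',
             'brievenvangogh/let600.xml',
             'brievenvangogh/let800.xml')
(: the last four characters are ".xml", so the number is the three before them :)
let $num := substring($uri, string-length($uri) - 6, 3)
(: "=" is a general comparison: true if the number equals any item of the range :)
where $num castable as xs:integer and xs:integer($num) = $range
return <arles uri="{ $uri }" number="{ xs:integer($num) }"/>
```

Only the let600.xml entry passes the filter, since 12 and 800 lie outside 577 to 771.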
> Something like: for $n score $s in $doc//*[text() contains text 'gauguin' AND contains text 'pissarro']
Looks fine; it should work if you switch "and" to lower case. You may also try:
for $n score $s in $doc//*[text() contains text 'gauguin' ftand 'pissarro']
for $n score $s in $doc//*[text() contains text { 'gauguin', 'pissarro' } all]
Christian
Hi Christian,
It does work.
This is the one which works: for $n score $s in $doc//*[text() contains text {'pissarro', 'gauguin'} ]
Thank you all for the help!
Regards,
Wiard
> This is the one which works: for $n score $s in $doc//*[text() contains text {'pissarro', 'gauguin'} ]
Please note that the quoted query will match either Pissarro or Gauguin; see e.g.:
  "A" contains text { "A", "B" }
You'll have to append the "all" modifier to make sure that all artist names are matched.
Christian
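The difference between the default "any" behaviour and the "all" modifier can be checked on a plain string. This is a small sketch with a made-up sentence; it assumes the default case-insensitive matching.

```xquery
let $text := 'Gauguin and Pissarro came to the studio'
return (
  $text contains text { 'gauguin', 'monet' },        (: true, "any" is the default :)
  $text contains text { 'gauguin', 'monet' } any,    (: true, same as above :)
  $text contains text { 'gauguin', 'monet' } all,    (: false, "monet" is missing :)
  $text contains text { 'gauguin', 'pissarro' } all, (: true :)
  $text contains text 'gauguin' ftand 'pissarro',    (: true :)
  (: two complete full-text tests combined with the boolean "and" :)
  ($text contains text 'gauguin') and ($text contains text 'pissarro')  (: true :)
)
```

The last line spells out both operands in full, which is what the boolean "and" needs; "ftand" and "all" express the same condition inside a single contains text expression.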
You are right. I was just investigating this problem. However, this one gives no text from the documents!?
  for $n score $s in $doc//*[text() contains text {'gauguin','pissarro','monet'}all ]
And this one gives all the documents containing 'gauguin':
  for $n score $s in $doc//*[text() contains text {'gauguin','pissarro','monet'} ]
I have also tried this one, which doesn't work properly either:
  for $n score $s in $doc//*[text() contains text {'gauguin',and,'pissarro',and,'cezanne'} all]
Maybe you see a solution?
You are probably right. That would be funny.
I am going to figure it out.
Regards.
2011/5/20 Christian Grün christian.gruen@gmail.com
$doc//*[text() contains text {'gauguin','pissarro','monet'}all ]
I would assume that none of the text nodes in your document contains all three names.
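A sketch of why this happens, using a made-up letter in which the three names are spread over two paragraphs. The predicate text() contains text tests each text node on its own, whereas . contains text tests the string value of the whole element.

```xquery
(: Hypothetical letter: no single paragraph holds all three names. :)
let $letter :=
  <letter>
    <p>Gauguin arrived yesterday.</p>
    <p>Pissarro and Monet wrote as well.</p>
  </letter>
let $terms := ('gauguin', 'pissarro', 'monet')
return (
  (: each text node is tested separately, so nothing matches :)
  count($letter//*[text() contains text { $terms } all]),
  (: with the default "any", both paragraphs match :)
  count($letter//*[text() contains text { $terms }]),
  (: here the text of the whole letter is tested instead :)
  count($letter[. contains text { $terms } all])
)
```

The first count is 0 and the second is 2. The third expression is the one to try when the names may occur in different paragraphs of the same letter.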
Hi Christian,
The query works. Thanks a lot!
Regards,
Wiard
Hi Christian,
I would like to make a counter for the number of documents containing the terms from the query. I made one in blue in the following query.
  let $range := 1 to 800
  for $doc in collection('brievenvangogh')
  let $uri := base-uri($doc),
      $num := substring($uri, string-length($uri) - 6, 3)
  where $num castable as xs:integer and xs:integer($num) = $range
  return <document uri='{$uri}'>{
    let $t := 0
    for $n score $s in $doc//*[text() contains text {'gauguin','pissarro'}all ]
    return $t:=+1;
    return <hit score='{$s}'>{$n}{$t} }</hit>
  } </document>
Could you help me make this counter work?
Thank you in advance.
Regards,
Wiard
XQuery is a functional language; as such, you can never update variables that have already been assigned. The count() function gives you the number of items in a sequence:
  <hit score='{ $s }'>{ $n, count($n) }</hit>
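As a sketch of the functional style: instead of incrementing a counter inside the loop, bind the matching items to a variable once and apply count() to that sequence. The three letters below are made up for illustration, and matching on the whole letter with "." is an assumption about what should be counted.

```xquery
(: Hypothetical letters; the n attribute stands in for the letter number. :)
let $letters := (
  <letter n="577"><p>Gauguin and Pissarro are mentioned here.</p></letter>,
  <letter n="578"><p>Only Gauguin is mentioned here.</p></letter>,
  <letter n="579"><p>Pissarro wrote; Gauguin answered.</p></letter>
)
(: collect all matching letters first ... :)
let $hits := $letters[. contains text { 'gauguin', 'pissarro' } all]
(: ... then count the sequence as a whole :)
return <results count="{ count($hits) }">{
  for $hit in $hits
  return <hit n="{ $hit/@n }"/>
}</results>
```

With these three letters the count is 2 (letters 577 and 579). Inside a for loop the loop variable is bound to one item at a time, which is why count() has to be applied to the sequence outside the loop.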
OK Christian,
I learned a lot from you. Have a nice weekend.
Regards,
Wiard
Hi Christian,
This weekend I tried to make a query which counts the number of times a certain combination of terms occurs in my repository of xml-files.
I didn't succeed in finding a way to count the number of occurrences. Could you help me with this?
I have attached the result-file from the query to this mail.
As you said in your last mail, the letters found have to be put in a sequence. After that, the items in the sequence have to be counted.
I was thinking of a method like:
  let $sequence := ( a method to get the items from the result)
  let $count := count($sequence)
  return <results>
    <count>{$count}</count>
    <items>{
      for $item in $sequence
      return <item>{$item}</item>
    }</items>
  </results>
This is my current query:
  let $range := 1 to 800
  for $doc in collection('brievenvangogh')
  let $uri := base-uri($doc),
      $num := substring($uri, string-length($uri) - 6, 3)
  where $num castable as xs:integer and xs:integer($num) = $range
  return <document uri='{$uri}'>{
    for $n score $s in $doc//*[text() contains text {'gauguin','pissarro'}all ]
    return <hit score='{ $s }'>{ $n, count($n) }</hit>
  } </document>
In my count($n) I get only the number 1.
Thank you very much in advance!
Regards,
Wiard
Dear Wiard,
> for $n score $s in $doc//*[text() contains text {'gauguin','pissarro'}all ]
> return <hit score='{ $s }'>{ $n, count($n) }</hit> } </document>
> In my count($n) I get only the number 1.
True, that doesn't make sense: $n is bound to a single node in each iteration, so count($n) is always 1. I'm sorry, it's not possible (at least for now) to count the number of occurrences of search terms in a single hit. It could make sense, however, to extend our own set of full-text functions...
http://docs.basex.org/wiki/Full-Text_Functions
...with a new function that counts the number of full-text matches; something like:
  ft:count ( $node[ . contains text { "a", "b" } ] )
Suggestions from everyone are welcome. Christian
Wiard,
Leo just gave me a hint that our ft:mark() function can be used as well to count the number of occurrences of terms in a full-text query. I hope that the following example gives you some clue:
  for $doc in doc('test')
  let $terms := ('gauguin', 'pissarro')
  for $hit score $s in $doc//*[text() contains text { $terms }]
  let $c := count(ft:mark($hit[text() contains text { $terms }])/mark)
  return <hit score="{ $s }" count="{ $c }">{ $hit }</hit>
Note that, in this case, the contains text expression must be specified twice. Christian
..and a last one: I'm glad to tell you that we've just added the ft:count() function to our full-text module:
http://docs.basex.org/wiki/Full-Text_Functions#ft:count
Now you can either count the total number of occurrences of search terms in your documents:
  let $terms := ('gauguin', 'pissarro')
  return ft:count( //*[text() contains text { $terms } ] )
...or return them hit by hit, e.g. as follows:
  for $doc in doc('test')
  let $terms := ('gauguin', 'pissarro')
  for $hit score $s in $doc//*[text() contains text { $terms }]
  let $c := ft:count($hit[text() contains text { $terms }])
  return <hit score="{ $s }" count="{ $c }">{ $hit }</hit>
Please download the latest snapshot of BaseX to get the new feature working:
http://files.basex.org/releases/latest/
Christian
Hi Christian,
Thanks a lot for all your work. That's really very kind of Leo and you! This afternoon I will look at your solutions and come back to you.
Regards Wiard
Hi Leo and Christian,
The number of hits is shown in the application itself. This is very useful and I thank you very much for your help!
Till next time.
Regards,
Wiard
2011/5/23 Wiard Vasen wiard.vasen@gmail.com
Hi Christian,
Thanks a lot for all your work. That's really very kind of Leo and you! This afternoon I will look at your solutions and come back to you.
Regards Wiard
2011/5/23 Christian Grün christian.gruen@gmail.com
..and a last one: I'm glad to tell you that we've just added the ft:count() function to our full-text module:
http://docs.basex.org/wiki/Full-Text_Functions#ft:count
Now you can either count the total number of occurrences of search terms in your documents:
let $terms := ('gauguin', 'pissarro')
return ft:count( //*[text() contains text { $terms }] )
...or return them hit by hit, e.g. as follows:
for $doc in doc('test')
let $terms := ('gauguin', 'pissarro')
for $hit score $s in $doc//*[text() contains text { $terms }]
let $c := ft:count($hit[text() contains text { $terms }])
return <hit score="{ $s }" count="{ $c }">{ $hit }</hit>
Please download the latest snapshot of BaseX to get the new feature working:
http://files.basex.org/releases/latest/
Christian ___________________________
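For readers without the new snapshot, a rough, portable approximation of counting term occurrences can be sketched with standard XQuery functions only. This is not how ft:count() is implemented; it simply lower-cases the text, splits it on non-word characters and counts the tokens that equal one of the terms. The <p> element is made-up sample data:

```xquery
let $terms := ('gauguin', 'pissarro')
(: Hypothetical sample hit. :)
let $hit := <p>Gauguin and Pissarro; later Gauguin left.</p>
(: Lower-case the string value and split it on runs of non-word characters. :)
let $tokens := tokenize(lower-case(string($hit)), '\W+')
(: Keep only tokens equal to one of the terms and count them: 3 here. :)
return count($tokens[. = $terms])
```

Unlike a real full-text query, this sketch ignores stemming, diacritics handling and other match options, so its numbers may differ from those of ft:count().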
On Mon, May 23, 2011 at 12:47 AM, Christian Grün christian.gruen@gmail.com wrote:
Wiard,
Leo just gave me a hint that our ft:mark() function can be used as well to count the number of occurrences of terms in a full-text query. I hope that the following example gives you some clue:
for $doc in doc('test')
let $terms := ('gauguin', 'pissarro')
for $hit score $s in $doc//*[text() contains text { $terms }]
let $c := count(ft:mark($hit[text() contains text { $terms }])/mark)
return <hit score="{ $s }" count="{ $c }">{ $hit }</hit>
Note that, in this case, the contains text expression must be specified twice.
Christian ___________________________
On Mon, May 23, 2011 at 12:32 AM, Christian Grün christian.gruen@gmail.com wrote:
Dear Wiard,
for $n score $s in $doc//*[text() contains text {'gauguin','pissarro'} all]
return <hit score='{ $s }'>{ $n, count($n) }</hit> }
</document>
In my count($n) I get only the number 1.
True, that doesn't make sense. I'm sorry, but it's not possible (at least for now) to count the number of occurrences of search terms in a single hit. It could make sense, however, to extend our own set of full-text functions...
http://docs.basex.org/wiki/Full-Text_Functions
...with a new function that counts the number of full-text matches; something like:
ft:count( $node[ . contains text { "a", "b" } ] )
Suggestions from everyone are welcome. Christian _____________________
On Sun, May 22, 2011 at 5:40 PM, Wiard Vasen wiard.vasen@gmail.com wrote:
Hi Christian,
This weekend I tried to make a query which counts the number of times a certain combination of terms occurs in my repository of xml-files. I didn't succeed in finding a way to count the number of occurrences. Could you help me with this? I have attached the result-file from the query to this mail.
As you have said in your last mail, the found letters have to be put in a sequence. After that the items in the sequence have to be counted. I was thinking of a method like:
let $sequence := ( a method to get the items from the result)
let $count := count($sequence)
return
  <results>
    <count>{$count}</count>
    <items>{
      for $item in $sequence return <item>{$item}</item>
    }</items>
  </results>
let $range := 1 to 800
for $doc in collection('brievenvangogh')
let $uri := base-uri($doc),
    $num := substring($uri, string-length($uri) - 6, 3)
where $num castable as xs:integer and xs:integer($num) = $range
return <document uri='{$uri}'>{
  for $n score $s in $doc//*[text() contains text {'gauguin','pissarro'} all]
  return <hit score='{ $s }'>{ $n, count($n) }</hit>
}</document>
In my count($n) I get only the number 1.
Hi Christian, Leo,
Here is my query again.
let $range := 1 to 800
for $doc in collection('brievenvangogh')
let $uri := base-uri($doc),
    $num := substring($uri, string-length($uri) - 6, 3)
where $num castable as xs:integer and xs:integer($num) = $range
return <document uri='{$uri}'>{
  let $terms := ('zonnebloemen')
  for $hit score $s in $doc//*[text() contains text { $terms }]
  let $c := ft:count($hit[text() contains text { $terms }])
  return <hit score="{ $s }" count="{ $c }">{ $hit }</hit>
}</document>
I get the number of all documents as the number of hits. And what I want is the number of documents containing the term 'zonnebloemen'.
Last time you had a solution for the number of hits in one specific document. I hope you have a solution for this problem.
Thanks in advance!
Regards,
Wiard
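A possible reading of the problem above: the query returns one <document> element for every letter in the range, whether or not it has hits, so counting the <document> elements gives the number of all documents. A minimal sketch of one way to count only the letters that contain the term, assuming XQuery Full Text support (as in BaseX); the three inline <letter> elements and their n attributes are made-up stand-ins for collection('brievenvangogh'):

```xquery
(: Hypothetical sample data; the real letters live in the database. :)
let $docs := (
  <letter n="266"><p>Ik schilder zonnebloemen.</p></letter>,
  <letter n="653"><p>De zonnebloemen zijn af; zonnebloemen overal.</p></letter>,
  <letter n="1"><p>Geen bloemen vandaag.</p></letter>
)
let $term := 'zonnebloemen'
(: Keep only the letters in which some text node matches the term. :)
let $matching := $docs[.//text() contains text { $term }]
return
  <result documents="{ count($matching) }">{
    (: List the matching letters; letters without hits are not emitted. :)
    for $d in $matching
    return <document n="{ $d/@n }"/>
  }</result>
```

With this sample data the result should report two documents (266 and 653). On the real data, the same predicate could be applied to the collection, for example count(collection('brievenvangogh')[.//text() contains text { $term }]), though this has not been tested against the files in question.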
Hi Wiard,
I get the number of all documents as the number of hits. And what I want is the number of documents containing the term 'zonnebloemen'. Last time you had a solution for the number of hits in one specific document. I hope you have a solution for this problem.
Could you provide us with a little document that allows us to reproduce the problem?
Thanks, Christian
Hi Christian,
I hereby send you several XML files. The first four documents (266, 653, 667, 740) contain the term 'zonnebloemen'; the last four (1, 2, 3, 4) do not.
Thanks for looking at my problem!
Regards,
Wiard
2011/5/25 Christian Grün christian.gruen@gmail.com
Hi Wiard,
I get the number of all documents as the number of hits. And what I want is the number of documents containing the term 'zonnebloemen'. Last time you had a solution for the number of hits in one specific document. I hope you have a solution for this problem.
Could you provide us with a little document that allows us to reproduce the problem?
Thanks, Christian
Hi Christian,
Did I throw too much over the fence?
Regards,
Wiard
Hi Wiard,
sorry for the delay; my todo list is long today, but I'll try to give you some feedback soon (unless someone else is faster).
Christian ___________________________
On Wed, May 25, 2011 at 8:56 PM, Wiard Vasen wiard.vasen@gmail.com wrote:
Hi Christian, Did I throw too much over the fence? Regards, Wiard
2011/5/25 Wiard Vasen wiard.vasen@gmail.com
Hi Christian, I hereby send you several XML files. The first four documents (266, 653, 667, 740) contain the term 'zonnebloemen'; the last four (1, 2, 3, 4) do not. Thanks for looking at my problem! Regards, Wiard
basex-talk@mailman.uni-konstanz.de