Hello,
I’m getting unexpected results using the Archive module when extracting multiple entries in one call:
archive:extract-text($archive,$entries)
For example, I take an archive with, say, 1000 entries, select a subset of them, say 100, and then call archive:extract-text with that subset.
I’d expect to get 100 results, but I get around 97.
I know that all the content is in the archive: if I extract one of the missing items on its own, it is returned. The problem only seems to occur when I ask for many entries at once.
Has anyone come across this? What is staring me in the face that I’m missing?
I’ve put a small SSCCE at the end of this message that shows the issue; it returns something like this:
"Entries found: 100 Number of content items returned (bulk): 97 Number of content items returned (individual): 100"
Many thanks for any help.
Regards, James
Running BaseX 9.4 and 9.4.3 on macOS.
let $zipPath := "demo.zip"
let $start := 0
let $number := 100

(: Create a zip file with 1000 entries; each entry's content is a UUID,
   and the same UUID is used as the entry's path :)
let $uuids := (1 to 1000) ! random:uuid()
let $zipOut := file:write-binary($zipPath, archive:create($uuids, $uuids))

(: Read the zip file :)
let $archive := file:read-binary($zipPath)

(: Get all the entries that meet the criteria; here based on position :)
let $entries := archive:entries($archive)[$start < position() and position() <= $start + $number]
let $count := count($entries)

(: Extract all selected entries in one call (bulk) :)
let $contents := archive:extract-text($archive, $entries)

(: Extract the same entries one call at a time (individual) :)
let $contents2 := for $entry in $entries return archive:extract-text($archive, $entry)

return concat(
  "Entries found: ", count($entries),
  " Number of content items returned (bulk): ", count($contents),
  " Number of content items returned (individual): ", count($contents2)
)
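To see which entries go missing, and not just how many, the repro can be extended. This is a hypothetical diagnostic sketch, not part of the original message. It builds the archive in memory with archive:create instead of writing demo.zip (assumption: the in-memory archive behaves the same as the one read back from disk), and it relies on the repro's convention that each entry's text content equals its own path. Any requested name that does not appear among the bulk results therefore identifies a dropped entry.

```xquery
(: Hypothetical diagnostic sketch: list the entries that a bulk
   archive:extract-text call fails to return. Relies on the repro's
   convention that every entry's content equals its own path. :)
let $uuids := (1 to 1000) ! random:uuid()
let $archive := archive:create($uuids, $uuids)

(: Select the first 100 entries, as in the repro :)
let $entries := archive:entries($archive)[position() <= 100]

(: The string value of each archive:entry element is its path :)
let $names := $entries ! string()

(: Bulk extraction: one call for all selected entries :)
let $contents := archive:extract-text($archive, $entries)

(: Names whose content did not come back from the bulk call :)
let $missing := $names[not(. = $contents)]

return (
  "Missing from bulk result: " || count($missing),
  (: Each missing entry can still be extracted on its own :)
  $missing ! archive:extract-text($archive, .)
)
```

On a version that includes the fix, the first line should report 0 missing entries; on an affected version, the remaining lines show that the dropped entries are still extractable one at a time.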
Hi James,
Thanks for the observation and the attached test case.
The bug has been fixed. You may be surprised to hear that it was introduced eight years ago, in a completely different context: chained hash entries were not correctly linked after the deletion of an entry [1,2]. A new snapshot is available, and the release of BaseX 9.4.4 is planned for next week.
Best, Christian
[1] https://github.com/BaseXdb/basex/commit/21a4f7cad84f1902f22171a8821ccc90703e...
[2] https://github.com/BaseXdb/basex/commit/cd95c13c7390ed5b6c3c0bd9660306501ca4...
On Sat, Oct 31, 2020 at 4:25 PM James Ball basex-talk@jamesball.co.uk wrote:
Hi Christian,
I never cease to be amazed by the speed with which you find and fix these (at least to me) obscure bugs.
I have downloaded the snapshot and I’m now getting exactly the result expected.
I think you’ve saved my weekend: this speeds up one of my operations from hours to 30 seconds. So thank you!
Keep safe in these strange times.
James
On 1 Nov 2020, at 11:54, Christian Grün christian.gruen@gmail.com wrote:
[Protocol: applause from a backbencher]
On Sunday, 1 November 2020, 14:07:56 CET, James Ball basex-talk@jamesball.co.uk wrote:
basex-talk@mailman.uni-konstanz.de