Christian,

Hi. Thanks for the trick. It seems to work fine for serializing, but awfully for unserializing:
1) Serialization using bin:pack-integer is great. I am saving time and disk space (roughly a factor of 4).
2) However, unserialization seems to perform awfully, or I do not know how to do it properly.

Here is a test:

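declare function local:savebin($seq as xs:integer*, $file as xs:string) {
    (: 4-byte count header, followed by every integer packed into 4 bytes :)
    file:write-binary($file, bin:join((bin:pack-integer(count($seq), 4), $seq ! bin:pack-integer(., 4))))
};
declare function local:loadbin($file as xs:string) {
    let $data := file:read-binary($file)
    (: the first 4 bytes hold the number of stored integers :)
    let $size := bin:unpack-integer($data, 0, 4)
    (: integer $i (1-based) starts at byte offset $i * 4, right after the header :)
    let $seq := for $i in 1 to $size return bin:unpack-integer($data, $i * 4, 4)
    return count($seq)
};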

prof:time(local:savebin((1 to 100000),"Bin.dat"))
,prof:time(local:loadbin("Bin.dat"))

Output:

46.38 ms
10775.12 ms
100000

For comparison, unserializing the sequence (1 to 10000000), stored in a file as one big string, with fn:tokenize takes about 10 s: 100 times as many integers in roughly the same time. Did I do something wrong?
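One variant that may be worth timing avoids the 100,000 separate bin:unpack-integer calls: read the file once, turn it into a plain sequence of octets with bin:to-octets, and decode each 4-byte value with integer arithmetic. This is a sketch only; the function name local:loadbin-octets and the test file name Bin-test.dat are hypothetical, and it assumes the layout written by local:savebin (a 4-byte count, then 4-byte big-endian two's-complement integers, which is the default octet order of bin:pack-integer). It has not been measured against local:loadbin.

```xquery
(: Hypothetical variant of local:loadbin: decode all integers from one
   octet sequence instead of calling bin:unpack-integer per value.
   Assumed layout: 4-byte count, then 4-byte big-endian signed integers. :)
declare function local:loadbin-octets($file as xs:string) as xs:integer* {
  (: read the file once; each item is an integer in the range 0..255 :)
  let $octets := bin:to-octets(file:read-binary($file))
  (: unsigned value of the 4 octets starting at 1-based position $p :)
  let $u32 := function($p as xs:integer) as xs:integer {
    $octets[$p] * 16777216 + $octets[$p + 1] * 65536
      + $octets[$p + 2] * 256 + $octets[$p + 3]
  }
  let $size := $u32(1)
  (: integer $i (1-based) starts at octet position $i * 4 + 1 :)
  for $i in 1 to $size
  let $u := $u32($i * 4 + 1)
  (: undo two's complement for values whose high bit is set :)
  return if ($u >= 2147483648) then $u - 4294967296 else $u
};

let $file := 'Bin-test.dat'
return (
  (: same layout as local:savebin: count header, then the values :)
  file:write-binary($file, bin:join((
    bin:pack-integer(5, 4),
    (-5, 0, 1, 100000, 2147483647) ! bin:pack-integer(., 4)
  ))),
  local:loadbin-octets($file)
)
```

The query returns -5 0 1 100000 2147483647 and leaves Bin-test.dat in the working directory. Whether it actually beats local:loadbin would have to be checked with prof:time, as in the test above.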


2015-01-08 16:44 GMT+01:00 Christian Grün <christian.gruen@gmail.com>:
> This approach stores integers as strings, and then casts each string back
> to an integer when unserializing. For large integer lists (I am dealing
> with lists of 134 MB), it is quite time- and memory-consuming.
>
> I was wondering if there is a more efficient way to store and retrieve
> lists of atomic values in BaseX?
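For reference, the string-based approach described in the question might look like the sketch below. The helper names local:savetext and local:loadtext, the single-space separator and the file name Text-test.dat are assumptions for illustration, not code from this thread.

```xquery
(: Hypothetical helpers for the text-based approach: integers are written
   as one space-separated string and parsed back with fn:tokenize plus an
   xs:integer cast on every token. :)
declare function local:savetext($seq as xs:integer*, $file as xs:string) {
  file:write-text($file, string-join($seq ! string(.), ' '))
};
declare function local:loadtext($file as xs:string) as xs:integer* {
  tokenize(file:read-text($file), ' ') ! xs:integer(.)
};

let $file := 'Text-test.dat'
return (
  local:savetext(1 to 5, $file),
  local:loadtext($file)
)
```

The query returns 1 2 3 4 5. Every value costs one token plus one cast, which is the overhead the binary format is meant to avoid.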

One alternative is to store the integers in a binary file:

  let $size := 4
  let $data := bin:join(
    for $n in 1 to 100
    return bin:pack-integer($n, $size)
  )
  return db:store('db', 'integers.bin', $data)

This way, every integer will occupy the supplied number of bytes
(here: 4, giving 2^32 distinct values; since bin:unpack-integer reads
them back as signed two's-complement numbers, the range is -2^31 to 2^31-1).
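Reading such a binary back is the inverse loop: value $i (0-based) starts at byte offset $i * $size. The sketch below assumes no count header (as in the snippet above) and builds the binary in memory so it runs on its own; local:unpack-all is a hypothetical name, and in practice $data would be the stored resource, e.g. via db:retrieve('db', 'integers.bin') (renamed db:get-binary in BaseX 10, as far as I know).

```xquery
(: Decodes a binary made of consecutive $size-byte signed integers,
   with no count header, back into a sequence of xs:integer. :)
declare function local:unpack-all($data as xs:base64Binary, $size as xs:integer) as xs:integer* {
  for $i in 0 to bin:length($data) idiv $size - 1
  return bin:unpack-integer($data, $i * $size, $size)
};

let $size := 4
(: built in memory here; normally this would be the stored resource :)
let $data := bin:join(for $n in 1 to 100 return bin:pack-integer($n, $size))
return local:unpack-all($data, $size)
```

For the 100 integers of the example, this returns the sequence 1 to 100.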