Hi Christian,
Thanks for your quick replies.
"ISIZE (Input SIZE)" from https://tools.ietf.org/html/rfc1952 looks promising for most GZIP archives containing a single file.
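A minimal sketch of how ISIZE could be read, assuming the archive is a local file that can be accessed randomly (the class and method names are hypothetical and not part of BaseX). RFC 1952 stores ISIZE in the last four bytes of a gzip member, in little-endian order, as the uncompressed size modulo 2^32. The value is therefore only a hint: it wraps around for inputs of 4 GiB or more, and for files with several concatenated members it describes the last member only.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.nio.file.Paths;

public class GzipIsize {
  /**
   * Reads the ISIZE trailer field of a local gzip file.
   * Hypothetical helper for illustration; not a BaseX API.
   */
  static long readIsize(Path file) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
      // 10 bytes of fixed header plus 8 bytes of trailer (CRC32 + ISIZE)
      if (raf.length() < 18) throw new IOException("Too short for a gzip member");
      raf.seek(raf.length() - 4);
      // ISIZE is stored least significant byte first
      long b0 = raf.read(), b1 = raf.read(), b2 = raf.read(), b3 = raf.read();
      return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length > 0) System.out.println(readIsize(Paths.get(args[0])));
  }
}
```

This only works when the end of the input can be reached cheaply, so it does not cover arbitrary input streams.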
N.B.: The Database Resource Properties INPUTSIZE for ZIP archives also shows "0 b".
Thanks and regards, RG
On Mon, Nov 26, 2018 at 3:29 PM Christian Grün christian.gruen@gmail.com wrote:
... would you want to set the Database Resource Properties INPUTSIZE to
something other than "0 b" when the INPUTPATH is an archive?
In contrast to ZIP archives, there seems to be no trivial way in Java to retrieve the uncompressed file size from gzipped input streams. We could make some extra effort (as proposed, e.g., in [1]). As the input stream processed in BaseX may not be backed by a local file, I am not sure whether there is a generic solution for that.
[1] https://stackoverflow.com/questions/7317243/gets-the-uncompressed-size-of-th...
On Mon, Nov 26, 2018 at 11:49 AM Christian Grün <christian.gruen@gmail.com> wrote:
A new stable snapshot is available [1]. In the updated version, all corner cases should be taken into consideration (such as gzip archives with a missing file suffix in the file name).
Hope this helps, Christian
[1] http://files.basex.org/releases/latest/
On Mon, Nov 26, 2018 at 10:52 AM Christian Grün christian.gruen@gmail.com wrote:
Hi Rick,
I just wanted to use
https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-recent.json.gz with basexgui. basexgui doesn't seem to process the archive correctly.
I got it. So you were choosing JSON as input format, and the archive input was not chosen for import.
The challenge seems to be that the filename is not stored inside this particular .gz archive, so the ".json" substring in the name of the original file is the only hint that the compressed file is a JSON file. This is different for ZIP archives, in which filenames must be stored inside the archive (in .gz archives this is optional).
By default, we thus assume that the input of .gz archives is XML. I’ll see if/how we can find a solution for this, and whether the input format choice can be utilized to correctly interpret the file contents.
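To illustrate the optional filename: in RFC 1952, the fourth header byte (FLG) has an FNAME bit (0x08). Only if it is set does a zero-terminated ISO 8859-1 name follow the fixed 10-byte header (after the optional FEXTRA field). The following is a rough sketch with hypothetical class and method names, not the way BaseX handles this. It consumes the header bytes of the stream, so a caller would need to buffer or reopen the input before decompressing it.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class GzipName {
  /**
   * Returns the original filename stored in a gzip header,
   * or null if the FNAME flag is not set.
   * Hypothetical helper for illustration; not a BaseX API.
   */
  static String readOriginalName(InputStream in) throws IOException {
    DataInputStream data = new DataInputStream(in);
    // Fixed header: ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS
    byte[] header = new byte[10];
    data.readFully(header);
    if ((header[0] & 0xff) != 0x1f || (header[1] & 0xff) != 0x8b) {
      throw new IOException("Not a gzip stream");
    }
    int flags = header[3] & 0xff;
    if ((flags & 0x04) != 0) {
      // FEXTRA: 2-byte little-endian length, then that many bytes to skip
      int len = data.readUnsignedByte() | (data.readUnsignedByte() << 8);
      data.readFully(new byte[len]);
    }
    // FNAME not set: the archive does not carry a filename
    if ((flags & 0x08) == 0) return null;
    ByteArrayOutputStream name = new ByteArrayOutputStream();
    for (int b; (b = data.readUnsignedByte()) != 0; ) name.write(b);
    return new String(name.toByteArray(), StandardCharsets.ISO_8859_1);
  }
}
```

If the method returns null, the suffix of the archive name or the input format chosen by the user remains the only hint.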
P.S.: Regarding GitHub issues... I know how to search those. How
do I search past mailman threads?
You can search via the basex-talk mail archive (see the link on our web site [1]). Classical search engines will give you valuable results from StackOverflow and other sites.
Best, Christian
[1] http://basex.org/about/open-source/
On Sun, Nov 25, 2018 at 8:53 PM Christian Grün <christian.gruen@gmail.com> wrote:
Hi Rick,
> Would've filed an issue, but the request is to post here first.
(?)
Thanks. In the past, many GitHub issues turned out to be misunderstandings
rather than bugs, so we are asking users to write to the list first.
> Using version 9.1 BaseX app, a GZIP archive of a JSON database
can't be used to properly create a database. Interestingly, a ZIP archive works fine.
Do you really want to create a BaseX database from a "JSON
database"? If yes, which format does this database have?
Or does your archive contain a set of (tarred) JSON files, which
you would like to import into BaseX as XML? Did you try to rename your file suffix to .tgz?
Best, Christian