Re: [basex-talk] BUG: Can't parse JSON from GZIP archive

26 Nov 2018


Hi Christian,
Thanks for your quick replies.
"ISIZE (Input SIZE)" from https://tools.ietf.org/html/rfc1952 looks
promising for most GZIP archives containing a single file.
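
For illustration, a minimal Java sketch of reading ISIZE, assuming the archive is a local file with a single gzip member. The class and method names are hypothetical and not part of BaseX. RFC 1952 defines ISIZE as the uncompressed size modulo 2^32, stored in little-endian order in the last four bytes, so the value is unreliable for inputs of 4 GiB or more and only covers the last member of a multi-member file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class GzipIsize {
    /**
     * Reads the ISIZE trailer field of a gzip file (RFC 1952):
     * the uncompressed size modulo 2^32, stored little-endian
     * in the last four bytes of the file.
     */
    static long readIsize(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            // A gzip member has a 10-byte header and an 8-byte trailer at minimum.
            if (file.length() < 18) {
                throw new IOException("Too short to be a gzip file: " + path);
            }
            file.seek(file.length() - 4);
            byte[] b = new byte[4];
            file.readFully(b);
            // Assemble the unsigned 32-bit value, least significant byte first.
            return (b[0] & 0xFFL)
                | (b[1] & 0xFFL) << 8
                | (b[2] & 0xFFL) << 16
                | (b[3] & 0xFFL) << 24;
        }
    }
}
```

Calling `GzipIsize.readIsize("nvdcve-1.0-recent.json.gz")` would return the size recorded by whichever tool created the archive.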
N.B.: The Database Resource Properties INPUTSIZE for ZIP archives also
shows "0 b".
Thanks and regards,
RG
On Mon, Nov 26, 2018 at 3:29 PM Christian Grün christian.gruen@gmail.com
wrote:
...
... would you want to set the Database Resource Properties INPUTSIZE to
something other than "0 b" when the INPUTPATH is an archive?
In contrast to ZIP archives, there seems to be no trivial way in Java
to retrieve the uncompressed file size from gzipped input streams. We
could make some extra effort (e.g. as proposed in [1]). As the
processed input stream in BaseX may not rely on a local file, I am not
sure if there is a generic solution for that.
[1]
https://stackoverflow.com/questions/7317243/gets-the-uncompressed-size-of-th...
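
One generic fallback that does not depend on a local file is to decompress the stream once and count the bytes. The following Java sketch shows the idea; the class and method names are hypothetical, this is not how BaseX handles it, and it costs a full pass over the input.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

final class GzipStreamSize {
    /**
     * Returns the uncompressed size of a gzipped stream by
     * decompressing it completely and counting the bytes.
     * Works for any InputStream, but consumes and closes it.
     */
    static long uncompressedSize(InputStream compressed) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        try (GZIPInputStream in = new GZIPInputStream(compressed)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }
}
```

Because the stream is consumed, a caller that also needs the content would have to count while parsing, or reopen the source.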
...
On Mon, Nov 26, 2018 at 11:49 AM Christian Grün <
christian.gruen@gmail.com> wrote:
...
A new stable snapshot is available [1]. In the updated version, all
corner cases should be taken into consideration (such as gzip archives
with a missing file suffix in the file name).
Hope this helps,
Christian
[1] http://files.basex.org/releases/latest/
On Mon, Nov 26, 2018 at 10:52 AM Christian Grün
christian.gruen@gmail.com wrote:
...
Hi Rick,
...
> I just wanted to use
> https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-recent.json.gz with
> basexgui.  basexgui doesn't seem to process the archive correctly.
...
I got it. So you chose JSON as the input format, and the archive
input was not chosen for import.
The challenge seems to be that the filename is not stored inside this
particular .gz archive, so the ".json" substring in the original file
name is the only hint that the compressed file is a JSON file. This is
different for ZIP archives, in which filenames must be stored inside
the archive (in .gz archives this is optional).
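
To make the optional filename concrete: RFC 1952 stores it as a zero-terminated ISO 8859-1 string after the 10-byte header, and only if the FNAME bit (0x08) of the FLG byte is set. The Java sketch below checks for it; the class and method names are hypothetical and this is not BaseX code.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class GzipHeaderName {
    private static final int FEXTRA = 0x04; // optional extra field present
    private static final int FNAME = 0x08;  // original file name present

    /**
     * Returns the original file name stored in the first gzip
     * member header, or null if the header does not contain one.
     */
    static String storedName(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] header = new byte[10];
        in.readFully(header);
        if ((header[0] & 0xFF) != 0x1F || (header[1] & 0xFF) != 0x8B) {
            throw new IOException("Not a gzip stream");
        }
        int flags = header[3] & 0xFF;
        if ((flags & FEXTRA) != 0) {
            // XLEN is a little-endian 16-bit length, followed by XLEN bytes.
            int xlen = in.readUnsignedByte() | in.readUnsignedByte() << 8;
            in.readFully(new byte[xlen]);
        }
        if ((flags & FNAME) == 0) {
            return null;
        }
        ByteArrayOutputStream name = new ByteArrayOutputStream();
        int b;
        while ((b = in.readUnsignedByte()) != 0) {
            name.write(b);
        }
        return new String(name.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

A null result corresponds to the case described above, where only the outer file name hints at the content type.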
By default, we thus assume that the input of .gz archives is XML. I’ll
see if/how we can find a solution for this, and if the input format
choice can be utilized to correctly interpret the file contents.
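
As a rough illustration of using the outer file name as the hint, the Java sketch below strips a trailing ".gz" and returns the remaining suffix. The helper is hypothetical and says nothing about how BaseX actually resolves the format; an empty result would mean falling back to a default or to the format chosen by the user.

```java
import java.util.Locale;

final class InnerSuffix {
    /**
     * Returns the lower-case suffix that remains after removing a
     * trailing ".gz" from the file name, or "" if there is none.
     */
    static String innerSuffix(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".gz")) {
            name = name.substring(0, name.length() - 3);
        }
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }
}
```

For "nvdcve-1.0-recent.json.gz" this yields "json"; for a plain "data.gz" it yields an empty string.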
...
> P.S.:  Regarding GitHub issues...  I know how to search those.  How
> do I search past mailman threads?
...
You can search via the basex-talk mail archive (see the link on our
web site [1]). Classical search engines will give you valuable results
from StackOverflow and other sites.
Best,
Christian
[1] http://basex.org/about/open-source/
...
On Sun, Nov 25, 2018 at 8:53 PM Christian Grün <
christian.gruen@gmail.com> wrote:
...
Hi Rick,
> Would've filed an issue, but the request is to post here first. (?)
...
Thanks. Many GitHub issues in the past were not bugs but
misunderstandings, so we are asking users to write to the list first.
...
> Using version 9.1 BaseX app, a GZIP archive of a JSON database
> can't be used to properly create a database.  Interestingly, a ZIP archive
> works fine.
...
Do you really want to create a BaseX database from a "JSON
database"? If yes, which format does this database have?
...
Or does your archive contain a set of (tarred) JSON files, which
you would like to import into BaseX as XML? Did you try renaming the file
suffix to .tgz?
...
Best,
Christian
