Would've filed an issue, but the request is to post here first. (?)
Using version 9.1 BaseX app, a GZIP archive https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-recent.json.gz of a JSON database can't be used to properly create a database. Interestingly, a ZIP archive https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-recent.json.zip works fine.
There is no error message, just an empty database is created silently.
IDK if the GZIP problem is more widespread.
Hi Rick,
> Would've filed an issue, but the request is to post here first. (?)
Thanks. Many GitHub issues in the past turned out to be misunderstandings rather than bugs, so we are asking users to write to the list first.
> Using version 9.1 BaseX app, a GZIP archive https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-recent.json.gz of a JSON database can't be used to properly create a database. Interestingly, a ZIP archive https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-recent.json.zip works fine.
Do you really want to create a BaseX database from a "JSON database"? If so, what format does this database have?
Or does your archive contain a set of (tarred) JSON files, which you would like to import in BaseX as XML? Did you try to rename your file suffix to .tgz?
Best, Christian
Hi Christian,
Thanks for the reply.
I just wanted to use https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-recent.json.gz with basexgui. basexgui doesn't seem to process the archive correctly.
The archive https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-recent.json.zip seems to be processed fine by basexgui.
This seems to be a basex/basexgui bug or at least a limitation, yes?
Regards, RG
P.S.: Regarding GitHub issues... I know how to search those. How do I search past mailman threads?
Hi Rick,
> I just wanted to use https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-recent.json.gz with basexgui. basexgui doesn't seem to process the archive correctly.
I got it. So you were choosing JSON as input format, and the archive input was not chosen for import.
The challenge seems to be that the filename is not stored inside this particular .gz archive, so the ".json" substring in the original file is the only hint that the compressed file is a json file. This is different for ZIP archives, in which filenames must be stored inside the archive (in .gz archives this is optional).
By default, we thus assume that the input of .gz archives is XML. I’ll see if/how we can find a solution for this, and whether the input format choice can be utilized to correctly interpret the file contents.
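To illustrate the point about optional filenames: RFC 1952 defines an FNAME flag (bit 3 of the FLG byte) that tells whether a zero-terminated original file name follows the fixed 10-byte gzip header. The following is only a minimal sketch of how that flag could be inspected, not BaseX code; the names `GzipName` and `storedName` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not part of BaseX: inspects a gzip header (RFC 1952).
final class GzipName {
  private static final int FEXTRA = 4; // optional extra field present
  private static final int FNAME = 8;  // original file name present

  /** Returns the file name stored in the gzip header, if there is one. */
  static Optional<String> storedName(byte[] gz) {
    if (gz.length < 10 || (gz[0] & 0xff) != 0x1f || (gz[1] & 0xff) != 0x8b) {
      throw new IllegalArgumentException("not a gzip stream");
    }
    int flags = gz[3] & 0xff;
    if ((flags & FNAME) == 0) return Optional.empty();

    int pos = 10; // end of the fixed-size header
    if ((flags & FEXTRA) != 0) {
      if (pos + 2 > gz.length) throw new IllegalArgumentException("truncated header");
      // XLEN is a 2-byte little-endian length, followed by XLEN bytes of data
      int xlen = (gz[pos] & 0xff) | (gz[pos + 1] & 0xff) << 8;
      pos += 2 + xlen;
    }
    int end = pos;
    while (end < gz.length && gz[end] != 0) end++;
    if (end >= gz.length) throw new IllegalArgumentException("truncated file name");
    // RFC 1952 specifies ISO 8859-1 (Latin-1) for the name
    return Optional.of(new String(gz, pos, end - pos, StandardCharsets.ISO_8859_1));
  }

  public static void main(String[] args) {
    // Hand-built header without FNAME: the content type cannot be read from it
    byte[] unnamed = {0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, 3};
    System.out.println(storedName(unnamed).isPresent());
  }
}
```

If no name is stored, the only remaining hint is the name of the .gz file itself (here, the ".json" part of it).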
> P.S.: Regarding GitHub issues... I know how to search those. How do I search past mailman threads?
You can search via the basex-talk mail archive (see the link on our web site [1]). Classical search engines will give you valuable results from StackOverflow and other sites.
Best, Christian
[1] http://basex.org/about/open-source/
A new stable snapshot is available [1]. In the updated version, all corner cases should be taken into consideration (such as gzip archives with a missing file suffix in the file name).
Hope this helps, Christian
[1] http://files.basex.org/releases/latest/
Hi Christian,
Yes, that feature works fine in the latest snapshot. Thank you. I'm wondering if an email to nvd@nist.gov might encourage them to include filenames in all their archives.
And while you're poking around the BaseX archive stuff ... would you want to set the Database Resource Properties INPUTSIZE to something other than "0 b" when the INPUTPATH is an archive?
Thanks again, RG
> ... would you want to set the Database Resource Properties INPUTSIZE to something other than "0 b" when the INPUTPATH is an archive?
In contrast to ZIP archives, there seems to be no trivial way in Java to retrieve the uncompressed file size from gzipped input streams. We could put in some extra effort (as proposed, e.g., in [1]). As the input stream processed by BaseX does not necessarily come from a local file, I am not sure if there is a generic solution for that.
[1] https://stackoverflow.com/questions/7317243/gets-the-uncompressed-size-of-th...
Hi Christian,
Thanks for your quick replies.
"ISIZE (Input SIZE)" from https://tools.ietf.org/html/rfc1952 looks promising for most GZIP archives containing a single file.
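For illustration, here is a minimal sketch of reading ISIZE. According to RFC 1952, it is stored in the last four bytes of a gzip member, in little-endian order, and holds the uncompressed size modulo 2^32. The sketch works on a byte array for brevity; for a local file one would read the last four bytes instead. The names `GzipSize`, `isize` and `gzip` are hypothetical, and this is not how BaseX computes INPUTSIZE.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of BaseX.
final class GzipSize {
  /** Returns ISIZE: the uncompressed size modulo 2^32, taken from the trailer. */
  static long isize(byte[] gz) {
    // 10 bytes of header plus 8 bytes of trailer (CRC32, ISIZE) are the minimum
    if (gz.length < 18) throw new IllegalArgumentException("too short for a gzip member");
    int n = gz.length;
    // Mask with 0xffL so that each byte is treated as unsigned
    return (gz[n - 4] & 0xffL)
        | (gz[n - 3] & 0xffL) << 8
        | (gz[n - 2] & 0xffL) << 16
        | (gz[n - 1] & 0xffL) << 24;
  }

  /** Demo helper: compresses data in memory into a single gzip member. */
  static byte[] gzip(byte[] data) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
      out.write(data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    System.out.println(isize(gzip(new byte[1000])));
  }
}
```

The value is only reliable for a single-member archive whose trailer is directly accessible, which is not the case for an arbitrary input stream.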
N.B.: The Database Resource Properties INPUTSIZE for ZIP archives also shows "0 b".
Thanks and regards, RG
Hi Rick,
> "ISIZE (Input SIZE)" from https://tools.ietf.org/html/rfc1952 looks promising for most GZIP archives containing a single file.
Yes, this field should be the one that is discussed in the StackOverflow entry. As the field stores the uncompressed size modulo 2^32, the value won’t be correct for files of 4 GiB or more, so a more generic solution might be to count the bytes while parsing them and report the total once parsing has finished.
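A rough sketch of the byte-counting idea, assuming the parser reads its input through a plain `java.io.InputStream`: a small filter stream keeps a running `long` total, so the 4 GiB limit of ISIZE does not apply. `CountingInputStream` and `drain` are hypothetical names, not existing BaseX classes, and the sketch ignores mark/reset.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of BaseX. mark/reset are not accounted for.
final class CountingInputStream extends FilterInputStream {
  private long count;

  CountingInputStream(InputStream in) {
    super(in);
  }

  @Override
  public int read() throws IOException {
    int b = super.read();
    if (b != -1) count++;
    return b;
  }

  // read(byte[]) in FilterInputStream delegates to this method,
  // so bytes are not counted twice
  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    int n = super.read(buf, off, len);
    if (n > 0) count += n;
    return n;
  }

  @Override
  public long skip(long n) throws IOException {
    long skipped = super.skip(n);
    count += skipped;
    return skipped;
  }

  /** Number of bytes consumed so far. */
  long count() {
    return count;
  }

  /** Demo helper: consumes the stream as a parser would and returns the total. */
  static long drain(InputStream in) {
    try (CountingInputStream counting = new CountingInputStream(in)) {
      byte[] buf = new byte[8192];
      while (counting.read(buf) != -1) {
        // a real parser would process the buffer here
      }
      return counting.count();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(drain(new ByteArrayInputStream(new byte[100_000])));
  }
}
```

Wrapping a `java.util.zip.GZIPInputStream` in such a stream would count the uncompressed bytes; wrapping the raw stream underneath it would count the compressed ones.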
> N.B.: The Database Resource Properties INPUTSIZE for ZIP archives also shows "0 b".
I was surprised to read this. Once again, it’s due to the contents of the NIST ZIP archives, which don’t contain file lengths (just try some other ZIP archives to see the difference).
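One way such a difference can show up in Java, as a sketch under a stated assumption: when a ZIP entry is written in streaming mode, its sizes are stored in a data descriptor after the entry data rather than in the local header, and `java.util.zip.ZipInputStream` then reports a size of -1 for that entry. Whether the NIST archives are written this way has not been verified here. `ZipSizes`, `zip` and `firstEntrySize` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of BaseX.
final class ZipSizes {
  /** Builds a one-entry archive, either STORED with a known size or streamed. */
  static byte[] zip(String name, byte[] data, boolean stored) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream out = new ZipOutputStream(bytes)) {
      ZipEntry entry = new ZipEntry(name);
      if (stored) {
        // STORED entries must declare size and CRC up front,
        // so both end up in the local header
        CRC32 crc = new CRC32();
        crc.update(data);
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(data.length);
        entry.setCrc(crc.getValue());
      }
      // Otherwise the default DEFLATED method is used without preset sizes,
      // and the sizes are written after the entry data
      out.putNextEntry(entry);
      out.write(data);
      out.closeEntry();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Size reported for the first entry when reading as a stream; -1 if unknown. */
  static long firstEntrySize(byte[] zip) {
    try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
      ZipEntry entry = in.getNextEntry();
      if (entry == null) throw new IllegalArgumentException("empty archive");
      return entry.getSize();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] data = new byte[500];
    System.out.println(firstEntrySize(zip("a.json", data, true)));
    System.out.println(firstEntrySize(zip("a.json", data, false)));
  }
}
```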
Best, Christian
basex-talk@mailman.uni-konstanz.de