When I add a directory to a database in BaseX 6.6.1 using the plain *.xml createfilter, and there's a zip file below this directory, the zip file's contents are indexed too, as if they resided in the server's working directory. If this is a feature, how can I switch it off?
Gerrit
Dear Gerrit,
thanks for your observation. Yes, indeed you've come across a rather hidden feature, but I can understand pretty well that the behavior is not always desired. I've added a GitHub entry to document this:
https://github.com/BaseXdb/basex/issues/60
If you have any preferences on how to resolve this issue, feel free to give more feedback.
Christian
On Thu, Apr 7, 2011 at 10:01 PM, Imsieke, Gerrit, le-tex gerrit.imsieke@le-tex.de wrote:
[...]
_______________________________________________ BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
SET ADDARCHIVES ON/OFF (I'm not shouting at you, it's just that uppercase convention thing…) Default should be OFF.
Gerrit
On 2011-04-07 22:31, Christian Grün wrote:
[...]
SET ADDARCHIVES ON/OFF (I'm not shouting at you, it's just that uppercase convention thing…) Default should be OFF.
;) Thanks; I've added this suggestion to the GitHub entry. – As BaseX is getting more and more options, we'd be even happier with a better default solution, but the option could be a preliminary solution as well.
Christian
SET ADDARCHIVES ON/OFF (I'm not shouting at you, it's just that uppercase convention thing…) Default should be OFF.
...got it ;) It's resolved as proposed (well, pretty close, as we decided to stick with ON as default):
http://files.basex.org/releases/latest/ http://docs.basex.org/wiki/Options
Christian
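To make the behaviour discussed above concrete, here is a minimal Python sketch of a directory import that applies a *.xml filter to plain files and, when a flag modelled on the ADDARCHIVES option is on, also to entries inside ZIP archives. It is an illustration under assumptions, not BaseX's implementation: collect_inputs and add_archives are hypothetical names, and prefixing archive entries with the archive's relative path is a design choice of this sketch (the original report describes entries showing up as if they resided in the server's working directory).

```python
import fnmatch
import tempfile
import zipfile
from pathlib import Path


def collect_inputs(root, pattern="*.xml", add_archives=True):
    """Yield (name, content) pairs for all files below root matching pattern.

    add_archives is a hypothetical flag modelled on BaseX's ADDARCHIVES
    option (kept ON by default, as decided in this thread). When it is True,
    matching entries inside .zip files are yielded too, named
    "<archive path>/<entry path>" so that their origin stays visible.
    """
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if fnmatch.fnmatch(path.name, pattern):
            yield rel, path.read_bytes()
        elif add_archives and zipfile.is_zipfile(path):
            with zipfile.ZipFile(path) as zf:
                for entry in zf.namelist():
                    base = entry.rsplit("/", 1)[-1]
                    # Skip directory entries; apply the filter to the entry's file name.
                    if base and fnmatch.fnmatch(base, pattern):
                        yield f"{rel}/{entry}", zf.read(entry)


# Demo: one plain file, one file in a subdirectory, one ZIP archive.
with tempfile.TemporaryDirectory() as tmp:
    base_dir = Path(tmp)
    (base_dir / "a.xml").write_text("<a/>")
    (base_dir / "sub").mkdir()
    (base_dir / "sub" / "b.xml").write_text("<b/>")
    with zipfile.ZipFile(base_dir / "old.zip", "w") as zf:
        zf.writestr("reports/c.xml", "<c/>")
        zf.writestr("notes.txt", "not XML")
    with_archives = sorted(name for name, _ in collect_inputs(base_dir))
    without_archives = sorted(
        name for name, _ in collect_inputs(base_dir, add_archives=False)
    )

print(with_archives)
print(without_archives)
```

With add_archives=False, only a.xml and sub/b.xml are picked up, which corresponds to the OFF behaviour Gerrit asked for.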
[...]
-- Gerrit Imsieke Geschäftsführer / Managing Director le-tex publishing services GmbH Weissenfelser Str. 84, 04229 Leipzig, Germany Phone +49 341 355356 110, Fax +49 341 355356 510 gerrit.imsieke@le-tex.de, http://www.le-tex.de
Registergericht / Commercial Register: Amtsgericht Leipzig Registernummer / Registration Number: HRB 24930
Geschäftsführer / Managing Directors: Gerrit Imsieke, Svea Jelonek, Thomas Schmidt, Dr. Reinhard Vöckler
Hi, I like the feature; it's something I was dreaming about. We archive old XML files in zip files, and importing them this way would ease our work quite a bit. It doesn't work perfectly yet, though: my test fails on some archives with
Command: CREATE DB ZipTest D:\var\TICEReports\aws1\ziptest
Error: "..." (Line 1): The processing instruction target matching "[xX][mM][lL]" is not allowed.
But I won't report this as a bug until it is officially published as a supported feature (so that I know what the expected behaviour really is). Another wish for importing: sometimes I have hundreds of XML files, and some of them might be broken XML documents. Currently the import always fails. It would be great to have an option "ignore invalid documents" which would allow a quick import and would print out the file names of the invalid documents.
PS: For archiving, we use Dalimil (https://bitbucket.org/vlcinsky/dalimil), a command-line Python utility that we wrote and published under the BSD License. It's still in beta, but we consider it safe and usable.
Hi Jan,
On 8 Apr 2011, at 08:26, Jan Vlčinský (CAD) wrote:
[...] Command: CREATE DB ZipTest D:\var\TICEReports\aws1\ziptest
Error: "..." (Line 1): The processing instruction target matching "[xX][mM][lL]" is not allowed.
It looks like there's an error in one of your XML files: there's probably some whitespace before the XML declaration, which would trigger this error.
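This diagnosis can be checked independently of BaseX with any conforming XML parser, since the XML specification only allows the XML declaration at the very start of a document. Below is a minimal Python sketch using the standard library's ElementTree (expat); its error text differs from the Java (Xerces) message quoted above, and drop_space_before_declaration is a hypothetical helper, not part of any tool mentioned in this thread.

```python
import xml.etree.ElementTree as ET


def parse_error(data):
    """Return None if data (bytes) is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        return str(err)
    return None


def drop_space_before_declaration(data):
    """Hypothetical repair: remove whitespace that precedes '<?xml'.

    Any other leading content is left untouched, since removing it
    blindly could change the document.
    """
    stripped = data.lstrip()
    return stripped if stripped.startswith(b"<?xml") else data


broken = b"\n  <?xml version='1.0'?><report/>"
print(parse_error(broken))  # the parser rejects the misplaced declaration
print(parse_error(drop_space_before_declaration(broken)))  # None: now well-formed
```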
Another wish for importing: sometime I have hundreds of xml files and some might be broken xml documents. Currently import always fails. It would be great to have an option "ignore invalid documents" which would allow quick import
Oooo yes! I want that too. Use case: sometimes during development some crud ends up in the folder I'm trying to import, e.g. from a bad export from another system. Then I have to clean everything up before importing into BaseX. But really, I don't care if a couple of documents fail to import; it's development, and I don't need all of the documents in the DB.
and which would print out file names of those invalid documents.
For cleaning, I use a shell script with a for loop and 'xmllint -noout "$f"'. That automates it, but it would still be nice to be able to tell BaseX to just ignore files with errors; that would also work with .zip files, which a shell script doesn't.
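The xmllint loop can be extended to ZIP archives, and to produce the list of invalid file names Jan asked for, with a short Python script. This is a minimal sketch assuming that well-formedness is the only criterion (no DTD or schema validation); check_tree and is_well_formed are hypothetical names, and the script runs outside BaseX as a pre-import filter, not as a BaseX feature.

```python
import io
import tempfile
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path


def is_well_formed(data):
    """Well-formedness check only, roughly what `xmllint --noout` reports."""
    try:
        ET.parse(io.BytesIO(data))
    except ET.ParseError:
        return False
    return True


def check_tree(root):
    """Split *.xml files below root, and *.xml entries of .zip files, into
    (valid, invalid) lists of names, so broken documents can be skipped and
    reported instead of aborting the whole import."""
    root = Path(root)
    valid, invalid = [], []

    def record(name, data):
        (valid if is_well_formed(data) else invalid).append(name)

    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        suffix = path.suffix.lower()
        if suffix == ".xml":
            record(rel, path.read_bytes())
        elif suffix == ".zip":
            with zipfile.ZipFile(path) as zf:
                for entry in zf.namelist():
                    if entry.lower().endswith(".xml"):
                        record(f"{rel}/{entry}", zf.read(entry))
    return valid, invalid


# Demo: one good file, one broken file, and an archive with one of each.
with tempfile.TemporaryDirectory() as tmp:
    base_dir = Path(tmp)
    (base_dir / "good.xml").write_text("<ok/>")
    (base_dir / "bad.xml").write_text("<broken>")  # unclosed element
    with zipfile.ZipFile(base_dir / "archive.zip", "w") as zf:
        zf.writestr("fine.xml", "<y/>")
        zf.writestr("ws.xml", "  <?xml version='1.0'?><z/>")  # space before declaration
    valid, invalid = check_tree(base_dir)

for name in invalid:
    print("skipped (not well-formed):", name)
```

The invalid list can then be used to exclude those files before running the actual import.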
Kind regards,
Huib.
-- Drs. Huib Verweij Senior software developer - The Language Archive Max Planck Institute for Psycholinguistics P.O. Box 310 6500 AH Nijmegen The Netherlands t +31-24-3521911 e huib.verwey@mpi.nl w http://www.mpi.nl/
Hi Huib, I added the request "Import xml files - add option to allow ignoring invalid files" to Issues: https://github.com/BaseXdb/basex/issues/61
Vote for it, if you like.
Jan
I voted, but isn't it the same issue as #12?
Kind regards,
Huib Verweij.
On 8 Apr 2011, at 09:10, Jan Vlčinský (CAD) wrote:
[...]
I voted for #61 instead of #12 (yeah, gimme options)
On 2011-04-08 09:31, Huib Verweij wrote:
I voted, but isn't it the same issue as #12?
I voted, but isn't it the same issue as #12?
…ditto! As so many have already voted for #61, I've now closed issue #12 and added some comments.
To add some more items to the wish list:
Currently, basexclient doesn’t exit with an error code (≠0) when the server throws an exception while importing a file. This is unfortunate in our use case, where we issue several svn checkout and BaseX ADD statements in a Makefile. Since there is no non-zero exit status, make always continues, and the errors go unnoticed unless someone watches the process closely.
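make aborts a recipe as soon as a command returns a non-zero exit status, which is why the client's exit code matters here. The following minimal Python sketch mimics that behaviour with subprocess; the steps are placeholders (the actual svn and BaseX client command lines are not given in the thread), and run_steps is a hypothetical helper.

```python
import subprocess
import sys


def run_steps(steps):
    """Run commands in order, like the lines of a make recipe.

    Returns 0 if every command succeeds; otherwise stops at the first
    command with a non-zero exit status and returns that status.
    """
    for cmd in steps:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"aborting: {cmd!r} exited with status {result.returncode}",
                  file=sys.stderr)
            return result.returncode
    return 0


# Placeholder steps standing in for "svn checkout" and a BaseX ADD call;
# the second one simulates an import that fails and reports it via its exit code.
demo_steps = [
    [sys.executable, "-c", "print('checkout done')"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],
    [sys.executable, "-c", "print('this step is never reached')"],
]
status = run_steps(demo_steps)
print("pipeline status:", status)
```

If the failing step exited with 0 instead, it would look successful and the pipeline would carry on, which is exactly the problem described above.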
In addition, the server process keeps a lock on the file it unsuccessfully tried to import. I have to restart the server to make it release the file so that the file can be deleted. So there should be some exception handler that at least closes the file handles.
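The fix asked for here amounts to releasing the file handle on every exit path, including the error path. BaseX is written in Java, where try/finally or try-with-resources serves this purpose; the pattern is shown below as a short Python sketch, in which import_document and the dict-based store are hypothetical stand-ins for a real import routine.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def import_document(path, store):
    """Parse path and add its root element to store (a plain dict here).

    The with block closes the file handle whether parsing succeeds or
    raises, so a failed import leaves no open handle behind.
    """
    with open(path, "rb") as fh:
        try:
            store[path] = ET.parse(fh).getroot()
        except ET.ParseError as err:
            raise ValueError(f"cannot import {path}: {err}") from err


store = {}
fd, bad_path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as out:
    out.write(b"<unclosed>")

try:
    import_document(bad_path, store)
except ValueError as err:
    print(err)

# The handle is already closed, so the file can be deleted right away,
# also on Windows, where open handles typically block deletion.
os.remove(bad_path)
deleted = not os.path.exists(bad_path)
print("deleted:", deleted)
```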
And although you (Christian) feel that there are already too many options, there should be an option that controls whether an error is raised or corrupt files are skipped (with a warning) during import, as Jan asked for.
I don’t think it’s a bad thing to have many options, as long as they a) make sense b) are documented.
Gerrit
On 2011-04-08 08:50, Huib Verweij wrote:
[...]
Fine, so it looks like we won't get bored. ;) I've added two issues to the bug tracker to keep track of your observations.
I don’t think it’s a bad thing to have many options, as long as they a) make sense b) are documented.
Over the history of our code, we've found that it's very easy to add new options but much harder to remove them again. We also regularly had options that, at some stage, conflicted with other options. This is why we're generally happy if we can avoid adding new options and instead find a consistent solution that eventually satisfies even more users. I agree, however, that options are a pragmatic solution if they are well documented, and that they are usually the fastest way of resolving open issues.
Christian
Currently basexclient doesn’t exit with an error code (≠0) when it encounters a file that throws an exception on the server side during import. This is unfortunate in our use case where we issue several svn checkout and BaseX ADD statements in a Makefile. Since there is no non-null exit status, make will always continue, and the errors will not be dealt with unless someone watches the process closely.
This has now been resolved for all command-line APIs; note that the Windows batch scripts (basex.bat, etc.) were updated as well.
Christian