I'm trying to build a BaseX database containing the current
contents of the XSD 1.1 test suite, and have run across a
problem I suspect is easily soluble, but for which I am not
finding the solution.
The directory tree containing the test suite includes a large
number of test cases, mostly files with names ending in .xml,
and a smaller number of schema documents for the test cases,
which mostly have names ending in .xsd. (One set of test
contributions has three or four .xml files which are not in
fact XML but which are testing the behavior of the processor
in the presence of ill-formed input. I've renamed these so
their names end in .xmlnwf, because they were causing attempts
to index all the XML documents to fail.) There are also XML
files with metadata for the test cases, some of which have
names ending in .xml and some with names ending in .testSet.
And there is some random stuff that I'm not particularly
interested in indexing: some README files, a Microsoft Word
document included by one organization contributing test cases,
and (since I'm working under Mac OS X) some .DS_Store files
containing OS-level binary metadata about the directory. There
may be other files in the directories as well; I haven't tried
to make a full census.
If I could, I'd tell the database creation dialog to index
everything it can parse as XML and ignore the rest. But
when the parser encounters ill-formed input, it issues an
error message and aborts the creation of the database. (And
then the message disappears, so the user doesn't know which
file caused the problem. If you could change that, it would
be great, but really that's a separate issue.)
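
As a stopgap outside BaseX, the "index whatever parses, ignore the
rest" behavior could be approximated by checking each file for
well-formedness before building the database. A minimal sketch in
Python (standard library only): the function names and the WANTED
tuple are illustrative assumptions, not anything BaseX provides, and
Python's parser may disagree with BaseX's in edge cases such as
entity handling.

```python
import os
import xml.etree.ElementTree as ET

# Extensions of interest, taken from the description of the test suite.
# Assumption: matching is case-sensitive.
WANTED = (".xml", ".xsd", ".testSet")

def is_well_formed(path):
    """Return True if Python's XML parser accepts the file, else False."""
    try:
        ET.parse(path)
        return True
    except (ET.ParseError, OSError, UnicodeError, ValueError):
        return False

def partition_tree(root, wanted=WANTED):
    """Walk root and return (good, bad) lists of paths.

    Only files whose names end in one of the wanted extensions are
    examined; everything else (README files, .DS_Store, ...) is skipped.
    """
    good, bad = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(wanted):
                continue
            path = os.path.join(dirpath, name)
            if is_well_formed(path):
                good.append(path)
            else:
                bad.append(path)
    return good, bad
```

Calling partition_tree("/home/xmlschema/xsts11") would return the
files that parse and, separately, the ones that do not, which at
least identifies the offending files by name.
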
Alternatively, I'd like to tell the creation dialog to use
a filter that accepts either .xml files or .xsd files or
.testSet files. But I can't figure out what notation to use
for that. Since the example given ("*.xml") looks like a
bash wildcard, I've tried "*.xml, *.xsd, *.testSet", and
"*.xml,*.xsd,*.testSet" and "*.xml *.xsd *.testSet"
and "*.{xml,xsd,testSet}", none of which match any files.
Is there a way to specify a filter that matches several file
extensions?
Guided by
http://www.inf.uni-konstanz.de/dbis/basex/commands
I then tried
    add /home/xmlschema/xsts11/*/*.xsd
in the command area in the GUI, but it tells me the command
ADD is not known. (Perhaps this is because I'm still using
5.7; perhaps ADD came later?)
I'm hoping wildcards are allowed in ADD commands, because the
task of finding and adding fourteen thousand .xsd files would
be a bit tedious. (I suppose I could use the command-line
interface to do it.) It would also be nice to have the
ADD functionality better integrated into the GUI. (How spoilt
I have become! I can't believe I'm not using the command
line, but the syntax checking in the query window is too
helpful for me.)
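
If wildcards turn out not to be allowed, the expansion could be done
outside BaseX. A minimal sketch in Python, assuming (and this is only
an assumption) that the installed version accepts ADD followed by a
single file path, one command per line; the helper name add_commands
is hypothetical, and paths containing spaces may need extra care.

```python
import glob

def add_commands(pattern):
    """Return one 'ADD <path>' line for each file matching the glob pattern.

    Assumption: BaseX accepts 'ADD' followed by a single file path.
    """
    return ["ADD " + path for path in sorted(glob.glob(pattern))]
```

For example, add_commands("/home/xmlschema/xsts11/*/*.xsd") would
produce one line per schema document, and the lines could then be
pasted or piped into BaseX.
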
Any advice on (1) the notation of the filter parameter,
(2) the use of wildcards in the ADD command, or (3) any
other way to solve this task and build this database?
I'm using the BaseX GUI (oddly, the name I gave the app shows
that I took it to be version 5.7, but the About dialog box
gives the number 12.2). I'll send this note, and then download
and install 6.1 to see if it helps.