Dear Mr. Sperberg-McQueen, 

For the file-type filter issue, it may help to set the "createfilter" option.
This is most easily done programmatically, as in [1].
Unfortunately, at the moment neither the ADD command nor the GUI dialog allows setting the filter to more than one extension at a time.

To use the GUI command field, try the following (I'm sorry, I cannot remember whether the ADD command is present in 5.7 [3]):
create collection MyCollection
This creates and opens an empty database.

set createfilter *.xsd
add /path/to/xml

This adds all files matching *.xsd to the collection.
You can then repeat the "set createfilter" / "add" steps once for each of your file types.
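
If typing the repeated steps by hand becomes tedious, the command list could be generated with a small script. The following is only a sketch: the function name basex_add_script, the collection name and the path are placeholders I made up for illustration, and it emits nothing beyond the three commands shown above, so it still depends on ADD being available in your version.

```python
def basex_add_script(collection, root, extensions):
    """Build the command sequence described above, one filter/add pair per extension.

    All arguments are illustrative placeholders, not values taken from a real setup.
    """
    lines = [f"create collection {collection}"]
    for ext in extensions:
        # The filter takes a single extension at a time, so set it before each add.
        lines.append(f"set createfilter *.{ext}")
        lines.append(f"add {root}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(basex_add_script("MyCollection", "/path/to/xml", ["xml", "xsd", "testSet"]))
```

The printed lines can then be entered one after another in the command field.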

You can check whether all relevant files are contained with the following XQuery:
for $d in .
return <uri>{base-uri($d)}</uri>
You may also want to look at the example [2] on our website, which shows some queries for finding a specific file inside a collection.

I hope this helps. Don't hesitate to ask further questions, as my answer is rather short compared to your email :-).



Kind Regards
Michael

P.S. We will send another email about dealing with the "interrupted add" when a file contains errors. I can't say yet whether there is an easy solution.
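
In the meantime, one possible workaround is to look for ill-formed files before adding anything, so they can be renamed or moved out of the way. The sketch below uses only Python's standard library and is independent of BaseX; the function name find_ill_formed and the default extension list are assumptions for illustration, and Python's parser may not agree with the BaseX parser in every edge case.

```python
import os
import xml.etree.ElementTree as ET


def find_ill_formed(root, extensions=(".xml", ".xsd", ".testSet")):
    """Return the sorted paths below root that have one of the given
    extensions (a tuple of suffixes) but do not parse as XML."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue  # skip README files, .DS_Store and the like
            path = os.path.join(dirpath, name)
            try:
                ET.parse(path)
            except ET.ParseError:
                # Only well-formedness errors are caught here; other
                # failures (e.g. unreadable files) will still raise.
                bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    for path in find_ill_formed("/path/to/xml"):
        print(path)
```

Each printed path is a file that would probably interrupt the add.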


On 14.06.2010 at 03:40, C. M. Sperberg-McQueen wrote:

I'm trying to build a BaseX database containing the current
contents of the XSD 1.1 test suite, and have run across a
problem I suspect is easily soluble, but for which I am not
finding the solution.

The directory tree containing the database includes a large
number of test cases, mostly files with names ending in .xml,
and a smaller number of schema documents for the test cases,
which mostly have names ending in .xsd.  (One set of test
contributions has three or four .xml files which are not in
fact XML but which are testing the behavior of the processor
in the presence of ill-formed input.  I've renamed these so
their names end in .xmlnwf, because they were causing attempts
to index all the XML documents to fail.)  There are also XML
files with metadata for the test cases, some of which have
names ending in .xml and some with names ending in .testSet.
And there is some random stuff that I'm not particularly
interested in indexing:  some README files, a Microsoft Word
document included by one organization contributing test cases,
and (since I'm working under Mac OS X) some .DS_Store files
containing OS-level binary metadata about the directory.  There
may be other files in the directories as well; I haven't tried
to make a full census.

If I could, I'd tell the database creation dialog to index
everything it can parse as XML and ignore the rest.  But
when the parser encounters ill-formed input, it issues an
error message and aborts the creation of the database.  (And
then the message disappears, so the user doesn't know which
file caused the problem.  If you could change that, it would
be great, but really that's a separate issue.)

Alternatively, I'd like to tell the creation dialog to use
a filter that accepts either .xml files or .xsd files or
.testSet files.  But I can't figure out what notation to use
for that.  Since the example given ("*.xml") looks like a
bash wildcard, I've tried "*.xml, *.xsd, *.testSet", and
"*.xml,*.xsd,*.testSet" and "*.xml *.xsd *.testSet"
and "*.{xml,xsd,testSet}", none of which match any files.

Is there a way to specify a filter that matches several file
extensions?

Guided by http://www.inf.uni-konstanz.de/dbis/basex/commands
I then tried

 add /home/xmlschema/xsts11/*/*.xsd

in the command area in the GUI, but it tells me the command
ADD is not known.  (Perhaps this is because I'm still using
5.7; perhaps ADD came later?)

I'm hoping wildcards are allowed in ADD commands, because the
task of finding and adding fourteen thousand .xsd files would
be a bit tedious.  (I suppose I could use the command-line
interface to do it.)  It would also be nice to have the
ADD functionality better integrated into the GUI.  (How spoilt
I have become!  I can't believe I'm not using the command
line, but the syntax checking in the query window is too
helpful for me.)

Any advice on (1) the notation of the filter parameter,
(2) the use of wildcards in the ADD command, or (3) any
other way to solve this task and build this database?

I'm using the GUI interface to BaseX (odd; my name for the
app makes clear I think it is version 5.7, but the About
dialog box gives the number 12.2).  I'll send this note, and
then download and install 6.1 to see if it helps.

[1] http://www.inf.uni-konstanz.de/dbis/basex/code/CollectionExample
[2] http://www.inf.uni-konstanz.de/dbis/basex/code/CollectionQueryExample
[3] Working with collections became more convenient in later BaseX releases, as relative path information is preserved inside a collection.