Dear Mr. Sperberg-McQueen,
For the file-type filter issue, it might help to set the "createfilter" option. This is most easily done programmatically, as in [1]. Unfortunately, at the moment neither the ADD command nor the GUI dialog allows setting the filter to more than one extension at a time.
To use the command field in the GUI, try the following (sorry, I cannot remember whether the ADD command is already present in 5.7 [3]):
create collection MyCollection
This creates and opens an empty database.
set createfilter *.xsd
add /path/to/xml
This adds all files matching *.xsd to the collection. You can now repeat the "set createfilter" and "add" steps for each of your file types.
You can check whether all relevant files are included with the following XQuery:
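The same sequence can be written out for several file types at once. This is a minimal Python sketch that only assembles the command lines shown above; the database name, root path, and extension list are placeholders, and the generated lines would still be entered in the command field one at a time:

```python
def build_commands(db_name, root, extensions):
    """Return the BaseX commands from the reply, one filter/add pair per extension.

    db_name, root, and extensions are placeholders; the command words
    (create collection, set createfilter, add) are the ones used in the reply.
    """
    commands = [f"create collection {db_name}"]
    for ext in extensions:
        commands.append(f"set createfilter *.{ext}")
        commands.append(f"add {root}")
    return commands


if __name__ == "__main__":
    for line in build_commands("MyCollection", "/path/to/xml", ["xml", "xsd", "testSet"]):
        print(line)
```

For three extensions this yields seven lines: the create command followed by three filter/add pairs.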
for $d in . return <uri>{base-uri($d)}</uri>
You may also want to look at example [2] on our website, which shows some queries for finding a specific file inside a collection.
Hope this helps. Don't hesitate to ask further questions, as my answer seems rather short compared to your email :-).
Kind regards,
Michael
P.S. We will send another email about the "interrupted add" when a file contains errors. I can't say yet whether there is an easy solution.
On 14.06.2010 at 03:40, C. M. Sperberg-McQueen wrote:
I'm trying to build a BaseX database containing the current contents of the XSD 1.1 test suite, and have run across a problem I suspect is easily soluble, but for which I am not finding the solution.
The directory tree containing the database includes a large number of test cases, mostly files with names ending in .xml, and a smaller number of schema documents for the test cases, which mostly have names ending in .xsd. (One set of test contributions has three or four .xml files which are not in fact XML but which are testing the behavior of the processor in the presence of ill-formed input. I've renamed these so their names end in .xmlnwf, because they were causing attempts to index all the XML documents to fail.) There are also XML files with metadata for the test cases, some of which have names ending in .xml and some with names ending in .testSet. And there is some random stuff that I'm not particularly interested in indexing: some README files, a Microsoft Word document included by one organization contributing test cases, and (since I'm working under Mac OS X) some .DS_Store files containing OS-level binary metadata about the directory. There may be other files in the directories as well; I haven't tried to make a full census.
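A census of file types takes only a few lines outside BaseX. This is a minimal Python sketch using only the standard library; the root path is the one from the ADD attempt below, and grouping by the text after the last dot is an assumption about what counts as a "type":

```python
import os
from collections import Counter


def census(root):
    """Count the files below root by extension.

    os.path.splitext treats a leading dot as part of the name, so files such
    as ".DS_Store" and "README" are both counted under the empty extension "".
    """
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            counts[os.path.splitext(name)[1]] += 1
    return counts


if __name__ == "__main__":
    for ext, n in census("/home/xmlschema/xsts11").most_common():
        print(f"{ext or '(none)'}\t{n}")
```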
If I could, I'd tell the database creation dialog to index everything it can parse as XML and ignore the rest. But when the parser encounters ill-formed input, it issues an error message and aborts the creation of the database. (And then the message disappears, so the user doesn't know which file caused the problem. If you could change that, it would be great, but really that's a separate issue.)
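Meanwhile, the file that makes the parser abort can be located outside BaseX. This is a minimal Python sketch using the standard library's ElementTree parser. The extension list is an assumption based on the file types described above. Documents that rely on entities declared in an external DTD may be reported as well, because ElementTree does not load external DTDs:

```python
import os
import xml.etree.ElementTree as ET


def find_ill_formed(root, extensions=(".xml", ".xsd", ".testSet")):
    """Return (path, message) pairs for files that fail to parse as XML.

    Only files whose names end in one of the given extensions are checked,
    so renamed files such as "*.xmlnwf" are skipped.
    """
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                ET.parse(path)
            except ET.ParseError as err:
                problems.append((path, str(err)))
    return problems


if __name__ == "__main__":
    for path, message in find_ill_formed("/home/xmlschema/xsts11"):
        print(f"{path}: {message}")
```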
Alternatively, I'd like to tell the creation dialog to use a filter that accepts either .xml files or .xsd files or .testSet files. But I can't figure out what notation to use for that. Since the example given ("*.xml") looks like a bash wildcard, I've tried "*.xml, *.xsd, *.testSet", and "*.xml,*.xsd,*.testSet" and "*.xml *.xsd *.testSet" and "*.{xml,xsd,testSet}", none of which match any files.
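To make the request concrete, this is a minimal Python sketch of a filter that accepts several extensions. It assumes a hypothetical comma-separated list of shell-style patterns; this is not a notation that BaseX is known to accept. It also shows that braces are a bash expansion feature: Python's fnmatch treats "{" literally, so "*.{xml,xsd,testSet}" matches nothing here either:

```python
from fnmatch import fnmatchcase


def parse_filter(spec):
    """Split a hypothetical comma-separated filter such as "*.xml, *.xsd"."""
    return [p.strip() for p in spec.split(",") if p.strip()]


def matches_any(filename, patterns):
    """True if filename matches at least one shell-style pattern (case-sensitive)."""
    return any(fnmatchcase(filename, p) for p in patterns)


if __name__ == "__main__":
    patterns = parse_filter("*.xml, *.xsd, *.testSet")
    for name in ["t001.xml", "s.xsd", "meta.testSet", "bad.xmlnwf", ".DS_Store"]:
        print(name, matches_any(name, patterns))
```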
Is there a way to specify a filter that matches several file extensions?
Guided by http://www.inf.uni-konstanz.de/dbis/basex/commands, I then tried
add /home/xmlschema/xsts11/*/*.xsd
in the command area in the GUI, but it tells me the command ADD is not known. (Perhaps this is because I'm still using 5.7; perhaps ADD came later?)
I'm hoping wildcards are allowed in ADD commands, because the task of finding and adding fourteen thousand .xsd files would be a bit tedious. (I suppose I could use the command-line interface to do it.) It would also be nice to have the ADD functionality better integrated into the GUI. (How spoilt I have become! I can't believe I'm not using the command line, but the syntax checking in the query window is too helpful for me.)
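If ADD does not expand wildcards itself, the expansion can be done beforehand. This is a minimal Python sketch that turns the wildcard path above into one ADD line per matching file. It assumes that ADD accepts the path of a single file, which is worth checking in the installed version:

```python
import glob


def add_commands(pattern):
    """Expand a shell-style wildcard path and emit one ADD command per match.

    glob's "*" does not cross directory separators, so "*/*.xsd" only looks
    exactly one directory level below the root.
    """
    return [f"add {path}" for path in sorted(glob.glob(pattern))]


if __name__ == "__main__":
    for line in add_commands("/home/xmlschema/xsts11/*/*.xsd"):
        print(line)
```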
Any advice on (1) the notation of the filter parameter, (2) the use of wildcards in the ADD command, or (3) any other way to solve this task and build this database?
I'm using the GUI interface to BaseX (odd; my name for the app makes clear I think it is version 5.7, but the About dialog box gives the number 12.2). I'll send this note, and then download and install 6.1 to see if it helps.
[1] http://www.inf.uni-konstanz.de/dbis/basex/code/CollectionExample
[2] http://www.inf.uni-konstanz.de/dbis/basex/code/CollectionQueryExample
[3] Working with collections became more convenient in later BaseX releases, as relative path information is preserved inside a collection.