Hi,
I'm coming back to this thread after some time:
Quoting Christian Grün christian.gruen@gmail.com:
If I want to get whitespaces back, do I have to re-create the collection?
Yes; sorry for that. The database does not contain any information on chopped whitespaces, which is why you'll indeed have to reimport the documents.
Would this result in any change concerning the node-ids? We already have some data depending on node-ids. Is there some other way to get the original whitespaces back?
The node ids will change if the documents include pure whitespace texts.
I see.
Maybe someone can give me a hint on how to solve this problem:
I have a collection (Text-DB) that was created with whitespace chopping turned on. Users have already worked with this collection, so I now have a fairly large database (Collect-DB) consisting of 150 000 entries like this one:
<entry>
  <node>12345</node>
  <id>Ad0001</id>
  <query>contains abcd</query>
</entry>
The "node" element contains the node id from Text-DB where a certain XQuery expression matched. The relevant nodes are paragraphs or lines from a TEI document. In a later processing step, I use the node id and the query (as stored in the "query" element) to show the user the node with the relevant part highlighted, by applying the original query to the original node using ft:mark.
If I re-create the collection with whitespace chopping turned off, keeping the documents in the same order as in the whitespace-chopped collection, the node ids stored in Collect-DB would refer to completely different nodes. There is no way I could convince the users to do all the work again.
So my idea was to keep the original Text-DB (without whitespace) alongside a new Text-DB (with whitespace); let's call it Text-DB-WS. Every node in Text-DB has a corresponding node in Text-DB-WS; the two differ only in their node ids. So I should be able to detect which node id of Text-DB corresponds to which node id of Text-DB-WS, and then create a new version of Collect-DB by replacing the value of every "node" element with the respective node id from Text-DB-WS.
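To make the idea concrete, here is a minimal Python sketch of the alignment. It does not talk to BaseX at all: the numbering in number_nodes is only a toy stand-in for real node ids (one counter over document, element, attribute and text nodes in document order), and the names number_nodes, element_id_map, rewrite_entry and SAMPLE are made up for illustration. The assumption it rests on is that chopping removes whitespace-only text nodes but never elements, so the n-th element of one database is the n-th element of the other.

```python
from xml.dom import minidom, Node

def number_nodes(xml_text, chop):
    """Return [(toy_id, node)] in document order.

    The ids are a simplified stand-in, NOT BaseX's real node ids.
    With chop=True, whitespace-only text nodes get no id at all.
    """
    doc = minidom.parseString(xml_text)
    numbered = []
    counter = 0

    def visit(node):
        nonlocal counter
        if chop and node.nodeType == Node.TEXT_NODE and not node.data.strip():
            return  # chopped away: this text node does not exist
        numbered.append((counter, node))
        counter += 1
        if node.nodeType == Node.ELEMENT_NODE:
            # Attributes are counted right after their element.
            for i in range(node.attributes.length):
                numbered.append((counter, node.attributes.item(i)))
                counter += 1
        for child in node.childNodes:
            visit(child)

    visit(doc)
    return numbered

def element_id_map(xml_text):
    """Map each element's id in the chopped numbering to its unchopped id."""
    def element_ids(chop):
        return [i for i, n in number_nodes(xml_text, chop)
                if n.nodeType == Node.ELEMENT_NODE]
    old, new = element_ids(True), element_ids(False)
    # Chopping never removes elements, so both lists line up by position.
    assert len(old) == len(new)
    return dict(zip(old, new))

def rewrite_entry(entry_xml, id_map):
    """Replace the value of <node> in one Collect-DB entry with the new id."""
    doc = minidom.parseString(entry_xml)
    text = doc.getElementsByTagName("node")[0].firstChild
    text.data = str(id_map[int(text.data)])
    return doc.documentElement.toxml()

# Hypothetical miniature document standing in for a TEI text.
SAMPLE = "<text>\n  <p>first</p>\n  <p>second</p>\n</text>"
ID_MAP = element_id_map(SAMPLE)
```

With real data, the two id lists would have to come from Text-DB and Text-DB-WS themselves rather than from a re-parse, but the pairing by position and the rewrite of each "node" value would stay the same.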
Could this be done using BaseX, or would I be better off with some Perl scripting?
Best regards
Cerstin -- Dr. phil. Cerstin Mahlow
Universität Basel Departement Sprach- und Literaturwissenschaften Fachbereich Deutsche Sprach- und Literaturwissenschaft Nadelberg 4 4051 Basel Schweiz
Tel: +41 61 267 07 65 Fax: +41 61 267 34 40 Mail: cerstin.mahlow@unibas.ch Web: http://www.oldphras.net