[basex-talk] whitespace

25 Jun 2012


Hi,
I'm coming back to this thread after some time:
Quoting Christian Grün <christian.gruen@gmail.com>:
...
...
>> If I want to get whitespaces back, do I have to re-create the collection?
> Yes; sorry for that. The database does not contain any information on
> chopped whitespaces, which is why you'll indeed have to reimport the
> documents.
...
>> Would this result in any change concerning the node-ids?  We already have
>> some data depending on node-ids.  Is there some other way to get the
>> original whitespaces back?
> The node ids will change if the documents include pure whitespace
> texts.
I see.
Maybe someone can give me a hint on how to solve this problem:
I have a collection (Text-DB) that was created with whitespace chopping
turned on. Users have already worked with this collection, so I now have
a fairly large database (Collect-DB) consisting of 150,000 entries like
this one:
<entry>
<node>12345</node>
<id>Ad0001</id>
<query>contains abcd</query>
</entry>
The "node" element contains the node-id from Text-DB where a certain
XQuery expression matched. The relevant nodes are paragraphs or lines
from a TEI document. In a later processing step, I use the node-id and
the query (as stored in the "query" element) to show the user the node
with the relevant part, by applying the original query to the original
node using ft:mark.
If I re-create the collection with whitespace chopping turned off,
preserving the sequence of documents as in the whitespace-chopped
collection, the node-ids stored in Collect-DB will refer to
completely different nodes. There is no way I could convince the users
to do all the work again.
So my idea is to keep the original Text-DB (without whitespace) alongside
the new Text-DB (with whitespace), let's call it Text-DB-WS. Every node
in Text-DB has a corresponding node in Text-DB-WS; the two differ only
in their node-id. So I should be able to determine which node-id
of Text-DB corresponds to which node-id of Text-DB-WS, and then
create a new version of Collect-DB by replacing the value of each
"node" element with the respective node-id from Text-DB-WS.
Could this be done using BaseX or should I rather do some Perl-scripting?
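To make the remapping idea concrete, here is a minimal sketch in Python
(instead of Perl). It rests on several assumptions that are not confirmed
anywhere in this thread: that the node-ids of all elements can be exported
from Text-DB and from Text-DB-WS in document order (for instance with
BaseX's db:node-id function), that whitespace chopping only affects text
nodes so both lists contain the same elements in the same order, and that
the Collect-DB entries sit under a single root element, here called
"entries". The function names and the sample ids are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_id_map(old_ids, new_ids):
    """Pair element node-ids positionally: the k-th element of Text-DB is
    assumed to be the k-th element of Text-DB-WS (same document order)."""
    if len(old_ids) != len(new_ids):
        # A length mismatch means the two exports are not aligned.
        raise ValueError("id lists differ in length; element sequences do not match")
    return dict(zip(old_ids, new_ids))


def remap_entries(collect_xml, id_map):
    """Return Collect-DB XML with every <node> value replaced by the
    corresponding node-id from Text-DB-WS."""
    root = ET.fromstring(collect_xml)
    for node in root.iter("node"):
        old_id = int(node.text.strip())
        # A KeyError here means the stored id is not an element id.
        node.text = str(id_map[old_id])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical id exports; real lists would come from the two databases.
    old_ids = [1, 2, 3, 4]
    new_ids = [1, 3, 5, 7]
    sample = (
        "<entries><entry><node>3</node><id>Ad0001</id>"
        "<query>contains abcd</query></entry></entries>"
    )
    print(remap_entries(sample, build_id_map(old_ids, new_ids)))
```

Because the stored node-ids point at paragraphs or lines, aligning only
elements should be enough; the length check is a cheap guard against the
two exports having drifted apart.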
Best regards
Cerstin
--
Dr. phil. Cerstin Mahlow
Universität Basel
Departement Sprach- und Literaturwissenschaften
Fachbereich Deutsche Sprach- und Literaturwissenschaft
Nadelberg 4
4051 Basel
Schweiz
Tel:  +41 61 267 07 65
Fax: +41 61 267 34 40
Mail: cerstin.mahlow@unibas.ch
Web: http://www.oldphras.net
