On 2012-05-09, Cerstin Mahlow <cerstin.mahlow@unibas.ch> wrote:
While I concede this may be useful in numerous use cases (and may even seem obvious), it would take quite some time to get implemented, so... please don't expect too much magic for the moment. There will also be some conceptual issues that need to be resolved. As an example, which result would you expect for the following query?
ft:mark(<a>X <b>Y</b> Z</a>[. contains text 'X Y'])
I think it should be
<a><mark>X</mark> <b><mark>Y</mark></b> Z</a>
Each token from the search string would be enclosed in its own <mark> element.
Exactly. While this probably wouldn't cover *all* possible scenarios, it would still cover most of the useful ones. In fact, it would be similar to http://www.raymondhill.net/blog/?p=272. It would also be applicable when ignoring elements in a search.
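To make the expected result concrete, here is a minimal application-side sketch in Python. It is not ft:mark() and not how any XQuery processor implements it; the function names mark_tokens and _rebuild are hypothetical. It also simplifies in two ways: it wraps every whole-word occurrence of any search token (it does not check that the tokens form a phrase), and matching is case-sensitive.

```python
import re
import xml.etree.ElementTree as ET


def _append_text(parent, s):
    """Append plain text after whatever content parent already holds."""
    if not s:
        return
    if len(parent):  # text after the last child lives in that child's tail
        parent[-1].tail = (parent[-1].tail or "") + s
    else:
        parent.text = (parent.text or "") + s


def _append_marked(parent, text, pattern):
    """Append text to parent, wrapping each pattern match in <mark>."""
    if not text:
        return
    pos = 0
    for m in pattern.finditer(text):
        _append_text(parent, text[pos:m.start()])
        mark = ET.SubElement(parent, "mark")
        mark.text = m.group(0)
        pos = m.end()
    _append_text(parent, text[pos:])


def _rebuild(elem, pattern):
    """Copy elem recursively, keeping its hierarchy and marking tokens."""
    new = ET.Element(elem.tag, dict(elem.attrib))
    _append_marked(new, elem.text, pattern)
    for child in elem:
        new.append(_rebuild(child, pattern))
        _append_marked(new, child.tail, pattern)
    return new


def mark_tokens(xml_source, phrase):
    """Return xml_source with every search token wrapped in <mark>."""
    tokens = phrase.split()
    if not tokens:
        return xml_source
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in tokens) + r")\b"
    )
    root = ET.fromstring(xml_source)
    return ET.tostring(_rebuild(root, pattern), encoding="unicode")


print(mark_tokens("<a>X <b>Y</b> Z</a>", "X Y"))
```

Because each text node is processed on its own, the inner <b> survives untouched and only the tokens gain a wrapper, which is the behaviour discussed above.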
For complex applications it may help to get the start and end character positions of the matches (essentially standoff markup), and the application could then do the highlighting itself on the basis of this information.
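A rough sketch of what such standoff information could look like, again in Python on the application side. The function name token_offsets is hypothetical, the offsets refer to the flattened text of the element (all descendant text concatenated), and matching is case-sensitive with tokens separated by whitespace. No existing API is assumed to return data in this shape.

```python
import re
import xml.etree.ElementTree as ET


def token_offsets(xml_source, phrase):
    """Return (flat_text, spans): one (start, end) pair per matched token.

    Offsets are character positions in the flattened text, so they are
    independent of the element structure (standoff markup).
    """
    root = ET.fromstring(xml_source)
    flat = "".join(root.itertext())
    tokens = phrase.split()
    if not tokens:
        return flat, []
    # One capturing group per token, tokens separated by whitespace.
    pattern = r"\s+".join("(" + re.escape(t) + ")" for t in tokens)
    spans = []
    for m in re.finditer(pattern, flat):
        for i in range(1, len(tokens) + 1):
            spans.append(m.span(i))
    return flat, spans


flat, spans = token_offsets("<a>X <b>Y</b> Z</a>", "X Y")
print(flat, spans)
```

For the query above, the flattened text is "X Y Z" and the two tokens sit at positions 0 to 1 and 2 to 3; the application would still have to map these offsets back onto the individual text nodes to highlight them.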
[...]
If you don't need the inner elements, you may as well remove them from your document before applying ft:mark().
This is a great idea if you only want to know whether the search terms occur somewhere in your text.
However, if you want to show the results to end users (= humanities people) or to annotate the document further, it's not a good idea to destroy the original structure. Otherwise one would need a tricky workaround: first replace the hierarchical node with a flat copy for searching, then annotate the flat copy, and finally merge the annotations back into the original node while preserving its hierarchy.
And for searching only, the scenario is a TEI document representing an old printed book with typographic highlighting (e.g., some things in italics), foreign-language words printed in a different font, person names already marked up, etc. The TEI rendering is intended to mimic the original printed page. When a full-text search is implemented, the end user expects to see the highlighted search tokens within the rendered page. Therefore the "easiest" way is to search in descendant nodes and use ft:mark() to highlight the hits, without any need to change the TEI rendering. This would also allow the end user not only to see the node where the search string was found, but also to scroll up and down to inspect its context.
I fully agree, this is exactly what I need in my application: I don't want to retrieve snippets from the document, but I always have to display the full document with the hits highlighted.
What I'm going to do now is probably highlight the full paragraph which contains the node retrieved by the search, i.e., get the node ID, walk up the tree until I encounter a <p> and get its @xml:id, which I can then use in a CSS stylesheet. Or something like this. But this is clearly only an approximation.
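The paragraph-level approximation could look roughly like this in Python. It is only a sketch: the function name enclosing_paragraph_id and the sample document are made up for illustration, and the element is matched on its local name "p" so that it works with or without the TEI namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the fixed XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def enclosing_paragraph_id(root, hit):
    """Walk up from hit to the nearest <p> and return its xml:id (or None)."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = hit
    while node is not None:
        if node.tag.rsplit("}", 1)[-1] == "p":  # compare local names only
            return node.get(XML_ID)
        node = parents.get(node)
    return None


# Hypothetical sample: the hit is the <hi> element inside the first paragraph.
doc = ET.fromstring(
    '<body><p xml:id="p1">An <hi rend="italic">old</hi> book</p>'
    '<p xml:id="p2">Other text</p></body>'
)
hit = doc.find(".//hi")
print(enclosing_paragraph_id(doc, hit))
```

The returned ID can then be used as a selector in the stylesheet. As said, this highlights the whole paragraph rather than the hit itself.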
Best regards