Re: [basex-talk] differencing the string value of documents

12 Jun 2021


On Sat, Jun 12, 2021 at 04:23:23PM -0400, Liam R. E. Quin scripsit:
...
On Sat, 2021-06-12 at 15:38 -0400, Graydon wrote:
...
This test is meant to test only that no words have been lost or
re-ordered; that the transformation is semantically correct is out of
scope for it.
Some random witterings...
So, i'd probably consider
(1) make a sequence of words from document A
Now, if you really hate your CPU :) you could transform A.seq into a
regular expression,
  w0.*w1.*w2...
and match it against the extracted string value of A.
I would have to hate my CPU intensely; some of the real documents run to
a thousand or more pages in PDF.
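
A minimal sketch of the word-sequence idea discussed above, in Python purely for illustration (the thread itself does not name a language). It assumes the string value of each document has already been extracted as plain text; the names `words`, `first_difference` and `subsequence_regex` are hypothetical. The first comparison is a single linear pass that reports the first lost, added or re-ordered word. The second builds the w0.*w1.*w2... pattern, which only checks that the words occur in order, and is likely to be far too slow on thousand-page inputs.

```python
import re
from itertools import zip_longest


def words(text):
    """Split a document's string value into whitespace-delimited words."""
    return text.split()


def first_difference(a_text, b_text):
    """Return (index, word_a, word_b) for the first mismatch between the
    two word sequences, or None if they are identical.

    A missing word on either side shows up as None in the tuple.
    """
    pairs = zip_longest(words(a_text), words(b_text))
    for i, (wa, wb) in enumerate(pairs):
        if wa != wb:
            return (i, wa, wb)
    return None


def subsequence_regex(a_text):
    """Build the w0.*w1.*w2... pattern from the words of a_text.

    Each word is escaped so punctuation is matched literally; DOTALL lets
    '.*' span line breaks in the text being searched.
    """
    pattern = ".*".join(re.escape(w) for w in words(a_text))
    return re.compile(pattern, re.DOTALL)
```

Calling `first_difference(source_text, result_text)` returns `None` when no words were lost, added or re-ordered, regardless of how whitespace changed. The regular expression is kept only to show the shape of the pattern; it would accept extra words in the second document, so it is a weaker test as well as a slower one.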
[snip]
...
Doug Lenat i think has written a book around parsing algorithms, as has
Anne Brüggemann-Klein; Michael Sperberg-McQueen gave a paper at
Balisage about applications to Schema Validation (or at Extreme
Markup). Anne's abstraction, whose name i can't remember (sorry), is
most promising since your problem can be recast as equivalent to
matching XML Schema grammars to input documents, with the unique
particle attribution restriction lifted; RelaxNG does this with a hedge
automaton and that's another approach.
I think this will be helpful in the longer term, since more
general solutions and solutions for whether the transformation is
semantically conformant will be wanted.
(Also likely another few buckets of water will be required. :)
Thanks!
Graydon
-- 
Graydon Saunders  | graydonish@gmail.com
Þæs oferéode, ðisses swá mæg.
-- Deor  ("That passed, so may this.")
