Graydon,
That seems like a good solution. I will pursue it.
My only practical wrinkle is that I’m reading from local git clones, so I have to make sure I’ve attempted to load any files pulled since the last load before checking for failed-to-load files, but that’s doable, of course.
Cheers,
E.
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
From: Graydon <graydonish@gmail.com>
Date: Saturday, February 26, 2022 at 9:05 AM
To: Eliot Kimber <eliot.kimber@servicenow.com>
Cc: basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Identify Unparseable XML Files in File System
On Sat, Feb 26, 2022 at 02:53:46PM +0000, Eliot Kimber scripsit:
> But maybe there’s a more direct way that I’ve overlooked?
If you trust the load process, you can get what's on disk with file:list(), and you can get what's in the system with some variation on collection()/document-uri(). You would then have to adjust the path names a little so they've got the same notional root.
Once you've done that, $disk[not(. = $system)] gives you the files that are on disk but not in the system, which (if you trust the load process) are the ones that aren't well-formed.
I'd expect this to be pretty brisk, and you don't have to try to parse
anything a second time.
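
A minimal sketch of that comparison in BaseX XQuery might look like the following. The directory and database name are placeholders, and it assumes the database was loaded from that directory, so that database paths are relative to the same root as the paths on disk. It uses db:list() for the "what's in the system" side in place of collection()/document-uri(), since db:list() already returns relative paths.

```xquery
(: Hypothetical values: point these at your own clone and database. :)
declare variable $dir := '/path/to/git/clone/';
declare variable $db  := 'my-database';

(: Files on disk, relative to $dir, with Windows separators normalized to '/'. :)
let $disk :=
  for $path in file:list($dir, true(), '*.xml')
  return replace($path, '\\', '/')

(: Resources stored in the database; assumed to share the same root as $dir. :)
let $system := db:list($db)

(: On disk but not in the database: the files that failed to load. :)
return $disk[not(. = $system)]
```

If the database paths carry an extra prefix (for example, because files were added under a subpath), one side or the other would need that prefix stripped before the comparison.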
--
Graydon Saunders | graydonish@gmail.com
Þæs oferéode, ðisses swá mæg.
-- Deor ("That passed, so may this.")