Graydon,
That seems like a good solution. I will pursue it.
My only practical wrinkle is that I'm reading from local git clones, so before checking for failed-to-load files I have to make sure I've attempted to load any files pulled since the last load. That's doable, of course.
Cheers,
E.
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368 M: 512 554 9368
servicenow.com (https://www.servicenow.com)
From: Graydon <graydonish@gmail.com>
Date: Saturday, February 26, 2022 at 9:05 AM
To: Eliot Kimber <eliot.kimber@servicenow.com>
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Identify Unparseable XML Files in File System
On Sat, Feb 26, 2022 at 02:53:46PM +0000, Eliot Kimber scripsit:
> But maybe there’s a more direct way that I’ve overlooked?
If you trust the load process, you can get what's on disk with file:list(), and you can get what's in the system with some variation on collection()/document-uri(). You would then have to adjust the path names a little so they've got the same notional root.
Once you've done that, $disk[not(. = $system)] gives you the files that are on disk but not in the database, which (if you trust the load) are the ones that aren't well-formed.
I'd expect this to be pretty brisk, and you don't have to try to parse anything a second time.
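A minimal sketch of that comparison in BaseX XQuery might look like the following. The directory path and database name are hypothetical placeholders, and it assumes the database was created directly from that directory, so that the paths from db:list() are already relative to the same root as the paths from file:list(). If they aren't, the path adjustment mentioned above still has to be done first.

```xquery
(: Hypothetical values: adjust to your own clone directory and database. :)
declare variable $dir := '/path/to/git/clone/';
declare variable $db  := 'mydb';

(: Files on disk: relative paths, searched recursively, XML files only.
   Backslashes are normalized so Windows paths compare equal to database paths. :)
let $disk := file:list($dir, true(), '*.xml') ! translate(., '\', '/')

(: Resources in the database, as paths relative to the database root
   (assumed here to correspond to $dir). :)
let $system := db:list($db)

(: On disk but not in the database: the files that failed to load. :)
return $disk[not(. = $system)]
```

The general comparison in the predicate is quadratic in principle, so for very large file sets it may be worth checking how the query is optimized.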
-- Graydon Saunders | graydonish@gmail.com Þæs oferéode, ðisses swá mæg. -- Deor ("That passed, so may this.")