Graydon,

 

That seems like a good solution. I will pursue it.

 

My only practical wrinkle is that I’m reading from local git clones, so I have to make sure I’ve attempted to load any files pulled since the last load before checking for failed-to-load files, but that’s doable, of course.

 

Cheers,

 

E.

 

_____________________________________________

Eliot Kimber

Sr Staff Content Engineer

O: 512 554 9368

M: 512 554 9368

servicenow.com

LinkedIn | Twitter | YouTube | Facebook

 

From: Graydon <graydonish@gmail.com>
Date: Saturday, February 26, 2022 at 9:05 AM
To: Eliot Kimber <eliot.kimber@servicenow.com>
Cc: basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Identify Unparseable XML Files in File System


On Sat, Feb 26, 2022 at 02:53:46PM +0000, Eliot Kimber scripsit:
> But maybe there’s a more direct way that I’ve overlooked?

If you trust the load process, you can get what's on disk with file:list(), and you can get what's in the system with some variation on collection()/document-uri().  You would then have to adjust the path names a little so they've got the same notional root.

Once you've done that, $disk[not(. = $system)] gives you the files that failed to load, i.e. the ones that aren't well-formed.
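A minimal XQuery sketch of the idea, for the archives (the database name "docs" and the clone root "/data/repo/" are placeholders, and the prefix-stripping will need adjusting to however your paths actually line up):

```xquery
(: Files on disk, listed recursively relative to the clone root :)
let $root := '/data/repo/'
let $disk := file:list($root, true(), '*.xml')

(: Documents that parsed and loaded; document-uri() in BaseX
   yields "dbname/path", so strip the database prefix to get
   paths comparable to file:list()'s output :)
let $system :=
  for $doc in collection('docs')
  return substring-after(document-uri($doc), 'docs/')

(: Set difference: on disk but not in the database :)
return $disk[not(. = $system)]
```

On Windows, file:list() returns backslash-separated paths, so a translate() to forward slashes may be needed before comparing.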

I'd expect this to be pretty brisk, and you don't have to try to parse anything a second time.

--
Graydon Saunders  | graydonish@gmail.com
Þæs oferéode, ðisses swá mæg.
-- Deor  ("That passed, so may this.")