OK, it's attached; I can't see any issue. Hope you don't mind that I gzipped it, but there are nearly 300 thousand documents.
Yes, the data looks fine. As you already indicated, only invalid data would cause such errors, so I'm still not sure what causes the problem in your data, sorry. Maybe someone else has an idea, or maybe you can do some more debugging?
Christian