On 24.05.2021 at 09:22, Kristian Kankainen wrote:
Hi folks,
I am aware that the HTML module can guess a file's encoding by itself if the file is passed to it in binary form:
If the input encoding is unknown, the data to be processed can be passed on in its binary representation. The HTML parser will automatically try to detect the correct encoding:
Query
html:parse(fetch:binary("https://en.wikipedia.org"))
But is there a way to guess the encoding of CSV files? So far I have tried the Fetch and CSV modules without success. I have a large number of CSV files, all in different encodings. Maybe it is possible to pipe the result of fetch:binary to a system command that guesses the encoding, and then use that to read in the CSV?
I think both HTML parsers and XML parsers rely on the presence of some encoding declaration (e.g. a meta charset in HTML or the XML declaration in XML) to "detect" an encoding; I am not sure CSV has anything like that.
But that is just my understanding of the parser world in general; I don't know exactly how things work in BaseX.
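
Without a declaration to rely on, one rough workaround might be to try a list of likely encodings in order and keep the first one that decodes without an error. Below is a minimal sketch, not a tested solution. It assumes that convert:binary-to-string rejects byte sequences that are invalid for the given encoding; local:decode, the candidate list and the example.org URL are hypothetical placeholders.

```xquery
(: Hypothetical helper: returns the input decoded with the first
   candidate encoding that does not raise an error, or the empty
   sequence if every candidate fails. :)
declare function local:decode(
  $bytes     as xs:base64Binary,
  $encodings as xs:string*
) as xs:string? {
  if (empty($encodings)) then ()
  else
    try {
      convert:binary-to-string($bytes, head($encodings))
    } catch * {
      (: decoding failed, fall through to the next candidate :)
      local:decode($bytes, tail($encodings))
    }
};

(: Placeholder URL; the candidate order is an assumption. :)
let $bytes := fetch:binary('https://example.org/data.csv')
let $text  := local:decode($bytes, ('UTF-8', 'ISO-8859-1'))
return csv:parse($text, map { 'header': true() })
```

The order of the candidates matters: UTF-8 is strict and fails quickly on most non-UTF-8 input, whereas a single-byte encoding such as ISO-8859-1 accepts almost any byte sequence, so it only makes sense as the last fallback. This is trial and error rather than real detection, and it cannot tell two single-byte encodings apart.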