Hi, Neven,

Yes, that's what I'll probably end up doing. My real motive for trying to process the file as text was to test a BaseX extension function for parsing n-triples, using one of Gunther Rademacher's REx Parser Generator parsers[1].

I added a Java module with the parser to BaseX, and it works very well with small- to medium-sized files, generating a parse tree based on the EBNF for the format. I was just curious to see how it would perform with very large files.
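
In case it helps anyone following the thread, here is a rough sketch of the chunking approach Christian suggested further down. N-Triples holds one triple per line, so the file can be cut at line ends without breaking a statement. The function name split_by_lines, the chunk file names, and the 512 MB default are only assumptions for illustration, not anything BaseX prescribes.

```python
import os


def split_by_lines(src_path, out_dir, max_bytes=512 * 1024 * 1024):
    """Split a line-oriented file into chunks, cutting only at line ends.

    Each chunk holds at most max_bytes, except that a single line longer
    than max_bytes is written to a chunk of its own.
    Returns the list of chunk paths in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    written = 0
    with open(src_path, "rb") as src:
        for line in src:  # streams the file one line at a time
            # Start a new chunk if none is open or this line would overflow it.
            if out is None or (written > 0 and written + len(line) > max_bytes):
                if out is not None:
                    out.close()
                # Hypothetical naming scheme: chunk-00000.nt, chunk-00001.nt, ...
                path = os.path.join(out_dir, "chunk-%05d.nt" % len(paths))
                paths.append(path)
                out = open(path, "wb")
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return paths
```

With chunks of that size, each file should stay well below the Java array limit, so file:read-text() can presumably be called on the chunks one at a time.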

Best,
Tim

[1] http://www.bottlecaps.de/rex/

--
Tim A. Thompson
Metadata Librarian (Spanish/Portuguese Specialty)
Princeton University Library


On Fri, Nov 25, 2016 at 12:47 PM, Neven Jovanović <filologanoga@gmail.com> wrote:
Hi Tim,

may I suggest that you convert the n-triples file to RDF/XML format
(I'm using Python rdflib for such tasks,
<http://rdflib.readthedocs.io/en/stable/>)? Perhaps it would be easier
for BaseX to ingest the XML instead of text format (which it thinks
the n-triples are).

Best,

Neven

Neven Jovanovic, Zagreb

On 25 November 2016 at 18:21, Christian Grün <christian.gruen@gmail.com> wrote:
> Hi Tim,
>
> In BaseX, texts/strings are internally represented as byte arrays. Due
> to the 32-bit limitation of Java arrays, the file will be too large to
> be opened as a single text in main memory.
>
> To be honest, I didn’t have a similar use case before, so I guess the
> best solution for now will be to split the file into smaller chunks
> before processing it with BaseX.
>
> Cheers,
> Christian
>
>
>
> On Fri, Nov 25, 2016 at 6:07 PM, Tim Thompson <timathom@gmail.com> wrote:
>> Hello,
>>
>> I have a large file[1] (3.5G unzipped) in the n-triples RDF format that I
>> would like to work with in BaseX. When I try to read in the file using
>> file:read-text(), I get the following error:
>>
>> Error:
>> Version: BaseX 8.6 beta 8fa97ca
>> Java: Oracle Corporation, 1.8.0_73
>> OS: Linux, amd64
>> Stack Trace:
>> java.lang.NegativeArraySizeException
>>     at java.util.Arrays.copyOf(Arrays.java:3236)
>>     at org.basex.util.TokenBuilder.addByte(TokenBuilder.java:247)
>>     at org.basex.util.TokenBuilder.add(TokenBuilder.java:176)
>>     at org.basex.io.in.TextInput.cache(TextInput.java:143)
>>     at org.basex.io.in.TextInput.content(TextInput.java:132)
>>     at org.basex.query.value.item.StrStream.materialize(StrStream.java:71)
>>     at org.basex.query.value.item.StrStream.string(StrStream.java:44)
>>     at org.basex.query.expr.ParseExpr.toToken(ParseExpr.java:273)
>>     at org.basex.query.expr.ParseExpr.toEmptyToken(ParseExpr.java:261)
>>     at org.basex.query.func.fn.FnSubstring.item(FnSubstring.java:22)
>>     at org.basex.query.expr.ParseExpr.iter(ParseExpr.java:44)
>>     at org.basex.query.expr.gflwor.GFLWOR$1.next(GFLWOR.java:99)
>>     at org.basex.query.scope.MainModule$1.next(MainModule.java:122)
>>     at org.basex.query.QueryContext.cache(QueryContext.java:648)
>>     at org.basex.query.QueryProcessor.cache(QueryProcessor.java:116)
>>     at org.basex.core.cmd.AQuery.query(AQuery.java:87)
>>     at org.basex.core.cmd.XQuery.run(XQuery.java:22)
>>     at org.basex.core.Command.run(Command.java:255)
>>     at org.basex.core.Command.execute(Command.java:93)
>>     at org.basex.gui.GUI.exec(GUI.java:479)
>>     at org.basex.gui.GUI.access$3(GUI.java:433)
>>     at org.basex.gui.GUI$7.run(GUI.java:421)
>>
>> When I try to create a text database using the GUI, I get an error stating
>> that the file could not be parsed.
>>
>> Is it possible to work with text files that are this large using BaseX?
>>
>> Thank you,
>> Tim
>>
>> [1] Available for download here:
>> http://www.bne.es/es/Inicio/Perfiles/Bibliotecarios/DatosEnlazados/DescargaFicheros/
>> (http://datos.bne.es/datadumps/autoridades.nt.bz2)
>>
>> --
>> Tim A. Thompson
>> Metadata Librarian (Spanish/Portuguese Specialty)
>> Princeton University Library
>>