Hi Tim,
may I suggest that you convert the N-Triples file to RDF/XML
(I use Python's rdflib for such tasks,
<http://rdflib.readthedocs.io/en/stable/>)? It may be easier
for BaseX to ingest XML than plain text (which is how it treats
the N-Triples file).
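
In case it helps with the splitting that Christian suggests below, here
is a minimal sketch in plain Python (standard library only). It relies
on the fact that N-Triples is line-oriented, one triple per line, so the
file can be cut at line boundaries without breaking any statement. The
function name, the chunk file names and the 500 MB default are my own
placeholders, not anything prescribed by BaseX or rdflib, and I have not
run it against the BNE dump itself.

```python
import os


def split_ntriples(src_path, out_dir, max_bytes=500 * 1024 * 1024):
    """Split a line-oriented file into chunks of about max_bytes each.

    Cuts only at line boundaries, so no triple is split in half. A single
    line longer than max_bytes still gets a chunk of its own.
    Returns the list of chunk paths, in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    written = 0
    # Binary mode: lines are copied through unchanged and never decoded.
    with open(src_path, "rb") as src:
        for line in src:
            # Open a new chunk if none is open yet, or if this line
            # would push the current chunk past the size limit.
            if out is None or (written > 0 and written + len(line) > max_bytes):
                if out is not None:
                    out.close()
                # Hypothetical naming scheme: chunk-0000.nt, chunk-0001.nt, ...
                path = os.path.join(out_dir, "chunk-%04d.nt" % len(chunk_paths))
                chunk_paths.append(path)
                out = open(path, "wb")
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return chunk_paths
```

Calling split_ntriples("autoridades.nt", "chunks") should then leave a
series of files that each stay far below the 2 GB array limit, and each
one could be read or converted on its own.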
Best,
Neven
Neven Jovanovic, Zagreb
On 25 November 2016 at 18:21, Christian Grün <christian.gruen@gmail.com> wrote:
> Hi Tim,
>
> In BaseX, texts/strings are internally represented as byte arrays. Due
> to the 32-bit limitation of Java arrays, the file will be too large to
> be opened as a single text in main memory.
>
> To be honest, I didn’t have a similar use case before, so I guess the
> best solution for now will be to split the file into smaller chunks
> before processing it with BaseX.
>
> Cheers,
> Christian
>
>
>
> On Fri, Nov 25, 2016 at 6:07 PM, Tim Thompson <timathom@gmail.com> wrote:
>> Hello,
>>
>> I have a large file[1] (3.5G unzipped) in the n-triples RDF format that I
>> would like to work with in BaseX. When I try to read in the file using
>> file:read-text(), I get the following error:
>>
>> Error:
>> Version: BaseX 8.6 beta 8fa97ca
>> Java: Oracle Corporation, 1.8.0_73
>> OS: Linux, amd64
>> Stack Trace:
>> java.lang.NegativeArraySizeException
>> at java.util.Arrays.copyOf(Arrays.java:3236)
>> at org.basex.util.TokenBuilder.addByte(TokenBuilder.java:247)
>> at org.basex.util.TokenBuilder.add(TokenBuilder.java:176)
>> at org.basex.io.in.TextInput.cache(TextInput.java:143)
>> at org.basex.io.in.TextInput.content(TextInput.java:132)
>> at org.basex.query.value.item.StrStream.materialize(StrStream.java:71)
>> at org.basex.query.value.item.StrStream.string(StrStream.java:44)
>> at org.basex.query.expr.ParseExpr.toToken(ParseExpr.java:273)
>> at org.basex.query.expr.ParseExpr.toEmptyToken(ParseExpr.java:261)
>> at org.basex.query.func.fn.FnSubstring.item(FnSubstring.java:22)
>> at org.basex.query.expr.ParseExpr.iter(ParseExpr.java:44)
>> at org.basex.query.expr.gflwor.GFLWOR$1.next(GFLWOR.java:99)
>> at org.basex.query.scope.MainModule$1.next(MainModule.java:122)
>> at org.basex.query.QueryContext.cache(QueryContext.java:648)
>> at org.basex.query.QueryProcessor.cache(QueryProcessor.java:116)
>> at org.basex.core.cmd.AQuery.query(AQuery.java:87)
>> at org.basex.core.cmd.XQuery.run(XQuery.java:22)
>> at org.basex.core.Command.run(Command.java:255)
>> at org.basex.core.Command.execute(Command.java:93)
>> at org.basex.gui.GUI.exec(GUI.java:479)
>> at org.basex.gui.GUI.access$3(GUI.java:433)
>> at org.basex.gui.GUI$7.run(GUI.java:421)
>>
>> When I try to create a text database using the GUI, I get an error stating
>> that the file could not be parsed.
>>
>> Is it possible to work with text files that are this large using BaseX?
>>
>> Thank you,
>> Tim
>>
>> [1] Available for download here:
>> http://www.bne.es/es/Inicio/Perfiles/Bibliotecarios/DatosEnlazados/DescargaFicheros/
>> (http://datos.bne.es/datadumps/autoridades.nt.bz2)
>>
>> --
>> Tim A. Thompson
>> Metadata Librarian (Spanish/Portuguese Specialty)
>> Princeton University Library
>>