Hi Tim,
May I suggest that you convert the N-Triples file to RDF/XML (I use the Python library rdflib for such tasks, http://rdflib.readthedocs.io/en/stable/)? BaseX may find it easier to ingest XML than plain text, which is how it treats an N-Triples file.
Best,
Neven
Neven Jovanovic, Zagreb
On 25 November 2016 at 18:21, Christian Grün christian.gruen@gmail.com wrote:
Hi Tim,
In BaseX, texts and strings are represented internally as byte arrays. Because Java arrays are indexed by 32-bit signed integers (at most 2^31 - 1 elements, i.e. about 2 GB for a byte array), a 3.5 GB file is too large to be opened as a single string in main memory.
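The arithmetic behind this limit can be checked in a few lines of Python. The check assumes "3.5G" means 3.5 * 10^9 bytes, since the exact size isn't given. It also simulates Java's 32-bit int wraparound to show how a growing buffer capacity can turn negative. That growth step is only an illustration, because BaseX's actual buffer growth policy isn't shown in this thread.

```python
# Rough check of the Java array limit described above (illustrative only).
# Assumption: "3.5G" means 3.5 * 10**9 bytes and each byte occupies one array slot.
JAVA_MAX_ARRAY_LENGTH = 2**31 - 1  # upper bound on a Java array's length (int-indexed)

file_size = int(3.5 * 10**9)
print(file_size > JAVA_MAX_ARRAY_LENGTH)  # True: the text cannot fit in one byte[]


def to_int32(n: int) -> int:
    """Wrap an integer the way Java's 32-bit int arithmetic does on overflow."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n


# Assumption (hypothetical growth step): a buffer holding 1.5 * 10**9 bytes doubles
# its capacity. The new capacity wraps to a negative int, and Arrays.copyOf rejects
# a negative length with NegativeArraySizeException, as at the top of the stack trace.
print(to_int32(2 * 1_500_000_000))  # -1294967296
```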
To be honest, I haven't had a similar use case before, so I guess the best solution for now is to split the file into smaller chunks before processing it with BaseX.
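Here is a minimal sketch of that splitting step in Python, using only the standard library. It relies on N-Triples putting one triple on each line, so cutting at line boundaries never breaks a triple. It also reads the .bz2 dump directly, so the 3.5 GB file does not have to be unpacked first. The function name, output naming scheme and default chunk size are hypothetical.

```python
# Sketch: split a (possibly bz2-compressed) N-Triples file into smaller files.
# N-Triples holds one triple per line, so cutting at line boundaries is safe.
# The function name, file names and default chunk size are hypothetical choices.
import bz2
from pathlib import Path
from typing import List


def split_ntriples(src: str, out_dir: str, lines_per_chunk: int = 1_000_000) -> List[Path]:
    """Copy `src` into numbered chunk files of at most `lines_per_chunk` lines each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Stream the .bz2 dump directly instead of decompressing it to disk first.
    opener = bz2.open if src.endswith(".bz2") else open
    written: List[Path] = []
    dst = None
    try:
        with opener(src, "rt", encoding="utf-8") as fin:
            for i, line in enumerate(fin):
                if i % lines_per_chunk == 0:
                    # Start a new chunk file every `lines_per_chunk` lines.
                    if dst is not None:
                        dst.close()
                    path = out / f"chunk-{len(written):05d}.nt"
                    dst = open(path, "w", encoding="utf-8")
                    written.append(path)
                dst.write(line)
    finally:
        if dst is not None:
            dst.close()
    return written
```

Pick the chunk size so that every output file stays well below the roughly 2 GB array limit. Each chunk can then be read on its own with file:read-text().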
Cheers, Christian
On Fri, Nov 25, 2016 at 6:07 PM, Tim Thompson timathom@gmail.com wrote:
Hello,
I have a large file[1] (3.5 GB unzipped) in the N-Triples RDF format that I would like to work with in BaseX. When I try to read the file using file:read-text(), I get the following error:
Error:
Version: BaseX 8.6 beta 8fa97ca
Java: Oracle Corporation, 1.8.0_73
OS: Linux, amd64
Stack Trace:
java.lang.NegativeArraySizeException
  at java.util.Arrays.copyOf(Arrays.java:3236)
  at org.basex.util.TokenBuilder.addByte(TokenBuilder.java:247)
  at org.basex.util.TokenBuilder.add(TokenBuilder.java:176)
  at org.basex.io.in.TextInput.cache(TextInput.java:143)
  at org.basex.io.in.TextInput.content(TextInput.java:132)
  at org.basex.query.value.item.StrStream.materialize(StrStream.java:71)
  at org.basex.query.value.item.StrStream.string(StrStream.java:44)
  at org.basex.query.expr.ParseExpr.toToken(ParseExpr.java:273)
  at org.basex.query.expr.ParseExpr.toEmptyToken(ParseExpr.java:261)
  at org.basex.query.func.fn.FnSubstring.item(FnSubstring.java:22)
  at org.basex.query.expr.ParseExpr.iter(ParseExpr.java:44)
  at org.basex.query.expr.gflwor.GFLWOR$1.next(GFLWOR.java:99)
  at org.basex.query.scope.MainModule$1.next(MainModule.java:122)
  at org.basex.query.QueryContext.cache(QueryContext.java:648)
  at org.basex.query.QueryProcessor.cache(QueryProcessor.java:116)
  at org.basex.core.cmd.AQuery.query(AQuery.java:87)
  at org.basex.core.cmd.XQuery.run(XQuery.java:22)
  at org.basex.core.Command.run(Command.java:255)
  at org.basex.core.Command.execute(Command.java:93)
  at org.basex.gui.GUI.exec(GUI.java:479)
  at org.basex.gui.GUI.access$3(GUI.java:433)
  at org.basex.gui.GUI$7.run(GUI.java:421)
When I try to create a text database using the GUI, I get an error stating that the file could not be parsed.
Is it possible to work with text files that are this large using BaseX?
Thank you, Tim
[1] Available for download here: http://www.bne.es/es/Inicio/Perfiles/Bibliotecarios/DatosEnlazados/DescargaF... (http://datos.bne.es/datadumps/autoridades.nt.bz2)
--
Tim A. Thompson
Metadata Librarian (Spanish/Portuguese Specialty)
Princeton University Library