When trying to build a full-text index on a collection of texts, the process runs for a couple of hours and then exits with the message below. I think it is nearly complete at that point: in the GUI I have seen the progress bar reach around 80 %, so I think it is safe to assume that the error is connected to the final stages.
The texts are unstructured and represented as one line per book. Below is the result from the index process. Parameters set in the GUI are: Norwegian Snowball, lemmatization, diacritics. 30 GB is set aside for the GUI.
Path summary: doc(): 317259x, strings text: 317259x, leaf text(): 317259x, strings, leaf
Here is the error message:
Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 8.2 beta 7d38949
Java: Oracle Corporation, 1.7.0_79
OS: Linux, amd64
Stack Trace:
java.lang.NegativeArraySizeException
  at java.util.Arrays.copyOf(Arrays.java:2271)
  at org.basex.util.TokenBuilder.add(TokenBuilder.java:303)
  at org.basex.util.TokenBuilder.add(TokenBuilder.java:290)
  at org.basex.index.ft.FTBuilder.merge(FTBuilder.java:248)
  at org.basex.index.ft.FTBuilder.write(FTBuilder.java:155)
  at org.basex.index.ft.FTBuilder.index(FTBuilder.java:94)
  at org.basex.index.ft.FTBuilder.build(FTBuilder.java:102)
  at org.basex.index.ft.FTBuilder.build(FTBuilder.java:1)
  at org.basex.data.DiskData.createIndex(DiskData.java:195)
  at org.basex.core.cmd.ACreate.create(ACreate.java:117)
  at org.basex.core.cmd.CreateIndex.run(CreateIndex.java:62)
  at org.basex.core.Command.run(Command.java:398)
  at org.basex.core.Command.execute(Command.java:100)
  at org.basex.core.Command.execute(Command.java:123)
  at org.basex.gui.dialog.DialogProgress$1.run(DialogProgress.java:178)
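A NegativeArraySizeException thrown from Arrays.copyOf is what one would expect when a buffer's new capacity is computed in 32-bit int arithmetic and wraps around. The following is only a sketch of that general mechanism, not BaseX's actual TokenBuilder code: the class and method names are hypothetical, and whether this is the exact cause here is an assumption.

```java
import java.util.Arrays;

// Hypothetical illustration; not taken from the BaseX sources.
class CapacityOverflowDemo {
  /** Naive growth: doubles the capacity in int arithmetic, which can wrap around. */
  static int naiveNewCapacity(int current) {
    return current << 1;
  }

  /** Overflow-aware growth: computes in long and caps the result below the int limit. */
  static int safeNewCapacity(int current, int needed) {
    final int max = Integer.MAX_VALUE - 8; // conservative upper bound for array sizes
    if (needed < 0 || needed > max) {
      throw new OutOfMemoryError("Required array size too large: " + needed);
    }
    long doubled = Math.max((long) current << 1, (long) needed);
    return (int) Math.min(doubled, (long) max);
  }

  public static void main(String[] args) {
    int current = 1 << 30;                    // a buffer that already holds 2^30 bytes
    int wrapped = naiveNewCapacity(current);  // 2^31 does not fit into an int
    System.out.println("naive capacity: " + wrapped);
    try {
      Arrays.copyOf(new byte[4], wrapped);    // negative length is rejected
    } catch (NegativeArraySizeException e) {
      System.out.println("copyOf failed: " + e);
    }
    System.out.println("safe capacity: " + safeNewCapacity(current, current + 1));
  }
}
```

Even with overflow-aware growth, a single Java array cannot hold more than roughly 2^31 entries, so data that needs a larger buffer still has to be reduced or split.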
Regards Lars G Johnsen National Library of Norway
Hi Lars,
It looks as if the input data is indeed too large to be indexed (the internal id lists seem to exceed the maximum array size in main memory). The usual alternative to make it work is to distribute your document(s) into multiple databases.
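Distributing the documents over several databases could be planned roughly as follows. This is a minimal sketch in plain Java that only assigns file names to database names; the class ShardPlanner, the "books-" prefix, the shard size of 50000 and the file names are assumptions made for illustration, and the actual creation of each database is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names and sizes are illustrative only.
class ShardPlanner {
  /** Number of databases needed when each holds at most shardSize documents. */
  static int shardCount(int totalDocs, int shardSize) {
    return (totalDocs + shardSize - 1) / shardSize; // ceiling division
  }

  /** Assigns consecutive runs of documents to databases named prefix + 3-digit index. */
  static Map<String, List<String>> assign(List<String> docs, int shardSize, String prefix) {
    if (shardSize <= 0) throw new IllegalArgumentException("shardSize must be positive");
    Map<String, List<String>> shards = new LinkedHashMap<>();
    for (int start = 0; start < docs.size(); start += shardSize) {
      int end = Math.min(start + shardSize, docs.size());
      String name = String.format("%s%03d", prefix, start / shardSize);
      shards.put(name, new ArrayList<>(docs.subList(start, end)));
    }
    return shards;
  }

  public static void main(String[] args) {
    // 317259 is the document count from the path summary in the original message.
    System.out.println(shardCount(317259, 50000)); // prints 7

    List<String> docs = new ArrayList<>();
    for (int i = 0; i < 10; i++) docs.add("book" + i + ".txt");
    for (Map.Entry<String, List<String>> e : assign(docs, 4, "books-").entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}
```

Each database would then get its own full-text index, and a query would have to address all of them.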
If you want, you can also provide us with the input data, but I assume it will take up quite a lot of space?
Best, Christian
On Sat, Jun 27, 2015 at 12:50 PM, Lars Johnsen yoonsen@gmail.com wrote:
[...]
After adding a list of stop words, which removed the top 100 words, the full-text index works perfectly.
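A stop word list of this kind could be derived from the texts themselves. The sketch below counts tokens and returns the most frequent ones. It is not how the list in this thread was produced: the class name, the simple letter-based tokenizer and the sample lines are assumptions, and it needs Java 8 or later. How the resulting file is passed to BaseX (for example via its STOPWORDS option) and the exact file format should be checked against the BaseX documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for building a frequency-based stop word list.
class StopWordCandidates {
  /** Returns the k most frequent lower-cased tokens; ties are broken alphabetically. */
  static List<String> topWords(Iterable<String> lines, int k) {
    Map<String, Integer> counts = new HashMap<>();
    for (String line : lines) {
      // Assumed tokenization: split on anything that is not a Unicode letter.
      for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
        if (!token.isEmpty()) counts.merge(token, 1, Integer::sum);
      }
    }
    List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
    entries.sort((a, b) -> {
      int byCount = Integer.compare(b.getValue(), a.getValue()); // descending count
      return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
    });
    List<String> top = new ArrayList<>();
    for (int i = 0; i < Math.min(k, entries.size()); i++) {
      top.add(entries.get(i).getKey());
    }
    return top;
  }

  public static void main(String[] args) {
    // Each string stands in for one book, matching the one-line-per-book layout.
    List<String> books = Arrays.asList("og det og er", "det og i");
    System.out.println(topWords(books, 2)); // prints [og, det]
  }
}
```

For the real collection, k would be 100 and the lines would be read from the book files rather than from an in-memory list.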
Thanks for suggestions! Lars
2015-06-28 0:03 GMT+02:00 Christian Grün christian.gruen@gmail.com:
[...]