Dear all,
We've been developing POLFIE, an LFG grammar of Polish, at ICS PAS for
over 3 years. We'd like to concentrate on extending the empirical
scope of the grammar, but we're having performance issues which affect
the results quite badly.
Currently we're testing the grammar on sentences extracted from
Składnica, a treebank of Polish. There are 8333 sentences, with an
average length of 10 segments (all segments are counted, including
punctuation).
How are these sentences parsed?
– an individual dictionary is created for each sentence (so the words
are already disambiguated morphosyntactically)
– each sentence is parsed on its own, in one XLE run.
The following performance variables are used when parsing:
– 100 seconds (set timeout 100)
– 4096 MB memory (set max_xle_scratch_storage 4096).
Current results (out of 8333 sentences):
– parsed: 6926
– failed: 154
– out of memory: 11
– timeout: 1228
– unknown error: 14
Almost 15% of the sentences time out, which is very worrying. The
average length of a parsed sentence is almost 9 segments (8.74), while
the average length of a timed-out sentence is almost 19 (18.67).
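As a quick sanity check on the figures above, the share of each
outcome can be recomputed with a few lines of Python. This is only a
sketch: the counts are copied from the list above, and the names
(counts, share) are purely illustrative and not part of our test
setup.

```python
# Outcome counts copied from the results list in this message.
counts = {
    "parsed": 6926,
    "failed": 154,
    "out of memory": 11,
    "timeout": 1228,
    "unknown error": 14,
}

# The outcomes should account for every test sentence (8333 in total).
total = sum(counts.values())


def share(outcome):
    """Return the percentage of all sentences that ended with `outcome`."""
    return 100.0 * counts[outcome] / total


for outcome, n in counts.items():
    print(f"{outcome:>14}: {n:5d}  ({share(outcome):5.2f}%)")
```

The five counts add up to the full 8333 sentences, and the timeout
share is the "almost 15%" mentioned above.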
Have you had similar problems? Are you parsing real sentences, and if
so, how long are they?
Do you have any suggestions for what we could do to reduce the number
of timed-out sentences?
Best,
Agnieszka