ParGram Meeting Notes, Spring Meeting 2014 Nuance
Monday 24.2.2014
----
XLE STATUS
John Maxwell was able to attend from PARC so we began the day with a discussion on the current status of XLE.
The open source version is available, but it is not clear to many people how to access it. Remedy: put a note on the XLE Wiki site about how to access the open source version.
There have also been requests for accessing the precompiled version and the English grammar. Notes on how to do this also need to be added.
The English grammar should be put on a separate repository for people to download. It would in fact be nice if this could be done via the INESS site. Jani to check with Paul on this.
It would be good if Konstanz got a cfsm license. Konstanz could then compile the FSM binaries for different platforms and distribute them with XLE.
At the time of the discussion, Jani and John had together managed to get XLE compiled on Jani's computer. This was the first time XLE had been compiled successfully outside of PARC. However, the transfer system could still not be compiled. Subsequently, Jani managed to compile everything on his computer (late on a Sunday night). The problem lay with some of the Boost C++ libraries, which keep changing. After Jani downloaded the newest version, everything was fixed.
We are currently working on getting the compilation complete with everything (tcl, etc.) and testing it at Konstanz.
Suggestion for the ParGram Wiki: list of available grammars, maybe get them to be automatically downloadable from the INESS website (see above).
At the last ParGram meeting, it was suggested to use a Creative Commons license for the grammars like the Polish site does. Even if resources are freely available, people are often not sure if they can use them or under what conditions. So it would be good to make them officially available via a license.
Other action items:
Move XLE documentation to Konstanz, part of XLE redmine. Integrate Polish addition.
Put note on starter grammars on ParGram Wiki in a prominent place. Also common features and common templates.
----
DEBUGGING AND DOCUMENTATION ISSUES
Ron discussed debugging issues he is having with XLE.
It was acknowledged that learning how to debug with XLE is an acquired skill and that some things could be more helpful.
In particular, there is a concern with coming into a large grammar and understanding the history and design decisions that are embedded in it.
The key to solving this appears to be ample documentation within an individual grammar, but also easy access to the overall decisions taken within ParGram over the decades and a summary of the state of the discussion for certain phenomena. Everybody is again encouraged to contribute to the ParGram Wiki:
http://typo.uni-konstanz.de/redmine/projects/pargram/wiki/
Some topics have already been put up. It would be great if the community could contribute to knowledge sharing by putting up brief entries for further topics.
------
PROSODY-SYNTAX AND MODULARITY DISCUSSION
Tina presented issues she had been working on with respect to the prosody-syntax interface. In particular, there was a question about how modularity in LFG is to be understood. The upshot of the discussion was that, given the projection architecture of LFG, the question is how much information is passed back and forth and how much of it is in a sense duplicative.
John and Ron continue to advocate an FST approach as in the Boegel et al. (2010) paper.
-----
F-STRUCTURE COMPARISON
Testsuites:
The f-structure comparison this year focused on the treatment of negation (suggested by Gyuri at the last ParGram meeting) and possessives (suggested by Jani). We also had the first few lines of "The Gruffalo".
The intention is to add parts of this meeting's testsuite to the ParGramBank. For this, however, all the grammars will first need to be made parallel.
While constructing the testsuites, we also tried to avoid introducing extraneous issues caused primarily by an unwise choice of vocabulary. That is, we tried to use vocabulary that was likely to exist in all languages, e.g., "chicken" instead of "gopher". This turns out not to be an easy problem to solve. For example, with Wolof we ran into trouble with "girl" and "boy". These always introduce an extra relative clause, as "girl" translates to "child who is a woman" and "boy" translates to "child who is a man".
The problem of crosslinguistic testsuite construction is not trivial. We had discussed some issues at the Debrecen meeting and some further ones came up here. It seems that it would be worth writing a follow-up paper to the ParGramBank paper discussing these issues in some detail.
-----
Negation
In looking at the negation f-structures, we reviewed the ParGram Wiki entry and Gyuri's slides from 2013. These or a version of these need to be added to the ParGram Wiki (Miriam and Gyuri to do).
One issue was whether one could project an ADJUNCT from a negation that is expressed morphologically. This is technically possible, so there is no reason in principle to resort to a NEG + feature just because the negation is morphological.
Changes that need to be made to the grammars to ensure parallelism:
Norwegian: ADV-TYPE neg --> ADJUNCT-TYPE neg
STMT-TYPE --> CLAUSE-TYPE (see guidelines on this)
VFORM into a check feature?
Wolof: has NEG + feature --- should think about moving to an Adjunct type analysis (?)
German: why is there only constituent negation on "Die Katzen sind/lachen nicht im Haus."?
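The feature renamings noted for Norwegian are mechanical, so their effect can be illustrated outside the grammar. The following is a minimal Python sketch, assuming f-structures are represented as nested dictionaries with lists for sets such as ADJUNCT. The `RULES` table, the `normalize` function and the toy f-structure are hypothetical and made up for illustration; none of this is XLE notation or existing ParGram tooling.

```python
# Each rule: (old attribute, required value or None for any value, new attribute).
# Only the two Norwegian renamings from the notes are covered (assumption:
# ADV-TYPE is renamed only when its value is "neg").
RULES = [
    ("ADV-TYPE", "neg", "ADJUNCT-TYPE"),
    ("STMT-TYPE", None, "CLAUSE-TYPE"),
]


def normalize(fstr):
    """Return a copy of a nested-dict f-structure with the renaming rules applied."""
    if isinstance(fstr, list):       # set-valued attribute, e.g. ADJUNCT
        return [normalize(item) for item in fstr]
    if not isinstance(fstr, dict):   # atomic value, e.g. "neg"
        return fstr
    result = {}
    for attr, value in fstr.items():
        new_attr = attr
        for old, required, new in RULES:
            if attr == old and (required is None or value == required):
                new_attr = new
                break
        result[new_attr] = normalize(value)
    return result


# Toy f-structure, invented for this example.
example = {
    "PRED": "be<SUBJ>",
    "STMT-TYPE": "decl",
    "ADJUNCT": [{"PRED": "ikke", "ADV-TYPE": "neg"}],
}
print(normalize(example))
```

Other ADV-TYPE values (such as sadv) are left untouched by this sketch, since the notes only mention the neg case.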
Some examples in the negation testsuite included NPI items. Overall, NPI is not dealt with in any of the grammars.
"The cat is not in the house either."
Works okay in English, but not checking for NPI.
The cat is in the house either. --- parses fine but shouldn't.
Norwegian:
Should "Katten er i huset heller" be good? I.e., is it doing the wrong parallel thing like English is?
ADV-TYPE nexus? What is this?
NPI issue does not arise with Urdu for this, since the "either" is done with focus clitic "also" plus negation.
--
Neither is the cat in the house.
Not good in English. "Nor is the cat in the house."
Recommendation: if a grammar has pairs like never/ever, use POL negative for the negative one.
Urdu: change nah to not be an ADV-TYPE sadv
German: Weder/Noch ist die Katze im Haus. ---> doesn't work
"The girl did not buy any apples."
No NPI -- "The girl bought any apples." works also
Urdu: Why is there number information on "koi"? Check on whether this is necessary. Could also have QUANT-TYPE info. English doesn't but Norwegian does. (QUANT-TYPE existential). German has no QUANT-TYPE.
For Urdu: could introduce a "QUANT-TYPE existential"
German does not register the negative factor in "keine". That would be difficult for transfer.
Wolof: QUANT-TYPE negative -- is calculated on the basis of the any/no item plus negation on the verb. Not really parallel with the rest, but may be the desirable thing to do for this language. Reason: also have existential quantifiers and they are different.
In general, the developers of the English grammar considered checking for NPI items and judged it too costly an operation. The reason is that one would have to involve (inside-out) functional uncertainty, i.e. (IO)-FU, and constraints on paths, and these are difficult and complex to determine.
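To make the idea behind such a check concrete: licensing an NPI amounts to looking outward from the NPI item for a negation in some enclosing f-structure. The Python sketch below imitates that search on nested dictionaries. It is an illustration only and not XLE notation; the `NPI` attribute, the use of ADJUNCT-TYPE neg as the only licensor, the function names and the toy f-structures are all assumptions made for this example.

```python
def has_negation(fstr):
    """True if the f-structure has a negative adjunct directly in its ADJUNCT set."""
    return any(adj.get("ADJUNCT-TYPE") == "neg" for adj in fstr.get("ADJUNCT", []))


def unlicensed_npis(fstr, licensed=False):
    """Collect PRED values of NPI items with no negation in any enclosing f-structure."""
    licensed = licensed or has_negation(fstr)
    found = []
    if fstr.get("NPI") == "+" and not licensed:
        found.append(fstr.get("PRED"))
    for value in fstr.values():
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, dict):
                found.extend(unlicensed_npis(child, licensed))
    return found


# Toy analyses, invented for this example.
# "The cat is not in the house either."
good = {
    "PRED": "be",
    "SUBJ": {"PRED": "cat"},
    "ADJUNCT": [
        {"PRED": "not", "ADJUNCT-TYPE": "neg"},
        {"PRED": "either", "NPI": "+"},
    ],
}
# "The cat is in the house either."
bad = {
    "PRED": "be",
    "SUBJ": {"PRED": "cat"},
    "ADJUNCT": [{"PRED": "either", "NPI": "+"}],
}

print(unlicensed_npis(good))
print(unlicensed_npis(bad))
```

The `licensed` flag that is passed down plays the role of the path constraint here. In a real grammar the corresponding constraint would have to range over every possible embedding of the NPI item, which is presumably where the cost arises.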
Ron was extremely surprised by the fact that the grammar contained CASE obl. Nuance might look into changing CASE obl --> CASE acc for the English grammar.
-------------------------------------------------------------------
Tuesday 25.2.2014
----
DISCUSSION ON TREEBANKS/CONSTRUCTIONS
INESS --- It would be great if the treebank listings on INESS could be reorganized so that there is a group of ParGram treebanks and under that there should be the parallel ParGramBank.
For next time, we should take some of the testsuites that already exist, parse them, and add them to the ParGramBank.
EAGLES testsuite -- who has it? Miriam asked Hans Uszkoreit, who said he is trying to find it as well and will let Miriam know.
One could also look into using some of DELPH-IN's testsuites.
----
F-STRUCTURE COMPARISON CONTINUED
Possessive f-strs:
Norwegian: REF + needed?
English: bright red tractor, wrong analysis
bright red tractor: bright is not a good adjective to use, find a different one
maybe try light instead of bright.
Wolof: Adjunct type for "bright" in bright red tractor?
Urdu: change numbers to not have the stem value as the PRED value, but the actual number.
Check whether need NUM information in the Urdu numbers.
Wolof: why is "four" NUM sg and why is it at the SPEC level??? Will probably be removed.
The farmer has a fear of spiders -- different treatment of "of spiders" across grammars (Adjuncts in English/German, OBL in Norwegian), so not a good example for ParGramBank.
Jani will try to come up with a sentence that doesn't involve a complement of the noun.
Norwegian: why is the NUMBER f-str so complex? Can some of that go into CHECK features?
ADJ-GEN in German? Look up in Steffi's dissertation?
Urdu change: genitives to the right of N, also change to SPEC POSS rather than Adjunct analysis
----
Wednesday 26.2.2014
Talks were held at CSLI on lexical semantics and complex predicates -- see the slides.
There was also a discussion led by Lori Levin on CL curriculum and on establishing standards for resources. The background is that more and more standards are being set (e.g., by Google), but the linguistic standards needed are not necessarily included because there is too little understanding of the issues involved (i.e., too little linguistic and grammar engineering knowledge). Also, it is expensive to train people in the kind of grammar writing we do. However, once people are trained up, they are a rare commodity coveted by companies.
The discussion revolved around how to fix this. One method is to populate the ParGram Wiki with more information on issues and fixes and why we decided to standardize things the way we did.
PLEASE EVERYBODY HELP WITH THIS.
Another method is to provide courses, for example at the LSA institute or perhaps via on-line media (iTunes, etc.). Chris Brew, Lori Levin and Annie Zaenen said they would pursue this actively. Miriam is recording an on-line course on grammar development in the summer semester that will be freely available.
-----
Thursday 27.2.2014
Talks were held on the Cognition Parser, on parser evaluation and disambiguation in the Wolof grammar, and on where Nuance is going with NL at the moment. See the slides.
-----
Dear Miriam et al.,
Many, many thanks for the notes – they are very helpful and interesting. Just a couple of corrections concerning the Polish grammar (Agnieszka may have more comments later) and a question:
At the last ParGram meeting, it was suggested to use a Creative Commons license for the grammars like the Polish site does.
Actually, it's GNU General Public License (version 3). (I've also corrected this info on the ParGram redmine wiki.) I am not sure this is the best licence for a resource of this kind – let me know if you can see good arguments for making it available under CC, and we'll consider dual-licensing.
Some examples in the negation testsuite included NPI items. Overall, NPI is not dealt with in any of the grammars.
Well, the Polish grammar implements Negative Concord, which may be considered an extreme case of NPI-licensing.
See the slides.
Where are they? I found a link to “ParGram workspace” at http://pargram.b.uib.no/ but my ParGram password does not seem to work there…
Thanks again!
Best,
Adam P.