Hi,
At the upcoming ParGram Meeting in Warsaw, we would like to experiment with a new way to do our usual structure comparison: We would like to use INESS (http://iness.uib.no/) and ParGramBank directly.
There are a couple of advantages to this. One obvious advantage is that ParGramBank will grow. But we can also easily switch between languages/sentences, and there is no need to create a humongous PDF file with structures in it (last year, our structure handout had 234 pages).
Paul Meurer (www.uib.no/persons/Paul.Meurer) has kindly implemented a couple of new features in INESS. I attach an email from Paul below. Please follow his instructions when adding to ParGramBank.
To be able to upload structures to INESS (in Prolog format), you need to have an account there. Please follow the instructions on the INESS homepage to create your account. Once the account is created, Paul can give you the necessary user rights to upload files.
Lastly, I attach the sentences in a text file to this email. As Paul says in his email, please use a consistent naming scheme for the structures; you should use something like urd-051-fs (i.e., ISO 639-3 language code - sentence number - fs). Start the sentence number at 51 (there are already 50 sentences in ParGramBank).
If you have more questions, please don't hesitate to send email to Paul or me. Also, Agnieszka/Paul and others, if I forgot anything, feel free to jump in.
Best, Jani
-------- Forwarded Message --------
Subject: Re: ParGram sentences
Date: Thu, 15 Jan 2015 11:48:57 +0100
From: Paul Meurer <paul.meurer@uni.no>
To: Agnieszka Patejuk <agnieszka.patejuk@googlemail.com>
CC: Sebastian Sulger <sebastian.sulger@uni-konstanz.de>, Adam Przepiórkowski <adamp@ipipan.waw.pl>, Miriam Butt <Miriam.Butt@uni-konstanz.de>, Victoria Rosén <victoria@uib.no>
Hi,
BTW: I went through previous correspondence related to the meeting and have gathered the following points to cover:
PARGRAM MEETING (3 days)
• structure comparison:
  – how: traditional or using INESS?
I think now everything needed to make INESS suitable for structure comparison is in place. Here is what is new:
* I have implemented uploading of Prolog files (one by one, or as a gzipped archive).
* Once the sentences (not structures) are aligned, it is easy to switch between treebanks/languages. It is enough for the sentences to be aligned to a pivot language (e.g., Urdu). You can test this in the ParGram treebanks.
What people should do:
* Parse their sentences in XLE.
* Use a consistent naming scheme for the sentences (e.g., deu-050-fs), where the number corresponds to the running sentence number. We should not start with 1, but continue where we stopped last time. I think new sentences should start at 50. Alternative translations could be called deu-050a-fs etc.
* Upload the sentences using the _upload files_ link on the Treebank overview page (either one by one, or as a gzipped archive; no other archiving program will work).
* Disambiguate the sentences in INESS.
* Add glosses, as described in the documentation.
* Align the sentences, with English as a minimum, using the Alignment tool. Jani or I could do this for them.
Alternatively, for languages whose grammar is in INESS, the sentences could be parsed in INESS directly.
Does this sound feasible?
I think the sentences should be ready quite soon, for people to be able to do all this before ParGram.
Best wishes, Paul
--
Sebastian Sulger
FB Sprachwissenschaft
Universität Konstanz
http://ling.uni-konstanz.de/pages/home/sulger
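The naming scheme described in the message above (ISO 639-3 language code, running sentence number, `fs`, with an optional letter for alternative translations) can be checked mechanically before uploading. The following is only a minimal sketch in Python; the function names are hypothetical, the three-digit zero padding is inferred from the examples urd-051-fs and deu-050a-fs, and the default start number of 51 follows Jani's message (Paul's note suggests 50, so it is left as a parameter).

```python
import re

# Assumed shape, inferred from the examples "urd-051-fs" and "deu-050a-fs":
# three lowercase letters (ISO 639-3 code), a three-digit sentence number,
# an optional lowercase letter for an alternative translation, then "fs".
NAME_RE = re.compile(r"([a-z]{3})-(\d{3})([a-z]?)-fs")


def structure_name(lang, number, variant=""):
    """Build a structure name such as 'urd-051-fs' and validate it."""
    name = f"{lang}-{number:03d}{variant}-fs"
    if NAME_RE.fullmatch(name) is None:
        raise ValueError(f"not a valid structure name: {name!r}")
    return name


def parse_structure_name(name):
    """Split a structure name into (language code, number, variant)."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid structure name: {name!r}")
    lang, number, variant = match.groups()
    return lang, int(number), variant


def names_for_batch(lang, count, first_number=51):
    """Names for `count` consecutive sentences, starting at `first_number`."""
    return [structure_name(lang, first_number + i) for i in range(count)]
```

For instance, `names_for_batch("pol", 3)` yields names for sentences 51 to 53, and `parse_structure_name` rejects anything that does not fit the assumed pattern.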
Hi,
There seems to be a problem with these sentences: we were to redo the sentences from the 2014 meeting in California, and the sentences sent by Jani differ from those.
If you intend to take part in structure comparison, please wait for the next message with sentences, as the list might change.
Sorry for the confusion.
Best, Agnieszka
(In reply to Sebastian Sulger's message of 19 January 2015, 15:11.)
Hi,
This is my mistake: I sent the wrong version of the sentence file. The correct sentences are attached here. Please use these sentences if you'd like to contribute structures.
Also, I should emphasize (thanks for hinting at this, Agnieszka) that even if you cannot attend the ParGram meeting, you are invited to take part in structure comparison; we will take a look at your structures at the meeting to ensure parallelism. Just follow the procedure described by Paul.
Thanks, Jani
(In reply to Agnieszka Patejuk's message of 20 January 2015, 14:22.)
Hi,
I have a question concerning examples with isolated NPs (see below): which case should be assumed for their translation? In Polish there are 7 cases; leaving out the locative (which is assigned only by prepositions, which are not present in the examples), 6 remain: nominative, accusative, genitive, dative, instrumental and vocative. I have to choose one of these to do the translation. Which one did you choose when doing structure comparison in 2014?
35. NP: the dog's bone
36. NP: the farmer's cow
37. NP: the cat's dark fur
38. NP: the bright red color of the car
39. NP: the tree's branch
40. NP: the branch of the tree
41. NP: the farmer's problem
42. NP: the problem of the farmer
43. NP: the farmer's two brothers
44. NP: the two brothers of the farmer
45. NP: the farmer's cows' milk
46. NP: the milk of the cows of the farmer
Best, Agnieszka
(In reply to Sebastian Sulger's message of 20 January 2015, 17:32.)
Hi,
wouldn't the nominative be the natural choice?
- Paul
(In reply to Agnieszka Patejuk's message of 25 January 2015, 16:06.)
Thanks, nominative is fine with me.
However, I think that in isolation it is as natural as anything else (say, accusative). This is why this should be specified clearly next time: since in some languages the same form can have different case values (nominative and accusative, for instance), you have to disambiguate this.
Best, Agnieszka
(In reply to Paul Meurer's message of 25 January 2015, 16:12.)
Rather than disambiguating a syncretic form, wouldn't it be better to use one of the representations for indeterminate feature values (set values or their characteristic functions)?
--Ron
(In reply to Agnieszka Patejuk's message of 25 January 2015, 7:21 AM.)
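As a rough illustration of the two representations Ron mentions, the sketch below models a case value either as a set of admissible cases or as its characteristic function (a mapping from every case to true/false). It is written in Python purely for exposition and is not XLE notation or the encoding of any ParGram grammar; the names `CASES`, `satisfies` and the `syncretic` example form are hypothetical, and the seven-case inventory is the Polish one listed earlier in the thread.

```python
# The seven Polish cases named in the thread (abbreviations are ad hoc).
CASES = ("nom", "acc", "gen", "dat", "ins", "loc", "voc")


def as_characteristic_function(values):
    """Encode a set of case values as a mapping from each case to True/False."""
    return {case: case in values for case in CASES}


def as_set(char_fn):
    """Recover the set of case values from its characteristic function."""
    return frozenset(case for case, present in char_fn.items() if present)


def satisfies(form_cases, required):
    """A set-valued form meets a case requirement if the value is a member."""
    return required in form_cases


# Hypothetical syncretic form: one shape serving as nominative and accusative.
syncretic = frozenset({"nom", "acc"})
```

With this encoding, `satisfies(syncretic, "nom")` and `satisfies(syncretic, "acc")` both hold without first choosing one reading, while `satisfies(syncretic, "gen")` does not; `as_set(as_characteristic_function(syncretic))` returns the original set, so the two representations carry the same information.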
On Jan 25, 2015, at 7:21 AM, Agnieszka Patejuk <agnieszka.patejuk@googlemail.com> wrote:
Thanks – nominative is fine with me.
However, I think that in isolation it is as natural as anything else (say accusative). This is why this should be specified clearly next time – since in some languages you can have the same form for different case values (nominative and accusative, for instance), you have to disambiguate this.
Best, Agnieszka
On 25 January 2015 at 16:12, Paul Meurer <paul.meurer@uni.no> wrote:
Hi,
wouldn't the nominative be the natural choice?
- Paul
Am 25.01.2015 um 16:06 schrieb Agnieszka Patejuk <agnieszka.patejuk@googlemail.com>:
Hi,
I have a question concerning examples with isolated NPs (see below) – which case should be assumed for their translation? In Polish there are 7 cases – when I leave out locative (which is assigned only by prepositions, which are not present in the examples) there are 6 left: nominative, accusative, genitive, dative, instrumental and vocative. I have to choose some of these to do the translation – which one did you choose when doing structure comparison in 2014?
35. NP: the dog's bone 36. NP: the farmer's cow
37. NP: the cat's dark fur 38. NP: the bright red color of the car
39. NP: the tree's branch 40. NP: the branch of the tree
41. NP: the farmer's problem 42. NP: the problem of the farmer
43. NP: the farmer's two brothers 44. NP: the two brothers of the farmer
45. NP: the farmer's cows' milk 46. NP: the milk of the cows of the farmer
Best, Agnieszka
On 20 January 2015 at 17:32, Sebastian Sulger <sebastian.sulger@uni-konstanz.de> wrote:
Hi,
This is my mistake - I sent a wrong version of the sentence file. The correct sentences are attached here. Please use these sentences in case you'd like to contribute structures.
Also, I should emphasize (thanks for hinting at this, Agnieszka) that even if you cannot attend the ParGram meeting, you are invited to take part in structure comparison - we will take a look at your structures at the meeting to ensure parallelism. Just follow the procedure described by Paul.
Thanks, Jani
Am 20.01.15 um 14:22 schrieb Agnieszka Patejuk:
Hi,
There seems to be a problem with these sentences – we were to re-do the sentences from 2014 meeting in California and sentences sent by Jani differ from those.
If you intend to take part in structure comparison, please wait for the next message with sentences – the list might change.
Sorry for the confusion.
Best, Agnieszka
On 19 January 2015 at 15:11, Sebastian Sulger <sebastian.sulger@uni-konstanz.de> wrote:
Hi,
At the upcoming ParGram Meeting in Warsaw, we would like to experiment with a new way to do our usual structure comparison: We would like to use INESS (http://iness.uib.no/) and ParGramBank directly.
There are a couple of advantages to this. One obvious advantage is that ParGramBank will grow. But we can also easily switch between languages/sentences, and there is no need to create a humongous PDF file with structures in it (last year, our structure handout had 234 pages).
Paul Meurer (www.uib.no/persons/Paul.Meurer) has kindly implemented a couple of new features in INESS. I attach an email from Paul below. Please follow his instructions when adding to ParGramBank.
To be able to upload structures to INESS (in Prolog format), you need to have an account there. Please follow the instructions on the INESS homepage to create your account. Once the account is created, Paul can give you the necessary user rights to upload files.
Lastly, I attach the sentences in a text file to this email. As Paul says in his email, please use a consistent naming scheme for the structures; you should use something like urd-051-fs (i.e., ISO 639-3 language code - sentence number - fs). Start the sentence number at 51 (there are already 50 sentences in ParGramBank).
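A minimal sketch of the naming scheme described above (ISO 639-3 code, three-digit sentence number, `fs`), in case it helps to generate or check names automatically. The function names are hypothetical and not part of INESS or XLE; the optional one-letter variant follows the `deu-050a-fs` example for alternative translations in Paul's email.

```python
import re

# Pattern for names such as urd-051-fs or deu-050a-fs.
NAME_PATTERN = re.compile(r"^([a-z]{3})-(\d{3})([a-z]?)-fs$")


def structure_name(lang, number, variant=""):
    """Build a structure name from a language code and a sentence number."""
    if not re.fullmatch(r"[a-z]{3}", lang):
        raise ValueError(f"not a three-letter language code: {lang!r}")
    if not 0 < number <= 999:
        raise ValueError(f"sentence number out of range: {number}")
    if variant and not re.fullmatch(r"[a-z]", variant):
        raise ValueError(f"variant must be a single lowercase letter: {variant!r}")
    return f"{lang}-{number:03d}{variant}-fs"


def parse_structure_name(name):
    """Split a structure name into (language code, sentence number, variant)."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the scheme: {name!r}")
    lang, number, variant = match.groups()
    return lang, int(number), variant
```

For example, `structure_name("urd", 51)` gives `urd-051-fs`, and `parse_structure_name` rejects names that are missing the zero padding or the `fs` suffix.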
If you have more questions, please don't hesitate to send email to Paul, or myself. Also, Agnieszka/Paul and others, if I forgot anything, feel free to jump in.
Best, Jani
-------- Forwarded Message --------
Subject: Re: ParGram sentences
Date: Thu, 15 Jan 2015 11:48:57 +0100
From: Paul Meurer <paul.meurer@uni.no>
To: Agnieszka Patejuk <agnieszka.patejuk@googlemail.com>
CC: Sebastian Sulger <sebastian.sulger@uni-konstanz.de>, Adam Przepiórkowski <adamp@ipipan.waw.pl>, Miriam Butt <Miriam.Butt@uni-konstanz.de>, Victoria Rosén <victoria@uib.no>
Hi,
BTW: I went through previous correspondence related to the meeting and have gathered the following points to cover:
PARGRAM MEETING (3 days) • structure comparison: – how: traditional or using INESS?
I think now everything needed to make INESS suitable for structure comparison is in place. Here is what is new:
* I have implemented uploading of prolog files (one by one, or as a gzipped archive)
* Once the sentences (not structures) are aligned, it is easy to switch between treebanks/languages. It is enough for the sentences to be aligned to a pivot language (e.g., Urdu). You can test this in the ParGram treebanks.
What people should do:
* Parse their sentences in XLE
* Use a consistent naming scheme for the sentences (e.g., deu-050-fs), where the number corresponds to the running sentence number. We should not start with 1, but continue where we stopped last time. I think new sentences should start at 50. Alternative translations could be called deu-050a-fs etc.
* Upload the sentences using the _upload files_ link on the Treebank overview page (either one-by-one, or as a gzipped archive, no other archiving program will work).
* Disambiguate the sentences in INESS.
* Add glosses, as described in the documentation.
* Align the sentences, with English as a minimum, using the Alignment tool. This, Jani or I could do for them.
Alternatively, for languages whose grammar is in INESS, the sentences could be parsed in INESS directly.
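For the upload step, here is a rough sketch of bundling the Prolog files into one gzipped archive with Python's standard `tarfile` module. Two things are assumptions, not something the emails confirm: that "gzipped archive" means a gzip-compressed tar file, and that the XLE Prolog exports are saved with a `.pl` extension. The function name is hypothetical; please check the INESS documentation before relying on this.

```python
import tarfile
from pathlib import Path


def pack_structures(directory, archive_path, pattern="*-fs.pl"):
    """Pack all files matching `pattern` in `directory` into a .tar.gz archive.

    Returns the list of file names that were added, in sorted order.
    """
    files = sorted(Path(directory).glob(pattern))
    # "w:gz" writes a gzip-compressed tar archive.
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in files:
            # Store only the file name, not the full directory path.
            archive.add(path, arcname=path.name)
    return [path.name for path in files]
```

A call such as `pack_structures("structures", "urd-structures.tar.gz")` (both paths made up here) would then produce a single file to upload instead of uploading the structures one by one.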
Does this sound feasible?
I think the sentences should be ready quite soon, for people to be able to do all this before Pargram.
— Best wishes, Paul
-- Sebastian Sulger FB Sprachwissenschaft Universität Konstanz http://ling.uni-konstanz.de/pages/home/sulger
-- Paul
From the linguistic point of view it is certainly better as you can account for more phenomena, but I don't know how this would affect grammar performance with larger structures.
In Polish LFG we have implemented the feature indeterminacy solution for CASE described in "Indeterminacy by underspecification" by Dalrymple et al. 2009 in JoL – a complex CASE attribute, with subattributes for particular values of case. However, we have not tried to use this solution for parsing large amounts of text – it was only tested using a testsuite, so we don't know how it would work in a large scale grammar with real sentences. Maybe it is a good time to think about testing this. Perhaps someone has already tried this? If you have some experience with using different structures for representing case, I would be very happy to hear about this.
Best, Agnieszka
On 26 January 2015 at 19:40, Kaplan, Ronald <Ronald.Kaplan@nuance.com> wrote:
Rather than disambiguating a syncretic form, wouldn't it be better to use one of the representations for indeterminate feature values (set values or their characteristic functions)?
--Ron
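To make the two representations mentioned in this exchange concrete, here is a toy sketch in Python. It is an illustration under assumptions, not XLE notation and not the Polish grammar's actual implementation. The complex CASE attribute is modelled as a dict with one subattribute per case: a syncretic form marks the cases it cannot be as "-" and leaves the rest open, and a governor adds "+" for the case it requires. The set-valued alternative is modelled as plain set membership. All names are hypothetical.

```python
# The seven Polish cases listed in the thread.
ALL_CASES = ("NOM", "ACC", "GEN", "DAT", "INS", "LOC", "VOC")


def syncretic_case(possible):
    """Complex CASE value: '-' for every excluded case, unspecified otherwise."""
    return {case: "-" for case in ALL_CASES if case not in possible}


def require_case(case_value, required):
    """Add the constraint CASE <required> = '+'; return None on a clash."""
    if case_value.get(required) == "-":
        return None  # the form excludes this case, so unification fails
    result = dict(case_value)
    result[required] = "+"
    return result


def set_allows(case_set, required):
    """Set-valued alternative: the governor checks membership in the set."""
    return required in case_set


# A form that is syncretic between nominative and accusative.
noun = syncretic_case({"NOM", "ACC"})
as_object = require_case(noun, "ACC")   # succeeds, ACC becomes '+'
as_dative = require_case(noun, "DAT")   # fails, DAT is '-'
```

In this toy model the same form can satisfy a nominative and an accusative requirement at once (`require_case(require_case(noun, "NOM"), "ACC")` succeeds), which is the indeterminacy that picking a single case value for the form would lose.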
On Jan 25, 2015, at 7:21 AM, Agnieszka Patejuk <agnieszka.patejuk@googlemail.com> wrote:
Thanks – nominative is fine with me.
participants (4)
- Agnieszka Patejuk
- Kaplan, Ronald
- Paul Meurer
- Sebastian Sulger