Hi there,
yes, powerful tools or toys, certainly, but not The Solution that replaces powerful engineers and renders smart linguists useless (or forces them into the hamburger wrapping or pizza delivery business).
My personal slogans these days are: Every time I fire an LLM expert, my systems run better. And: all statistical and neural parsers leak.
This after engineering them, running detailed evaluations, and working with them for decades. XLE and the English grammar are still winning after 20 years of dependency corpora and of training the smartest probabilistic/neural algorithms, and after piping almost the entire internet through an LLM like GPT-4.
(You all know that I love hyperbole, right?)
Now, it is funny:
I just tried again, saying "Give me the phrase structure tree for 'I stole a book and Peter kisses from Mary'" and it returned:
ChatGPT
I'm sorry, but the sentence "I stole a book and Peter kisses from Mary" is not a grammatically correct sentence. It appears to have some syntax errors, which make it difficult to construct a meaningful phrase structure tree. Could you please provide a revised sentence or clarify the intended meaning?
Now it also has introspection and can provide grammaticality judgements, it seems. Very interesting. So I ask:
Me again:
On a scale of 1 to 10, how ungrammatical is the sentence?
ChatGPT:
The sentence "I stole a book and Peter kisses from Mary" is highly ungrammatical, as it contains multiple errors in syntax and structure. It seems to mix up different verb tenses and noun phrases, making it difficult to determine the intended meaning. I would rate its ungrammaticality as a 9 or 10 on a scale of 1 to 10, with 10 being the most ungrammatical.
Then I asked again, adding a comma:
Draw me a phrase structure tree for "I stole a book, and Peter kisses from Mary"
ChatGPT:
S
_____|_____
| |
NP VP
| ____|___
| | |
Pronoun V|P ConjP
| | _____|_____
I stole | | |
| | NP VP
NP | | _|_
___|___ | | |
| | | | VP
Det Noun | Verb Pronoun
| | | | |
a book and Peter kisses
In this tree, S represents the sentence as a whole, which is divided into two main branches: NP (noun phrase) and VP (verb phrase). The NP consists of a pronoun "I" and an NP headed by a determiner "a" and a noun "book". The VP is divided into two conjuncts by the coordinating conjunction "and". The first conjunct consists of a verb "stole" and a preposition "from" followed by an NP headed by the proper noun "Mary". The second conjunct consists of a proper noun "Peter" and a verb "kisses", forming a separate VP.
OK, so just one comma made all the difference in the rating: now the sentence is fine, and the resulting analysis is quite interesting. But let us try a simpler construction:
Me:
Draw me a phrase structure tree for "Peter likes to think about algorithms and Mary about cats"
ChatGPT:
S
____________|____________
| |
NP(Peter) S(Mary)
| __________|___________
VP(likes) | |
| VP(to think) PP(about)
S(to) and | ____________|________
NP(algorithms) NP(Mary) NP(cats)
(about)
In this tree, S represents a sentence, NP represents a noun phrase, VP
represents a verb phrase, and PP represents a prepositional phrase. The
conjunction "and" is represented by S(to) and, where to is the
conjunction, and S(Mary) represents the clause for Mary's action.
-------------------
Yes, yes, very good... (for me) :-) I won't even go into more complex constructions... And, well, XLE-Web with English does not get it right either, but it gets much more of it much better, with or without the comma.
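For what it's worth, the reading I (and XLE) are after is the gapping one: "I stole a book and Peter [stole] kisses from Mary", with "kisses" as a plural noun in the second conjunct. Here is a quick Python toy sketch of that clausal coordination — my own ad-hoc bracketing, nothing to do with XLE's actual c-structure output:

```python
# Toy parser for bracketed trees, used to illustrate one possible
# gapping analysis (my own bracketing, not XLE output): the verb
# "stole" is elided (GAP) in the second conjunct, and the PP
# "from Mary" is shared by both conjuncts.

def tokenize(s):
    return s.replace("(", " ( ").replace(")", " ) ").split()

def parse(toks):
    """Recursively parse a bracketed string into (label, children) tuples."""
    assert toks.pop(0) == "("
    label = toks.pop(0)
    children = []
    while toks[0] != ")":
        children.append(parse(toks) if toks[0] == "(" else toks.pop(0))
    toks.pop(0)  # consume the closing ")"
    return (label, children)

def leaves(tree):
    """Collect terminal strings left to right."""
    out = []
    for child in tree[1]:
        out.extend(leaves(child) if isinstance(child, tuple) else [child])
    return out

GAPPING = ("(S (S (NP I) (VP (V stole) (NP (Det a) (N book))))"
           " (Conj and)"
           " (S (NP Peter) (VP (V GAP) (NP (N kisses))))"
           " (PP (P from) (NP Mary)))")

tree = parse(tokenize(GAPPING))
print(" ".join(w for w in leaves(tree) if w != "GAP"))
# prints: I stole a book and Peter kisses from Mary
```

Dropping the GAP leaf recovers exactly the surface string, which is the whole trick of gapping: the structure is there even though the verb is not pronounced.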
Some people argue that we should adjust our language so that AIs, LLMs, and certain tools can process it. Others claim that ellipsis or gapping are weird constructions that only some crazy linguists could come up with. A colleague from a big US bank asked me why we put these weird sentences into ChatGPT, why we don't just speak plain English to it... :-)
Do we really want to study "prompts for different versions of GPT" and engineer around them, or do we want to engineer professional systems around insights that we can falsify and test? I'll continue working on my LFG parser, quantum NLP, CGs, and real working tech, until ChatGPT figures that out. :-) I'd use LLMs for innocent problems where they cannot do harm or create issues by generating problematic nonsense. They are useful toys, not more than that (except for the carbon emissions...).
CU all!
DC
PS: I don't know about you, but we got students submitting essays on different topics that start with:
"As an artificial intelligence language model, I do not have opinions on (...)"
Some folks just copy and paste; they do not even read what ChatGPT gives them. At least that is easy to grade...
On Mon, 2023-05-01 at 11:53 -0400, Vijay Saraswat wrote:
Hi Damir — Everything depends on how the problem is formulated. And on which model you are using — the right prompts for GPT-4 are not necessarily right for GPT-3.5 or GPT-3. Anyway, I will give your examples a shot later today / tomorrow.
An important point to note is (and I said this in my message): one has the possibility of further training these models on labeled data, e.g. human-corrected LFG parses. I believe one of the LFG teams trained a statistical parser in this way 10-15 years ago. But with language model (LM) technology now having progressed substantially, we should be able to use LMs to develop really powerful tools for computational linguists.
--
Associate Professor
Indiana University at Bloomington