Evidence Rebuts Chomsky's Theory of Language Learning

10 Sep 2016

Response to: Evidence Rebuts Chomsky’s Theory of Language Learning

The authors: Michael Tomasello and Paul Ibbotson

Now, the basic argument of this article is that Chomsky’s theory of language learning and his ideas about “universal grammar” are wrong, and have been replaced by other approaches.

First, it’s worth noting that “Chomsky’s theory of language learning” is not a single thing: he is a prolific author, and the linguistic models he has put forward have changed drastically over time. But let’s assume that the authors are talking about the “standard model”, originally expounded in Aspects of the Theory of Syntax (Chomsky 1965).

Now, let’s start dissecting some of the sloppy criticism. Let’s start with this gem of a mischaracterization:

First, [Chomsky] posited that the languages people use to communicate in everyday life behaved like mathematically based languages of the newly emerging field of computer science.

Somebody either has not read their Aspects, or has no idea how computer languages work, or both. Here’s a slide showing the basic model articulated in Aspects.

Programming languages obviously don’t have a phonological component, nor do they have a “deep structure,” which was a core element of the Aspects model. More importantly, programming languages must assign each statement an unambiguous parse tree and an unambiguous semantic interpretation; they are usually designed so they can be parsed efficiently (for example by an LALR or PEG parser); and they are almost always defined by context-free grammars, which are far more restrictive than the grammars of human languages. See: Chomsky hierarchy
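To make the ambiguity point concrete, here’s a minimal sketch (mine, with an invented toy grammar, not anything from Aspects) that counts how many parse trees a tiny context-free grammar assigns to an ordinary English sentence. Programming-language grammars are deliberately engineered so that this count is always exactly one; English obeys no such constraint.

    from collections import defaultdict

    # Toy context-free grammar in Chomsky normal form (invented for
    # illustration). Binary rules A -> B C, plus lexical entries A -> word.
    BINARY = [
        ("S",  "NP", "VP"),
        ("VP", "V",  "NP"),
        ("VP", "VP", "PP"),   # attach "with the telescope" to the seeing
        ("NP", "Det", "N"),
        ("NP", "NP", "PP"),   # attach "with the telescope" to the man
        ("PP", "P",  "NP"),
    ]
    LEXICON = {
        "I": ["NP"], "saw": ["V"], "the": ["Det"],
        "man": ["N"], "telescope": ["N"], "with": ["P"],
    }

    def count_parses(words):
        """CKY chart parser that counts distinct parse trees per span."""
        n = len(words)
        # chart[(i, j)][A] = number of ways A can derive words[i:j]
        chart = defaultdict(lambda: defaultdict(int))
        for i, w in enumerate(words):
            for cat in LEXICON.get(w, []):
                chart[(i, i + 1)][cat] += 1
        for width in range(2, n + 1):
            for i in range(n - width + 1):
                j = i + width
                for k in range(i + 1, j):
                    for a, b, c in BINARY:
                        chart[(i, j)][a] += chart[(i, k)][b] * chart[(k, j)][c]
        return chart[(0, n)]["S"]

    print(count_parses("I saw the man with the telescope".split()))  # -> 2

The two parses correspond to who has the telescope; a compiler-grammar that permitted this would be rejected as a design error.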

Chomsky did introduce a rigorous formalism for describing human languages which is similar to the formalisms used to describe computer languages. Generative grammars can be expressed in a form similar to BNF, and computer languages are one example of making infinite use of finite means.
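Here’s a sketch of what “infinite use of finite means” looks like in BNF-ish terms (a toy grammar I made up for illustration): one recursive rule, NP containing a PP containing another NP, is enough to make the set of derivable sentences unbounded.

    import random

    # A finite set of rewrite rules defining an unbounded language.
    # The NP -> the N PP -> ... near NP loop is what makes it infinite.
    RULES = {
        "S":  [["NP", "VP"]],
        "NP": [["the", "N"], ["the", "N", "PP"]],
        "VP": [["V", "NP"]],
        "PP": [["near", "NP"]],
        "N":  [["dog"], ["house"], ["tree"]],
        "V":  [["saw"], ["chased"]],
    }

    def generate(symbol="S"):
        """Expand a symbol by picking one of its rewrite rules at random."""
        if symbol not in RULES:   # terminal word
            return [symbol]
        expansion = random.choice(RULES[symbol])
        return [word for part in expansion for word in generate(part)]

    print(" ".join(generate()))
    # e.g. "the dog near the house chased the tree near the tree"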

But today’s computer languages have nowhere near the complexity or flexibility of human languages. The closest computer-language analogue of phonology (restrictions on surface structure) might be “readability”, which demands only that the program look presentable in a terminal window. And computer languages have much simpler and more literal semantic models than human languages do.

Next, the authors suggest that not all human languages have recursion:

when linguists actually went looking at the variation in languages across the world, they found counterexamples to the claim that this type of recursion was an essential property of language. Some languages—the Amazonian Pirahã, for instance—seem to get by without Chomskyan recursion.

This example is trotted out again and again, but many of Daniel Everett’s conclusions appear to be unsupported. Here’s one article which addresses some of Everett’s specific arguments about the Pirahã. Regardless, recursion does not seem to be a linchpin for a theory of universal grammar.

Then we have a silly argument predicated on conflating grammar rules with how the brain works:

Experimental studies confirm that children produce correct question sentences most often with particular wh-words and auxiliary verbs (often those with which they have most experience, such as ‘What does …’), while continuing to make errors with question sentences containing other (often less frequent) combinations of wh-words and auxiliary verbs: ‘Why he can’t come?’ The main response of universal grammarians to such findings is that children have the competence with grammar but that other factors can impede their performance and thus both hide the true nature of their grammar and get in the way of studying the ‘pure’ grammar posited by Chomsky’s linguistics.

Here I’ll let the master speak for himself:

To avoid what has been a continuing misunderstanding, it is perhaps worth while to reiterate that a generative grammar is not a model for a speaker or a hearer. It attempts to characterize in the most neutral possible terms the knowledge of the language that provides the basis for actual use of language by a speaker-hearer. […] When we say that a sentence has a certain derivation with respect to a particular generative grammar, we say nothing about how the speaker or hearer might proceed, in some practical or efficient way, to construct such a derivation. […] this generative grammar does not, in itself, prescribe the character or functioning of a perceptual model or a model of speech production. (Aspects p.9)

Just as the grammar for a computer language does not prescribe an execution model for the program, a generative grammar does not prescribe a mechanism for producing or interpreting sentences.
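A programmer’s gloss on this point (my own toy example, not anything from Aspects): the grammar of arithmetic fixes which expressions are well-formed and what structure they have, but the same parse tree can be “performed” by completely different machines.

    import operator

    OPS = {"+": operator.add, "*": operator.mul}

    # Parse tree for "2 + 3 * 4" under the usual precedence rules.
    # The grammar licenses this tree; it says nothing about what follows.
    tree = ("+", 2, ("*", 3, 4))

    def eval_tree(node):
        """Performance model #1: a tree-walking interpreter."""
        if isinstance(node, tuple):
            op, left, right = node
            return OPS[op](eval_tree(left), eval_tree(right))
        return node

    def to_postfix(node, out=None):
        """Performance model #2: compile to code for a stack machine."""
        if out is None:
            out = []
        if isinstance(node, tuple):
            op, left, right = node
            to_postfix(left, out)
            to_postfix(right, out)
            out.append(op)
        else:
            out.append(node)
        return out

    def run_stack_machine(program):
        stack = []
        for instr in program:
            if instr in OPS:
                b, a = stack.pop(), stack.pop()
                stack.append(OPS[instr](a, b))
            else:
                stack.append(instr)
        return stack.pop()

    print(eval_tree(tree))                       # 14
    print(run_stack_machine(to_postfix(tree)))   # 14

Two different processing mechanisms, one grammar; the grammar endorses neither.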

The other important distinction here is between competence and performance. Competence is a description of knowledge of language under ideal circumstances, whereas performance includes all the details and distractions of the real world. Here’s how Chomsky lays it out:

Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech-community, who knows its language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance. […] To study performance, we must consider the interaction of a variety of factors, of which the underlying competence of the speaker-hearer is only one. […] Only under [this idealization] is performance a direct reflection of competence. (Aspects p.3-4)

The authors, failing to understand this distinction, portray this as a failure of scientific rigor:

As with the retreat from the cross-linguistic data and the tool-kit argument, the idea of performance masking competence is also pretty much unfalsifiable. Retreats to this type of claim are common in declining scientific paradigms that lack a strong empirical base—consider, for instance, Freudian psychology and Marxist interpretations of history.

Except, on pages 10-11 of Aspects, Chomsky lays out an entire model for investigating performance.

let us define the term “acceptable” to refer to utterances that are perfectly natural and immediately comprehensible without paper-and-pencil analysis, and in no way bizarre or outlandish. Obviously, acceptability will be a matter of degree, along various dimensions. One could go on to propose various operational tests to specify the notion more precisely (for example, rapidity, correctness, and uniformity of recall and recognition, normalcy of intonation).

Then he gives some examples of sentences which are grammatical, but would not be “acceptable” to a normal English speaker, such as:

(2) (ii) the man who the boy who the students recognized pointed out is a friend of mine

Obviously some English speakers (especially those who are still learning) will apply a rule correctly in some situations while failing to apply it in others. And there are clearly grammatical sentences which are difficult for us to construct or interpret. The sentence above is an instance of a “self-embedded structure”, and it seems to be hard to understand because of the limitations of our short-term memory. None of this suggests that the general rule for transforming statements into wh-questions is somehow invalid and must be replaced by ad hoc rules for each element of the lexicon.
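To see how mechanical the point is, here’s a small sketch (mine, not Chomsky’s) that applies a single relative-clause rule to arbitrary depth. Every output is licensed by the same rule; acceptability collapses after the first embedding anyway.

    # One rule, applied recursively: NP -> "the" N ["who" NP V].
    NOUNS = ["students", "boy", "man", "woman"]
    VERBS = [None, "recognized", "pointed out", "met"]

    def noun_phrase(depth):
        """Build a center-embedded NP with `depth` relative clauses."""
        if depth == 0:
            return "the " + NOUNS[0]
        return "the {} who {} {}".format(
            NOUNS[depth], noun_phrase(depth - 1), VERBS[depth])

    for d in range(1, 4):
        print(noun_phrase(d) + " is a friend of mine")
    # d=1: the boy who the students recognized is a friend of mine  (fine)
    # d=2: exactly Chomsky's (2)(ii): grammatical but hard to process
    # d=3: still grammatical, effectively unintelligible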

Next, we get to the most fundamental mischaracterization of Chomsky’s ideas, specifically the definition of “universal grammar”:

Even beyond these empirical challenges to universal grammar, psycholinguists who work with children have difficulty conceiving theoretically of a process in which children start with the same algebraic grammatical rules for all languages and then proceed to figure out how a particular language—whether English or Swahili—connects with that rule scheme. Linguists call this conundrum the linking problem, and a rare systematic attempt to solve it in the context of universal grammar was made by Harvard University psychologist Steven Pinker for sentence subjects. Pinker’s account, however, turned out not to agree with data from child development studies or to be applicable to grammatical categories other than subjects. And so the linking problem—which should be the central problem in applying universal grammar to language learning—has never been solved or even seriously confronted.

This argument is premised on a fundamental misunderstanding and mischaracterization of Chomsky’s theory of language. In particular, the authors construct a strawman “universal grammar” which bears very little relation to the “universal grammar” described in Aspects. Here is how Chomsky introduces universal grammar in Aspects:

Within traditional linguistic theory, furthermore, it was clearly understood that one of the qualities that all languages have in common is their “creative” aspect. […] The grammar of a particular language, then is to be supplemented by a universal grammar that accommodates the creative aspect of language use and expresses the deep-seated regularities which, being universal, are omitted from the grammar itself. […] It is only when supplemented by a universal grammar that the grammar of a language provides a full account of the speaker-hearer’s competence.

Basically, “universal grammar” is a description of high-level rules which apply to all human languages. These might include facts about categories (e.g. that “verb” and “noun” are hardwired categories) or facts about transformations. Thus “universal grammar” is best understood as a subset of the information needed to define a human language: it most obviously lacks a lexicon, but it also lacks many of the necessary rules which vary from language to language (e.g. SVO versus SOV word order).
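As a loose analogy (mine, not a formalism from Aspects), picture the division of labor like this: a universal component shared by every language, plus a per-language lexicon and per-language settings such as word order. All of the names below are invented for illustration.

    # Shared by all languages on this toy view: the category inventory
    # and the rule schemas.
    UNIVERSAL = {
        "categories": ["N", "V"],
        "schema": "a clause has a subject, a verb, and an object",
    }

    # What each language must supply: a lexicon and its own settings.
    ENGLISH  = {"word_order": ("S", "V", "O"),
                "lexicon": {"dog": "N", "sees": "V", "cat": "N"}}
    JAPANESE = {"word_order": ("S", "O", "V"),
                "lexicon": {"inu": "N", "miru": "V", "neko": "N"}}

    def linearize(subject, verb, obj, language):
        """Order the clause according to the language's own setting."""
        slots = {"S": subject, "V": verb, "O": obj}
        return " ".join(slots[x] for x in language["word_order"])

    print(linearize("dog", "sees", "cat", ENGLISH))     # dog sees cat
    print(linearize("inu", "miru", "neko", JAPANESE))   # inu neko miru

The universal part alone generates nothing; it only constrains what the per-language parts can look like.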

Instead, the authors consistently write as if a “universal grammar” was a grammar in the traditional sense, describing a particular written or spoken language.

The “linking problem” (as I understand it) is about connecting the words in an utterance to entities or relations in the world. It is not about mapping an idealized “universal grammar” onto a human language by associating elements of their respective lexicons and rulesets. It’s not even clear what it would mean to “connect rule schemes” between languages. It seems to me that this would require a theory of higher-order transformations (i.e. transformations which act on transformations), because the rules themselves are expressed as transformations.
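In programming terms: if an ordinary transformation is a function from sentences to sentences, a higher-order transformation would be a function that consumes and produces such functions. A purely illustrative sketch:

    from typing import Callable

    Sentence = list[str]
    Transformation = Callable[[Sentence], Sentence]

    def invert_subject_aux(sentence: Sentence) -> Sentence:
        """Ordinary transformation: 'he can come' -> 'can he come'."""
        subject, aux, *rest = sentence
        return [aux, subject, *rest]

    def traced(t: Transformation) -> Transformation:
        """Higher-order: takes a transformation, returns a new one."""
        def wrapper(sentence: Sentence) -> Sentence:
            out = t(sentence)
            print(" ".join(sentence), "->", " ".join(out))
            return out
        return wrapper

    traced(invert_subject_aux)(["he", "can", "come"])
    # he can come -> can he come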

The reference to Pinker is interesting here, because it’s the only reference in the entire article to a specific scientist. But as with the rest of the article, there are no citations to buttress the assertion that Pinker’s work “turned out not to agree with data from child development studies or to be applicable to grammatical categories other than subjects.”

Note: If you’re interested, Words and Rules (Pinker 2000) is an excellent and approachable book which explores the relationship between linguistics and psychology in the best Chomskyan tradition.

Finally, we get to the authors’ pet project:

Such an alternative, called usage-based linguistics, has now arrived. The theory, which takes a number of forms, proposes that grammatical structure is not innate. Instead grammar is the product of history (the processes that shape how languages are passed from one generation to the next) and human psychology (the set of social and cognitive capacities that allow generations to learn a language in the first place). More important, this theory proposes that language recruits brain systems that may not have evolved specifically for that purpose and so is a different idea to Chomsky’s single-gene mutation for recursion.

Obviously, grammar is the product of history and human psychology. Chomsky’s “universal grammar” is basically just a set of constraints imposed on historical language development by human psychology. It was formulated in rebellion against the older regime of linguistics led by Leonard Bloomfield, which was caricatured as believing that human languages can differ from each other “arbitrarily and without limit.” Chomsky’s famous poverty-of-stimulus argument is an attempt to understand how humans construct complex language models when all we have to learn from is a relatively small random sample of all possible sentences. There is nothing revolutionary about “usage-based linguistics”; it’s a straightforward example of a “theory of language use” as described by Chomsky in Aspects.

As for the idea that there is a single-gene mutation “for recursion,” I think this is a mischaracterization of both Chomsky and genetic theory. Almost certainly a process as complex as recursion, which relies on many parts of the brain, is not controlled by a single gene.

What Chomsky has argued, however, is that the ability to learn a human grammar is a capability specifically evolved by humans, and that there are therefore some brain functions which are specialized for this purpose. For some excellent further discussion of this hypothesis, I recommend another book by Pinker, The Language Instinct (1994). In it, he discusses a variety of surprising facts: for example, adults in a foreign linguistic environment will often speak a pidgin, while the first generation of children raised in the same situation tend to construct a creole, spontaneously merging multiple lexicons and disparate grammar rules into a single consistent language. By contrast, although chimps and other primates can learn to communicate in simple sentences using sign language, they do not construct complex sentences using a recursive generative grammar the way humans do.