Chapter 3: Phrase Structure Grammar (I)


3.1. Introduction


In Chapter 1 we observed that speakers have the intuition that sentences are hierarchically organized into layers of sub-strings each of which may belong to one category or another. In Chapter 2, we saw that the existence of sub-strings within larger strings and the existence of categories are evidenced by a variety of syntactic and morphological phenomena. In the course of our discussion we availed ourselves of certain traditional labels of syntactic categories (N, V, A, P, and NP, VP, etc.). You were also introduced to some notations, among them the notation of a tree diagram, to express the notion of constituency, where a given sub-string behaves as a unit (= is a constituent) to the exclusion of other elements. Thus, much of what we know about the sentence Colorless green ideas sleep furiously can be insightfully expressed with a tree diagram like (1) (repeated from Chapter 1):

(1)

This tree represents a theoretical construct which, in a fairly satisfactory manner, explains the following cluster of facts we know about this sentence:

(2) (i) The sub-string colorless green ideas may be moved around as a unit, as in It is colorless green ideas that sleep furiously, or What sleeps furiously is colorless green ideas.
(ii) The same string may be replaced by a single pronoun, as in they sleep furiously.
(iii) The sub-string green ideas may not be pronominalized, but it may be conjoined: Colorless green ideas and gray thoughts sleep furiously.
(iv) The sub-string sleep furiously can be moved: They said that colorless green ideas would sleep furiously in the afternoon, and sleep furiously, they certainly did.
(v) The sub-string ideas sleep cannot be moved, pronominalized, or coordinated.

These facts are captured insightfully by the tree (1), as follows:

(3) (i) (2i) follows, because the sub-string in question is a constituent, an NP, according to the tree (1).
(ii) (2ii) also follows, because the sub-string is an NP, and NPs can be pronominalized.
(iii) (2iii) follows from the tree (1), which shows the sub-string to be an NP included in a larger NP.
(iv) (2iv) follows because (1) treats it as a VP constituent.
(v) (2v) follows because according to (1) the string is not a constituent.


Given the concepts of categories and constituent structure defined in terms of a tree diagram, we can express speakers' knowledge of their language in an insightful way. We can capture a speaker's intuition about a given sentence by giving it a constituent-structure analysis.

The few examples of tree diagrams we have seen only illustrate how some English sentences may be formed. To capture a speaker's full intuition about his/her language, we obviously must look for general rules and principles that underlie all the grammatical sentences (potentially or actually occurring) and exclude all ungrammatical ones. In this chapter, we will try to find out what some of the generalizations about English syntactic structure are, and how we can characterize those generalizations by rules. When we are able to state the relevant generalizations and characterize them systematically, we have developed a theory about our syntactic knowledge.


3.2. P-Markers and Phrase Structure Rules


Our goal is to develop a general theory that accounts for the speaker's knowledge of what counts as a sentence. Since, in the speaker's knowledge, a sentence is not merely a linear string, but has a hierarchical structure of some sort which we have represented by means of a tree, it is more correct to say that the theory that we are aiming to develop should define, in a general and principled way, what counts as a legitimate tree in a given language.

Formally, a theory consists of a number of concepts and a number of principles or general statements made in terms of those concepts. We have adopted the concepts involved in a labeled tree diagram to represent speakers' knowledge of sentence structure. To seek a general theory about possible sentence structures in language, let's start with a formal definition of some concepts involved in a tree. Consider (4) as an example.

(4)

A tree consists of nodes and branches. Each tree has a root node and one or more terminal nodes. In (4), the root node is A, and the terminal nodes are K, D, I, J, G, H. The nodes that are neither root nor terminal nodes will be called intermediate nodes. Nodes are connected, directly or indirectly, by branches.

A tree diagram is sometimes called a phrase-marker, or P-marker. A P-marker encodes two sorts of relations between nodes: (a) precedence and (b) dominance.

X precedes Y iff [if and only if] X and Y are linearly arranged and X appears to the left of Y. In (4), K precedes D, I, J, G, and H; J precedes G, H, but follows K, D, I. The relation of dominance is that of containment. X dominates Y iff X contains Y, or iff Y is contained in X. ["Contains Y" means "has Y as (one of) its parts".] In (4), A dominates all the other nodes in the tree. C dominates D, E, and all nodes that E dominates (F, I, J, G, H), but it does not dominate B, K, or A. Similarly, F dominates I and J, but does not dominate any other nodes. And the terminal nodes dominate nothing other than their own selves. Hence, precedence is a linear relation, and dominance is a hierarchical relation. Two nodes are in either a hierarchical (dominance) relation or a linear (precedence) relation, but not both. Thus, B and C are in a precedence relation (B precedes C), but not in a dominance relation. On the other hand, E dominates G, but neither precedes nor follows it.
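The two relations can be made concrete with a small computational sketch. The Python fragment below is only an illustration, not part of the theory. It encodes tree (4) as the prose above describes it; the dictionary TREE and the function names are our own, and dominance is taken in the strict sense, so that no node counts as dominating itself.

```python
# Tree (4), read off the prose: each node is listed with its daughters, left to right.
TREE = {
    "A": ["B", "C"],
    "B": ["K"],
    "C": ["D", "E"],
    "E": ["F", "G", "H"],
    "F": ["I", "J"],
}

def dominates(x, y):
    """True iff y is found somewhere below x (strict containment)."""
    return any(d == y or dominates(d, y) for d in TREE.get(x, []))

def terminals(x):
    """The terminal nodes covered by x, in left-to-right order."""
    daughters = TREE.get(x, [])
    if not daughters:
        return [x]
    return [t for d in daughters for t in terminals(d)]

def precedes(x, y):
    """True iff neither node dominates the other and x lies to the left of y."""
    if x == y or dominates(x, y) or dominates(y, x):
        return False
    order = terminals("A")
    # x precedes y when the last terminal of x comes before the first terminal of y
    return order.index(terminals(x)[-1]) < order.index(terminals(y)[0])

print(dominates("C", "J"), dominates("C", "K"))   # C contains J but not K
print(precedes("B", "C"), precedes("E", "G"))     # B is left of C; E dominates G
```

Under this encoding, any two distinct nodes stand in exactly one of the two relations, which is the point made at the end of the paragraph above.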

X immediately dominates Y iff X is the lowest node that dominates Y. (There is no intervening Z such that Z dominates Y but is dominated by X.) In (4), A dominates all other nodes, but it only immediately dominates B and C. Similarly, E dominates F, G, H, and I, J, but it does not immediately dominate I or J.

Where X dominates Y, we say that Y is a constituent of X. Y is an immediate constituent of X iff X immediately dominates Y. In (4), all the non-root nodes are constituents of A, but only the nodes B, C are A's immediate constituents. I is a constituent of F; it is also a constituent of E, of C, and of A. But it is an immediate constituent only of F, not of E, C, or A.

The structural relations under consideration are sometimes referred to in kinship terms. Thus X is the mother of Y iff X immediately dominates Y (in this case, Y is the daughter of X). Hence, the "daughter of" relation is equivalent to the relation "immediate constituent of". X, Y are sister nodes of each other iff they have the same mother (i.e., are immediate constituents of the same mother node).
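The kinship terms translate directly into lookups on the tree. The sketch below is an illustration only: it again encodes tree (4) as described in the prose, and the names TREE, mother, immediately_dominates, and sisters are our own.

```python
# Tree (4), read off the prose: each node is listed with its daughters, left to right.
TREE = {
    "A": ["B", "C"],
    "B": ["K"],
    "C": ["D", "E"],
    "E": ["F", "G", "H"],
    "F": ["I", "J"],
}

def mother(y):
    """The node that immediately dominates y, or None if y is the root."""
    for x, daughters in TREE.items():
        if y in daughters:
            return x
    return None

def immediately_dominates(x, y):
    """True iff y is a daughter (an immediate constituent) of x."""
    return mother(y) == x

def sisters(y):
    """The other daughters of y's mother, in left-to-right order."""
    m = mother(y)
    if m is None:
        return []
    return [d for d in TREE[m] if d != y]

print(mother("I"))                        # F
print(immediately_dominates("E", "I"))    # False: F intervenes between E and I
print(sisters("F"))                       # ['G', 'H']
```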

X exhaustively dominates a string Y iff X dominates the entire string Y and nothing outside of Y. In (4), C exhaustively dominates the string consisting of D-E, because it dominates the entire string covered by D and E, and nothing else that is not already part of the string D-E. Similarly B exhaustively dominates K because it dominates K and nothing else. E exhaustively dominates the terminal string I-J-G-H because it dominates this string and nothing outside the string. However, C does not exhaustively dominate I-J-G-H, because it dominates more than this string. C also does not exhaustively dominate K-D-E, because it does not dominate K at all. In (4), the string K-D is not exhaustively dominated by any node at all.

When a string Y is exhaustively dominated by X, we say that the string Y constitutes an X--the string forms a constituent whose label is X, or "Y is an X". In (4), I-J is an F, forms a constituent called F, constitutes F. Similarly F-G-H constitutes E, forms a constituent called E, is an E. On the other hand, neither D-F nor B-D constitutes anything, or forms a constituent by any name, or IS anything.
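Exhaustive dominance can be checked mechanically by comparing what a node covers with what the string covers. The following sketch is an illustration under our own assumptions: tree (4) is encoded as described in the prose, a "string" is a list of node names, dominance is strict, and the function names are hypothetical.

```python
# Tree (4), read off the prose: each node is listed with its daughters, left to right.
TREE = {
    "A": ["B", "C"],
    "B": ["K"],
    "C": ["D", "E"],
    "E": ["F", "G", "H"],
    "F": ["I", "J"],
}

def dominates(x, y):
    """True iff y is found somewhere below x (strict containment)."""
    return any(d == y or dominates(d, y) for d in TREE.get(x, []))

def terminals(x):
    """The terminal nodes covered by x, in left-to-right order."""
    daughters = TREE.get(x, [])
    if not daughters:
        return [x]
    return [t for d in daughters for t in terminals(d)]

def exhaustively_dominates(x, string):
    """True iff x dominates every node in the string and covers nothing else."""
    covered = [t for node in string for t in terminals(node)]
    return all(dominates(x, node) for node in string) and terminals(x) == covered

def is_a(string):
    """The labels X such that the string 'is an X' (possibly none)."""
    return [x for x in TREE if exhaustively_dominates(x, string)]

print(is_a(["I", "J"]))        # ['F']
print(is_a(["F", "G", "H"]))   # ['E']
print(is_a(["K", "D"]))        # []: not a constituent of any kind
```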

Now let's look at the actual tree diagram in (1). The string colorless green ideas constitutes an NP, and so does the smaller string green ideas. Sleep furiously forms a constituent called VP, and therefore is a VP. The string ideas constitutes an N, sleep constitutes a V, green is an A, and the string Colorless green ideas sleep furiously constitutes an S by this definition, because S exhaustively dominates this string. On the other hand, the string ideas sleep does not form a constituent of any sort, because it is not exhaustively dominated by any node. Nor does the string colorless green, nor green ideas sleep in this tree.

Up to now, we have been looking only at random examples of English sentences and their associated P-markers. Our goal is to state in general terms what constitutes a legitimate P-marker in English. We shall adopt the format of a Phrase Structure Rule (PSR) to express in general terms the kinds of dominance and precedence relations that are possible for a given language. Suppose we want to say that the P-marker (4) is a possible P-marker in a language L. We can express this fact with rules like the following (among others):

(5) A ----> B C

(6) E ----> F G H

Rule (5) states that in language L, a constituent of category A consists of B and C (in that order) as its sole immediate constituents, and (6) says that E is made up solely of the three daughters F, G, and H (in that order). The existence of PS rules (5)-(6) in language L accounts (partially) for (4) as one of the possible P-markers in L.
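One way to see why (5)-(6) account for (4) only partially is to check every mother-daughters configuration in the tree against the available rules. The sketch below is illustrative: the tree is encoded as described in the prose, and the three extra rules in FULL_RULES are not given in the text but are our own assumption about what the "others" would have to be.

```python
# Tree (4), read off the prose: each node is listed with its daughters, left to right.
TREE = {
    "A": ["B", "C"],
    "B": ["K"],
    "C": ["D", "E"],
    "E": ["F", "G", "H"],
    "F": ["I", "J"],
}

# Rules (5) and (6) from the text, as (mother, daughters) pairs.
RULES_5_6 = {("A", ("B", "C")), ("E", ("F", "G", "H"))}

# Assumed additional rules of language L, read off the shape of (4).
FULL_RULES = RULES_5_6 | {("B", ("K",)), ("C", ("D", "E")), ("F", ("I", "J"))}

def unlicensed(tree, rules):
    """The mother-daughters configurations in the tree that no rule allows."""
    local_trees = [(m, tuple(ds)) for m, ds in tree.items()]
    return [lt for lt in local_trees if lt not in rules]

print(unlicensed(TREE, RULES_5_6))    # the B, C and F sub-trees are still unaccounted for
print(unlicensed(TREE, FULL_RULES))   # []: every sub-tree instantiates some rule
```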

Let us now turn to English and have a look at a fragment of its Phrase Structure Grammar.


3.3. English Phrase Structure Grammar: A Fragment


3.3.1. The Sentence

What are the possible constituents of an English sentence, and how are they arranged? Let's start by considering the possible immediate constituents of a sentence.

In general, English sentences must contain an NP (as the subject) and a VP (as the "predicate"), in that order, and an optional auxiliary, as illustrated below:

(7) a. [John] [found a fly in the soup].
b. [The professor] [located a flaw in his argument].
c. [They] [could] [see the point immediately].

Assuming that a sentence consists at most of these 3 immediate constituents, these sentences, in the speakers' intuition about them, are associated with either one of the following partial P-markers:

(8) a.

b.

Assuming for the moment that all sentences have immediate constituent structures like these, we can state this generalization by means of the following PSR, where the parenthesis notation indicates that a given constituent may be optional:

(9) The Sentence Rule

S ------> NP (Aux) VP -------------------------------- (PSR1)

The rule (9), PSR1, is a statement of generality and, taken as a rule of English PSG, it admits grammatical English sentences having the structures of (8a) or (8b), at the same time excluding sentences having the form in (10) (since these are not instances of the pattern allowed by PSR1):

(10) a.

b.

Thus, PSR1 correctly characterizes sentences like (7) (plus most of the grammatical sentences we have seen up to now and thousands of others) as grammatical, while ruling out sentences like *Saw Bill the boy, *See the boy can Bill and thousands of other imaginable, but ungrammatical, "sentences". It therefore represents a significant generalization about what counts as a grammatical sentence in English.
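The parenthesis notation is an abbreviation: a rule with one optional element stands for two fully spelled-out rules. The sketch below illustrates this for PSR1. The representation of a rule as (category, optional) pairs and the function names are our own, and the flat category sequences assigned to the two starred examples are rough assumptions made only for the sake of the check.

```python
from itertools import product

# PSR1: S -> NP (Aux) VP, with each element marked as optional or not.
PSR1 = [("NP", False), ("Aux", True), ("VP", False)]

def expansions(rhs):
    """All daughter sequences that a rule with optional elements abbreviates."""
    choices = [[(), (cat,)] if optional else [(cat,)] for cat, optional in rhs]
    return {sum(combo, ()) for combo in product(*choices)}

def admits(rhs, daughters):
    """True iff the sequence of daughters is one of the rule's expansions."""
    return tuple(daughters) in expansions(rhs)

print(sorted(expansions(PSR1)))
print(admits(PSR1, ["NP", "Aux", "VP"]))       # pattern of (7c)
print(admits(PSR1, ["V", "NP", "NP"]))         # assumed pattern of *Saw Bill the boy
print(admits(PSR1, ["V", "NP", "Aux", "NP"]))  # assumed pattern of *See the boy can Bill
```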


3.3.2. The Noun Phrase

We can further state generalizations about what constituents an NP, VP, or Aux may contain. For example, an NP may take the 'bare form' of a noun, or it may contain a noun and an optional determiner and an adjectival phrase.

(11) a. Boys like basketball games.
b. The boys like basketball games.
c. Tall boys like basketball games.
d. The tall boys like basketball games.

This generalization can be stated in the form of a rule like (12):

(12) NP -----> (Det) (AP) N

In addition to prenominal modifiers, an NP can also take optional elements following the noun:

(13) a. Boys from this school like basketball most.
b. Boys who are tall tend to like basketball most.
c. Boys from this school who are tall tend to like basketball most.

We can state this generalization by the following rule:

(14) NP ----> N (PP) (S)

In fact, an NP may (or may not) contain both prenominal and post-nominal elements in addition to the noun:

(15) a. The king of England had an unhappy life.
b. The young king of England had an unhappy life.
c. The young king who gave up his throne had an unhappy life.
d. The king of England who gave up his throne had an unhappy life.

Putting these facts together, we may collapse rules (12) and (14) into the more general (16):

(16) NP -----> (Det) (AP) N (PP) (S)

This rule says that an NP contains an obligatory N, plus a number of optional prenominal or post-nominal elements in the order given.

Finally, any of the NPs illustrated in (11), (13), (15), may take the form of a single personal pronoun:

(17) a. They like basketball games.
b. He had an unhappy life.

Note that, unlike a noun, a pronoun cannot be "modified" by a prenominal or post-nominal element. Hence the following sentences are ungrammatical:1

(18) *The he arrived yesterday.

(19) *They who are tall like basketball games.

This means that a pronoun substitutes for [= has the same status as] a whole NP, and not just for a noun. We may express this possibility by the rule (20):

(20) NP -----> Pronoun

In other words, an NP either consists of a noun with possible optional elements, or solely of a pronoun. We can express this choice with a pair of braces:

(21) The NP Rule

NP -----> { a. Pronoun               }
          { b. (Det) (AP) N (PP) (S) } -------------------- (PSR2)

When we make statements like (21), we are making a theory of what constitutes a grammatical noun phrase in English. By (21) we claim that a string of words in English is admissible (grammatical) as a noun phrase iff that string consists of a single pronoun, or has the form depicted by (21b). That is, a string of words in English is a noun phrase if it can be characterized by a tree that instantiates (21). The trees in (22) are instances of (21), but those in (23) are not. [Make sure you see what's wrong with each of (23).]


(22) a.

b. c.


(23) a. b. c.


This rule accounts for the grammaticality of examples like (11), (13), and (15) (and thousands like them), and it also correctly rules out imaginable but ungrammatical strings like the following in English:

(24) a. *the of England king ....
b. *Boys tall the like basketball games.
c. *The who are tall like basketball most.
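The way the rule excludes strings like (24) can be sketched as an order-sensitive check on the daughters of an NP. The fragment below is an illustration only. It uses the non-pronominal option of the NP rule, that is, the pattern already stated in (16). The category sequences assigned to the starred strings are our own rough analyses, and the function name is hypothetical.

```python
# The pattern of (16): (Det) (AP) N (PP) (S), each element marked optional or not.
NP_RULE = [("Det", True), ("AP", True), ("N", False), ("PP", True), ("S", True)]

def admits(rule, daughters):
    """True iff the daughters follow the rule's order, skipping only optional slots."""
    remaining = list(daughters)
    for cat, optional in rule:
        if remaining and remaining[0] == cat:
            remaining.pop(0)          # this slot is filled by the next daughter
        elif not optional:
            return False              # an obligatory element (the head N) is missing
    return not remaining              # nothing may be left over

print(admits(NP_RULE, ["Det", "AP", "N", "PP"]))   # (15b) the young king of England
print(admits(NP_RULE, ["Det", "PP", "N"]))         # (24a) *the of England king
print(admits(NP_RULE, ["N", "AP", "Det"]))         # (24b) *boys tall the
print(admits(NP_RULE, ["Det", "S"]))               # (24c) *the who are tall
```

The simple left-to-right matching is adequate here only because no category occurs twice in the rule.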

You can illustrate the (relative) validity of this rule by giving more good grammatical sentences of your own that are correctly allowed by it, and imaginable but ungrammatical strings that are correctly ruled out by it. [Homework: problem #2]

An important aspect of the structure of an NP we notice, as expressed by the rule (21b), is that an NP always contains a noun (except in the case of a pronoun, which replaces the whole NP). We shall call the noun obligatorily contained in an NP the head of the NP, and the optional elements flanking the head its peripheries. The generalization to note is that an NP must have a head and may have a number (from 0 to n) of peripheral elements. The principle that a phrase must have a head is called the principle of endocentricity. We have seen that NPs are endocentric. In fact, there is reason to believe that, as a general principle, all phrases are endocentric.


3.3.3. The Verb Phrase

Now let us look at the VP constituent, and consider what counts as a grammatical VP in English. A VP may contain only a verb, as in (25):

(25) a. John died.
b. That son of a gun left.
c. A horrible accident happened.

We can express this by the following VP rule:

(26) VP ----> V

Or it may take a verb followed by an object NP, a prepositional phrase (PP), or a subordinate sentence (S):

(27) a. John saw Bill. (V NP)
b. John lived in Irvine. (V PP)
c. John thought Bill kicked the bucket. (V S)

This means that a VP may contain V optionally followed by a choice of an NP, PP or S:


(28) VP ----> V ({NP, PP, S})

In fact, for each of these three choices, there is an additional option of an NP preceding the choice:

(29) a. John gave the boy a nice gift. (V NP NP)
b. John put the book on the table. (V NP PP)
c. John told the little boy he won a prize. (V NP S)

In other words, a VP always contains a verb, and may in addition have one or two elements following the V. The range of 7 possibilities illustrated in (25)-(29) can be captured by the following rule:


(30) The VP Rule

VP ----> V (NP) ({NP, PP, S}) -------------------------------- (PSR3)

This rule says that a VP must have a head V, which may be optionally followed by up to two other peripheral elements, in the order given. You can familiarize yourself with this rule by finding more examples of good VPs in English that are allowed by this rule, and showing that some other imaginable but ungrammatical strings are ruled out by this rule. [Homework: problem #3.]
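The count of 7 possibilities can be verified by spelling out the abbreviation. The sketch below assumes, following the prose, that the VP rule has the form V, then an optional NP, then an optional choice among NP, PP, and S. The representation of the rule as a list of slots and the function name are our own.

```python
from itertools import product

# Assumed shape of the VP rule: V (NP) ({NP, PP, S}).
# Each slot lists its alternatives; the empty tuple means "left out".
VP_RULE = [
    [("V",)],                          # obligatory head
    [(), ("NP",)],                     # optional NP
    [(), ("NP",), ("PP",), ("S",)],    # optional choice of NP, PP or S
]

def expansions(rule):
    """The distinct daughter sequences the rule abbreviates, in order of discovery."""
    seen = []
    for combo in product(*rule):
        sequence = sum(combo, ())
        if sequence not in seen:
            seen.append(sequence)
    return seen

for sequence in expansions(VP_RULE):
    print(" ".join(sequence))
print(len(expansions(VP_RULE)))   # 7
```

There are 2 x 4 = 8 ways of filling the two optional slots, but "V NP" arises twice (once from each slot), so only 7 distinct structures result.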


3.3.4. Other Phrases

Let us consider the internal structure of other phrases. PPs, for example, typically consist of a P (the head) and an NP (object of the P), as in (31a-b); in some cases there need not be an NP object (31c-d):2

(31) a. John lived in the garage.
b. The man from Ohio decided to leave for good.
c. The train pulled in.
d. He wiped the dirt off.

The PSR that we need to characterize a PP is therefore (32):

(32) The PP Rule

PP ------> P (NP)

On the basis of examples like good, very good, extremely young, etc., we can postulate the structure of AP as in (33).

(33) The AP Rule

AP -----> (deg) A

What about the category Det? What kinds of stuff can make up a Determiner in English? The following are some samples:

(34) a. a student, the students, some students
b. that student, this student, those students, these students
c. my students, John's students, the man over there's students

A determiner may consist of an article, a demonstrative word, or a possessive noun phrase, but not a combination of two or more of these (*the that student, *my those students, *a John's student). This can be captured conveniently by the following PSR:

(35) The Determiner Rule

Det ----> {Art, Dem, NP[Poss]}

This rule says that a Det may take the form of an article, a demonstrative, or an NP marked for the Possessive case. If the possessive NP is a pronoun, it will be phonologically realized as my, your, his, her, their, etc. And if the NP takes a non-pronominal form, the possessive marker will be realized as /s/, /z/ or /iz/ in a form parasitic to the last phoneme (consonant or vowel) of a whole NP.3 The rules determining how the possessive (or genitive) case is realized on a pronoun or a non-pronominal NP belong in the domain of English morpho-phonology, and we shall not be concerned with it here.

Note that the possessive NP introduced in rule (35) is intended to be like any other NP except for the additional genitive case morpheme attached to it. The NP itself may have an internal structure like any other NP--as long as it is an instance of the rule (21), or PSR2, and takes the genitive case. This part of the rule illustrates a feature of recursion that is also seen in PSR2 and PSR3, as well as in the PP rule (32). In PSR2, an S may occur as a peripheral element of an NP. In PSR3, an NP, PP, or S may occur as a peripheral element of a VP. According to (32), a PP may contain an NP; and according to PSR2, an NP can in turn contain a PP, which according to (32) may contain another NP, which according to PSR2 may contain another PP, etc. This recursive device built into our system accounts for the infinite possibility of our linguistic competence, illustrated by examples like the following:

(36) a. the man from the city in the little country in Eastern Europe . . . .
b. the cat which chased the mouse which ate the cheese which lay on the table which stood in the middle of the kitchen . . . .
c. John's mother's father's friend's teacher . . . .
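The interplay of the NP rule and the PP rule can be imitated by a function that calls itself, just as an NP may contain a PP that contains another NP. The sketch below is a deliberately simplified illustration: it builds labeled bracketings in which each PP is attached inside the NP immediately before it (one possible analysis, assumed here), it omits the Det and Art nodes, and the function name is our own.

```python
def nested_np(nouns, preps):
    """Build [NP the N [PP P [NP ...]]] for a list of nouns linked by prepositions."""
    head, rest = nouns[0], nouns[1:]
    if not rest:
        return f"[NP the {head}]"                    # NP with no PP: recursion stops
    inner = nested_np(rest, preps[1:])               # the NP inside the PP
    return f"[NP the {head} [PP {preps[0]} {inner}]]"

print(nested_np(["man", "city", "country"], ["from", "in"]))
# [NP the man [PP from [NP the city [PP in [NP the country]]]]]
```

Nothing in the function limits the length of the list, which mirrors the point that the rules place no upper bound on the depth of embedding.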

You may wish to familiarize yourself with this aspect of the rules by assigning each of these examples an appropriate tree diagram. The structure of (36a) is as in (37):


(37)


Remember that each tree diagram is an instantiation of the combination of the PS rules we have posited. For each sub-tree involving a mother node and its daughter(s), there must exist a possible expansion of a PSR we have posited. You can confirm (37) as a tree allowed by the set of PSRs we have posited by identifying the rule responsible for each sub-tree in it. By considering the possibilities of applying these rules, you can also obtain a correct P-marker for (36b) and (36c).

The PSRs posited here provide general statements about what kind of immediate constituents, in terms of their syntactic categories, a given syntactic construction may or must contain. If a construction is specified to consist of one or more immediate constituents which are themselves phrasal categories, these immediate constituents must be further specified, until lexical categories (word categories) are reached. The PS rules posited here are obligatory, so every phrasal category must undergo one of these rules, sometimes recursively, until lexical categories are specified. Each legitimate tree diagram represents one particular way a combination of these obligatory rules has applied. Each tree contains a root node and a set of non-terminal nodes which specify the identity of phrasal categories (large or small), and a set of terminal nodes each specifying the identity of a lexical category.

3.4. The Lexicon

We have now made several reasonable hypotheses about the internal structure of English sentences and of their constituents. What about categories like Aux, N, A, V, P, Adv, Art, Dem, and Deg? These categories are words, not phrases. Since they are each single words, they do not have internal syntactic structure. They are the basic building blocks of syntax. Of course, just as we must know what sorts of syntactic structures constitute grammatical S's, NP's, VP's, etc., in English, we must also know what sorts of things constitute legitimate N's, Aux's, V's, etc. We have characterized the first kind of knowledge by a set of PSR's. (That is, we assume that as speakers of English, we have a Grammar which includes these PSR's.) For the second kind of knowledge, we assume that our Grammar also contains a (mental) Lexicon, which lists the words under the category (part of speech) they are members of. Or our list might take the form of a familiar dictionary, where each word is marked for its part of speech. For our current purposes we might take the list to have the following form:

(38) A Fragment of the English Lexicon
a. Aux ---> can, could, may, will, . . . .
b. N -----> books, ideas, mother, man, student, girl, house, friend, cement, pilot, . . . .
c. V ------> kick, laugh, cry, buy, live, tell, give, put, say, . . . .
d. A ------> good, bad, colorless, green, long, redundant, . . . .
e. P ------> at, in, under, on, through, up, . . . .
f. Art ----> a, the, some, . . . .
g. Dem ----> this, that, these, those
h. deg ----> very, extremely, . . . .

In other words, our Grammar consists of two components: a Phrase Structure Rules Component (PSR Component), and a Lexicon:


(39)


3.5. Summary: Generative Grammar


In this chapter we have hypothesized that the Grammar of English has a PSR component containing the following set of PSRs which specify the contents of an S, NP, VP, AP, PP, and Det, respectively:


(9) The Sentence Rule

S ------> NP (Aux) VP -------------------------------- (PSR1)


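(21) The NP Rule

NP -----> { a. Pronoun               }
          { b. (Det) (AP) N (PP) (S) } -------------------- (PSR2)

(30) The VP Rule

VP ----> V (NP) ({NP, PP, S}) -------------------------------- (PSR3)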

(33) The AP Rule

AP ------> (deg) A

(32) The PP Rule

PP -------> P (NP)

(35) The Determiner Rule

Det ----> {Art, Dem, NP[Poss]}

By hypothesizing these rules and the lexicon, we make the following claim about what is meant by a "grammatical" sentence or phrase in English. A phrase or sentence is grammatical in English just in case (iff) it can be characterized in terms of a tree diagram which can be generated by some application(s) of the set of PS rules and whose terminal nodes each dominate an appropriate item from the Lexicon. In other words, a sentence is grammatical just in case it can be assigned a structural description by the Grammar containing the PSRs and the Lexicon.
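The claim can be made concrete with a small recognizer that answers yes or no according to whether a string of words can be assigned a structure. The sketch below is an illustration under several assumptions of our own. The rules are encoded as the prose describes them, the possessive option for Det is left out for simplicity, the lexicon is a subset of (38) with two pronouns added from (17), and all names in the code are hypothetical. In keeping with the discussion that follows, it should be read as a way of checking a characterization, not as a model of how speakers produce sentences.

```python
from functools import lru_cache
from itertools import product

LEXICON = {
    "Aux": {"can", "could", "may", "will"},
    "N": {"books", "ideas", "mother", "man", "student", "girl", "house", "friend"},
    "V": {"kick", "laugh", "cry", "buy", "live", "tell", "give", "put", "say"},
    "A": {"good", "bad", "colorless", "green", "long"},
    "P": {"at", "in", "under", "on", "through", "up"},
    "Art": {"a", "the", "some"},
    "Dem": {"this", "that", "these", "those"},
    "deg": {"very", "extremely"},
    "Pronoun": {"they", "he"},        # not in (38); taken from the examples in (17)
}

def req(*cats):
    """An obligatory slot: exactly one of the given categories."""
    return [(c,) for c in cats]

def opt(*cats):
    """An optional slot: one of the given categories, or nothing."""
    return [()] + req(*cats)

RULES = {
    "S": [[req("NP"), opt("Aux"), req("VP")]],
    "NP": [[req("Pronoun")],
           [opt("Det"), opt("AP"), req("N"), opt("PP"), opt("S")]],
    "VP": [[req("V"), opt("NP"), opt("NP", "PP", "S")]],
    "PP": [[req("P"), opt("NP")]],
    "AP": [[opt("deg"), req("A")]],
    "Det": [[req("Art", "Dem")]],     # possessive NP option omitted in this sketch
}

# Spell out every abbreviation: category -> set of possible daughter sequences.
GRAMMAR = {cat: {sum(combo, ()) for rule in rules for combo in product(*rule)}
           for cat, rules in RULES.items()}

def grammatical(string, category="S"):
    """True iff the words can be assigned a tree rooted in the given category."""
    words = tuple(string.lower().split())

    @lru_cache(maxsize=None)
    def spans(cat, i, j):
        # Can words[i:j] form a constituent of category cat?
        if cat in LEXICON:
            return j - i == 1 and words[i] in LEXICON[cat]
        return any(fits(rhs, i, j) for rhs in GRAMMAR[cat])

    @lru_cache(maxsize=None)
    def fits(rhs, i, j):
        # Can the daughters in rhs divide words[i:j] among themselves, in order?
        if not rhs:
            return i == j
        first, rest = rhs[0], rhs[1:]
        # every daughter must cover at least one word
        return any(spans(first, i, k) and fits(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return spans(category, 0, len(words))

print(grammatical("the student can buy books"))    # True
print(grammatical("buy the student can books"))    # False
print(grammatical("the they laugh"))               # False, compare (18)
```

The function expects words separated by spaces and no punctuation, and it can only recognize words that appear in the small lexicon above.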

This kind of grammar, which we assume to be a representation of the mind, is called a Generative Grammar, a term introduced by Chomsky. A grammar generates the set of grammatical sentences in a given language. Though one might think of the word generate in the intuitive sense of produce, this is at most a metaphor. It is important to remember that no claim is being made that we actually have a device in our mind that produces sentences like machines produce consumer products. We don't even claim that our mental grammar is like a computer with programmed instructions (software) that produce sentences (as sentence-generating devices).

We simply take the grammar to be a set of statements that serve to explicitly characterize our knowledge concerning, e.g., what is and what is not a possible member of our language. The PSR's, for example, are not instructions for what to do, but are simply statements of what are the possible (admissible) components of a given category. (E.g., Det, N are the possible components of an NP, etc.; or claims that if X is an English sentence, then X must have a structure of the sort defined by PSR1, etc.) So what is the meaning of generate? Chomsky uses the term in its mathematical sense. In mathematics we say that (a+b)^2 generates (a^2 + 2ab + b^2). There is no sense in which one of these terms produces the other. Rather, (a+b)^2 is one way to characterize or specify the properties of (a^2 + 2ab + b^2). Similarly, when we say our Grammar generates the sentences in our language, we simply mean that our Grammar provides a (proper) characterization for the sentences. In other words, "generates" means "characterizes structurally and formally", or "assigns a (proper) structural description to". The rules are basically statements of permission and prohibition that allow us to describe and explain, in a general way, speakers' knowledge of their languages. A sentence is characterized as grammatical iff it can receive a structural description under the provisions of the Grammar, and ungrammatical otherwise. This is the meaning of a "Generative Grammar".


Homework 3


(1) Consider the following tree diagram:


Answer true or false:

(a) A dominates L, M, O, F, and B
(b) The string JKL is a constituent.
(c) D exhaustively dominates M and N.
(d) The string IJK is a B.
(e) The string M-N-O constitutes a D.
(f) B immediately dominates I, J and K.
(g) G and H are sister nodes.
(h) M, N, O are each an immediate constituent of H.


(2) The rule (21b) (PSR2b) is an abbreviation of 16 possibilities of the internal structure of an NP (in addition to the possibility provided by (21a)). Can you list all those possibilities systematically, and give an example of your own of an English NP to illustrate each possibility?


(3) The rule (30) (PSR3) is an abbreviation of the 7 possibilities of English verbal complementation (7 possible internal structures of an English VP). Can you list these possibilities and give an example of your own (other than those already given in the text) to illustrate each possibility?


(4) Assuming the PSR's we have postulated, and based on your knowledge of the lexicon, please assign a structural description (a tree diagram) to each of the following sentences:

a. I pledge allegiance to the flag of America.
b. A government for the people won't perish from the earth.
c. The company will assume its responsibilities for the damages.
d. People who criticize others with good intentions deserve your respect.
e. The man put the book he bought on the table.
f. John's mother's friend's father's sister arrived on time.
g. The cat which chased the rat which ate the cheese broke my window.


1 In somewhat archaic usage, the pronoun he is used to mean one, and can be modified:

(i) He who insults himself first will be insulted by others.

We exclude such usage from consideration.


2 Such prepositions without an object are called "particles" in traditional treatments. Emonds (1972) was the first to argue that particles are no more than "intransitive prepositions".


3 Note again, as indicated by the examples below (repeated from Chapter 2), that the possessive marker really marks the possessive case for the entire NP preceding it, not just a noun:

(i) The King of England's head is bald.

(ii) The man over there's friend betrayed him.
