Skip navigation

An Example of Pumping Context-Free Languages

CS 532/051, 092, 251: Formal Languages, Spring 2006

Dr. Jim Sasaki, CS Dept., Illinois Institute of Technology

CFL Pumping Lemma II

The pumping lemma given in the textbook (let’s call it Pumping Lemma I) is an awfully weak one; let’s use this slightly stronger one:

(Pumping Lemma II): If L is context-free, then there exists a constant K > 0 such that for every w ∈ L, if |w| > K, then there exist strings u, v, x, y, z ∈ Σ* such that (i) w = uvxyz, (ii) |vy| > 0, (iii) |vxy| ≤ K, and (iv) for all n ≥ 0, uvnxynz  ∈ L.

The main difference between the two Pumping Lemmas is that version II adds clause (iii), which puts a limit on the distance between v and y. (There’s also a minor difference: the textbook explicitly gives the constant as a function of a grammar for the language.)

As usual, we’ll actually use the contrapositive of the lemma, to prove that a language is not context-free. The typical form is:

L is not context-free if the following holds: For all constant K > 0 there exists a w ∈ L with |w| > K such that for all strings u, v, x, y, z ∈ Σ* where w = uvxyz, |vy| > 0, and |vxy| ≤ K, there exists an n ≥ 0 such that uvnxynz  ∉ L.

Warning: The converse of the pumping lemma is false: Just because a language is pumpable, that doesn’t mean the language is necessarilly context-free. For example, Pumping Lemma I fails to show that the language { ambncmdn | m, n ≥ 0 } is context-free because only having clauses (i) and (ii) to work with doesn’t give us enough power to prove that the language is not pumpable. Pumping Lemma II, on the other hand, is powerful enough. as we’ll see in the next section.

An Example Using Pumping Lemma II

Detailed proof

Let L be the language { ambncmdn | m, n ≥ 0 }. Here’s a (detailed) proof that L is not context-free, using Pumping Lemma II.

Let K be the constant provided by the lemma, and let w = aKbKcKdK. Let u, v, x, y, and z be strings in Σ* with w = uvxyz, |vy| > 0, and |vxy| ≤ K. We will find an k ∈ N such that uvkxykz ∉ L; this will establish that L is not context-free.

The value for k depends on the structure of v and y. Since vxy has length ≤ K, it can only include one or two kinds of characters because to include three or four kinds of characters (in the form a+bKc+, for example) requires a string of length > K. Let p = |v| and q = |y| (at least one of these must be positive).

  1. Say v and y have the form ap and aq respectively. If k = 2 then uvkxyk has K+p+q a’s but only K c’s, so it is not in L.
  2. Similarly, if v and y are both all b’s, all c’s, or all d’s, then uv2xy2z has too many b’s, c’s, or d’s respectively to be in L.
  3. Say v has the form a+b+ (so y has the form bq) or y has the form a+b+ (so x has the form ap). If k = 2, then uvkxyk is not of the form a+b+c+d+, so it is not in L.
  4. Similarly, if v or y have the form b+c+ or c+d+, then uv2xy2z is not in L.

These are all the possible cases for v and y, so we know that no possible breakdown of w is pumpable and therefore L is not context-free.

Comments on the proof

  1. The case analysis for v and y depends heavily on the structure of w. For example, if instead of w = aKbKcKdK we had used w = aLbLcLdL, where L = K/4 (rounded up), then vxy could have spanned 3 or 4 kinds of symbols (a+b+c+, for example, or a+b+c+d+).
  2. We did not have to have the same number of a’s and b’s in w; strings in the language have to have the same number of a’s and c’s and the same number of b’s and d’s, but not necessarily the same number of a’s and b’s (or c’s and d’s). So w = aKbK+1cKdK+1 would’ve worked equally as well. (For that matter, w = aKbcKd would’ve worked too, but the case analysis would’ve been different, of course.)
  3. The reason we need Lemma II is that it constrains the distance between v and y (since |vxy| ≤ K). Without that constraint, we can’t claim (for example) that if v has the form ap, then y must have the form aq or a+b+. In particular, it y can have the form cq. If in addition p = q, then there is no k that makes uvkwykz not in L.

The Proof as a Program

It might help you understand the proof by thinking of it in programming terms. In pseudocode terms, the proof takes the description of a language L and an integer K, builds w, gets an arbitrary breakdown for w, and finds evidence for the unpumpability of w using that breakdown by returning an exponent for which pumping fails. Below, I'm using + for string concatenation and ^ for string exponentiation.

Unpumpable ShowUnpumpable(Language L, int K) {       // K > 0
    String w = a^K + b^K + c^K + d^K;
    String u, v, x, y, z;                // w = uvxyz, |vy| > 0, |vxy| <= K
    L.decompose(w, u, v, x, y, z);       // Let someone set u, v, x, y, z according to the comment above.

    if (v and y are both of form a* or b* or c* or d*) {
        return new Unpumpable(2, u, v, w, y, z, "Too many a's, b's, c's, or d's");
    else if (v or y is of the form a⁺b⁺, b⁺c⁺, or c⁺d⁺)
        return new Unpumpable(2, u, v, w, y, z, "Wrong format");
    }
    else {
        throw new Error("This should never happen -- we missed a case!");
    }       
}
    

In pseudocode terms, on Exam 1, many people did if-then-elses that missed some (or many) cases.

    if (y.equals("a"))
        return new Unpumpable(0, x, y, z, "Not enough a's");
    else
        throw new Error("This should never happen");   // Really?  What about y = "aa" or "aaa" or ...?
    

Posted: Fri Apr 14, 2006.

Copyright © 2006 (Spring) by James Sasaki, CS Dept., Illinois Institute of Technology

xhtml css