The pumping lemma given in the textbook (let’s call it Pumping Lemma I) is an awfully weak one; let’s use this slightly stronger one:
(Pumping Lemma II): If L is context-free, then there exists a constant K > 0 such that for every w ∈ L, if |w| > K, then there exist strings u, v, x, y, z ∈ Σ* such that (i) w = uvxyz, (ii) |vy| > 0, (iii) |vxy| ≤ K, and (iv) for all n ≥ 0, uvnxynz ∈ L.
The main difference between the two Pumping Lemmas is that version II adds clause (iii), which puts a limit on the distance between v and y. (There’s also a minor difference: the textbook explicitly gives the constant as a function of a grammar for the language.)
As usual, we’ll actually use the contrapositive of the lemma, to prove that a language is not context-free. The typical form is:
L is not context-free if the following holds: For all constant K > 0 there exists a w ∈ L with |w| > K such that for all strings u, v, x, y, z ∈ Σ* where w = uvxyz, |vy| > 0, and |vxy| ≤ K, there exists an n ≥ 0 such that uvnxynz ∉ L.
Warning: The converse of the pumping lemma is false: Just because a language is pumpable, that doesn’t mean the language is necessarilly context-free. For example, Pumping Lemma I fails to show that the language { ambncmdn | m, n ≥ 0 } is context-free because only having clauses (i) and (ii) to work with doesn’t give us enough power to prove that the language is not pumpable. Pumping Lemma II, on the other hand, is powerful enough. as we’ll see in the next section.
Let L be the language { ambncmdn | m, n ≥ 0 }. Here’s a (detailed) proof that L is not context-free, using Pumping Lemma II.
Let K be the constant provided by the lemma, and let w = aKbKcKdK. Let u, v, x, y, and z be strings in Σ* with w = uvxyz, |vy| > 0, and |vxy| ≤ K. We will find an k ∈ N such that uvkxykz ∉ L; this will establish that L is not context-free.
The value for k depends on the structure of v and y. Since vxy has length ≤ K, it can only include one or two kinds of characters because to include three or four kinds of characters (in the form a+bKc+, for example) requires a string of length > K. Let p = |v| and q = |y| (at least one of these must be positive).
These are all the possible cases for v and y, so we know that no possible breakdown of w is pumpable and therefore L is not context-free.
It might help you understand the proof by thinking of it in programming terms. In pseudocode terms, the proof takes the description of a language L and an integer K, builds w, gets an arbitrary breakdown for w, and finds evidence for the unpumpability of w using that breakdown by returning an exponent for which pumping fails. Below, I'm using + for string concatenation and ^ for string exponentiation.
Unpumpable ShowUnpumpable(Language L, int K) { // K > 0
String w = a^K + b^K + c^K + d^K;
String u, v, x, y, z; // w = uvxyz, |vy| > 0, |vxy| <= K
L.decompose(w, u, v, x, y, z); // Let someone set u, v, x, y, z according to the comment above.
if (v and y are both of form a* or b* or c* or d*) {
return new Unpumpable(2, u, v, w, y, z, "Too many a's, b's, c's, or d's");
else if (v or y is of the form a⁺b⁺, b⁺c⁺, or c⁺d⁺)
return new Unpumpable(2, u, v, w, y, z, "Wrong format");
}
else {
throw new Error("This should never happen -- we missed a case!");
}
}
In pseudocode terms, on Exam 1, many people did if-then-elses that missed some (or many) cases.
if (y.equals("a"))
return new Unpumpable(0, x, y, z, "Not enough a's");
else
throw new Error("This should never happen"); // Really? What about y = "aa" or "aaa" or ...?