2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, …

Euclid’s second theorem (Elements IX.20) shows that there are infinitely many prime numbers. It is a really beautiful theorem, even after 2,300 years.

In Euclid’s time, various assumptions about numbers were somewhat different from ours; for example, numbers were often thought of in terms of relative measurements of line segments (Euclid was really into geometry), and infinity was not taken for granted. But translated into modern terms, Euclid’s theorem runs essentially like this:

Suppose you have a finite list of prime numbers (which you perhaps think to be complete, though if so you will turn out to have been wrong). Call the numbers p₁, p₂, p₃, … pₙ, and call the list of them L. Now, take the product of these prime numbers, P (P = p₁⋅p₂⋅p₃⋯pₙ), and add 1, calling the sum Q (Q = P + 1).

Find a prime factor of Q. If there is none apart from Q itself, then Q is a prime number which was not in the list L (so L was incomplete).

If there is a prime factor of Q, call it p: either p is in L or it is not. If it is not, then L was incomplete.

In fact, p cannot be in L. If p were in L, then p would divide P (since P was the product of all the numbers in L in the first place), and if p divided both P and Q, it would have to divide their difference, which is 1. But nothing greater than 1 divides 1, so a prime factor of Q cannot be in L.

Thus, no finite list of primes is complete.
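The whole argument can be run mechanically. Here is a small Python sketch (the function names are mine, not anything standard): given a finite list of primes, it builds Q and digs out a prime factor which, just as the proof guarantees, is missing from the list.

```python
from math import prod

def smallest_prime_factor(n):
    """Return the smallest prime factor of n (for n > 1)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d  # smallest divisor > 1 is always prime
        d += 1
    return n  # n itself is prime

def prime_not_in(L):
    """Euclid's construction: a prime missing from the finite list L."""
    Q = prod(L) + 1  # Q = P + 1
    p = smallest_prime_factor(Q)
    assert p not in L  # p divides Q, so it cannot divide P
    return p

print(prime_not_in([2, 3, 5, 7]))  # 2*3*5*7 + 1 = 211, which is prime
```

Note that Q itself need not be prime: for the list [2, 3, 5, 7, 11, 13], Q = 30,031 = 59 ⋅ 509, and the new prime the construction yields is 59.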

One’s not being prime is a bit ironic given the sense of prime to mean ‘first,’ though that sense seems to be historically derivative from a prior meaning of ‘before’ or ‘in front of’ (at least if you can trust Indo-European etymology). How do we know one isn’t a prime? Basically, the answer is that the Fundamental Theorem of Arithmetic does cool things if one is excluded.

The Fundamental Theorem of Arithmetic states that every positive integer other than one can be represented in exactly one way (apart from the order of the factors) as a product of one or more primes. For example, 4 is factorizable as two squared, 6 is two times three, 8 is two to the third, 9 is three squared, 10 is two times five, and so on. This is a corollary of Euclid’s first theorem, which states that if p is a prime and p divides the product of α and β, then p divides α or p divides β.
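The theorem can be made concrete with a bit of Python — a naive trial-division sketch, not a serious factoring algorithm, but enough to recover the factorizations just listed:

```python
def factorize(n):
    """Return the prime factorization of n > 1, smallest factors first."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:  # divide out each prime as often as it occurs
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # leftover factor is itself prime
    return factors

print(factorize(8))   # [2, 2, 2]  — two to the third
print(factorize(10))  # [2, 5]     — two times five
```

The Fundamental Theorem guarantees that, up to reordering, this is the only list of primes the function could possibly return for a given input.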

If one were a prime, then every number would have infinitely many factorizations, since you can always multiply by one without changing anything: 6 = 2⋅3 = 1⋅2⋅3 = 1⋅1⋅2⋅3, and so on.

The uniqueness of the factorizations means that n-tuples of numbers can be represented as integers, with the n-tuples being uniquely recoverable from the integers.

Suppose you want to organize a data set consisting of different grammars. To keep the example simple, say the data set allows three parametric values for each functional head, and there is a strict order of functional heads, but languages vary in how many of those functional heads they have. Converting the parametric values to integers, and letting order represent position in the sequence of functional heads, the grammars can be described as Language A being 〈3,2,1,1〉, Language B being 〈1,3,2,1,2〉, and Language C being 〈2,2,1,1,2,1,1〉. Each language can now be converted into a unique integer by taking the values of the parameters as exponents of successive primes: A = 2³⋅3²⋅5¹⋅7¹ = 2,520; B = 2¹⋅3³⋅5²⋅7¹⋅11² = 1,143,450; and C = 2²⋅3²⋅5¹⋅7¹⋅11²⋅13¹⋅17¹ = 33,693,660.

Factoring the products will restore the original sequences perfectly. In this way we can always be sure that a list of n-tuples (in this case, the total list of possible grammars) is enumerable, which is a wonderful result if you are worried that it might not be.
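The encoding and its inverse are easy to sketch in Python. The names encode and decode are mine, and the scheme assumes — as in the grammar example — that every value in the tuple is at least 1, so no prime ends up with a zero exponent:

```python
from itertools import count

def primes():
    """Generate 2, 3, 5, 7, … by trial division (fine at this scale)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n

def encode(values):
    """Map an n-tuple of positive integers to one integer via prime exponents."""
    result = 1
    for p, v in zip(primes(), values):
        result *= p ** v
    return result

def decode(n):
    """Recover the tuple by counting how often each successive prime divides n."""
    values = []
    for p in primes():
        if n == 1:
            break
        v = 0
        while n % p == 0:
            n //= p
            v += 1
        values.append(v)
    return values

print(encode((3, 2, 1, 1)))  # 2520 — Language A
print(decode(1143450))       # [1, 3, 2, 1, 2] — Language B
```

Round-tripping Language C the same way, decode(33,693,660) gives back 〈2,2,1,1,2,1,1〉, which is the uniqueness of prime factorization doing exactly the work the paragraph above claims for it.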