A.M. TURING AWARD WINNERS BY...

September 27, 1919, Strood, England

October 5, 1986, Teddington, England

Sir Joseph Williamson’s Mathematical School, Rochester, England; Trinity College Cambridge (First Class Honors, 1939)

British Government Ministry of Supply (1940–1946); National Physical Laboratory (1946–retirement in 1980); he also held several visiting positions at major American universities

DSc, Cambridge (1963); Fellow of the Royal Society (1969); SIAM John von Neumann Lecturer (1970); ACM Alan M. Turing Award (1970); American Mathematical Society Chauvenet Prize (1987). The J. H. Wilkinson Prize for Numerical Software is named in his honor.

United Kingdom – 1970

CITATION

For his research in numerical analysis to facilitiate the use of the high-speed digital computer, having received special recognition for his work in computations in linear algebra and "backward" error analysis.

**James (Jim) Hardy Wilkinson was a British mathematician who became the leading expert in a new, and important, field that emerged after World War II.** It goes under two names, *matrix computations* and (more pompously) *numerical linear algebra*.

Jim came from a humble family in the dairy business and was the third of five children. He showed a great interest in mathematical problems as a youngster and won a Foundation Scholarship to the Sir Joseph Williamson’s Mathematical School when he was eleven—a fortunate occurrence because the family business had fallen on hard times and they could not have paid for such an elite school.

Wilkinson graduated from Trinity College, Cambridge, the mecca of mathematics in Britain, in 1939 after a stellar undergraduate career. He was expected to become a pure analyst and continue with graduate work at Cambridge, but the outbreak of World War II put an end to that plan. He began working at the British Government Ministry of Supply doing research on topics such as thermodynamics of explosions, ballistics, and similar military interests. This work during the war changed his focus radically.

At the end of the war, rather than returning to Cambridge, he joined the National Physical Laboratory (NPL) staff where he worked with Alan Turing on problems associated with Turing’s proposal to build an electronic computer. When Turing left the NPL to go to Manchester University, Wilkinson (with others) took over the development of Turing’s computer—the Pilot ACE (operational in 1950).

This essay will attempt to explain, in as simple a way as possible (although some knowledge of mathematics might be helpful) the concerns of this field of matrix computations and why Wilkinson won the ACM A. W. Turing Award in 1970.

The field would not exist without the availability of fast digital computers. Although a few classical mathematicians, such as Carl Friedrich Gauss (1777-1855) and Karl G. J. Jacobi (1804-1851), had engaged in matrix computations by hand (a labor of Hercules) there was no shared field of study.

Let us consider these electronic computing machines for a moment.

People are awed at the prodigious speeds at which they execute primitive arithmetic operations such as addition and multiplication. Yet this speed is achieved at a price, almost every answer is wrong! Not many people are aware of this property. It is not due to difficulties in approximation. It is due to an iron law: all numbers used in a computer shall have a fixed number of digits. More precisely, the numbers we are talking about are usually floating point numbers that have two parts, a fraction and an exponent. It is the fractional parts with which we are concerned here. The exponents are also constrained and this can cause a condition of overflow or underflow (the numbers become too big or too small and cannot be represented in the machine) which is vexing but not our concern in this essay. To get a feel for the constraint on the fractional parts, consider them as restricted to 16 digits (this figure will vary depending on the computer in use). In Wilkinson's first machine the figure was 9 digits. That seems generous enough for practical purposes, even too generous, and for many calculations it is overkill. Nevertheless the product of two 16 digit fractions needs 32 digits to represent it and the computer always throws away the last (least significant) 16 of them. The error, more precisely the relative error, is miniscule. For calculations involving physical quantities, such as pressure or velocity, the given precision is usually adequate but in applications such as astronomy it may not be. The name for these miniscule errors is, appropriately, roundoff error, the computer “rounds off” the results of all arithmetic operations. The study of the effect of these errors is called roundoff error analysis.

A dramatic demonstration of the catastrophic effect of roundoff error can be obtained with a simple hand held calculator. Enter a positive integer between 2 and 9 (say 5). Press the square root key a large number of times (20 to 40 depending on the sophistication of the calculator) and then press the key for squaring an equal number of times. The original integer would be returned in exact arithmetic but the calculator will usually (depending on how many decimal digits it retains after each calculation) return either1or an approximation to 5 (e.g., 4.9998) instead. The differences are due to accumulating roundoff error. No subtractions are involved and each arithmetic error is tiny in a relative sense, but they accumulate if enough operations are performed.

Wilkinson pioneered, but did not invent, successful error analyses of all the matrix algorithms of his day. More on that topic later. Before leaving this description of round off error it should be stressed that in most of scientific and engineering computations roundoff error is only one, and a minor one at that, of the sources of error in the output. Other types of error are data uncertainty, approximation error, and discretation error when dealing with derivatives. However matrix computations are distinguished by the fact that for the so-called direct methods roundoff error is the only source of error.

Another striking feature of matrix computations is that it deals with objects that are not natural physical quantities such as pressure or temperature. Determinants and polynomials are wild. It was only gradually appreciated that the value of a polynomial of degree 1000 or more overflows in the computer unless the evaluation point is very close to a zero of the polynomial. Numbers with a large range of exponent occur very easily.

With that background to computers and to roundoff error let us return to the state of the pioneers in the use of these new devices. Long before the computer age scientists had approximated derivatives by difference quotients. After all a derivative is just the limit of a sequence of divided differences, (f(*x*+*h*) - f(*x*))/((*x*+*h*) – *x*), for small *h*. These details are not important but the great reward for making these approximations is that differential equations (difficult) turn into difference equations and these are disguised forms of algebraic equations. When the original differential equation is linear these difference equations turn into systems of linear algebraic equations (which are usually considered easy). In modern language a set of linear algebraic equations in unknowns called *x*_{1}, *x*_{2}, ... is called a matrix equation and written *Ax* = *b*, where the vector *b* holds the right hand sides, the vector *x* holds the unknowns and *A* is a matrix. For our purposes a matrix is a rectangular array of numbers. Readers who have not been exposed to matrices should just tell themselves that they are an important notational invention allowing a mass of detail to be compressed in a few symbols. The purpose of this paragraph is to hint at how a host of diverse scientific and engineering problems boil down to solving a few standard matrix problems. Matrix computations are almost never of interest for their own sake—they usually occur as intermediate steps in a bigger process.

It was immediately apparent to all involved in building automatic digital computers that it if these machines could solve systems of equations with, say, hundreds of unknowns then engineering models could be made much more realistic with huge benefits to technology. Direct methods for solving *Ax* = *b* had been refined for humans with handheld computing devices since the 1930s and it did not seem too much of a challenge to automate these methods. The only cloud on the horizon was the threat of all those roundoff errors. Might they undermine the hopes inspired by the new devices? Mathematicians of the greatest power (e.g., John von Neumann in the U.S.A. and Alan Turing in England), thought hard about these problems. Remember that no human being had ever solved 50 equations in 50 unknowns by a direct method. Here was unknown territory. A very senior statistician, Harold T. Hotelling, came up with a very pessimistic analysis; it seemed that the errors could accumulate at an exponential rate. Turing was pessimistic. In the U.S.A. von Neumann and Herman H. Goldstine came up with a seminal analysis and a clever, but complicated solution. In technical terms they proposed solving the normal equations instead of *Ax* = *b*. It is not essential to know what the normal equations are, but for those who have been exposed to matrix theory they are *A'Ax* = *A'b* where *A'* is the transpose of *A*.

By this time Wilkinson was the chief assistant to Alan Turing at the National Physical Laboratory which was charged with building one of Britain's earliest automatic general purpose digital computer, the Pilot ACE (Automatic Computing Engine), and of course they were giving much thought to the problems it should grapple with initially. In fact Wilkinson designed (and built) the multiplication unit for the Pilot ACE. And experimented by writing numerical programs (in an attempt to see what problems presented themselves) for the machine even before it was built.

Wilkinson had one great advantage over von Neumann and Turing. He had been obliged to solve 18 equations in 18 unknowns, with a hand cranked mechanical calculator, during WWII. He had seen how amazingly accurate the direct method was. That arduous exercise helped Wilkinson think in a different way from these other experts in accounting for those troublesome roundoff errors: stop obsessing over how large the error in the final output might be and ask instead how little could the data be changed so that the output is exactly correct? This is the fruitful question.

Another task where Wilkinson made major contributions was the *eigenvalue* problem. Here the matrix *A* is square (same number of rows as columns) and the task is to find those few (very special) values *e* so that *A* - *eI* is not invertible. Here *I* is the identity matrix whose special property is that *Iv* = *v* for all vectors *v*. Formally this problem is related to the every old problem of finding the zeros of a given polynomial. Wilkinson showed that it is most unwise to reduce the eigenvalue problem to the problem of finding the zeros of its characteristic polynomial despite the fact that this reduction is the natural approach of a mathematician. Once grasped the reason is simple; the zeros of a polynomial are, almost always, extremely sensitive to tiny changes in the coefficients of the polynomial. Indeed a big effect of the arrival of automatic computing devices on mathematics, and engineering, was to boost the importance of Perturbation Theory to an essential feature of the field of Matrix Computations. The invention of algorithms to solve the matrix eigenvalue problem is an interesting topic but a bit too technical for this essay. Wilkinson guided its development in its first three decades (1950 - 1980). A key idea is to change the given matrix, to make it closer to triangular, without changing the eigenvalues. That requirement severely constrains the changes that can be made to the given matrix.

The great rivals to direct methods for solving *Ax*=*b* are iterative methods. These methods, developed with great success by Richard Southwell during WWI, do not aim to get the exact answer, instead they aim to improve an approximate solution at each step. The process continues until the approximation *z* is deemed satisfactory because its residual *b* - *Az* is small enough for the user's needs. Often these needs were modest and 3 correct decimal digits would suffice. So the experts were inclined to put their money on these iterative methods. The serious flaw in iterative methods is that they were not applicable to the general case; they required special properties of the matrix *A*.

The 1950s were the time of exploration. Programming was an ordeal until compilers came along late in the decade. However Wilkinson was able to try various methods for various matrix problems on the Pilot ACE and learn very valuable lessons.

Without too much distortion we could terminate this essay by saying that Wilkinson won the Turing prize for showing that the fears of the experts were unfounded, for understanding precisely the role of roundoff error in matrix computations, and for showing a way to make it all look rather easy. His tool was the so-called backward error analysis. Instead of concentrating on the size of the error, find the problem which the computed solution solved exactly! For the *Ax* = *b* problem let the computer output be *z*. Look for a small matrix *E* and a small vector *e* such that (*A* + *E*)*z* = *b* + *e*. If the bounds on *E* and *e* are really tiny should we not be satisfied? After all the entries in *A *and *b* may well be uncertain. If the error is still large then the problem itself, i.e. the pair (*A*, *b*), must be close to being illposed.

In fact Wilkinson did a great deal more than has been indicated above. He appreciated that there were often subtle but important differences in the way that a method is implemented. For example the order in which the squares of the entries of a vector are added up (top down or bottom up) can make a difference to the computed norm of the vector.

Here is a very simple example of an issue which is of no importance whatsoever in exact arithmetic but can make a significant difference when a computer is used: for numbers *a* and *b*, *a*^{2} – *b*^{2} = (*a* - b)(*a* + *b*) in exact arithmetic but not in the computer. For example, if *a* and *b* are very close in value then, a computer performing the operation (*a* – *b*) might calculate the result as zero (because the computer can only store a limited number of digits for each number). This would result in the right hand side of the above equation being (0)(*a *+ *b*) which is always zero. On the other hand, first calculating *a*^{2} and then *b*^{2}, then doing the subtraction, might well result in a non-zero value. Which form should one use when programming?

By the time he retired, Jim had been given the rank of Special Merit Chief Scientific Officer within the British Civil Service, a category which is very rarely used and allows one to continue with their research without having to worry about other duties.

In addition to his technical expertise Wilkinson was an excellent expositor with an informal style.

In 1945 Jim married Heather Nora Ware (also a mathematician) and they had a son and a daughter.

Author: Beresford Neill Parlett