Noisy numbers

We use computers for all of our computations. As computer hardware has become faster and more ubiquitous, we use it to produce ever more kinds of numerical output. But what if those numbers are wrong? Unfortunately, many computations produce inaccurate results. In two cases a computer, however faithfully it executes its code, amplifies the inaccuracy of the output numbers:

What if the input data is inaccurate? In that case, the more we compute with the numbers, the more inaccurate the outcome becomes. Say we measured daily maximum temperatures over the last century. No doubt these data contain both random errors and structural errors that we do not know about. Yet we store the data in a spreadsheet, where all those random digits look pristinely accurate. When we then use a statistical procedure to confirm a correlation, the output could have been wildly different had the measurement errors happened to fall differently. Many errors are heterogeneous, non-constant and non-independent, so even averaging the results of multiple measurements does not help.
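To make this concrete, here is a minimal sketch (with made-up numbers, not real measurements) of how equally plausible measurement noise can move a headline statistic:

```python
import random
import statistics

random.seed(0)  # reproducible illustration

# Hypothetical "true" yearly temperature maxima: a faint trend plus natural variation.
years = list(range(100))
true_temps = [15.0 + 0.005 * y + random.gauss(0, 1.0) for y in years]

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# The same analysis, repeated with different but equally plausible measurement noise.
for trial in range(3):
    measured = [t + random.gauss(0, 0.5) for t in true_temps]  # 0.5 degree instrument error
    print(f"trial {trial}: correlation(year, temperature) = {pearson_r(years, measured):+.3f}")
```

Each run of the loop represents the same physical history measured with slightly different instrument errors; the correlation we would have "confirmed" comes out different every time.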

Computers do not compute all that accurately. Although the numbers they print appear exact, most computations done by computer chips use floating-point numbers. A floating-point number has a fixed number of binary digits, which leads to (binary) rounding errors, and the weight of those digits shifts with the position of the "floating" point. As a result, floating-point arithmetic behaves nothing like high-school mathematics: unlike exact arithmetic, it is not always associative or distributive. Computations based on floating point can therefore accumulate errors that are not immediately obvious. Practically all computers implement the same standardized floating-point arithmetic, and so they all make the same mistakes.
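For example, in ordinary double-precision arithmetic (which is what Python floats are), merely regrouping an addition changes the answer:

```python
# Floating-point addition is not associative: grouping changes the result.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)   # 1.0
print(a + (b + c))   # 0.0 -- the 1.0 is absorbed by the huge intermediate value

# A textbook decimal surprise: 0.1 has no exact binary representation.
print(0.1 + 0.2 == 0.3)      # False
print(f"{0.1 + 0.2:.20f}")   # 0.30000000000000004441
```

Neither result is a bug; both are the standardized behaviour, which is exactly why every machine reproduces the same surprises.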

Experts in numerical methods know exactly how to write their code and constrain their inputs to minimize cumulative floating-point errors and the effects of inaccurate inputs. However, we now have a huge population of programmers who have neither the time nor the training to do the same. Computations on inaccurate data with inaccurate floating-point arithmetic fail us silently, because they keep producing numbers that look plausible. What if programming languages made all that inaccuracy explicit? That would at least make the problem visible. It would come at a cost, surely; perhaps the computation would be ten times as slow. But it would be very good to know how noisy the numbers are that we build our lives around.
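As a sketch of what "explicit inaccuracy" could look like, here is a hypothetical Noisy type (not an existing language feature or library) that drags a worst-case error bound along with every value:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Noisy:
    """A value that carries an explicit error bound through arithmetic.

    Only a sketch of the idea: worst-case (interval-style) error propagation,
    ignoring the extra rounding error introduced by each operation.
    """
    value: float
    err: float  # absolute uncertainty, assumed >= 0

    def __add__(self, other):
        return Noisy(self.value + other.value, self.err + other.err)

    def __sub__(self, other):
        return Noisy(self.value - other.value, self.err + other.err)

    def __mul__(self, other):
        # Worst-case bound: |a*b - (a±da)*(b±db)| <= |a|*db + |b|*da + da*db
        err = abs(self.value) * other.err + abs(other.value) * self.err + self.err * other.err
        return Noisy(self.value * other.value, err)

    def __repr__(self):
        return f"{self.value:.6g} ± {self.err:.3g}"

# Two temperature readings, each good to half a degree.
t1 = Noisy(15.3, 0.5)
t2 = Noisy(14.8, 0.5)
print(t1 - t2)         # 0.5 ± 1  -- the difference is pure noise
print((t1 - t2) * t2)  # the noise keeps growing with every operation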
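```

A real design would also have to account for the rounding error of each floating-point operation and for correlated errors, which make worst-case bounds overly pessimistic; the point here is only that the noise becomes visible in the result instead of hiding behind plausible-looking digits.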