We all have to make hard decisions from time to time. The hardest of my life was whether or not to change research fields after my PhD, from fundamental physics to climate physics. I had job offers that could have taken me in either direction – one to join Stephen Hawking’s Relativity and Gravitation Group at Cambridge University, another to join the Met Office as a scientific civil servant.
I wrote down the pros and cons of both options as one is supposed to do, but then couldn’t make up my mind at all. Like Buridan’s donkey, I was unable to move to either the bale of hay or the pail of water. It was a classic case of paralysis by analysis.
Since it was doing my head in, I decided to try to forget about the problem for a couple of weeks and get on with my life. In that intervening time, my unconscious brain decided for me. I simply walked into my office one day and the answer had somehow become obvious: I would make the change to studying the weather and climate.
More than four decades on, I’d make the same decision again. My fulfilling career has included developing a new, probabilistic way of forecasting weather and climate which is helping humanitarian and disaster relief agencies make better decisions ahead of extreme weather events. (This and many other aspects are described in my new book, The Primacy of Doubt.)
But I remain fascinated by what was going on in my head back then, which led my subconscious to make a life-changing decision that my conscious mind could not. Is there something to be understood here not only about how to make difficult decisions but about how humans make the leaps of imagination that characterise us as such a creative species? I believe the answer to both questions lies in a better understanding of the extraordinary power of noise.
I went from the pencil-and-paper mathematics of Einstein’s theory of general relativity to running complex climate models on some of the world’s biggest supercomputers. Yet big as they were, they were never big enough – the real climate system is, after all, very complex.
In the early days of my research, one only had to wait a couple of years and top-of-the-range supercomputers would get twice as powerful. This was the era where transistors were getting smaller and smaller, allowing more to be crammed on to each microchip. The consequent doubling of computer performance for the same power every couple of years was known as Moore’s Law.
There is, however, only so much miniaturisation you can do before the transistor starts becoming unreliable in its key role as an on-off switch. Today, with transistors starting to approach atomic size, we have pretty much reached the limit of Moore’s Law. To achieve more number-crunching capability, computer manufacturers must bolt together more and more computing cabinets, each one crammed full of chips.
But there’s a problem. Increasing number-crunching capability this way requires a lot more electric power – modern supercomputers the size of tennis courts consume tens of megawatts. I find it something of an embarrassment that we need so much energy to try to accurately predict the effects of climate change.
That’s why I became interested in how to construct a more accurate climate model without consuming more energy. And at the heart of this is an idea that sounds counterintuitive: by adding random numbers, or “noise”, to a climate model, we can actually make it more accurate in predicting the weather.
A constructive role
Noise is usually seen as a nuisance – something to be minimised wherever possible. In telecommunications, we speak about trying to maximise the “signal-to-noise ratio” by boosting the signal or reducing the background noise as much as possible. However, in nonlinear systems, noise can be your friend and actually contribute to boosting a signal. (A nonlinear system is one whose output does not vary in direct proportion to the input. You will likely be very happy to win £100 million on the lottery, but probably not twice as happy to win £200 million.)
Noise can, for example, help us find the maximum value of a complicated curve such as in Figure 1, below. There are many situations in the physical, biological and social sciences as well as in engineering where we might need to find such a maximum. In my field of meteorology, the process of finding the best initial conditions for a global weather forecast involves identifying the maximum point of a very complicated meteorological function.
However, employing a “deterministic algorithm” to locate the global maximum doesn’t usually work. This type of algorithm will typically get stuck at a local peak (for example at point a) because the curve moves downwards in both directions from there.
An answer is to use a technique called “simulated annealing” – so called because of its similarities with annealing, the heat treatment process that changes the properties of metals. Simulated annealing, which employs noise to get round the issue of getting stuck at local peaks, has been used to solve many problems including the classic travelling salesman puzzle of finding the shortest path between a large number of cities on a map.
Figure 1 shows a possible route to locating the curve’s global maximum (point 9) by using the following criteria:
- If a randomly chosen point is higher than the current position on the curve, the search always moves to the new point.
- If it is lower than the current position, the suggested point isn’t necessarily rejected. It depends whether the new point is a lot lower or just a little lower.
However, the decision to move to a new point also depends on how long the analysis has been running. Whereas in the early stages, random points quite a bit lower than the current position may be accepted, in later stages only those that are higher or just a tiny bit lower are accepted.
The technique is known as simulated annealing because early on – like hot metal in the early phase of cooling – the system is pliable and changeable. Later in the process – like cold metal in the late phase of cooling – it is almost rigid and unchangeable.
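The acceptance rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scheme used in any operational forecasting system: the toy curve `bumpy`, the exponential acceptance rule, the geometric cooling schedule and all parameter values are choices made here for illustration and do not come from Figure 1.

```python
import math
import random


def bumpy(x):
    """Hypothetical toy curve: a local peak near x=2 (height about 1)
    and the global peak near x=7 (height about 2)."""
    return math.exp(-(x - 2.0) ** 2) + 2.0 * math.exp(-((x - 7.0) ** 2) / 0.5)


def simulated_annealing(f, lo, hi, start, n_steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for the maximum of f on [lo, hi] using noisy proposals."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same path
    x, fx = start, f(start)
    best_x, best_fx = x, fx
    temperature = t0
    for _ in range(n_steps):
        candidate = rng.uniform(lo, hi)  # a randomly chosen point on the curve
        f_candidate = f(candidate)
        delta = f_candidate - fx
        # Higher points are always accepted. Lower points are accepted with a
        # probability that shrinks as the drop grows and as the system "cools".
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            x, fx = candidate, f_candidate
            if fx > best_fx:
                best_x, best_fx = x, fx
        # Geometric cooling (an assumed schedule), floored to avoid division by zero.
        temperature = max(temperature * cooling, 1e-12)
    return best_x, best_fx
```

Calling `simulated_annealing(bumpy, 0.0, 10.0, start=2.0)` starts the search on the local peak, where a purely uphill method would stay put, and should finish close to the higher peak near x = 7. Early on, while the temperature is high, sizeable drops are often accepted; by the end almost nothing lower than the current position gets through.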
Noise was introduced into comprehensive weather and climate models around 20 years ago. A key reason was to represent model uncertainty in our ensemble weather forecasts – but it turned out that adding noise also reduced some of the biases the models had, making them more accurate simulators of weather and climate.
Unfortunately, these models require huge supercomputers and a lot of energy to run them. They divide the world into small gridboxes, with the atmosphere and ocean within each assumed to be constant – which, of course, it isn’t. The horizontal scale of a typical gridbox is around 100km – so one way of making a model more accurate is to reduce this distance to 50km, or 10km or 1km. However, halving the side length of a gridbox increases the computational cost of running the model by up to a factor of 16, meaning it consumes a lot more energy.
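One way to see where a factor of 16 can come from is simple scaling arithmetic. The sketch below assumes that the side length of each gridbox is halved in all three dimensions, that the model's time step must shrink in proportion (a common numerical stability requirement), and that cost is proportional to the number of gridboxes multiplied by the number of time steps. The function name `cost_factor` and its parameters are hypothetical.

```python
def cost_factor(refinement, spatial_dims=3, shrink_timestep=True):
    """Rough multiplier on computing cost when every gridbox side is divided
    by `refinement`, assuming cost scales with gridboxes times time steps."""
    exponent = spatial_dims + (1 if shrink_timestep else 0)
    return refinement ** exponent


print(cost_factor(2))                  # halve all three sides and the time step
print(cost_factor(2, spatial_dims=2))  # refine only the two horizontal directions
```

Under these assumptions, halving in three dimensions plus time gives 2 × 2 × 2 × 2 = 16, while refining only the horizontal directions gives 8, which is consistent with the "up to" in the figure quoted above.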
Here again, noise offered an appealing alternative. The proposal was to use it to represent the unpredictable (and unmodellable) variations in small-scale climatic processes like turbulence, cloud systems, ocean eddies and so on. I argued that adding noise could be a way of boosting accuracy without having to incur the enormous computational cost of reducing the size of the gridboxes. For example, as has now been verified, adding noise to a climate model increases the likelihood of producing extreme hurricanes – reflecting the potential reality of a world whose weather is growing more extreme due to climate change.
The computer hardware we use for this modelling is inherently noisy – electrons travelling along wires in a computer move in partly random ways due to its warm environment. Such randomness is called “thermal noise”. Could we save even more energy by tapping into it, rather than having to use software to generate pseudo-random numbers? To me, low-energy “imprecise” supercomputers that are inherently noisy looked like a win-win proposal.
But not all of my colleagues were convinced. They were uncomfortable that computers might not give the same answers from one day to the next. To try to persuade them, I began to think about other real-world systems that, because of limited energy availability, also use noise that is generated within their hardware. And I stumbled on the human brain.
Noise in the brain
Every second of the waking day, our eyes alone send gigabytes of data to the brain. That’s not much different to the amount of data a climate model produces each time it outputs data to memory.
The brain has to process this data and somehow make sense of it. If it did this using the power of a supercomputer, that would be impressive enough. But it does it using one millionth of that power, about 20W instead of 20MW – what it takes to power a lightbulb. Such energy efficiency is mind-bogglingly impressive. How on Earth does the brain do it?
An adult brain contains some 80 billion neurons. Each neuron has a long slender biological cable – the axon – along which electrical impulses are transmitted from one set of neurons to the next. But these impulses, which collectively describe information in the brain, have to be boosted by protein “transistors” positioned at regular intervals along the axons. Without them, the signal would dissipate and be lost.
The energy for these boosts ultimately comes from an organic compound in the blood called ATP (adenosine triphosphate). This enables electrically charged atoms of sodium and potassium (ions) to be pushed through small channels in the neuron walls, creating electrical voltages which, much like those in silicon transistors, amplify the neuronal electric signals as they travel along the axons.
With 20W of power spread across tens of billions of neurons, the voltages involved are tiny, as are the axon cables. And there is evidence that axons with a diameter less than about 1 micron (which most in the brain are) are susceptible to noise. In other words, the brain is a noisy system.
If this noise simply created unhelpful “brain fog”, one might wonder why we evolved to have so many slender axons in our heads. Indeed, there are benefits to having fatter axons: the signals propagate along them faster. If we still needed fast reaction times to escape predators, then slender axons would be disadvantageous. However, developing communal ways of defending ourselves against enemies may have reduced the need for fast reaction times, leading to an evolutionary trend towards thinner axons.
Perhaps, serendipitously, evolutionary mutations that further increased neuron numbers and reduced axon sizes, keeping overall energy consumption the same, made the brain’s neurons more susceptible to noise. And there is mounting evidence that this had another remarkable effect: it encouraged in humans the ability to solve problems that required leaps in imagination and creativity.
Perhaps we only truly became Homo sapiens when significant noise began to appear in our brains?
Noise and genius
Many animals have developed creative approaches to solving problems, but there is nothing to compare with a Shakespeare, a Bach or an Einstein in the animal world.
How do creative geniuses come up with their ideas? Here’s a quote from Andrew Wiles, perhaps the most famous mathematician alive today, about the time leading up to his celebrated proof of the maths problem (misleadingly) known as Fermat’s Last Theorem:
When you reach a real impasse, then routine mathematical thinking is of no use to you. Leading up to that kind of new idea, there has to be a long period of tremendous focus on the problem without any distraction. You have to really think about nothing but that problem – just concentrate on it. And then you stop. [At this point] there seems to be a period of relaxation during which the subconscious appears to take over – and it’s during this time that some new insight comes.
This notion seems universal. Physics Nobel Laureate Roger Penrose has spoken about his “Eureka moment” when crossing a busy street with a colleague (perhaps reflecting on their conversation while also looking out for oncoming traffic). For the father of chaos theory Henri Poincaré, it was catching a bus.
And it’s not just creativity in mathematics and physics. Comedian John Cleese, of Monty Python fame, makes much the same point about artistic creativity – it occurs not when you are focusing hard on your trade, but when you relax and let your unconscious mind wander.
Of course, not all the ideas that bubble up from your subconscious are going to be Eureka moments. Physicist Michael Berry talks about these subconscious ideas as if they are elementary particles called “claritons”:
Actually, I do have a contribution to particle physics … the elementary particle of sudden understanding: the “clariton”. Any scientist will recognise the “aha!” moment when this particle is created. But there is a problem: all too frequently, today’s clariton is annihilated by tomorrow’s “anticlariton”. So many of our scribblings disappear beneath a rubble of anticlaritons.
Here is something we can all relate to: that in the cold light of day, most of our “brilliant” subconscious ideas get annihilated by logical thinking. Only a very, very, very small number of claritons remain after this process. But the ones that do are likely to be gems.
In his renowned book Thinking, Fast and Slow, the Nobel prize-winning psychologist Daniel Kahneman describes the brain in a binary way. Most of the time when walking, chatting and looking around (in other words when multitasking), it operates in a mode Kahneman calls “system 1” – a rather fast, automatic, effortless mode of operation.
By contrast, when we are thinking hard about a specific problem (unitasking), the brain is in the slower, more deliberative and logical “system 2”. To perform a calculation like 37 × 13, we have to stop walking, stop talking, close our eyes and even put our hands over our ears. No chance for significant multitasking in system 2.
My 2015 paper with computational neuroscientist Michael O’Shea interpreted system 1 as a mode where available energy is spread across a large number of active neurons, and system 2 as where energy is focused on a smaller number of active neurons. The amount of energy per active neuron is therefore much smaller when in the system 1 mode, and it would seem plausible that the brain is more susceptible to noise when in this state. That is, in situations when we are multitasking, the operation of any one of the neurons will be most susceptible to the effects of noise in the brain.
Berry’s picture of clariton-anticlariton interaction seems to suggest a model of the brain where the noisy system 1 and the deterministic system 2 act in synergy. The anticlariton is the logical analysis that we perform in system 2 which, most of the time, leads us to reject our crazy system 1 ideas.
But sometimes one of these ideas turns out to be not so crazy.
This is reminiscent of how our simulated annealing analysis (Figure 1) works. Initially, we might find many “crazy” ideas appealing. But as we get closer to locating the optimal solution, the criteria for accepting a new suggestion become more stringent and discerning. Now, system 2 anticlaritons are annihilating almost everything the system 1 claritons can throw at them – but not quite everything, as Wiles found to his great relief.
Key to creativity
If the key to creativity is the synergy between noisy and deterministic thinking, what are some consequences of this?
On the one hand, if you do not have the necessary background information then your analytic powers will be depleted. That’s why Wiles says that leading up to the moment of insight, you have to immerse yourself in your subject. You aren’t going to have brilliant ideas which will revolutionise quantum physics unless you have a pretty good grasp of quantum physics in the first place.
But you also need to leave yourself enough time each day to do nothing much at all, to relax and let your mind wander. I tell my research students that if they want to be successful in their careers, they shouldn’t spend every waking hour in front of their laptop or desktop. And swapping it for social media probably doesn’t help either, since you still aren’t really multitasking – each moment you are on social media, your attention is still fixed on a specific issue.
But going for a walk or bike ride or painting a shed probably does help. Personally, I find that driving a car is a useful activity for coming up with new ideas and thoughts – provided you don’t turn the radio on.
When making difficult decisions, this suggests that, having listed all the pros and cons, it can be helpful not to actively think about the problem for a while. I think this explains how, years ago, I finally made the decision to change my research direction – not that I knew it at the time.
Because the brain’s system 1 is so energy efficient, we use it to make the vast majority of the many decisions in our daily lives (some say as many as 35,000) – most of which aren’t that important, like whether to continue putting one leg in front of the other as we walk down to the shops. (I could alternatively stop after each step, survey my surroundings to make sure a predator was not going to jump out and attack me, and on that basis decide whether to take the next step.)
However, this system 1 thinking can sometimes lead us to make bad decisions, because we have simply defaulted to this low-energy mode and not engaged system 2 when we should have. How many times do we say to ourselves in hindsight: “Why didn’t I give such and such a decision more thought?”
Of course, if instead we engaged system 2 for every decision we had to make, then we wouldn’t have enough time or energy to do all the other important things we have to do in our daily lives (so the shops may have shut by the time we reach them).
From this point of view, we should not view giving wrong answers to unimportant questions as evidence of irrationality. Kahneman cites the fact that more than 50% of students at MIT, Harvard and Princeton gave the incorrect answer to this simple question – a bat and ball cost $1.10; the bat costs one dollar more than the ball; how much does the ball cost? – as evidence of our irrationality. The correct answer, if you think about it, is 5 cents. But system 1 screams out ten cents.
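For readers who want to see the system 2 step written out, the arithmetic can be checked in a couple of lines. Working in whole cents avoids rounding issues; the function name `ball_price_cents` is hypothetical and used only for this illustration.

```python
def ball_price_cents(total_cents, difference_cents):
    """Solve ball + (ball + difference) = total for the ball's price."""
    return (total_cents - difference_cents) // 2


ball = ball_price_cents(110, 100)
bat = ball + 100
print(ball, bat, ball + bat)  # ball price, bat price and their total, in cents

# The intuitive answer fails the same check: a 10-cent ball implies a
# 110-cent bat, and together they would cost 120 cents rather than 110.
intuitive_total = 10 + (10 + 100)
```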
If we were asked this question on pain of death, one would hope we would spend enough thought to come up with the correct answer. But if we were asked the question as part of an anonymous after-class test, when we had much more important things to spend time and energy doing, then I’d be inclined to think of it as irrational to give the right answer.
If we had 20MW to run the brain, we could spend part of it solving unimportant problems. But we only have 20W and we need to use it carefully. Perhaps it’s the 50% of MIT, Harvard and Princeton students who gave the wrong answer who are really the clever ones.
Just as a climate model with noise can produce types of weather that a model without noise can’t, so a brain with noise can produce ideas that a brain without noise can’t. And just as these types of weather can be exceptional hurricanes, so the idea could end up winning you a Nobel Prize.
So, if you want to increase your chances of achieving something extraordinary, I’d recommend going for that walk in the countryside, looking up at the clouds, listening to the birds cheeping, and thinking about what you might eat for dinner.
Will computers, one day, be as creative as Shakespeare, Bach or Einstein? Will they understand the world around us as we do? Stephen Hawking famously warned that AI will eventually take over and replace mankind.
However, the best-known advocate of the idea that computers will never understand as we do is Hawking’s old colleague, Roger Penrose. In making his claim, Penrose invokes an important “meta” theorem in mathematics known as Gödel’s theorem, which says there are mathematical truths that can’t be proven by deterministic algorithms.
There is a simple way of illustrating Gödel’s theorem. Suppose we make a list of all the most important mathematical theorems that have been proven since the time of the ancient Greeks. First on the list would be Euclid’s proof that there are an infinite number of prime numbers, which requires one really creative step (multiply the supposedly finite number of primes together and add one). Mathematicians would call this a “trick” – shorthand for a clever and succinct mathematical construction.
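Euclid's trick is easy to try out numerically. The sketch below takes any list of primes, multiplies them together, adds one, and finds the smallest prime factor of the result; the helper names are hypothetical. The point to notice is that the new number need not be prime itself, but its prime factors can never be on the original list, so no finite list can be complete.

```python
from math import prod


def smallest_prime_factor(n):
    """Return the smallest prime factor of n (n itself if n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def euclid_witness(primes):
    """Multiply the supposedly complete list of primes together, add one,
    and return that number with its smallest prime factor."""
    candidate = prod(primes) + 1
    return candidate, smallest_prime_factor(candidate)


print(euclid_witness([2, 3, 5]))              # (31, 31): 31 is a new prime
print(euclid_witness([2, 3, 5, 7, 11, 13]))   # (30031, 59): 30031 = 59 x 509
```

In the second case the product plus one is not prime, yet its factor 59 is still missing from the starting list, which is all the argument needs.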
But is this trick useful for proving important theorems further down the list, like Pythagoras’s proof that the square root of two cannot be expressed as the ratio of two whole numbers? It’s clearly not; we need another trick for that theorem. Indeed, as you go down the list, you’ll find that a new trick is typically needed to prove each new theorem. It seems there is no end to the number of tricks that mathematicians will need to prove their theorems. Simply loading a given set of tricks on a computer won’t necessarily make the computer creative.
Does this mean mathematicians can breathe easily, knowing their jobs are not going to be taken over by computers? Well, maybe not.
I have been arguing that we need computers to be noisy rather than entirely deterministic, “bit-reproducible” machines. And noise, especially if it comes from quantum mechanical processes, would break the assumptions of Gödel’s theorem: a noisy computer is not an algorithmic machine in the usual sense of the word.
Does this imply that a noisy computer can be creative? Alan Turing, pioneer of the general-purpose computing machine, believed this was possible, suggesting that “if a machine is expected to be infallible then it cannot also be intelligent”. That is to say, if we want the machine to be intelligent then it had better be capable of making mistakes.
Others may argue there is no evidence that simply adding noise will make an otherwise stupid machine into an intelligent one – and I agree, as it stands. Adding noise to a climate model doesn’t automatically make it an intelligent climate model.
However, the type of synergistic interplay between noise and determinism – the kind that sorts the wheat from the chaff of random ideas – has hardly yet been developed in computer codes. Perhaps we could develop a new type of AI model where the AI is trained by getting it to solve simple mathematical theorems using the clariton-anticlariton model; by making guesses and seeing if any of these have value.
For this to be at all tractable, the AI system would need to be trained to focus on “educated random guesses”. (If the machine’s guesses are all uneducated ones, it will take forever to make progress – like waiting for a group of monkeys to type the first few lines of Hamlet.)
For example, in the context of Euclid’s proof that there are an unlimited number of primes, could we train an AI system in such a way that a random idea like “multiply the assumed finite number of primes together and add one” becomes much more likely than the completely useless random idea “add the assumed finite number of primes together and subtract six”? And if a particular guess turns out to be especially helpful, can we train the AI system so that the next guess is a refinement of the last one?
If we can somehow find a way to do this, it could open up modelling to a completely new level that is relevant to all fields of study. And in so doing, we might yet reach the so-called “singularity” when machines take over from humans. But only when AI developers fully embrace the constructive role of noise – as it seems the brain did many thousands of years ago.
For now, I feel the need for another walk in the countryside. To blow away some fusty old cobwebs – and perhaps sow the seeds for some exciting new ones.
Tim Palmer is Royal Society Research Professor at the University of Oxford.
This article first appeared on The Conversation.