In July 2022, this column pointed out that artificial intelligence (AI) had done at least two pieces of research for which it deserved Nobel prizes. One was working out how to efficiently manage magnetic fields that enable controlled nuclear fusion. The other involved understanding the mechanics of protein folding, and making good guesses about the biochemical impact of such protein folding.
The 2024 Nobel Prize for Chemistry has just been awarded for the latter. Two computer scientists, Demis Hassabis and John Jumper, shared half the Nobel “for protein structure prediction”. They conceptualised the AlphaFold algorithm, which worked out protein folding for Google subsidiary DeepMind. Chemistry professor David Baker, who uses computerised methods to create new proteins, was awarded the other half “for computational protein design”. DeepMind’s Alpha family of algorithms was also responsible for the research into “Magnetic control of tokamak plasmas through deep reinforcement learning”.
Alpha is a self-learning algorithm. It initially became famous around 2017 for playing incredibly strong chess and Go. In both games, it went far beyond the limits of human understanding. It used the same self-learning capabilities to work through protein folding and handle magnetic fields.
Many machine learning algorithms start with humans “teaching” them what is already known, and then letting the algo loose on existing data where it may find new patterns. Before describing how Alpha’s methods differ, we need to understand a key distinction between rules and heuristics.
Rules are hard and fast. Acids and alkalis neutralise each other; chess rooks move in straight lines. Heuristics are rules of thumb. They are not hard and fast. Rooks must move in straight lines, but a heuristic is an understanding of where a rook should be placed (“behind passed pawns”) to be effective. Or, when doing integration by parts, mathematicians often use the LIATE heuristic (Logarithmic, Inverse trigonometric, Algebraic, Trigonometric, Exponential) to decide which part of the function to differentiate and which to integrate, although the integration can also be done without LIATE.
Heuristics don’t always work. In assisted machine learning, rules and heuristics may both be taught to the algo. After learning the rules, the algorithm may work through databases of existing chess games, where the heuristics of good chess play are evident, or through known protein folds.
Instead, Alpha “learns” only the hard and fast rules of how pieces and counters move in Go and chess, or how proteins fold, or how plasma interacts with magnetic fields. Then the algo starts playing on its own to create new databases. The algo works out millions of examples on its own, discovering what heuristics work, just by using the rules. Ideally, the algo rediscovers all the heuristics humans know, and goes beyond.
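To make the idea concrete, here is a toy sketch of learning from rules alone. It is an illustration under stated assumptions, not DeepMind's method: AlphaZero combines neural networks with Monte Carlo tree search, whereas this uses a simple lookup table and random self-play. The game (a pile of 21 stones, each player removes one to three, whoever takes the last stone wins) and all names such as `self_play` and `best_move` are hypothetical choices made for this example. The program is told only the rules, yet it ends up rediscovering the heuristic human players know: always leave your opponent a multiple of four.

```python
import random

MAX_TAKE = 3      # rule: a player removes 1, 2 or 3 stones per turn
START_PILE = 21   # rule: whoever takes the last stone wins


def legal_moves(pile):
    """All moves the rules allow from this pile size."""
    return range(1, min(MAX_TAKE, pile) + 1)


def self_play(episodes=5000, seed=0):
    """Play random games against itself, recording what each move leads to.

    q[(pile, take)] is +1 if the move wins for the player making it,
    -1 if it loses, and 0 while the outcome is still unknown.
    """
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        pile = START_PILE
        while pile > 0:
            take = rng.choice(list(legal_moves(pile)))
            rest = pile - take
            if rest == 0:
                q[(pile, take)] = 1  # took the last stone: a win
            else:
                # The opponent moves next, so their best result is our worst.
                q[(pile, take)] = -max(q.get((rest, t), 0) for t in legal_moves(rest))
            pile = rest
    return q


def best_move(q, pile):
    """The move the learned table currently rates highest."""
    return max(legal_moves(pile), key=lambda t: q.get((pile, t), 0))


if __name__ == "__main__":
    table = self_play()
    for pile in range(1, 10):
        print(pile, "stones -> take", best_move(table, pile))
```

For any pile that is not a multiple of four, the learned move is the remainder after dividing by four, which is exactly the human rule of thumb. Nobody taught it that; it fell out of the rules and a few thousand games.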
Even more than chess and Go, protein folding is an insanely big computational problem. There are over 200 million known proteins, each built from a set of 20 different amino acids. Proteins may consist of chains of 300-odd amino acids or more, strung together in various ways. Any protein can be folded multiple ways. Its biochemical action depends on how it is folded, as chemistry varies with physical proximity. If proteins are mis-folded, congenital diseases can occur, or drugs may be ineffective.
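A back-of-the-envelope calculation shows the scale. The sketch below takes the 20 amino acids and the 300-link chain from the figures above. The three possible orientations per link is purely an illustrative assumption, not a measured value, and the variable names are made up for this example.

```python
AMINO_ACID_TYPES = 20   # from the text: proteins draw on 20 amino acids
CHAIN_LENGTH = 300      # from the text: chains of 300-odd amino acids
SHAPES_PER_LINK = 3     # assumption for illustration: 3 orientations per link


def count_digits(n):
    """Number of decimal digits in a positive whole number."""
    return len(str(n))


# Every position in the chain can hold any of the 20 amino acids.
possible_sequences = AMINO_ACID_TYPES ** CHAIN_LENGTH

# Every link in one given chain can independently take any orientation.
possible_folds = SHAPES_PER_LINK ** CHAIN_LENGTH

print("digits in the number of possible sequences:", count_digits(possible_sequences))
print("digits in the number of possible folds:", count_digits(possible_folds))
```

The number of possible sequences runs to 391 digits, and even under the modest three-orientation assumption, the number of ways to fold a single chain runs to 144 digits. Checking them one by one is out of the question, which is why a good guesser matters.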
Hence, understanding how proteins are likely to be folded is critically important. This is guesswork, driven by combinatorial maths and experimental observations. Researchers doing experiments and using cryo-electron microscopy often take years to determine the structure of a single protein.
In 2018, AlphaFold started working on protein folding. It soon established itself as the best “guesser” when it came to protein structure prediction. In 2021, AlphaFold2 was open-sourced. The source code was released with a paper that outlined the detailed methodology of guessing protein folding. DeepMind also released a compendium of the predicted structures of 200 million proteins. Over 2 million researchers have since used AlphaFold2 and its database. Meanwhile, Professor Baker has been a pioneer in splicing together amino acids in different sequences to create entirely new proteins.
Proteins are chemical building blocks that create muscles, horns, feathers, hormones, and antibodies. Many proteins form enzymes that drive chemical reactions within living organisms, and they are also crucial for communication between cells and their surroundings. Thanks to Professor Baker and to AlphaFold, research in these areas has been dramatically accelerated. A philosophical question remains: If Alpha is self-learning, should it have become the first non-human recipient of a Nobel?