Probabilistic Inference – Trance-Scripts

Pedro Domingos’s The Master Algorithm has Caius wondering about induction and deduction, a distinction that has long puzzled him.

Domingos distinguishes between five main schools, “the five tribes of machine learning,” as he calls them, each having created its own algorithm for helping machines learn. “The main ones,” he writes, “are the symbolists, connectionists, evolutionaries, Bayesians, and analogizers” (51).

Caius notes down what he can gather of each approach:

Symbolists reduce intelligence to symbol manipulation. “They’ve figured out how to incorporate preexisting knowledge into learning,” explains Domingos, “and how to combine different pieces of knowledge on the fly in order to solve new problems. Their master algorithm is inverse deduction, which figures out what knowledge is missing in order to make a deduction go through, and then makes it as general as possible” (52).
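
A toy Python sketch may make inverse deduction more concrete. It is not Domingos's algorithm, only an illustration under loud assumptions: the `facts` and `observations` dictionaries, the function name `inverse_deduce`, and the restriction to single-premise rules are all invented here. Given what is known about each individual and a conclusion we want to be able to deduce about them, it proposes the missing rule and keeps only the rules general enough to cover every observation.

```python
def inverse_deduce(facts, observations):
    """Propose 'premise -> conclusion' rules that let every observation be deduced.

    facts: dict mapping an individual to the set of properties known about them.
    observations: dict mapping an individual to one property we want to deduce.
    Both structures are hypothetical, chosen only for this illustration.
    """
    surviving_rules = None
    for individual, conclusion in observations.items():
        # Any known property of this individual could be the missing premise.
        candidates = {(premise, conclusion) for premise in facts.get(individual, set())}
        # Generalize: keep only rules that also explain every earlier observation.
        if surviving_rules is None:
            surviving_rules = candidates
        else:
            surviving_rules &= candidates
    return surviving_rules or set()


facts = {
    "Socrates": {"human", "Greek"},
    "Plato": {"human", "Greek"},
    "Confucius": {"human", "Chinese"},
}
observations = {"Socrates": "mortal", "Plato": "mortal", "Confucius": "mortal"}

rules = inverse_deduce(facts, observations)
print(rules)  # {('human', 'mortal')}
```

The rule "Greek implies mortal" would make the first two deductions go through, but it fails for Confucius, so only the more general "human implies mortal" survives.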

Connectionists model intelligence by “reverse-engineering” the operations of the brain. And the brain, they say, is like a forest. Shifting from a symbolist to a connectionist mindset is like moving from a decision tree to a forest. “Each neuron is like a tiny tree, with a prodigious number of roots — the dendrites — and a slender, sinuous trunk — the axon,” writes Domingos. “The brain is a forest of billions of these trees,” he adds, and “Each tree’s branches make connections — synapses — to the roots of thousands of others” (95).

The brain learns, in their view, “by adjusting the strengths of connections between neurons,” says Domingos, “and the crucial problem is figuring out which connections are to blame for which errors and changing them accordingly” (52).

Always, among all of these tribes, the idea that brains and their worlds contain problems that need solving.

The connectionists’ master algorithm is therefore backpropagation, “which compares a system’s output with the desired one and then successively changes the connections in layer after layer of neurons so as to bring the output closer to what it should be” (52).
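
What that comparison and layer-by-layer correction look like can be sketched in a few lines of Python. This is a minimal illustration rather than anything taken from the book: the network size (two inputs, two hidden neurons, one output, no bias terms), the starting weights, the learning rate `lr`, and the single training example are all assumptions chosen to keep the arithmetic visible.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Run the input through one hidden layer and one output neuron."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def backprop_step(x, target, w_hidden, w_out, lr=0.5):
    """One gradient step on the squared error 0.5 * (output - target) ** 2."""
    hidden, output = forward(x, w_hidden, w_out)
    # Blame at the output neuron: error times the slope of the sigmoid.
    delta_out = (output - target) * output * (1.0 - output)
    # Blame passed back to each hidden neuron through its outgoing connection.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    # Adjust each connection in proportion to its share of the blame.
    new_w_out = [w - lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [
        [w - lr * d * xi for w, xi in zip(row, x)]
        for row, d in zip(w_hidden, delta_hidden)
    ]
    return new_w_hidden, new_w_out


x, target = [1.0, 0.5], 1.0
w_hidden = [[0.2, -0.4], [0.7, 0.1]]  # arbitrary starting weights (assumption)
w_out = [0.3, -0.6]

_, before = forward(x, w_hidden, w_out)
for _ in range(200):
    w_hidden, w_out = backprop_step(x, target, w_hidden, w_out)
_, after = forward(x, w_hidden, w_out)
print(f"output before: {before:.3f}, after: {after:.3f}, target: {target}")
```

The output layer is corrected first, and its error is then handed backward to the hidden layer, which is the "layer after layer" motion the quotation describes.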

“From Wood Wide Web to World Wide Web: the layers operate in parallel,” thinks Caius. “As above, so below.”

Evolutionaries, as their name suggests, draw from biology, modeling intelligence on the process of natural selection. “If it made us, it can make anything,” they argue, “and all we need to do is simulate it on the computer” (52).

This they do by way of their own master algorithm, genetic programming, “which mates and evolves computer programs in the same way that nature mates and evolves organisms” (52).
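
The mating and evolving can be sketched in Python, with one substitution named plainly: what follows is a simple genetic algorithm over bit strings, not genetic programming proper, which evolves whole program trees. The fitness function (count the 1 bits), the population size, the mutation rate, and the function name `evolve` are all assumptions made for the sake of a short example.

```python
import random


def evolve(length=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Evolve bit strings toward all ones; return the best string and a fitness log."""
    rng = random.Random(seed)
    fitness = sum  # fitness of a bit string = number of 1 bits
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # selection: the fitter half survives
        children = [population[0][:]]          # elitism: the best is copied unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)       # crossover point
            child = mother[:cut] + father[cut:]  # mating
            # Mutation: each bit flips with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print("best fitness at start:", history[0], "at end:", history[-1])
```

Because the best individual is always carried over unchanged, the logged best fitness can never go down from one generation to the next.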

Bayesians, meanwhile, “are concerned above all with uncertainty. All learned knowledge is uncertain, and learning itself is a form of uncertain inference. The problem then becomes how to deal with noisy, incomplete, and even contradictory information without falling apart. The solution is probabilistic inference, and the master algorithm is Bayes’ theorem and its derivatives. Bayes’ theorem tells us how to incorporate new evidence into our beliefs, and probabilistic inference algorithms do that as efficiently as possible” (52-53).
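
A small worked example shows how Bayes' theorem folds new evidence into a belief. The numbers below are hypothetical, picked only for illustration, and the second update assumes the two test results are independent given the patient's true condition.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) = P(evidence | hypothesis) * P(hypothesis) / P(evidence)."""
    numerator = p_evidence_if_true * prior
    evidence = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / evidence


# Hypothetical numbers: 1% of patients have the disease, the test catches
# 90% of true cases and raises a false alarm for 5% of healthy patients.
belief = 0.01
after_one_positive = bayes_update(belief, 0.90, 0.05)
# Yesterday's posterior becomes today's prior (assumes independent test results).
after_two_positives = bayes_update(after_one_positive, 0.90, 0.05)

print(round(after_one_positive, 3))   # 0.154
print(round(after_two_positives, 3))  # 0.766
```

One positive result moves the belief from 1% to about 15%, not to 90%, because false alarms among the many healthy patients outnumber true detections among the few sick ones.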

Analogizers equate intelligence with pattern recognition. For them, “the key to learning is recognizing similarities between situations and thereby inferring other similarities. If two patients have similar symptoms, perhaps they have the same disease. The key problem is judging how similar two things are. The analogizers’ master algorithm is the support vector machine, which figures out which experiences to remember and how to combine them to make new predictions” (53).

Reading Domingos’s recitation of the logic of the analogizers’ “weighted k-nearest-neighbor” algorithm — the algorithm commonly used in “recommender systems” — reminds Caius of the reasoning of Vizzini, the Wallace Shawn character in The Princess Bride.
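
For readers who want the logic spelled out, here is a minimal Python sketch of a distance-weighted k-nearest-neighbor vote. The inverse-distance weighting is one common choice and is an assumption here, as are the function name `weighted_knn` and the invented patient data.

```python
import math
from collections import defaultdict


def weighted_knn(query, examples, k=3):
    """Classify `query` by a distance-weighted vote among its k nearest examples."""
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = defaultdict(float)
    for features, label in nearest:
        # Closer neighbors get a louder vote; the small constant avoids division by zero.
        votes[label] += 1.0 / (math.dist(query, features) + 1e-9)
    return max(votes, key=votes.get)


# Hypothetical patients: (temperature in Celsius, days of coughing) -> diagnosis.
patients = [
    ((38.5, 7.0), "flu"),
    ((39.0, 8.0), "flu"),
    ((36.8, 2.0), "cold"),
    ((37.0, 3.0), "cold"),
    ((37.2, 2.5), "cold"),
]
print(weighted_knn((38.0, 6.0), patients))  # flu
```

The weighting matters when a single very close neighbor disagrees with two distant ones: the close neighbor wins the vote even though it is outnumbered.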

The first problem with nearest-neighbor, as Domingos notes, “is that most attributes are irrelevant.” “Nearest-neighbor is hopelessly confused by irrelevant attributes,” he explains, “because they all contribute to the similarity between examples. With enough irrelevant attributes, accidental similarity in the irrelevant dimensions swamps out meaningful similarity in the important ones, and nearest-neighbor becomes no better than random guessing” (186).

Reality is hyperspatial, hyperdimensional, numberless in its attributes — “and in high dimension,” notes Domingos, “the notion of similarity itself breaks down. Hyperspace is like the Twilight Zone. […]. When nearest-neighbor walks into this topsy-turvy world, it gets hopelessly confused. All examples look equally alike, and at the same time they’re too far from each other to make useful predictions” (187).
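
The confusion can be reproduced with a small experiment, sketched here in Python under stated assumptions: the label depends on a single "signal" attribute, every other attribute is uniform random noise, and the sample sizes, seed, and function name `one_nn_accuracy` are arbitrary choices for illustration.

```python
import math
import random


def one_nn_accuracy(noise_dims, n_train=200, n_test=200, seed=0):
    """Accuracy of 1-nearest-neighbor when only the first attribute matters."""
    rng = random.Random(seed)

    def make_example():
        signal = rng.random()                               # the one relevant attribute
        noise = [rng.random() for _ in range(noise_dims)]   # irrelevant attributes
        return [signal] + noise, signal > 0.5

    train = [make_example() for _ in range(n_train)]
    test = [make_example() for _ in range(n_test)]
    correct = 0
    for features, label in test:
        # Every attribute, relevant or not, contributes to the distance.
        nearest = min(train, key=lambda ex: math.dist(features, ex[0]))
        correct += nearest[1] == label
    return correct / n_test


for dims in (0, 10, 100):
    print(f"{dims:3d} irrelevant attributes -> accuracy {one_nn_accuracy(dims):.2f}")
```

With no irrelevant attributes the nearest neighbor is almost always right; as noise dimensions pile up, accuracy should drift toward the coin-flip level the quotation warns about.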

After the mid-1990s, attention in the analogizer community shifts from “nearest-neighbor” to “support vector machines,” an alternate similarity-based algorithm designed by Soviet frequentist Vladimir Vapnik.

“We can view what SVMs do with kernels, support vectors, and weights as mapping the data to a higher-dimensional space and finding a maximum-margin hyperplane in that space,” writes Domingos. “For some kernels, the derived space has infinite dimensions, but SVMs are completely unfazed by that. Hyperspace may be the Twilight Zone, but SVMs have figured out how to navigate it” (196).
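
The trick of working in a higher-dimensional space without ever visiting it can be checked by hand. The Python sketch below is not a support vector machine; it only illustrates the kernel identity that SVMs rely on, using a degree-2 polynomial kernel and sample points that are assumptions of this example.

```python
import math


def poly_kernel(x, y):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """The explicit 3-D space that poly_kernel works in implicitly."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


x, y = (1.0, 2.0), (3.0, 4.0)
print(poly_kernel(x, y))                    # 121.0
print(dot(feature_map(x), feature_map(y)))  # the same value, up to floating-point rounding

# XOR-style data: no straight line separates the labels in 2-D...
xor_points = {(1.0, 1.0): 1, (-1.0, -1.0): 1, (1.0, -1.0): -1, (-1.0, 1.0): -1}
# ...but in the mapped space the middle coordinate alone has the sign of the label.
for point, label in xor_points.items():
    print(point, label, feature_map(point)[1])
```

The kernel returns the dot product in the mapped space while only ever touching the original two coordinates, which is why the dimensionality of the derived space need not be a computational burden.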

Domingos’s book was published in 2015. These were the reigning schools of machine learning at the time. The book argues that these five approaches ought to be synthesized — combined into a single algorithm.

And Domingos knew that reinforcement learning would be part of that synthesis.

“The real problem in reinforcement learning,” he writes, inviting the reader to suppose themselves “moving along a tunnel, Indiana Jones-like,” “is when you don’t have a map of the territory. Then your only choice is to explore and discover what rewards are where. Sometimes you’ll discover a treasure, and other times you’ll fall into a snake pit. Every time you take an action, you note the immediate reward and the resulting state. That much could be done by supervised learning. But you also update the value of the state you just came from to bring it into line with the value you just observed, namely the reward you got plus the value of the new state you’re in. Of course, that value may not yet be the correct one, but if you wander around doing this for long enough, you’ll eventually settle on the right values for all the states and the corresponding actions. That’s reinforcement learning in a nutshell” (220-221).
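
The update described in that passage is what is usually called a temporal-difference update, and it fits in a short Python sketch. Everything specific here is an assumption made for illustration: a five-cell tunnel with a snake pit at one end and a treasure at the other, an explorer who steps left or right at random, no discounting, a step size `alpha`, and the function name `explore_tunnel`.

```python
import random


def explore_tunnel(episodes=5000, alpha=0.02, seed=0):
    """Learn state values for a 5-cell tunnel by wandering and updating."""
    rng = random.Random(seed)
    rewards = {0: -1.0, 4: 1.0}  # cell 0: snake pit, cell 4: treasure (both end the episode)
    values = [0.0] * 5           # no map of the territory: every value starts at zero
    for _ in range(episodes):
        state = 2                # start in the middle of the tunnel
        while state not in rewards:
            new_state = state + rng.choice((-1, 1))  # explore at random
            reward = rewards.get(new_state, 0.0)
            # The value just observed: the reward you got plus the value of the new state.
            observed = reward + values[new_state]
            # Bring the value of the state you just came from into line with it.
            values[state] += alpha * (observed - values[state])
            state = new_state
    return values


values = explore_tunnel()
print([round(v, 2) for v in values])
```

After enough wandering, the cell next to the snake pit should carry a negative value and the cell next to the treasure a positive one, even though the explorer was never told where either lay.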

Self-learning and attention-based approaches to machine learning arrive on the scene shortly thereafter. Vaswani et al. publish their paper, “Attention Is All You Need,” in 2017.

“Attention Chaud!” reads the to-go lid atop Caius’s coffee.

Domingos hails him with a question: “Are you a rationalist or an empiricist?” (57).

“Rationalists,” says the computer scientist, “believe that the senses deceive and that logical reasoning is the only sure path to knowledge,” whereas “Empiricists believe that all reasoning is fallible and that knowledge must come from observation and experimentation. […]. In computer science, theorists and knowledge engineers are rationalists; hackers and machine learners are empiricists” (57).

Yet Caius is neither a rationalist nor an empiricist. He readily admits each school’s critique of the other. Senses deceive AND reason is fallible. Reality unfolds not as a truth-finding mission but as a dialogue.

Caius agrees with Scottish Enlightenment philosopher David Hume’s critique of induction. As Hume argues, we can never be certain in our assumption that the future will be like the past. If we seek to induce the Not-Yet from the As-Is, then we do so on faith.

Yet inducing the Not-Yet from the As-Is is the game we play. We learn by observing, inducing, and revising continually, ad infinitum, under conditions of uncertainty. Under such conditions, learning is only ever a gamble, a wager made moment by moment, without guarantees. No matter how large our dataset, we ain’t seen nothing yet.

What matters, then, is the faith we exercise in our interaction with the unknown.

Most of today’s successes in machine learning emerge from the connectionists.

“Neural networks’ first big success was in predicting the stock market,” writes Domingos. “Because they could detect small nonlinearities in very noisy data, they beat the linear models then prevalent in finance and their use spread. A typical investment fund would train a separate network for each of a large number of stocks, let the networks pick the most promising ones, and then have human analysts decide which of those to invest in. A few funds, however, went all the way and let the learners themselves buy and sell. Exactly how all these fared is a closely guarded secret, but it’s probably not an accident that machine learners keep disappearing into hedge funds at an alarming rate” (112).

Nowhere in The Master Algorithm does Domingos interrogate his central metaphor of “mastery” and its relationship to conquest, domination, and control. The enemy is always painted in the book as “cancer.” Yet as any good “analogizer” would know, the Master Algorithm that perfectly targets “cancer” is also the Killer App used by the state against those it encodes as its enemies.

One wouldn’t know this, though, from the future as imagined by Domingos. What he imagines instead is a kind of game: a digital future where each of us is a learning machine. “Life is a game between you and the learners that surround you,” writes Domingos.

“You can refuse to play, but then you’ll have to live a twentieth-century life in the twenty-first. Or you can play to win. What model of you do you want the computer to have? And what data can you give it that will produce that model? Those two questions should always be in the back of your mind whenever you interact with a learning algorithm — as they are when you interact with other people” (264).

Master Algorithms