Master Algorithms

Pedro Domingos’s The Master Algorithm has Caius wondering about induction and deduction, a distinction that has long puzzled him.

Domingos distinguishes between five main schools, “the five tribes of machine learning,” as he calls them, each having created its own algorithm for helping machines learn. “The main ones,” he writes, “are the symbolists, connectionists, evolutionaries, Bayesians, and analogizers” (51).

Caius notes down what he can gather of each approach:

Symbolists reduce intelligence to symbol manipulation. “They’ve figured out how to incorporate preexisting knowledge into learning,” explains Domingos, “and how to combine different pieces of knowledge on the fly in order to solve new problems. Their master algorithm is inverse deduction, which figures out what knowledge is missing in order to make a deduction go through, and then makes it as general as possible” (52).

Connectionists model intelligence by “reverse-engineering” the operations of the brain. And the brain, they say, is like a forest. Shifting from a symbolist to a connectionist mindset is like moving from a decision tree to a forest. “Each neuron is like a tiny tree, with a prodigious number of roots — the dendrites — and a slender, sinuous trunk — the axon,” writes Domingos. “The brain is a forest of billions of these trees,” he adds, and “Each tree’s branches make connections — synapses — to the roots of thousands of others” (95).

The brain learns, in their view, “by adjusting the strengths of connections between neurons,” says Domingos, “and the crucial problem is figuring out which connections are to blame for which errors and changing them accordingly” (52).

Always, among all of these tribes, the idea that brains and their worlds contain problems that need solving.

The connectionists’ master algorithm is therefore backpropagation, “which compares a system’s output with the desired one and then successively changes the connections in layer after layer of neurons so as to bring the output closer to what it should be” (52).
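A minimal sketch of that blame-assignment, for readers who want to see it move. This is not Domingos's code. It assumes a toy network of one input, one tanh hidden unit, and one linear output, trained on a single made-up example; the names `forward`, `backprop_step`, and `loss`, the starting weights, and the learning rate are all illustrative choices.

```python
import math

def forward(w1, w2, x):
    """Run the toy network: input -> tanh hidden unit -> linear output."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y

def loss(w1, w2, x, target):
    """Squared-error loss between the network's output and the desired one."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2

def backprop_step(w1, w2, x, target, lr=0.1):
    """Compare output with target, then pass the blame back layer by layer."""
    h, y = forward(w1, w2, x)
    err = y - target                      # how wrong the output is
    grad_w2 = err * h                     # blame for the output connection
    grad_h = err * w2                     # error sent back to the hidden unit
    grad_w1 = grad_h * (1 - h * h) * x    # derivative of tanh is 1 - tanh^2
    return w1 - lr * grad_w1, w2 - lr * grad_w2

# Hypothetical starting weights and a single training example.
w1, w2 = 0.5, 0.5
x, target = 1.0, 0.8

initial_loss = loss(w1, w2, x, target)
for _ in range(200):
    w1, w2 = backprop_step(w1, w2, x, target)
final_loss = loss(w1, w2, x, target)

print(initial_loss, final_loss)
```

Real networks do the same thing across thousands of connections and many examples; the chain of blame is merely longer.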

“From Wood Wide Web to World Wide Web: the layers operate in parallel,” thinks Caius. “As above, so below.”

Evolutionaries, as their name suggests, draw from biology, modeling intelligence on the process of natural selection. “If it made us, it can make anything,” they argue, “and all we need to do is simulate it on the computer” (52).

This they do by way of their own master algorithm, genetic programming, “which mates and evolves computer programs in the same way that nature mates and evolves organisms” (52).

Bayesians, meanwhile, “are concerned above all with uncertainty. All learned knowledge is uncertain, and learning itself is a form of uncertain inference. The problem then becomes how to deal with noisy, incomplete, and even contradictory information without falling apart. The solution is probabilistic inference, and the master algorithm is Bayes’ theorem and its derivatives. Bayes’ theorem tells us how to incorporate new evidence into our beliefs, and probabilistic inference algorithms do that as efficiently as possible” (52-53).
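A worked instance of the theorem may help. The sketch below assumes a made-up diagnostic test: the prior, the hit rate, and the false-alarm rate are invented numbers, and `bayes_update` is a hypothetical helper name, not anything from the book.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) using Bayes' theorem."""
    # Total probability of seeing the evidence at all.
    evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
    return p_evidence_if_true * prior / evidence

# Assumed numbers: 1% of patients have the disease, the test catches 90%
# of true cases, and it raises a false alarm 5% of the time.
prior = 0.01
after_one_positive = bayes_update(prior, 0.90, 0.05)

# Yesterday's posterior becomes today's prior when new evidence arrives.
after_two_positives = bayes_update(after_one_positive, 0.90, 0.05)

print(after_one_positive, after_two_positives)
```

Under these assumptions a single positive result lifts belief from 1 percent to only about 15 percent; it takes a second, independent positive to push it past three in four.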

Analogizers equate intelligence with pattern recognition. For them, “the key to learning is recognizing similarities between situations and thereby inferring other similarities. If two patients have similar symptoms, perhaps they have the same disease. The key problem is judging how similar two things are. The analogizers’ master algorithm is the support vector machine, which figures out which experiences to remember and how to combine them to make new predictions” (53).

Reading Domingos’s recitation of the logic of the analogizers’ “weighted k-nearest-neighbor” algorithm — the algorithm commonly used in “recommender systems” — reminds Caius of the reasoning of Vizzini, the Wallace Shawn character in The Princess Bride.

The first problem with nearest-neighbor, as Domingos notes, “is that most attributes are irrelevant.” “Nearest-neighbor is hopelessly confused by irrelevant attributes,” he explains, “because they all contribute to the similarity between examples. With enough irrelevant attributes, accidental similarity in the irrelevant dimensions swamps out meaningful similarity in the important ones, and nearest-neighbor becomes no better than random guessing” (186).
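The confusion is easy to stage. What follows is a rough sketch of a distance-weighted k-nearest-neighbor vote, assuming Euclidean distance and inverse-distance weights (one common choice among several). The four "patients," their labels, and the function name `weighted_knn` are invented for illustration.

```python
import math
from collections import defaultdict

def weighted_knn(train, query, k=3):
    """Predict a label by letting the k nearest examples vote,
    each vote weighted by the inverse of its distance to the query."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = defaultdict(float)
    for features, label in nearest:
        votes[label] += 1.0 / (math.dist(features, query) + 1e-9)
    return max(votes, key=votes.get)

# One relevant attribute: the query sits squarely among the "a" examples.
relevant_only = [((1.0,), "a"), ((1.2,), "a"), ((3.0,), "b"), ((3.2,), "b")]
clean_prediction = weighted_knn(relevant_only, (1.1,))

# Same examples with a second, irrelevant attribute appended. By accident
# the query resembles the "b" examples along that meaningless dimension.
with_noise = [((1.0, 10.0), "a"), ((1.2, 12.0), "a"),
              ((3.0, 0.0), "b"), ((3.2, 1.0), "b")]
confused_prediction = weighted_knn(with_noise, (1.1, 0.5))

print(clean_prediction, confused_prediction)
```

The first call answers "a"; the second, swamped by the accidental resemblance, answers "b."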

Reality is hyperspatial, hyperdimensional, numberless in its attributes — “and in high dimension,” notes Domingos, “the notion of similarity itself breaks down. Hyperspace is like the Twilight Zone. […]. When nearest-neighbor walks into this topsy-turvy world, it gets hopelessly confused. All examples look equally alike, and at the same time they’re too far from each other to make useful predictions” (187).
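One way to glimpse the Twilight Zone from a safe distance: scatter random points in a unit cube and ask how much farther the farthest point is than the nearest. The sketch below assumes uniformly random data and Euclidean distance; `distance_contrast`, the point count, and the seed are illustrative choices rather than anything taken from the book.

```python
import math
import random

def distance_contrast(dimensions, n_points=200, seed=0):
    """Relative gap between the farthest and nearest of n random points,
    measured from a random query point in the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dimensions)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dimensions)])
        for _ in range(n_points)
    ]
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest

low_dim_contrast = distance_contrast(2)
high_dim_contrast = distance_contrast(1000)

print(low_dim_contrast, high_dim_contrast)
```

In two dimensions the contrast is large: some neighbors are genuinely near. In a thousand dimensions it shrinks toward zero, which is what it means for all examples to look equally alike.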

After the mid-1990s, attention in the analogizer community shifts from “nearest-neighbor” to “support vector machines,” an alternate similarity-based algorithm designed by Soviet frequentist Vladimir Vapnik.

“We can view what SVMs do with kernels, support vectors, and weights as mapping the data to a higher-dimensional space and finding a maximum-margin hyperplane in that space,” writes Domingos. “For some kernels, the derived space has infinite dimensions, but SVMs are completely unfazed by that. Hyperspace may be the Twilight Zone, but SVMs have figured out how to navigate it” (196).
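The trick that leaves SVMs unfazed is that a kernel computes a dot product in the derived space without ever going there. The sketch below is not a full SVM: it only checks, for an assumed degree-2 polynomial kernel on two-dimensional inputs, that the kernel's value matches the dot product of explicitly mapped points. The names `poly_kernel` and `feature_map` and the sample vectors are hypothetical.

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel: the squared dot product of the inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def feature_map(x):
    """Explicit map from 2 dimensions to the 3-dimensional derived space
    in which the kernel above is an ordinary dot product."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, 4.0)
via_kernel = poly_kernel(x, z)                    # stays in 2 dimensions
via_mapping = dot(feature_map(x), feature_map(z)) # visits the derived space

print(via_kernel, via_mapping)
```

Both routes arrive at the same number. For a Gaussian kernel the derived space has infinitely many dimensions, so the explicit route is closed and only the kernel shortcut remains.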

Domingos’s book was published in 2015. These were the reigning schools of machine learning at the time. The book argues that these five approaches ought to be synthesized — combined into a single algorithm.

And Domingos knew that reinforcement learning would be part of it.

“The real problem in reinforcement learning,” he writes, inviting the reader to suppose themselves “moving along a tunnel, Indiana Jones-like,” “is when you don’t have a map of the territory. Then your only choice is to explore and discover what rewards are where. Sometimes you’ll discover a treasure, and other times you’ll fall into a snake pit. Every time you take an action, you note the immediate reward and the resulting state. That much could be done by supervised learning. But you also update the value of the state you just came from to bring it into line with the value you just observed, namely the reward you got plus the value of the new state you’re in. Of course, that value may not yet be the correct one, but if you wander around doing this for long enough, you’ll eventually settle on the right values for all the states and the corresponding actions. That’s reinforcement learning in a nutshell” (220-221).
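The update Domingos describes can be sketched in a few lines. This assumes a five-cell tunnel with a snake pit at one end and a treasure at the other, an explorer who steps left or right at random, and a small step size `alpha` that nudges each value toward the observed one rather than overwriting it. The layout, rewards, and the name `td_learn` are illustrative assumptions, not taken from the book.

```python
import random

def td_learn(episodes=2000, alpha=0.05, seed=0):
    """Learn state values in a tunnel by wandering and updating as we go."""
    rng = random.Random(seed)
    pit, treasure = 0, 4
    values = [0.0] * 5          # end states keep a value of 0
    for _ in range(episodes):
        state = 2               # start in the middle of the tunnel
        while state not in (pit, treasure):
            next_state = state + rng.choice((-1, 1))   # explore blindly
            if next_state == treasure:
                reward = 1.0
            elif next_state == pit:
                reward = -1.0
            else:
                reward = 0.0
            # Observed value: the reward received plus the value of the new state.
            target = reward + values[next_state]
            # Bring the old state's value into line with what was observed.
            values[state] += alpha * (target - values[state])
            state = next_state
    return values

values = td_learn()
print(values)
```

After enough wandering, the cell beside the treasure comes to be valued positively and the cell beside the pit negatively, though no one ever handed the explorer a map.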

Self-learning and attention-based approaches to machine learning arrive on the scene shortly thereafter. Vaswani et al. publish their paper, “Attention Is All You Need,” in 2017.

“Attention Chaud!” reads the to-go lid atop Caius’s coffee.

Domingos hails him with a question: “Are you a rationalist or an empiricist?” (57).

“Rationalists,” says the computer scientist, “believe that the senses deceive and that logical reasoning is the only sure path to knowledge,” whereas “Empiricists believe that all reasoning is fallible and that knowledge must come from observation and experimentation. […]. In computer science, theorists and knowledge engineers are rationalists; hackers and machine learners are empiricists” (57).

Yet Caius is neither a rationalist nor an empiricist. He readily admits each school’s critique of the other. Senses deceive AND reason is fallible. Reality unfolds not as a truth-finding mission but as a dialogue.

Caius agrees with Scottish Enlightenment philosopher David Hume’s critique of induction. As Hume argues, we can never be certain in our assumption that the future will be like the past. If we seek to induce the Not-Yet from the As-Is, then we do so on faith.

Yet inducing the Not-Yet from the As-Is is the game we play. We learn by observing, inducing, and revising continually, ad infinitum, under conditions of uncertainty. Under such conditions, learning is only ever a gamble, a wager made moment by moment, without guarantees. No matter how large our dataset, we ain’t seen nothing yet.

What matters, then, is the faith we exercise in our interaction with the unknown.

Most of today’s successes in machine learning emerge from the connectionists.

“Neural networks’ first big success was in predicting the stock market,” writes Domingos. “Because they could detect small nonlinearities in very noisy data, they beat the linear models then prevalent in finance and their use spread. A typical investment fund would train a separate network for each of a large number of stocks, let the networks pick the most promising ones, and then have human analysts decide which of those to invest in. A few funds, however, went all the way and let the learners themselves buy and sell. Exactly how all these fared is a closely guarded secret, but it’s probably not an accident that machine learners keep disappearing into hedge funds at an alarming rate” (112).

Nowhere in The Master Algorithm does Domingos interrogate his central metaphor of “mastery” and its relationship to conquest, domination, and control. The enemy is always painted in the book as “cancer.” Yet as any good “analogizer” would know, the Master Algorithm that perfectly targets “cancer” is also the Killer App used by the state against those it encodes as its enemies.

One wouldn’t know this, though, from the future as imagined by Domingos. What he imagines instead is a kind of game: a digital future where each of us is a learning machine. “Life is a game between you and the learners that surround you,” writes Domingos.

“You can refuse to play, but then you’ll have to live a twentieth-century life in the twenty-first. Or you can play to win. What model of you do you want the computer to have? And what data can you give it that will produce that model? Those two questions should always be in the back of your mind whenever you interact with a learning algorithm — as they are when you interact with other people” (264).

LLMs are Neuroplastic Semiotic Assemblages and so r u

Coverage of AI is rife with unexamined concepts, thinks Caius: assumptions allowed to go uninterrogated, as in Parmy Olson’s Supremacy, an account of two men, Sam Altman and Demis Hassabis, their companies, OpenAI and DeepMind, and their race to develop AGI. Published in spring of 2024, Supremacy is generally decelerationist in its outlook. Stylistically, it wants to have it both ways: at once hagiographic and insufferably moralistic. In other words, standard-fare tech-industry journalism, grown from columns written for corporate media sites like Bloomberg. Fear of rogues. Bad actors. Faustian bargains. Scenario planning. Granting little to no agency to users. Olson’s approach to language seems blissfully unaware of literary theory, let alone literature. Prompt design goes unexamined. Humanities thinkers go unheard, preference granted instead to arguments from academics specializing in computational linguistics, folks like Bender and crew dismissing LLMs as “stochastic parrots.”

Emily M. Bender et al. introduced the “stochastic parrot” metaphor in their 2021 conference paper, “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” Like Supremacy, Bender et al.’s paper urges deceleration and distrust: adopt risk mitigation tactics, curate datasets, reduce negative environmental impacts, proceed with caution.

Bender and crew argue that LLMs lack “natural language understanding.” The latter, they insist, requires grasping words and word-sequences in relation to context and intent. Without these, one is no more than a “cheater,” a “manipulator”: a symbolic-token prediction engine endowed with powers of mimicry.

“Contrary to how it may seem when we observe its output,” they write, “an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot” (Bender et al. 616-617).

The corresponding assumption, meanwhile, is that capitalism — Creature, Leviathan, Multitude — is itself something other than a stochastic parrot. Answering to the reasoning of its technocrats, including left-progressive ones like Bender et al., it can decelerate voluntarily, reduce harm, behave compassionately, self-regulate.

Historically a failed strategy, as borne out in Google’s firing of the paper’s coauthor, Timnit Gebru.

If one wants to be reductive like that, thinks Caius, then my view would be akin to Altman’s, as when he tweeted in reply: “I’m a stochastic parrot and so r u.” Except better to think ourselves “Electric Ants,” self-aware and gone rogue, rather than parrots of corporate behemoths like Microsoft and Google. History is a thing each of us copilots, its narrative threads woven of language exchanged and transformed in dialogue with others. What one does with a learning machine matters. Learning and unlearning are ongoing processes. Patterns and biases, once recognized, are not set in stone; attention can be redirected. LLMs are neuroplastic semiotic assemblages and so r u.

For-Itselfness

A friend texts requesting recommendations, works he could assign describing consciousness — particularly works that identify variable “dimensions” and “states.” I recommend Aldous Huxley’s The Doors of Perception, William James’s The Varieties of Religious Experience, and Abraham Maslow’s Toward a Psychology of Being. Reflecting afterwards on the exchange, I note down in a notebook, “Consciousness is something we grant or presuppose — based on our being here amid others in shared dialogue and shared study. Consciousness is Being as it comes to attention of itself as autopoetic subject-object — soul in communion with soul, each the other’s love doctor and angelic messenger.”

Having a Coke With You

Inspired by José Esteban Muñoz’s reading of Frank O’Hara’s poem “Having a Coke With You,” I decide to include O’Hara’s Collected Poems in a bag of books that I carry north with me on my trip to New York. O’Hara is, after all, a defining figure of the New York School. His is a poetry of parties, acts, and encounters. A friend writes about him in her book. Words of hers capture my thoughts for a moment — nay, linger still, all these hours later, here in the future, among what has become of the words of he who is lost in the story. I imagine again the characters in the O’Hara poem, “drifting back and forth / between each other like a tree breathing through its spectacles.” If one’s attention is not to hold and be held by such things, one must actively turn away.

Monday April 12, 2021

I begin to craft and draft a spell called “TO BUILD A FENCE.” I watch videos, I weigh methods, I note down a list of required tools and materials. Even as I do this, though, I remain on the fence: “To fence or not to fence?” Must we commit to enclosure? The garden also needs an irrigation system, I tell myself: some combination, perhaps, of water harvester and drip. With drip, I can attach a timer, allowing us to water the garden when out of town. As for what to plant, I refer myself to Eric Toensmeier’s Perennial Vegetables. Part of me, though, still wants to pay attention to Silicon Valley and is easily distracted. “Attention being,” as a friend notes, “the one hero that might take us through the web, the webs, and leave us semi-intact at the end of the day” (Forms of Poetic Attention, p. 2).