Computers can calculate almost anything in a near instant. They’re so good at it that we’ve outsourced most of this type of work to them, freeing us up to be more creative. The programmer defines what the program should accomplish, breaks it down into a series of IF THEN statements and FOR loops, and the logic gets encoded in the computer. FOR all the notes in a song, IF the next note is middle C, THEN vibrate the air at a frequency of 261.63 Hz. This is a time-consuming, expensive, and error-prone process, but once it is done the logic can be executed any number of times for basically no cost.
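As a rough illustration of this hand-written style, here is a minimal Python sketch. The function name `frequencies_for` and the four-note table are made up for this example; the frequencies are the standard equal-temperament values with A4 tuned to 440 Hz. A real player would need a rule for every note, instrument, and timing, which is where the cost comes from.

```python
# Hand-written table of note names to frequencies in Hz
# (equal temperament, A4 = 440 Hz). Every entry is a rule a human typed in.
NOTE_FREQUENCIES_HZ = {
    "C4": 261.63,  # middle C
    "D4": 293.66,
    "E4": 329.63,
    "G4": 392.00,
}


def frequencies_for(song):
    """Return the frequency to play for each note in the song, in order."""
    frequencies = []
    for note in song:                      # FOR all the notes in a song
        if note in NOTE_FREQUENCIES_HZ:    # IF we wrote a rule for this note
            frequencies.append(NOTE_FREQUENCIES_HZ[note])  # THEN use it
        else:
            # No human wrote this rule, so the program cannot continue.
            raise ValueError(f"no rule written for note {note!r}")
    return frequencies


print(frequencies_for(["C4", "E4", "G4"]))
```

Anything outside the table fails loudly, which is the "error-prone" part: the program only knows what somebody remembered to write down.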
So now we can click a button and have our computer play music to us. It performs the calculations that determine the right sequence of air vibrations necessary to stimulate our auditory cortex. Our brain, when stimulated in the right way, hallucinates a near-exact copy of the original recording. We feel like we’re actually there. But we still need a musician to write the song, a DJ to choose to play it next, and a programmer to write the millions of IF statements and FOR loops that handle playing it.
So what if the computer writes the IF statements for us? IF people who listen to song A also listen to song B, THEN play song B next. IF songs A and C are both in a minor key, THEN recommend them to the same set of users. Throw enough compute at the problem, and you generate enough IF statements to categorize songs automatically. IF song D passes logic gates 2, 4, 9, 19, 26, 102... THEN label it as ‘classic rock’. This is still a time-consuming, expensive, and error-prone process, but it’s several orders of magnitude more automated.
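To make the idea of machine-written IF statements concrete, here is a small Python sketch that derives "play this next" rules from listening histories by counting which songs share listeners. The function name `learn_next_song_rules` and the toy histories are invented for illustration; real recommender systems are far more elaborate, and this is only the simplest version of the co-listening idea.

```python
from collections import Counter
from itertools import permutations


def learn_next_song_rules(listening_histories):
    """Count how often two songs share a listener; keep the top partner per song."""
    co_listens = {}
    for history in listening_histories:
        # Every ordered pair of distinct songs in one person's history.
        for song, partner in permutations(set(history), 2):
            co_listens.setdefault(song, Counter())[partner] += 1
    # Each entry reads: IF the user listened to `song`, THEN play `partner` next.
    return {
        song: partners.most_common(1)[0][0]
        for song, partners in co_listens.items()
    }


# Toy data: each inner list is one (hypothetical) listener's history.
histories = [["A", "B"], ["A", "B", "C"], ["B", "C"], ["A", "B"]]
rules = learn_next_song_rules(histories)
for song in sorted(rules):
    print(f"IF listened to {song} THEN play {rules[song]} next")
```

Nobody typed the rules that come out of this; they fall out of the data, and they change automatically when the histories change.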
This is what Andrej Karpathy calls “Software 2.0”. If Software 1.0 was about human-engineered source code, Software 2.0 is about compiling the dataset, defining an appropriate skeleton structure and reward function for the neural network, then letting it train itself to complete the task. The work takes the form of curating, growing, massaging and cleaning labeled datasets, rather than engineering a solution yourself: a fundamental shift in the way the work is done. It means that, with enough data, we can ‘write’ algorithms for any task, even tasks that we have no explanation for, or that we lack the talent or conviction to program manually.
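A toy version of that workflow, as a hedged sketch: the human supplies labeled examples, a skeleton (here a single sigmoid neuron, the smallest possible "network"), and an objective (log loss), and gradient descent fills in the numbers. The two features, `guitar_distortion` and `synth_level`, and all the data points are hypothetical and chosen only to keep the example readable.

```python
import math


def train(examples, epochs=500, learning_rate=0.5):
    """Fit one sigmoid neuron by gradient descent on log loss."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            z = sum(w * x for w, x in zip(weights, features)) + bias
            prediction = 1.0 / (1.0 + math.exp(-z))
            error = prediction - label  # gradient of log loss w.r.t. z
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def is_classic_rock(model, features):
    """Apply the learned rule: True if the neuron's output is above 0.5."""
    weights, bias = model
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return z > 0.0


# Hypothetical labeled dataset: (guitar_distortion, synth_level) -> 1 if classic rock.
labeled_songs = [
    ((0.9, 0.1), 1), ((0.8, 0.2), 1), ((0.7, 0.1), 1),
    ((0.1, 0.9), 0), ((0.2, 0.8), 0), ((0.3, 0.7), 0),
]
model = train(labeled_songs)
print(is_classic_rock(model, (0.85, 0.15)))
print(is_classic_rock(model, (0.15, 0.85)))
```

Notice that improving this program means editing `labeled_songs`, not the code: the effort moves from writing logic to curating data.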
This is where the magic begins. To classify something is to understand its essence. So if you can classify something, you can actually just re-create it. Take a classification algorithm and reverse it, as has been done with CLIP and DALL-E, and you can generate new creative works. Returning to our music example, we can use the same neural network that classifies music to generate it. With enough examples of what humans like, we can search the latent space of all possible songs and algorithmically select the ones that are most listenable. For example, if you generate a new ‘classic rock’ song, it will contain all of the right classic rock ‘memes’ that our brains recognize, both subconsciously and consciously, because the network reverse-engineered these rules from our labels. When you’ve seen every classic rock song in existence, you have a better idea of what classic rock IS than even the humans who created it. These machine-generated songs aren’t yet as good as the best human-created ones, but AIs are improving exponentially: they’re terrifyingly quick learners.
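One very simple way to turn a classifier into a generator is to propose many random candidates and keep the one the classifier likes best. The Python sketch below does exactly that with plain random search. It is not how CLIP or DALL-E work internally; `classic_rock_score` is a hand-written stand-in for a trained classifier, and a "song" is reduced to two hypothetical numbers (distortion and synth level) purely for illustration.

```python
import random


def classic_rock_score(song):
    """Stand-in for a trained classifier: higher means 'more classic rock'."""
    distortion, synth = song
    return distortion - synth


def generate(score, candidates=1000, seed=0):
    """Sample random points in a toy 'latent space' and keep the best-scoring one."""
    rng = random.Random(seed)  # fixed seed so the search is repeatable
    best = None
    for _ in range(candidates):
        song = (rng.random(), rng.random())
        if best is None or score(song) > score(best):
            best = song
    return best


new_song = generate(classic_rock_score)
print(new_song, classic_rock_score(new_song))
```

The generator never sees a rule for what classic rock is; it only asks the classifier, over and over, which candidate sounds more like it.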