All computers are built on logic. 1s and 0s represent TRUEs and FALSEs, which are combined by logic gates: ANDs, ORs, NOTs. There’s also XOR, NAND, NOR, and XNOR. Combine enough of these together and you can represent any logic, which is to say, any knowledge. Layering on abstractions we get to numbers and letters, then whole words. An image may be worth a thousand words, but all it really is is a grid of individual pixels, each storing three numbers from 0 to 255: one for red, one for green, and one for blue.
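To make the "combine enough gates and you get numbers" step concrete, here is a minimal sketch that builds the other gates out of NAND alone, then chains them into an adder for the same 0 to 255 values a pixel channel holds. The function names (`nand`, `full_adder`, `add_bytes`) are invented for illustration, and real hardware does this in silicon rather than in Python.

```python
def nand(a: bool, b: bool) -> bool:
    """The one primitive gate; everything below is built from it."""
    return not (a and b)

def not_(a: bool) -> bool:
    return nand(a, a)

def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))

def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))

def xor(a: bool, b: bool) -> bool:
    return and_(or_(a, b), nand(a, b))

def full_adder(a: bool, b: bool, carry_in: bool) -> tuple[bool, bool]:
    """Add three bits; return (sum bit, carry-out bit)."""
    partial = xor(a, b)
    total = xor(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return total, carry_out

def add_bytes(x: int, y: int) -> int:
    """Add two 0-255 numbers one bit at a time, wrapping around at 8 bits.

    The shifts only unpack and repack bits; the arithmetic itself
    is done entirely by the gates above.
    """
    result = 0
    carry = False
    for i in range(8):
        bit_x = bool((x >> i) & 1)
        bit_y = bool((y >> i) & 1)
        total, carry = full_adder(bit_x, bit_y, carry)
        if total:
            result |= 1 << i
    return result

print(add_bytes(2, 3))
```

Nothing in `add_bytes` "knows" what a number is. It only shuffles TRUEs and FALSEs through gates, and addition falls out of the wiring.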
Laddering up from there we can take a 2D image and make it look 3D, using tricks like implied depth, shade-mixing, and room lighting. Move the pixels along in concert and we can trick our brain into thinking an image is moving. Vibrate the air a little in the right cadence, and you have accompanying music and dialogue. Computer programs are dumb; just pure logic. They don’t “know” the meaning of what they’re doing. Yet that doesn’t stop you feeling something when trillions of 1s and 0s combine to play that song you danced to at your wedding. Videos of that special day are just trillions more 1s and 0s, but that doesn’t stop you from reminiscing and suspending disbelief for a moment.
String together enough IF THEN statements and magic happens. IF a user searches “B-e-a-t-l-e-s”, THEN display that artist. IF a user clicks “Here comes the sun”, THEN find the corresponding audio file in storage and start to stream it. IF receiving a stream of 1s and 0s, THEN decode the patterns into sound waves and make the right vibrations at the right frequencies at the right time. Do it in such a way that the auditory cortex is stimulated in the right areas, so the listener hallucinates an almost exact replica of how the Beatles would have done it live. Of course, this took an enormous amount of work by many programmers to decide what IF THEN statements to write, and how to write them.
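Written out as code, those hand-made rules might look something like the sketch below. Everything in it is hypothetical: the `CATALOG` contents, the file path, the event format, and the `handle` function are stand-ins to show the shape of the idea, not how any real streaming service works.

```python
# Hypothetical catalog: song title -> artist and where the audio lives.
CATALOG = {
    "here comes the sun": {
        "artist": "The Beatles",
        "file": "audio/here_comes_the_sun.ogg",  # made-up path
    },
}

def handle(event: dict) -> str:
    """Apply hand-written IF THEN rules to a single user action."""
    # IF the user searches for something, THEN display a matching artist.
    if event["type"] == "search":
        query = event["text"].strip().lower()
        artists = sorted(
            {song["artist"] for song in CATALOG.values()
             if query in song["artist"].lower()}
        )
        if artists:
            return "show artist: " + artists[0]
        return "no results"

    # IF the user clicks a song, THEN find its file and start streaming.
    if event["type"] == "click":
        song = CATALOG.get(event["title"].lower())
        if song:
            return "stream " + song["file"]
        return "song not found"

    # No rule matches, so the program does nothing at all.
    return "ignored"

print(handle({"type": "search", "text": "Beatles"}))
print(handle({"type": "click", "title": "Here Comes the Sun"}))
```

Every branch here had to be thought of and typed in by a person, and anything nobody anticipated simply falls through to "ignored".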
What if we let the computer decide what IF THENs to write, based on patterns it saw in the data? IF song matches “pop classics”, THEN play more from that playlist. IF song often appears together with the rest in this playlist, THEN recommend playing it next. IF you find another correlation in the data, THEN use it to better predict what to play next. IF your predictions keep people listening, THEN keep that rule and throw out those that don’t. Repeat that cycle enough times, and robo DJs start to outperform human ones. Better prediction means longer listening times, which means more data, for making better predictions… eventually the advantage is insurmountable.
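A minimal sketch of the "appears together in a playlist" rule shows how little the programmer has to decide in advance. Here the program counts which songs share playlists and recommends the most frequent companion. The playlists are invented sample data, `learn_rules` and `recommend` are hypothetical names, and a real recommender would use far more signals than co-occurrence counts.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Invented sample data: each inner list is one listener's playlist.
PLAYLISTS = [
    ["Here Comes the Sun", "Let It Be", "Yesterday"],
    ["Here Comes the Sun", "Let It Be"],
    ["Yesterday", "Imagine"],
]

def learn_rules(playlists):
    """Count, for every song, how often each other song shares a playlist with it."""
    together = defaultdict(Counter)
    for playlist in playlists:
        # set() so a song repeated within one playlist is only counted once.
        for song, companion in permutations(set(playlist), 2):
            together[song][companion] += 1
    return together

def recommend(together, current_song):
    """Return the most frequent companion of current_song, or None if it is unknown."""
    companions = together.get(current_song)
    if not companions:
        return None
    # Highest count wins; ties are broken alphabetically so results are repeatable.
    return min(companions, key=lambda song: (-companions[song], song))

rules = learn_rules(PLAYLISTS)
print(recommend(rules, "Here Comes the Sun"))
```

Nobody wrote a rule saying "after Here Comes the Sun, play Let It Be". It emerged from the counts, and it would change by itself if the playlists changed.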
Why stop at DJing? If the computer can identify what makes a popular song in order to make recommendations, why can’t it write one that follows those rules? That’s roughly how modern generative AI works: image generators, for example, start with random noise and keep refining it until it matches the patterns found in what humans have already made and liked. These models have billions of parameters and cost millions of dollars to train, but their ability to generate new text, images, and music that nobody has seen or heard before is enormously valuable. Computers may be ‘dumb’, but they can process more data by brute force in a few minutes than any one human can absorb in a lifetime.