Modern ad platforms optimize creative with AI, but when one ad gets 90% of the budget, you wonder whether the rest were given a fair chance. In the short term, machine learning tends to get far better results without the time cost of putting a human on the job, which is why we delegated optimization decisions to algorithms in the first place. For learning about consumer behavior, however, there’s no substitute for randomized controlled trials (RCTs). These experiments can be expensive to set up, maintain and interpret, but the evidence they provide is hard to argue with. It can take a lot of time and traffic to reach statistical significance, and your backlog of ideas always fills up faster than you can test them, so the process tends to get hijacked by the opinions of whoever is the highest-paid person in the room. A/B testing everything would be expensive, slow, and impractical. We’re running a business, not a lab, so we have to be comfortable making decisions under uncertainty. So when do we let the algorithms decide, and when should we prioritize human learning over machine learning?
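To make the "time and traffic" cost concrete, here is a rough sketch of the standard sample-size approximation for comparing two conversion rates with a two-sided z-test. The function name, the 2% baseline rate, and the default 5% significance level and 80% power are illustrative assumptions, not figures from any particular ad platform.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed in EACH variant of a two-variant A/B test.

    baseline_rate: conversion rate of the control (e.g. 0.02 for 2%).
    relative_lift: smallest relative improvement worth detecting (0.02 = +2%).
    alpha, power:  conventional defaults, assumed here for illustration.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    # Critical values of the standard normal for the chosen alpha and power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the binomial variances of the two arms.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # A big conceptual difference versus a small tweak, same assumed baseline.
    print("50% lift:", sample_size_per_variant(0.02, 0.50))
    print(" 2% lift:", sample_size_per_variant(0.02, 0.02))
```

Under these assumptions the small tweak needs hundreds of times more traffic per variant than the big conceptual difference, which is the practical reason not every idea can get its own experiment.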
Time in your testing schedule should be reserved for the most consequential strategic decisions, the ones that can’t be easily reversed: what Bezos calls ‘one-way doors’. “But most decisions aren't like that -- they are changeable, reversible -- they're two-way doors” as Bezos says. Big ideas, the top-level concepts, are the ones you need to get right. If you’re focused on an evolutionary dead end, then further iteration is a waste. If you build your brand on quicksand, it can sink your whole company. Once you start down a path, it’s hard to change course even with supporting data: “that’s just how things are done here”. So start from the top: zoom out to the Concept level and generate a basket of big ideas to test, ignoring the finer details for now. If you’re a Travel app, do customers mostly want beach holidays or city breaks? In FinTech, is it payment notifications or splitting the bill with friends that resonates? For a Fashion eCommerce site, is it ethically sourced products or the quality of the materials that matters? To take the Gaming example from Eric Seufert’s Creative Testing Framework: do they click on Ogres or Dragons?
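Once a concept-level test like Ogres versus Dragons has run, reading the result can be as simple as a pooled two-proportion z-test. The sketch below is one common way to do it, not a method prescribed by any of the sources cited here; the function name and the click counts are made up for illustration.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(clicks_a, shown_a, clicks_b, shown_b):
    """Return (z, two-sided p-value) for the difference in click rates, B minus A."""
    rate_a = clicks_a / shown_a
    rate_b = clicks_b / shown_b
    # Pooled rate under the null hypothesis that both concepts perform the same.
    pooled = (clicks_a + clicks_b) / (shown_a + shown_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: Ogres got 200 clicks, Dragons 260, on 10,000 impressions each.
    z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the gap between concepts is unlikely to be noise; it says nothing about why one concept wins, which is where human judgment comes back in.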
Once you’ve found a concept that works, drill down into themes within that concept. Decisions that are easily changeable or reversible “can and should be made quickly by high judgment individuals or small groups” as Bezos says. In Travel, does a Florida beach perform better than a Spanish one? Should you show friends splitting the bill over pizza or coffee to advertise your FinTech app? For Fashion, is Italian or American leather perceived as higher quality? Do realistic-looking Ogres beat cartoonish ones in Gaming? Within a winning theme, there are thousands of potential variants to try: border colors, button text, ad formats. These 1-2% differences add up, but they probably aren’t worth testing individually. This is where you can safely abandon science, and let the algorithm decide.
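Ad platforms don't publish exactly how their optimizers work, so as a stand-in, here is a minimal sketch of Thompson sampling, a common multi-armed bandit technique for "letting the algorithm decide" between variants. The function name, the simulated click-through rates, and the uniform Beta(1, 1) prior are all assumptions for illustration.

```python
import random


def thompson_allocate(true_ctrs, impressions, seed=0):
    """Simulate serving `impressions` ads across variants with Thompson sampling.

    true_ctrs: hypothetical click-through rates, unknown to the algorithm.
    Returns (shows, clicks), one entry per variant.
    """
    rng = random.Random(seed)
    shows = [0] * len(true_ctrs)
    clicks = [0] * len(true_ctrs)
    for _ in range(impressions):
        # Sample a plausible CTR for each variant from its Beta posterior
        # (uniform prior, updated with clicks and non-clicks seen so far).
        draws = [rng.betavariate(1 + c, 1 + s - c) for c, s in zip(clicks, shows)]
        # Serve the variant whose sampled CTR is highest.
        chosen = draws.index(max(draws))
        shows[chosen] += 1
        if rng.random() < true_ctrs[chosen]:
            clicks[chosen] += 1
    return shows, clicks


if __name__ == "__main__":
    shows, clicks = thompson_allocate([0.010, 0.012, 0.020], impressions=50_000)
    for i, (s, c) in enumerate(zip(shows, clicks)):
        print(f"variant {i}: {s} impressions, {c} clicks")
```

Because impressions flow toward whichever variant currently looks best, a sketch like this also shows why one ad can end up with most of the budget: weaker variants get only enough traffic for the algorithm to stay reasonably sure they are weaker, not enough to measure them precisely.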
Eventually even high-performing combinations will saturate as your audience gets tired of seeing them: a phenomenon called ‘creative fatigue’. When that happens, start again at the top with a new Concept, or revisit an existing Theme that you think could work if given a chance. Concept-Theme-Variant: if you maintain the discipline this framework demands, you’ll always know what to test next.
Name | Link | Type |
---|---|---|
1997 LETTER TO SHAREHOLDERS - AMAZON.COM | Reference | |
Blog | ||
Online Controlled Experiments at Large Scale | Paper | |
Preparing for a Global App Launch, Game Perspective, King | Presentation | |
What's the best way to test creatives on Facebook? | Forum | |