How does product strategy change in the LLM era?
The emergence of Large Language Models (LLMs) fundamentally changes product strategy.
TLDR
An LLM's unique position as an effectively self-improving component means certain product capabilities automatically get better with new model releases. It also means competitors get the same improvements simultaneously.
As a result, traditional differentiators (like UX) become less effective as competitive moats. Instead, successful LLM-based products will hinge on building strong information retrieval systems, developing robust evaluation frameworks, and creating unique value through data quality and domain expertise.
The most important thing to understand is that – while LLM capabilities will be widely available – how you retrieve, present, and apply information to solve specific problems becomes the true competitive advantage for your products.
This suggests three key areas of focus for a product strategy.
Firstly, you've got to build systems to operationalise automatic evaluations and quality assurance on your data, and ensure your retrieval is best in class.
Secondly, you should optimise for capabilities that don't depend on UI: create building blocks for information retrieval and manipulation behind easy-to-use APIs, and let application teams determine which building blocks to pull together (and in which way) for the core product. This makes application iteration faster (the composable capabilities already exist in the platform), makes it easier to get capabilities into the hands of users through non-core-product channels (e.g. API users, open data, organisations building off platform capabilities), and makes it easier to integrate capabilities and data into the diverse portfolio of tools that users regularly employ and which we shouldn't try to reinvent. (There's a minimal sketch of this building-block approach just after this summary.)
Lastly, and this one isn't new, design your platform so that you have tight feedback loops and can frictionlessly iterate on your UX and prompt frameworks.
These are high-churn areas: rapid UX and prompting iteration gives you greater responsiveness to research progress and greater resilience to technology shocks.
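To make the second point concrete, here's a minimal sketch of the building-block idea. Everything in it is illustrative: the `Passage` type, the block functions, and the compositions are assumptions about how such a platform might be organised, not a description of any real API.

```python
from dataclasses import dataclass

# Illustrative building blocks: small, single-purpose capabilities exposed
# behind a stable platform API. Application teams compose them as needed.

@dataclass
class Passage:
    document_id: str
    text: str

def search_passages(query: str) -> list[Passage]:
    """Hypothetical retrieval block: passages relevant to a query."""
    return [Passage("doc-1", f"placeholder text about {query}")]  # stand-in

def summarise(passages: list[Passage]) -> str:
    """Hypothetical LLM block: summarise a set of passages."""
    return " ".join(p.text for p in passages)  # stand-in for a model call

def extract_topics(text: str) -> list[str]:
    """Hypothetical LLM block: the topics a text covers."""
    return ["placeholder topic"]  # stand-in for a model call

# The core product composes the blocks one way...
def answer_question(query: str) -> str:
    return summarise(search_passages(query))

# ...while an API user or ecosystem partner composes the same blocks differently.
def tag_enquiry(query: str) -> list[str]:
    return extract_topics(summarise(search_passages(query)))
```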
LLMs change product best practices – introduction
Two and a bit years after ChatGPT was introduced to the world – breaking the record for user adoption by reaching 100 million people within weeks – we're just starting to scratch the surface of how LLMs will change the ways we interact with software products.
We're still in the Wild West era of generative AI. Like when computers still required house-sized buildings in which to operate; or when putting a site up on the World Wide Web meant coding a web server to run on the Pentium II-powered beige box in your basement. The days of AOL and Geocities rather than Facebook and Spotify.
New AI startups launch every day, and many of them are trying product strategies that simply weren't possible before. We don't know yet whether they'll work. "Best practices" come from the pain of experience, and from observing the trials and tribulations of those who came before. Many of the best practices of the modern web came more than a decade after the initial technology trigger.
A question I've been pondering on recently is: "How does product strategy change in the LLM era?"
I'm using the term LLM-based product here to describe any product that uses LLMs or other GenAI components as part of the value it brings to users. Different products do this to greater or lesser extents, but there are some fairly generic yet pivotal heuristics that apply to all of them.
There are parts of your product that now self-improve
This is a fairly new phenomenon for software products.
In the old days, before 2022, improving your product required explicit effort. After all, a component that does Job X will only improve if you apply effort to that improvement.
However, we now have parts of the system where, every few weeks (even days!), we can drop in a new version of a component that is simply better than it was before.
Sure, only certain parts of the system can be powered by LLMs, so only those bits will work like this. But it is still powerful.
Say that part of your product summarises a document, or extracts tables and figures, or works out what the subject of an enquiry is (to pick just three of a great, great many possibilities). Those components may now just get better over time, without you explicitly needing to improve them.
This is different from the software dependency updates that engineers know and love. In a traditional dependency relationship, the vendors of your components work to improve them, and in theory that often means parts of your product improving without you putting effort in. In practice, however, capturing those improvements has usually required engineers to update how the code is integrated into the product, adopt new capabilities not already used in the product, fix bugs, or do other work.
This is markedly different to the LLM-oriented update of "it just does the job much, much better" as a drop-in replacement.
Let's look at a simplified view of a common LLM-based product architecture: a Retrieval-Augmented Generation (RAG) pipeline. In this system, we take user inputs, and use our information retrieval system to find relevant information to that input.

At Climate Policy Radar, that might mean we help users find relevant passages from law and policy documents.
The LLM is given the results, and we use some prompt framework to get it to process, transform, and understand the information from the external information store, then output the result to the product's frontend. Here, the UX makes it accessible to the user in a way that they get value from (...we hope!).
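As a minimal sketch of that pipeline (the `retrieve` and `call_llm` functions are hypothetical stand-ins for your search system and model API, not real libraries):

```python
def retrieve(query: str, k: int = 5) -> list[str]:
    """Hypothetical IR step: the k passages most relevant to the query."""
    # Stand-in: in reality, BM25 or embedding search over your document store.
    return [f"placeholder passage about {query}"] * k

def call_llm(prompt: str) -> str:
    """Hypothetical model call: swapping in a newer model is the
    'self-improving' part of the pipeline."""
    return "placeholder answer"  # stand-in for a real model API call

def answer(query: str) -> str:
    passages = retrieve(query)
    # The prompt framework: how we present the retrieved data to the model.
    prompt = (
        "Answer the question using only the passages below.\n\n"
        + "\n---\n".join(passages)
        + f"\n\nQuestion: {query}"
    )
    return call_llm(prompt)
```

The point of the sketch: when a better model ships, only `call_llm` changes. Everything else in the pipeline is yours.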
What does it mean for product strategy when we assume that the LLM box here will continuously improve?
Evaluation frameworks are non-negotiable
We have to understand the nature, size, and shape of the improvement. Automatically evaluating the change in this box's capability is key to leveraging this property strategically and effectively.
- Did it improve for our uses?
- Did it involve trade-offs?
Remember that LLMs excel at narrowly-scoped tasks. Decompose the boxes into the smallest work-units you can, and wrap each of those in task-specific evals. It will be easier to define good evals for small tasks, and it will be easier to get LLMs to be good at them. You will have fewer edge cases and less unexpected emergent behaviour.
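Here's a minimal sketch of what wrapping one work-unit in a task-specific eval might look like. The task, the cases, and `extract_subject` are all illustrative:

```python
# Illustrative eval for one narrowly scoped work-unit: "what is the subject
# of this enquiry?". Run it against every candidate model release.

EVAL_CASES = [
    ("When does the EU deforestation regulation apply?", "deforestation"),
    ("What are the 2030 emissions targets?", "emissions targets"),
]

def extract_subject(enquiry: str, model: str) -> str:
    """Hypothetical work-unit: ask the given model for the enquiry's subject."""
    return "placeholder subject"  # stand-in for a real model call

def evaluate(model: str) -> float:
    """Fraction of cases the model gets right; compare across releases."""
    hits = sum(
        extract_subject(enquiry, model).strip().lower() == expected
        for enquiry, expected in EVAL_CASES
    )
    return hits / len(EVAL_CASES)

# Did the new model improve for our uses? Did it involve trade-offs?
for model in ["model-v1", "model-v2"]:
    print(model, evaluate(model))
```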
The prompt framework will be unstable and rapidly changing
The prompt framework is how we interact with the LLM for a given product. On that front, we're still in the days of magic tricks and poorly understood guru-mythology. Tricks like Chain-of-Thought prompting, knowing how to format lists and input data, and understanding what language works and what doesn't are still key skills in the LLM world. But they are tightly coupled to specific models: as the models change, prompt frameworks are likely to need updating in tandem.
This is the area where one should spend minimal effort. Use the evaluation framework to try a few things and get it working well, but don't over-engineer this.
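One low-effort pattern that helps: keep prompts out of application code and key them to the model they were tuned against, so a model swap forces a deliberate prompt review. A minimal sketch, with illustrative templates and model names:

```python
# Prompts are data, keyed to the model they were tuned for. Swapping models
# without a matching template becomes an explicit, reviewable event.

PROMPT_TEMPLATES = {
    "model-v1": "Summarise the passages in three bullet points:\n{passages}",
    "model-v2": "You are a policy analyst. Summarise:\n{passages}\nUse three bullets.",
}

def build_prompt(model: str, passages: str) -> str:
    try:
        template = PROMPT_TEMPLATES[model]
    except KeyError:
        raise ValueError(f"no prompt tuned for {model}; run the evals first")
    return template.format(passages=passages)
```

Because prompts live in one place, your evaluation framework can re-run against a candidate model and its candidate template before anything ships.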
Everyone in the market gets the same improvements at the same time
When a new model is released, every product using that model can improve simultaneously. It doesn't give you a differentiating advantage.
Instead, ensure that your responsiveness to and ability to leverage those improvements outstrips that of your competitors.
It's hard to differentiate on intelligence
If your product is mostly LLM and not much else, you will find it hard to differentiate, because your biggest value increments come from model updates, and everyone gets those.
Differentiate elsewhere in the stack.
(You may also have heard of "agentic" AI. Here, you equip the AI with tools (such as "search using this search term", "add to this spreadsheet", and so on), ask it to develop and maintain a todo list to attain a goal, and connect its output back to its input. In doing so, it becomes an agent, with goal-seeking behaviour. My analysis of how AI changes product strategy doesn't just hold for agentic AI: the conclusions become even stronger.)
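In skeleton form, that loop is small. The sketch below is illustrative: `call_llm`, the tool set, and the message format are assumptions, not any particular framework's API.

```python
def call_llm(history: list[dict]) -> dict:
    """Hypothetical model call: returns either a tool request or a final answer."""
    return {"type": "final_answer", "content": "placeholder"}  # stand-in

TOOLS = {
    "search": lambda term: f"placeholder results for {term}",  # stand-in tool
    "add_to_spreadsheet": lambda row: "ok",                    # stand-in tool
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    history = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        action = call_llm(history)  # the model decides the next step
        if action["type"] == "final_answer":
            return action["content"]
        # Otherwise run the requested tool and feed the result back in.
        result = TOOLS[action["tool"]](action["input"])
        history.append({"role": "tool", "content": result})
    return "step budget exhausted"
```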
UX is a differentiator, not a competitive moat
We're still exploring the different ways AI can help people use software products; remember, there are no UX best practices yet. In fact, I think it's now clear that the end-state of GenAI products will not be "conversational". (I believe we'll eventually see conversational GenAI interfaces as a relic of the early days of the AI revolution, just as blue and purple links are a relic of the early days of the web.)
If you're trying to make UX your competitive moat, you will be competing with everyone currently trying to make AI products. Literally everyone. Including the big dogs of tech. The number of barely-launched startups rendered obsolete by each OpenAI update is so large that it has become a meme.
That's not to say that UX doesn't matter for a product strategy that encompasses GenAI. Not at all. You need to differentiate with UX, and try new things. (We're already discovering there are lots of bad ways to do AI UX!) But use it to cement your value rather than trying to compete on it.
There's another facet to this. Increasingly, users of your products will be AI. Not human. Not just in scraping, but in operation. Anthropic recently fired the starting gun on the race for AI that effectively controls a computer in the same way as a human (and OpenAI isn't far behind). AI products regularly equip LLMs with web browsing and code execution tools, or enable them to call APIs for other products and interact with them programmatically.
So the UX of your product is not just the user interface. It is also in the API, and how frictionless that is for AIs to use, or for developers to integrate into their AI tools. That's why I've been thinking of the user experience as:
- How you manage interoperability in the right way: enabling integration with ecosystem partners that are competitively aligned, and adding friction for those who aren't
- A channel for the value brought by your data
By leveraging good UX principles (and treating interoperability as part of that UX), you can use your product experience to drive improvements in the data in your information retrieval system or your prompting framework.
In practice, here's a nice way to think about this. Take Google's NotebookLM, a RAG system that lets users upload a bunch of documents and ask questions of them using the Gemini foundation model. It has caught the imagination of many early adopters, and has superseded hundreds of "chat with your documents" products which now have Google as a key competitor.
But if you can make it easy to help your user get more out of NotebookLM, then maybe you can use the popularity of something like NotebookLM to your advantage.
It's worth considering whether you need to be the one interface/product that everyone uses: is your focus on monopoly or on market dominance? Is it a problem if someone uses NotebookLM, or any of a million other products, so long as your product, data, or tooling augments their value? You can still be sticky and get engagement!
Information Retrieval (IR) is the competitive moat
Garbage-In-Garbage-Out (GIGO) has only become more true in the generative AI world: the data you supply to an LLM is the most significant predictor of the value of the output of that LLM. This is the core axiom for designing RAG products. If we give them better data, we get better outputs.
Data – and the ease and precision with which we can access and explore it – is the easiest thing to create a moat around in this context. But there's significant friction in assembling, curating, and searching a large dataset.
This is why I say Information Retrieval is the moat, not just data. You may have the best data, but if the slice of that data you give to the LLM is rubbish, it means nothing.
Given two products with the same LLMs and the same use cases, the one with the better IR to feed the LLMs will win out. Every time. That's why focusing on IR is the number one way to lead the market.
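One concrete way to keep score is to measure the IR directly, separately from the LLM. Here's a minimal sketch of recall@k over a labelled query set; `retrieve_ids` and the gold labels are illustrative:

```python
def retrieve_ids(query: str, k: int) -> list[str]:
    """Hypothetical retriever: ids of the top-k passages for a query."""
    return ["doc-12#p3"]  # stand-in for your real search system

# Illustrative gold labels: passages a human judged relevant to each query.
GOLD = {
    "flood adaptation finance": {"doc-12#p3", "doc-44#p1"},
    "methane reduction pledges": {"doc-7#p9"},
}

def recall_at_k(k: int = 5) -> float:
    """Average fraction of the relevant passages found in the top k."""
    scores = []
    for query, relevant in GOLD.items():
        found = set(retrieve_ids(query, k)) & relevant
        scores.append(len(found) / len(relevant))
    return sum(scores) / len(scores)
```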
The ways that people retrieve information are also changing rapidly: search volumes are decreasing as users switch to AI tools that give them customised, focused answers drawn from search-engine results. Perplexity led the way here, Google has introduced AI Overviews for search, and OpenAI has just launched SearchGPT. Tools that serve AI-friendly information are more likely to be cited by AI tools, and more likely to reach users in this new world.
It's also imperative that you can identify when your system creates the Garbage-Out portion of the equation. Evals are key here, but also make sure you have a good way of observing and intervening when the system is over-optimising for undesired results. Don't over-rely on evals: they are the map, not the territory. You don't want to miss the unknown-unknowns, and with LLMs there are many. Monitor your outputs obsessively, and extract insights into automated evals as much as possible, just as you'd create a failing test for a bug once discovered.
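The failing-test analogy translates directly. A minimal sketch, with an illustrative case structure and a hypothetical `answer` pipeline:

```python
# Every garbage output caught in monitoring becomes a permanent regression
# case, re-run against every new model release or prompt change.

REGRESSION_CASES = [
    {
        "query": "summarise this repealed law",
        "must_not_contain": "currently in force",  # the failure mode we observed
    },
]

def answer(query: str) -> str:
    """Hypothetical end-to-end pipeline under test."""
    return "placeholder output"  # stand-in

def run_regressions() -> list[str]:
    """Queries whose known failure mode has reappeared."""
    return [
        case["query"]
        for case in REGRESSION_CASES
        if case["must_not_contain"] in answer(case["query"])
    ]
```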
Taste and preferences are distinct from intelligence
I think we'll see this more and more. Try an experiment for me: open your chat tool of choice (mine is Anthropic's Claude, but this applies to ChatGPT, Gemini, and all the rest). Think of a problem you had to solve recently, and ask it to generate 20 ideas to solve it. Some will be great, some OK, and most rubbish.
Now ask it to pick the top 3. I'm willing to bet actual money that you disagree with the three that it chooses.
That's because applying taste and preferences to something isn't the domain of intelligence. No matter how intelligent a model gets, it will have an emergent ranking of tastes and preferences that likely differs from yours, and from everyone else's. And that is a big part of modern knowledge work: being able to pick the best strategy, choose the right architecture, or, in general, apply one's taste to a situation is a big part of the value that justifies the salary. Google Maps gives you several route options: fastest or lowest-emission. You might prefer scenic. The app supports you in applying your preferences in combination with its capabilities. That is where the power of human/AI partnerships lies.
A product encodes the taste and preferences of its designer. When everyone gets the same improvements to the self-improving parts of the product, the incremental, differentiating value of the taste and preferences we apply around those components is significantly amplified.
What this tells us about LLM product strategy
Product strategy changes markedly in LLM-supported products. We have to shift how we think about UX as a moat, and start considering AI users as first-class citizens. We have to plan for parts of the product improving rapidly in performance without significant effort on our side. That's a double-edged sword. On the one hand, we get improvements for free. On the other, so does everyone else, and it introduces fragility and churn into the parts of our system that directly interact with the LLM. And it motivates us to think about the capabilities we can build around that improvement to create lasting value.