
From framework to first step: an honest approach

95% of generative AI pilots fail, not because the technology does not work but because organisations start with the tool instead of the diagnosis. This post sets out an honest approach to breaking that pattern.

95% of generative AI pilots fail.

That is the figure from the 2025 MIT report. It sounds alarming. But once you understand the reason behind it, it is actually reassuring — because it is a problem you can avoid.

The pilots fail not because the technology does not work. They fail because organisations start with the tool, not the question. With the demo, not the diagnosis. With the vendor, not the architecture. They build a proof of concept that impresses in a presentation and is irrelevant in production.

This series has spent seven weeks building a framework. The question now: how do you translate that into a first step that actually produces value?

The most common mistake: starting with the tool

There is a pattern that repeats in almost every failed AI initiative:

  1. The board sees a demo and wants “something with AI”
  2. IT or a consultancy selects a platform
  3. A pilot is built on an arbitrary use case
  4. The pilot impresses in a presentation
  5. The pilot does not roll out because the business value is unclear
  6. Budget rises, trust drops, the project stops

The fundamental mistake sits in steps 1 and 2. You begin with the solution before defining the problem. And you select a platform before knowing which architecture you need.

The correct sequence is reversed: start with a concrete work process with a measurable outcome, define which AI capability accelerates that process, and only then determine which architectural choices are required.

Step 1: Select one use case suitable for the flywheel

Not every use case is a good starting point for an AI flywheel. The best first use case meets four criteria.

High repetition value. It has to be a process that runs dozens or hundreds of times a week. One-off or rarely repeated tasks do not generate enough data to build memory and get the flywheel spinning.

Measurable outcome. You must be able to define success in advance: lead time, accuracy, escalation rate, conversion rate, cost per unit. Vague goals like “working more efficiently” are unsuitable as a starting point.

Manageable risk. Do not begin with processes where a mistake by the AI system causes irreversible damage. Customer-service routing, proposal support, document classification, internal knowledge retrieval — these are processes with high repetition, measurable output and manageable risk when the system occasionally makes an error.

Sound data foundation. Memory can only build on data that exists. Begin with a use case for which historical data is already available: customer interactions, decision logs, process outcomes. A use case without a data foundation cannot build a flywheel.

Practical test: a use case that meets all four criteria, and for which you can produce measurable production results within 90 days, is a good start.
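The practical test can be written down as a small checklist. What follows is a minimal sketch, not a scoring method from this series: the `UseCase` fields, the `failed_criteria` helper and the threshold of 24 runs per week (one reading of "dozens") are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: "dozens of times a week" is read as at least 24 runs.
MIN_RUNS_PER_WEEK = 24
# The 90-day window from the practical test.
MAX_DAYS_TO_RESULTS = 90


@dataclass
class UseCase:
    """Hypothetical description of a candidate use case."""
    name: str
    runs_per_week: int
    success_metric: Optional[str]  # e.g. "escalation rate"; None for vague goals
    errors_are_reversible: bool
    has_historical_data: bool
    days_to_production_results: int


def failed_criteria(uc: UseCase) -> List[str]:
    """Return the criteria this use case does not meet (empty list = good start)."""
    failed = []
    if uc.runs_per_week < MIN_RUNS_PER_WEEK:
        failed.append("high repetition")
    if not uc.success_metric:
        failed.append("measurable outcome")
    if not uc.errors_are_reversible:
        failed.append("manageable risk")
    if not uc.has_historical_data:
        failed.append("data foundation")
    if uc.days_to_production_results > MAX_DAYS_TO_RESULTS:
        failed.append("results within 90 days")
    return failed


def is_good_start(uc: UseCase) -> bool:
    return not failed_criteria(uc)
```

Returning the list of failed criteria, rather than a single score, keeps the conversation on which gap to close before starting.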

Step 2: Make the three architectural choices deliberately

Once the use case is selected, make three explicit architectural choices before you start building.

Model choice. Which model do you use for which task within this use case? Is there reasoning that demands a frontier model, or is a smaller, faster model sufficient for most steps? Define this up front — and build in routing, even if you have only one model on day one.
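Building in routing from day one can be as small as a lookup table that every call site goes through. A minimal sketch under stated assumptions: the model names (`small-fast-model`, `frontier-model`) and the task types are hypothetical placeholders, not real products or a recommended split.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical model identifiers; replace with whatever you actually deploy.
SMALL = "small-fast-model"
FRONTIER = "frontier-model"


@dataclass(frozen=True)
class Route:
    model: str
    reason: str  # written down so the choice can be reviewed later


# Hypothetical task types for a customer-service use case.
ROUTES: Dict[str, Route] = {
    "classify_ticket": Route(SMALL, "short input, fixed label set"),
    "summarise_thread": Route(SMALL, "high volume, low reasoning depth"),
    "resolve_dispute": Route(FRONTIER, "multi-step reasoning over policy and history"),
}

# Unlisted task types fall back to the cheaper model.
DEFAULT_ROUTE = Route(SMALL, "unlisted task type, cheapest model first")


def pick_model(task_type: str) -> Route:
    """Single place where the task-to-model decision is made."""
    return ROUTES.get(task_type, DEFAULT_ROUTE)
```

With only one model on day one, every entry in `ROUTES` can point to the same model. The point is that adding a second model later means editing the table, not the call sites.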

Harness design. Which guardrails are required for this use case? Which actions may the system execute autonomously? Which require human approval? Which tools are connected, with what validation logic? Write this up as a design document before you write a line of code.
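Part of that design document can be expressed as a policy table: per tool, whether the system may act on its own, and which validation runs first. The sketch below is illustrative only. The tool names, the 500 refund ceiling and the deny-by-default rule are assumptions, not requirements from this series.

```python
from enum import Enum
from typing import Any, Callable, Dict, Tuple


class Decision(Enum):
    ALLOW = "allow"                    # system may execute autonomously
    NEEDS_APPROVAL = "needs_approval"  # a human must approve first
    DENY = "deny"                      # never executed

Args = Dict[str, Any]
MAX_REFUND = 500  # assumption: illustrative ceiling, in your own currency


def valid_lookup(args: Args) -> bool:
    order_id = args.get("order_id")
    return isinstance(order_id, str) and order_id != ""


def valid_refund(args: Args) -> bool:
    amount = args.get("amount")
    is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
    return is_number and 0 < amount <= MAX_REFUND


# Hypothetical tools: (decision if the arguments are valid, validator).
POLICY: Dict[str, Tuple[Decision, Callable[[Args], bool]]] = {
    "lookup_order": (Decision.ALLOW, valid_lookup),
    "issue_refund": (Decision.NEEDS_APPROVAL, valid_refund),
}


def decide(tool: str, args: Args) -> Decision:
    """Deny unknown tools and invalid arguments; otherwise apply the policy."""
    if tool not in POLICY:
        return Decision.DENY
    decision, validator = POLICY[tool]
    if not validator(args):
        return Decision.DENY
    return decision
```

Denying anything that is not listed means a newly connected tool does nothing until someone has made an explicit decision about it.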

Memory strategy. What must the system remember? Define the three layers: what is episodic (specific interactions), what is semantic (domain knowledge), what is procedural (learned approaches)? Which storage solution fits the volume and complexity of this use case?
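To make the three layers concrete, here is a minimal in-memory sketch. The `MemoryStore` class and its keyword-match `recall` are assumptions for illustration, a stand-in for whatever storage and retrieval you end up choosing, not a recommendation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Layer(Enum):
    EPISODIC = "episodic"      # specific interactions
    SEMANTIC = "semantic"      # domain knowledge
    PROCEDURAL = "procedural"  # learned approaches


@dataclass(frozen=True)
class MemoryItem:
    layer: Layer
    text: str


class MemoryStore:
    """Hypothetical store that keeps the three layers separate."""

    def __init__(self) -> None:
        self._items: Dict[Layer, List[MemoryItem]] = {layer: [] for layer in Layer}

    def remember(self, layer: Layer, text: str) -> MemoryItem:
        item = MemoryItem(layer, text)
        self._items[layer].append(item)
        return item

    def recall(self, layer: Layer, keyword: str) -> List[MemoryItem]:
        # Naive case-insensitive substring match; real retrieval would differ.
        needle = keyword.lower()
        return [item for item in self._items[layer] if needle in item.text.lower()]

    def size(self, layer: Layer) -> int:
        return len(self._items[layer])
```

Keeping the layers apart from the start makes it possible to track growth per layer and to swap the storage behind one layer without touching the others.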

These three choices take a day to work out. They save months of rework and rebuilding.

Step 3: Build small, measure fast, scale deliberately

The flywheel does not start big. It starts with one process, one team, one set of measurable KPIs.

Define the KPIs per layer before launch:

Model KPIs. Task completion rate, hallucination rate on your use case (measured with your evaluation set, not a benchmark), latency per task type.

Harness KPIs. Tool-call success rate, escalation rate to human review, fallback frequency, average cost per task.

Memory KPIs. Retrieval accuracy (does the system surface the relevant context?), memory growth per week, improvement in output quality over time as a measure of cumulative learning.

Business KPIs. Lead-time reduction, error rate compared to the manual process, employee satisfaction, with manual correction effort as a practical proxy (does the system require more or less manual correction over time?).

Measure at four weeks. Measure at three months. Adjust based on data, not on assumptions. Only scale when the KPIs show that the flywheel is turning — not when the demo goes well.
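Several of the model and harness KPIs listed above fall straight out of a per-task log. A minimal sketch, assuming a hypothetical `TaskRecord` with one row per handled task; the field names are illustrative and will differ in your own logging.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskRecord:
    """Hypothetical log entry for one task handled by the system."""
    completed: bool
    tool_calls: int
    tool_failures: int
    escalated: bool      # handed to human review
    used_fallback: bool
    cost: float
    latency_s: float


def compute_kpis(records: List[TaskRecord]) -> Dict[str, float]:
    """Aggregate model and harness KPIs over a measurement window."""
    if not records:
        raise ValueError("no records in this measurement window")
    n = len(records)
    calls = sum(r.tool_calls for r in records)
    failures = sum(r.tool_failures for r in records)
    return {
        "task_completion_rate": sum(r.completed for r in records) / n,
        # Convention chosen here: with no tool calls at all, nothing failed.
        "tool_call_success_rate": (calls - failures) / calls if calls else 1.0,
        "escalation_rate": sum(r.escalated for r in records) / n,
        "fallback_frequency": sum(r.used_fallback for r in records) / n,
        "avg_cost_per_task": sum(r.cost for r in records) / n,
        "avg_latency_s": sum(r.latency_s for r in records) / n,
    }
```

Running the same function over the four-week window and the three-month window gives two comparable snapshots, which is what the decision to scale should rest on.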

The internal question nobody asks but everybody must answer

There is one organisational choice that matters more for success than all the technical decisions combined: who owns the AI system?

Not the project owner of the pilot. Not the vendor who supplies the platform. But the structural owner responsible for the quality of the model, the robustness of the harness, the growth of memory, and the improvement of the KPIs over time.

Organisations that treat AI as a project — with a start, an end and a delivery date — build something that stops evolving the moment the project closes. Organisations that treat AI as a capability — alongside finance, compliance or customer management — build something that improves structurally.

That requires an owner. Not necessarily a large department. But someone who looks at the KPIs on Monday, ships the improvements on Friday, and reconsiders the architectural choices annually based on what has been learned.

The honest expectation

AI is not a project. It is an organisational capability you build.

The first ninety days deliver measurable value on one use case. The first year builds the foundation: model strategy, harness architecture, memory strategy. After two years, a well-built system gives the organisation an institutional memory that sets it apart from organisations that started later.

That is not fast. But it is not slow either — compared to the organisations launching their third pilot project today without a structural approach.

Only 25% of AI initiatives deliver the expected return. The 75% that do not almost always share the same problem: no clear architectural choices, no deliberate model strategy, no memory strategy, no ownership after the pilot.

Start small. Build deep. Make the three choices deliberately. And measure what actually matters: not how impressive the demo is, but how much better the system has become over the past ninety days.

That is AI as an accelerator.

About this series

This series was an attempt to be honest about what AI is and is not — and about the architectural choices that determine which side of that distinction you end up on.

The seven posts at a glance:

  1. Most organisations are building better RPA — the fundamental distinction between automating and accelerating
  2. The fragmented landscape as a design problem — how to make architectural choices in a chaotic ecosystem
  3. Pillar 1: Model Quality — the right model for the right task, with routing instead of one-for-all
  4. Pillar 2: Harness Strength — the control plane that makes a model reliable in production
  5. Pillar 3: Persistent Memory — the institutional memory that makes AI cumulative
  6. The three pillars together — the flywheel that accelerates the longer it runs
  7. From framework to first step — an honest approach to getting started

Questions, reactions, a different perspective? Reply below. I read everything.
