A Proof of Concept That Proves the Technology Has Proved Almost Nothing

1. The demo trap

An AI team runs a pilot.

The model can summarise documents, answer questions or generate code. Users like it. The sponsor presents positive feedback. The programme asks for production funding.

The technology may have proved it can perform the task.

It has not yet proved:

that the task matters
that performance is reliable enough
that users will change their behaviour
that the workflow will improve
that the full production cost is acceptable
that risk is manageable
that the organisation can capture the benefit
that the use case is better than alternatives
that it deserves scarce capital

Gartner recommends turning a proof of concept into a proof of value by weighing achieved benefits against AI cost.[^gartner] That sounds simple. In practice, it requires a different pilot design.

2. Four proofs, not one

Proof 1: feasibility

Can the system technically perform the task?

Required evidence:

functional completion
integration viability
latency
security feasibility
data access
model availability

Proof 2: performance

Does it perform at the required standard?

Required evidence:

accuracy
acceptance
error and hallucination rates
robustness
consistency
escalation rate
performance by case type

Proof 3: operating adoption

Will people and systems use it in the real workflow?

Required evidence:

repeat use
task fit
trust
workarounds
shadow use
review behaviour
manager and process-owner acceptance
agent or system integration

Proof 4: value

Did a meaningful outcome change, and can the organisation capture it?

Required evidence:

baseline movement
cost per successful outcome
throughput
revenue or margin
quality
risk or loss
released and redeployed capacity
customer or employee outcome
confidence in attribution

A pilot that completes only Proof 1 is a technical experiment, not an investment case.

Proof standard: four distinct proofs are required before scale. A technical demo alone completes one of them, and feasibility on its own creates no investment case.
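The four-proof rule is mechanical enough to sketch. The following is a minimal illustration, assuming each proof is recorded simply as complete or not (a real pilot would attach evidence to each). `Proof` and `classify_pilot` are hypothetical names, not part of any published framework.

```python
from enum import Enum


class Proof(Enum):
    FEASIBILITY = 1
    PERFORMANCE = 2
    ADOPTION = 3  # "operating adoption" in the text
    VALUE = 4


def classify_pilot(completed: set) -> str:
    """Label a pilot by which of the four proofs it has evidenced."""
    missing = [p for p in Proof if p not in completed]  # keeps definition order
    if not missing:
        return "investment case: all four proofs complete"
    if set(completed) == {Proof.FEASIBILITY}:
        return "technical experiment: feasibility only"
    names = ", ".join(p.name.lower() for p in missing)
    return f"incomplete: missing {names}"


print(classify_pilot({Proof.FEASIBILITY}))  # → technical experiment: feasibility only
```

The rule is deliberately asymmetric: a pilot is only an investment case when nothing is missing, and every other state reports exactly which proofs remain.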

Analysis · AI Revenue Economics

Revenue Ambition vs. Revenue Reality

Most leaders expect AI to generate significant revenue. Far fewer can specify how. The gap between ambition and identified mechanism is where AI investment fails to convert into commercial advantage.

The AI revenue mechanism chain (all six links must hold):

AI capability → Changed proposition or route to market → Customer behaviour change → Revenue → Contribution margin → Durable advantage

The five AI revenue mechanisms (every growth case should specify which applies):

Conversion & retention

AI improves win rates, reduces churn, or expands accounts through better personalisation or response speed

Premium product or feature

AI capability justifies a price premium, a higher-tier subscription, or a new product line customers will pay for

New service or consumption model

AI enables a service that could not previously be delivered at this cost, speed, or scale

Faster time to market

AI compresses development cycles enough to enter markets or launch features ahead of competitors

Ecosystem or data monetisation

AI creates a platform, marketplace, or data asset that compounds in value as participants increase

Source: IBM Institute for Business Value, “The Enterprise in 2030” (2025 survey of 2,000 executives). The 79% and 24% figures are executive expectations, not observed outcomes: the survey recorded self-reported expectations about the future revenue impact of AI. Read them as directional signals of ambition versus analytical readiness, not as predictions of actual commercial results. Revenue mechanism chain: AI Economics Hub framework.
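One way to operationalise the chain is a check that a growth case names one of the five mechanisms and that every link, in order, has evidence behind it. This sketch assumes each link is simply evidenced or not; `Mechanism`, `GrowthCase` and `first_broken_link` are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Mechanism(Enum):
    CONVERSION_RETENTION = "Conversion & retention"
    PREMIUM_PRODUCT = "Premium product or feature"
    NEW_SERVICE = "New service or consumption model"
    FASTER_TIME_TO_MARKET = "Faster time to market"
    ECOSYSTEM_DATA = "Ecosystem or data monetisation"


# The six links, in order; each must hold for the growth case to hold.
CHAIN = (
    "AI capability",
    "Changed proposition or route to market",
    "Customer behaviour change",
    "Revenue",
    "Contribution margin",
    "Durable advantage",
)


@dataclass
class GrowthCase:
    mechanism: Optional[Mechanism]
    evidenced_links: set = field(default_factory=set)


def first_broken_link(case: GrowthCase) -> Optional[str]:
    """Return where the case breaks, or None if every link is evidenced."""
    if case.mechanism is None:
        return "no revenue mechanism specified"
    for link in CHAIN:
        if link not in case.evidenced_links:
            return link
    return None


case = GrowthCase(Mechanism.PREMIUM_PRODUCT, {"AI capability", "Revenue"})
print(first_broken_link(case))  # → Changed proposition or route to market
```

Evidence for a later link, such as revenue, does not rescue a break earlier in the chain, which is the point of requiring all six links to hold.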

3. Start with a value contract

Before the pilot begins, write a one-page value contract.

It should state:

problem and current baseline
business owner
users and affected workflow
target outcome
value dimension
acceptable quality and risk
full-cost hypothesis
evidence method
production-scale scenario
scale, redesign and stop criteria
date of decision

This prevents the definition of success from changing after results arrive.
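A value contract can be held as an immutable record so that the success definition cannot be edited silently between kickoff and decision. The sketch below assumes Python's frozen dataclasses plus a SHA-256 fingerprint recorded at sign-off; `ValueContract`, its field names (a subset of the list above) and every value are hypothetical.

```python
import hashlib
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ValueContract:
    problem_and_baseline: str
    business_owner: str
    target_outcome: str
    acceptable_quality_and_risk: str
    full_cost_hypothesis: str
    scale_criterion: str
    redesign_criterion: str
    stop_criterion: str
    decision_date: date

    def fingerprint(self) -> str:
        """SHA-256 over every field; record it at sign-off, compare it at decision."""
        payload = repr(sorted(asdict(self).items()))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative values only.
contract = ValueContract(
    problem_and_baseline="First-pass contract review: median 90 minutes",
    business_owner="Head of Legal Operations",
    target_outcome="Net review time per contract down 30%",
    acceptable_quality_and_risk="No missed material clauses in sampled audit",
    full_cost_hypothesis="Below 40 currency units per contract at production volume",
    scale_criterion="Target met for three consecutive months",
    redesign_criterion="Time saved but audit failures found",
    stop_criterion="Net time saving below 10%",
    decision_date=date(2026, 9, 30),
)
signed_off = contract.fingerprint()
```

Any amendment has to produce a new contract with a different fingerprint, so a changed definition is visible at the decision date instead of silently replacing the old one.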

4. Build the baseline first

Without a baseline, every improvement claim becomes anecdotal.

Relevant baselines may include:

time per case
cases per employee
first-time resolution
defect rate
conversion
loss rate
cycle time
backlog
customer effort
review effort
cost per completed task
risk incidents

The baseline should capture the distribution, not just the average. AI may improve easy cases while degrading complex ones, and a single average can hide both effects.
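The point about distributions can be shown with a few invented numbers. In this sketch, minutes per case are split into two hypothetical case types: the pooled average improves, while the median for complex cases gets worse.

```python
from statistics import mean, median

# Hypothetical minutes per case, split by case type.
baseline = {"simple": [20, 22, 18, 25, 21], "complex": [60, 75, 70, 65, 80]}
pilot = {"simple": [8, 10, 9, 12, 11], "complex": [70, 78, 72, 80, 85]}


def pooled_mean(data):
    """Average across all cases, ignoring case type (the misleading view)."""
    return mean(x for cases in data.values() for x in cases)


def by_segment(before, after):
    """Median per case type and its change, so a harmed segment stays visible."""
    report = {}
    for segment in before:
        b, a = median(before[segment]), median(after[segment])
        report[segment] = (b, a, a - b)
    return report


def degraded(report):
    """Case types where the pilot made things slower."""
    return [seg for seg, (_, _, change) in report.items() if change > 0]


report = by_segment(baseline, pilot)
print(round(pooled_mean(baseline), 1), "->", round(pooled_mean(pilot), 1))  # → 45.6 -> 43.5
print(report)            # → {'simple': (21, 10, -11), 'complex': (70, 78, 8)}
print(degraded(report))  # → ['complex']
```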

5. Measure the full workflow

A model benchmark is not a workflow benchmark.

The pilot should measure:

input preparation
retrieval
model processing
tool execution
human review
correction
escalation
downstream processing
exception handling
governance and audit work

A system that produces an answer in seconds may still make the workflow slower if review and repair increase.
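The same arithmetic applies across workflow stages. The stage names follow the list above except `drafting`, a hypothetical stage standing in for the manual work the model replaces; all minute values are invented for illustration.

```python
# Hypothetical minutes per case by workflow stage.
manual = {
    "input preparation": 5,
    "drafting": 12,  # the step the model replaces
    "human review": 10,
    "correction": 2,
    "escalation": 3,
    "downstream processing": 6,
}
assisted = {
    "input preparation": 8,
    "retrieval": 1,
    "model processing": 0.1,  # about six seconds
    "human review": 18,
    "correction": 9,
    "escalation": 5,
    "downstream processing": 6,
}


def stage_changes(before, after):
    """Per-stage change in minutes; a stage absent on one side counts as zero."""
    stages = sorted(before.keys() | after.keys())
    return {s: round(after.get(s, 0) - before.get(s, 0), 2) for s in stages}


end_to_end_change = round(sum(assisted.values()) - sum(manual.values()), 2)
print(end_to_end_change)  # → 9.1 (minutes slower per case overall)
```

Drafting falls by twelve minutes, yet extra preparation, review and correction make the whole case about nine minutes slower.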

6. Bridge pilot cost to production cost

Pilot economics are structurally favourable:

small volumes
motivated users
limited integration
manual support
subsidised vendor access
reduced controls
selected data
hidden engineering effort

Production introduces:

higher and less predictable volume
integration and observability
identity and access
evaluation
incident response
support
compliance
change management
data pipelines
vendor commitments
resilience
model drift
human oversight

Deloitte's analysis of infrastructure sourcing (build versus buy) illustrates how cost structure can change as volume and complexity scale.[^deloitte] Its TCO model also explicitly excludes several cost categories, which reinforces why a pilot should be costed against this site's full AI TCO Framework rather than a model invoice alone.
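A simple way to bridge the two is to compute cost per successful outcome rather than cost per call. In this sketch every category and figure is a hypothetical monthly value: the model's unit price is identical in pilot and production, yet the full production cost and a lower success rate on unselected data raise the cost per successful outcome from 0.50 to 1.25.

```python
# Hypothetical monthly costs in currency units.
PILOT_COSTS = {"model invoice": 1_800}  # the only cost visible in the pilot
PRODUCTION_COSTS = {
    "model invoice": 36_000,
    "integration and observability": 12_000,
    "evaluation": 4_000,
    "support": 6_000,
    "compliance": 3_000,
    "human oversight": 19_000,
}


def cost_per_successful_outcome(costs, volume, success_rate):
    """Full cost divided by outcomes that succeed, not by tasks attempted."""
    successes = volume * success_rate
    if successes <= 0:
        raise ValueError("no successful outcomes to divide by")
    return sum(costs.values()) / successes


pilot_cpo = cost_per_successful_outcome(PILOT_COSTS, volume=4_000, success_rate=0.9)
production_cpo = cost_per_successful_outcome(PRODUCTION_COSTS, volume=80_000, success_rate=0.8)
print(f"pilot {pilot_cpo:.2f} vs production {production_cpo:.2f} per successful outcome")
# → pilot 0.50 vs production 1.25 per successful outcome
```

A model invoice alone would show the same 0.45 per task in both settings, which is exactly why it is a poor guide to production economics.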

7. Include behavioural evidence

A pilot changes people as well as process.

Measure:

who adopts and who avoids
whether users over-trust outputs
whether experienced and inexperienced workers perform differently
whether people create shadow workflows
whether review becomes superficial
whether junior learning is displaced
whether people retain the ability to work without the system
whether managers redesign work or simply add AI on top

Gartner argues that behavioural outcomes deserve the same rigour as business and technology outcomes.[^gartner]
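Some of these behavioural signals can be computed from ordinary usage logs. The sketch assumes one record per pilot user; the `PilotUser` fields, the three-session repeat-use threshold and the 20-second, 98%-acceptance review flag are all hypothetical choices, and a flag is a prompt to audit rather than proof of superficial review.

```python
from dataclasses import dataclass


@dataclass
class PilotUser:
    name: str
    experienced: bool
    sessions: int  # times the tool was used during the pilot
    outputs_reviewed: int
    outputs_accepted: int
    median_review_seconds: float


def repeat_use_rate(users, min_sessions=3):
    """Share of pilot users who came back at least min_sessions times."""
    return sum(u.sessions >= min_sessions for u in users) / len(users)


def avoiders(users):
    return [u.name for u in users if u.sessions == 0]


def possibly_superficial_review(users, max_seconds=20, min_acceptance=0.98):
    """Near-total acceptance with very short review time: worth auditing."""
    return [
        u.name
        for u in users
        if u.outputs_reviewed
        and u.outputs_accepted / u.outputs_reviewed >= min_acceptance
        and u.median_review_seconds < max_seconds
    ]


def acceptance_by_experience(users):
    """Acceptance rate per cohort, or None if the cohort reviewed nothing."""
    rates = {}
    for label, flag in (("experienced", True), ("inexperienced", False)):
        group = [u for u in users if u.experienced == flag]
        reviewed = sum(u.outputs_reviewed for u in group)
        accepted = sum(u.outputs_accepted for u in group)
        rates[label] = accepted / reviewed if reviewed else None
    return rates


users = [
    PilotUser("ana", True, 12, 50, 40, 95.0),
    PilotUser("ben", False, 15, 60, 60, 8.0),
    PilotUser("chen", False, 1, 4, 3, 60.0),
    PilotUser("dara", True, 0, 0, 0, 0.0),
]
print(repeat_use_rate(users), avoiders(users), possibly_superficial_review(users))
# → 0.5 ['dara'] ['ben']
```

In this invented data the less experienced users accept almost everything, which is the kind of cohort difference the questions above are meant to surface.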

8. Prove capture, not just potential

A common claim is:

"AI saves ten minutes per task across 100,000 tasks."

That is capacity potential.

Value capture requires an explicit path:

Will headcount reduce?
Will output increase?
Will queues fall?
Will service improve?
Will people move to higher-value work?
Is there demand for the additional capacity?
Is the receiving process ready?
Is the benefit owner accountable?

If nobody can answer, the benefit remains theoretical.
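The gap between potential and capture can be made explicit in arithmetic. This sketch takes the ten-minutes-across-100,000-tasks claim and counts only capacity allocated to a path that has an accountable owner and a ready receiving process; the `CapturePath` type, the path names and the shares are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CapturePath:
    name: str
    share: float  # fraction of potential capacity assigned to this path
    owner: Optional[str]  # accountable benefit owner, if any
    receiving_process_ready: bool


def potential_hours(minutes_saved_per_task, tasks):
    return minutes_saved_per_task * tasks / 60


def captured_hours(potential, paths):
    """Count only capacity on a path with an owner and a ready receiving process."""
    if sum(p.share for p in paths) > 1:
        raise ValueError("paths allocate more than 100% of the potential")
    return potential * sum(
        p.share for p in paths if p.owner and p.receiving_process_ready
    )


potential = potential_hours(10, 100_000)  # about 16,667 hours
paths = [
    CapturePath("queue reduction", 0.3, "Operations lead", True),
    CapturePath("move to higher-value work", 0.2, None, True),  # no owner
    CapturePath("increased output", 0.1, "Sales director", False),  # not ready
]
print(round(captured_hours(potential, paths)))  # → 5000
```

Of roughly 16,667 hours of potential, only 5,000 count as captured here; the rest stays theoretical until someone can answer the questions above.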

9. Decision gates

Gate 0: permission to experiment

Required:

problem worth testing
owner
baseline plan
risk boundary
small budget
decision date

Gate 1: technical viability

Required:

integration feasible
data available
minimum performance met
no disqualifying risk

Decision:

stop
redesign
continue to controlled workflow test

Gate 2: operating viability

Required:

workflow fit
repeat adoption
review burden known
support model known
production cost range

Decision:

stop
redesign
limited production

Gate 3: value evidence

Required:

outcome movement
credible attribution
benefit owner
value-capture plan
acceptable cost per successful outcome

Decision:

scale
optimise
contain
stop

Gate 4: portfolio scale

Required:

comparison with competing investments
strategic fit
vendor and sovereignty assessment
capability and risk implications
funding source
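The gate logic can be encoded so that a gate with missing evidence cannot choose a forward decision. This is a minimal sketch covering Gates 1 to 3, whose requirement and decision lists come from the text; treating "continue to controlled workflow test", "limited production" and "scale" as the options that require complete evidence is an assumption, and `permitted_decisions` is a hypothetical name.

```python
# Requirements and decision options for Gates 1-3, taken from the lists above.
GATES = {
    1: (
        ["integration feasible", "data available", "minimum performance met",
         "no disqualifying risk"],
        ["stop", "redesign", "continue to controlled workflow test"],
    ),
    2: (
        ["workflow fit", "repeat adoption", "review burden known",
         "support model known", "production cost range"],
        ["stop", "redesign", "limited production"],
    ),
    3: (
        ["outcome movement", "credible attribution", "benefit owner",
         "value-capture plan", "acceptable cost per successful outcome"],
        ["scale", "optimise", "contain", "stop"],
    ),
}

# Assumption: these options move a pilot forward and need every requirement met.
ADVANCING = {"continue to controlled workflow test", "limited production", "scale"}


def permitted_decisions(gate, evidence):
    """Return (missing requirements, decisions the gate may take)."""
    required, decisions = GATES[gate]
    missing = [r for r in required if not evidence.get(r, False)]
    if missing:
        return missing, [d for d in decisions if d not in ADVANCING]
    return missing, list(decisions)


missing, options = permitted_decisions(
    1,
    {"integration feasible": True, "data available": True,
     "minimum performance met": False, "no disqualifying risk": True},
)
print(missing, options)  # → ['minimum performance met'] ['stop', 'redesign']
```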

10. What should count as failure

Failure includes:

no outcome movement
value below hurdle rate
unacceptable review burden
cost scaling faster than benefit
adoption without workflow impact
material quality disparity
inability to capture released capacity
vendor dependency beyond risk tolerance
evidence too weak for the next funding stage

Stopping can be a successful governance outcome.
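Several of these failure conditions can be tested directly once the pilot has measured cost and benefit at two volume levels. The sketch covers four of the nine conditions; the others, such as vendor dependency beyond risk tolerance, are left out. `VolumePoint`, `failure_reasons`, the simple one-year return compared against the hurdle rate, and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VolumePoint:
    volume: int
    annual_cost: float
    annual_benefit: float


def cost_outpaces_benefit(low, high):
    """True if cost grows by a larger factor than benefit between two volumes."""
    return high.annual_cost / low.annual_cost > high.annual_benefit / low.annual_benefit


def failure_reasons(low, high, outcome_change, hurdle_rate,
                    review_minutes, max_review_minutes):
    """Return every triggered condition; an empty list means none of these fired."""
    reasons = []
    if outcome_change <= 0:
        reasons.append("no outcome movement")
    # Simplification: one-year return at production volume versus the hurdle rate.
    annual_return = (high.annual_benefit - high.annual_cost) / high.annual_cost
    if annual_return < hurdle_rate:
        reasons.append("value below hurdle rate")
    if review_minutes > max_review_minutes:
        reasons.append("unacceptable review burden")
    if cost_outpaces_benefit(low, high):
        reasons.append("cost scaling faster than benefit")
    return reasons


pilot_scale = VolumePoint(10_000, 50_000, 120_000)
production_scale = VolumePoint(100_000, 700_000, 900_000)
print(failure_reasons(pilot_scale, production_scale, outcome_change=0.12,
                      hurdle_rate=0.15, review_minutes=6, max_review_minutes=10))
# → ['cost scaling faster than benefit']
```

Here the production case still clears the hurdle, but cost grows fourteenfold against a 7.5-fold rise in benefit, so further scale would erode the return.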

11. A worked illustration

A contract-review assistant reduces first-pass review from 90 minutes to 25 minutes.

A conventional pilot declares success.

A Proof of Value asks:

Does total legal review time still fall once correction and escalation are included?
Are material clauses missed?
Does the workflow handle complex and multilingual contracts?
Does faster review increase completed deals or merely move the queue?
Can lawyers redeploy time?
What is the production cost including retrieval, audit and human review?
Who owns the benefit?
What happens at ten times the volume?
Is the model allowed to process the data in required jurisdictions?

The initial speed result remains useful. It is one input, not the conclusion.
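The first question can be answered with simple expected-value arithmetic. Only the 90- and 25-minute figures come from the example; the correction time, escalation rates and escalation effort are assumptions, with the assistant assumed to escalate more contracts (for example complex or multilingual ones).

```python
def expected_minutes(first_pass, correction, escalation_rate, escalation_minutes):
    """Expected lawyer minutes per contract, including correction and escalation."""
    return first_pass + correction + escalation_rate * escalation_minutes


# 90 and 25 come from the example; every other figure is an assumption.
manual = expected_minutes(first_pass=90, correction=0,
                          escalation_rate=0.10, escalation_minutes=60)
assisted = expected_minutes(first_pass=25, correction=15,
                            escalation_rate=0.25, escalation_minutes=60)

headline_saving = 90 - 25
net_saving = manual - assisted
print(headline_saving, round(net_saving, 1))  # → 65 41.0
```

Under these assumptions the saving is 41 minutes rather than 65: a different business case from the one the first-pass result suggests.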

Conclusion

A proof of concept demonstrates possibility.

A proof of value demonstrates enough technical, operating and economic evidence to justify the next decision.

The distinction is not administrative. It is the difference between funding learning and funding hope.

Downloadable

AI Proof of Value Scorecard: a structured template for evaluating AI pilots across feasibility, performance, adoption, cost, risk and value evidence.

Sources

[^gartner]: Gartner, "For AI Value, Focus on Your Use Cases", https://www.gartner.com/en/articles/ai-value. Accessed June 2026.

[^deloitte]: Deloitte, "The pivot to tokenomics: Navigating AI's new spend dynamics", pp. 13-19 and 25-26. Models infrastructure sourcing decisions and explicitly lists TCO exclusions.

[^nist]: NIST, "AI Risk Management Framework", https://www.nist.gov/itl/ai-risk-management-framework. Emphasises explicit human roles, responsibilities and oversight across AI systems.

[^oecd]: OECD, "The effects of generative AI on productivity, innovation and entrepreneurship", https://www.oecd.org/en/publications/the-effects-of-generative-ai-on-productivity-innovation-and-entrepreneurship_b21df222-en.html. Reviews task-specific productivity evidence and variation by context.
