The honest ROI formula
Most AI ROI calculators are fantasy. They multiply 'time saved per task' by 'tasks per day' by 'hourly rate' and report eye-watering returns. Real ROI is messier because: (1) time saved doesn't always convert to revenue or cost reduction; (2) productivity gains are often partially offset by adoption friction; (3) the savings are spread across many people, not concentrated on one role.
The honest ROI formula has four components: time savings (real, hours per week per affected role), conversion uplift (where the GPT touches the customer), risk reduction (avoided errors, faster compliance), and capacity unlock (work that wouldn't have happened without the GPT).
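The four components can be sketched as a small worked example. All input figures below are illustrative placeholders, not benchmarks from this article; the function name is our own.

```python
# Hypothetical worked example of the four-component ROI formula.
# Every dollar figure here is an illustrative placeholder.

def annual_roi(time_savings, conversion_uplift, risk_reduction,
               capacity_unlock, total_cost):
    """Year-1 ROI: total benefit across the four components vs. total cost."""
    benefit = time_savings + conversion_uplift + risk_reduction + capacity_unlock
    return (benefit - total_cost) / total_cost

roi = annual_roi(
    time_savings=120_000,      # real hours saved, priced at a blended rate
    conversion_uplift=80_000,  # lift where the GPT touches the customer
    risk_reduction=30_000,     # avoided errors, faster compliance
    capacity_unlock=20_000,    # work that otherwise wouldn't have happened
    total_cost=50_000,
)
print(f"{roi:.1f}x")  # (250,000 - 50,000) / 50,000 = 4.0x
```

The point of splitting the benefit into four lines is that each one forces a separate, defensible estimate instead of a single blended guess.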
Real benchmarks by use case
Customer service GPT
$200,000–$800,000/year savings for a 30-agent team. Typical ROI: 4–8x in year 1. Driver: ticket deflection (60–80%) + handle time reduction on escalated tickets (30–40%).
Sales enablement GPT
$300,000–$2,000,000/year revenue impact for a 15-rep AE team. Typical ROI: 6–15x in year 1. Driver: 15–25% lift in opportunity-to-close conversion + faster ramp on new hires.
Internal knowledge GPT
$150,000–$500,000/year productivity savings for a 200-person company. Typical ROI: 3–6x in year 1. Driver: 4–8 hours/week per knowledge worker on 'how do I' questions.
Document analysis GPT
$100,000–$600,000/year for legal/compliance teams. Typical ROI: 4–10x in year 1. Driver: contract review time reduction (3x faster) + reduced senior-lawyer review hours.
Where ROI calculations go wrong
Five places ROI math is consistently overstated, in our experience reviewing 60+ business cases:
- Time saved ≠ value created. Saving an hour of an engineer's time only converts to revenue if that hour goes to billable or revenue-driving work. If they spend it in extra meetings or browsing Reddit, the saved hour is real but the dollar impact is zero.
- Adoption rates are aspirational. Year 1 adoption is typically 40–60% of the eligible user base, not 100%. Bake that into the model.
- Inference cost compounds. A 10x increase in usage means a 10x increase in inference spend. Calculate the bill at full adoption, not at pilot scale.
- Implementation time isn't free. Your team's time on the project (workshops, data prep, change management) is real cost. A $50k build often has $30k of internal time alongside it.
- The 'do nothing' counterfactual matters. Without the GPT, would your team have hired? Would your tickets have grown? The ROI is measured against the realistic alternative, not against zero.
A grounded ROI worksheet
For an internal knowledge GPT in a 200-person company, here's the calculation we use:
- Inputs: 200 employees × 60% adoption × 1.5 hours/week saved × $85/hour blended rate × 48 weeks/year × 60% conversion-to-real-value = $440,640/year benefit
- Costs: $40,000 build + $36,000/year operations + $24,000/year inference = $76,000 year 1, $60,000/year ongoing
- Year 1 net benefit: $440,640 − $76,000 = $364,640
- Year 1 ROI: $364,640 ÷ $76,000 = 4.8x
- 3-year cumulative ROI: ($440,640 × 3 − $196,000) ÷ $196,000 = 5.7x, where $196,000 = $76,000 in year 1 plus 2 × $60,000 ongoing
How to make the ROI more credible in your business case
Three things to do before presenting any AI business case internally: (1) Run the math with conservative inputs — 40% adoption, 50% conversion-to-real-value, 30% time savings instead of 50%. If it still pencils out, you have a real business case. (2) Identify the specific revenue or cost line item the savings hit. 'Productivity improvement' is too vague to commit to. (3) Define a measurable post-deployment metric you'll report on at month 6, month 12, month 24.
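One quick way to run step (1) is to rebuild the worksheet with the haircut inputs and see whether the ROI survives. The base figures below are the internal-knowledge-GPT worksheet from earlier; the haircuts applied are the adoption and conversion-to-real-value cuts listed above.

```python
def year1_roi(employees, adoption, hours_per_week, rate,
              weeks, conversion, year1_cost):
    """Year-1 ROI for a time-savings business case."""
    benefit = employees * adoption * hours_per_week * rate * weeks * conversion
    return (benefit - year1_cost) / year1_cost

# Base case: the internal knowledge GPT worksheet.
base = year1_roi(200, 0.60, 1.5, 85, 48, 0.60, 76_000)

# Conservative case: 40% adoption, 50% conversion-to-real-value.
conservative = year1_roi(200, 0.40, 1.5, 85, 48, 0.50, 76_000)

print(f"base {base:.1f}x, conservative {conservative:.1f}x")
```

Here the conservative run lands around 2.2x against the 4.8x base case, which is exactly the kind of result the stress test exists to surface: a signal to tighten scope or cost before the case goes in front of anyone.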
The honest ROI conversation. Custom GPT projects with conservative business cases (3–5x year 1 ROI) tend to deliver. Projects with aggressive business cases (15–25x year 1 ROI) tend to disappoint. We'd rather build for the conservative case and overdeliver than promise unrealistic numbers and underdeliver. The pattern is consistent.
Frequently asked questions
What if the ROI doesn't materialise?
Common outcomes when ROI underwhelms: (1) adoption was lower than planned — usually fixable with better change management; (2) the use case was too narrow to move the needle; (3) the data wasn't ready and the GPT's accuracy suffered. We do quarterly business reviews on every engagement specifically to catch these early.
How do you measure adoption?
Daily active users / total eligible users, plus query volume per user, plus 'helpful'/'not helpful' feedback rates. Healthy adoption is 50–70% DAU/total within 90 days. Below 30% is a red flag; below 15% is a project failure.
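Those thresholds are simple to encode as a health check. This is a sketch under the thresholds stated above; the label for the 30–50% band is our own, and the function name is illustrative.

```python
def adoption_health(daily_active_users, eligible_users):
    """Classify DAU / eligible users against the article's thresholds."""
    ratio = daily_active_users / eligible_users
    if ratio < 0.15:
        return ratio, "project failure"
    if ratio < 0.30:
        return ratio, "red flag"
    if ratio < 0.50:
        return ratio, "below target"  # our label for the unnamed 30-50% band
    return ratio, "healthy"

print(adoption_health(110, 200))  # → (0.55, 'healthy')
```

Pair this with per-user query volume and 'helpful'/'not helpful' rates, since a high DAU ratio with trivial query volume is still weak adoption.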
Can we run a pilot to test ROI before committing?
Strongly recommended. We typically run 6–8 week pilots scoped to one team, one use case, with explicit success metrics defined upfront. Pilots that hit the metrics convert to full deployments; pilots that miss surface what to change. Better to spend $25k on a clear-eyed pilot than $100k on a hopeful full build.
How long until we know if it's working?
First signal at week 4 (early adopters using it). Real signal at month 3 (broader adoption, measurable metric movement). Confidence at month 6. Long-term ROI clear at month 12. Don't make decisions at month 1 or 2 — too early.
Ready to build your custom GPT?
Get a free 30-minute scoping call. We'll map your use case, data sources, and ROI before you commit.
Start the Conversation