| What compresses | Typical reduction |
| --- | --- |
| Tool outputs, logs & API responses | up to ~90% |
| Code & search results | up to ~90% |
| RAG chunks & long documents | 40–75% |
| Conversation history | 50–80% |
Illustrative: a 60,000-token incident log shrinks to roughly 5,000 tokens before the model ever reads it, and the answer stays the same. After compression, task accuracy holds flat on standard math and factual benchmarks.
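To make the arithmetic behind that illustration concrete, here is a minimal sketch of how a reduction percentage can be computed from before and after token counts. The function name `reduction_percent` is hypothetical and is not part of any OptScale API.

```python
def reduction_percent(tokens_before: int, tokens_after: int) -> float:
    """Percent of tokens removed, given counts before and after compression.

    Hypothetical helper for illustration only.
    """
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    return (1 - tokens_after / tokens_before) * 100


# The illustrative incident log: 60,000 tokens in, roughly 5,000 tokens out.
print(f"{reduction_percent(60_000, 5_000):.1f}%")
```

For the illustrative log this comes out at a little over 90%, which sits at the top of the range quoted for tool outputs and logs.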
Measured, not guessed
Output-side savings are counterfactual: we never see what the model would have written without shaping. So OptScale reports an honest estimate with a confidence range rather than a round marketing number:
Reduction: 31.7% (95% CI 27.7% … 35.7%)
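One way an interval like this could be produced is a normal approximation over per-request reduction estimates. The sketch below is only an assumption about the general technique, not OptScale's actual estimator; the names `reduction_with_ci` and `format_report` and the sample values are hypothetical.

```python
import math
import statistics


def reduction_with_ci(per_request_reductions, z=1.96):
    """Return (mean, low, high) in percent.

    Uses a normal approximation: mean +/- z * standard error.
    z=1.96 corresponds to a 95% interval. Assumed method, for illustration.
    """
    n = len(per_request_reductions)
    if n < 2:
        raise ValueError("need at least two samples to estimate a spread")
    mean = statistics.fmean(per_request_reductions)
    stderr = statistics.stdev(per_request_reductions) / math.sqrt(n)
    return mean, mean - z * stderr, mean + z * stderr


def format_report(mean, low, high):
    """Render the estimate in the same shape as the report line above."""
    return f"Reduction: {mean:.1f}% (95% CI {low:.1f}% … {high:.1f}%)"


# Hypothetical per-request estimates, in percent.
samples = [30.0, 32.0, 34.0, 28.0]
print(format_report(*reduction_with_ci(samples)))
```

The interval narrows as more requests are observed, because the standard error shrinks with the square root of the sample count.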
Want a measured figure instead? Hold out a slice of traffic as an unshaped control group, and the dashboard shows tokens saved side-by-side, labelled measured vs estimated.
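A holdout like this can be sketched in a few lines. The example below assumes a hash-based assignment so that each request lands in the same group every time, then compares mean tokens per request between the two groups. The function names, the 5% default, and the token counts are all hypothetical and do not describe OptScale's implementation.

```python
import hashlib
import statistics


def in_control_group(request_id: str, holdout_fraction: float = 0.05) -> bool:
    """Deterministically place a stable fraction of requests in the unshaped control group.

    Hashing the request ID means the same request always gets the same assignment.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return bucket < holdout_fraction * 2**64


def measured_reduction(control_tokens, shaped_tokens):
    """Percent reduction in mean tokens per request, shaped traffic vs. control."""
    control_mean = statistics.fmean(control_tokens)
    shaped_mean = statistics.fmean(shaped_tokens)
    return (1 - shaped_mean / control_mean) * 100


# Hypothetical token counts per request for each group.
control = [1000, 1200, 800]
shaped = [700, 650, 750]
print(f"measured reduction: {measured_reduction(control, shaped):.1f}%")
```

Because the control group is never shaped, the difference between the two means is an observed quantity rather than a counterfactual estimate, which is why it can be labelled measured.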
This is the same discipline behind our analytics pillar: quality and savings are measured after the fact with an audit trail, not predicted before it.