Introduction#

What is OptScale AI#

OptScale AI is a unified AI operations platform—AI Gateway, FinOps, profiling, and observability in one place—for teams that manage LLM access, monitor token spend, enforce governance, and run model training workflows. Administrators configure providers, policies, MCP servers, and virtual keys in the Admin UI; practitioners use Chat, APIs, and external tools such as OpenCode and Cursor for day-to-day conversations and automation. IDE and agent clients use the same OpenAI-compatible LLM proxy and organization virtual keys as Chat, so traffic stays under the same policies, guardrails, and observability. Dashboards, traces, and usage views cover both inference traffic and profiled training runs.

For platform structure, navigation modules, and the governed request path through policies and providers, see Architecture Overview. To configure your first organization and provider, see First Steps; for Admin UI and Model Training screen detail, see Core Services and Model Training. To connect OpenAI-compatible clients, copy connection values from Chat and follow External Tools.

Key capabilities#

AI Gateway — Connect external AI vendors, expose a unified provider catalog, and route traffic with load balancing, fallbacks, and routing strategies.
FinOps and usage analytics — Track token consumption, provider costs, and provider usage from dashboards; compare spend across providers and teams; measure optimization savings on the Optimizations page—see Core Services — Optimizations.
Governance — Apply organization-wide policies (timeouts, sampling, request filtering) and reusable guardrails for safety, compliance, and prompt restrictions.
Virtual keys — Issue scoped API keys for applications and teams, with usage limits, isolation, and rotation.
MCP integrations — Register Model Context Protocol servers, manage transport and authentication, and assign access by team.
Observability — Inspect request traces, analyze errors, and debug multi-step AI workflows.
Model Training — Register models and datasets, track tasks and artifacts, book shared environments, and connect cloud accounts and CI integrations—see Model Training.
Chat — Interactive workspace for multi-provider conversations, attachments, voice, web access, and chat history.
External tools — Connect OpenCode, Cursor, and other OpenAI-compatible clients to the governed LLM proxy with virtual keys from AI Access; copy Base URL, API key, and Model name from Chat so IDE and agent traffic uses the same providers, policies, and Traces as the rest of the organization—see External Tools.

Core concepts#

Table 1: OptScale AI core concepts glossary — organization, providers, routing, governance, and observability
Concept	Description
Organization	Top-level tenant that owns providers, policies, teams, usage data, and billing context.
Team	Group within an organization used for access control and resource isolation.
Provider	A registered AI endpoint (vendor integration plus callable identity such as `openai/gpt-4o`), including credentials, health status, limits, and use in routing and Chat.
Router	Configuration that directs requests to providers using routing strategies, load balancing, and fallback rules.
Virtual key	API credential scoped to an organization or team, used by apps and integrations instead of raw provider keys.
Policy	Operational rule applied to AI requests (filtering, timeouts, sampling, enforcement stage).
Guardrail	Reusable safety or compliance control linked to policies. See Guardrail types and Guardrail thresholds for available engines and tuning guidance.
MCP server	External tool or data source exposed through the Model Context Protocol.
Trace	Record of a single AI request lifecycle for debugging and observability.

Supported AI providers#

OptScale AI integrates with multiple commercial and self-hosted vendor families. Supported integrations include:

OpenAI — GPT family endpoints
Anthropic — Claude endpoints
Google — Gemini and related Google AI endpoints
Cohere
Mistral
Ollama — Self-hosted open-source endpoints

The exact providers available in your deployment depend on which vendor connections you register. Administrators add and enable providers on the Providers page; end users then select from the enabled catalog in Chat or via API.

Context compression#

AI requests often include large amounts of context, which increases token consumption and inference costs. OptScale AI provides context compression to reduce the number of tokens sent to AI providers while preserving the information required to maintain the conversation context.

Context compression can be enabled at the following levels:

Users — AI Access → Users
Teams — AI Access → Teams
Agents

To enable optimization for a user, team, or agent, turn on the Enable context compression setting on the corresponding configuration page.

For information about enabling context compression for users and teams, see Enable context compression.

For how the compression engine processes requests, see Compression workflow. To measure savings from compression and other optimizations, see Optimizations.

Main use cases#

Centralize AI access for the organization
Replace scattered vendor API keys with virtual keys, unified routing, and a single provider catalog so teams share governed access instead of ad hoc credentials.

Control cost and usage
Monitor token volume and spend by team and provider; use dashboards and usage views to find inefficiencies and enforce budgets.

Enforce governance and compliance
Apply policies and guardrails so requests respect timeouts, content rules, and safety requirements before they reach external providers.

Operate multi-provider AI workloads
Route traffic across providers with fallbacks and load balancing; compare provider performance and cost without rewriting client applications.

Extend assistants with approved tools
Connect MCP servers so assistants can use organization-approved tools and external context.

Debug and improve AI workflows
Use traces and request lifecycle views to troubleshoot failures, latency, and unexpected provider behavior.

Collaborate through Chat
Give practitioners a shared chat workspace with history, provider switching, attachments, and organization-scoped access.

Connect OpenCode and Cursor to OptScale AI
Route IDE and agent clients through the governed LLM proxy with virtual keys—see Connect OpenCode and Connect Cursor.

Migrate existing LLM applications
Point OpenAI-compatible SDKs and HTTP clients at OptScale AI Gateway by updating base URL, API key, and model name—see Switch to OptScale AI Gateway.

Migrate existing AI conversations
Export history from ChatGPT, Claude, or Gemini and import it into OptScale AI Chat so teams keep context under organization governance—see Use Cases — Migrate chat history.

Track ML experiments and artifacts
Version datasets, models, tasks, and experiment outputs, and manage shared environments and cloud connections—see Model Training.