Core Services

These pages cover the AI Gateway, access control, and FinOps areas in the Admin UI sidebar above MODEL TRAINING. For AIOps pages, see Model Training overview, Experiment Tracking, and Environments & Operations.

Home / Dashboards

Dashboards is the default landing page in the Admin UI. It gives operators a visual overview of spend, token volume, provider activity, and successful versus failed requests over time, without opening Usage and Cost or Traces for every check.

Page layout and controls

  • Title: Dashboards at the top of the main area.
  • View tabs — Switch between personal and shared dashboards:
    • MY — Dashboards you own or that are shared with you (often shown with a share icon).
    • DEFAULT — Organization-wide or system-provided dashboard views.
  • Actions (top right):
    • REFRESH — Reload widget data; may include a dropdown (for example auto-refresh OFF / on an interval).
    • PIN DASHBOARD — Pin the current dashboard for quick access.
    • + CREATE DASHBOARD — Create a custom dashboard layout.
    • EDIT — Change which widgets appear and how they are arranged (pencil icon).

Dashboard widgets

The DEFAULT (and custom) dashboards typically show time-series charts on a shared date axis (for example daily buckets from early May through mid-May). Common widgets include:

Cost

  • Daily spend in USD on the vertical axis, dates on the horizontal axis.
  • A multi-series line chart broken down by provider (each line is a registered provider identifier), for example llama3:latest, gpt-4o, claude-3-mini, and gpt-5-nano.
  • Use this chart to compare vendor or endpoint cost trends and spot spikes.

Token usage

  • Total Input vs Output token volume over time (separate series, often blue and yellow in the legend).
  • Vertical scale is usually token counts (for example thousands per day).
  • Use this chart to see whether growth is driven by prompts, completions, or both.

Provider usage

  • Request or traffic volume per provider over the same period (one line per provider identifier, aligned with the Cost chart legend).
  • Vertical scale is typically request or usage counts (for example thousands per day).
  • Use this chart to correlate load with cost and to see which providers carry the most traffic.

Request volume

  • Total request count over time, split into Successful requests and Failed requests (separate series, often blue and yellow in the legend).
  • Vertical scale is usually raw request counts (for example 0–300 per day on the same date axis as other widgets).
  • Use this chart to spot error spikes or days when failures rise relative to success, then drill into Traces for details.
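Raw counts on the Request volume chart can hide a problem on a low-traffic day, so it often helps to compare failures against each day's total. A minimal Python sketch, assuming you have noted the daily Successful and Failed values from the chart; the DailyRequests type and the 10% threshold are illustrative choices, not OptScale AI settings:

```python
from dataclasses import dataclass


@dataclass
class DailyRequests:
    day: str          # date label from the shared axis (hypothetical values below)
    successful: int   # value of the Successful requests series
    failed: int       # value of the Failed requests series


def failure_rate(point: DailyRequests) -> float:
    """Share of the day's requests that failed (0.0 when there was no traffic)."""
    total = point.successful + point.failed
    return point.failed / total if total else 0.0


def flag_error_spikes(series: list[DailyRequests], threshold: float = 0.10) -> list[str]:
    """Days whose failure rate is above the illustrative threshold."""
    return [p.day for p in series if failure_rate(p) > threshold]


sample = [
    DailyRequests("2024-05-10", successful=240, failed=6),
    DailyRequests("2024-05-11", successful=180, failed=45),
    DailyRequests("2024-05-12", successful=0, failed=0),
]
print(flag_error_spikes(sample))  # ['2024-05-11']
```

A flagged day is a cue to filter Traces to that date and inspect the failed requests.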

Widgets share a consistent time range so you can compare cost, tokens, provider usage, and request outcomes for the same interval. After changing providers or routing in Providers, allow a refresh cycle before expecting charts to reflect new traffic patterns.

AI Access

The AI Access page provides centralized management of organization users, roles, and access settings. Invite users, review role assignments, and control provider restrictions for organization members.

See AI Access principles.

Page overview

The AI Access page provides several user and access management controls.

Table 1: AI Access page controls and actions
Control | Purpose
INVITE | Open the flow to add users to the organization.
DOWNLOAD | Export the current user list; see Export user data for format options.
Search | Filter the table by user name, email, or other visible fields (magnifying glass on the right).

The main table displays organization members and their access configuration. Each row contains the user display name, unique user identifier, email address, assigned organization roles, and last login information. The table also shows assigned virtual keys with copy-to-clipboard controls, associated team membership, restricted provider access settings, and available row actions.

Below the table, a status line shows counts such as Total: 4 and Unassigned: 0 so you can see how many users are in the organization and how many lack role or key assignment.

Export user data

Click DOWNLOAD to open a dropdown and choose an export format:

  • XLSX spreadsheet — Download the user list as an Excel-compatible file for reporting or offline review.
  • JSON file — Download the same data as JSON for automation, backups, or integration with other tools.

The export reflects the users currently visible in the table (respecting any active Search filter).
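Because the JSON export is meant for automation, a short script can answer questions the table does not, such as which members still lack a role. The layout of the exported file is not documented on this page, so this Python sketch assumes a top-level array of user objects with hypothetical email and roles keys; check a real download and adjust the key names before relying on it.

```python
import json

# Hypothetical export content. The real key names in the OptScale AI JSON
# export may differ; inspect a downloaded file and adjust them.
EXPORT = """
[
  {"name": "Ana", "email": "ana@example.com", "roles": ["Manager"]},
  {"name": "Bo", "email": "bo@example.com", "roles": []}
]
"""


def load_export(text: str) -> list[dict]:
    """Parse the export, assuming a top-level JSON array of user objects."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of users")
    return data


def users_without_roles(users: list[dict]) -> list[str]:
    """Emails of users whose (assumed) roles list is empty or missing."""
    return [u.get("email", "") for u in users if not u.get("roles")]


users = load_export(EXPORT)
print(f"Total: {len(users)}")      # Total: 2
print(users_without_roles(users))  # ['bo@example.com']
```

For a real export, pass the downloaded file's contents to load_export instead of the inline sample.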

Invite users

To invite new users to the organization:

1. Go to AI Access and click INVITE.

2. On the Invite Users page, enter one or more email addresses and assign roles:

  • Email: enter one or more emails to invite users to your organization.
  • Add role: choose a role.

3. Click Invite.

Allowed providers

Allowed providers control which registered provider connections a user can use. Registering a provider under Providers makes it available to the organization; assigning allowed providers decides which of those connections appear for a specific member in AI Chat and related access paths (including virtual keys scoped to that user).

When to use it

  • After you add or activate providers, before users open AI Chat for the first time.
  • When different teams or roles should use different vendor endpoints (for example only ollama for developers, only managed cloud APIs for analysts).
  • When troubleshooting “provider not in dropdown” reports, confirm that the provider is Active on the Providers page and assigned in Allowed providers for that user.

Configure allowed providers

  1. Open AI Access and find the user row (use Search if needed).
  2. In the Allowed providers column, open the edit flow for that user.
  3. Select one or more Active providers from the organization catalog.
  4. Save the assignment.

Repeat for each member who needs access. Inviting a user (Invite users) does not assign providers automatically—you configure Allowed providers separately after the account exists in the table.

During initial setup, this step follows Add first provider in First Steps.

Expected behavior

  • AI Chat — The provider selector shows only providers assigned to the signed-in user (among those that are Active).
  • No assignment — If Allowed providers is empty for a user, the provider selector in AI Chat shows no providers for them, even when the organization's providers are Active and healthy.
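The two rules above amount to a filter: a provider appears only if it is Active and in the member's Allowed providers. The Python sketch below models that documented behavior; Provider and visible_in_chat are hypothetical names, not part of an OptScale AI API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    identifier: str  # provider-qualified identifier, e.g. "openai/gpt-4o"
    active: bool     # Active state on the Providers page


def visible_in_chat(catalog: list[Provider], allowed: set[str]) -> list[str]:
    """Providers a member sees: Active in the catalog AND in Allowed providers."""
    return [p.identifier for p in catalog if p.active and p.identifier in allowed]


catalog = [
    Provider("openai/gpt-4o", active=True),
    Provider("anthropic/claude-3-sonnet", active=False),
    Provider("llama3:latest", active=True),
]

print(visible_in_chat(catalog, {"llama3:latest", "anthropic/claude-3-sonnet"}))
# ['llama3:latest']  (the inactive provider stays hidden)
print(visible_in_chat(catalog, set()))
# []  (no assignment means an empty selector)
```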

Providers

The Providers area is where administrators connect external AI services and decide how traffic reaches them. The UI is split into two tabs that answer different questions: Providers shows which endpoints are registered, and Routers defines how requests choose among them.

Providers list

The Providers tab lists callable provider endpoints that OptScale AI can use after you configure vendor credentials. Each row is one selectable target (for example for Chat, APIs, or routing rules), often shown as a provider-qualified identifier such as openai/gpt-4o or anthropic/claude-3-haiku.

Typical entries include:

  • openai/gpt-4o
  • anthropic/claude-3-sonnet
  • google/gemini-1.5-pro
  • llama3:latest (for example via Ollama or another compatible backend)

The table usually surfaces operational metadata so you can see integration health at a glance:

  • Provider identifier and display name
  • Health or connectivity status
  • How credentials are supplied (for example API key vs OAuth, depending on your setup)
  • Audit-style fields such as creation time and who configured the entry

Provider detail page

Selecting a provider opens its detail view. Use EDIT to change configuration and REFRESH to rerun checks and reload metrics.

Along the top, summary cards give a live snapshot:

  • Health status — Current probe result (for example Healthy), usually color-coded
  • Last health check — Timestamp of the latest check
  • Requests (24h) — Request volume over the rolling window
  • Success / Failed — Completed vs failed calls in that window
  • Spend (24h) — Estimated cost for the period
  • Provider — Vendor key or slug (copyable)
  • Input cost / Output cost — Rates per million tokens where pricing is configured that way
  • Created at / Created by — Audit metadata for the registration
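When pricing is configured per million tokens, the estimated cost of a request follows directly from its token counts. A minimal sketch with hypothetical rates; the platform's own estimate may also account for fields such as cost per second, so treat this as a sanity check for Spend (24h) rather than a reproduction of it:

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_cost_per_m: float, output_cost_per_m: float) -> float:
    """Estimate request cost when rates are expressed per million tokens."""
    return (input_tokens * input_cost_per_m + output_tokens * output_cost_per_m) / 1_000_000


# Hypothetical rates: $2.50 per 1M input tokens, $10.00 per 1M output tokens.
cost = estimate_cost_usd(12_000, 3_000, 2.50, 10.00)
print(f"${cost:.4f}")  # $0.0600
```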

The main column lists the configured properties for that endpoint, including:

  • Name and internal ID (UUID), with copy helpers where offered
  • Pricing fields — Input and output cost, optional cost per second when applicable
  • Vendor and API base — Upstream endpoint (for example https://api.anthropic.com)
  • Organization / Team ID — Scope when your deployment ties providers to org or team rows
  • Throughput limits: TPM (tokens per minute) and RPM (requests per minute)
  • Reliability: Max retries, Timeout, and Stream timeout for calls to this provider
  • Access and safety: Provider access groups and Tags for discovery or routing
  • Health check provider — Which logical provider name is probed for status (often the same as the row)
  • Cache control — Caching behavior when exposed by your stack
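TPM and RPM cap a provider over the same one-minute window, so a request can be refused for either reason. The sliding-window class below illustrates how the two limits interact, assuming a rolling 60-second window; it is not how OptScale AI enforces limits, and MinuteWindow is a hypothetical name.

```python
from collections import deque


class MinuteWindow:
    """Sliding 60-second window checking requests against RPM and TPM limits.

    Illustration only: shows what the two limits mean together, not how
    OptScale AI enforces them.
    """

    def __init__(self, rpm: int, tpm: int) -> None:
        self.rpm = rpm
        self.tpm = tpm
        self.events: deque[tuple[float, int]] = deque()  # (timestamp_s, tokens)

    def _evict(self, now: float) -> None:
        # Forget requests that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()

    def try_admit(self, now: float, tokens: int) -> bool:
        """Record and admit the request if both limits still hold, else reject."""
        self._evict(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True


window = MinuteWindow(rpm=3, tpm=1_000)
print(window.try_admit(0.0, 400))   # True
print(window.try_admit(1.0, 500))   # True
print(window.try_admit(2.0, 200))   # False: 1,100 tokens would exceed TPM
print(window.try_admit(61.0, 200))  # True: both earlier requests have expired
```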

On the side, JSON preview panels show the raw configuration—for example a compact provider metadata block and LLM params mirroring integration settings (api_base, vendor driver, per-token costs, feature flags, canonical provider string, tags, and related fields). Legacy JSON keys may still use names such as model or model_id. Each block can usually be copied for support tickets or infrastructure-as-code workflows.

Use the Providers tab when you need:

  • Explicit selection — Users or integrations pick one known provider (predictable behavior, comparative testing, or mandated vendor policies).
  • Direct vendor access — Teams consume a provider connection without an intermediate routing policy.
  • Per-provider observability — You monitor availability and errors per endpoint rather than only at the router layer.

Policies + Guardrails

Policies + Guardrails is where administrators define how AI traffic is evaluated and which reusable safety controls apply. Policies are rule sets with conditions, evaluation scope, and linked guardrails. Guardrails are standalone controls that you link to policies.

For common patterns, limitations, and an end-to-end example, see Architecture Overview — Policies and guardrails.

List page

The list page provides an overview of organizational governance configuration and allows switching between policies and reusable guardrails.

Summary cards at the top of the page display the number of active policies and reusable guardrails available in the organization.

Use the POLICIES tab to manage organization policies and the GUARDRAILS tab to manage reusable guardrail definitions.

Use + ADD to create a new policy or guardrail, depending on the selected tab. Use Search to filter the current table by name or other visible fields.

Policies table

Each row represents one policy. Typical columns:

Table 2: OptScale AI policies list — column reference
Column | Meaning
Name | Display name with status icon (for example green when active), plus policy UUID and copy control
Description | Short operator-facing summary
Stage | When evaluation runs (for example Output)
Sampling rate | Share of traffic evaluated (for example 100%)
Timeout | Maximum evaluation time (for example 1500 ms)
Request type | API surface governed (for example chat_completion)
Linked guardrails | Count of guardrails attached to this policy
Actions | Edit (pencil) and Delete (trash)

A footer line shows totals (for example Total: 1 Displayed: 1). Click a policy name to open its detail page.
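The Sampling rate column says what share of traffic a policy evaluates but not how requests are chosen. One common technique, assumed here and not confirmed for OptScale AI, is hash-based sampling: each request ID maps to a stable bucket, so the same request always gets the same decision. in_sample is a hypothetical helper.

```python
import hashlib


def in_sample(request_id: str, sampling_rate: float) -> bool:
    """Deterministically decide whether a request falls inside the sampled share.

    Hashing the ID maps it to a stable bucket in [0, 1); the request is
    evaluated when its bucket is below the rate (1.0 = 100%, 0.25 = 25%).
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sampling_rate


ids = [f"req-{i}" for i in range(10_000)]
share = sum(in_sample(r, 0.25) for r in ids) / len(ids)
print(f"{share:.2f}")  # roughly 0.25
```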

Guardrails table

Each row represents one reusable guardrail. Typical columns:

Table 3: OptScale AI guardrails list — column reference
Column | Meaning
Name | Guardrail identifier (link to the detail page, for example guardrail1)
Type | Guardrail engine; see Guardrail types for the full list
Stage | When evaluation runs (for example Input)
Description | Short operator-facing summary
Total invocations | How many times the guardrail has executed
Policies using it | Count of policies that reference this guardrail
Violation rate | Share of invocations that triggered a violation (may show - when there is no data)
Created at | Timestamp when the guardrail was created
Actions | Edit (pencil) and Delete (trash)

Footer shows Total and Displayed counts. Choose + ADD to create a guardrail (see First Steps — Add guardrail). For Type and Threshold reference, see Architecture Overview — Guardrail types and Guardrail thresholds.

Policy detail page

An active policy often shows a green status icon beside the title. Use EDIT (top right) to change configuration.

Summary cards (24-hour snapshot):

  • Requests evaluated (24h) — How many requests ran through this policy in the window
  • Violation rate (24h) — Share of evaluations that violated the policy
  • Bypass rate (24h) — Share that bypassed evaluation
  • Match rate (24h) — Share that matched policy conditions
  • Avg latency — Mean time to evaluate the policy
  • Linked guardrails — Number of guardrails linked to this policy

Values may show 0 or - when there has been little or no traffic yet.
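The rate cards are ratios over the window's evaluations, which is why they show - instead of 0% before any traffic arrives. This sketch assumes every rate shares the same denominator (requests evaluated in the window); the Evaluation record and the formatting are hypothetical, and the real cards may define bypass differently.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    matched: bool       # request met the policy conditions
    violated: bool      # evaluation reported a violation
    bypassed: bool      # evaluation was bypassed
    latency_ms: float   # time spent evaluating


def fmt_rate(count: int, total: int) -> str:
    """Percentage string, or "-" when the window has no evaluations."""
    return f"{100 * count / total:.1f}%" if total else "-"


def summary_cards(evals: list[Evaluation]) -> dict[str, str]:
    """Build card values, assuming every rate uses the same denominator."""
    n = len(evals)
    avg = f"{sum(e.latency_ms for e in evals) / n:.0f} ms" if n else "-"
    return {
        "Requests evaluated": str(n),
        "Violation rate": fmt_rate(sum(e.violated for e in evals), n),
        "Bypass rate": fmt_rate(sum(e.bypassed for e in evals), n),
        "Match rate": fmt_rate(sum(e.matched for e in evals), n),
        "Avg latency": avg,
    }


print(summary_cards([])["Violation rate"])  # -
cards = summary_cards([
    Evaluation(matched=True, violated=True, bypassed=False, latency_ms=120.0),
    Evaluation(matched=True, violated=False, bypassed=False, latency_ms=80.0),
])
print(cards["Violation rate"], cards["Match rate"], cards["Avg latency"])  # 50.0% 100.0% 100 ms
```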

Policy overview

General metadata and rule logic for the policy.

  • Description — Free-text purpose (for example test policy).
  • Summary: Policy ID, Organization ID, Policy name, State (for example Enabled badge), Created at.
  • Evaluation scope: Stage, Sampling rate, Timeout, Request type (for example Chat Completion), Pass on timeout (whether requests continue if evaluation times out), Linked guardrails count.
  • Conditions — When the policy applies; often shown as an expression (for example request_type == "text_completion") or built from rules in the editor.
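Timeout and Pass on timeout together decide what happens when a policy check is slow. The asyncio sketch below shows that decision under stated assumptions: the check functions, the "block" outcome, and evaluate_with_timeout are hypothetical; only the Timeout and Pass on timeout semantics come from the fields above.

```python
import asyncio
from collections.abc import Awaitable, Callable

Check = Callable[[str], Awaitable[bool]]  # returns True when the payload violates


async def evaluate_with_timeout(check: Check, payload: str,
                                timeout_ms: int, pass_on_timeout: bool) -> str:
    """Run a hypothetical policy check under the policy Timeout.

    Returns "violation" or "pass" when the check finishes in time. On timeout,
    Pass on timeout decides: "pass" lets the request continue, "block" stops it.
    """
    try:
        violated = await asyncio.wait_for(check(payload), timeout=timeout_ms / 1000)
    except asyncio.TimeoutError:
        return "pass" if pass_on_timeout else "block"
    return "violation" if violated else "pass"


async def fast_check(payload: str) -> bool:
    return "secret" in payload


async def slow_check(payload: str) -> bool:
    await asyncio.sleep(2)  # slower than the timeouts used below
    return False


async def main() -> None:
    print(await evaluate_with_timeout(fast_check, "my secret", 1500, True))  # violation
    print(await evaluate_with_timeout(slow_check, "hello", 100, True))       # pass
    print(await evaluate_with_timeout(slow_check, "hello", 100, False))      # block


asyncio.run(main())
```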

Linked guardrails tab

Table of guardrails attached to this policy. Typical columns:

Table 4: OptScale AI policy detail — linked guardrails column reference
Column | Meaning
Name | Guardrail identifier (link to the guardrail definition)
Type | Guardrail engine; see Guardrail types for the full list
Stage | When the guardrail runs (for example Input)
Configuration | Threshold, Policy action (for example Redact), PII fields, Custom patterns

Footer shows Total row count. Manage links from here or from the guardrail side on the GUARDRAILS tab.

Usage statistics tab

Operational metrics and breakdowns for the policy over time.

  • Aggregated metrics — Evaluations (total), requests evaluated (24h), match / violation / bypass rates (24h and total), Avg latency, Latency p50, Latency p95 (values appear when traffic exists).
  • Violations by day — Time-series chart of violations (may show No data until evaluations occur).
  • Top scopes — Where activity concentrates, grouped by dimensions such as Request type, Provider, Model (UI label; often a specific provider endpoint), Vector store, Team, User role.
  • Violations by type — Table of outcomes (for example Score, Timeout, Skipped).
  • Guardrail invocations — How often linked guardrails fired (may show No data until traffic exists).
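Latency p50 and p95 are percentiles of evaluation time, and p95 surfaces slow outliers that Avg latency smooths over. The sketch uses the nearest-rank definition, which is one common choice; the UI may compute percentiles with interpolation, so small differences are expected. The sample latencies are hypothetical.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no latency samples yet (the UI would show No data)")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


latencies_ms = [12, 15, 11, 40, 13, 14, 220, 16, 12, 18]
print(percentile(latencies_ms, 50))         # 14
print(percentile(latencies_ms, 95))         # 220
print(sum(latencies_ms) / len(latencies_ms))  # 37.1
```

Here the mean (37.1 ms) sits far above the median (14 ms) because of one slow evaluation, which p95 exposes directly.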

Use Usage statistics after rollout to confirm sampling, latency, and violation trends before tightening conditions.

Guardrail detail page

Use EDIT (top right) to change the guardrail definition.

Summary cards

  • Policies using it — How many policies link this guardrail (for example 0 when unused)
  • Total invocations — Lifetime or windowed execution count (for example 0 before traffic)
  • Violation rate — Share of invocations that violated the guardrail (may show - when there is no data)

Description

Full-width text summarizing the guardrail (for example guardrail1 - test guardrail).

Overview

  • Name — Display identifier (for example guardrail1)
  • Type — Guardrail engine; see Guardrail types for available options
  • Stage — When the guardrail runs (for example Input, before the provider responds)
  • Created at — Creation timestamp

Configuration

  • Threshold — Confidence required to trigger; see Guardrail thresholds (for example 0.5)
  • Policy — Action when triggered (for example Redact)
  • PII fields — Selected PII categories, or None when not restricted to specific fields
  • Custom patterns — Additional regexp-based matchers, or None when only built-in types are used
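To make Policy = Redact and Custom patterns concrete, the sketch below replaces regexp matches with placeholders. The two patterns are hypothetical stand-ins, built-in PII detection is not modeled, and because a regexp match is all-or-nothing the Threshold setting does not apply here.

```python
import re

# Hypothetical custom patterns; the guardrail's built-in PII types are not modeled.
CUSTOM_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "TICKET_ID": re.compile(r"\bTCK-\d{6}\b"),
}


def redact(text: str, patterns: dict[str, re.Pattern[str]]) -> tuple[str, int]:
    """Redact action: replace every match with a [LABEL] placeholder.

    Returns the redacted text and the total number of replacements made.
    """
    total = 0
    for label, pattern in patterns.items():
        text, n = pattern.subn(f"[{label}]", text)
        total += n
    return text, total


out, hits = redact("Contact ana@example.com about TCK-123456.", CUSTOM_PATTERNS)
print(out)   # Contact [EMAIL] about [TICKET_ID].
print(hits)  # 2
```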

Attach guardrails when you add or edit a policy (Linked Guardrails section). Use Policies using it on the summary cards to confirm where the guardrail is in effect.

MCP Servers

In the Admin UI, open MCP Servers.

The MCP Servers page provides centralized management of registered Model Context Protocol (MCP) servers and their available tools. Register new servers, review connection status, monitor tool availability, and manage existing integrations.

At the top of the page, summary cards display key metrics:

  • Total servers — Number of registered MCP servers.
  • Connected — Servers currently connected and responding successfully.
  • Total tools — Total number of tools exposed across all registered servers.

Below the summary cards, click + ADD to register a new MCP server. Use the filters to narrow the server list by connection type, state, or auth type, and the search field to filter servers by name or connection string.

The main table displays the registered servers and their operational state. Each row includes:

  • Name — MCP server identifier.
  • Connection string — Configured endpoint or transport connection details.
  • Connection type — Communication method used by the server.
  • State — Current connection status.
  • Enabled tools — Number of enabled tools compared to the total detected tools.
  • Tools to auto-execute — Tools configured for automatic execution.
  • Actions — Options to edit or delete the server configuration.

Servers with the Connected state are operational and available for Chat AI workloads. Use the action icons on the right side of the table to modify or remove server configurations.

MCP Server details

In the Admin UI, open MCP Servers and click the server name in the table.

The MCP server detail page displays the configuration, operational state, and tool availability for a selected MCP server. Review connection settings, verify server health, and manage server behavior from this page.

At the top of the page, summary cards display the current operational state of the server:

  • State — Current connection status of the MCP server.
  • Total tools — Number of tools exposed by the server.
  • Enabled tools — Number of active tools compared to the total detected tools.
  • Tools to auto-execute — Tools configured for automatic execution.
  • Ping available — Indicates whether the server supports lightweight health-check ping operations.
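For context on Ping available: the Model Context Protocol specification defines a lightweight ping request that a server must answer promptly with an empty result. The sketch below builds that JSON-RPC message and checks a reply; sending it over stdio or HTTP is left out, and whether OptScale AI probes servers with exactly this message is an assumption.

```python
import json
import uuid


def build_ping() -> tuple[str, str]:
    """Return (request_id, serialized JSON-RPC 2.0 ping request)."""
    request_id = str(uuid.uuid4())
    message = {"jsonrpc": "2.0", "id": request_id, "method": "ping"}
    return request_id, json.dumps(message)


def is_pong(raw_response: str, request_id: str) -> bool:
    """A healthy server answers with an empty result for the same id."""
    try:
        response = json.loads(raw_response)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(response, dict)
        and response.get("jsonrpc") == "2.0"
        and response.get("id") == request_id
        and response.get("result") == {}
    )


rid, msg = build_ping()
print(msg)
# Transport is out of scope here, so simulate the server's replies:
print(is_pong(json.dumps({"jsonrpc": "2.0", "id": rid, "result": {}}), rid))  # True
error = {"code": -32601, "message": "Method not found"}
print(is_pong(json.dumps({"jsonrpc": "2.0", "id": rid, "error": error}), rid))  # False
```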

Use RECONNECT in the upper-right corner to re-establish the server connection. Click EDIT to modify the server configuration.

The page contains two main tabs:

  • OVERVIEW — Displays configuration and operational metadata.
  • TOOLS — Displays the list of tools available from the MCP server.

MCP server overview

The OVERVIEW tab contains two information panels:

The Summary section displays the primary MCP server configuration and connection information. It includes the server UUID with a copy option, the configured server name, connection type, connection string, and authentication method. The section also shows configured request headers, virtual key access settings, and whether code mode client support is enabled.

The Details section displays operational and synchronization information for the MCP server. This includes ping availability, the configured tool synchronization interval, the number of enabled tools, and the number of tools configured for automatic execution.

Tools tab

The TOOLS tab lists the tools exposed by the selected MCP server and lets you control tool availability and execution settings.

The main table lists all detected MCP tools together with their descriptions and operational settings. Each row includes:

  • Tool name — MCP tool identifier with a short summary of what the tool does.
  • Enabled — Toggle that enables or disables the tool for use.
  • Auto-execute — Toggle that allows the tool to run automatically when requested by workloads.
  • Cost (USD) — Optional per-tool execution cost configuration.
  • SAVE — Applies configuration changes for the selected tool.

Use the toggles to control which tools are available to Chat AI workloads and which tools can execute automatically without additional approval. The cost field can be used for usage accounting or operational cost tracking.
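The two toggles combine into three outcomes for a requested tool call. This sketch models that reading of the TOOLS tab plus simple per-call cost accounting; ToolSettings, Decision, and the approval step are hypothetical names, since this page does not describe how approval is requested in AI Chat.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    REJECT = "reject"            # tool is disabled
    NEEDS_APPROVAL = "approval"  # enabled, but auto-execute is off
    RUN = "run"                  # enabled and auto-execute is on


@dataclass
class ToolSettings:
    enabled: bool
    auto_execute: bool
    cost_usd: float = 0.0  # optional per-call cost from the Cost (USD) field


def decide(tool: ToolSettings) -> Decision:
    if not tool.enabled:
        return Decision.REJECT
    return Decision.RUN if tool.auto_execute else Decision.NEEDS_APPROVAL


def accrued_cost(tool: ToolSettings, calls: int) -> float:
    """Simple accounting: per-call cost times the number of executed calls."""
    return tool.cost_usd * calls


search = ToolSettings(enabled=True, auto_execute=True, cost_usd=0.002)
deploy = ToolSettings(enabled=True, auto_execute=False)
print(decide(search).value, decide(deploy).value)  # run approval
print(round(accrued_cost(search, 500), 2))         # 1.0
```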

Expand a tool row to review additional configuration details or operational information if available.

Edit MCP Server

The Edit MCP Server page lets you modify operational settings for an existing MCP server. The editable configuration includes the Code mode client option, the Ping available for health check setting, the Allow on all virtual keys option, and the Tool sync interval value, which controls how often the platform synchronizes tools from the MCP server.

The page also provides management actions; use DELETE to permanently remove the MCP server configuration from the platform.

Vector Stores

The Vector Stores page provides centralized management of vector knowledge bases used for retrieval-augmented generation (RAG), AI search, and contextual workflows. Create vector stores, monitor synchronization health, review indexed content, and manage knowledge access scopes.

At the top of the page, summary cards display key operational metrics, including the number of active vector stores, recently refreshed knowledge bases, indexed chunks, synchronization health status, and librarian agent activity.

The page contains several tabs:

  • VECTOR STORES — Main registry of configured vector stores.
  • TEST VECTOR STORE — Validation and testing interface for retrieval workflows.
  • LIBRARIAN AGENTS — Management of indexing and synchronization agents.
  • STATUS — Operational and synchronization monitoring.

Usage and Cost

The Usage and Cost page provides centralized monitoring of request activity, token consumption, latency, and operational spending across models and users in OptScale AI.

Use the date range selector in the upper-right corner to define the reporting period. All summary cards, charts, and tables update for the selected time range.

Summary cards at the top of the page display organization-wide metrics, including total requests, success rate, input and output token usage, estimated spend, and average request latency.

The page contains several analytical tabs:

  • TOTALS for organization-wide request and token trends over time
  • COST for spend analysis and per-model cost breakdowns
  • MODEL ACTIVITY for metrics and usage trends related to a selected model
  • USER ACTIVITY for request and consumption analytics grouped by user

Use Traces to inspect individual requests and detailed execution metadata, or Dashboards for pinned organization-wide monitoring widgets.

Totals

On TOTALS, two time-series charts share the selected date range on the horizontal axis.

Total tokens — Line chart of token volume over time. Available series:

Series | Meaning
Input tokens | Prompt/input token volume
Output tokens | Completion/output token volume
Total tokens | Combined input and output
Cache read tokens | Tokens served from cache, when applicable

Total requests — Line chart of request volume over time. Available series:

Series | Meaning
Successful requests | Requests that completed successfully
Failed requests | Requests that ended in error
Total requests | Combined successful and failed volume

Use TOTALS to see whether spikes in traffic, tokens, or failures align on the same days before drilling into COST, MODEL ACTIVITY, or Traces.

Cost

On the COST tab, spending is analyzed across time periods and models.

The Cost breakdown chart displays total spend per day or reporting interval for the selected date range. Use it to identify cost spikes and correlate them with request and token trends shown on the TOTALS tab.

The Models breakdown chart ranks models by total spend. Use the top-N controls (5, 10, 25, 50, or All) to limit the number of displayed models. Each bar represents a model identifier, such as gpt-4o-2024, claude-3-opus-20240229, or llama3-8b-instruct.
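The ranking on the Models breakdown chart is a sum per model followed by a sort, with the top-N control acting as a cutoff. A minimal sketch over hypothetical (model, cost) pairs, for example values copied from Traces; None stands in for the All option:

```python
from collections import defaultdict
from typing import Optional


def top_models_by_spend(rows: list[tuple[str, float]],
                        n: Optional[int]) -> list[tuple[str, float]]:
    """Total cost per model identifier, highest first; n=None mimics All."""
    totals: defaultdict[str, float] = defaultdict(float)
    for model, cost in rows:
        totals[model] += cost
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked if n is None else ranked[:n]


rows = [
    ("gpt-4o-2024", 0.06),
    ("llama3-8b-instruct", 0.0),
    ("claude-3-opus-20240229", 0.45),
    ("gpt-4o-2024", 0.09),
]
print([model for model, _ in top_models_by_spend(rows, 2)])
# ['claude-3-opus-20240229', 'gpt-4o-2024']
print(len(top_models_by_spend(rows, None)))  # 3
```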

Model activity

On the MODEL ACTIVITY tab, select a model from the Model dropdown to filter summary cards and charts for a specific endpoint.

Summary cards display the same operational metrics shown at the top of the page — Total requests, Success rate, Total input tokens, Total output tokens, Total spend, and Avg latency — recalculated for the selected model only.

The Cost breakdown chart displays spending trends for the selected model across the selected date range.

The Total tokens and Total requests charts display the same series available on TOTALS, filtered to the selected model.

User activity

On the USER ACTIVITY tab, select a user from the User dropdown to filter summary cards and charts for a specific account.

Summary cards display the same operational metrics shown at the top of the page — Total requests, Success rate, Total input tokens, Total output tokens, Total spend, and Avg latency — recalculated for the selected user.

The Model usage section displays the models used by the selected account. Switch between Table view and Chart view to review request distribution and activity by model. The table includes model identifiers together with total, successful, and failed request counts.

The Cost breakdown, Total tokens, and Total requests charts display the same analytics available on MODEL ACTIVITY, filtered to the selected user.

Use USER ACTIVITY to analyze request volume, token usage, and spending by account. For request-level investigation, combine this view with Traces.
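The Model usage table is a per-model count for one account. This sketch rebuilds it from hypothetical trace records (the Trace fields mirror columns in the Traces table, but the record type itself is an assumption), which can help when reconciling USER ACTIVITY with Traces.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trace:
    employee: str      # Employee column
    model: str         # Model column
    success: bool      # Status column (Success or not)
    input_tokens: int  # Input tokens column


def model_usage_for(traces: list[Trace], employee: str) -> dict[str, dict[str, int]]:
    """Per-model total/successful/failed counts for one account."""
    usage: defaultdict[str, dict[str, int]] = defaultdict(
        lambda: {"total": 0, "successful": 0, "failed": 0}
    )
    for t in traces:
        if t.employee != employee:
            continue
        row = usage[t.model]
        row["total"] += 1
        row["successful" if t.success else "failed"] += 1
    return dict(usage)


def input_tokens_for(traces: list[Trace], employee: str) -> int:
    """Total input tokens for one account, like the Total input tokens card."""
    return sum(t.input_tokens for t in traces if t.employee == employee)


traces = [
    Trace("ana", "gpt-oss-120b", True, 900),
    Trace("ana", "gpt-oss-120b", False, 300),
    Trace("bo", "llama3:latest", True, 100),
    Trace("ana", "llama3:latest", True, 120),
]
print(model_usage_for(traces, "ana"))
# {'gpt-oss-120b': {'total': 2, 'successful': 1, 'failed': 1},
#  'llama3:latest': {'total': 1, 'successful': 1, 'failed': 0}}
print(input_tokens_for(traces, "ana"))  # 1320
```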

Traces

The Traces page provides controls for monitoring request activity, analyzing operational metrics, and filtering trace data.

Use REFRESH to reload summary cards and trace results for the selected filters and time range.

Summary cards at the top of the page display aggregated request statistics, including total requests, success rate, token usage, estimated spend, and average request latency for the selected period.

Use PROVIDER, MODEL, and EMPLOYEE filters to narrow trace results by provider, model endpoint, or user activity.

The time-range selector supports predefined intervals such as 1 day, 1 week, 2 weeks, and 1 month, as well as a custom date range. Selected filters and time ranges apply to both summary cards and the trace table.

The main table lists AI requests, sorted by Time. The table is wide—scroll horizontally to reach token, cost, and routing columns on the right.

Table 5: OptScale AI request traces — column reference
Column | Meaning
Time | Timestamp when the request was recorded
Provider | Registered provider that handled the call (for example ollama, cerebras)
Model | Model identifier used for the request (for example nexusriot/qwen3.5-op…, gpt-oss-120b)
Status | Outcome badge (for example Success)
Virtual key name | API key identifier associated with the request
Employee | User account that initiated the request
Input tokens | Token count for the prompt or input payload
Output tokens | Token count for the model completion
Tokens | Total tokens for the request (input plus output)
Cost | Estimated charge for the request in organization currency (- when cost is unavailable)
Latency | End-to-end request duration (for example 15s 566ms or 494ms)
Routing rule | Router rule that selected the provider, when applicable (- when routing did not apply)
Stream | Whether the response used streaming (Yes / No)

Select a row to open the trace details panel.
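When comparing latencies copied from the table, the mixed display format (for example 15s 566ms or 494ms) needs converting to one unit. This sketch assumes a cell is built from m, s, and ms parts; only s and ms appear in the examples above, and latency_to_ms is a hypothetical helper.

```python
import re

# Assumed units: "s" and "ms" appear in the table examples; "m" is a guess
# for long-running requests.
_UNIT_MS = {"m": 60_000, "s": 1_000, "ms": 1}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|m|s)")
_WHOLE = re.compile(r"(?:\d+(?:\.\d+)?(?:ms|m|s))+")


def latency_to_ms(text: str) -> float:
    """Convert a Latency cell such as "15s 566ms" or "494ms" to milliseconds."""
    compact = text.replace(" ", "")
    if not _WHOLE.fullmatch(compact):
        raise ValueError(f"unrecognized latency: {text!r}")
    return sum(float(value) * _UNIT_MS[unit] for value, unit in _PART.findall(compact))


print(latency_to_ms("15s 566ms"))  # 15566.0
print(latency_to_ms("494ms"))      # 494.0
```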

Trace details

Selecting a trace from the table opens a detail panel that shows the trace UUID in the header (with a copy control), together with the Provider, Model, Status, and Time.

The Request details section displays request metadata, including the provider that handled the request, the full model identifier, the virtual key used for the request, the applied routing rule, the initiating employee or principal, and whether streaming was enabled.

The Metrics section displays request statistics, including total token usage with prompt and completion breakdowns, end-to-end latency, request cost, and the request timestamp.

The Request & Response section displays captured prompt and completion payloads. When request bodies are unavailable or not stored for the deployment, the panel displays No data.

The Params section contains the JSON request payload sent to the provider, including request parameters such as tools, function definitions, and parameter schemas. Use the copy control in the panel corner to copy the full payload for debugging or support purposes.

Use Traces to audit who used which models, debug latency or failures after routing changes in Providers, and reconcile per-request token usage with Usage and Cost. For profiled training runs and experiment outputs, see Experiment Tracking — Tasks and Artifacts.

Topic | Where to read more
Model Training (tasks, models, datasets, environments) | Model Training, Experiment Tracking, Environments & Operations
Platform architecture and request flow | Architecture Overview
Initial setup | First Steps
Chat UI | Interface Overview