Usage and Cost

The Usage and Cost page provides centralized monitoring of request activity, token consumption, latency, and operational spending across models and users in OptScale AI.

Use the date range selector in the upper-right corner to define the reporting period. All summary cards, charts, and tables update for the selected time range.

Summary cards at the top of the page display organization-wide metrics, including total requests, success rate, input and output token usage, estimated spend, and average request latency.
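
As a rough illustration of how summary metrics of this kind can be derived from raw request data, the sketch below aggregates a list of per-request records. The `RequestRecord` type, its field names, and the `summarize` function are hypothetical and exist only for this example; they do not describe how OptScale AI stores or computes these values.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    # Hypothetical per-request record; all field names are assumptions.
    model: str
    user: str
    success: bool
    input_tokens: int
    output_tokens: int
    cost: float        # estimated spend for this request
    latency_ms: float  # end-to-end request latency


def summarize(records):
    """Aggregate request records into summary-card style metrics."""
    total = len(records)
    if total == 0:
        # Avoid division by zero when the selected range has no requests.
        return {
            "total_requests": 0,
            "success_rate": 0.0,
            "input_tokens": 0,
            "output_tokens": 0,
            "spend": 0.0,
            "avg_latency_ms": 0.0,
        }
    successful = sum(1 for r in records if r.success)
    return {
        "total_requests": total,
        "success_rate": successful / total,
        "input_tokens": sum(r.input_tokens for r in records),
        "output_tokens": sum(r.output_tokens for r in records),
        "spend": sum(r.cost for r in records),
        "avg_latency_ms": sum(r.latency_ms for r in records) / total,
    }
```

Filtering the record list by model or by user before calling `summarize` would give the per-model and per-user variants of the same cards.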

The page contains several analytical tabs:

  • TOTALS for organization-wide request and token trends over time
  • COST for spend analysis and per-model cost breakdowns
  • MODEL ACTIVITY for metrics and usage trends related to a selected model
  • USER ACTIVITY for request and consumption analytics grouped by user

Use Traces to inspect individual requests and detailed execution metadata, or Dashboards for pinned organization-wide monitoring panels.

Totals

The TOTALS tab provides an overview of request and token volume over time. Both charts use the selected date range on the horizontal axis, making it easy to compare usage and traffic patterns across the same period.

The Total tokens chart displays token consumption over time and includes separate series for input tokens, output tokens, total tokens, and cache read tokens. Use this chart to identify changes in token usage and evaluate the impact of cache reads when caching is enabled.

The Total requests chart displays request volume over time and includes successful requests, failed requests, and total requests. Use this chart to identify traffic spikes and determine whether increases in failures correlate with changes in request volume.
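
The three series on a chart like this can be thought of as per-interval counts. The sketch below buckets request events by calendar day; it assumes exported data is available as `(timestamp, success)` pairs, which is a hypothetical shape chosen for the example, and `daily_request_series` is not part of any OptScale AI API.

```python
from collections import defaultdict
from datetime import datetime


def daily_request_series(events):
    """Count successful, failed, and total requests per calendar day.

    `events` is an iterable of (timestamp: datetime, success: bool) pairs.
    This input shape is an assumption made for illustration.
    """
    buckets = defaultdict(lambda: {"successful": 0, "failed": 0, "total": 0})
    for timestamp, success in events:
        day = timestamp.date().isoformat()
        bucket = buckets[day]
        bucket["successful" if success else "failed"] += 1
        bucket["total"] += 1
    # Sort by day so the result follows the chart's horizontal axis.
    return dict(sorted(buckets.items()))
```

Comparing `failed / total` across days separates a rise in failures caused by higher traffic from a rise in the failure rate itself.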

Use the TOTALS tab to identify trends in traffic, token consumption, and request outcomes before investigating detailed cost, model activity, or trace data.

Cost

The COST tab breaks down spending by time period and by model.

The Cost breakdown chart displays total spend per day or reporting interval for the selected date range. Use it to identify cost spikes and correlate them with request and token trends shown on the TOTALS tab.

The Models breakdown chart ranks models by total spend. Use the top-N controls (5, 10, 25, 50, or All) to limit the number of displayed models. Each bar represents a model identifier, such as gpt-4o-2024, claude-3-opus-20240229, or llama3-8b-instruct.
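
A top-N ranking by spend can be sketched as a sum per model followed by a descending sort. The function below is a minimal illustration that assumes cost data is available as `(model_id, cost)` pairs; the name `top_models_by_spend` and the use of `None` to mean "All" are choices made for this example, not product behavior.

```python
from collections import defaultdict


def top_models_by_spend(rows, top_n=5):
    """Rank models by total spend, highest first.

    `rows` is an iterable of (model_id, cost) pairs (an assumed shape).
    Pass top_n=None to return every model, mirroring an "All" option.
    """
    totals = defaultdict(float)
    for model_id, cost in rows:
        totals[model_id] += cost
    # Sort by spend descending; break ties by model identifier.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked if top_n is None else ranked[:top_n]
```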

Model activity

On the MODEL ACTIVITY tab, select a model from the Model dropdown to filter summary cards and charts for a specific endpoint.

Summary cards display the same operational metrics shown at the top of the page — Total requests, Success rate, Total input tokens, Total output tokens, Total spend, and Avg latency — recalculated for the selected model only.

The Cost breakdown chart displays spending trends for the selected model across the selected date range.

The Total tokens and Total requests charts display the same series available on TOTALS, filtered to the selected model.

User activity

On the USER ACTIVITY tab, select a user from the User dropdown to filter summary cards and charts for a specific account.

Summary cards display the same operational metrics shown at the top of the page — Total requests, Success rate, Total input tokens, Total output tokens, Total spend, and Avg latency — recalculated for the selected user.

The Model usage section displays the models used by the selected account. Switch between Table view and Chart view to review request distribution and activity by model. The table includes model identifiers together with total, successful, and failed request counts.
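
The table described above amounts to grouping one account's requests by model and counting outcomes. The sketch below shows that grouping under the assumption that requests are available as `(user, model_id, success)` tuples; both the tuple layout and the `model_usage_for_user` helper are hypothetical.

```python
from collections import defaultdict


def model_usage_for_user(requests, user):
    """Build per-model request counts for a single user.

    `requests` is an iterable of (user, model_id, success) tuples,
    an assumed shape used only for this illustration.
    """
    table = defaultdict(lambda: {"total": 0, "successful": 0, "failed": 0})
    for request_user, model_id, success in requests:
        if request_user != user:
            continue  # keep only the selected account's requests
        row = table[model_id]
        row["total"] += 1
        row["successful" if success else "failed"] += 1
    return dict(table)
```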

The Cost breakdown, Total tokens, and Total requests charts display the same analytics available on MODEL ACTIVITY, filtered to the selected user.

Use USER ACTIVITY to analyze request volume, token usage, and spending by account. For request-level investigation, combine this view with Traces.