Source pack
- CometAPI API documentation home — primary place to verify current API contract details such as supported paths, request shape, response shape, and authentication requirements before wiring checks into an AI gateway.
- CometAPI pricing documentation — source to verify pricing, billing, metering, or model-cost assumptions before using them in a unit economics calculation.
- CometAPI support help center — source to identify the right support route when contract, billing, or account-specific details are unclear.
- FinOps Foundation Unit Economics capability — methodology source for connecting technology spend to business value units rather than reviewing total spend alone.
Intent brief
This note is for operators who already route AI traffic through an API gateway and need to know whether cost is being measured against the right unit of value.
The operational question is not “Which model is cheapest?” It is: Can we explain the cost of a tenant, workflow, or successful business event well enough to make routing, budget, and product decisions?
Use this page to:
- define one useful economic unit for AI API traffic;
- verify the API and billing contract details that feed the calculation;
- run a small validation probe without hard-coding unsupported assumptions;
- identify leakage from retries, failures, over-long prompts, and unallocated shared traffic.
This page does not publish current CometAPI pricing, model availability, endpoint paths, rate limits, or billing fields. Those values must be checked against the linked CometAPI documentation before use.
Unit economics checks for AI gateway costs
Last reviewed: 2026-06-11.
Who this is for: FinOps, platform, product, and AI operations teams responsible for converting AI API gateway spend into defensible cost-per-unit metrics.
For related cost-control operating notes, see the site’s AI cost-control posts. If you are building a broader review process, keep this page with the operations note archive so gateway checks and budget checks stay connected.
Key takeaways
- Unit economics for AI gateway traffic should be based on a business unit, not only on tokens, requests, or total invoice spend.
- Before calculating cost per unit, verify the API contract, billing fields, and pricing assumptions in the current CometAPI API documentation and CometAPI pricing documentation.
- A useful calculation separates successful business outcomes from retries, failed calls, test traffic, and unallocated shared traffic.
- Treat numerical thresholds as local tuning values. Do not copy another team’s cost-per-request or cost-per-customer target without checking workload mix, prompt size, model mix, and failure behavior.
- The FinOps Foundation frames unit economics as a capability for connecting technology cost to business value; use that framing to keep AI gateway reports useful to product and finance stakeholders (FinOps Foundation Unit Economics capability).
Definition: AI API gateway unit economics
AI API gateway unit economics is the practice of mapping AI API cost to a useful business unit, such as:
- cost per successful support resolution;
- cost per generated report;
- cost per user session with at least one accepted AI answer;
- cost per tenant per billable workflow;
- cost per order, case, document, or transaction enriched by AI.
The important distinction is that the denominator should represent value delivered, not merely traffic produced. Tokens and API calls are still required inputs, but they are not always the right unit for product or margin decisions.
The FinOps Unit Economics capability is the external methodology reference for tying cost to business value. For AI gateway operations, the same idea becomes: which AI spend helped create a useful product outcome, and which spend was overhead, waste, testing, retry, or failure?
Why this check matters for AI gateways
AI gateway costs can look controlled while unit economics worsen.
That happens when aggregate spend is flat but:
- prompts become longer;
- retry volume increases;
- low-value workflows consume more tokens;
- fallback paths double-call providers;
- test traffic is mixed with production traffic;
- tenants with different usage patterns are averaged together;
- failed or abandoned outputs are counted as if they created value.
A gateway-level unit economics check helps you separate “we spent less this week” from “we improved the cost of a successful business outcome.”
The three checks to run first
1. Economic-unit fit check
Pick one unit that a product owner and finance partner both understand.
Good candidate units:
| Workload | Better economic unit | Weaker unit |
|---|---|---|
| Support assistant | Resolved case with accepted answer | Raw API request |
| Document summarization | Completed document summary delivered to user | Prompt submitted |
| Sales research | Qualified account brief generated | Tokens consumed |
| Coding assistant | Accepted suggestion or completed task | Completion call |
| Internal analyst bot | Answer used in a workflow | Chat turn |
A per-request metric is useful for debugging, but it rarely explains business value by itself. If the gateway only reports spend per request, join gateway logs to application events so the denominator reflects accepted or completed outcomes.
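The join to application events can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the gateway and the application both record a shared trace ID and that a per-call cost has already been derived from verified pricing. The keys `trace_id`, `cost`, `business_unit_id`, and `accepted` are hypothetical names.

```python
from collections import defaultdict


def cost_per_accepted_unit(gateway_calls, app_events):
    """Join gateway spend to application outcomes via a shared trace ID.

    gateway_calls: dicts with hypothetical keys "trace_id" and "cost".
    app_events: dicts with hypothetical keys "trace_id",
        "business_unit_id", and "accepted" (bool).
    """
    # Index application events by the trace ID the gateway also logged.
    event_by_trace = {event["trace_id"]: event for event in app_events}

    accepted_cost = defaultdict(float)  # business_unit_id -> cost
    other_cost = 0.0  # calls with no event, or with a non-accepted outcome

    for call in gateway_calls:
        event = event_by_trace.get(call["trace_id"])
        if event is not None and event["accepted"]:
            accepted_cost[event["business_unit_id"]] += call["cost"]
        else:
            other_cost += call["cost"]

    units = len(accepted_cost)
    per_unit = sum(accepted_cost.values()) / units if units else None
    return {
        "accepted_units": units,
        "cost_per_accepted_unit": per_unit,
        "cost_outside_accepted_units": other_cost,
    }


# Illustrative data only; the costs are made-up numbers.
calls = [
    {"trace_id": "t1", "cost": 0.50},
    {"trace_id": "t2", "cost": 0.25},   # second call for the same case
    {"trace_id": "t3", "cost": 0.25},   # output was rejected
    {"trace_id": "t4", "cost": 0.125},  # no matching application event
]
events = [
    {"trace_id": "t1", "business_unit_id": "case-1", "accepted": True},
    {"trace_id": "t2", "business_unit_id": "case-1", "accepted": True},
    {"trace_id": "t3", "business_unit_id": "case-2", "accepted": False},
]
result = cost_per_accepted_unit(calls, events)
print(result)
```

Four requests produced one accepted unit here, so a per-request average would understate what that unit actually cost.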
2. Contract-source check
Before calculating anything, confirm the fields you plan to collect are actually supported by the current API and pricing contract.
Use the CometAPI API documentation home to verify endpoint and payload details. Use the CometAPI pricing documentation to verify the billing basis before converting usage into cost.
Do not assume:
- the exact chat endpoint path;
- the authentication header name or scheme;
- the model identifier format;
- whether usage fields appear in the response body, headers, dashboard export, or invoice export;
- rate-limit values;
- current prices;
- rounding, minimum-charge, or billing aggregation behavior.
Those details are contract inputs, not guesses.
3. Exception-leakage check
For unit economics, exception traffic matters because it can inflate cost without increasing delivered value.
Track these buckets separately:
- successful business outcome;
- successful API response but rejected output;
- retry after timeout or transient error;
- fallback call after primary route failure;
- user-abandoned response;
- test or evaluation traffic;
- system health probes;
- internal admin usage;
- uncategorized traffic.
Cost per unit alone is rarely enough:
cost_per_unit = allocated_ai_cost / completed_business_units
Report it alongside the exception cost share:
exception_cost_share = exception_ai_cost / total_ai_cost
Treat thresholds as examples to tune locally. A team with latency-sensitive fallback requirements may tolerate more exception cost than a batch summarization workflow.
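The two formulas above can be computed from per-bucket cost totals. The sketch below makes one simplifying assumption: every bucket other than the successful-outcome bucket counts as exception cost. The bucket labels and the figures are illustrative, not taken from any CometAPI export.

```python
# Hypothetical bucket label; rename to match your own log categories.
SUCCESS = "successful_business_outcome"


def unit_cost_summary(cost_by_bucket, completed_business_units):
    """Return (cost_per_unit, exception_cost_share) for one reporting window.

    cost_by_bucket maps a bucket label to its AI cost for the window.
    Assumption: everything outside SUCCESS is treated as exception cost.
    """
    total_ai_cost = sum(cost_by_bucket.values())
    allocated_ai_cost = cost_by_bucket.get(SUCCESS, 0.0)
    exception_ai_cost = total_ai_cost - allocated_ai_cost

    # Return None instead of dividing by zero for an empty window.
    cost_per_unit = (
        allocated_ai_cost / completed_business_units
        if completed_business_units
        else None
    )
    exception_cost_share = (
        exception_ai_cost / total_ai_cost if total_ai_cost else None
    )
    return cost_per_unit, exception_cost_share


# Made-up window totals for illustration.
window = {
    SUCCESS: 60.0,
    "retry_after_timeout": 10.0,
    "fallback_call": 6.0,
    "rejected_output": 4.0,
}
cost_per_unit, exception_cost_share = unit_cost_summary(window, 120)
print(cost_per_unit, exception_cost_share)
```

In this example a quarter of total spend produced no completed unit, which the cost-per-unit figure alone would not reveal.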
Contract details to verify
Use this table before implementing the unit economics job. Each row names a value to verify against its primary source; none of the rows supplies the value itself.
| Contract area | Value to verify before implementation | Why it matters for unit economics | Primary source to check |
|---|---|---|---|
| Endpoint paths | Verify the current CometAPI base URL and the exact path used for the relevant operation, such as a chat or completion-style request, from the docs. | The gateway log key must match the operation being costed; otherwise unrelated traffic may be included. | CometAPI API documentation home |
| Auth headers | Verify the required authentication header name, token format, and any project/account scoping requirements from the docs. | Misconfigured auth checks can mix production, staging, and test usage or fail to attribute traffic to the right account. | CometAPI API documentation home |
| Request fields | Verify the required model field, input/message field, token-limit field, metadata fields, and any supported customer attribution fields. | Unit economics needs stable dimensions such as tenant, workflow, environment, and model route. | CometAPI API documentation home |
| Response fields | Verify where usage, token, request ID, model, status, latency, or finish fields are returned. | Cost allocation depends on joining gateway logs to provider/API usage signals without double-counting. | CometAPI API documentation home |
| Error behavior | Verify documented error response structure, status codes, retry guidance, and whether failed requests can still create billable usage. | Retry and failure traffic must be separated from successful business outcomes. | CometAPI API documentation home and CometAPI support help center |
| Rate-limit assumptions | Verify current rate-limit behavior and any account-specific limits in the docs or support channel. | Rate-limit retries can distort cost per successful unit and latency per unit. | CometAPI API documentation home and CometAPI support help center |
| Billing assumptions | Verify the pricing basis, billing dimensions, rounding behavior, and any model-specific pricing details from the pricing page before calculating allocated cost. | A wrong billing assumption makes every downstream cost-per-unit metric unreliable. | CometAPI pricing documentation |
A practical validation workflow
Step 1: Choose one reporting window
Start with a short window, such as one day or one deployment interval. The goal is not statistical perfection. The goal is to prove that gateway logs, application events, and billing assumptions can be joined cleanly.
Record:
- window start and end time;
- timezone;
- environment;
- gateway route;
- model route or model family label;
- application workflow;
- tenant or customer segment;
- request ID or trace ID;
- business outcome ID.
Step 2: Define the unit ledger
Create a small table where each row represents one business unit.
Example ledger columns:
| Column | Purpose |
|---|---|
| business_unit_id | Stable ID for the completed workflow, case, document, or accepted answer. |
| tenant_id | Allocation dimension for customer or account-level reporting. |
| workflow_name | Product feature or job that created the unit. |
| completed_at | Timestamp for the value event. |
| accepted_outcome | Whether the output was accepted, delivered, or used. |
| gateway_trace_ids | One or more API calls associated with the unit. |
| exception_flag | Marks retry, fallback, rejected output, or non-production traffic. |
| allocation_status | allocated, shared, unmatched, or excluded. |
The gateway does not have to own this table. It can be built in your warehouse by joining gateway telemetry to application events.
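One way to keep ledger rows consistent before they reach the warehouse is a small typed record. The sketch below mirrors the example columns. The types are assumptions (for instance, `exception_flag` is modeled as a boolean), and the validation rules are illustrative choices rather than requirements from any source.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

ALLOWED_ALLOCATION_STATUS = {"allocated", "shared", "unmatched", "excluded"}


@dataclass(frozen=True)
class UnitLedgerRow:
    """One business unit, with the example columns from the ledger table."""

    business_unit_id: str
    tenant_id: str
    workflow_name: str
    completed_at: datetime
    accepted_outcome: bool
    gateway_trace_ids: tuple = ()
    exception_flag: bool = False
    allocation_status: str = "unmatched"

    def __post_init__(self):
        if self.allocation_status not in ALLOWED_ALLOCATION_STATUS:
            raise ValueError(
                f"unknown allocation_status: {self.allocation_status!r}"
            )
        # Naive timestamps make reporting-window boundaries ambiguous.
        if self.completed_at.tzinfo is None:
            raise ValueError("completed_at must be timezone-aware")
        if self.allocation_status == "allocated" and not self.gateway_trace_ids:
            raise ValueError("an allocated row needs at least one trace ID")


row = UnitLedgerRow(
    business_unit_id="case-1001",
    tenant_id="tenant-a",
    workflow_name="support_assistant",
    completed_at=datetime(2026, 6, 10, 14, 30, tzinfo=timezone.utc),
    accepted_outcome=True,
    gateway_trace_ids=("trace-1", "trace-2"),
    allocation_status="allocated",
)
print(row)
```

Rejecting rows at construction time keeps bad attribution data from silently entering the cost-per-unit calculation.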
Step 3: Split cost into allocated and unallocated buckets
Do not force every cost into a business unit too early.
Use buckets such as:
| Bucket | Include | Operator action |
|---|---|---|
| Allocated production cost | Calls joined to completed business units | Use in cost-per-unit metric. |
| Allocated exception cost | Calls joined to failed, retried, rejected, or fallback units | Report separately from successful units. |
| Shared platform cost | System prompts, evaluation harnesses, admin tooling, shared caches | Allocate by a documented rule or keep separate. |
| Unmatched cost | Calls with no traceable application event | Investigate logging and attribution gaps. |
| Excluded cost | Load tests, experiments, demos, known non-production usage | Keep out of production unit economics. |
If unmatched cost is material, fix attribution before tuning prompts or switching models. Otherwise, you may optimize the wrong workload.
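The bucket split can be sketched as a precedence rule applied to each call. The keys `environment`, `is_shared`, `business_unit_id`, `unit_completed`, and `cost` are hypothetical, and the order of the checks is an assumption you would adapt to your own allocation policy.

```python
from collections import defaultdict


def assign_bucket(call):
    """Pick exactly one cost bucket per call, checking exclusions first."""
    if call.get("environment") != "production":
        return "excluded"
    if call.get("is_shared"):
        return "shared_platform"
    if call.get("business_unit_id") is None:
        return "unmatched"
    if call.get("unit_completed"):
        return "allocated_production"
    return "allocated_exception"


def bucket_totals(calls):
    """Sum call cost per bucket."""
    totals = defaultdict(float)
    for call in calls:
        totals[assign_bucket(call)] += call["cost"]
    return dict(totals)


def unmatched_share(totals):
    """Unmatched cost divided by total AI cost; None for an empty window."""
    total = sum(totals.values())
    return totals.get("unmatched", 0.0) / total if total else None


# Made-up calls for illustration.
calls = [
    {"environment": "production", "business_unit_id": "case-1",
     "unit_completed": True, "cost": 2.0},
    {"environment": "production", "business_unit_id": "case-2",
     "unit_completed": False, "cost": 0.5},
    {"environment": "production", "business_unit_id": None, "cost": 1.0},
    {"environment": "production", "is_shared": True, "cost": 0.25},
    {"environment": "staging", "business_unit_id": "probe-1", "cost": 0.25},
]
totals = bucket_totals(calls)
share = unmatched_share(totals)
print(totals, share)
```

Whether a given unmatched share counts as material is a local decision; the sketch only makes the number visible.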
Step 4: Verify pricing inputs before calculating
Before converting usage into currency, check the current billing basis in the CometAPI pricing documentation. If a field or billing rule is unclear, use the CometAPI support help center to resolve account-specific questions.
Do not bake unknown values into code. Put them in a reviewed configuration file with source notes, review dates, and owner sign-off.
Example configuration shape:
{
  "pricing_source_url": "https://apidoc.cometapi.com/pricing/about-pricing",
  "pricing_reviewed_at": "2026-06-11",
  "billing_basis_to_verify": "<BILLING_BASIS_FROM_COMETAPI_PRICING_DOCS>",
  "model_cost_rules": [
    {
      "model_id": "<VALIDATED_MODEL_ID>",
      "unit_field_to_verify": "<USAGE_OR_BILLING_FIELD_FROM_DOCS>",
      "price_value_to_verify": "<PRICE_FROM_CURRENT_PRICING_SOURCE>",
      "currency_to_verify": "<CURRENCY_FROM_CURRENT_PRICING_SOURCE>",
      "rounding_rule_to_verify": "<ROUNDING_RULE_FROM_CURRENT_PRICING_SOURCE>"
    }
  ],
  "allocation_policy": {
    "primary_dimension": "business_unit_id",
    "secondary_dimensions": ["tenant_id", "workflow_name", "environment"],
    "unmatched_cost_policy": "hold_out_for_investigation"
  }
}
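A pre-flight check can refuse to run the cost job while the configuration still contains unverified placeholders or a stale review date. The sketch below assumes placeholders follow the `<UPPER_SNAKE_CASE>` convention used in the example configuration. The sample values and the helper names are illustrative.

```python
import json
import re
from datetime import date

# Matches strings that consist solely of an <UPPER_SNAKE_CASE> placeholder.
PLACEHOLDER = re.compile(r"<[A-Z0-9_]+>")


def unresolved_placeholders(node, path=""):
    """Return the paths of string values that are still placeholders."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            found += unresolved_placeholders(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found += unresolved_placeholders(value, f"{path}[{index}]")
    elif isinstance(node, str) and PLACEHOLDER.fullmatch(node):
        found.append(path)
    return found


def review_age_days(config, today):
    """Days since the pricing assumptions were last reviewed."""
    reviewed = date.fromisoformat(config["pricing_reviewed_at"])
    return (today - reviewed).days


# Trimmed sample; "example-value" stands in for a verified entry.
config = json.loads("""
{
  "pricing_reviewed_at": "2026-06-11",
  "billing_basis_to_verify": "<BILLING_BASIS_FROM_COMETAPI_PRICING_DOCS>",
  "model_cost_rules": [
    {"model_id": "<VALIDATED_MODEL_ID>", "currency_to_verify": "example-value"}
  ]
}
""")
missing = unresolved_placeholders(config)
age = review_age_days(config, date(2026, 7, 11))
print(missing, age)
```

The acceptable review age is a local policy choice; the function only reports the number of days.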
Step 5: Run a sanitized gateway probe
Use a small probe to confirm that your gateway captures the fields you need. Replace every placeholder with values verified from the current CometAPI docs before running.
# Write the probe payload. Replace every <PLACEHOLDER> with a value verified in the docs.
cat > unit-economics-probe.json <<'JSON'
{
  "<MODEL_FIELD_FROM_DOCS>": "<VALIDATED_MODEL_ID>",
  "<INPUT_FIELD_FROM_DOCS>": [
    {
      "<ROLE_FIELD_FROM_DOCS>": "user",
      "<CONTENT_FIELD_FROM_DOCS>": "Return a one-sentence health check for a unit economics logging probe."
    }
  ],
  "<TOKEN_LIMIT_FIELD_FROM_DOCS>": 64,
  "<METADATA_FIELD_FROM_DOCS>": {
    "environment": "staging",
    "workflow_name": "unit_economics_probe",
    "business_unit_id": "probe-<UNIQUE_ID>",
    "tenant_id": "internal-cost-controls"
  }
}
JSON

# Send the probe. -D saves the response headers and -o saves the response body for review.
curl -sS -X POST "<COMETAPI_BASE_URL_FROM_DOCS><COMETAPI_CHAT_PATH_FROM_DOCS>" \
  -H "<AUTH_HEADER_FROM_DOCS>: <COMETAPI_API_KEY>" \
  -H "Content-Type: application/json" \
  --data-binary @unit-economics-probe.json \
  -D response-headers.txt \
  -o response-body.json

printf '\nReview response headers, response body, gateway logs, and billing/usage exports for traceability.\n'
Validation questions:
- Did the gateway log a request ID or trace ID?
- Did the application event store the same ID?
- Is the model or route label present?
- Is usage visible in the response, dashboard, export, or another verified source?
- Can the call be categorized as production, staging, test, or evaluation?
- Can the cost be allocated to a business unit?
- If the call failed, can you tell whether it should be included in exception cost?
- Does the pricing configuration cite the current pricing source?
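Some of the questions above can be answered mechanically by searching the saved probe output for the fields you expect. The sketch below takes the wanted field names as input rather than assuming any; the names in the example (`x-example-request-id`, `example_units`, `example_latency`) are invented for illustration and are not CometAPI fields.

```python
def parse_header_file(text):
    """Parse a header dump such as the one `curl -D` writes into a dict."""
    headers = {}
    for line in text.splitlines():
        if ":" in line and not line.startswith("HTTP/"):
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
    return headers


def body_paths(node, prefix=""):
    """Yield a dotted path for every key in a parsed JSON body."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from body_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            yield from body_paths(item, prefix + "[]")


def locate_fields(wanted, headers, body):
    """Report where each wanted name appears: header, body path, or nowhere."""
    paths = list(body_paths(body))
    report = {}
    for name in wanted:
        hits = []
        if name.lower() in headers:
            hits.append(f"header:{name.lower()}")
        hits += [f"body:{p}" for p in paths if p.split(".")[-1] == name]
        report[name] = hits
    return report


# Invented sample output; real probes would read the saved files instead.
header_text = (
    "HTTP/1.1 200 OK\r\n"
    "X-Example-Request-Id: abc123\r\n"
    "Content-Type: application/json\r\n\r\n"
)
body = {"example_usage": {"example_units": 12}, "example_model": "m"}
report = locate_fields(
    ["x-example-request-id", "example_units", "example_latency"],
    parse_header_file(header_text),
    body,
)
print(report)
```

In practice, `header_text` would come from `response-headers.txt` and `body` from `json.load` on `response-body.json`. An empty list for a field means it has to be sourced from a dashboard, export, or support channel instead.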
Recommended metrics
Start with metrics that separate value delivery from traffic volume.
| Metric | Formula | Why it is useful |
|---|---|---|
| Cost per completed unit | allocated_success_cost / completed_business_units | Main unit economics metric. |
| Cost per accepted output | cost_for_outputs_accepted_by_user / accepted_outputs | Filters out generated but unused responses. |
| Exception cost share | retry_fallback_failure_cost / total_ai_cost | Shows cost leakage from reliability behavior. |
| Unmatched cost share | unmatched_gateway_cost / total_ai_cost | Measures attribution quality. |
| Cost per tenant workflow | tenant_workflow_cost / tenant_completed_workflows | Helps identify segment-level margin pressure. |
| Prompt overhead share | system_and_context_cost / total_unit_cost | Shows when context growth is driving cost. |
| Evaluation/test cost share | eval_and_test_cost / total_ai_cost | Prevents non-production usage from hiding in production unit economics. |
Avoid setting a universal target from these formulas. Tune thresholds by product tier, workflow value, latency requirements, and quality requirements.
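The cost per tenant workflow metric is a grouped version of cost per completed unit. The sketch below groups ledger rows by tenant and workflow and compares the segment figures with the blended figure. The row keys and the numbers are illustrative assumptions.

```python
from collections import defaultdict


def cost_per_tenant_workflow(ledger_rows):
    """Divide each segment's total cost by its completed units.

    ledger_rows: dicts with hypothetical keys "tenant_id", "workflow_name",
        "completed" (bool), and "cost" (all calls joined to that unit).
    """
    cost = defaultdict(float)
    completed = defaultdict(int)
    for row in ledger_rows:
        key = (row["tenant_id"], row["workflow_name"])
        cost[key] += row["cost"]
        if row["completed"]:
            completed[key] += 1
    # None marks segments that spent money but completed nothing.
    return {
        key: (cost[key] / completed[key] if completed[key] else None)
        for key in cost
    }


# Made-up ledger rows for illustration.
rows = [
    {"tenant_id": "tenant-a", "workflow_name": "summarize", "completed": True, "cost": 0.5},
    {"tenant_id": "tenant-a", "workflow_name": "summarize", "completed": True, "cost": 1.5},
    {"tenant_id": "tenant-b", "workflow_name": "summarize", "completed": True, "cost": 4.0},
    {"tenant_id": "tenant-b", "workflow_name": "summarize", "completed": False, "cost": 2.0},
    {"tenant_id": "tenant-c", "workflow_name": "summarize", "completed": False, "cost": 1.0},
]
by_segment = cost_per_tenant_workflow(rows)
blended = sum(r["cost"] for r in rows) / sum(r["completed"] for r in rows)
print(by_segment, blended)
```

The blended figure of 3.0 sits between a tenant at 1.0 and a tenant at 6.0, and it hides a third tenant with spend and no completed units.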
What to review when costs move
When cost per unit increases, check these in order:
1. Denominator change: Did completed business units fall while traffic stayed flat?
2. Prompt growth: Did system prompts, retrieved context, or conversation history expand?
3. Route mix: Did traffic shift to a different model route or capability?
4. Retry behavior: Did timeout, rate-limit, or transient-error retries increase?
5. Fallback behavior: Did fallback paths create multiple calls for one unit?
6. Rejected output rate: Are users discarding more responses?
7. Attribution gap: Did unmatched traffic increase?
8. Pricing assumption drift: Did the billing basis or model pricing assumption change?
9. Environment leakage: Did staging, evaluation, or demo traffic enter production reports?
The pricing and billing checks should point back to the current CometAPI pricing documentation, not to copied spreadsheet values without review dates.
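A quick way to narrow down which check to start with is to split cost per unit into two factors: cost per call and calls per completed unit. This decomposition is an illustrative device, not something taken from the cited sources. The window keys and figures are made up, and the sketch assumes non-zero call and unit counts.

```python
def decompose(window):
    """Split cost per unit into cost per call times calls per unit.

    window: dict with hypothetical keys "cost", "calls", "completed_units".
    Assumes "calls" and "completed_units" are non-zero.
    """
    cost_per_call = window["cost"] / window["calls"]
    calls_per_unit = window["calls"] / window["completed_units"]
    return {
        "cost_per_call": cost_per_call,
        "calls_per_unit": calls_per_unit,
        "cost_per_unit": cost_per_call * calls_per_unit,
    }


def compare(before, after):
    """Return the after/before ratio for each factor."""
    b, a = decompose(before), decompose(after)
    return {name: a[name] / b[name] for name in b}


# Flat spend and flat traffic, but half as many completed units.
before = {"cost": 100.0, "calls": 400, "completed_units": 200}
after = {"cost": 100.0, "calls": 400, "completed_units": 100}
change = compare(before, after)
print(change)
```

A ratio above 1 on `calls_per_unit` points toward the denominator, retry, fallback, and rejected-output checks. A ratio above 1 on `cost_per_call` points toward prompt growth, route mix, and pricing assumption drift.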
Operating guardrails
Use these guardrails to keep the metric useful:
- Require every production AI call to carry environment, workflow, tenant, and trace metadata where supported.
- Keep test and evaluation keys, projects, or metadata separate from production.
- Report cost per successful unit and exception cost share together.
- Review pricing assumptions on a schedule and after any provider, gateway, or model-route change.
- Keep an “unmatched cost” bucket visible instead of silently spreading it across customers.
- Require source links in any pricing configuration or dashboard annotation.
- Escalate unclear billing or contract questions through the documented support route rather than guessing from logs alone; the CometAPI support help center is the starting point for support access.
FAQ
Is cost per request a unit economics metric?
It can be a gateway efficiency metric, but it is usually not enough for unit economics. Unit economics should connect cost to a value unit, such as a resolved case, accepted answer, generated report, or completed workflow.
Should failed requests count in cost per unit?
Track them, but do not hide them inside the successful-unit numerator without a separate exception view. Failed, retried, and fallback calls can create real cost while producing no completed business outcome.
Where should pricing values come from?
Use the current pricing source, such as the CometAPI pricing documentation, and record the review date. Do not hard-code prices from memory or from an old dashboard screenshot.
What if the API response does not include every usage field I need?
Verify the current response contract in the CometAPI API documentation. If the needed field is not available in the response, check whether it is available in another approved usage, billing, export, dashboard, or support workflow before designing the allocation job.
How often should the unit economics check run?
Run it often enough to catch route, prompt, and workload changes before they affect margin. Many teams start with a daily or per-deployment review, then move stable workloads into automated dashboards. The exact schedule should match traffic volume and business risk.
What is the most common mistake?
The most common mistake is using total AI spend divided by total requests and calling it unit economics. That hides workload mix, failed calls, rejected outputs, and differences in business value.
Can this be used before billing data is final?
Yes, but label the result as estimated. Use verified usage fields and reviewed pricing assumptions, then reconcile against final billing data when available.
Should gateway-level unit economics replace product analytics?
No. The gateway provides cost and routing visibility. Product analytics provides the business outcome denominator. Unit economics needs both.
Sources checked
Access date: 2026-06-11.
| Source | Purpose |
|---|---|
| CometAPI API documentation home | Used as the primary source to verify API paths, authentication requirements, request fields, response fields, and error behavior before implementation. |
| CometAPI pricing documentation | Used as the source to verify pricing, billing basis, and cost-conversion assumptions before calculating cost per unit. |
| CometAPI support help center | Used as the source for support escalation when account-specific billing, rate-limit, or contract details are unclear. |
| FinOps Foundation Unit Economics capability | Used as the methodology reference for connecting technology spend to business value units. |
If you are evaluating whether CometAPI fits your gateway cost-control workflow, start from the documented API and pricing sources above, then validate the fields your unit economics ledger needs before moving the calculation into production.