Source pack

  • CometAPI API documentation home — primary place to verify current API contract details such as supported paths, request shape, response shape, and authentication requirements before wiring checks into an AI gateway.
  • CometAPI pricing documentation — source to verify pricing, billing, metering, or model-cost assumptions before using them in a unit economics calculation.
  • CometAPI support help center — source to identify the right support route when contract, billing, or account-specific details are unclear.
  • FinOps Foundation Unit Economics capability — methodology source for connecting technology spend to business value units rather than reviewing total spend alone.

Intent brief

This note is for operators who already route AI traffic through an API gateway and need to know whether cost is being measured against the right unit of value.

The operational question is not “Which model is cheapest?” It is: Can we explain the cost of a tenant, workflow, or successful business event well enough to make routing, budget, and product decisions?

Use this page to:

  • define one useful economic unit for AI API traffic;
  • verify the API and billing contract details that feed the calculation;
  • run a small validation probe without hard-coding unsupported assumptions;
  • identify leakage from retries, failures, over-long prompts, and unallocated shared traffic.

This page does not publish current CometAPI pricing, model availability, endpoint paths, rate limits, or billing fields. Those values must be checked against the linked CometAPI documentation before use.

Unit economics checks for AI gateway costs

Last reviewed: 2026-06-11.

Who this is for: FinOps, platform, product, and AI operations teams responsible for converting AI API gateway spend into defensible cost-per-unit metrics.

For related cost-control operating notes, see the site’s AI cost-control posts. If you are building a broader review process, keep this page with the operations note archive so gateway checks and budget checks stay connected.

Key takeaways

  • Unit economics for AI gateway traffic should be based on a business unit, not only on tokens, requests, or total invoice spend.
  • Before calculating cost per unit, verify the API contract, billing fields, and pricing assumptions in the current CometAPI API documentation and CometAPI pricing documentation.
  • A useful calculation separates successful business outcomes from retries, failed calls, test traffic, and unallocated shared traffic.
  • Treat numerical thresholds as local tuning values. Do not copy another team’s cost-per-request or cost-per-customer target without checking workload mix, prompt size, model mix, and failure behavior.
  • The FinOps Foundation frames unit economics as a capability for connecting technology cost to business value; use that framing to keep AI gateway reports useful to product and finance stakeholders (FinOps Foundation Unit Economics capability).

Definition: AI API gateway unit economics

AI API gateway unit economics is the practice of mapping AI API cost to a useful business unit, such as:

  • cost per successful support resolution;
  • cost per generated report;
  • cost per user session with at least one accepted AI answer;
  • cost per tenant per billable workflow;
  • cost per order, case, document, or transaction enriched by AI.

The important distinction is that the denominator should represent value delivered, not merely traffic produced. Tokens and API calls are still required inputs, but they are not always the right unit for product or margin decisions.

The FinOps Unit Economics capability is the external methodology reference for tying cost to business value. For AI gateway operations, the same idea becomes: which AI spend helped create a useful product outcome, and which spend was overhead, waste, testing, retry, or failure?

Why this check matters for AI gateways

AI gateway costs can look controlled while unit economics worsen.

That happens when aggregate spend is flat but:

  • prompts become longer;
  • retry volume increases;
  • low-value workflows consume more tokens;
  • fallback paths double-call providers;
  • test traffic is mixed with production traffic;
  • tenants with different usage patterns are averaged together;
  • failed or abandoned outputs are counted as if they created value.

A gateway-level unit economics check helps you separate “we spent less this week” from “we improved the cost of a successful business outcome.”

The three checks to run first

1. Economic-unit fit check

Pick one unit that a product owner and finance partner both understand.

Good candidate units:

| Workload | Better economic unit | Weaker unit |
|---|---|---|
| Support assistant | Resolved case with accepted answer | Raw API request |
| Document summarization | Completed document summary delivered to user | Prompt submitted |
| Sales research | Qualified account brief generated | Tokens consumed |
| Coding assistant | Accepted suggestion or completed task | Completion call |
| Internal analyst bot | Answer used in a workflow | Chat turn |

A request can be useful for debugging, but it rarely explains business value by itself. If the gateway only reports spend per request, add a join to application events so the denominator reflects accepted or completed outcomes.
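The join described above can be sketched as follows. This is a minimal illustration, assuming gateway calls and application events share a trace ID and that each call already carries a cost converted with verified pricing. The field names (`trace_id`, `cost`, `business_unit_id`, `accepted`) are hypothetical and are not taken from any CometAPI contract.

```python
from collections import defaultdict


def summarize_outcome_costs(gateway_calls, outcome_events):
    """Join gateway calls to application events by trace ID (hypothetical fields)."""
    event_by_trace = {e["trace_id"]: e for e in outcome_events}
    # A unit counts as accepted if any of its events was accepted.
    accepted_units = {e["business_unit_id"] for e in outcome_events if e["accepted"]}

    cost_by_unit = defaultdict(float)
    unmatched_cost = 0.0
    for call in gateway_calls:
        event = event_by_trace.get(call["trace_id"])
        if event is None:
            # No application event for this call: hold the cost out.
            unmatched_cost += call["cost"]
        else:
            cost_by_unit[event["business_unit_id"]] += call["cost"]

    accepted_cost = sum(c for u, c in cost_by_unit.items() if u in accepted_units)
    rejected_cost = sum(c for u, c in cost_by_unit.items() if u not in accepted_units)
    return {
        "accepted_units": len(accepted_units),
        "cost_per_accepted_unit": (
            accepted_cost / len(accepted_units) if accepted_units else None
        ),
        "rejected_cost": rejected_cost,
        "unmatched_cost": unmatched_cost,
    }


# Illustrative data only.
calls = [
    {"trace_id": "t1", "cost": 0.5},
    {"trace_id": "t2", "cost": 0.25},
    {"trace_id": "t3", "cost": 0.25},
    {"trace_id": "t9", "cost": 1.0},  # never reached the application
]
events = [
    {"trace_id": "t1", "business_unit_id": "case-1", "accepted": True},
    {"trace_id": "t2", "business_unit_id": "case-1", "accepted": True},
    {"trace_id": "t3", "business_unit_id": "case-2", "accepted": False},
]
summary = summarize_outcome_costs(calls, events)
print(summary)
```

Here the denominator is one accepted case rather than four requests, and the call with no matching event stays visible as unmatched cost instead of being averaged in.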

2. Contract-source check

Before calculating anything, confirm the fields you plan to collect are actually supported by the current API and pricing contract.

Use the CometAPI API documentation home to verify endpoint and payload details. Use the CometAPI pricing documentation to verify the billing basis before converting usage into cost.

Do not assume:

  • the exact chat endpoint path;
  • the authentication header name or scheme;
  • the model identifier format;
  • whether usage fields appear in the response body, headers, dashboard export, or invoice export;
  • rate-limit values;
  • current prices;
  • rounding, minimum-charge, or billing aggregation behavior.

Those details are contract inputs, not guesses.

3. Exception-leakage check

For unit economics, exception traffic matters because it can inflate cost without increasing delivered value.

Track these buckets separately:

  • successful business outcome;
  • successful API response but rejected output;
  • retry after timeout or transient error;
  • fallback call after primary route failure;
  • user-abandoned response;
  • test or evaluation traffic;
  • system health probes;
  • internal admin usage;
  • uncategorized traffic.

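Cost per unit alone can hide leakage, so report two metrics together. The first is cost per completed unit:

cost_per_unit = allocated_ai_cost / completed_business_units

The second is the share of total spend consumed by exception traffic:

exception_cost_share = exception_ai_cost / total_ai_cost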

Treat any threshold you set on these metrics as a local tuning value. A team with latency-sensitive fallback requirements may tolerate more exception cost than a batch summarization workflow.
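As a small worked sketch of the two formulas, the functions below use the same names as the formulas. All figures are illustrative and are not taken from any pricing source. The example also shows the case where spend is flat while unit economics worsen.

```python
def cost_per_unit(allocated_ai_cost, completed_business_units):
    """Return cost per completed unit, or None when no unit was completed."""
    if completed_business_units <= 0:
        return None
    return allocated_ai_cost / completed_business_units


def exception_cost_share(exception_ai_cost, total_ai_cost):
    """Return the fraction of total AI cost spent on exception traffic."""
    if total_ai_cost <= 0:
        return None
    return exception_ai_cost / total_ai_cost


# Illustrative numbers only: spend is flat, but fewer units completed in week 2.
week_1 = cost_per_unit(allocated_ai_cost=100.0, completed_business_units=50)
week_2 = cost_per_unit(allocated_ai_cost=100.0, completed_business_units=40)
share = exception_cost_share(exception_ai_cost=30.0, total_ai_cost=120.0)
print(week_1, week_2, share)
```

Returning `None` for an empty denominator is a design choice in this sketch: a window with no completed units is reported as "no metric" rather than as zero cost.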

Contract details to verify

Use this table before implementing the unit economics job. Each row names what to verify and deliberately states no values, because the source pack quotes none.

| Contract area | Value to verify before implementation | Why it matters for unit economics | Primary source to check |
|---|---|---|---|
| Endpoint paths | Verify the current CometAPI base URL and the exact path used for the relevant operation, such as a chat or completion-style request, from the docs. | The gateway log key must match the operation being costed; otherwise unrelated traffic may be included. | CometAPI API documentation home |
| Auth headers | Verify the required authentication header name, token format, and any project/account scoping requirements from the docs. | Misconfigured auth checks can mix production, staging, and test usage or fail to attribute traffic to the right account. | CometAPI API documentation home |
| Request fields | Verify the required model field, input/message field, token-limit field, metadata fields, and any supported customer attribution fields. | Unit economics needs stable dimensions such as tenant, workflow, environment, and model route. | CometAPI API documentation home |
| Response fields | Verify where usage, token, request ID, model, status, latency, or finish fields are returned. | Cost allocation depends on joining gateway logs to provider/API usage signals without double-counting. | CometAPI API documentation home |
| Error behavior | Verify documented error response structure, status codes, retry guidance, and whether failed requests can still create billable usage. | Retry and failure traffic must be separated from successful business outcomes. | CometAPI API documentation home and CometAPI support help center |
| Rate-limit assumptions | Verify current rate-limit behavior and any account-specific limits in the docs or support channel. | Rate-limit retries can distort cost per successful unit and latency per unit. | CometAPI API documentation home and CometAPI support help center |
| Billing assumptions | Verify the pricing basis, billing dimensions, rounding behavior, and any model-specific pricing details from the pricing page before calculating allocated cost. | A wrong billing assumption makes every downstream cost-per-unit metric unreliable. | CometAPI pricing documentation |

A practical validation workflow

Step 1: Choose one reporting window

Start with a short window, such as one day or one deployment interval. The goal is not statistical perfection. The goal is to prove that gateway logs, application events, and billing assumptions can be joined cleanly.

Record:

  • window start and end time;
  • timezone;
  • environment;
  • gateway route;
  • model route or model family label;
  • application workflow;
  • tenant or customer segment;
  • request ID or trace ID;
  • business outcome ID.

Step 2: Define the unit ledger

Create a small table where each row represents one business unit.

Example ledger columns:

| Column | Purpose |
|---|---|
| business_unit_id | Stable ID for the completed workflow, case, document, or accepted answer. |
| tenant_id | Allocation dimension for customer or account-level reporting. |
| workflow_name | Product feature or job that created the unit. |
| completed_at | Timestamp for the value event. |
| accepted_outcome | Whether the output was accepted, delivered, or used. |
| gateway_trace_ids | One or more API calls associated with the unit. |
| exception_flag | Marks retry, fallback, rejected output, or non-production traffic. |
| allocation_status | One of allocated, shared, unmatched, or excluded. |

The gateway does not have to own this table. It can be built in your warehouse by joining gateway telemetry to application events.
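One possible in-code shape for a ledger row is sketched below. The field names follow the ledger columns above. The Python types, the default values, and the validation rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

ALLOCATION_STATUSES = {"allocated", "shared", "unmatched", "excluded"}


@dataclass
class UnitLedgerRow:
    business_unit_id: str
    tenant_id: str
    workflow_name: str
    completed_at: Optional[datetime]  # None while the unit is still open
    accepted_outcome: bool
    gateway_trace_ids: List[str] = field(default_factory=list)
    exception_flag: Optional[str] = None  # e.g. "retry", "fallback", "rejected"
    allocation_status: str = "unmatched"  # assumed default: unallocated until joined

    def __post_init__(self):
        # Reject statuses outside the four listed in the ledger definition.
        if self.allocation_status not in ALLOCATION_STATUSES:
            raise ValueError(f"unknown allocation_status: {self.allocation_status!r}")


# Illustrative row only.
row = UnitLedgerRow(
    business_unit_id="case-1",
    tenant_id="tenant-a",
    workflow_name="support_assistant",
    completed_at=datetime(2026, 6, 11, 12, 0, tzinfo=timezone.utc),
    accepted_outcome=True,
    gateway_trace_ids=["t1", "t2"],
    allocation_status="allocated",
)
print(row)
```

Defaulting `allocation_status` to `unmatched` means a row only counts toward cost per unit after the join has explicitly marked it as allocated.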

Step 3: Split cost into allocated and unallocated buckets

Do not force every cost into a business unit too early.

Use buckets such as:

| Bucket | Include | Operator action |
|---|---|---|
| Allocated production cost | Calls joined to completed business units | Use in cost-per-unit metric. |
| Allocated exception cost | Calls joined to failed, retried, rejected, or fallback units | Report separately from successful units. |
| Shared platform cost | System prompts, evaluation harnesses, admin tooling, shared caches | Allocate by a documented rule or keep separate. |
| Unmatched cost | Calls with no traceable application event | Investigate logging and attribution gaps. |
| Excluded cost | Load tests, experiments, demos, known non-production usage | Keep out of production unit economics. |

If unmatched cost is material, fix attribution before tuning prompts or switching models. Otherwise, you may optimize the wrong workload.
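A minimal sketch of the bucket split follows, assuming each call has already been tagged with a bucket label and a cost. The bucket keys and the `bucket` and `cost` field names are hypothetical. Calls with a missing or unknown label fall into the unmatched bucket rather than being dropped. This sketch also assumes that total cost includes the excluded bucket, which you may prefer to define differently.

```python
BUCKETS = (
    "allocated_production",
    "allocated_exception",
    "shared_platform",
    "unmatched",
    "excluded",
)


def summarize_buckets(calls):
    """Total cost per bucket, plus the share of total cost that is unmatched."""
    totals = {bucket: 0.0 for bucket in BUCKETS}
    for call in calls:
        bucket = call.get("bucket", "unmatched")
        if bucket not in totals:
            bucket = "unmatched"  # unknown labels stay visible, not discarded
        totals[bucket] += call["cost"]
    total_cost = sum(totals.values())
    unmatched_share = totals["unmatched"] / total_cost if total_cost > 0 else None
    return totals, unmatched_share


# Illustrative data only.
tagged_calls = [
    {"bucket": "allocated_production", "cost": 2.0},
    {"bucket": "allocated_exception", "cost": 1.0},
    {"bucket": "unmatched", "cost": 0.5},
    {"cost": 0.5},  # no bucket label at all
]
bucket_totals, unmatched_share = summarize_buckets(tagged_calls)
print(bucket_totals, unmatched_share)
```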

Step 4: Verify pricing inputs before calculating

Before converting usage into currency, check the current billing basis in the CometAPI pricing documentation. If a field or billing rule is unclear, raise the account-specific question through the CometAPI support help center.

Do not bake unknown values into code. Put them in a reviewed configuration file with source notes, review dates, and owner sign-off.

Example configuration shape:

{
  "pricing_source_url": "https://apidoc.cometapi.com/pricing/about-pricing",
  "pricing_reviewed_at": "2026-06-11",
  "billing_basis_to_verify": "<BILLING_BASIS_FROM_COMETAPI_PRICING_DOCS>",
  "model_cost_rules": [
    {
      "model_id": "<VALIDATED_MODEL_ID>",
      "unit_field_to_verify": "<USAGE_OR_BILLING_FIELD_FROM_DOCS>",
      "price_value_to_verify": "<PRICE_FROM_CURRENT_PRICING_SOURCE>",
      "currency_to_verify": "<CURRENCY_FROM_CURRENT_PRICING_SOURCE>",
      "rounding_rule_to_verify": "<ROUNDING_RULE_FROM_CURRENT_PRICING_SOURCE>"
    }
  ],
  "allocation_policy": {
    "primary_dimension": "business_unit_id",
    "secondary_dimensions": ["tenant_id", "workflow_name", "environment"],
    "unmatched_cost_policy": "hold_out_for_investigation"
  }
}
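To keep unverified values out of a cost job, a loader can refuse any configuration that still contains angle-bracket placeholders like the ones above. The sketch below is one way to do that. The helper name `find_unverified` and the placeholder pattern are assumptions made for illustration.

```python
import re

# Matches placeholders such as <PRICE_FROM_CURRENT_PRICING_SOURCE> anywhere in a string.
PLACEHOLDER = re.compile(r"<[A-Z][A-Z0-9_]*>")


def find_unverified(value, path=""):
    """Return the paths of string values that still contain a placeholder."""
    found = []
    if isinstance(value, dict):
        for key, child in value.items():
            found += find_unverified(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found += find_unverified(child, f"{path}[{index}]")
    elif isinstance(value, str) and PLACEHOLDER.search(value):
        found.append(path)
    return found


# Illustrative, partially reviewed configuration.
config = {
    "pricing_reviewed_at": "2026-06-11",
    "billing_basis_to_verify": "<BILLING_BASIS_FROM_COMETAPI_PRICING_DOCS>",
    "model_cost_rules": [{"model_id": "<VALIDATED_MODEL_ID>"}],
    "allocation_policy": {"primary_dimension": "business_unit_id"},
}
unverified = find_unverified(config)
print(unverified)
```

A cost job could stop with an error whenever this list is non-empty, so a template file is never mistaken for a reviewed one.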

Step 5: Run a sanitized gateway probe

Use a small probe to confirm that your gateway captures the fields you need. Replace every placeholder with values verified from the current CometAPI docs before running.

cat > unit-economics-probe.json <<'JSON'
{
  "<MODEL_FIELD_FROM_DOCS>": "<VALIDATED_MODEL_ID>",
  "<INPUT_FIELD_FROM_DOCS>": [
    {
      "<ROLE_FIELD_FROM_DOCS>": "user",
      "<CONTENT_FIELD_FROM_DOCS>": "Return a one-sentence health check for a unit economics logging probe."
    }
  ],
  "<TOKEN_LIMIT_FIELD_FROM_DOCS>": 64,
  "<METADATA_FIELD_FROM_DOCS>": {
    "environment": "staging",
    "workflow_name": "unit_economics_probe",
    "business_unit_id": "probe-<UNIQUE_ID>",
    "tenant_id": "internal-cost-controls"
  }
}
JSON

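# Send the probe. -D saves the response headers and -o saves the body, so both
# can be checked for request IDs and usage fields. -w prints the HTTP status,
# because -sS alone does not report HTTP-level failures.
curl -sS -X POST "<COMETAPI_BASE_URL_FROM_DOCS><COMETAPI_CHAT_PATH_FROM_DOCS>" \
  -H "<AUTH_HEADER_FROM_DOCS>: <COMETAPI_API_KEY>" \
  -H "Content-Type: application/json" \
  --data-binary @unit-economics-probe.json \
  -D response-headers.txt \
  -o response-body.json \
  -w 'HTTP status: %{http_code}\n'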

printf '\nReview response headers, response body, gateway logs, and billing/usage exports for traceability.\n'

Validation questions:

  1. Did the gateway log a request ID or trace ID?
  2. Did the application event store the same ID?
  3. Is the model or route label present?
  4. Is usage visible in the response, dashboard, export, or another verified source?
  5. Can the call be categorized as production, staging, test, or evaluation?
  6. Can the cost be allocated to a business unit?
  7. If the call failed, can you tell whether it should be included in exception cost?
  8. Does the pricing configuration cite the current pricing source?
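Part of this checklist can be automated against the saved probe artifacts. The sketch below reports which expected header names and body keys are absent. The expected names are inputs you supply after reading the current docs. The names used in the example (`x-example-request-id`, `usage`, `id`) are hypothetical and are not claims about the CometAPI response contract.

```python
def parse_header_names(raw_headers):
    """Collect lower-cased header names from a saved header dump."""
    names = set()
    for line in raw_headers.splitlines():
        if ":" in line and not line.startswith("HTTP/"):
            names.add(line.split(":", 1)[0].strip().lower())
    return names


def collect_keys(value):
    """Collect every key that appears anywhere in a parsed JSON value."""
    keys = set()
    if isinstance(value, dict):
        for key, child in value.items():
            keys.add(key)
            keys |= collect_keys(child)
    elif isinstance(value, list):
        for child in value:
            keys |= collect_keys(child)
    return keys


def probe_report(raw_headers, body, expected_headers, expected_body_keys):
    """List expected header names and body keys that the probe did not return."""
    headers = parse_header_names(raw_headers)
    keys = collect_keys(body)
    return {
        "missing_headers": sorted(h for h in expected_headers if h.lower() not in headers),
        "missing_body_keys": sorted(k for k in expected_body_keys if k not in keys),
    }


# Illustrative response only; field names are hypothetical.
sample_headers = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json\r\n"
    "X-Example-Request-Id: abc123\r\n\r\n"
)
sample_body = {"id": "r1", "choices": [{"message": {"content": "ok"}}]}
report = probe_report(sample_headers, sample_body, ["x-example-request-id"], ["usage", "id"])
print(report)
```

To run it against the probe output, read `response-headers.txt` as text and parse `response-body.json` with `json.load`, then pass both to `probe_report`. A missing key is a prompt to check the docs or another verified usage source, not proof that the data does not exist.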

Metrics to start with

Start with metrics that separate value delivery from traffic volume.

| Metric | Formula | Why it is useful |
|---|---|---|
| Cost per completed unit | allocated_success_cost / completed_business_units | Main unit economics metric. |
| Cost per accepted output | cost_for_outputs_accepted_by_user / accepted_outputs | Filters out generated but unused responses. |
| Exception cost share | retry_fallback_failure_cost / total_ai_cost | Shows cost leakage from reliability behavior. |
| Unmatched cost share | unmatched_gateway_cost / total_ai_cost | Measures attribution quality. |
| Cost per tenant workflow | tenant_workflow_cost / tenant_completed_workflows | Helps identify segment-level margin pressure. |
| Prompt overhead share | system_and_context_cost / total_unit_cost | Shows when context growth is driving cost. |
| Evaluation/test cost share | eval_and_test_cost / total_ai_cost | Prevents non-production usage from hiding in production unit economics. |

Avoid setting a universal target from these formulas. Tune thresholds by product tier, workflow value, latency requirements, and quality requirements.
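The cost-per-tenant-workflow metric can be sketched as a simple grouping. The row fields (`tenant_id`, `workflow_name`, `cost`, `completed`) are hypothetical, and the numbers are illustrative. The example shows how a blended average can hide a segment that costs several times more per completed workflow.

```python
from collections import defaultdict


def cost_per_tenant_workflow(rows):
    """Cost per completed workflow, grouped by (tenant_id, workflow_name)."""
    cost = defaultdict(float)
    completed = defaultdict(int)
    for row in rows:
        key = (row["tenant_id"], row["workflow_name"])
        cost[key] += row["cost"]  # all cost counts, including incomplete runs
        if row["completed"]:
            completed[key] += 1
    return {
        key: (cost[key] / completed[key] if completed[key] else None)
        for key in cost
    }


# Illustrative data only.
rows = [
    {"tenant_id": "a", "workflow_name": "summary", "cost": 1.0, "completed": True},
    {"tenant_id": "a", "workflow_name": "summary", "cost": 1.0, "completed": True},
    {"tenant_id": "b", "workflow_name": "summary", "cost": 3.0, "completed": True},
    {"tenant_id": "b", "workflow_name": "summary", "cost": 1.0, "completed": False},
]
per_segment = cost_per_tenant_workflow(rows)
blended = sum(r["cost"] for r in rows) / sum(1 for r in rows if r["completed"])
print(per_segment, blended)
```

In this data the blended figure is 2.0 per completed workflow, while tenant `b` is at 4.0 and tenant `a` at 1.0.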

What to review when costs move

When cost per unit increases, check these in order:

  1. Denominator change: Did completed business units fall while traffic stayed flat?
  2. Prompt growth: Did system prompts, retrieved context, or conversation history expand?
  3. Route mix: Did traffic shift to a different model route or capability?
  4. Retry behavior: Did timeout, rate-limit, or transient-error retries increase?
  5. Fallback behavior: Did fallback paths create multiple calls for one unit?
  6. Rejected output rate: Are users discarding more responses?
  7. Attribution gap: Did unmatched traffic increase?
  8. Pricing assumption drift: Did the billing basis or model pricing assumption change?
  9. Environment leakage: Did staging, evaluation, or demo traffic enter production reports?

The pricing and billing checks should point back to the current CometAPI pricing documentation, not to copied spreadsheet values without review dates.
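A first pass over several of these checks can be sketched by comparing two reporting windows. The window fields below (`total_cost`, `allocated_success_cost`, `exception_cost`, `unmatched_cost`, `completed_units`) are hypothetical names for totals you would compute from your own ledger. The example shows a denominator change with flat spend.

```python
def window_ratios(window):
    """Compute the headline ratios for one reporting window."""
    units = window["completed_units"]
    total = window["total_cost"]
    return {
        "cost_per_unit": window["allocated_success_cost"] / units if units else None,
        "exception_share": window["exception_cost"] / total if total else None,
        "unmatched_share": window["unmatched_cost"] / total if total else None,
    }


def compare_windows(previous, current):
    """Return current-minus-previous changes; None where a ratio is undefined."""
    before, after = window_ratios(previous), window_ratios(current)
    changes = {}
    for name in before:
        if before[name] is None or after[name] is None:
            changes[name] = None
        else:
            changes[name] = after[name] - before[name]
    changes["completed_units"] = current["completed_units"] - previous["completed_units"]
    return changes


# Illustrative windows only: spend is unchanged, completed units fell.
previous = {"total_cost": 100.0, "allocated_success_cost": 80.0,
            "exception_cost": 10.0, "unmatched_cost": 10.0, "completed_units": 40}
current = {"total_cost": 100.0, "allocated_success_cost": 80.0,
           "exception_cost": 10.0, "unmatched_cost": 10.0, "completed_units": 32}
changes = compare_windows(previous, current)
print(changes)
```

Here cost per unit rises while the exception and unmatched shares do not move, which points at the denominator rather than at retries, fallbacks, or attribution gaps.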

Operating guardrails

Use these guardrails to keep the metric useful:

  • Require every production AI call to carry environment, workflow, tenant, and trace metadata where supported.
  • Keep test and evaluation keys, projects, or metadata separate from production.
  • Report cost per successful unit and exception cost share together.
  • Review pricing assumptions on a schedule and after any provider, gateway, or model-route change.
  • Keep an “unmatched cost” bucket visible instead of silently spreading it across customers.
  • Require source links in any pricing configuration or dashboard annotation.
  • Escalate unclear billing or contract questions through the documented support route rather than guessing from logs alone; the CometAPI support help center is the source pack reference for support access.

FAQ

Is cost per request a unit economics metric?

It can be a gateway efficiency metric, but it is usually not enough for unit economics. Unit economics should connect cost to a value unit, such as a resolved case, accepted answer, generated report, or completed workflow.

Should failed requests count in cost per unit?

Track them, but do not hide them inside the successful-unit numerator without a separate exception view. Failed, retried, and fallback calls can create real cost while producing no completed business outcome.

Where should pricing values come from?

Use the current pricing source, such as the CometAPI pricing documentation, and record the review date. Do not hard-code prices from memory or from an old dashboard screenshot.

What if the API response does not include every usage field I need?

Verify the current response contract in the CometAPI API documentation. If the needed field is not available in the response, check whether it is available in another approved usage, billing, export, dashboard, or support workflow before designing the allocation job.

How often should the unit economics check run?

Run it often enough to catch route, prompt, and workload changes before they affect margin. A reasonable starting point is a daily or per-deployment review, with stable workloads later moved into automated dashboards. The exact schedule should match traffic volume and business risk.

What is the most common mistake?

A common mistake is dividing total AI spend by total requests and calling the result unit economics. That hides workload mix, failed calls, rejected outputs, and differences in business value.

Can this be used before billing data is final?

Yes, but label the result as estimated. Use verified usage fields and reviewed pricing assumptions, then reconcile against final billing data when available.

Should gateway-level unit economics replace product analytics?

No. The gateway provides cost and routing visibility. Product analytics provides the business outcome denominator. Unit economics needs both.

Sources checked

Access date: 2026-06-11.

| Source | Purpose |
|---|---|
| CometAPI API documentation home | Used as the primary source to verify API paths, authentication requirements, request fields, response fields, and error behavior before implementation. |
| CometAPI pricing documentation | Used as the source to verify pricing, billing basis, and cost-conversion assumptions before calculating cost per unit. |
| CometAPI support help center | Used as the source for support escalation when account-specific billing, rate-limit, or contract details are unclear. |
| FinOps Foundation Unit Economics capability | Used as the methodology reference for connecting technology spend to business value units. |

If you are evaluating whether CometAPI fits your gateway cost-control workflow, start from the documented API and pricing sources above, then validate the fields your unit economics ledger needs before moving the calculation into production.