Billing and request-volume signals to monitor

Last reviewed: 2026-05-09

Who this is for: operators responsible for API spend, token budgets, usage monitoring, or billing-support evidence for CometAPI-backed applications.

The public source checked for this draft is the CometAPI Help Center. Treat that page as a support and billing-reference starting point, not as a substitute for your signed commercial terms, account console, or endpoint-level API contract.

For related cost-control operating notes, see the AI cost controls home and the posts index.

Key takeaways

  • Request count is a weak spend proxy by itself. Track request count together with model, token usage, status code, retry attempt, timeout, and project or key owner.
  • Do not assume whether failed, timed-out, streamed, retried, or client-aborted requests are billable until your CometAPI contract, account records, or support response confirms the behavior.
  • Keep a local usage ledger even if you rely on a vendor dashboard. The ledger gives you evidence for disputes, budget alerts, and incident review.
  • Use example thresholds only as starting points. Tune them by application, model class, traffic pattern, and your verified billing terms.
  • Escalations are faster when you can provide UTC timestamps, request IDs, endpoint path, model name, status code, retry count, and your local token estimates.

Concise definition

For this checklist, request-volume caveats are the cases where “number of API calls” does not cleanly equal “billable usage.” Common examples include retries, streaming interruptions, upstream timeouts, validation failures, multi-model routing, and delayed billing records.

A monitoring signal is a metric or log field that helps explain cost movement before the final bill is posted.

Why request volume alone is not enough

A dashboard showing “10,000 requests” may hide materially different spend outcomes:

  • 10,000 short classification calls can have a different cost shape than 10,000 long-context generation calls.
  • A retry loop can double request volume without doubling user-visible work.
  • A timeout can be ambiguous: the client may have stopped waiting while the upstream model continued processing.
  • A streamed response can be partially delivered, fully delivered, or interrupted before your client captures final usage metadata.
  • A model alias or route change can shift the unit economics even when request count stays flat.

The CometAPI Help Center is the public support reference used for this draft. Add it to your internal runbook as the place operators check before opening a billing-support case, but verify contract-specific billing semantics separately.

Monitoring signals checklist

Track these fields at the application boundary, before data is transformed by observability pipelines.

| Signal | Why it matters | Practical alert example to tune |
| --- | --- | --- |
| request_id or local correlation ID | Lets you join app logs, gateway logs, and support evidence | Alert if more than 1% of requests are missing correlation IDs |
| UTC timestamp at send and receive | Required for hourly reconciliation and support windows | Alert if clock skew exceeds your observability tolerance |
| Endpoint path | Distinguishes chat, embedding, image, batch, or admin calls | Alert on unexpected endpoint paths in production |
| Model or route label | Cost and token behavior often vary by model class | Alert on unknown model labels or sudden model-mix shifts |
| API key, project, tenant, or cost center | Prevents one workload from hiding inside another | Alert on unassigned or shared “miscellaneous” keys |
| HTTP status code | Separates success, client error, server error, and rate-limit behavior | Alert on elevated 4xx/5xx rates by endpoint |
| Client timeout and upstream duration | Identifies ambiguous completion and billing cases | Alert when the timeout rate rises above its normal baseline |
| Retry attempt number | Retries can multiply spend and request volume | Alert when retry-attempted requests exceed an example 2–5% band |
| Input token estimate | Helps forecast spend before vendor posting | Alert on p95 prompt token growth |
| Output tokens reported or estimated | Captures runaway generation and streaming anomalies | Alert on p95 output token growth or missing usage metadata |
| Streaming flag and stream completion status | Interrupted streams are common billing-investigation cases | Alert when stream starts exceed stream completions |
| Budget ledger write status | Your cost monitor fails silently if ledger writes fail | Alert on any sustained ledger-ingestion error |

The percentage examples above are operating starting points, not universal thresholds. Establish your own baseline from normal traffic and adjust after each billing review.
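A minimal sketch of computing a few of these signals for one window, assuming a hypothetical `CallEvent` record captured at the application boundary. Rates are computed across all events passed in, so group by `endpoint_path` first if you want per-endpoint error rates. The limits are the example starting points from the table, not recommendations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallEvent:
    # Hypothetical local event shape; field names follow the signals table, not a CometAPI schema.
    request_id: Optional[str]
    endpoint_path: str
    model: str
    http_status: int
    retry_attempt: int
    stream_started: bool = False
    stream_completed: bool = False


def signal_rates(events: list[CallEvent]) -> dict[str, float]:
    """Compute a few boundary-level rates for one time window."""
    total = len(events)
    if total == 0:
        return {}
    starts = sum(1 for e in events if e.stream_started)
    completions = sum(1 for e in events if e.stream_started and e.stream_completed)
    return {
        "missing_correlation_id": sum(1 for e in events if not e.request_id) / total,
        "retry_attempted": sum(1 for e in events if e.retry_attempt > 0) / total,
        "error_status": sum(1 for e in events if e.http_status >= 400) / total,
        # Share of started streams that never reached completion.
        "incomplete_streams": (starts - completions) / starts if starts else 0.0,
    }


# Example starting thresholds taken from the table; tune them against your own baseline.
EXAMPLE_LIMITS = {"missing_correlation_id": 0.01, "retry_attempted": 0.05}


def breached(rates: dict[str, float], limits: dict[str, float]) -> list[str]:
    """Return the names of rates that exceed their example limit."""
    return sorted(name for name, limit in limits.items() if rates.get(name, 0.0) > limit)
```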

Local ledger example

This is not a CometAPI endpoint contract. It is a sanitized example of a local usage event your application can emit to your own cost-control pipeline after an API call.

```json
{
  "provider": "cometapi",
  "environment": "production",
  "local_request_id": "req_20260509_000123_sanitized",
  "sent_at_utc": "2026-05-09T12:00:01Z",
  "received_at_utc": "2026-05-09T12:00:04Z",
  "endpoint_path": "/v1/chat/completions",
  "model": "model-id-from-approved-config",
  "http_status": 200,
  "retry_attempt": 0,
  "streaming": false,
  "client_timeout_ms": 60000,
  "input_tokens_estimated": 842,
  "output_tokens_reported": 196,
  "usage_source": "client_response_or_tokenizer_estimate",
  "cost_rate_source": "contract_or_account_console_not_hardcoded",
  "billability_status": "pending_vendor_reconciliation"
}
```

Operational notes:

  • Keep cost_rate_source explicit. Do not hardcode pricing into application code unless you also have a controlled process for updating it.
  • Keep billability_status separate from HTTP success. A request can be operationally failed but still require billing verification.
  • Store enough metadata to reproduce the billing question without logging prompts, secrets, or personal data.
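A sketch of emitting the example ledger event from application code, assuming your HTTP client already gives you timestamps, status, and any reported usage. The function and parameter names are hypothetical; the point is that `billability_status` is set independently of `http_status`, and that no prompt text or secret is stored.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def utc_stamp(ts: datetime) -> str:
    """Format a timezone-aware datetime as UTC with a trailing Z."""
    if ts.tzinfo is None:
        raise ValueError("use timezone-aware datetimes to avoid local-time drift")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_ledger_event(
    local_request_id: str,
    sent_at: datetime,
    received_at: datetime,
    endpoint_path: str,
    model: str,
    http_status: int,
    retry_attempt: int,
    streaming: bool,
    client_timeout_ms: int,
    input_tokens_estimated: int,
    output_tokens_reported: Optional[int],
) -> dict:
    # Metadata only: no prompt text, completion text, API keys, or personal data.
    return {
        "provider": "cometapi",
        "environment": "production",
        "local_request_id": local_request_id,
        "sent_at_utc": utc_stamp(sent_at),
        "received_at_utc": utc_stamp(received_at),
        "endpoint_path": endpoint_path,
        "model": model,
        "http_status": http_status,
        "retry_attempt": retry_attempt,
        "streaming": streaming,
        "client_timeout_ms": client_timeout_ms,
        "input_tokens_estimated": input_tokens_estimated,
        "output_tokens_reported": output_tokens_reported,
        "usage_source": "client_response_or_tokenizer_estimate",
        "cost_rate_source": "contract_or_account_console_not_hardcoded",
        # Deliberately independent of http_status: billability is settled by reconciliation.
        "billability_status": "pending_vendor_reconciliation",
    }


def to_ledger_line(event: dict) -> str:
    """Serialize one event as a JSON line for your own cost-control pipeline."""
    return json.dumps(event, sort_keys=True)
```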

Contract details to verify

Before treating a monitoring rule as authoritative, confirm these details against your CometAPI account materials, API documentation, contract, or support response. The public evidence source for this article is the CometAPI Help Center; it should not be read as a full endpoint contract unless the relevant details are explicitly present for your account.

| Contract area | What to verify | Why operators need it | Source support to record |
| --- | --- | --- | --- |
| Endpoint paths | Approved production paths for chat, embeddings, images, batches, file upload, billing, usage export, or admin actions | Prevents monitoring gaps and false endpoint assumptions | Help Center reviewed as public support reference; verify exact paths in API reference, account console, or support ticket |
| Auth headers | Required header names, token format, project scoping, and whether multiple keys can share one billing account | Mis-scoped keys can hide spend or break cost-center reporting | Help Center reviewed; verify auth contract in official integration docs for your account |
| Request fields | Model field, max token controls, stream flag, tool fields, metadata fields, and idempotency support if any | Determines which fields your logger must capture for cost attribution | Help Center reviewed; verify endpoint-specific schema before relying on fields |
| Response fields | Usage object, token counts, model actually served, request ID, error body, and stream-final metadata | Needed for ledger accuracy and dispute evidence | Help Center reviewed; confirm response schema with API docs and sample responses from your environment |
| Error behavior | Whether 4xx, 5xx, timeout, cancellation, stream interruption, or validation failures can be charged | Prevents incorrect “failed equals free” assumptions | Help Center reviewed; verify billability with account terms or written support response |
| Rate-limit assumptions | Per-key, per-project, per-model, or account-level limits; retry-after behavior; burst handling | Retry storms can turn limits into spend and reliability incidents | Help Center reviewed; verify current limits in account-specific documentation |
| Billing assumptions | Unit of billing, rounding, token accounting, minimum charges, posting delay, credits, refunds, and time zone | Required for reconciliation and budget alerts | Help Center reviewed; verify in billing page, invoice, contract, or support case |
| Usage export | Whether downloadable usage records or API-based usage exports are available | Determines whether reconciliation can be automated | Help Center reviewed; verify available exports in your account |
| Support evidence | Required fields for a billing inquiry, expected response path, and escalation channel | Reduces back-and-forth during cost incidents | Help Center reviewed as public support reference; document your internal escalation process |
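One way to record the “Source support to record” column so that monitoring rules stay advisory until each detail is confirmed. `ContractCheck` and `advisory_only` are hypothetical names, and the check list is only a subset of the table; fill `verified_source` with wherever you actually confirmed the detail (API reference, account console, invoice, or support ticket).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContractCheck:
    area: str
    # Hypothetical record of where the detail was confirmed; None means not yet confirmed.
    verified_source: Optional[str] = None
    help_center_reviewed: bool = True


def advisory_only(checks: list[ContractCheck]) -> list[str]:
    """Contract areas whose monitoring rules should stay advisory until confirmed."""
    return [c.area for c in checks if not c.verified_source]


CHECKS = [
    ContractCheck("Endpoint paths"),
    ContractCheck("Error behavior"),
    ContractCheck("Billing assumptions"),
    ContractCheck("Usage export"),
]
```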

Practical validation steps

1. Build a one-hour baseline window

Choose a normal traffic hour and collect:

  • total requests by endpoint;
  • total requests by model;
  • status-code distribution;
  • retry-attempt distribution;
  • input token estimate by p50, p90, and p95;
  • output token count or estimate by p50, p90, and p95;
  • stream started versus stream completed;
  • timeouts and client cancellations;
  • spend or usage records from the account view if available.

Store the raw query or dashboard link at a stable location in your runbook, such as your internal equivalent of the AI cost controls page.
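A minimal sketch of the baseline summary, reusing field names from the example ledger event plus two hypothetical fields (`stream_completed`, `client_outcome`). It uses nearest-rank percentiles, one of several common definitions, so its p95 may differ slightly from what your observability tool reports. Spend or usage from the account view is left out because it comes from a separate source.

```python
import math
from collections import Counter


def nearest_rank(values: list[int], pct: int) -> int:
    """Nearest-rank percentile: one simple definition among several in common use."""
    if not values:
        raise ValueError("percentile of an empty list")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def percentiles(values: list[int]) -> dict[int, int]:
    return {p: nearest_rank(values, p) for p in (50, 90, 95)} if values else {}


def baseline_window(events: list[dict]) -> dict:
    """Summarize one normal-traffic hour of local ledger events."""
    outputs = [e["output_tokens_reported"] for e in events if e.get("output_tokens_reported") is not None]
    return {
        "requests_by_endpoint": Counter(e["endpoint_path"] for e in events),
        "requests_by_model": Counter(e["model"] for e in events),
        "status_codes": Counter(e["http_status"] for e in events),
        "retry_attempts": Counter(e["retry_attempt"] for e in events),
        "input_tokens": percentiles([e["input_tokens_estimated"] for e in events]),
        "output_tokens": percentiles(outputs),
        "streams_started": sum(1 for e in events if e.get("streaming")),
        # stream_completed and client_outcome are hypothetical fields, not in the example event.
        "streams_completed": sum(1 for e in events if e.get("streaming") and e.get("stream_completed")),
        "timeouts_or_cancellations": sum(
            1 for e in events if e.get("client_outcome") in ("timeout", "cancelled")
        ),
    }
```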

2. Compare three ledgers

For the same UTC window, compare:

  1. application ledger;
  2. gateway or load-balancer logs;
  3. CometAPI account, invoice, export, or support-confirmed usage record.

Do not expect perfect agreement on the first pass. Instead, classify differences:

  • missing local logs;
  • duplicated retries;
  • delayed vendor posting;
  • mismatched time zones;
  • model-name normalization differences;
  • streaming responses with missing final usage;
  • requests sent by untagged keys.
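A sketch of the per-request comparison for one UTC window, covering a few of these difference classes. It assumes all three ledgers can be keyed by the same local request ID, which in practice means storing the vendor request ID (if one is returned for your endpoint) alongside your own. Convert every timestamp to UTC before selecting the window; `normalize_model` is a placeholder for your own normalization rules.

```python
def normalize_model(name: str) -> str:
    # Hypothetical normalization; match however your config and the account view spell model IDs.
    return name.strip().lower()


def compare_ledgers(
    app: dict[str, dict], gateway_ids: set[str], vendor: dict[str, dict]
) -> dict[str, list[str]]:
    """Classify request IDs that do not line up across three ledgers for one UTC window."""
    diffs: dict[str, list[str]] = {
        # Seen at the gateway but never written to the application ledger.
        "missing_local_log": sorted(gateway_ids - set(app)),
        "untagged_key": [],
        "absent_from_vendor_record": [],
        "model_name_mismatch": [],
    }
    for rid, rec in sorted(app.items()):
        if not rec.get("cost_center"):
            diffs["untagged_key"].append(rid)
        vendor_rec = vendor.get(rid)
        if vendor_rec is None:
            # Could be delayed posting, a non-billed failure, or a lost record: classify manually.
            diffs["absent_from_vendor_record"].append(rid)
        elif normalize_model(rec["model"]) != normalize_model(vendor_rec["model"]):
            diffs["model_name_mismatch"].append(rid)
    return diffs
```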

3. Validate retry accounting

Pick a small sample of requests that have retry attempts. For each request group, confirm:

  • the original request ID;
  • each retry attempt timestamp;
  • whether the retry used the same prompt and model;
  • whether the prior attempt received a response, timed out, or was cancelled;
  • whether your local ledger marks all attempts or only the final user-visible result.

Budget dashboards should usually show retry-attempted volume separately from user-visible task volume. Otherwise, an incident can look like organic demand growth.
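A minimal sketch of separating user-visible task volume from retry-attempted volume, assuming each attempt carries a hypothetical `original_request_id` and a `prompt_sha256` hash stored instead of the prompt itself. It flags tasks whose retries changed model or prompt; per-attempt timestamps and prior-attempt outcomes could be checked the same way.

```python
from collections import defaultdict


def retry_summary(events: list[dict]) -> dict:
    """Split user-visible task volume from retry-attempted volume."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for e in events:
        groups[e["original_request_id"]].append(e)
    retried = {k: v for k, v in groups.items() if len(v) > 1}
    return {
        "user_visible_tasks": len(groups),
        "total_attempts": len(events),
        # Assumes exactly one original attempt per task; everything else is a retry.
        "retry_attempts": len(events) - len(groups),
        "tasks_with_retries": sorted(retried),
        # Tasks whose retries changed model or prompt, which breaks like-for-like accounting.
        "changed_between_attempts": sorted(
            k for k, v in retried.items() if len({(e["model"], e["prompt_sha256"]) for e in v}) > 1
        ),
    }
```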

4. Validate timeout and cancellation behavior

Create a report of requests where the client stopped waiting before a normal response was logged. For each category, verify the billability rule with your CometAPI account terms or written support guidance:

  • client-side timeout before any response headers;
  • timeout after headers but before full body;
  • user cancellation during streaming;
  • network interruption;
  • upstream 5xx after a long model-processing duration.

Avoid assuming these are free or charged until confirmed.
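A sketch of the timeout and cancellation report, assuming hypothetical boolean fields on each local event and an example 30-second cutoff for a “long” upstream duration. It only groups requests into the categories listed; every category is marked `unverified` because the billing rule has to come from your account terms or written support guidance.

```python
from collections import Counter
from typing import Optional


def timeout_category(e: dict, long_ms: int = 30_000) -> Optional[str]:
    """Map one local event to a timeout or cancellation category, or None."""
    if e.get("network_error"):
        return "network_interruption"
    if e.get("streaming") and e.get("cancelled_by_user"):
        return "user_cancel_during_stream"
    if e.get("client_timed_out"):
        return "timeout_after_headers" if e.get("headers_received") else "timeout_before_headers"
    status = e.get("http_status") or 0
    # long_ms is an example cutoff, not a CometAPI-defined value.
    if 500 <= status < 600 and (e.get("upstream_ms") or 0) >= long_ms:
        return "upstream_5xx_after_long_processing"
    return None


def ambiguity_report(events: list[dict]) -> dict[str, dict]:
    counts = Counter(c for c in map(timeout_category, events) if c)
    # Every category stays "unverified" until account terms or support guidance settle it.
    return {cat: {"count": n, "billability": "unverified"} for cat, n in sorted(counts.items())}
```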

5. Validate token-budget enforcement

Check whether token controls are applied before the call leaves your application:

  • maximum input size;
  • maximum output size;
  • model-specific token ceiling;
  • per-user daily or monthly budget;
  • per-tenant budget;
  • emergency cutoff.

A useful pattern is to log both tokens_allowed and tokens_sent. If tokens_sent exceeds tokens_allowed, the cost-control failure is in your application, not in billing reconciliation.
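A minimal preflight guard that applies these controls before the request is sent. The `TokenBudget` fields and limits are hypothetical. In this sketch `tokens_allowed` is the input estimate plus the capped output, and `tokens_sent` is whatever you log after the call (input sent plus output reported); adapt both definitions to your own ledger.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    # Hypothetical limits; derive real values from model documentation and your verified terms.
    max_input: int
    max_output: int
    model_ceiling: int      # combined input + output ceiling for the chosen model
    tenant_remaining: int   # tokens left in the per-user or per-tenant budget
    emergency_cutoff: bool = False


def _reject(reason: str) -> dict:
    return {"allowed": False, "reason": reason, "tokens_allowed": 0}


def preflight(budget: TokenBudget, input_tokens: int, requested_output: int) -> dict:
    """Decide, before the call leaves the application, how many tokens it may use."""
    if budget.emergency_cutoff:
        return _reject("emergency_cutoff")
    if input_tokens > budget.max_input:
        return _reject("input_too_large")
    headroom = budget.model_ceiling - input_tokens
    if headroom <= 0:
        return _reject("model_ceiling")
    output_cap = min(requested_output, budget.max_output, headroom)
    tokens_allowed = input_tokens + output_cap
    if tokens_allowed > budget.tenant_remaining:
        return _reject("budget_exhausted")
    return {"allowed": True, "reason": "ok", "tokens_allowed": tokens_allowed, "max_output": output_cap}


def enforcement_failed(tokens_allowed: int, tokens_sent: int) -> bool:
    """True means the overrun is an application control failure, not a billing question."""
    return tokens_sent > tokens_allowed
```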

6. Prepare support-ready evidence

For any billing or request-volume question, gather:

  • UTC start and end time;
  • account or project identifier, without exposing secrets;
  • endpoint path;
  • model identifier;
  • local request IDs and vendor request IDs if available;
  • status codes;
  • retry counts;
  • token estimates or response usage;
  • screenshots or exports from the account view if available;
  • a short statement of the expected versus observed result.

Use the CometAPI Help Center as the public support-reference link in the runbook, then attach your account-specific evidence through the appropriate support channel.
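A sketch that turns a set of ledger events into a plain-text evidence summary, so operators provide the same fields every time. It assumes the example ledger field names plus a hypothetical `vendor_request_id`; `project_label` should be a non-secret identifier, never an API key. Screenshots and account exports still need to be attached separately.

```python
from collections import Counter
from datetime import datetime, timezone

STAMP = "%Y-%m-%dT%H:%M:%SZ"  # the "...Z" form used in the example ledger event


def evidence_summary(events: list[dict], project_label: str, expected: str, observed: str) -> str:
    """Render a plain-text summary for a billing-support case from ledger events."""
    if not events:
        raise ValueError("no events to summarize")
    times = sorted(datetime.strptime(e["sent_at_utc"], STAMP).replace(tzinfo=timezone.utc) for e in events)
    vendor_ids = [e["vendor_request_id"] for e in events if e.get("vendor_request_id")]
    lines = [
        f"Project: {project_label}",
        f"UTC window: {times[0].strftime(STAMP)} to {times[-1].strftime(STAMP)}",
        f"Endpoint paths: {', '.join(sorted({e['endpoint_path'] for e in events}))}",
        f"Models: {', '.join(sorted({e['model'] for e in events}))}",
        f"Status codes: {dict(sorted(Counter(e['http_status'] for e in events).items()))}",
        f"Retried attempts: {sum(1 for e in events if e['retry_attempt'] > 0)}",
        f"Local request IDs: {', '.join(e['local_request_id'] for e in events)}",
        f"Vendor request IDs: {', '.join(vendor_ids) or 'none recorded'}",
        f"Input tokens (estimated): {sum(e['input_tokens_estimated'] for e in events)}",
        # Missing usage counts as zero here; call that out in the case text.
        f"Output tokens (reported): {sum(e['output_tokens_reported'] or 0 for e in events)}",
        f"Expected: {expected}",
        f"Observed: {observed}",
    ]
    return "\n".join(lines)
```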

Suggested alert set

Start with a small set of alerts that explain cost movement without overwhelming the on-call operator.

| Alert | Trigger example to tune | Investigation question |
| --- | --- | --- |
| Unknown model in production | Any request with model not in approved config | Did a deployment, fallback, or manual test change routing? |
| Retry volume spike | Retry-attempted requests exceed baseline by 2x | Are timeouts, rate limits, or upstream errors causing duplicate calls? |
| Token p95 growth | p95 input or output tokens exceed baseline by 50% | Did prompt templates, retrieved context, or max-output settings change? |
| Missing usage metadata | More than an example 1% of successful calls lack token data | Is streaming finalization or response parsing broken? |
| Untagged cost center | Any production call without tenant/project owner | Can this usage be attributed before billing closes? |
| Timeout ambiguity | Timeouts increase while account usage also rises | Are client-aborted calls still processed upstream? |
| Billing-posting lag | Vendor/account usage not visible within expected internal window | Is the lag normal, or does support need evidence? |

Only turn an example into a hard threshold after you have baseline data and contract-confirmed semantics.
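A sketch of evaluating several of these example alerts against a stored baseline. The input dictionaries and their keys are hypothetical, and the multipliers (2x retries, 1.5x p95 tokens, 1% missing usage) are the table's starting points. Timeout ambiguity and billing-posting lag are left out because they need account-side data.

```python
def evaluate_alerts(current: dict, baseline: dict, approved_models: set[str]) -> list[str]:
    """Return the names of example alerts that fire for the current window."""
    fired = []
    if set(current["models_seen"]) - approved_models:
        fired.append("unknown_model_in_production")
    # Example trigger: retry-attempted volume above 2x baseline.
    if current["retry_attempted"] > 2 * baseline["retry_attempted"]:
        fired.append("retry_volume_spike")
    # Example trigger: p95 input or output tokens more than 50% above baseline.
    if (current["p95_input_tokens"] > 1.5 * baseline["p95_input_tokens"]
            or current["p95_output_tokens"] > 1.5 * baseline["p95_output_tokens"]):
        fired.append("token_p95_growth")
    ok_calls = current["successful_calls"]
    if ok_calls and current["successful_missing_usage"] / ok_calls > 0.01:
        fired.append("missing_usage_metadata")
    if current["untagged_calls"] > 0:
        fired.append("untagged_cost_center")
    return fired
```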

FAQ

Is request count a reliable estimate of CometAPI spend?

Not by itself. Request count is useful for detecting volume changes, but spend analysis should also include model, input tokens, output tokens, retries, status codes, and billing records.

Are failed requests billable?

Do not assume either way. Verify the treatment of 4xx, 5xx, timeouts, stream interruptions, and client cancellations in your account terms, billing records, or written support response.

Should we log prompts to investigate billing?

Usually no. Billing investigations normally need metadata: timestamps, request IDs, endpoint paths, models, token counts, status codes, and retry counts. Avoid storing secrets, personal data, or full prompts unless your security and privacy policy explicitly allows it.

What should we do if local usage and account usage do not match?

First normalize time zones, model names, endpoint names, and retry handling. Then compare a small UTC window request by request. If the gap persists, prepare support-ready evidence and use the CometAPI help or support path.

How often should operators review billing signals?

For production systems with meaningful spend, review high-level spend and volume daily, investigate alert anomalies immediately, and perform a deeper reconciliation on a weekly or billing-cycle cadence. Adjust frequency based on traffic volatility and business risk.

Can we automate budget cutoffs?

Yes, but make the cutoff rule explicit and reversible. For example, a system can stop noncritical workloads when a tenant reaches a verified monthly budget. Keep emergency overrides, audit logs, and stakeholder notifications in the runbook.
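A minimal sketch of an explicit, reversible cutoff, assuming a hypothetical per-tenant map of verified monthly budgets and a spend figure you already track. Critical workloads and approved overrides bypass the cutoff, and every block or override change is written to an audit log.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class BudgetCutoff:
    """Hypothetical explicit, reversible cutoff for noncritical workloads."""
    budgets: dict[str, float]  # verified monthly budget per tenant, in your billing unit
    overrides: set[str] = field(default_factory=set)
    audit_log: list[str] = field(default_factory=list)

    def _audit(self, message: str) -> None:
        # Also a natural place to send stakeholder notifications.
        self.audit_log.append(f"{datetime.now(timezone.utc).isoformat()} {message}")

    def allow(self, tenant: str, spent: float, critical: bool) -> bool:
        budget = self.budgets.get(tenant)
        # No verified budget means no automatic cutoff; an untagged-usage alert should catch these.
        if critical or tenant in self.overrides or budget is None:
            return True
        if spent >= budget:
            self._audit(f"blocked tenant={tenant} spent={spent} budget={budget}")
            return False
        return True

    def override(self, tenant: str, approver: str) -> None:
        self.overrides.add(tenant)
        self._audit(f"override_on tenant={tenant} approver={approver}")

    def clear_override(self, tenant: str, approver: str) -> None:
        self.overrides.discard(tenant)
        self._audit(f"override_off tenant={tenant} approver={approver}")
```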

Sources checked

| Source | Access date | Purpose |
| --- | --- | --- |
| CometAPI Help Center | 2026-05-09 | Used as the provided public CometAPI support and billing-reference source for this operator checklist; endpoint-level billing, rate-limit, and schema details should still be verified against account-specific documentation, contract terms, or support responses. |