Billing and request-volume signals to monitor

Last reviewed: 2026-05-09

Who this is for: operators responsible for API spend, token budgets, usage monitoring, or billing-support evidence for CometAPI-backed applications.

The public source checked for this draft is the CometAPI Help Center. Treat that page as a support and billing-reference starting point, not as a substitute for your signed commercial terms, account console, or endpoint-level API contract.

For related cost-control operating notes, see the AI cost controls home page and the posts index.

Key takeaways

Request count is a weak spend proxy by itself. Track request count together with model, token usage, status code, retry attempt, timeout, and project or key owner.
Do not assume that failed, timed-out, streamed, retried, or client-aborted requests are either billable or free until your CometAPI contract, account records, or a support response confirms the behavior.
Keep a local usage ledger even if you rely on a vendor dashboard. The ledger gives you evidence for disputes, budget alerts, and incident review.
Use example thresholds only as starting points. Tune them by application, model class, traffic pattern, and your verified billing terms.
Escalations are faster when you can provide UTC timestamps, request IDs, endpoint path, model name, status code, retry count, and your local token estimates.

Concise definition

For this checklist, request-volume caveats are the cases where “number of API calls” does not cleanly equal “billable usage.” Common examples include retries, streaming interruptions, upstream timeouts, validation failures, multi-model routing, and delayed billing records.

A monitoring signal is a metric or log field that helps explain cost movement before the final bill is posted.

Why request volume alone is not enough

A dashboard showing “10,000 requests” may hide materially different spend outcomes:

10,000 short classification calls can have a different cost shape than 10,000 long-context generation calls.
A retry loop can double request volume without doubling user-visible work.
A timeout can be ambiguous: the client may have stopped waiting while the upstream model continued processing.
A streamed response can be partially delivered, fully delivered, or interrupted before your client captures final usage metadata.
A model alias or route change can shift the unit economics even when request count stays flat.
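A quick way to see the first point is to compare two workloads with the same request count. The sketch below uses invented per-request token averages and treats total tokens as a rough usage proxy. That is an assumption for illustration only; your actual billing unit has to come from your CometAPI terms.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    requests: int
    avg_input_tokens: int   # invented average for illustration
    avg_output_tokens: int  # invented average for illustration

    def total_tokens(self) -> int:
        # Token volume is used here only as a rough stand-in for usage.
        return self.requests * (self.avg_input_tokens + self.avg_output_tokens)


classification = Workload("short classification", 10_000, 150, 5)
long_context = Workload("long-context generation", 10_000, 6_000, 800)

for w in (classification, long_context):
    print(f"{w.name}: {w.requests:,} requests, {w.total_tokens():,} tokens")
```

Both workloads report 10,000 requests, but the second moves roughly 44 times as many tokens, which is why request count alone cannot anchor a spend alert.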

The CometAPI Help Center is the public support reference provided for this draft. Use it in your internal runbook as the place operators check before opening a billing-support case, but verify contract-specific billing semantics separately.

Monitoring signals checklist

Track these fields at the application boundary, before data is transformed by observability pipelines.

Signal	Why it matters	Practical alert example to tune
`request_id` or local correlation ID	Lets you join app logs, gateway logs, and support evidence	Alert if more than 1% of requests are missing correlation IDs
UTC timestamp at send and receive	Required for hourly reconciliation and support windows	Alert if clock skew exceeds your observability tolerance
Endpoint path	Distinguishes chat, embedding, image, batch, or admin calls	Alert on unexpected endpoint paths in production
Model or route label	Cost and token behavior often vary by model class	Alert on unknown model labels or sudden model-mix shifts
API key, project, tenant, or cost center	Prevents one workload from hiding inside another	Alert on unassigned or shared “miscellaneous” keys
HTTP status code	Separates success, client error, server error, and rate-limit behavior	Alert on elevated 4xx/5xx rates by endpoint
Client timeout and upstream duration	Identifies ambiguous completion and billing cases	Alert when timeout rate rises above normal baseline
Retry attempt number	Retries can multiply spend and request volume	Alert when retry-attempted requests exceed an example 2–5% band
Input token estimate	Helps forecast spend before vendor posting	Alert on p95 prompt token growth
Output tokens reported or estimated	Captures runaway generation and streaming anomalies	Alert on p95 output token growth or missing usage metadata
Streaming flag and stream completion status	Interrupted streams are common billing-investigation cases	Alert when stream starts exceed stream completions
Budget ledger write status	Your cost monitor fails silently if ledger writes fail	Alert on any sustained ledger-ingestion error

The percentage examples above are operating starting points, not universal thresholds. Establish your own baseline from normal traffic and adjust after each billing review.
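As a starting point, a few of the table's alerts can be computed from per-request records. This is a minimal sketch: `RequestSignal` and `signal_alerts` are hypothetical names, the fields are modelled on the table rather than on any CometAPI response schema, and the 1% default is the table's example value, not a recommendation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestSignal:
    # Hypothetical fields modelled on the signals table, not a CometAPI schema.
    request_id: Optional[str]
    endpoint_path: str
    model: str
    cost_center: Optional[str]
    http_status: int
    retry_attempt: int
    streaming: bool
    stream_completed: bool


def signal_alerts(signals: list[RequestSignal], missing_id_threshold: float = 0.01) -> list[str]:
    """Return the example alerts that fire for one batch of request signals."""
    alerts: list[str] = []
    if not signals:
        return alerts
    # Correlation IDs: alert when the missing share exceeds the example 1% threshold.
    missing_ids = sum(1 for s in signals if not s.request_id)
    if missing_ids / len(signals) > missing_id_threshold:
        alerts.append("missing_correlation_ids")
    # Cost attribution: any request without an owner is worth flagging.
    if any(not s.cost_center for s in signals):
        alerts.append("unassigned_cost_center")
    # Streaming: more starts than completions suggests interrupted streams.
    stream_starts = sum(1 for s in signals if s.streaming)
    stream_completions = sum(1 for s in signals if s.streaming and s.stream_completed)
    if stream_starts > stream_completions:
        alerts.append("incomplete_streams")
    return alerts
```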

Local ledger example

This is not a CometAPI endpoint contract. It is a sanitized example of a local usage event your application can emit to your own cost-control pipeline after an API call.

{
  "provider": "cometapi",
  "environment": "production",
  "local_request_id": "req_20260509_000123_sanitized",
  "sent_at_utc": "2026-05-09T12:00:01Z",
  "received_at_utc": "2026-05-09T12:00:04Z",
  "endpoint_path": "/v1/chat/completions",
  "model": "model-id-from-approved-config",
  "http_status": 200,
  "retry_attempt": 0,
  "streaming": false,
  "client_timeout_ms": 60000,
  "input_tokens_estimated": 842,
  "output_tokens_reported": 196,
  "usage_source": "client_response_or_tokenizer_estimate",
  "cost_rate_source": "contract_or_account_console_not_hardcoded",
  "billability_status": "pending_vendor_reconciliation"
}

Operational notes:

Keep cost_rate_source explicit. Do not hardcode pricing into application code unless you also have a controlled process for updating it.
Keep billability_status separate from HTTP success. A request can be operationally failed but still require billing verification.
Store enough metadata to reproduce the billing question without logging prompts, secrets, or personal data.
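These notes can be enforced where the event is built. The helper below is a hypothetical sketch that serializes a subset of the example event's fields: it normalizes timestamps to UTC, rejects naive datetimes, sets `billability_status` independently of `http_status`, and has no parameter for prompt text at all.

```python
import json
from datetime import datetime, timezone
from typing import Optional

UTC_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_ledger_event(
    local_request_id: str,
    sent_at: datetime,
    received_at: datetime,
    endpoint_path: str,
    model: str,
    http_status: int,
    retry_attempt: int,
    streaming: bool,
    input_tokens_estimated: int,
    output_tokens_reported: Optional[int],
) -> str:
    """Serialize one metadata-only usage event; there is deliberately no prompt parameter."""
    if sent_at.tzinfo is None or received_at.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    event = {
        "provider": "cometapi",
        "environment": "production",
        "local_request_id": local_request_id,
        # Normalize to UTC so hourly reconciliation windows line up.
        "sent_at_utc": sent_at.astimezone(timezone.utc).strftime(UTC_FORMAT),
        "received_at_utc": received_at.astimezone(timezone.utc).strftime(UTC_FORMAT),
        "endpoint_path": endpoint_path,
        "model": model,
        "http_status": http_status,
        "retry_attempt": retry_attempt,
        "streaming": streaming,
        "input_tokens_estimated": input_tokens_estimated,
        "output_tokens_reported": output_tokens_reported,  # None when usage metadata is missing
        "cost_rate_source": "contract_or_account_console_not_hardcoded",
        # Never derived from http_status: billability waits for vendor reconciliation.
        "billability_status": "pending_vendor_reconciliation",
    }
    return json.dumps(event, sort_keys=True)
```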

Contract details to verify

Before treating a monitoring rule as authoritative, confirm these details against your CometAPI account materials, API documentation, contract, or support response. The public evidence source for this article is the CometAPI Help Center; do not read it as a full endpoint contract unless the relevant details are explicitly present for your account.

Contract area	What to verify	Why operators need it	Source support to record
Endpoint paths	Approved production paths for chat, embeddings, images, batches, file upload, billing, usage export, or admin actions	Prevents monitoring gaps and false endpoint assumptions	Help Center reviewed as public support reference; verify exact paths in API reference, account console, or support ticket
Auth headers	Required header names, token format, project scoping, and whether multiple keys can share one billing account	Mis-scoped keys can hide spend or break cost-center reporting	Help Center reviewed; verify auth contract in official integration docs for your account
Request fields	Model field, max token controls, stream flag, tool fields, metadata fields, and idempotency support if any	Determines which fields your logger must capture for cost attribution	Help Center reviewed; verify endpoint-specific schema before relying on fields
Response fields	Usage object, token counts, model actually served, request ID, error body, and stream-final metadata	Needed for ledger accuracy and dispute evidence	Help Center reviewed; confirm response schema with API docs and sample responses from your environment
Error behavior	Whether 4xx, 5xx, timeout, cancellation, stream interruption, or validation failures can be charged	Prevents incorrect “failed equals free” assumptions	Help Center reviewed; verify billability with account terms or written support response
Rate-limit assumptions	Per-key, per-project, per-model, or account-level limits; retry-after behavior; burst handling	Retry storms can turn limits into spend and reliability incidents	Help Center reviewed; verify current limits in account-specific documentation
Billing assumptions	Unit of billing, rounding, token accounting, minimum charges, posting delay, credits, refunds, and time zone	Required for reconciliation and budget alerts	Help Center reviewed; verify in billing page, invoice, contract, or support case
Usage export	Whether downloadable usage records or API-based usage exports are available	Determines whether reconciliation can be automated	Help Center reviewed; verify available exports in your account
Support evidence	Required fields for a billing inquiry, expected response path, and escalation channel	Reduces back-and-forth during cost incidents	Help Center reviewed as public support reference; document your internal escalation process

Practical validation steps

1. Build a one-hour baseline window

Choose a normal traffic hour and collect:

total requests by endpoint;
total requests by model;
status-code distribution;
retry-attempt distribution;
input token estimate by p50, p90, and p95;
output token count or estimate by p50, p90, and p95;
stream started versus stream completed;
timeouts and client cancellations;
spend or usage records from the account view if available.
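If your local ledger already stores events shaped like the example event earlier in this article, most of the baseline can be computed directly from it; account-view spend still has to be pulled separately. In this sketch, `stream_completed` and `timed_out` are hypothetical extra fields, and percentiles use the nearest-rank method, so an observability tool that interpolates may report slightly different values.

```python
import math
from collections import Counter


def percentile(values: list[int], pct: int) -> int:
    """Nearest-rank percentile for pct in 1..100."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def baseline_summary(events: list[dict]) -> dict:
    """Summarize one baseline hour of local ledger events."""
    inputs = [e["input_tokens_estimated"] for e in events]
    outputs = [e["output_tokens_reported"] for e in events
               if e.get("output_tokens_reported") is not None]
    return {
        "requests_by_endpoint": Counter(e["endpoint_path"] for e in events),
        "requests_by_model": Counter(e["model"] for e in events),
        "status_codes": Counter(e["http_status"] for e in events),
        "retry_attempts": Counter(e["retry_attempt"] for e in events),
        "input_p50_p90_p95": tuple(percentile(inputs, p) for p in (50, 90, 95)) if inputs else None,
        "output_p50_p90_p95": tuple(percentile(outputs, p) for p in (50, 90, 95)) if outputs else None,
        "streams_started": sum(1 for e in events if e["streaming"]),
        # stream_completed and timed_out are hypothetical fields, absent from the example event.
        "streams_completed": sum(1 for e in events if e["streaming"] and e.get("stream_completed")),
        "timeouts": sum(1 for e in events if e.get("timed_out")),
    }
```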

Store the raw query or dashboard link at a stable location in your runbook, such as your internal equivalent of the AI cost controls editorial page.

2. Compare three ledgers

For the same UTC window, compare:

application ledger;
gateway or load-balancer logs;
CometAPI account, invoice, export, or support-confirmed usage record.

Do not expect perfect agreement on the first pass. Instead, classify differences:

missing local logs;
duplicated retries;
delayed vendor posting;
mismatched time zones;
model-name normalization differences;
streaming responses with missing final usage;
requests sent by untagged keys.
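A first-pass comparison can be as simple as set arithmetic over request identifiers. This sketch assumes all three ledgers carry a shared correlation ID, which is a significant assumption: vendor records may only expose their own request IDs, in which case you need a mapping step first. Each bucket is a candidate cause to investigate, not a conclusion.

```python
from collections import Counter


def compare_ledgers(
    app_ids: list[str], gateway_ids: list[str], vendor_ids: list[str]
) -> dict[str, set[str]]:
    """Bucket request IDs by where they appear across the three ledgers."""
    app_counts = Counter(app_ids)
    app, gateway, vendor = set(app_counts), set(gateway_ids), set(vendor_ids)
    seen_locally = app | gateway
    return {
        # Gateway saw it, application ledger did not: possible missing local logs.
        "missing_local_logs": gateway - app,
        "not_seen_by_gateway": app - gateway,
        # Possibly delayed vendor posting or a time-window mismatch.
        "absent_from_vendor_record": seen_locally - vendor,
        # Possibly traffic from untagged keys or another application.
        "vendor_only": vendor - seen_locally,
        # Possibly retries logged under the same ID.
        "duplicated_in_app_ledger": {rid for rid, n in app_counts.items() if n > 1},
    }
```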

3. Validate retry accounting

Pick a small sample of requests that have retry attempts. For each request group (the original attempt plus its retries), confirm:

the original request ID;
each retry attempt timestamp;
whether the retry used the same prompt and model;
whether the prior attempt received a response, timed out, or was cancelled;
whether your local ledger marks all attempts or only the final user-visible result.

Budget dashboards should usually show retry-attempted volume separately from user-visible task volume. Otherwise, an incident can look like organic demand growth.
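Keeping the two volumes separate requires a stable identifier for the user-visible task. The sketch assumes a hypothetical `original_request_id` field shared by the first attempt and every retry, with `retry_attempt` set to 0 on the first try.

```python
from collections import defaultdict


def retry_accounting(attempts: list[dict]) -> dict:
    """Separate user-visible task volume from retry-attempted volume."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for attempt in attempts:
        # Hypothetical field shared by a first attempt and all of its retries.
        groups[attempt["original_request_id"]].append(attempt)
    retries = sum(1 for a in attempts if a["retry_attempt"] > 0)
    return {
        "user_visible_tasks": len(groups),
        "total_attempts": len(attempts),
        "retry_attempts": retries,
        "retry_share": retries / len(attempts) if attempts else 0.0,
        "tasks_with_retries": sorted(k for k, g in groups.items() if len(g) > 1),
        # Retries that switched model deserve a closer look during review.
        "tasks_with_model_change": sorted(
            k for k, g in groups.items() if len({a.get("model") for a in g}) > 1
        ),
    }
```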

4. Validate timeout and cancellation behavior

Create a report of requests where the client stopped waiting before a normal response was logged. For each category, verify the billability rule with your CometAPI account terms or written support guidance:

client-side timeout before any response headers;
timeout after headers but before full body;
user cancellation during streaming;
network interruption;
upstream 5xx after a long model-processing duration.

Avoid assuming these are free or charged until confirmed.
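The report is easier to review when each incomplete request is mapped to one of the categories above. The flags in `IncompleteRequest` are hypothetical and must come from your own HTTP client and stream handler, and the 30-second `long_run_ms` default is an arbitrary example. The function only sorts requests into buckets and deliberately says nothing about billability.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IncompleteCategory(Enum):
    TIMEOUT_BEFORE_HEADERS = "client-side timeout before any response headers"
    TIMEOUT_AFTER_HEADERS = "timeout after headers but before full body"
    USER_CANCELLED_STREAM = "user cancellation during streaming"
    NETWORK_INTERRUPTION = "network interruption"
    UPSTREAM_5XX_AFTER_LONG_RUN = "upstream 5xx after a long model-processing duration"


@dataclass
class IncompleteRequest:
    # Hypothetical client-side flags; populate them from your own HTTP client.
    headers_received: bool
    timed_out: bool
    user_cancelled: bool
    streaming: bool
    network_error: bool
    http_status: Optional[int]
    upstream_duration_ms: int


def categorize(r: IncompleteRequest, long_run_ms: int = 30_000) -> Optional[IncompleteCategory]:
    """Map one request to a report category; billability stays unverified."""
    if r.user_cancelled and r.streaming:
        return IncompleteCategory.USER_CANCELLED_STREAM
    if r.timed_out:
        if r.headers_received:
            return IncompleteCategory.TIMEOUT_AFTER_HEADERS
        return IncompleteCategory.TIMEOUT_BEFORE_HEADERS
    if r.network_error:
        return IncompleteCategory.NETWORK_INTERRUPTION
    if (r.http_status is not None and 500 <= r.http_status < 600
            and r.upstream_duration_ms >= long_run_ms):
        return IncompleteCategory.UPSTREAM_5XX_AFTER_LONG_RUN
    return None  # not one of the categories in this report
```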

5. Validate token-budget enforcement

Check whether token controls are applied before the call leaves your application:

maximum input size;
maximum output size;
model-specific token ceiling;
per-user daily or monthly budget;
per-tenant budget;
emergency cutoff.

A useful pattern is to log both tokens_allowed and tokens_sent. If tokens_sent exceeds tokens_allowed, the cost-control failure is in your application, not in billing reconciliation.
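A minimal sketch of that pattern follows. All limits are hypothetical application configuration, and `tokens_sent` is taken here as input tokens plus the requested output ceiling, a conservative worst-case definition you may want to replace with your own.

```python
from dataclasses import dataclass


@dataclass
class TokenLimits:
    # Hypothetical limits owned by your application, not by the provider.
    max_input_tokens: int
    max_output_tokens: int
    model_ceiling: int        # model-specific total token ceiling
    user_remaining: int       # remaining per-user daily or monthly budget
    tenant_remaining: int     # remaining per-tenant budget
    emergency_cutoff: bool = False


def tokens_allowed(limits: TokenLimits) -> int:
    """Largest total token count this call may use, or 0 when blocked."""
    if limits.emergency_cutoff:
        return 0
    return max(0, min(limits.max_input_tokens + limits.max_output_tokens,
                      limits.model_ceiling, limits.user_remaining, limits.tenant_remaining))


def check_call(limits: TokenLimits, input_tokens: int, max_output_tokens: int) -> dict:
    """Compare what the call would send against what the controls allow."""
    allowed = tokens_allowed(limits)
    sent = input_tokens + max_output_tokens  # worst case if the full output ceiling is used
    violation = (
        sent > allowed
        or input_tokens > limits.max_input_tokens
        or max_output_tokens > limits.max_output_tokens
    )
    # A violation is an application-side control failure, not a billing question.
    return {"tokens_allowed": allowed, "tokens_sent": sent, "violation": violation}
```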

6. Prepare support-ready evidence

For any billing or request-volume question, gather:

UTC start and end time;
account or project identifier, without exposing secrets;
endpoint path;
model identifier;
local request IDs and vendor request IDs if available;
status codes;
retry counts;
token estimates or response usage;
screenshots or exports from the account view if available;
a short statement of the expected versus observed result.

Use the CometAPI Help Center as the public support-reference link in the runbook, then attach your account-specific evidence through the appropriate support channel.
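Most of that list can be assembled from ledger events for the affected window. This sketch assumes events shaped like the example ledger record, with a hypothetical optional `vendor_request_id` field; screenshots and account exports still have to be attached by hand.

```python
from collections import Counter


def support_evidence(events: list[dict], project_id: str, expected: str, observed: str) -> dict:
    """Assemble metadata-only evidence for a billing inquiry about one UTC window."""
    if not events:
        raise ValueError("no events in the selected window")
    # Same-format "YYYY-MM-DDTHH:MM:SSZ" strings sort chronologically as text.
    times = sorted(e["sent_at_utc"] for e in events)
    return {
        "utc_start": times[0],
        "utc_end": times[-1],
        "project_id": project_id,  # an identifier, never an API key
        "endpoint_paths": sorted({e["endpoint_path"] for e in events}),
        "models": sorted({e["model"] for e in events}),
        "local_request_ids": [e["local_request_id"] for e in events],
        "vendor_request_ids": [e["vendor_request_id"] for e in events if e.get("vendor_request_id")],
        "status_codes": dict(Counter(e["http_status"] for e in events)),
        "retry_counts": dict(Counter(e["retry_attempt"] for e in events)),
        "input_tokens_estimated": sum(e["input_tokens_estimated"] for e in events),
        "output_tokens_reported": sum(e.get("output_tokens_reported") or 0 for e in events),
        "statement": f"Expected: {expected}. Observed: {observed}.",
    }
```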

Suggested alert set

Start with a small set of alerts that explain cost movement without overwhelming the on-call operator.

Alert	Trigger example to tune	Investigation question
Unknown model in production	Any request with model not in approved config	Did a deployment, fallback, or manual test change routing?
Retry volume spike	Retry-attempted requests exceed baseline by 2x	Are timeouts, rate limits, or upstream errors causing duplicate calls?
Token p95 growth	p95 input or output tokens exceed baseline by 50%	Did prompt templates, retrieved context, or max-output settings change?
Missing usage metadata	More than 1% (example value) of successful calls lack token data	Is streaming finalization or response parsing broken?
Untagged cost center	Any production call without tenant/project owner	Can this usage be attributed before billing closes?
Timeout ambiguity	Timeouts increase while account usage also rises	Are client-aborted calls still processed upstream?
Billing-posting lag	Vendor/account usage not visible within expected internal window	Is the lag normal, or does support need evidence?

Only turn an example into a hard threshold after you have baseline data and contract-confirmed semantics.
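Several of the table's triggers can be evaluated from two windows of aggregated ledger data, one current and one baseline. The names are hypothetical and the 2x, 50%, and 1% factors are the table's example values. Timeout ambiguity and billing-posting lag are left out because they need vendor-side usage data that a local ledger alone does not have.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    # Aggregates for one monitoring window; field names are hypothetical.
    retry_attempted: int
    p95_input_tokens: int
    p95_output_tokens: int
    successful_calls: int
    successful_without_usage: int
    models_seen: frozenset[str]
    untagged_calls: int


def evaluate_alerts(current: WindowStats, baseline: WindowStats, approved_models: set[str]) -> list[str]:
    """Apply the example triggers; every factor is a tuning starting point."""
    fired = []
    if current.models_seen - approved_models:
        fired.append("unknown_model_in_production")
    if current.retry_attempted > 2 * baseline.retry_attempted:
        fired.append("retry_volume_spike")
    if (current.p95_input_tokens > 1.5 * baseline.p95_input_tokens
            or current.p95_output_tokens > 1.5 * baseline.p95_output_tokens):
        fired.append("token_p95_growth")
    if (current.successful_calls
            and current.successful_without_usage / current.successful_calls > 0.01):
        fired.append("missing_usage_metadata")
    if current.untagged_calls > 0:
        fired.append("untagged_cost_center")
    return fired
```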

FAQ

Is request count a reliable estimate of CometAPI spend?

Not by itself. Request count is useful for detecting volume changes, but spend analysis should also include model, input tokens, output tokens, retries, status codes, and billing records.

Are failed requests billable?

Do not assume either way. Verify the treatment of 4xx, 5xx, timeouts, stream interruptions, and client cancellations in your account terms, billing records, or written support response.

Should we log prompts to investigate billing?

Usually no. Billing investigations normally need metadata: timestamps, request IDs, endpoint paths, models, token counts, status codes, and retry counts. Avoid storing secrets, personal data, or full prompts unless your security and privacy policy explicitly allows it.

What should we do if local usage and account usage do not match?

First normalize time zones, model names, endpoint names, and retry handling. Then compare a small UTC window request by request. If the gap persists, prepare support-ready evidence and use the CometAPI help or support path.
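The normalization step might look like this. The alias map is hypothetical and should only merge names that your account records confirm are billed identically; collapsing two differently priced models would hide exactly the model-mix shift the alerts are meant to catch.

```python
from datetime import datetime, timezone

# Hypothetical alias map: only merge names your account records show are billed identically.
MODEL_ALIASES = {
    "model-x-latest": "model-x",
}


def to_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with an explicit offset and convert it to UTC."""
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"  # fromisoformat before Python 3.11 does not accept "Z"
    parsed = datetime.fromisoformat(ts)
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp has no offset and is ambiguous: {ts}")
    return parsed.astimezone(timezone.utc)


def normalize_model(name: str) -> str:
    cleaned = name.strip().lower()
    return MODEL_ALIASES.get(cleaned, cleaned)


def utc_hour_bucket(ts: str) -> str:
    """Hourly key for comparing local and account usage over the same UTC window."""
    return to_utc(ts).strftime("%Y-%m-%dT%H:00Z")
```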

How often should operators review billing signals?

For production systems with meaningful spend, review high-level spend and volume daily, investigate alert anomalies immediately, and perform a deeper reconciliation on a weekly or billing-cycle cadence. Adjust frequency based on traffic volatility and business risk.

Can we automate budget cutoffs?

Yes, but make the cutoff rule explicit and reversible. For example, a system can stop noncritical workloads when a tenant reaches a verified monthly budget. Keep emergency overrides, audit logs, and stakeholder notifications in the runbook.
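One way to keep a cutoff explicit and reversible is to put the rule, the override, and the audit trail in one small object. This is a hypothetical sketch: `BudgetGuard` is not part of any CometAPI SDK, the spend figure must come from your reconciled ledger, and stakeholder notification is left to your alerting system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class BudgetGuard:
    """Explicit, reversible cutoff for noncritical workloads (hypothetical design)."""
    monthly_budget: float          # verified budget in your billing unit
    spent: float = 0.0             # updated from the reconciled ledger
    override: bool = False         # emergency override set by an authorized operator
    audit_log: list[str] = field(default_factory=list)

    def _log(self, message: str) -> None:
        self.audit_log.append(f"{datetime.now(timezone.utc).isoformat()} {message}")

    def allow(self, critical: bool) -> bool:
        # Critical workloads and active overrides always pass.
        if critical or self.override:
            return True
        if self.spent >= self.monthly_budget:
            self._log(f"blocked noncritical call: spent {self.spent} >= budget {self.monthly_budget}")
            return False
        return True

    def set_override(self, enabled: bool, operator: str) -> None:
        # Reversible by design: the same call turns the override off again.
        self.override = enabled
        self._log(f"override {'enabled' if enabled else 'disabled'} by {operator}")
```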

Sources checked

Source	Access date	Purpose
CometAPI Help Center	2026-05-09	Used as the provided public CometAPI support and billing-reference source for this operator checklist; endpoint-level billing, rate-limit, and schema details should still be verified against account-specific documentation, contract terms, or support responses.