Last reviewed: 2026-06-05
Direct answer
When a budget review asks for usage evidence, two CometAPI data sources produce the numbers you need. The balance and usage query service (GET https://query.cometapi.com/user/quota) returns account-level totals — current balance, cumulative spend, total request count, and per-key usage broken down by date when you supply a date range. The chat completions response (POST /v1/chat/completions) returns a usage object on every non-streaming call containing prompt_tokens, completion_tokens, and total_tokens for that single request. Together these two sources let you reconstruct spend at whatever granularity a reviewer needs: individual requests, per-key daily summaries, or a running account total.
For a budget review you typically need:
- A dated export from the query service covering the review period.
- A sample of raw response objects showing per-request token counts.
- A reconciliation note explaining how CometAPI's quota figures map to USD spend. The official pricing doc confirms a 0.8:1 ratio (a 20% discount) for token-billed models relative to upstream provider list prices, plus separate per-call pricing for models without official APIs, such as image and video generation models. Verify exact current ratios and model-specific rates at the source links in this article.
Before reading further, verify all field names, endpoint paths, and billing ratios against the sources cited in the Sources checked and Contract details to verify sections. The CometAPI platform evolves and some fields shown in this article may have changed since the last review date.
Who this is for
This article is for:
- Finance reviewers and budget owners who receive AI API line items and need to trace spend back to individual keys or date ranges.
- Platform operators and developers who must produce usage evidence for internal cost reviews, team chargeback reports, or external audits.
- Cost engineers building repeatable logging pipelines to capture token counts at the request level before aggregating into monthly budget reports.
It assumes you already have a CometAPI API key and can make authenticated HTTP requests. It does not cover CometAPI account signup, model selection, or infrastructure deployment.
Key takeaways
- CometAPI provides two distinct usage-evidence endpoints: account-level balance and usage via query.cometapi.com, and per-request token counts in the usage object returned by /v1/chat/completions.
- The query service supports date-range filtering (start_date, end_date in YYYY-MM-DD format) and returns per-key daily breakdowns when a range is provided — exactly the shape a budget reviewer expects.
- Token-billed models (GPT, Claude, and similar official-API models) follow a pricing ratio relative to upstream provider list prices. Per-call-billed models (Midjourney, Kling, and others without official APIs) use CometAPI-set prices. Both categories may receive volume discounts. Verify exact current rates at the pricing source before presenting numbers in a formal budget review.
- Keep a dedicated low-quota API key for balance queries: the official docs suggest setting its quota to 0 so that even if the key is exposed, it cannot be used to make model requests.
- Anomalous charges may indicate a leaked key; the help center recommends checking request logs for unfamiliar IPs and rotating the key immediately.
- For streaming calls, add stream_options: { include_usage: true } to receive token counts in the final chunk — otherwise streaming responses do not include the usage field by default.
Gathering account-level usage evidence
The primary endpoint for budget review evidence is the query service:
GET https://query.cometapi.com/user/quota?key=YOUR_KEY&start_date=YYYY-MM-DD&end_date=YYYY-MM-DD
Key response fields for a budget review package (field names verified against the official balance query reference — confirm current schema before use):
| Response field | What it contains | Budget review use |
|---|---|---|
| total_quota | Current account balance (USD) | Opening/closing balance |
| total_used_quota | Cumulative spend (USD) | Total period spend |
| request_count | Total requests across all keys | Volume verification |
| keys[].name | API key display name | Per-team attribution |
| keys[].used_quota | Spend for this key (USD) | Chargeback by key |
| daily_quota[date][].quota_used | Per-key spend on that date (USD) | Daily burn rate |
| daily_quota[date][].request_count | Per-key requests on that date | Volume by day |
The daily_quota object only appears when both start_date and end_date are included in the request. For a monthly review, supply the first and last days of the month.
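As a rough illustration of the monthly pull described above, the following Python sketch builds the query URL for a calendar month and summarises a response that has already been fetched and decoded. The field names come from the table above. The helper names, the sample payload, and the assumption that daily_quota is an object keyed by date with a list of per-key entries are illustrative only and should be checked against the balance query reference.

```python
import calendar
from datetime import date
from decimal import Decimal
from urllib.parse import urlencode

# Host and path as cited in this article; confirm against the balance query reference.
QUOTA_URL = "https://query.cometapi.com/user/quota"


def month_range(year: int, month: int) -> tuple[str, str]:
    """Return the first and last day of a month as YYYY-MM-DD strings."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1).isoformat(), date(year, month, last_day).isoformat()


def build_quota_url(key: str, start_date: str, end_date: str) -> str:
    """Build the query URL; both dates are supplied so daily_quota is returned."""
    params = {"key": key, "start_date": start_date, "end_date": end_date}
    return f"{QUOTA_URL}?{urlencode(params)}"


def spend_by_key(payload: dict) -> dict[str, Decimal]:
    """Map each key's display name to its used_quota, for chargeback by key."""
    return {k["name"]: Decimal(str(k["used_quota"])) for k in payload.get("keys", [])}


def daily_totals(payload: dict) -> dict[str, tuple[Decimal, int]]:
    """Sum quota_used and request_count across all keys for each date."""
    totals = {}
    for day, entries in sorted(payload.get("daily_quota", {}).items()):
        spend = sum((Decimal(str(e["quota_used"])) for e in entries), Decimal("0"))
        requests = sum(e["request_count"] for e in entries)
        totals[day] = (spend, requests)
    return totals


# Hypothetical payload shaped like the table above; the values are made up.
sample_payload = {
    "keys": [{"name": "team-a", "used_quota": 1.25}],
    "daily_quota": {
        "2026-06-01": [
            {"quota_used": 0.5, "request_count": 3},
            {"quota_used": 0.25, "request_count": 2},
        ]
    },
}
```

Amounts are converted to Decimal so that sums in a budget report do not pick up binary floating-point rounding.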
Capturing per-request token evidence
For request-level evidence, the usage object in a chat completions response contains:
- prompt_tokens — tokens consumed by the input (messages, system prompt, tool definitions)
- completion_tokens — tokens in the model’s output
- total_tokens — sum of the above
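One way to turn these fields into reviewable evidence is to log a small record per response and aggregate by model in your own data store. The sketch below assumes the decoded response body is a dictionary with a top-level model field alongside usage; the function names and the model names in the usage note are hypothetical.

```python
from collections import defaultdict

FIELDS = ("prompt_tokens", "completion_tokens", "total_tokens")


def usage_record(response: dict) -> dict:
    """Extract the audit fields from one non-streaming chat completions response."""
    usage = response["usage"]
    record = {"model": response.get("model", "unknown")}
    record.update({field: usage[field] for field in FIELDS})
    return record


def tokens_by_model(records: list[dict]) -> dict[str, dict[str, int]]:
    """Aggregate request counts and token counts per model."""
    totals = defaultdict(lambda: dict.fromkeys(("requests",) + FIELDS, 0))
    for record in records:
        bucket = totals[record["model"]]
        bucket["requests"] += 1
        for field in FIELDS:
            bucket[field] += record[field]
    return dict(totals)
```

For example, passing the records from two calls to a placeholder "model-a" and one call to "model-b" yields one summary row per model, which is the per-model view the query service does not directly provide.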
For streaming calls you must request usage explicitly:
{
  "stream": true,
  "stream_options": { "include_usage": true }
}
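On the client side, the usage object then has to be picked out of the event stream. The following is a minimal sketch that assumes the stream arrives as Server-Sent Event lines of the form `data: {json}` ending with `data: [DONE]`, and that chunks without token counts either omit usage or set it to null. The function name is illustrative, and the exact chunk layout should be confirmed against the chat completions reference.

```python
import json
from typing import Iterable, Optional


def usage_from_sse(lines: Iterable[str]) -> Optional[dict]:
    """Return the last non-empty usage object seen before [DONE], or None."""
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage
```

If this returns None for a streamed call, the most likely cause is that include_usage was not set in the request, and that call has no per-request token evidence to log.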
When max_completion_tokens is used (required for reasoning models and recommended for GPT-4.1+ and GPT-5 series), reasoning tokens are included in the completion token count. Verify the exact field names for reasoning token breakdowns against the chat completions reference for the specific model family you are auditing.
Understanding billing categories
CometAPI groups its models into two billing categories, which affects how you translate token counts into USD for a budget review:
Category 1: Token-billed official models (GPT series, Claude series, and similar). These are billed per token at a ratio relative to the upstream provider's list price. The pricing documentation confirms a 20% discount as the baseline, which is the 0.8:1 ratio: usage that would cost $1.00 at upstream list price is billed at about $0.80. The actual ratio and current rates must be verified at the pricing source before presenting numbers formally.
Category 2: Per-call models without official APIs (Midjourney, Kling, and similar image/video generation models). These are billed per call at CometAPI-set rates. Token counts from the chat completions usage field do not apply to these models; cost evidence comes entirely from the request count fields in the query service.
For a reconciliation note in a budget review: separate your request_count and quota_used data by model category, verify the applicable rates from the pricing docs, and note that volume discounts may apply for accounts above the monthly consumption thresholds documented in the pricing reference. Because the query service reports per-key rather than per-model totals, the split by category relies on your own per-request logs.
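The token-billed arithmetic can be sketched as follows. This is an estimate under stated assumptions, not CometAPI's billing formula: it assumes the 0.8 ratio cited in this article applies uniformly to input and output tokens, and the upstream prices passed in are placeholders you would replace with verified list prices per million tokens.

```python
from decimal import Decimal

# Baseline ratio cited in this article; verify at the pricing source before use.
BASELINE_RATIO = Decimal("0.8")


def estimate_token_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_million: str,
    output_price_per_million: str,
    ratio: Decimal = BASELINE_RATIO,
) -> Decimal:
    """Estimate USD cost as upstream list-price cost multiplied by the ratio."""
    million = Decimal(1_000_000)
    upstream_cost = (
        Decimal(str(prompt_tokens)) * Decimal(str(input_price_per_million))
        + Decimal(str(completion_tokens)) * Decimal(str(output_price_per_million))
    ) / million
    return upstream_cost * ratio
```

With placeholder prices of 2.00 and 8.00 USD per million input and output tokens, 1,000,000 prompt tokens and 500,000 completion tokens cost 6.00 USD at list price and about 4.80 USD at the 0.8 ratio. Any gap between such an estimate and quota_used is what the reconciliation note should explain.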
Smoke-test workflow
Before relying on usage data in a formal budget review, run this verification sequence.
Setup assumptions
- You have a CometAPI API key with at least one completed chat request in the review period.
- You have created a second low-quota key dedicated to balance queries (recommended by the official docs; quota set to 0).
- You have access to an HTTP client (curl, Python requests, or equivalent).
Happy path
- Call GET https://query.cometapi.com/user/quota?key=QUERY_KEY and confirm the response contains total_quota, total_used_quota, and request_count.
- Add start_date and end_date for a date range you know contains activity and confirm daily_quota appears in the response with at least one entry.
- Make one non-streaming chat completions call (POST https://api.cometapi.com/v1/chat/completions) with a minimal prompt and confirm the response body contains a usage object with prompt_tokens, completion_tokens, and total_tokens as integers.
- Confirm that total_tokens equals prompt_tokens + completion_tokens for this sample call.
Error path check
- Call the query service with an invalid date format (e.g. start_date=2026/06/01). Verify the API returns an error response rather than silently returning incorrect data. Do not proceed to a budget review if the API accepts malformed date inputs without complaint.
- Call the query service with the date range set outside any known activity period. Confirm the daily_quota key is either absent or empty rather than returning stale data.
Minimum assertions
- total_tokens is a positive integer for a real request.
- request_count increments after a new request is made.
- daily_quota data for a specific date matches the sum you expect from known requests on that date (within any quota-unit conversion the platform may apply).
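The first two assertions, together with the sum check from the happy path, can be expressed as a small checker. This is a sketch with hypothetical function names; it works on already-decoded responses and makes no network calls itself.

```python
def check_usage(usage: dict) -> list[str]:
    """Return a list of failed checks; an empty list means the sample passed."""
    failures = []
    fields = ("prompt_tokens", "completion_tokens", "total_tokens")
    for field in fields:
        value = usage.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(value, int) or isinstance(value, bool):
            failures.append(f"{field} is not an integer")
    if failures:
        return failures
    if usage["total_tokens"] <= 0:
        failures.append("total_tokens is not positive")
    if usage["total_tokens"] != usage["prompt_tokens"] + usage["completion_tokens"]:
        failures.append("total_tokens != prompt_tokens + completion_tokens")
    return failures


def request_count_incremented(before: dict, after: dict) -> bool:
    """Compare two query-service snapshots taken before and after a new request."""
    return after["request_count"] > before["request_count"]
```

Returning a list of failure messages rather than raising on the first problem makes it easier to copy every failed check into the pass/fail log.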
What the smoke test must not assert
- Do not assert specific token counts from previous runs — token counts vary with model routing.
- Do not assert exact USD figures — rates may change between reviews.
- Do not assert uptime or latency thresholds in a budget review smoke test.
Pass/fail log fields to record
smoke_test_date: YYYY-MM-DD
query_key_name:
query_date_range: YYYY-MM-DD to YYYY-MM-DD
total_quota_returned:
total_used_quota_returned:
request_count_returned:
daily_quota_dates_present:
sample_chat_total_tokens:
usage_field_present: true/false
date_range_filter_working: true/false
error_path_returned_error: true/false
reviewer:
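If the log is produced by a script rather than filled in by hand, a small record type keeps the field names consistent between reviews. The sketch below uses the field names listed above. The Python types are assumptions (for example, daily_quota_dates_present is treated as a count of dates), and the pass rule in passed() is one possible reading of the three true/false fields, not a documented requirement.

```python
from dataclasses import asdict, dataclass


@dataclass
class SmokeTestLog:
    smoke_test_date: str
    query_key_name: str
    query_date_range: str
    total_quota_returned: str
    total_used_quota_returned: str
    request_count_returned: int
    daily_quota_dates_present: int  # assumed here to be a count of dates
    sample_chat_total_tokens: int
    usage_field_present: bool
    date_range_filter_working: bool
    error_path_returned_error: bool
    reviewer: str

    def passed(self) -> bool:
        """Treat the run as a pass only if all three boolean checks are true."""
        return (
            self.usage_field_present
            and self.date_range_filter_working
            and self.error_path_returned_error
        )

    def render(self) -> str:
        """Render one 'field: value' line per field, with booleans as true/false."""
        lines = []
        for name, value in asdict(self).items():
            if isinstance(value, bool):
                value = "true" if value else "false"
            lines.append(f"{name}: {value}")
        return "\n".join(lines)
```

The rendered text can be pasted directly into the budget review package alongside the dated export.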
Ops note: rate multipliers and anomaly handling
The CometAPI help center introduces the concept of rate multipliers — internal calculation factors applied per model that are designed to track official provider pricing. These multipliers can change when upstream providers adjust their rates. The help center documents the following for operators:
- GPT model multipliers are usually synchronized with official pricing; other vendors are adjusted according to market conditions.
- Price adjustment announcements are published to the console panel and can be subscribed to by email.
- The 1-5 AM maintenance window may cause temporary instability; batch workloads should implement reconnection and request-save logic.
For an anomaly in a budget review — unexpected spend spikes or request counts from unfamiliar IP ranges — the recommended step is to check the request log page for unrecognised source IPs. If a key leak is suspected, disable the affected key and create a new one before the review closes.
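To decide which days deserve a look in the request log, a simple screen over the daily spend figures can help. The sketch below flags any date whose spend exceeds a multiple of the period median. The median-based rule and the default factor of 3 are illustrative choices, not something the help center prescribes, and a flagged day is only a prompt to check source IPs, not proof of a leak.

```python
from statistics import median


def flag_spend_spikes(daily_spend: dict[str, float], factor: float = 3.0) -> list[str]:
    """Return the dates whose spend exceeds factor times the period median."""
    if not daily_spend:
        return []
    threshold = factor * median(daily_spend.values())
    return sorted(day for day, spend in daily_spend.items() if spend > threshold)
```

The median is used rather than the mean so that a single very large day does not raise the threshold enough to hide itself.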
For guidance on controlling cost at the request level, see the CometAPI Pricing Reconciliation Checklist on this site.
To get started with CometAPI, visit Start with CometAPI.
Failure modes
- Evidence gap: the agent cannot inspect the failing log, source page, pull request, or local command output. The safe action is to stop and record the missing evidence instead of guessing.
- Scope drift: the agent edits files that are not connected to the observed failure. Keep the repair tied to the failing signal and leave unrelated cleanup for a separate task.
- Environment mismatch: the local check uses different versions, credentials, feature flags, or runtime settings than the hosted path. Record the mismatch before treating the result as proof.
- Unreviewed fallback: the agent changes models, endpoints, permissions, or retry behavior to make a run pass without preserving the review boundary. Treat access and provider failures as operational blockers, not topic failures.
- Weak handoff: the final note says the issue is fixed but omits the command, result, changed files, and remaining uncertainty. That makes the next operator repeat the investigation.
Sources checked
- CometAPI documentation - accessed 2026-06-05; purpose: verify current CometAPI documentation navigation.
- CometAPI chat completions reference - accessed 2026-06-05; purpose: verify chat completion contract areas.
- CometAPI responses reference - accessed 2026-06-05; purpose: verify responses endpoint contract areas.
- CometAPI pricing documentation - accessed 2026-06-05; purpose: verify pricing documentation boundaries.
- CometAPI help center - accessed 2026-06-05; purpose: verify support and escalation documentation areas.
Contract details to verify
| Area | What to verify | Source URL | Accessed | Safe candidate wording |
|---|---|---|---|---|
| Query service endpoint | Confirm query.cometapi.com/user/quota is the current host and path; check whether authentication moved to Bearer or remains as key query param | https://apidoc.cometapi.com/pricing/balance-query | 2026-06-05 | “The balance query endpoint documented at the pricing reference” |
| Volume discount thresholds | Confirm current thresholds for volume or enterprise discount activation | https://apidoc.cometapi.com/pricing/about-pricing | 2026-06-05 | “Volume discounts available above documented monthly thresholds” |
| Rate multiplier mechanism | Confirm current multiplier definition and where the multiplier calculation method is published | https://apidoc.cometapi.com/support/help-center | 2026-06-05 | “Rate multipliers as described in the help center” |
| Maintenance window hours | Confirm current scheduled maintenance window and recommended retry behaviour | https://apidoc.cometapi.com/support/help-center | 2026-06-05 | “Scheduled maintenance as documented in help center” |
FAQ
What is the fastest way to pull usage data for a budget review period? Call GET https://query.cometapi.com/user/quota with start_date and end_date set to the first and last days of the review period. The response includes total_used_quota, request_count, and a daily_quota breakdown per API key. Verify the current parameter names and endpoint host against the balance query reference before running a formal report.
Do token counts in /v1/chat/completions responses map directly to USD spend? Not directly. Token counts must be combined with the applicable per-token rate for the model used. Token-billed models follow a ratio relative to upstream provider list prices (a 20% discount structure is documented as the baseline). For per-call models (image and video generation), token counts do not apply — only request counts matter. Always verify current rates at the pricing documentation before presenting USD figures.
How do I attribute costs to individual teams or projects? Create a separate API key per team or project. The query service returns per-key usage in keys[].used_quota and daily breakdowns in daily_quota. The official docs recommend generating a dedicated low-quota key (quota set to 0) for balance queries so that key can be shared with finance without risk of it being used for model requests.
What should I do if the usage numbers look wrong or unexpectedly high? Check the request log page for unfamiliar source IP addresses. If an unrecognised IP appears, treat it as a potential key leak: disable the affected key and create a new one immediately. The help center documents this as the recommended response to anomalous charges.
How do I get token counts from streaming calls? Set stream_options: { include_usage: true } in the request body. The usage object then appears in the final Server-Sent Event chunk before the [DONE] marker. Without this flag, streaming responses do not include token counts by default.
Can I audit usage at the individual model level? The query service response includes account-level and per-key totals. The per-request usage object in chat completions responses is the most granular source for model-level breakdowns — log model, usage.prompt_tokens, usage.completion_tokens, and usage.total_tokens from each API response to your own data store if you need per-model aggregations that the query service does not directly provide.
Are there rate limits or concurrency caps I should account for in a budget review? The help center describes a cluster load-balancing system that supports high concurrency by default. Enterprise-dedicated lines with higher TPM limits are available. For specifics on current RPM, TPM, and concurrency limits relevant to your account tier, consult the help center and interface stability documentation; do not rely on numbers in this article for formal compliance or SLA reporting.
Reader next step
Run the next implementation or review pass against the CometAPI Pricing Reconciliation Checklist, and refer back to Home for the surrounding editorial context and source boundaries.
After the source checks, request assumptions, and review owner are clear, use CometAPI as the reference gateway only for the request paths, model routes, or cost checks the team has actually verified.