Three architecture patterns power every AI product I've shipped — and all three run on Cloudflare's free tier. At 50,000 requests/month, my total bill (infrastructure plus AI API) is $7.40. Here are the patterns: the Event-Driven Agent, the Async Pipeline, and the Edge Cache Layer — with real wrangler configs and production cost data for each.
Why Architecture Matters More at Zero Cost
When you're paying $300/month for SaaS tools, inefficiency hides in the bill. When you're on free tiers, a bad architectural decision hits you immediately — in latency, in rate-limit errors, or in the one place you can't avoid paying: the AI API itself.
The three patterns I'm about to describe aren't theoretical. I derived them from running Browning Digital's product infrastructure — a sales engine, a delivery system, an email worker, and an AI relay — all on Cloudflare Workers. Each pattern solves a specific problem in AI workload architecture.
Pattern 1: The Event-Driven Agent
Best for: one-shot AI tasks triggered by user action (form submission, purchase, webhook)
Flow: HTTP trigger → Worker validates input → Claude API call → result written to R2/KV → 200 response
This is the simplest pattern and covers 80% of solopreneur AI use cases. A user submits something, your Worker calls Claude, stores the output, and returns immediately. The key design decisions:
- Validate before you call the API. Input validation in the Worker costs nothing. A Claude API call costs $0.0005+. Reject bad inputs at the gate.
- Write results to R2, not KV. KV values can technically reach 25 MiB, but KV is priced and tuned for small, hot lookup values, not large blobs. R2 has no practical size limit and costs $0.015/GB-month — store full outputs there and use KV only for lookup keys.
- Return a job ID immediately. Don't hold the HTTP connection open for a 3-second Claude call. Return a job ID, let the client poll, use a Durable Object if you need push.
// wrangler.toml — Event-Driven Agent
name = "my-ai-agent"
main = "src/index.ts"
compatibility_date = "2025-01-01"
compatibility_flags = ["nodejs_compat"]
[[kv_namespaces]]
binding = "JOBS"
id = "your-kv-namespace-id"
[[r2_buckets]]
binding = "OUTPUTS"
bucket_name = "my-ai-outputs"
[vars]
CLAUDE_MODEL = "claude-haiku-4-5"

// src/index.ts — core pattern
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    if (request.method !== 'POST') return new Response('Method not allowed', { status: 405 });
    const body = await request.json() as { input: string };
    if (!body.input || body.input.length > 4000) {
      return Response.json({ error: 'Invalid input' }, { status: 400 });
    }
    const jobId = crypto.randomUUID();
    // Fire and return — don't await the AI call
    ctx.waitUntil(processJob(env, jobId, body.input));
    return Response.json({ job_id: jobId, status: 'queued' });
  }
};
async function processJob(env: Env, jobId: string, input: string) {
  const result = await callClaude(env, input);
  await env.OUTPUTS.put(`jobs/${jobId}.json`, JSON.stringify({ result, completed_at: Date.now() }));
  await env.JOBS.put(jobId, 'done', { expirationTtl: 3600 });
}

Production cost at 10,000 AI jobs/month: Workers free tier (compute: $0), KV operations ~$0.01, R2 storage ~$0.15, Claude Haiku API ~$5. Total: ~$5.16/month.
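The Worker above returns a job ID, and the client polls for the result. A minimal status endpoint might look like this — a sketch, not the production code: the route shape is an assumption, and the `Env` interface below is just the slice of the `JOBS`/`OUTPUTS` bindings from the wrangler config that this handler touches.

```typescript
// Minimal slice of the Workers bindings this sketch uses
interface Env {
  JOBS: { get(key: string): Promise<string | null> };
  OUTPUTS: { get(key: string): Promise<{ body: BodyInit } | null> };
}

// GET /jobs/:id — hypothetical polling route for the job IDs issued above
export async function handleStatus(request: Request, env: Env): Promise<Response> {
  const jobId = new URL(request.url).pathname.split('/').pop();
  if (!jobId) return Response.json({ error: 'Missing job ID' }, { status: 400 });

  // KV holds only the tiny status flag; the full output lives in R2
  const status = await env.JOBS.get(jobId);
  if (status !== 'done') return Response.json({ job_id: jobId, status: 'pending' });

  const output = await env.OUTPUTS.get(`jobs/${jobId}.json`);
  if (!output) return Response.json({ error: 'Output expired' }, { status: 404 });
  return new Response(output.body, { headers: { 'Content-Type': 'application/json' } });
}
```

Because the status flag carries a one-hour TTL, a `done` flag with a missing R2 object is treated as expired rather than an error.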
Pattern 2: The Async Pipeline
Best for: batch processing, multi-step AI workflows, high-volume operations
Flow: Trigger → Worker enqueues → Queue consumer Worker → Claude API (batched) → R2 storage → downstream notify
When you need to process multiple items — blog posts, product descriptions, email sequences, resume optimizations — the Async Pipeline prevents you from hammering the Claude API with simultaneous requests and keeps you well within rate limits.
Cloudflare Queues is the engine. The free tier gives you 1 million operations/month and 1MB max message size. The consumer Worker processes messages in batches (up to 10 at a time), with automatic retry on failure.
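The producer side is a single `send()` on the queue binding. A sketch, assuming the `QUEUE` producer binding from the config and a hypothetical `Job` payload shape — match it to whatever your consumer expects:

```typescript
// Job payload shape — an assumption; align it with the consumer Worker
interface Job {
  input: string;
  requested_at: number;
}

// Producer side: instead of calling Claude inline, hand the work to the queue
export async function enqueue(
  queue: { send(msg: Job): Promise<void> },
  input: string,
): Promise<Job> {
  const job: Job = { input, requested_at: Date.now() };
  await queue.send(job); // Queues delivers it to the consumer Worker in batches
  return job;
}
```

In a real Worker you'd call this as `enqueue(env.QUEUE, body.input)` from the fetch handler and return immediately, exactly as in Pattern 1.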
// wrangler.toml — Async Pipeline
[[queues.producers]]
queue = "ai-pipeline"
binding = "QUEUE"
[[queues.consumers]]
queue = "ai-pipeline"
max_batch_size = 5 # process 5 items at a time
max_batch_timeout = 10 # wait up to 10s to fill batch
max_retries = 3
dead_letter_queue = "ai-pipeline-dlq"

// Consumer Worker — batch processing
export default {
  async queue(batch: MessageBatch<Job>, env: Env): Promise<void> {
    const results = await Promise.allSettled(
      batch.messages.map(msg => processItem(env, msg.body))
    );
    // Only retry failed items
    for (let i = 0; i < results.length; i++) {
      if (results[i].status === 'rejected') {
        batch.messages[i].retry();
      } else {
        batch.messages[i].ack();
      }
    }
  }
};

Key insight: The Async Pipeline lets you process 50,000 AI jobs/month with zero infrastructure cost — only the Claude API calls are billable. At Haiku pricing, 50K jobs with average 600 tokens in/400 out = ~$25 in API costs. The infrastructure that surrounds it: $0.
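That estimate is easy to re-run for your own volumes. A back-of-envelope cost model — the per-million-token rates are placeholders, not published pricing, so plug in current rates before trusting the output:

```typescript
// Back-of-envelope API cost model; rates are illustrative placeholders
function monthlyApiCost(
  jobs: number,
  avgTokensIn: number,
  avgTokensOut: number,
  rateInPerMTok: number,   // $ per million input tokens
  rateOutPerMTok: number,  // $ per million output tokens
): number {
  const inputCost = (jobs * avgTokensIn / 1_000_000) * rateInPerMTok;
  const outputCost = (jobs * avgTokensOut / 1_000_000) * rateOutPerMTok;
  return inputCost + outputCost;
}

// 50K jobs at 600 in / 400 out, with whatever rates apply to your model
monthlyApiCost(50_000, 600, 400, 0.25, 1.25);
```

The useful property: cost scales linearly with job count, so doubling volume doubles the API bill while the surrounding Queue/Worker infrastructure stays at $0.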
Pattern 3: The Edge Cache Layer
Best for: AI products with repeat queries, content generation, semantic search
Flow: Request → Worker checks KV cache → cache hit: return instantly / miss: call Claude → store in KV → return
The most underused pattern for solopreneurs. If your AI product answers similar questions repeatedly, you're burning API budget on identical (or near-identical) calls. The Edge Cache Layer intercepts requests at the Worker level and serves cached responses from KV.
The cache key strategy is everything. Don't hash the raw input — normalize it first. Strip punctuation, lowercase, trim whitespace. "What is cloudflare workers?" and "what is cloudflare workers" should hit the same cache entry.
// Cache key normalization
function cacheKey(input: string): string {
  return 'ai:' + input
    .toLowerCase()
    .trim()
    .replace(/[^\w\s]/g, '')
    .replace(/\s+/g, ' ')
    .slice(0, 200); // cap at 200 chars
}

async function cachedAiCall(env: Env, input: string): Promise<string> {
  const key = cacheKey(input);
  // Check cache first
  const cached = await env.CACHE.get(key);
  if (cached) return cached;
  // Miss — call Claude
  const result = await callClaude(env, input);
  // Store for 24h
  await env.CACHE.put(key, result, { expirationTtl: 86400 });
  return result;
}

In practice, the Edge Cache Layer delivers a 40–70% API cost reduction on products with repeat query patterns. My AI relay saw a 62% cache hit rate within 30 days of adding this pattern — dropping per-request Claude costs from $0.0009 to an effective average of $0.00034.
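The effective-cost figure falls straight out of the hit rate: cached hits cost roughly nothing, so the blended per-request cost is just the miss rate times the full API price.

```typescript
// Blended per-request API cost under a cache:
// hits cost ~nothing, misses pay the full model price
function effectiveCost(fullCostPerCall: number, cacheHitRate: number): number {
  return fullCostPerCall * (1 - cacheHitRate);
}

effectiveCost(0.0009, 0.62); // ≈ 0.00034 — the numbers quoted above
```

Run it against your own logs: the hit rate is the only variable you can engineer, which is why cache-key normalization matters so much.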
Combining the Patterns
These three patterns aren't mutually exclusive. My current product stack uses all three simultaneously:
| Product | Pattern | Monthly Requests | Infra Cost |
|---|---|---|---|
| Sales Engine | Event-Driven Agent | ~800 | $0 |
| Email Worker | Async Pipeline | ~2,400 | $0 |
| AI Relay | Edge Cache Layer | ~48,000 | $1.20 |
| Delivery Worker | Event-Driven Agent | ~300 | $0 |
| Total | | ~51,500 | $1.20 |
The remaining ~$6 of my monthly bill is Claude API costs — not infrastructure. That's the target architecture state: infrastructure is free, you only pay for the intelligence.
Common Mistakes to Avoid
- Awaiting AI calls in the request path. Use ctx.waitUntil() for anything that doesn't need to block the response. A 2-second Claude call holding an HTTP connection open will spike your P95 latency and time out mobile clients.
- Storing large AI outputs in KV. KV is priced and tuned for small, hot values; a full article output doesn't belong there. Route large outputs to R2 and store only the R2 object key in KV.
- No rate limiting at the Worker level. Add a KV-backed rate limiter before your Claude API call. A single bug or bad actor can run up $50 in API costs in minutes. A 10-line rate limiter prevents this entirely.
- Skipping the dead-letter queue. In the Async Pipeline, always configure a DLQ. Failed jobs disappear silently without one. The Cloudflare Queues DLQ is free and takes 2 lines of config.
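A fixed-window limiter in the spirit of the rate-limiting bullet above might look like this — a sketch, assuming a KV namespace bound for rate limiting; the quota and window values are illustrative, and since KV is eventually consistent this is best-effort rather than exact:

```typescript
// The slice of the Workers KV API this sketch touches
interface KVNamespace {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void>;
}

// Fixed-window limiter: allow at most `limit` calls per client per window
export async function allowRequest(
  kv: KVNamespace,
  clientId: string,
  limit = 20,
  windowSecs = 60,
  now = Date.now(),
): Promise<boolean> {
  const window = Math.floor(now / 1000 / windowSecs);
  const key = `rl:${clientId}:${window}`;
  const count = parseInt((await kv.get(key)) ?? '0', 10);
  if (count >= limit) return false;
  // Double-TTL so the counter expires on its own after the window closes
  await kv.put(key, String(count + 1), { expirationTtl: windowSecs * 2 });
  return true;
}
```

Call it before callClaude and return a 429 when it comes back false — the check costs one KV read, versus the API dollars it protects.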
The Architecture Advantage
The reason solopreneurs lose to funded competitors isn't resources — it's architecture. A $50K/month AWS bill doesn't make an AI product better. These three patterns give you the same production reliability at a fraction of the cost: edge distribution, automatic retry, caching, and async processing, all within Cloudflare's free tier.
When you're spending $7/month on infrastructure, every dollar of revenue is margin. That's the real advantage of building on zero-cost architecture — not just cost savings, but the operational freedom that comes from not needing to hit revenue targets just to cover your server bills.
Get the Complete Architecture Templates
The Zero-Cost AI Kit includes production-ready implementations of all three patterns — Event-Driven Agent, Async Pipeline, and Edge Cache Layer — with full wrangler configs, TypeScript types, and deploy scripts. Skip the setup and ship in under an hour.
Get the Zero-Cost AI Kit — $47