Why Cloudflare Workers for AI (And Not Lambda)
I've run AI backends on AWS Lambda, Google Cloud Functions, Railway, and Cloudflare Workers. For solopreneur AI products, Workers win on every metric that matters:

| Metric | Cloudflare Workers | AWS Lambda |
| --- | --- | --- |
| Cold start | <5ms | 200–800ms |
| Free requests | 3M/month | 1M/month |
| Global locations | 300+ | 30 regions |
| Deploy time | ~10 seconds | 30–60 seconds |
| Vendor lock-in | Low (standard JS) | High (IAM, VPC, etc.) |
The 5ms cold start alone is worth it. Every AI call you make to Claude API adds latency — you don't want your own infrastructure adding another 500ms on top. Workers are also genuinely simpler: no VPCs, no IAM roles, no CloudFormation. Just JavaScript and a wrangler.toml.
Prerequisites
You need three things before starting:
- A Cloudflare account (free — cloudflare.com)
- Node.js 18+ installed (`node --version` to check)
- An Anthropic API key from console.anthropic.com
That's it. No credit card required for the Cloudflare free tier. The Anthropic API requires a payment method but you'll spend less than $1 following this tutorial.
Step 1: Install Wrangler 4 and Authenticate
Wrangler is the Cloudflare Workers CLI. Always use v4, the current major version; v3 lacks support for some newer Workers features.
Run these commands exactly:
# Install Wrangler 4 globally
npm install -g wrangler@latest
# Verify version (should be 4.x)
wrangler --version
# Authenticate with your Cloudflare account
# This opens a browser window — click "Allow"
wrangler login
# Verify auth worked
wrangler whoami
After wrangler login, you should see your email address in the terminal. If authentication fails, try wrangler login --browser=false and manually copy the URL.
Step 2: Create the Project Structure
Create a new Worker project. I recommend the JavaScript (not TypeScript) template for simplicity — you can migrate to TS later.
# Create project (Wrangler 4 hands this off to create-cloudflare;
# choose the "Hello World" Worker template and JavaScript when prompted)
wrangler init my-ai-worker
# Navigate into it
cd my-ai-worker
# The structure you get:
# my-ai-worker/
# ├── wrangler.toml    ← config file
# ├── src/
# │   └── index.js     ← your Worker code
# └── package.json
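Before wiring in any AI logic, it's worth confirming the scaffold runs. A minimal `src/index.js` in the stock hello-world shape (the greeting string here is just an illustration) can be served locally with `wrangler dev`:

```javascript
// src/index.js — minimal handler to verify the scaffold works end to end
const worker = {
  async fetch(request, env, ctx) {
    return new Response('Hello from my-ai-worker!', {
      headers: { 'content-type': 'text/plain' },
    });
  },
};

export default worker;
```

Run `wrangler dev` and open http://localhost:8787; if you see the greeting, the toolchain works and you can move on to real configuration.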
Step 3: Configure wrangler.toml
This is the most important file. Get it right and everything else is easy. Here's the production config I use for AI Workers:
# wrangler.toml — Production AI Worker configuration
name = "my-ai-worker"
main = "src/index.js"
compatibility_date = "2026-01-01"
compatibility_flags = ["nodejs_compat"]
# KV namespace for session management and caching
[[kv_namespaces]]
binding = "CACHE"
id = "REPLACE_WITH_YOUR_KV_ID"
# R2 bucket for storing AI outputs and assets
[[r2_buckets]]
binding = "STORAGE"
bucket_name = "my-ai-outputs"
# Environment variables (non-secret)
[vars]
ENVIRONMENT = "production"
MAX_TOKENS = "1024"
DEFAULT_MODEL = "claude-haiku-4-5"
# Rate limiting (Cloudflare managed, free tier)
[[unsafe.bindings]]
type = "ratelimit"
name = "RATE_LIMITER"
namespace_id = "1001"
simple = { limit = 100, period = 60 }
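The RATE_LIMITER binding configured above exposes a `limit({ key })` method inside the Worker; note that the main Worker code in this guide uses a KV counter instead, so this is an optional alternative. A sketch of how the managed limiter could be used, assuming the binding name from the config:

```javascript
// Sketch: using the managed RATE_LIMITER binding from wrangler.toml.
// limit({ key }) resolves to { success: false } once the key exceeds
// the configured limit (here 100 requests per 60 seconds).
async function enforceRateLimit(env, clientIP) {
  const { success } = await env.RATE_LIMITER.limit({ key: clientIP });
  if (!success) {
    return Response.json(
      { error: 'Rate limit exceeded' },
      { status: 429, headers: { 'Retry-After': '60' } }
    );
  }
  return null; // null means the caller may proceed with the request
}
```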
You'll need to create the KV namespace and R2 bucket before deploying. Do this next.
# Create KV namespace (Wrangler 4 syntax uses a space, not a colon)
wrangler kv namespace create CACHE
# Output includes: id = "abc123..."
# Copy that id into wrangler.toml
# Create R2 bucket
wrangler r2 bucket create my-ai-outputs
# Set your Anthropic API key as a secret (never in wrangler.toml)
wrangler secret put ANTHROPIC_API_KEY
# Enter your key when prompted — it's encrypted at rest
Step 4: Write the AI Worker
Here's the complete production Worker. This handles auth, rate limiting, Claude API integration, response caching, and error handling — the full stack in under 100 lines:
// src/index.js — Production AI Worker
export default {
  async fetch(request, env, ctx) {
    // CORS preflight
    if (request.method === 'OPTIONS') {
      return new Response(null, {
        headers: {
          'Access-Control-Allow-Origin': '*',
          'Access-Control-Allow-Methods': 'POST, OPTIONS',
          'Access-Control-Allow-Headers': 'Content-Type, Authorization',
        }
      });
    }

    // Only allow POST to /api/generate
    const url = new URL(request.url);
    if (url.pathname !== '/api/generate' || request.method !== 'POST') {
      return new Response('Not Found', { status: 404 });
    }

    // Validate request
    let body;
    try {
      body = await request.json();
    } catch {
      return Response.json({ error: 'Invalid JSON' }, { status: 400 });
    }

    const { prompt, model = env.DEFAULT_MODEL } = body;
    if (!prompt || typeof prompt !== 'string' || prompt.length > 4000) {
      return Response.json({ error: 'Invalid prompt' }, { status: 400 });
    }

    // Rate limiting via IP. KV is eventually consistent, so this counter is
    // approximate: fine for abuse prevention, not a hard guarantee.
    const clientIP = request.headers.get('CF-Connecting-IP') || 'unknown';
    const rateLimitKey = `rl:${clientIP}:${Math.floor(Date.now() / 60000)}`;
    const currentCount = parseInt(await env.CACHE.get(rateLimitKey) || '0');
    if (currentCount >= 20) {
      return Response.json(
        { error: 'Rate limit exceeded. Max 20 requests/minute.' },
        { status: 429, headers: { 'Retry-After': '60' } }
      );
    }

    // Increment rate limit counter (TTL: 2 minutes)
    ctx.waitUntil(env.CACHE.put(rateLimitKey, String(currentCount + 1),
      { expirationTtl: 120 }));

    // Check cache for identical prompt (optional — saves API costs).
    // Hash with SHA-256: unlike btoa(), this handles non-ASCII prompts
    // and a truncated base64 prefix would collide for similar prompts.
    const digest = await crypto.subtle.digest('SHA-256',
      new TextEncoder().encode(prompt));
    const hash = [...new Uint8Array(digest)]
      .map(b => b.toString(16).padStart(2, '0')).join('');
    const cacheKey = `prompt:${hash}`;
    const cached = await env.CACHE.get(cacheKey);
    if (cached) {
      return Response.json({ output: cached, cached: true },
        { headers: { 'Access-Control-Allow-Origin': '*' } });
    }

    // Call Claude API
    try {
      const aiResponse = await fetch('https://api.anthropic.com/v1/messages', {
        method: 'POST',
        headers: {
          'x-api-key': env.ANTHROPIC_API_KEY,
          'anthropic-version': '2023-06-01',
          'content-type': 'application/json',
        },
        body: JSON.stringify({
          model: model,
          max_tokens: parseInt(env.MAX_TOKENS),
          messages: [{ role: 'user', content: prompt }]
        })
      });

      if (!aiResponse.ok) {
        const err = await aiResponse.json();
        console.error('Claude API error:', err);
        return Response.json({ error: 'AI service error' }, { status: 502 });
      }

      const data = await aiResponse.json();
      const output = data.content[0].text;

      // Cache the response for 1 hour
      ctx.waitUntil(env.CACHE.put(cacheKey, output, { expirationTtl: 3600 }));

      // Log to R2 for analytics (async, doesn't block response)
      const logEntry = JSON.stringify({
        timestamp: new Date().toISOString(),
        model,
        promptLength: prompt.length,
        outputLength: output.length,
        inputTokens: data.usage.input_tokens,
        outputTokens: data.usage.output_tokens
      });
      ctx.waitUntil(env.STORAGE.put(`logs/${Date.now()}.json`, logEntry));

      return Response.json({
        output,
        usage: data.usage,
        cached: false
      }, {
        headers: { 'Access-Control-Allow-Origin': '*' }
      });
    } catch (error) {
      console.error('Worker error:', error);
      return Response.json({ error: 'Internal error' }, { status: 500 });
    }
  }
};
Step 5: Deploy and Test
# Deploy to Cloudflare
wrangler deploy
# Output:
# Uploaded my-ai-worker (1.23 sec)
# Deployed my-ai-worker triggers (0.26 sec)
# https://my-ai-worker.YOUR_SUBDOMAIN.workers.dev
# Test with curl
curl -X POST https://my-ai-worker.YOUR_SUBDOMAIN.workers.dev/api/generate \
-H "Content-Type: application/json" \
-d '{"prompt": "Explain zero-cost AI infrastructure in 2 sentences."}'
# Expected response:
# {"output":"Zero-cost AI infrastructure...","usage":{...},"cached":false}
That's it. You have a production AI endpoint running globally, with rate limiting, caching, and analytics logging — all on the free tier. Total time from zero: under 20 minutes.
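From a browser or Node client, calling the endpoint is a plain fetch. A sketch, assuming the deployed URL above (the `fetchImpl` parameter is only there so the helper is easy to test; in real code the global fetch is used):

```javascript
// Sketch: calling the deployed Worker from any JavaScript client.
// endpoint is your workers.dev URL; fetchImpl defaults to the global fetch.
async function generate(endpoint, prompt, fetchImpl = fetch) {
  const res = await fetchImpl(`${endpoint}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });
  if (!res.ok) throw new Error(`Worker returned ${res.status}`);
  return res.json(); // { output, usage, cached }
}
```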
Production Patterns: What You Need After Day 1
Pattern 1: Streaming Responses
For user-facing AI products, streaming dramatically improves perceived performance. Users see text appear word-by-word rather than waiting for the full response. Workers can pipe the upstream SSE body straight through to the client, or reshape it with the TransformStream API:
// Streaming Claude responses from Workers
const stream = await fetch('https://api.anthropic.com/v1/messages', {
  method: 'POST',
  headers: {
    'x-api-key': env.ANTHROPIC_API_KEY,
    'anthropic-version': '2023-06-01',
    'content-type': 'application/json',
  },
  body: JSON.stringify({
    model: 'claude-haiku-4-5',
    max_tokens: 1024,
    stream: true, // Enable SSE streaming
    messages: [{ role: 'user', content: prompt }]
  })
});

// Forward the stream directly to the client
return new Response(stream.body, {
  headers: {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Access-Control-Allow-Origin': '*',
  }
});
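On the client side, the forwarded stream can be consumed incrementally with a reader. A sketch (a real UI would also parse the `event:`/`data:` SSE framing Anthropic emits rather than just concatenating raw chunks):

```javascript
// Sketch: reading a streamed response chunk-by-chunk on the client.
// body is the Response body (a ReadableStream); onChunk fires per chunk.
async function readStream(body, onChunk) {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const chunk = decoder.decode(value, { stream: true });
    full += chunk;
    onChunk(chunk); // e.g. append to the DOM as text arrives
  }
  return full;
}
```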
Pattern 2: Workers Cron for Background AI Tasks
Scheduled Workers let you run AI tasks on a schedule — content generation, email digest creation, data summarization — at zero cost:
# Add to wrangler.toml
[triggers]
crons = ["0 9 * * MON"] # Every Monday 9am UTC

// In index.js, add a scheduled handler alongside fetch
export default {
  async fetch(request, env, ctx) { /* ... */ },

  async scheduled(event, env, ctx) {
    // This runs on your cron schedule
    const content = await generateWeeklyDigest(env);
    await sendEmail(content, env);
    console.log('Weekly digest sent:', event.scheduledTime);
  }
};
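`generateWeeklyDigest` and `sendEmail` above are hypothetical helpers you'd write yourself, not Cloudflare APIs. A sketch of the digest half, reusing the same Claude request shape as the main Worker (the prompt text and the injectable `fetchImpl` parameter are illustration only):

```javascript
// Sketch of the hypothetical generateWeeklyDigest helper referenced above.
// fetchImpl is injectable for testing; in the Worker it's the global fetch.
async function generateWeeklyDigest(env, fetchImpl = fetch) {
  const res = await fetchImpl('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'x-api-key': env.ANTHROPIC_API_KEY,
      'anthropic-version': '2023-06-01',
      'content-type': 'application/json',
    },
    body: JSON.stringify({
      model: 'claude-haiku-4-5',
      max_tokens: 1024,
      messages: [{
        role: 'user',
        content: 'Write a short weekly digest of AI infrastructure news.',
      }],
    }),
  });
  if (!res.ok) throw new Error(`Claude API returned ${res.status}`);
  const data = await res.json();
  return data.content[0].text;
}
```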
Pattern 3: Environment-Based Model Routing
Route requests to different Claude models based on task complexity. This is the pattern I use to keep costs predictable while maintaining quality:
function selectModel(taskType) {
  const routing = {
    'classify':    'claude-haiku-4-5',  // Fast, cheap: $0.0004/req
    'summarize':   'claude-haiku-4-5',  // Fast, cheap
    'generate':    'claude-haiku-4-5',  // Default for content
    'analyze':     'claude-sonnet-4-6', // Complex reasoning
    'code_review': 'claude-sonnet-4-6', // Needs depth
    'strategy':    'claude-opus-4-6',   // Reserved for complex work
  };
  return routing[taskType] || 'claude-haiku-4-5';
}
Monitoring Your Worker in Production
Cloudflare provides free real-time analytics for every Worker. In your Cloudflare dashboard, navigate to Workers → your worker → Analytics. You'll see:
- Request count by time period — track daily usage vs free tier limits
- Error rate — anything above 1% warrants investigation
- CPU time p50/p99 — if p99 approaches 10ms, you're near the free tier CPU limit
- Subrequests — counts your Claude API calls as outbound fetch requests
For custom metrics, use console.log() statements in your Worker — they appear in the real-time log stream via wrangler tail:
# Stream live logs from your deployed Worker
wrangler tail my-ai-worker
# You'll see logs like:
# [2026-03-13T14:23:01Z] POST /api/generate - 200 OK (342ms)
# model=claude-haiku-4-5 tokens=287+156 cost=$0.0003
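The per-request metric line shown above can come from a small formatter so every log entry has the same shape. A sketch (the per-million-token rates are assumptions for illustration; check current Anthropic pricing):

```javascript
// Sketch: one consistent metric line per request, viewable via `wrangler tail`.
// The per-million-token rates below are assumptions, not official pricing.
function formatMetric({ model, inputTokens, outputTokens }) {
  const INPUT_RATE = 1.0;  // assumed $ per million input tokens
  const OUTPUT_RATE = 5.0; // assumed $ per million output tokens
  const cost = (inputTokens * INPUT_RATE + outputTokens * OUTPUT_RATE) / 1e6;
  return `model=${model} tokens=${inputTokens}+${outputTokens} cost=$${cost.toFixed(4)}`;
}

// In the Worker, after a successful Claude call:
// console.log(formatMetric({ model, inputTokens, outputTokens }));
```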
Common Errors and Fixes
Error: "Script startup exceeded CPU time limit" — Your Worker is doing too much work at import time. Move initialization inside the handler function, not at the module level.
Error: "Exceeded resource limits" — CPU time exceeded 10ms. Cache more aggressively, or upgrade to the $5/month Workers paid plan for 30-second CPU time.
Claude returns 429 — You're hitting Anthropic's rate limits; entry-tier accounts have low requests-per-minute caps. Implement exponential backoff or upgrade your Anthropic account tier. My solution: queue requests in KV with a lock and retry mechanism.
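A minimal backoff wrapper for those 429s might look like this (a sketch; tune the attempt count and base delay to your traffic):

```javascript
// Sketch: retry a fetch with exponential backoff when the API returns 429.
// fetchImpl, maxAttempts, and baseDelayMs are parameterized for testing.
async function fetchWithBackoff(url, options, fetchImpl = fetch,
                                maxAttempts = 4, baseDelayMs = 500) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchImpl(url, options);
    if (res.status !== 429) return res;
    // Wait baseDelayMs, 2x, 4x, ... before retrying
    const delayMs = baseDelayMs * 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`Still rate-limited after ${maxAttempts} attempts`);
}
```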
CORS errors in browser — Add the CORS headers shown in the full Worker code above. The OPTIONS preflight handler is required for browser clients.
Skip the Setup — Get Production-Ready Templates
The Zero-Cost AI Kit includes 9 pre-built Worker templates: AI API proxy, rate limiter, streaming handler, cron task runner, R2 file manager, and more. All configured, tested, and ready to deploy with your Claude API key. Skip the 6-hour setup and be live in an hour.
Get the Zero-Cost AI Kit — $47

Frequently Asked Questions

How do I call the Claude API from a Cloudflare Worker?

Use the fetch() API inside your Worker with the Anthropic API key stored as a Wrangler secret. Set the x-api-key header to your key and anthropic-version to '2023-06-01'. Workers support async/await natively. Set your key securely with: wrangler secret put ANTHROPIC_API_KEY