Why Cloudflare Workers for AI (And Not Lambda)

I've run AI backends on AWS Lambda, Google Cloud Functions, Railway, and Cloudflare Workers. For solopreneur AI products, Workers win on every metric that matters:

                     Cloudflare Workers      AWS Lambda
Cold start           <5ms                    200–800ms
Free requests        ~3M/month (100K/day)    1M/month
Global locations     300+                    30 regions
Deploy time          ~10 seconds             30–60 seconds
Vendor lock-in       Low (standard JS)       High (IAM, VPC, etc.)

The 5ms cold start alone is worth it. Every AI call you make to Claude API adds latency — you don't want your own infrastructure adding another 500ms on top. Workers are also genuinely simpler: no VPCs, no IAM roles, no CloudFormation. Just JavaScript and a wrangler.toml.

Prerequisites

You need three things before starting:

  1. A Cloudflare account (free — cloudflare.com)
  2. Node.js 18+ installed (node --version to check)
  3. An Anthropic API key from console.anthropic.com

That's it. No credit card required for the Cloudflare free tier. The Anthropic API requires a payment method but you'll spend less than $1 following this tutorial.

Step 1: Install Wrangler 4 and Authenticate

Wrangler is the Cloudflare Workers CLI. Use v4, the current major version: some newer Workers features aren't supported on v3.


Run these commands exactly:

# Install Wrangler 4 globally
npm install -g wrangler@latest

# Verify version (should be 4.x)
wrangler --version

# Authenticate with your Cloudflare account
# This opens a browser window — click "Allow"
wrangler login

# Verify auth worked
wrangler whoami

After wrangler login, you should see your email address in the terminal. If authentication fails, try wrangler login --browser=false and manually copy the URL.

Step 2: Create the Project Structure

Create a new Worker project. I recommend the JavaScript (not TypeScript) template for simplicity — you can migrate to TS later.

# Create project (Wrangler 4 hands scaffolding off to create-cloudflare;
# pick the "Hello World" Worker template and JavaScript when prompted)
wrangler init my-ai-worker

# Navigate into it
cd my-ai-worker

# The structure you get:
# my-ai-worker/
# ├── wrangler.toml    ← config file
# ├── src/
# │   └── index.js    ← your Worker code
# └── package.json

Step 3: Configure wrangler.toml

This is the most important file. Get it right and everything else is easy. Here's the production config I use for AI Workers:

# wrangler.toml — Production AI Worker configuration
name = "my-ai-worker"
main = "src/index.js"
compatibility_date = "2026-01-01"
compatibility_flags = ["nodejs_compat"]

# KV namespace for session management and caching
[[kv_namespaces]]
binding = "CACHE"
id = "REPLACE_WITH_YOUR_KV_ID"

# R2 bucket for storing AI outputs and assets
[[r2_buckets]]
binding = "STORAGE"
bucket_name = "my-ai-outputs"

# Environment variables (non-secret)
[vars]
ENVIRONMENT = "production"
MAX_TOKENS = "1024"
DEFAULT_MODEL = "claude-haiku-4-5"

# Optional: Cloudflare's native rate-limiting binding (unsafe bindings API).
# The Worker in Step 4 uses a KV counter instead; this binding is an
# alternative you can call via env.RATE_LIMITER.limit({ key })
[[unsafe.bindings]]
type = "ratelimit"
name = "RATE_LIMITER"
namespace_id = "1001"
simple = { limit = 100, period = 60 }

Important

You'll need to create the KV namespace and R2 bucket before deploying. Do this next.

# Create KV namespace
wrangler kv:namespace create "CACHE"
# Output: { binding = "CACHE", id = "abc123..." }
# Copy the id into wrangler.toml

# Create R2 bucket
wrangler r2 bucket create my-ai-outputs

# Set your Anthropic API key as a secret (never in wrangler.toml)
wrangler secret put ANTHROPIC_API_KEY
# Enter your key when prompted — it's encrypted at rest
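
One gotcha worth internalizing before writing the Worker: everything under [vars] reaches your code on env as a string, so numeric settings like MAX_TOKENS need parsing. A minimal sketch (the env literal below is a hand-built stand-in for illustration; in production Cloudflare injects the real one):

```javascript
// Sketch: wrangler.toml [vars] surface on `env` as strings.
function readConfig(env) {
  return {
    maxTokens: parseInt(env.MAX_TOKENS, 10), // "1024" -> 1024
    model: env.DEFAULT_MODEL,
    isProduction: env.ENVIRONMENT === 'production',
  };
}

// Stand-in env object; the Workers runtime provides this in production
const env = {
  ENVIRONMENT: 'production',
  MAX_TOKENS: '1024',
  DEFAULT_MODEL: 'claude-haiku-4-5',
};
const cfg = readConfig(env);
console.log(cfg.maxTokens); // 1024 (a number, not a string)
```

Secrets set via wrangler secret put arrive on the same env object (here, env.ANTHROPIC_API_KEY) but never appear in wrangler.toml.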

Step 4: Write the AI Worker

Here's the complete production Worker. This handles CORS, rate limiting, Claude API integration, response caching, and error handling — the full stack in about 100 lines:

// src/index.js — Production AI Worker
export default {
  async fetch(request, env, ctx) {
    // CORS preflight
    if (request.method === 'OPTIONS') {
      return new Response(null, {
        headers: {
          'Access-Control-Allow-Origin': '*',
          'Access-Control-Allow-Methods': 'POST, OPTIONS',
          'Access-Control-Allow-Headers': 'Content-Type, Authorization',
        }
      });
    }
    
    // Only allow POST to /api/generate
    const url = new URL(request.url);
    if (url.pathname !== '/api/generate' || request.method !== 'POST') {
      return new Response('Not Found', { status: 404 });
    }
    
    // Validate request
    let body;
    try {
      body = await request.json();
    } catch {
      return Response.json({ error: 'Invalid JSON' }, { status: 400 });
    }
    
    const { prompt, model = env.DEFAULT_MODEL } = body;
    if (!prompt || typeof prompt !== 'string' || prompt.length > 4000) {
      return Response.json({ error: 'Invalid prompt' }, { status: 400 });
    }
    
    // Best-effort rate limiting via a per-IP, per-minute KV counter
    // (KV is eventually consistent, so bursts can briefly exceed the cap)
    const clientIP = request.headers.get('CF-Connecting-IP') || 'unknown';
    const rateLimitKey = `rl:${clientIP}:${Math.floor(Date.now() / 60000)}`;
    const currentCount = parseInt(await env.CACHE.get(rateLimitKey) || '0');
    
    if (currentCount >= 20) {
      return Response.json({ error: 'Rate limit exceeded. Max 20 requests/minute.' }, 
        { status: 429, headers: { 'Retry-After': '60' } });
    }
    
    // Increment rate limit counter (TTL: 2 minutes)
    ctx.waitUntil(env.CACHE.put(rateLimitKey, String(currentCount + 1), 
      { expirationTtl: 120 }));
    
    // Check cache for identical prompts (optional; saves API costs).
    // Hash with SHA-256: btoa() throws on non-Latin-1 prompts, and a
    // truncated base64 prefix risks cache-key collisions.
    const digestBuf = await crypto.subtle.digest(
      'SHA-256', new TextEncoder().encode(prompt));
    const cacheKey = 'prompt:' + [...new Uint8Array(digestBuf)]
      .map(b => b.toString(16).padStart(2, '0')).join('');
    const cached = await env.CACHE.get(cacheKey);
    if (cached) {
      return Response.json({ output: cached, cached: true });
    }
    
    // Call Claude API
    try {
      const aiResponse = await fetch('https://api.anthropic.com/v1/messages', {
        method: 'POST',
        headers: {
          'x-api-key': env.ANTHROPIC_API_KEY,
          'anthropic-version': '2023-06-01',
          'content-type': 'application/json',
        },
        body: JSON.stringify({
          model: model,
          max_tokens: parseInt(env.MAX_TOKENS),
          messages: [{ role: 'user', content: prompt }]
        })
      });
      
      if (!aiResponse.ok) {
        // .text() is safer than .json() here: error bodies aren't always JSON
        console.error('Claude API error:', aiResponse.status, await aiResponse.text());
        return Response.json({ error: 'AI service error' }, { status: 502 });
      }
      
      const data = await aiResponse.json();
      const output = data.content[0].text;
      
      // Cache the response for 1 hour
      ctx.waitUntil(env.CACHE.put(cacheKey, output, { expirationTtl: 3600 }));
      
      // Log to R2 for analytics (async, doesn't block response)
      const logEntry = JSON.stringify({ 
        timestamp: new Date().toISOString(),
        model, 
        promptLength: prompt.length,
        outputLength: output.length,
        inputTokens: data.usage.input_tokens,
        outputTokens: data.usage.output_tokens
      });
      ctx.waitUntil(env.STORAGE.put(`logs/${Date.now()}.json`, logEntry));
      
      return Response.json({ 
        output,
        usage: data.usage,
        cached: false
      }, {
        headers: { 'Access-Control-Allow-Origin': '*' }
      });
      
    } catch (error) {
      console.error('Worker error:', error);
      return Response.json({ error: 'Internal error' }, { status: 500 });
    }
  }
};
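
The fixed-window rate limiter above keys each counter by IP plus the current minute. Isolated as a pure helper (the function name is mine, not from the Worker), the windowing logic looks like this:

```javascript
// Sketch: the per-IP, per-minute fixed-window key used by the rate limiter.
// Two requests inside the same minute map to the same counter; the next
// minute starts a fresh window (and old keys expire via their KV TTL).
function rateLimitKey(ip, nowMs) {
  const windowId = Math.floor(nowMs / 60000); // 60,000 ms = 1 minute
  return `rl:${ip}:${windowId}`;
}

const t = 1_700_000_000_000;
console.log(rateLimitKey('203.0.113.7', t) === rateLimitKey('203.0.113.7', t + 30_000)); // true
console.log(rateLimitKey('203.0.113.7', t) === rateLimitKey('203.0.113.7', t + 61_000)); // false
```

The trade-off of a fixed window: a client can burst up to 2× the limit across a window boundary. For a solo product that's usually acceptable; a sliding window needs more bookkeeping.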

Step 5: Deploy and Test

# Deploy to Cloudflare
wrangler deploy

# Output:
# Uploaded my-ai-worker (1.23 sec)
# Deployed my-ai-worker triggers (0.26 sec)
# https://my-ai-worker.YOUR_SUBDOMAIN.workers.dev

# Test with curl
curl -X POST https://my-ai-worker.YOUR_SUBDOMAIN.workers.dev/api/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain zero-cost AI infrastructure in 2 sentences."}'

# Expected response:
# {"output":"Zero-cost AI infrastructure...","usage":{...},"cached":false}

That's it. You have a production AI endpoint running globally, with rate limiting, caching, and analytics logging — all on the free tier. Total time from zero: under 20 minutes.

Production Patterns: What You Need After Day 1

Pattern 1: Streaming Responses

For user-facing AI products, streaming dramatically improves perceived performance: users see text appear word-by-word rather than waiting for the full response. Workers can pipe the upstream SSE body straight through to the client, or reshape it with a TransformStream:

// Streaming Claude responses from Workers
const stream = await fetch('https://api.anthropic.com/v1/messages', {
  method: 'POST',
  headers: { 
    'x-api-key': env.ANTHROPIC_API_KEY,
    'anthropic-version': '2023-06-01',
    'content-type': 'application/json',
  },
  body: JSON.stringify({
    model: 'claude-haiku-4-5',
    max_tokens: 1024,
    stream: true,  // Enable SSE streaming
    messages: [{ role: 'user', content: prompt }]
  })
});

// Forward the stream directly to the client
return new Response(stream.body, {
  headers: {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Access-Control-Allow-Origin': '*',
  }
});
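
Passing the body through untouched is the simplest option. If you instead want to inspect events server-side (to meter tokens, for example), you parse the data: lines of the SSE stream. A sketch against Anthropic's documented streaming event shape (content_block_delta carrying a text_delta); treat it as illustrative rather than exhaustive:

```javascript
// Sketch: pull the text fragment out of one SSE line from Anthropic's
// streaming Messages API. Returns '' for non-data lines and non-text events.
function extractTextDelta(sseLine) {
  if (!sseLine.startsWith('data: ')) return '';
  let event;
  try {
    event = JSON.parse(sseLine.slice('data: '.length));
  } catch {
    return ''; // ignore non-JSON or partial lines
  }
  if (event.type === 'content_block_delta' && event.delta?.type === 'text_delta') {
    return event.delta.text;
  }
  return '';
}

const line = 'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}';
console.log(extractTextDelta(line)); // Hello
```

In a real Worker you'd wire this into a TransformStream between the upstream body and the client response; the passthrough version above stays simpler and cheaper on CPU time.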

Pattern 2: Workers Cron for Background AI Tasks

Scheduled Workers let you run AI tasks on a schedule — content generation, email digest creation, data summarization — at zero cost:

# Add to wrangler.toml
[triggers]
crons = ["0 9 * * MON"]  # Every Monday 9am UTC

# In index.js, add scheduled handler
export default {
  async fetch(request, env, ctx) { /* ... */ },
  
  async scheduled(event, env, ctx) {
    // This runs on your cron schedule.
    // generateWeeklyDigest() and sendEmail() are your own helpers to implement.
    const content = await generateWeeklyDigest(env);
    await sendEmail(content, env);
    console.log('Weekly digest sent:', event.scheduledTime);
  }
};
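
If you register more than one cron expression, the scheduled event reports which one fired via event.cron, so a single handler can dispatch several jobs. A sketch (the job names and second schedule are illustrative):

```javascript
// Sketch: route multiple cron triggers through one scheduled() handler.
// `event.cron` holds the expression that fired; map it to a job name.
function routeCron(cron) {
  const jobs = {
    '0 9 * * MON': 'weekly-digest',
    '0 3 * * *': 'nightly-cleanup',
  };
  return jobs[cron] ?? null;
}

console.log(routeCron('0 9 * * MON')); // weekly-digest
console.log(routeCron('*/5 * * * *')); // null
```

Inside scheduled() you'd call routeCron(event.cron) and switch on the result, keeping each job in its own function.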

Pattern 3: Environment-Based Model Routing

Route requests to different Claude models based on task complexity. This is the pattern I use to keep costs predictable while maintaining quality:

function selectModel(taskType) {
  const routing = {
    'classify':     'claude-haiku-4-5',   // Fast, cheap: $0.0004/req
    'summarize':    'claude-haiku-4-5',   // Fast, cheap
    'generate':     'claude-haiku-4-5',   // Default for content
    'analyze':      'claude-sonnet-4-6',  // Complex reasoning
    'code_review':  'claude-sonnet-4-6',  // Needs depth
    'strategy':     'claude-opus-4-6',    // Reserved for complex
  };
  return routing[taskType] || 'claude-haiku-4-5';
}

Monitoring Your Worker in Production

Cloudflare provides free real-time analytics for every Worker. In your Cloudflare dashboard, navigate to Workers → your worker → Analytics to see request volume, error rates, and CPU time percentiles.

For custom metrics, use console.log() statements in your Worker — they appear in the real-time log stream via wrangler tail:

# Stream live logs from your deployed Worker
wrangler tail my-ai-worker

# You'll see logs like:
# [2026-03-13T14:23:01Z] GET /api/generate - 200 OK (342ms)
# model=claude-haiku-4-5 tokens=287+156 cost=$0.0003
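
Those key=value log lines are easy to grep out of wrangler tail. A small formatter that produces them (my own convention, not a Cloudflare API):

```javascript
// Sketch: format structured key=value log lines for `wrangler tail`.
function logLine(fields) {
  return Object.entries(fields)
    .map(([k, v]) => `${k}=${v}`)
    .join(' ');
}

console.log(logLine({ model: 'claude-haiku-4-5', tokens: '287+156' }));
// model=claude-haiku-4-5 tokens=287+156
```

Call it from the Worker as console.log(logLine({...})) right before returning the response; structured lines beat free-form text once you're grepping across thousands of requests.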

Common Errors and Fixes

Error: "Script startup exceeded CPU time limit" — Your Worker is doing too much work at import time. Move initialization inside the handler function, not at the module level.

Error: "Exceeded resource limits" — CPU time exceeded 10ms. Cache more aggressively, or upgrade to the $5/month Workers paid plan for 30-second CPU time.

Claude returns 429 — You're hitting Anthropic's rate limits, which vary by your account's usage tier. Implement exponential backoff or upgrade your Anthropic tier. My solution: queue requests in KV with a lock and retry mechanism.
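
A minimal sketch of the exponential-backoff half of that fix (base delay, cap, and attempt count are arbitrary choices; the KV queue-and-lock part is left out):

```javascript
// Sketch: exponential backoff with a cap, for retrying 429 responses.
// attempt 0 -> 500ms, 1 -> 1000ms, 2 -> 2000ms, ... capped at 8000ms.
function backoffDelay(attempt, baseMs = 500, capMs = 8000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retry wrapper around fetch(); only retries on HTTP 429.
async function fetchWithRetry(url, init, maxAttempts = 4) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(url, init);
    if (res.status !== 429) return res;
    await new Promise(r => setTimeout(r, backoffDelay(attempt)));
  }
  throw new Error('Rate limited after retries');
}
```

In production you'd also add random jitter to the delay so that parallel Workers don't retry in lockstep.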

CORS errors in browser — Add the CORS headers shown in the full Worker code above. The OPTIONS preflight handler is required for browser clients.

Skip the Setup — Get Production-Ready Templates

The Zero-Cost AI Kit includes 9 pre-built Worker templates: AI API proxy, rate limiter, streaming handler, cron task runner, R2 file manager, and more. All configured, tested, and ready to deploy with your Claude API key. Skip the 6-hour setup and be live in an hour.

Get the Zero-Cost AI Kit — $47

Frequently Asked Questions

What is Cloudflare Workers and why use it for AI?
Cloudflare Workers is a serverless platform running JavaScript at the edge in 300+ locations. For AI apps, the key advantages are sub-5ms cold starts, 100,000 free requests/day, native R2/KV integration, and zero infrastructure management. You write a JavaScript function, deploy it, and it runs globally — no servers, no containers.
How do I call Claude API from Cloudflare Workers?
Use the standard fetch() API inside your Worker with the Anthropic API key stored as a Wrangler secret. Set headers: x-api-key to your key and anthropic-version to '2023-06-01'. Workers support async/await natively. Set your key securely with: wrangler secret put ANTHROPIC_API_KEY
What are the Cloudflare Workers free tier limits?
Free tier: 100,000 HTTP requests/day, 10ms CPU time per invocation, 128MB memory, KV reads (100K/day), KV writes (1K/day), R2 storage (10GB). For most solopreneur AI apps, these limits aren't hit. The $5/month paid tier extends CPU time to 30 seconds, which matters for longer AI pipelines.
How do Cloudflare Workers compare to AWS Lambda for AI?
For AI API proxying and edge logic: Workers win on cold start (5ms vs 500ms), cost at low volume ($0 vs ~$0.20/million requests), and simplicity (no IAM, VPCs, or CloudFormation). Lambda wins on max execution time (15 min vs 30 sec) and ecosystem maturity. For solopreneur AI apps making external API calls, Workers' limits rarely matter.

Related Reading

Zero-Cost AI Stack: Architecture Patterns for Solopreneurs — Three production patterns (Event-Driven Agent, Async Pipeline, Edge Cache Layer) built on the Workers foundation from this tutorial.

Building a $7/Month AI Business: Complete Infrastructure Guide — The full cost breakdown of running 9 Workers in production: $6.83/month total.

How I Replaced $300/Month SaaS with Free Tier Tools — Workers replaced Zapier, Pages replaced Vercel; the complete SaaS migration case study.