Why Your LLM Observability Breaks on Vercel (and How to Fix It)
DriftRail Team
Engineering
You've integrated an LLM observability tool into your Next.js app. Everything works perfectly in local development. You deploy to Vercel, and suddenly your dashboard shows half the events you expected. No errors. No warnings. Just... missing data.
Sound familiar? You've hit the serverless observability trap.
The Silent Failure
Most observability SDKs are designed for traditional server environments where the process stays alive indefinitely. They use fire-and-forget patterns to avoid blocking your application:
// This pattern works great in Express
app.post('/chat', async (req, res) => {
  const response = await callLLM(req.body.prompt);
  // Fire and forget - process stays alive, request completes
  observability.log({ input: req.body.prompt, output: response });
  res.json({ response });
});
The problem? Serverless functions terminate immediately after returning a response. That "fire and forget" HTTP request gets killed mid-flight.
The Race Condition
When your Vercel function returns a response, the runtime doesn't wait for background HTTP requests to complete. Your observability call starts, the function terminates, and the request is abandoned. No error is thrown because the function already returned successfully.
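To make the race concrete, here's a toy timeline. It's a sketch: a `setTimeout` stands in for the observability HTTP call, and the handler returns a boolean indicating whether the log had landed by the time it returned.

```typescript
// Toy timeline (sketch): a 50ms timer stands in for the observability
// HTTP call. At return time, the fire-and-forget log is still in flight.
const delivered: string[] = [];

function ingest(label: string): Promise<void> {
  return new Promise((resolve) =>
    setTimeout(() => { delivered.push(label); resolve(); }, 50)
  );
}

async function fireAndForgetHandler(): Promise<boolean> {
  ingest('ff-log'); // not awaited - abandoned if the runtime freezes now
  return delivered.includes('ff-log'); // has the log landed yet? No.
}

async function awaitedHandler(): Promise<boolean> {
  await ingest('awaited-log'); // only return once delivery is confirmed
  return delivered.includes('awaited-log');
}
```

`fireAndForgetHandler()` resolves `false`: the log is still in flight when the "response" goes out, which is exactly the moment a serverless runtime freezes the instance. `awaitedHandler()` resolves `true`. On a long-running server the abandoned timer still fires eventually, which is why this bug only surfaces after deploying.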
Why This Matters for LLM Apps
LLM observability isn't optional—it's how you catch hallucinations, detect prompt injection attempts, monitor costs, and maintain compliance. Missing 30-50% of your events means:
- Incomplete risk analysis and safety monitoring
- Inaccurate cost attribution and usage metrics
- Gaps in audit trails for compliance
- Blind spots in drift detection baselines
And because the failures are silent, you might not even realize there's a problem until an auditor asks for logs you don't have.
The Fix: Await Before Return
The solution is straightforward: ensure your observability call completes before the function terminates.
// ❌ BROKEN - function terminates before request completes
export async function POST(req: Request) {
  const { prompt } = await req.json();
  const response = await callLLM(prompt);
  client.ingestAsync({ // Fire-and-forget
    model: 'gpt-4o',
    input: { prompt },
    output: { text: response }
  });
  return Response.json({ response });
}
// ✅ CORRECT - await ensures completion
export async function POST(req: Request) {
const response = await callLLM(prompt);
await client.ingest({ // Awaited
model: 'gpt-4o',
input: { prompt },
output: { text: response }
});
return Response.json({ response });
}
Streaming Responses: The Extra Gotcha
Streaming LLM responses add another layer of complexity. You need to accumulate the full response, log it, and then close the stream:
export async function POST(req: Request) {
  const { prompt } = await req.json();
  // llmStream: async iterable of text chunks from your LLM provider
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      let fullResponse = '';
      for await (const chunk of llmStream) {
        fullResponse += chunk;
        controller.enqueue(encoder.encode(chunk));
      }
      // ✅ Log BEFORE closing
      await client.ingest({
        model: 'gpt-4o',
        input: { prompt },
        output: { text: fullResponse }
      });
      // Close AFTER logging
      controller.close();
    }
  });
  return new Response(stream);
}
If you close the stream first, the logging call may never execute—same race condition, different trigger.
Platform Compatibility
This applies to all serverless platforms:
| Platform | Fire-and-Forget | Awaited Calls |
|---|---|---|
| Vercel Serverless | ❌ Broken | ✅ Works |
| Vercel Edge | ❌ Broken | ✅ Works |
| Netlify Functions | ❌ Broken | ✅ Works |
| AWS Lambda | ❌ Broken | ✅ Works |
| Cloudflare Workers | ❌ Broken | ✅ Works |
| Express / Fastify | ✅ Works | ✅ Works |
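Two of these platforms do offer a sanctioned escape hatch: Vercel exposes `waitUntil()` via the `@vercel/functions` package, and Cloudflare Workers provide `ctx.waitUntil()`. Both register a promise that the runtime drains after the response is sent, so background work routed through them survives. A minimal mock of that contract (the `waitUntil` and `platformTeardown` below are stand-ins for illustration, not the real platform APIs):

```typescript
// Mock of the waitUntil contract (NOT the real platform API): the
// platform tracks registered promises and drains them after the response.
const pendingWork: Promise<unknown>[] = [];

function waitUntil(p: Promise<unknown>): void {
  pendingWork.push(p); // survives past the handler's return
}

async function handler(): Promise<string> {
  const log = new Promise<string>((resolve) =>
    setTimeout(() => resolve('log delivered'), 50)
  );
  waitUntil(log);    // registered, not awaited
  return 'response'; // response goes out immediately
}

async function platformTeardown(): Promise<void> {
  // The platform waits for registered work before freezing the instance.
  await Promise.allSettled(pendingWork);
}
```

Plain fire-and-forget calls that never pass through `waitUntil` still get dropped, which is why the table above marks them broken.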
"But Won't Awaiting Add Latency?"
Yes, but less than you'd think. A well-designed observability endpoint should respond in 50-150ms. For most LLM applications where the inference itself takes 500ms-3s, this is negligible.
If you're latency-sensitive, consider:
- Parallel execution: Start the observability call while streaming, await at the end
- Fail-open design: Set a timeout so observability failures don't block your response
- Background queues: For very high-throughput apps, queue events to a durable store and process async
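The fail-open idea can be sketched as a small wrapper (a hypothetical helper, not part of any SDK): race the ingest promise against a timeout and swallow errors, so observability can never block or crash the response path.

```typescript
// Fail-open wrapper (sketch): races an observability call against a
// timeout. Slow or failing ingests resolve to undefined instead of
// blocking or throwing, so the response path is never affected.
async function logFailOpen<T>(
  ingest: Promise<T>,
  timeoutMs: number
): Promise<T | undefined> {
  const timeout = new Promise<undefined>((resolve) =>
    setTimeout(() => resolve(undefined), timeoutMs)
  );
  try {
    // Whichever settles first wins the race.
    return await Promise.race([ingest, timeout]);
  } catch {
    return undefined; // fail open: drop the event, keep serving
  }
}
```

You still `await` the wrapper, so the function stays alive long enough to try, but a degraded observability backend costs you at most `timeoutMs` of latency.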
DriftRail's Approach
We built DriftRail with serverless as a first-class deployment target. Our SDK provides:
- ingest() - Awaitable call for serverless environments
- ingestAsync() - Fire-and-forget for long-running servers
- failOpen: true - Errors are logged but don't crash your app
- Configurable timeouts to prevent observability from blocking responses
import { DriftRail } from '@drift_rail/sdk';
const client = new DriftRail({
  apiKey: process.env.DRIFTRAIL_API_KEY,
  appId: 'my-app',
  timeout: 5000, // 5s max
  failOpen: true // Don't crash on errors
});

// In serverless: always await
await client.ingest({
  model: 'gpt-4o',
  provider: 'openai',
  input: { prompt },
  output: { text: response }
});
Key Takeaways
- Fire-and-forget observability silently fails in serverless environments
- Always await observability calls in Vercel, Netlify, Lambda, and Workers
- For streaming responses, log before closing the stream
- Use fail-open design to prevent observability from breaking your app
- The latency cost is minimal compared to LLM inference time
Serverless is the default deployment model for modern web applications. Your observability tooling should work with it, not against it. If you're building LLM-powered features on Vercel or Netlify, make sure your safety and monitoring infrastructure is designed for that environment.
For detailed implementation examples across all serverless platforms, see our Serverless Deployment Guide.