This is the implementation playbook. It is written for product teams who want to ship quickly without creating security or reliability debt. You will see the proxy-first pattern, queue-first flows, and the minimum needed to support streaming.
Step 1: Create an API key and treat it like a password
Your first job is not calling a model. Your first job is setting up the key correctly. Keys are the number one avoidable failure point in AI integrations because teams rush and put secrets in places they do not belong.
Store the key in an environment variable and load it only in server-side code. If you are using a hosting provider, use its secret store rather than hardcoding values in a repo.
Image placeholder: Screenshot of Fal.ai dashboard showing API key creation screen
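As a quick illustration, here is a minimal sketch of loading the key in server-only code, assuming a Node/TypeScript backend and a `FAL_KEY` environment variable populated from your host's secret store:

```ts
// server-only module: never import this from browser code
import { fal } from "@fal-ai/client";

// Fail fast if the secret is missing rather than sending unauthenticated requests
const falKey = process.env.FAL_KEY;
if (!falKey) {
  throw new Error("FAL_KEY is not set; configure it in your host's secret store");
}

fal.config({ credentials: falKey });
```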
Step 2: Use the proxy-first pattern (non-negotiable)
If your app has a browser or mobile client, do not let it call Fal.ai directly with your API key. Instead, add a proxy route inside your backend. Fal.ai provides a ready-made proxy integration for Next.js, and there is a clear proxy “formula” if you want to implement it yourself.
Next.js proxy example
This pattern keeps the key on the server. The client uses the SDK, but the SDK sends requests via your proxy. That gives you a secure integration without turning your backend into a complex gateway.
```bash
# install
npm install @fal-ai/client @fal-ai/server-proxy
```

```bash
# .env.local (server-only)
FAL_KEY="key_id:key_secret"
```

```ts
// App Router: src/app/api/fal/proxy/route.ts
import { route } from "@fal-ai/server-proxy/nextjs";

export const { GET, POST } = route;
```

```ts
// client config (any file running in the browser)
import { fal } from "@fal-ai/client";

fal.config({
  proxyUrl: "/api/fal/proxy",
});
```
Step 3: Choose an endpoint and validate the contract
Treat each endpoint like a contract. You need to know what inputs it expects, what it returns, and what failure modes look like. Do not ship a feature until you can answer these questions clearly.
- What is the minimum input needed for a good output?
- How long does a typical job take in seconds, and what is the 95th percentile?
- What errors do you get when prompts are invalid or assets cannot be fetched?
- What does the output look like, and how long will URLs remain valid?
Image placeholder: Screenshot of an endpoint API schema or input/output example
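One way to make those answers concrete is to codify them. The sketch below is illustrative only; the field names are assumptions, so replace them with the schema actually published for your endpoint:

```ts
// Illustrative contract for a hypothetical image endpoint; verify against the real schema
interface GenerateInput {
  prompt: string;       // the minimum input needed for a good output
  image_size?: string;  // optional knobs you have actually tested
}

interface GenerateOutput {
  images: { url: string; width: number; height: number }[];
}

// Narrow unknown API responses before the rest of the app touches them
function isGenerateOutput(value: unknown): value is GenerateOutput {
  const v = value as GenerateOutput;
  return Array.isArray(v?.images) && v.images.every((img) => typeof img?.url === "string");
}
```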
Step 4: Use “subscribe” for simple flows, and “queue.submit” for product flows
In early prototyping, it is normal to use a blocking call and wait for the result. In production, you usually want control. That means using the queue primitives directly.
Prototype flow (blocking)
```ts
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/fast-sdxl", {
  input: {
    prompt: "product photo, clean studio background, high detail"
  },
  logs: true,
  onQueueUpdate(update) {
    if (update.status === "IN_PROGRESS") {
      for (const log of update.logs ?? []) console.log(log.message);
    }
  }
});

console.log(result.data);
```
Production flow (async queue)
The pattern below is what I use in SaaS apps. You submit the job, store the request ID, and let the rest of your system handle completion. This avoids timeouts and lets you implement retries and audit trails.
// server-side: submit and store request_id against a user/job record
import { fal } from "@fal-ai/client";
const { request_id } = await fal.queue.submit("fal-ai/flux/dev", {
input: {
prompt: "hero image for a landing page, bold lighting, realistic"
},
webhookUrl: "https://yourdomain.com/api/fal/webhook"
});
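The snippet above leaves out the storage step. Continuing from it, here is a minimal sketch, assuming a hypothetical `saveJob` helper and job-record shape; adapt both to your own database layer:

```ts
// Hypothetical job record and persistence helper; names are illustrative
import { randomUUID } from "node:crypto";

interface JobRecord {
  id: string;            // your internal job ID
  userId: string;
  endpoint: string;
  falRequestId: string;  // the ID returned by fal.queue.submit
  status: "SUBMITTED" | "COMPLETED" | "FAILED";
  submittedAt: Date;
}

declare function saveJob(record: JobRecord): Promise<void>; // assumed to exist in your codebase

await saveJob({
  id: randomUUID(),
  userId: "user_123", // the authenticated user who requested the job
  endpoint: "fal-ai/flux/dev",
  falRequestId: request_id,
  status: "SUBMITTED",
  submittedAt: new Date(),
});
```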
Step 5: Handle completion with idempotency
Webhooks are not “fire once and done”. Networks fail. Your server restarts. Providers retry. Your webhook handler must be idempotent so it can safely process the same completion event multiple times.
A practical approach is to store a “processed” flag keyed by request ID, and ignore duplicates. Also validate the payload and match it to a known request ID.
Image placeholder: Diagram of webhook flow and idempotency check
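A minimal sketch of that approach, assuming a Next.js App Router handler and hypothetical database helpers; `markProcessed` is assumed to atomically record the request ID and return false if it was already seen, and the payload field names should be checked against the current webhook documentation:

```ts
// src/app/api/fal/webhook/route.ts (sketch; payload validation simplified)
import { NextResponse } from "next/server";

// Hypothetical helpers backed by your database
declare function markProcessed(requestId: string): Promise<boolean>; // false if already processed
declare function findJobByRequestId(requestId: string): Promise<{ id: string } | null>;
declare function completeJob(jobId: string, payload: unknown): Promise<void>;

export async function POST(req: Request) {
  const payload = await req.json();
  const requestId = payload?.request_id; // verify field names against the webhook docs

  // Reject payloads that do not match a request we actually submitted
  const job = requestId ? await findJobByRequestId(requestId) : null;
  if (!job) return NextResponse.json({ ok: false }, { status: 400 });

  // Idempotency: the same completion event may arrive more than once
  const firstTime = await markProcessed(requestId);
  if (!firstTime) return NextResponse.json({ ok: true, duplicate: true });

  await completeJob(job.id, payload);
  return NextResponse.json({ ok: true });
}
```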
Step 6: Add a polling backstop
Even if you use webhooks, you should be able to reconcile job state by polling status endpoints. This is your safety net. If a webhook was missed or blocked, you can still recover the result.
```
// check status by request ID (useful for reconciliation jobs)
GET https://queue.fal.run/<endpoint>/requests/<request_id>/status
```
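The same check is available through the client. A sketch of a scheduled reconciliation pass, assuming server-side credentials and the hypothetical job helpers from earlier:

```ts
// Run on a schedule (cron or queue worker) for jobs still marked SUBMITTED
import { fal } from "@fal-ai/client";

declare function listStuckJobs(): Promise<{ id: string; endpoint: string; falRequestId: string }[]>; // hypothetical
declare function completeJob(jobId: string, data: unknown): Promise<void>;                            // hypothetical

export async function reconcileJobs() {
  for (const job of await listStuckJobs()) {
    const status = await fal.queue.status(job.endpoint, { requestId: job.falRequestId });
    if (status.status === "COMPLETED") {
      const result = await fal.queue.result(job.endpoint, { requestId: job.falRequestId });
      await completeJob(job.id, result.data);
    }
  }
}
```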
Step 7: Add streaming only when it changes the user experience
Streaming is not a checkbox. It is a UX choice. Use it when users are waiting and the experience feels broken without progress. Otherwise, keep it simple and rely on async completion plus notifications.
If you do use real-time WebSockets, follow the security guidance and use short-lived tokens via a server-side proxy.
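If the answer is yes, the realtime client can ride on the same proxy configuration from Step 2, which keeps credentials and token exchange on the server. A browser-side sketch; the endpoint name is illustrative, so check the current client documentation for supported models and options:

```ts
// Browser-side sketch: credentials never leave the server thanks to the proxy from Step 2
import { fal } from "@fal-ai/client";

fal.config({ proxyUrl: "/api/fal/proxy" });

const connection = fal.realtime.connect("fal-ai/lcm", {
  onResult: (result) => {
    // Update the UI as intermediate results arrive
    console.log(result);
  },
  onError: (error) => {
    console.error(error);
  },
});

// Send an input whenever the user changes something worth re-rendering
connection.send({ prompt: "product photo, clean studio background" });
```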
Do and don’t table (this saves teams months)
| Do | Don’t |
|---|---|
| Use a server-side proxy for all client calls. | Put API keys in the browser or a mobile app. |
| Store request IDs and build a clear job state machine. | Block user requests waiting for video generation. |
| Design idempotent webhook handlers. | Assume webhooks will fire exactly once. |
| Add cost caps before marketing the feature. | Skip monitoring until the first big customer complains. |
| Log prompts and parameters for debugging and improvement. | Let users generate unlimited outputs without controls. |
Mini FAQ: implementation
What is the fastest path to a secure MVP?
Next.js proxy, one model endpoint, async queue.submit, and a simple webhook handler with idempotency.
What do I need to log for support and debugging?
Request IDs, endpoint name, user ID, key input parameters, timing, failures, and output URLs.
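As a sketch, that list maps to a record shaped roughly like this (field names are illustrative):

```ts
// Illustrative shape for a per-request audit log entry
interface GenerationLogEntry {
  requestId: string;                    // fal request ID
  endpoint: string;                     // e.g. "fal-ai/flux/dev"
  userId: string;
  inputParams: Record<string, unknown>; // prompt and key parameters (mind your data policies)
  submittedAt: Date;
  completedAt?: Date;
  status: "SUBMITTED" | "COMPLETED" | "FAILED";
  error?: string;
  outputUrls?: string[];                // remember that hosted URLs may expire
}
```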
How do I stop abuse?
Rate limits, per-user quotas, and requiring authentication before job submission. Also add cost-based limits per account.
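A minimal sketch of the quota gate, assuming an authenticated user and a hypothetical usage store; real deployments usually back this with Redis or the database and run it before every job submission:

```ts
// Hypothetical per-user quota check to run before fal.queue.submit
declare function getUsageToday(userId: string): Promise<{ jobs: number; estimatedCostUsd: number }>;

const DAILY_JOB_LIMIT = 50;   // assumption: tune per plan
const DAILY_COST_CAP_USD = 5; // assumption: tune per plan

export async function assertWithinQuota(userId: string) {
  const usage = await getUsageToday(userId);
  if (usage.jobs >= DAILY_JOB_LIMIT || usage.estimatedCostUsd >= DAILY_COST_CAP_USD) {
    throw new Error("Quota exceeded: ask the user to upgrade or wait for the daily reset");
  }
}
```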
Once it works, make it predictable
Getting it working is step one. Making it predictable is what turns it into revenue. Go to Pricing and cost control and put guardrails in before you scale usage.