# Replicate
> Run open-source models, image generation (Flux/SDXL), predictions API, webhooks, streaming, Node SDK
## Core Philosophy

Replicate runs open-source models in the cloud without infrastructure management. Use it for **image generation, audio, video, and specialized open-source LLMs** that are not available through major API providers. Think of it as a model marketplace — browse, run, and pay per second of compute. Build around the **predictions API** for async workflows and **webhooks** for production pipelines. Use streaming for LLM outputs. The Node SDK provides typed, promise-based access with built-in polling.
## Setup

Install the SDK (`npm install replicate`) and configure:
```typescript
import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN!,
});

// Run a model (simplest form — waits for completion)
const output = await replicate.run("meta/meta-llama-3-70b-instruct", {
  input: {
    prompt: "Explain the CAP theorem in simple terms.",
    max_tokens: 512,
    temperature: 0.7,
  },
});

console.log(output);
```
Environment variables:

```bash
REPLICATE_API_TOKEN=r8_...
```
## Key Techniques
### Image Generation with Flux

```typescript
// Flux Schnell — fast, high-quality image generation
const output = await replicate.run("black-forest-labs/flux-schnell", {
  input: {
    prompt: "A cozy coffee shop interior, warm lighting, watercolor style",
    num_outputs: 1,
    aspect_ratio: "16:9",
    output_format: "webp",
    output_quality: 90,
  },
});

// output is an array of URLs
const imageUrl = (output as string[])[0];
console.log("Image URL:", imageUrl);

// Flux Dev — higher quality, slower
const devOutput = await replicate.run("black-forest-labs/flux-dev", {
  input: {
    prompt: "Photorealistic portrait of a calico cat wearing a tiny top hat",
    guidance: 3.5,
    num_inference_steps: 28,
    output_format: "png",
  },
});
```
### SDXL Image Generation

```typescript
const sdxlOutput = await replicate.run(
  "stability-ai/sdxl:7762fd07cf82c948538e41f63f77d685e02b063e37e496e96eefd46c929f9bdc",
  {
    input: {
      prompt: "A futuristic cityscape at sunset, cyberpunk aesthetic",
      negative_prompt: "blurry, low quality, distorted",
      width: 1024,
      height: 1024,
      num_inference_steps: 30,
      guidance_scale: 7.5,
      scheduler: "K_EULER",
    },
  }
);
```
### Predictions API (Async Workflow)

```typescript
// Create a prediction without waiting
const prediction = await replicate.predictions.create({
  model: "black-forest-labs/flux-schnell",
  input: {
    prompt: "Mountain landscape at golden hour",
  },
});

console.log("Prediction ID:", prediction.id);
console.log("Status:", prediction.status); // "starting"

// Poll until the prediction reaches a terminal state
// ("canceled" is also terminal — without it the loop would never exit)
const terminal = ["succeeded", "failed", "canceled"];
let current = prediction;
while (!terminal.includes(current.status)) {
  await new Promise((r) => setTimeout(r, 2000));
  current = await replicate.predictions.get(prediction.id);
  console.log("Status:", current.status);
}

if (current.status === "succeeded") {
  console.log("Output:", current.output);
} else {
  console.error("Failed:", current.error);
}
```
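The fixed 2-second poll works for short jobs; for longer-running models an exponential backoff avoids hammering the API. A minimal sketch (the helper name and option names are illustrative, not part of the SDK):

```typescript
// Generic polling helper with exponential backoff and an overall timeout.
type Status = "starting" | "processing" | "succeeded" | "failed" | "canceled";

async function waitForTerminal(
  poll: () => Promise<Status>,
  { initialMs = 1000, maxMs = 8000, timeoutMs = 120_000 } = {}
): Promise<Status> {
  const deadline = Date.now() + timeoutMs;
  let delay = initialMs;
  for (;;) {
    const status = await poll();
    // Stop on any terminal state, including "canceled"
    if (status === "succeeded" || status === "failed" || status === "canceled") {
      return status;
    }
    if (Date.now() >= deadline) throw new Error("prediction polling timed out");
    await new Promise((r) => setTimeout(r, delay));
    delay = Math.min(delay * 2, maxMs); // 1s, 2s, 4s, 8s, 8s, ...
  }
}
```

Usage with the prediction above would be something like `await waitForTerminal(async () => (await replicate.predictions.get(prediction.id)).status as Status)`.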
### Webhooks for Production

```typescript
// Create prediction with webhook — no polling needed
const prediction = await replicate.predictions.create({
  model: "black-forest-labs/flux-dev",
  input: {
    prompt: "A serene mountain lake at dawn",
  },
  webhook: "https://your-api.com/webhooks/replicate",
  webhook_events_filter: ["completed"],
});
```

In your webhook handler (e.g., Express):

```typescript
import express from "express";

const app = express();
app.use(express.json());

app.post("/webhooks/replicate", async (req, res) => {
  const prediction = req.body;

  // Validate the webhook before trusting the payload
  // (Replicate sends signing headers, including "webhook-signature")
  const signature = req.headers["webhook-signature"];

  if (prediction.status === "succeeded") {
    const imageUrls = prediction.output;
    await saveGeneratedImages(prediction.id, imageUrls); // your storage logic
  } else if (prediction.status === "failed") {
    await handleFailure(prediction.id, prediction.error); // your error handling
  }

  res.sendStatus(200);
});
```
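The handler above reads the signature header but never checks it. Replicate documents a svix-style scheme: an HMAC-SHA256 over `{webhook-id}.{webhook-timestamp}.{raw body}`, keyed by your base64-decoded signing secret (`whsec_...`). Below is a sketch assuming that scheme; verification needs the exact raw request body (use `express.raw`, not a parsed body), and the Node SDK also ships a `validateWebhook` helper you may prefer. Check the current Replicate docs before relying on this.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify a Replicate webhook signature (svix-style scheme).
function verifyReplicateWebhook(
  secret: string,          // signing secret, e.g. "whsec_..."
  webhookId: string,       // "webhook-id" header
  timestamp: string,       // "webhook-timestamp" header
  rawBody: string,         // exact raw request body
  signatureHeader: string  // "webhook-signature" header, e.g. "v1,<base64>"
): boolean {
  // Secret is base64-encoded after the "whsec_" prefix
  const key = Buffer.from(secret.split("_")[1], "base64");
  const signedContent = `${webhookId}.${timestamp}.${rawBody}`;
  const expected = createHmac("sha256", key).update(signedContent).digest("base64");
  // Header may contain several space-separated "v1,<sig>" entries
  return signatureHeader.split(" ").some((entry) => {
    const sig = entry.split(",")[1] ?? "";
    const a = Buffer.from(sig);
    const b = Buffer.from(expected);
    return a.length === b.length && timingSafeEqual(a, b); // constant-time compare
  });
}
```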
### Streaming LLM Output

```typescript
// Stream text from an LLM
const stream = replicate.stream("meta/meta-llama-3-70b-instruct", {
  input: {
    prompt: "Write a detailed guide to making sourdough bread.",
    max_tokens: 1024,
    temperature: 0.7,
  },
});

for await (const event of stream) {
  process.stdout.write(event.data);
}
```
### Image-to-Image and Editing

```typescript
// Image upscaling
const upscaleOutput = await replicate.run(
  "nightmareai/real-esrgan:f121d640bd286e1fdc67f9799164c1d5be36ff74576ee11c803ae5b665dd46aa",
  {
    input: {
      image: "https://example.com/low-res-photo.jpg",
      scale: 4,
      face_enhance: true,
    },
  }
);

// Background removal
const removeBgOutput = await replicate.run(
  "cjwbw/rembg:fb8af171cfa1616ddcf1242c093f9c46bcada5ad4cf6f2fbe8b81b330ec5c003",
  {
    input: {
      image: "https://example.com/product-photo.jpg",
    },
  }
);
```
### Running Specific Model Versions

```typescript
// Pin to an exact version for reproducibility
const output = await replicate.run(
  "stability-ai/sdxl:7762fd07cf82c948538e41f63f77d685e02b063e37e496e96eefd46c929f9bdc",
  {
    input: {
      prompt: "A watercolor painting of a lighthouse",
    },
  }
);

// List available model versions
const model = await replicate.models.get("black-forest-labs", "flux-schnell");
console.log("Latest version:", model.latest_version?.id);

// List all versions
const versions = await replicate.models.versions.list("black-forest-labs", "flux-schnell");
for (const v of versions.results) {
  console.log(v.id, v.created_at);
}
```
### File Inputs

```typescript
import { readFileSync } from "fs";

// Pass a local file as input
const imageBuffer = readFileSync("./input.png");
const output = await replicate.run("some-model/version", {
  input: {
    image: imageBuffer, // SDK handles upload automatically
    prompt: "Describe this image",
  },
});

// Or use a data URI
const base64 = imageBuffer.toString("base64");
const dataUri = `data:image/png;base64,${base64}`;
const output2 = await replicate.run("some-model/version", {
  input: {
    image: dataUri,
  },
});
```
### Listing and Canceling Predictions

```typescript
// List recent predictions
const predictions = await replicate.predictions.list();
for (const p of predictions.results) {
  console.log(p.id, p.status, p.model, p.created_at);
}

// Cancel a running prediction (id from an earlier predictions.create call)
await replicate.predictions.cancel(prediction.id);
```
## Best Practices

- **Use `replicate.run()` for simple tasks** — it handles polling internally. Use `predictions.create()` only when you need webhooks or manual control.
- **Pin model versions** in production by using the full `owner/model:version` format for reproducibility.
- **Use webhooks in production** instead of polling — it is more efficient and scales better.
- **Set `webhook_events_filter`** to `["completed"]` to avoid receiving noisy intermediate events.
- **Use Flux Schnell for drafts, Flux Dev for finals** — Schnell is 10x faster but slightly lower quality.
- **Download and store generated images** — Replicate URLs are temporary and expire after about an hour.
- **Handle cold starts** — the first prediction on a model may take 10-30 seconds to boot; subsequent runs are faster.
- **Check model documentation** on replicate.com for input schemas — each model has unique parameters.
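Since output URLs expire after about an hour, download outputs promptly. A minimal sketch using Node 18+'s global `fetch`, with local disk standing in for S3/GCS (the helper name is illustrative):

```typescript
import { writeFile } from "node:fs/promises";

// Persist a generated output before its temporary Replicate URL expires.
// In production, upload the buffer to durable storage instead of local disk.
async function saveOutput(url: string, path: string): Promise<void> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Download failed: ${res.status}`);
  const buffer = Buffer.from(await res.arrayBuffer());
  await writeFile(path, buffer);
}
```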
## Anti-Patterns

- **Relying on Replicate output URLs for permanent storage** — URLs expire. Always download and store in your own storage (S3, GCS, etc.).
- **Polling in tight loops** without backoff. Use 1-2 second intervals minimum, or better yet, use webhooks.
- **Not handling `failed` status** — models can fail due to invalid inputs, GPU OOM, or timeouts. Always check prediction status.
- **Sending huge base64 strings inline** when the model accepts URLs — pass a URL to avoid request size limits and improve performance.
- **Ignoring model cold start times in UX** — show progress indicators. First runs can be slow; subsequent runs use warm hardware.
- **Running expensive models without cost estimates** — check model pricing on the Replicate website. GPU-heavy models (video, large image) can be costly at scale.
- **Not validating webhook signatures** — in production, always verify the `webhook-signature` header.
## Related Skills

- **Anthropic Claude API** — messages API, tool use, streaming, vision, system prompts, extended thinking, batches, Node SDK
- **Fireworks AI** — fast inference, function calling, grammar mode, JSON output, OpenAI-compatible API, fine-tuning
- **Google Gemini API** — generateContent, multimodal (images/video/audio), function calling, streaming, embeddings, context caching
- **Groq** — ultra-fast inference, OpenAI-compatible API, Llama/Mixtral models, tool use, JSON mode, streaming
- **OpenAI API** — chat completions, function calling/tools, streaming, embeddings, vision, JSON mode, assistants, Node SDK
- **Together AI** — inference API, open-source LLMs (Llama/Mistral), chat completions, embeddings, fine-tuning, JSON mode