# Replicate Image Generation
Replicate for image generation: Flux, SDXL, img2img, inpainting, upscaling, predictions API, webhooks, streaming
## Core Philosophy
Replicate provides a unified API for running open-source AI models in the cloud. For image generation, it hosts models like Flux, SDXL, and Stable Diffusion without requiring GPU infrastructure. The platform uses a predictions-based API where you submit a request, receive a prediction ID, and poll or use webhooks for results. Models are versioned and community-maintained, so you always reference a specific model version hash. The billing model is per-second of GPU time, making it cost-effective for variable workloads. Prefer the official Node SDK over raw HTTP calls for type safety and cleaner error handling.
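To make that submit/poll lifecycle concrete, here is a minimal sketch. The `PredictionsClient` interface and `predictAndPoll` helper are illustrative (a hand-written subset of the SDK's surface, not part of Replicate itself); in practice `replicate.run()` performs this loop for you.

```typescript
// Minimal shape of the predictions client we rely on (illustrative
// subset of the Replicate SDK).
interface PredictionsClient {
  predictions: {
    create(args: { model: string; input: Record<string, unknown> }): Promise<any>;
    get(id: string): Promise<any>;
  };
}

// Terminal states for a prediction.
function isTerminal(status: string): boolean {
  return ["succeeded", "failed", "canceled"].includes(status);
}

// Manual predict-and-poll loop: submit, get an ID, poll with a delay
// until the prediction reaches a terminal state.
async function predictAndPoll(
  client: PredictionsClient,
  prompt: string,
  pollMs = 1000
): Promise<unknown> {
  let prediction = await client.predictions.create({
    model: "black-forest-labs/flux-schnell",
    input: { prompt },
  });
  while (!isTerminal(prediction.status)) {
    // Never poll in a tight loop — always wait between requests.
    await new Promise((r) => setTimeout(r, pollMs));
    prediction = await client.predictions.get(prediction.id);
  }
  if (prediction.status !== "succeeded") {
    throw new Error(`Prediction ${prediction.status}: ${prediction.error}`);
  }
  return prediction.output;
}
```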
## Setup
Install the Replicate Node SDK and configure authentication:
```typescript
import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
```
Set `REPLICATE_API_TOKEN` from your Replicate dashboard. The SDK automatically handles polling, retries, and streaming where supported.
For projects with multiple model calls, create a shared client instance:
```typescript
// lib/replicate.ts
import Replicate from "replicate";

let client: Replicate | null = null;

export function getReplicateClient(): Replicate {
  if (!client) {
    if (!process.env.REPLICATE_API_TOKEN) {
      throw new Error("REPLICATE_API_TOKEN is required");
    }
    client = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });
  }
  return client;
}
```
## Key Techniques
### Text-to-Image with Flux
Flux is a high-quality image generation model available on Replicate in several variants:
```typescript
async function generateWithFlux(prompt: string, options?: {
  aspectRatio?: string;
  numOutputs?: number;
  outputFormat?: "webp" | "png" | "jpg";
}): Promise<string[]> {
  const replicate = getReplicateClient();
  const output = await replicate.run("black-forest-labs/flux-schnell", {
    input: {
      prompt,
      num_outputs: options?.numOutputs ?? 1,
      aspect_ratio: options?.aspectRatio ?? "1:1",
      output_format: options?.outputFormat ?? "webp",
      go_fast: true,
    },
  });
  return output as string[];
}
```
```typescript
// For higher quality with Flux Pro
async function generateWithFluxPro(prompt: string): Promise<string[]> {
  const replicate = getReplicateClient();
  const output = await replicate.run("black-forest-labs/flux-1.1-pro", {
    input: {
      prompt,
      width: 1024,
      height: 1024,
      prompt_upsampling: true,
      safety_tolerance: 2,
      output_format: "webp",
    },
  });
  return output as string[];
}
```
### SDXL Generation
```typescript
async function generateWithSDXL(prompt: string, negativePrompt?: string): Promise<string[]> {
  const replicate = getReplicateClient();
  const output = await replicate.run(
    "stability-ai/sdxl:7762fd07cf82c948538e41f63f77d685e02b063e37e496e96eefd46c929f9bdc",
    {
      input: {
        prompt,
        negative_prompt: negativePrompt ?? "low quality, blurry, distorted",
        width: 1024,
        height: 1024,
        num_outputs: 1,
        scheduler: "K_EULER",
        num_inference_steps: 30,
        guidance_scale: 7.5,
        refine: "expert_ensemble_refiner",
        high_noise_frac: 0.8,
      },
    }
  );
  return output as string[];
}
```
### Image-to-Image
```typescript
import fs from "fs";

async function img2img(
  imagePathOrUrl: string,
  prompt: string,
  strength: number = 0.75
): Promise<string[]> {
  const replicate = getReplicateClient();
  let imageInput: string;
  if (imagePathOrUrl.startsWith("http")) {
    imageInput = imagePathOrUrl;
  } else {
    // Inline base64 works for small local files; upload large images
    // to URL-accessible storage instead to avoid request size limits.
    const buffer = fs.readFileSync(imagePathOrUrl);
    const base64 = buffer.toString("base64");
    const mimeType = imagePathOrUrl.endsWith(".png") ? "image/png" : "image/jpeg";
    imageInput = `data:${mimeType};base64,${base64}`;
  }
  const output = await replicate.run(
    "stability-ai/sdxl:7762fd07cf82c948538e41f63f77d685e02b063e37e496e96eefd46c929f9bdc",
    {
      input: {
        image: imageInput,
        prompt,
        prompt_strength: strength,
        num_inference_steps: 30,
      },
    }
  );
  return output as string[];
}
```
### Inpainting
```typescript
async function inpaint(
  imageUrl: string,
  maskUrl: string,
  prompt: string
): Promise<string[]> {
  const replicate = getReplicateClient();
  const output = await replicate.run(
    "stability-ai/stable-diffusion-inpainting:95b7223104132402a9ae91cc677285bc5eb997f76ab45f93e1cbd4b4e08d6e29",
    {
      input: {
        image: imageUrl,
        mask: maskUrl,
        prompt,
        num_outputs: 1,
        guidance_scale: 7.5,
        num_inference_steps: 25,
      },
    }
  );
  return output as string[];
}
```
### Upscaling
```typescript
async function upscaleImage(imageUrl: string, scale: number = 4): Promise<string> {
  const replicate = getReplicateClient();
  const output = await replicate.run(
    "nightmareai/real-esrgan:f121d640bd286e1fdc67f9799164c1d5be36ff74576ee11c803ae5b665dd46aa",
    {
      input: {
        image: imageUrl,
        scale,
        face_enhance: false,
      },
    }
  );
  return output as string;
}
```
### Webhooks for Async Processing
```typescript
async function generateWithWebhook(prompt: string, webhookUrl: string): Promise<string> {
  const replicate = getReplicateClient();
  const prediction = await replicate.predictions.create({
    model: "black-forest-labs/flux-schnell",
    input: { prompt, num_outputs: 1 },
    webhook: webhookUrl,
    webhook_events_filter: ["completed"],
  });
  return prediction.id;
}

// Express webhook handler
import express from "express";

const app = express();

app.post("/webhooks/replicate", express.json(), (req, res) => {
  const prediction = req.body;
  if (prediction.status === "succeeded") {
    const imageUrls: string[] = prediction.output;
    // Process completed images
    console.log("Generated images:", imageUrls);
  } else if (prediction.status === "failed") {
    console.error("Generation failed:", prediction.error);
  }
  res.sendStatus(200);
});
```
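Webhook deliveries should be verified before trusting the payload. The Node SDK exposes a `validateWebhook` helper for this; the sketch below shows the underlying idea manually, assuming the standard-webhooks HMAC scheme and a signing secret from your account settings (`verifyReplicateWebhook` is a hypothetical helper, not part of the SDK).

```typescript
import crypto from "crypto";

// Verify a signed webhook: HMAC-SHA256 over `${id}.${timestamp}.${body}`,
// keyed by the base64-decoded portion of the secret after "whsec_".
function verifyReplicateWebhook(
  secret: string,          // signing secret, e.g. "whsec_..."
  webhookId: string,       // "webhook-id" header
  timestamp: string,       // "webhook-timestamp" header
  signatureHeader: string, // "webhook-signature" header, e.g. "v1,<base64>"
  rawBody: string          // raw (unparsed) request body
): boolean {
  const key = Buffer.from(secret.replace(/^whsec_/, ""), "base64");
  const signedContent = `${webhookId}.${timestamp}.${rawBody}`;
  const expected = crypto
    .createHmac("sha256", key)
    .update(signedContent)
    .digest("base64");
  // The header may carry several space-separated "v1,<sig>" entries.
  return signatureHeader
    .split(" ")
    .some((part) => part.split(",")[1] === expected);
}
```

Note that verification requires the raw body, so mount the verifier before (or instead of) `express.json()` body parsing for that route.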
### Streaming Output
```typescript
async function generateWithStreaming(prompt: string): Promise<void> {
  const replicate = getReplicateClient();
  const prediction = await replicate.predictions.create({
    model: "black-forest-labs/flux-schnell",
    input: { prompt },
    stream: true,
  });
  if (prediction.urls?.stream) {
    const response = await fetch(prediction.urls.stream, {
      headers: { Accept: "text/event-stream" },
    });
    const reader = response.body!.getReader();
    const decoder = new TextDecoder();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      const chunk = decoder.decode(value);
      // Process SSE events
      console.log("Stream chunk:", chunk);
    }
  }
}
```
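The raw chunks above are server-sent-event text. A minimal parser sketch (hand-rolled for illustration; the actual event names and payloads vary by model):

```typescript
interface SSEEvent {
  event: string;
  data: string;
}

// Parse raw SSE text into events. Event blocks are separated by a blank
// line; each line is "event: <name>" or "data: <payload>".
function parseSSEChunk(chunk: string): SSEEvent[] {
  const events: SSEEvent[] = [];
  for (const block of chunk.split("\n\n")) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).trimStart());
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

In the streaming loop above, you would call `parseSSEChunk(chunk)` instead of logging the raw text (buffering partial blocks across reads in a real implementation).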
## Best Practices
- **Pin model versions**: Always use explicit version hashes in production to avoid breaking changes when models update.
- **Use webhooks for long-running jobs**: Polling wastes compute and introduces latency. Set up webhook endpoints for batch workloads.
- **Cache predictions**: Store prediction IDs and results. If a user requests the same image twice, return the cached version.
- **Handle cold starts**: First prediction on a model may take longer. Use `replicate.models.get()` to check model status before generating.
- **Set timeouts**: Use `replicate.run()` with a reasonable timeout to avoid hanging on failed predictions.
- **Validate inputs**: Check prompt length and image dimensions before submitting. Replicate charges for failed predictions that consume GPU time.
- **Use `go_fast` for Flux Schnell**: This flag enables quantized inference for significantly faster generation with minimal quality loss.
- **Download and store outputs**: Replicate output URLs are temporary. Download and persist images to your own storage immediately.
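The download-and-store advice above can be sketched as a small persistence step. File paths and helper names here are illustrative; swap the local write for an S3/GCS upload in production.

```typescript
import fs from "fs/promises";
import path from "path";

// Build a stable local filename from a prediction ID and output index.
function outputFilename(predictionId: string, index: number, ext = "webp"): string {
  return `${predictionId}-${index}.${ext}`;
}

// Download each temporary output URL and write it to durable storage.
async function persistOutputs(
  predictionId: string,
  urls: string[],
  dir = "./generated"
): Promise<string[]> {
  await fs.mkdir(dir, { recursive: true });
  const saved: string[] = [];
  for (const [i, url] of urls.entries()) {
    const res = await fetch(url);
    if (!res.ok) throw new Error(`Download failed: ${res.status}`);
    const buffer = Buffer.from(await res.arrayBuffer());
    const file = path.join(dir, outputFilename(predictionId, i));
    await fs.writeFile(file, buffer);
    saved.push(file);
  }
  return saved;
}
```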
## Anti-Patterns
- **Polling in tight loops**: Never `while (true)` poll the predictions API without delays. The SDK handles polling automatically with `replicate.run()`.
- **Ignoring model versions**: Using `owner/model` without a version hash means your code can break when the model updates. Always pin versions for production.
- **Sending massive images as base64**: For img2img, upload large images to a URL-accessible location first. Base64-encoded images in the request body hit size limits and slow down requests.
- **Not handling rate limits**: Replicate returns 429 status codes. Implement exponential backoff rather than retrying immediately.
- **Hardcoding model IDs**: Store model references in configuration so you can swap models (e.g., from SDXL to Flux) without code changes.
- **Skipping error states**: Predictions can end in `failed` or `canceled` status. Always check `prediction.status` and handle `prediction.error`.
- **Running synchronous in serverless**: In Lambda or edge functions, use webhooks instead of `replicate.run()`, which blocks until completion.
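The rate-limit guidance above can be captured in a generic retry wrapper. `withBackoff` is an illustrative helper, not part of the Replicate SDK, and the error-shape check assumes a 429 surfaces as a `status` property on the thrown error.

```typescript
// Retry an async operation with exponential backoff plus jitter,
// retrying only on rate-limit (429) errors.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 1000
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const isRateLimit = err?.status === 429 || err?.response?.status === 429;
      if (!isRateLimit || attempt >= maxRetries) throw err;
      // 1s, 2s, 4s, ... plus up to 250ms of jitter
      const delay = baseDelayMs * 2 ** attempt + Math.random() * 250;
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}

// Usage: wrap any Replicate call
// const output = await withBackoff(() =>
//   replicate.run("black-forest-labs/flux-schnell", { input: { prompt } })
// );
```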