
# Replicate Image Generation

Replicate for image generation: Flux, SDXL, img2img, inpainting, upscaling, predictions API, webhooks, streaming



## Core Philosophy

Replicate provides a unified API for running open-source AI models in the cloud. For image generation, it hosts models like Flux, SDXL, and Stable Diffusion without requiring GPU infrastructure. The platform uses a predictions-based API where you submit a request, receive a prediction ID, and poll or use webhooks for results. Models are versioned and community-maintained, so you always reference a specific model version hash. The billing model is per-second of GPU time, making it cost-effective for variable workloads. Prefer the official Node SDK over raw HTTP calls for type safety and cleaner error handling.
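
Under the hood, `replicate.run()` wraps this create-then-poll lifecycle. A minimal sketch against the raw REST API (the endpoint paths are Replicate's predictions API; the one-second delay and error handling are illustrative):

```typescript
const API = "https://api.replicate.com/v1";

// A prediction is finished once it reaches one of these statuses.
function isTerminal(status: string): boolean {
  return status === "succeeded" || status === "failed" || status === "canceled";
}

async function createAndPoll(prompt: string): Promise<unknown> {
  const headers = {
    Authorization: `Bearer ${process.env.REPLICATE_API_TOKEN}`,
    "Content-Type": "application/json",
  };

  // 1. Submit: returns immediately with a prediction ID and status "starting".
  const createRes = await fetch(
    `${API}/models/black-forest-labs/flux-schnell/predictions`,
    { method: "POST", headers, body: JSON.stringify({ input: { prompt } }) }
  );
  let prediction = await createRes.json();

  // 2. Poll by ID with a delay until a terminal status is reached.
  while (!isTerminal(prediction.status)) {
    await new Promise((r) => setTimeout(r, 1000)); // never poll in a tight loop
    const pollRes = await fetch(`${API}/predictions/${prediction.id}`, { headers });
    prediction = await pollRes.json();
  }

  if (prediction.status !== "succeeded") {
    throw new Error(`Prediction ${prediction.status}: ${prediction.error}`);
  }
  return prediction.output; // temporary URL(s) — download and persist promptly
}
```

In practice, prefer `replicate.run()` or webhooks; this loop is what the SDK saves you from writing.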

## Setup

Install the Replicate Node SDK and configure authentication:

```typescript
import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
```

Set `REPLICATE_API_TOKEN` to an API token from your Replicate dashboard. The SDK automatically handles polling, retries, and streaming where supported.

For projects with multiple model calls, create a shared client instance:

```typescript
// lib/replicate.ts
import Replicate from "replicate";

let client: Replicate | null = null;

export function getReplicateClient(): Replicate {
  if (!client) {
    if (!process.env.REPLICATE_API_TOKEN) {
      throw new Error("REPLICATE_API_TOKEN is required");
    }
    client = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });
  }
  return client;
}
```

## Key Techniques

### Text-to-Image with Flux

Flux is a high-quality image generation model available on Replicate in several variants:

```typescript
async function generateWithFlux(prompt: string, options?: {
  aspectRatio?: string;
  numOutputs?: number;
  outputFormat?: "webp" | "png" | "jpg";
}): Promise<string[]> {
  const replicate = getReplicateClient();

  const output = await replicate.run("black-forest-labs/flux-schnell", {
    input: {
      prompt,
      num_outputs: options?.numOutputs ?? 1,
      aspect_ratio: options?.aspectRatio ?? "1:1",
      output_format: options?.outputFormat ?? "webp",
      go_fast: true, // quantized inference: faster with minimal quality loss
    },
  });

  return output as string[];
}

// For higher quality with Flux 1.1 Pro (outputs a single image)
async function generateWithFluxPro(prompt: string): Promise<string> {
  const replicate = getReplicateClient();

  const output = await replicate.run("black-forest-labs/flux-1.1-pro", {
    input: {
      prompt,
      width: 1024,
      height: 1024,
      prompt_upsampling: true,
      safety_tolerance: 2,
      output_format: "webp",
    },
  });

  return output as string;
}
```

### SDXL Generation

```typescript
async function generateWithSDXL(prompt: string, negativePrompt?: string): Promise<string[]> {
  const replicate = getReplicateClient();

  const output = await replicate.run(
    "stability-ai/sdxl:7762fd07cf82c948538e41f63f77d685e02b063e37e496e96eefd46c929f9bdc",
    {
      input: {
        prompt,
        negative_prompt: negativePrompt ?? "low quality, blurry, distorted",
        width: 1024,
        height: 1024,
        num_outputs: 1,
        scheduler: "K_EULER",
        num_inference_steps: 30,
        guidance_scale: 7.5,
        refine: "expert_ensemble_refiner",
        high_noise_frac: 0.8,
      },
    }
  );

  return output as string[];
}
```

### Image-to-Image

```typescript
import fs from "fs";

async function img2img(
  imagePathOrUrl: string,
  prompt: string,
  strength: number = 0.75
): Promise<string[]> {
  const replicate = getReplicateClient();

  let imageInput: string;
  if (imagePathOrUrl.startsWith("http")) {
    imageInput = imagePathOrUrl;
  } else {
    // Small local files can be inlined as a data URI; upload large images
    // to a URL-accessible location instead to avoid request size limits.
    const buffer = fs.readFileSync(imagePathOrUrl);
    const base64 = buffer.toString("base64");
    const mimeType = imagePathOrUrl.endsWith(".png") ? "image/png" : "image/jpeg";
    imageInput = `data:${mimeType};base64,${base64}`;
  }

  const output = await replicate.run(
    "stability-ai/sdxl:7762fd07cf82c948538e41f63f77d685e02b063e37e496e96eefd46c929f9bdc",
    {
      input: {
        image: imageInput,
        prompt,
        prompt_strength: strength,
        num_inference_steps: 30,
      },
    }
  );

  return output as string[];
}
```

### Inpainting

```typescript
async function inpaint(
  imageUrl: string,
  maskUrl: string,
  prompt: string
): Promise<string[]> {
  const replicate = getReplicateClient();

  const output = await replicate.run(
    "stability-ai/stable-diffusion-inpainting:95b7223104132402a9ae91cc677285bc5eb997f76ab45f93e1cbd4b4e08d6e29",
    {
      input: {
        image: imageUrl,
        mask: maskUrl,
        prompt,
        num_outputs: 1,
        guidance_scale: 7.5,
        num_inference_steps: 25,
      },
    }
  );

  return output as string[];
}
```

### Upscaling

```typescript
async function upscaleImage(imageUrl: string, scale: number = 4): Promise<string> {
  const replicate = getReplicateClient();

  const output = await replicate.run(
    "nightmareai/real-esrgan:f121d640bd286e1fdc67f9799164c1d5be36ff74576ee11c803ae5b665dd46aa",
    {
      input: {
        image: imageUrl,
        scale,
        face_enhance: false,
      },
    }
  );

  return output as string;
}
```

### Webhooks for Async Processing

```typescript
async function generateWithWebhook(prompt: string, webhookUrl: string): Promise<string> {
  const replicate = getReplicateClient();

  const prediction = await replicate.predictions.create({
    model: "black-forest-labs/flux-schnell",
    input: { prompt, num_outputs: 1 },
    webhook: webhookUrl,
    webhook_events_filter: ["completed"],
  });

  return prediction.id;
}

// Express webhook handler
import express from "express";

const app = express();
app.post("/webhooks/replicate", express.json(), (req, res) => {
  // In production, verify the webhook signature (see Replicate's webhook docs)
  // before trusting the payload.
  const prediction = req.body;

  if (prediction.status === "succeeded") {
    const imageUrls: string[] = prediction.output;
    // Process completed images
    console.log("Generated images:", imageUrls);
  } else if (prediction.status === "failed") {
    console.error("Generation failed:", prediction.error);
  }

  res.sendStatus(200);
});
```

### Streaming Output

```typescript
async function generateWithStreaming(prompt: string): Promise<void> {
  const replicate = getReplicateClient();

  const prediction = await replicate.predictions.create({
    model: "black-forest-labs/flux-schnell",
    input: { prompt },
    stream: true,
  });

  if (prediction.urls?.stream) {
    const response = await fetch(prediction.urls.stream, {
      headers: { Accept: "text/event-stream" },
    });

    const reader = response.body!.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      const chunk = decoder.decode(value);
      // Process SSE events
      console.log("Stream chunk:", chunk);
    }
  }
}
```

## Best Practices

- **Pin model versions**: Always use explicit version hashes in production to avoid breaking changes when models update.
- **Use webhooks for long-running jobs**: Polling wastes compute and introduces latency. Set up webhook endpoints for batch workloads.
- **Cache predictions**: Store prediction IDs and results. If a user requests the same image twice, return the cached version.
- **Handle cold starts**: The first prediction on a model may take longer. Use `replicate.models.get()` to check model status before generating.
- **Set timeouts**: Use `replicate.run()` with a reasonable timeout to avoid hanging on failed predictions.
- **Validate inputs**: Check prompt length and image dimensions before submitting. Replicate charges for failed predictions that consume GPU time.
- **Use `go_fast` for Flux Schnell**: This flag enables quantized inference for significantly faster generation with minimal quality loss.
- **Download and store outputs**: Replicate output URLs are temporary. Download and persist images to your own storage immediately.
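
To make the last point concrete, here is one way to persist outputs before the temporary URLs expire (a sketch: `localNameFor` and the directory layout are illustrative, and the global `fetch` assumes Node 18+):

```typescript
import fs from "node:fs/promises";
import path from "node:path";

// Derive a local filename from an output URL (pure, so it is easy to test).
function localNameFor(url: string, dir: string): string {
  const base = path.basename(new URL(url).pathname) || "output.webp";
  return path.join(dir, base);
}

// Download a temporary output URL and write it to local storage.
async function persistOutput(url: string, dir: string): Promise<string> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Download failed: ${res.status}`);
  const dest = localNameFor(url, dir);
  await fs.mkdir(dir, { recursive: true });
  await fs.writeFile(dest, Buffer.from(await res.arrayBuffer()));
  return dest; // store this path (or an S3 key, etc.) alongside the prediction ID
}
```

The same pattern works for uploading to S3 or similar object storage instead of the local filesystem.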

## Anti-Patterns

- **Polling in tight loops**: Never `while (true)` poll the predictions API without delays. The SDK handles polling automatically with `replicate.run()`.
- **Ignoring model versions**: Using `owner/model` without a version hash means your code can break when the model updates. Always pin versions for production.
- **Sending massive images as base64**: For img2img, upload large images to a URL-accessible location first. Base64-encoded images in the request body hit size limits and slow down requests.
- **Not handling rate limits**: Replicate returns 429 status codes. Implement exponential backoff rather than retrying immediately.
- **Hardcoding model IDs**: Store model references in configuration so you can swap models (e.g., from SDXL to Flux) without code changes.
- **Skipping error states**: Predictions can end in `failed` or `canceled` status. Always check `prediction.status` and handle `prediction.error`.
- **Running synchronously in serverless**: In Lambda or edge functions, use webhooks instead of `replicate.run()`, which blocks until completion.
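
For the rate-limit point, a retry wrapper along these lines works (a sketch: `backoffDelayMs` and `withRetries` are hypothetical helpers, and reading `err.response.status` assumes the error exposes the HTTP status the way the SDK's `ApiError` does):

```typescript
// Exponential backoff with jitter; base and cap values are illustrative.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return exp / 2 + Math.random() * (exp / 2); // jitter to avoid thundering herds
}

// Retry a call only on HTTP 429, waiting longer after each failure.
async function withRetries<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const status = err?.response?.status ?? err?.status;
      if (status !== 429 || attempt >= maxAttempts - 1) throw err;
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
}
```

Usage: `await withRetries(() => replicate.run(model, { input }))`.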
