# Replicate
> Run open-source models, image generation (Flux/SDXL), predictions API, webhooks, streaming, Node SDK
## Core Philosophy

Replicate runs open-source models in the cloud without infrastructure management. Use it for **image generation, audio, video, and specialized open-source LLMs** that are not available through major API providers. Think of it as a model marketplace — browse, run, and pay per second of compute. Build around the **predictions API** for async workflows and **webhooks** for production pipelines. Use streaming for LLM outputs. The Node SDK provides typed, promise-based access with built-in polling.
## Setup

Install the SDK (`npm install replicate`) and configure:
```typescript
import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN!,
});

// Run a model (simplest form — waits for completion)
const output = await replicate.run("meta/meta-llama-3-70b-instruct", {
  input: {
    prompt: "Explain the CAP theorem in simple terms.",
    max_tokens: 512,
    temperature: 0.7,
  },
});

console.log(output);
```
Environment variables:

```bash
REPLICATE_API_TOKEN=r8_...
```
## Key Techniques
### Image Generation with Flux

```typescript
// Flux Schnell — fast, high-quality image generation
const output = await replicate.run("black-forest-labs/flux-schnell", {
  input: {
    prompt: "A cozy coffee shop interior, warm lighting, watercolor style",
    num_outputs: 1,
    aspect_ratio: "16:9",
    output_format: "webp",
    output_quality: 90,
  },
});

// output is an array of URLs
const imageUrl = (output as string[])[0];
console.log("Image URL:", imageUrl);

// Flux Dev — higher quality, slower
const devOutput = await replicate.run("black-forest-labs/flux-dev", {
  input: {
    prompt: "Photorealistic portrait of a calico cat wearing a tiny top hat",
    guidance: 3.5,
    num_inference_steps: 28,
    output_format: "png",
  },
});
```
### SDXL Image Generation

```typescript
const sdxlOutput = await replicate.run(
  "stability-ai/sdxl:7762fd07cf82c948538e41f63f77d685e02b063e37e496e96eefd46c929f9bdc",
  {
    input: {
      prompt: "A futuristic cityscape at sunset, cyberpunk aesthetic",
      negative_prompt: "blurry, low quality, distorted",
      width: 1024,
      height: 1024,
      num_inference_steps: 30,
      guidance_scale: 7.5,
      scheduler: "K_EULER",
    },
  }
);
```
### Predictions API (Async Workflow)

```typescript
// Create a prediction without waiting
const prediction = await replicate.predictions.create({
  model: "black-forest-labs/flux-schnell",
  input: {
    prompt: "Mountain landscape at golden hour",
  },
});

console.log("Prediction ID:", prediction.id);
console.log("Status:", prediction.status); // "starting"

// Poll until the prediction reaches a terminal state
// ("canceled" is also terminal — without it the loop would never exit)
const terminal = ["succeeded", "failed", "canceled"];
let current = prediction;
while (!terminal.includes(current.status)) {
  await new Promise((r) => setTimeout(r, 2000));
  current = await replicate.predictions.get(prediction.id);
  console.log("Status:", current.status);
}

if (current.status === "succeeded") {
  console.log("Output:", current.output);
} else {
  console.error("Failed:", current.error);
}
```
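The fixed 2-second poll works for short jobs; for longer-running models an exponential backoff avoids hammering the API. A minimal sketch (the helper name and option names are illustrative, not part of the SDK):

```typescript
// Generic polling helper with exponential backoff and an overall timeout.
type Status = "starting" | "processing" | "succeeded" | "failed" | "canceled";

async function waitForTerminal(
  poll: () => Promise<Status>,
  { initialMs = 1000, maxMs = 8000, timeoutMs = 120_000 } = {}
): Promise<Status> {
  const deadline = Date.now() + timeoutMs;
  let delay = initialMs;
  for (;;) {
    const status = await poll();
    // Stop on any terminal state, including "canceled"
    if (status === "succeeded" || status === "failed" || status === "canceled") {
      return status;
    }
    if (Date.now() >= deadline) throw new Error("prediction polling timed out");
    await new Promise((r) => setTimeout(r, delay));
    delay = Math.min(delay * 2, maxMs); // 1s, 2s, 4s, 8s, 8s, ...
  }
}
```

Usage with the prediction above would be something like `await waitForTerminal(async () => (await replicate.predictions.get(prediction.id)).status as Status)`.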
### Webhooks for Production

```typescript
// Create prediction with webhook — no polling needed
const prediction = await replicate.predictions.create({
  model: "black-forest-labs/flux-dev",
  input: {
    prompt: "A serene mountain lake at dawn",
  },
  webhook: "https://your-api.com/webhooks/replicate",
  webhook_events_filter: ["completed"],
});
```

In your webhook handler (e.g., Express):

```typescript
import express from "express";

const app = express();
app.use(express.json());

app.post("/webhooks/replicate", async (req, res) => {
  const prediction = req.body;

  // Validate the webhook before trusting the payload
  // (Replicate sends signing headers, including "webhook-signature")
  const signature = req.headers["webhook-signature"];

  if (prediction.status === "succeeded") {
    const imageUrls = prediction.output;
    await saveGeneratedImages(prediction.id, imageUrls); // your storage logic
  } else if (prediction.status === "failed") {
    await handleFailure(prediction.id, prediction.error); // your error handling
  }

  res.sendStatus(200);
});
```
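The handler above reads the signature header but never checks it. Replicate documents a svix-style scheme: an HMAC-SHA256 over `{webhook-id}.{webhook-timestamp}.{raw body}`, keyed by your base64-decoded signing secret (`whsec_...`). Below is a sketch assuming that scheme; verification needs the exact raw request body (use `express.raw`, not a parsed body), and the Node SDK also ships a `validateWebhook` helper you may prefer. Check the current Replicate docs before relying on this.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify a Replicate webhook signature (svix-style scheme).
function verifyReplicateWebhook(
  secret: string,          // signing secret, e.g. "whsec_..."
  webhookId: string,       // "webhook-id" header
  timestamp: string,       // "webhook-timestamp" header
  rawBody: string,         // exact raw request body
  signatureHeader: string  // "webhook-signature" header, e.g. "v1,<base64>"
): boolean {
  // Secret is base64-encoded after the "whsec_" prefix
  const key = Buffer.from(secret.split("_")[1], "base64");
  const signedContent = `${webhookId}.${timestamp}.${rawBody}`;
  const expected = createHmac("sha256", key).update(signedContent).digest("base64");
  // Header may contain several space-separated "v1,<sig>" entries
  return signatureHeader.split(" ").some((entry) => {
    const sig = entry.split(",")[1] ?? "";
    const a = Buffer.from(sig);
    const b = Buffer.from(expected);
    return a.length === b.length && timingSafeEqual(a, b); // constant-time compare
  });
}
```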
### Streaming LLM Output

```typescript
// Stream text from an LLM
const stream = replicate.stream("meta/meta-llama-3-70b-instruct", {
  input: {
    prompt: "Write a detailed guide to making sourdough bread.",
    max_tokens: 1024,
    temperature: 0.7,
  },
});

for await (const event of stream) {
  process.stdout.write(event.data);
}
```
### Image-to-Image and Editing

```typescript
// Image upscaling
const upscaleOutput = await replicate.run(
  "nightmareai/real-esrgan:f121d640bd286e1fdc67f9799164c1d5be36ff74576ee11c803ae5b665dd46aa",
  {
    input: {
      image: "https://example.com/low-res-photo.jpg",
      scale: 4,
      face_enhance: true,
    },
  }
);

// Background removal
const removeBgOutput = await replicate.run(
  "cjwbw/rembg:fb8af171cfa1616ddcf1242c093f9c46bcada5ad4cf6f2fbe8b81b330ec5c003",
  {
    input: {
      image: "https://example.com/product-photo.jpg",
    },
  }
);
```
### Running Specific Model Versions

```typescript
// Pin to an exact version for reproducibility
const output = await replicate.run(
  "stability-ai/sdxl:7762fd07cf82c948538e41f63f77d685e02b063e37e496e96eefd46c929f9bdc",
  {
    input: {
      prompt: "A watercolor painting of a lighthouse",
    },
  }
);

// List available model versions
const model = await replicate.models.get("black-forest-labs", "flux-schnell");
console.log("Latest version:", model.latest_version?.id);

// List all versions
const versions = await replicate.models.versions.list("black-forest-labs", "flux-schnell");
for (const v of versions.results) {
  console.log(v.id, v.created_at);
}
```
### File Inputs

```typescript
import { readFileSync } from "fs";

// Pass a local file as input
const imageBuffer = readFileSync("./input.png");
const output = await replicate.run("some-model/version", {
  input: {
    image: imageBuffer, // SDK handles upload automatically
    prompt: "Describe this image",
  },
});

// Or use a data URI
const base64 = imageBuffer.toString("base64");
const dataUri = `data:image/png;base64,${base64}`;
const output2 = await replicate.run("some-model/version", {
  input: {
    image: dataUri,
  },
});
```
### Listing and Canceling Predictions

```typescript
// List recent predictions
const predictions = await replicate.predictions.list();
for (const p of predictions.results) {
  console.log(p.id, p.status, p.model, p.created_at);
}

// Cancel a running prediction (id from an earlier predictions.create call)
await replicate.predictions.cancel(prediction.id);
```
## Best Practices

- **Use `replicate.run()` for simple tasks** — it handles polling internally. Use `predictions.create()` only when you need webhooks or manual control.
- **Pin model versions** in production by using the full `owner/model:version` format for reproducibility.
- **Use webhooks in production** instead of polling — it is more efficient and scales better.
- **Set `webhook_events_filter`** to `["completed"]` to avoid receiving noisy intermediate events.
- **Use Flux Schnell for drafts, Flux Dev for finals** — Schnell is 10x faster but slightly lower quality.
- **Download and store generated images** — Replicate URLs are temporary and expire after about an hour.
- **Handle cold starts** — the first prediction on a model may take 10-30 seconds to boot; subsequent runs are faster.
- **Check model documentation** on replicate.com for input schemas — each model has unique parameters.
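Since output URLs expire after about an hour, download outputs promptly. A minimal sketch using Node 18+'s global `fetch`, with local disk standing in for S3/GCS (the helper name is illustrative):

```typescript
import { writeFile } from "node:fs/promises";

// Persist a generated output before its temporary Replicate URL expires.
// In production, upload the buffer to durable storage instead of local disk.
async function saveOutput(url: string, path: string): Promise<void> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Download failed: ${res.status}`);
  const buffer = Buffer.from(await res.arrayBuffer());
  await writeFile(path, buffer);
}
```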
## Anti-Patterns

- **Relying on Replicate output URLs for permanent storage** — URLs expire. Always download and store in your own storage (S3, GCS, etc.).
- **Polling in tight loops** without backoff. Use 1-2 second intervals minimum, or better yet, use webhooks.
- **Not handling `failed` status** — models can fail due to invalid inputs, GPU OOM, or timeouts. Always check prediction status.
- **Sending huge base64 strings inline** when the model accepts URLs — pass a URL to avoid request size limits and improve performance.
- **Ignoring model cold start times in UX** — show progress indicators. First runs can be slow; subsequent runs use warm hardware.
- **Running expensive models without cost estimates** — check model pricing on the Replicate website. GPU-heavy models (video, large image) can be costly at scale.
- **Not validating webhook signatures** — in production, always verify the `webhook-signature` header.
## Related Skills

- **Anthropic Claude API** — messages API, tool use, streaming, vision, system prompts, extended thinking, batches, Node SDK
- **Fireworks AI** — fast inference, function calling, grammar mode, JSON output, OpenAI-compatible API, fine-tuning
- **Google Gemini API** — generateContent, multimodal (images/video/audio), function calling, streaming, embeddings, context caching
- **Groq** — ultra-fast inference, OpenAI-compatible API, Llama/Mixtral models, tool use, JSON mode, streaming
- **OpenAI API** — chat completions, function calling/tools, streaming, embeddings, vision, JSON mode, assistants, Node SDK
- **Together AI** — inference API, open-source LLMs (Llama/Mistral), chat completions, embeddings, fine-tuning, JSON mode