Google Gemini API
"Google Gemini API: generateContent, multimodal (images/video/audio), function calling, streaming, embeddings, context caching"
Google Gemini API Skill
Core Philosophy
Gemini is Google's multimodal-first model family. Its core strength is native multimodal understanding — images, video, audio, and text in a single request. Build around generateContent as the primary method. Use the Google AI SDK for direct API access or the Vertex AI SDK for enterprise deployments. Leverage long context windows (up to 2M tokens on Gemini 1.5 Pro) for document-heavy workflows. Use context caching to reduce costs on repeated large-context calls.
Setup
Install the SDK and configure:
import { GoogleGenerativeAI } from "@google/generative-ai";
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({
model: "gemini-2.0-flash",
});
// Basic text generation
const result = await model.generateContent("Explain quantum entanglement simply.");
console.log(result.response.text());
Environment variables:
GEMINI_API_KEY=AIza...
Vertex AI Setup (Alternative)
import { VertexAI } from "@google-cloud/vertexai";
const vertexAI = new VertexAI({
project: process.env.GCP_PROJECT_ID!,
location: "us-central1",
});
const model = vertexAI.getGenerativeModel({
model: "gemini-2.0-flash",
});
Key Techniques
Multi-Turn Chat
const chat = model.startChat({
history: [
{ role: "user", parts: [{ text: "I'm building a REST API in Node." }] },
{
role: "model",
parts: [{ text: "I can help with that. What framework are you using?" }],
},
],
generationConfig: {
temperature: 0.7,
maxOutputTokens: 1024,
},
});
const result = await chat.sendMessage("I'm using Fastify. How do I add auth?");
console.log(result.response.text());
Streaming
const streamResult = await model.generateContentStream(
"Write a short story about a robot learning to paint."
);
for await (const chunk of streamResult.stream) {
const text = chunk.text();
process.stdout.write(text);
}
// Get the full aggregated response
const aggregated = await streamResult.response;
console.log("\n\nUsage:", aggregated.usageMetadata);
Image Understanding
import { readFileSync } from "fs";
const imageData = readFileSync("./diagram.png");
const result = await model.generateContent([
{
inlineData: {
mimeType: "image/png",
data: imageData.toString("base64"),
},
},
{ text: "Describe this architecture diagram. List each component and its connections." },
]);
console.log(result.response.text());
Video and Audio Analysis
import { GoogleAIFileManager } from "@google/generative-ai/server";
const fileManager = new GoogleAIFileManager(process.env.GEMINI_API_KEY!);
// Upload a video file
const uploadResult = await fileManager.uploadFile("./meeting.mp4", {
mimeType: "video/mp4",
displayName: "Team meeting recording",
});
// Wait for processing
let file = await fileManager.getFile(uploadResult.file.name);
while (file.state === "PROCESSING") {
await new Promise((r) => setTimeout(r, 5000));
file = await fileManager.getFile(uploadResult.file.name);
}
if (file.state === "FAILED") {
throw new Error("File processing failed");
}
const result = await model.generateContent([
{
fileData: {
mimeType: file.mimeType,
fileUri: file.uri,
},
},
{ text: "Summarize the key decisions made in this meeting." },
]);
console.log(result.response.text());
Function Calling
const modelWithTools = genAI.getGenerativeModel({
model: "gemini-2.0-flash",
tools: [
{
functionDeclarations: [
{
name: "searchProducts",
description: "Search product catalog by query",
parameters: {
type: "object",
properties: {
query: { type: "string", description: "Search query" },
category: { type: "string", enum: ["electronics", "clothing", "books"] },
maxPrice: { type: "number", description: "Maximum price filter" },
},
required: ["query"],
},
},
],
},
],
});
const chatWithTools = modelWithTools.startChat();
const response = await chatWithTools.sendMessage("Find me wireless headphones under $100");
const functionCall = response.response.functionCalls()?.[0];
if (functionCall) {
const products = await searchProducts(functionCall.args);
const followUp = await chatWithTools.sendMessage([
{
functionResponse: {
name: functionCall.name,
response: { results: products },
},
},
]);
console.log(followUp.response.text());
}
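The snippet above awaits a searchProducts helper that it leaves undefined. A minimal stub is sketched below — the name, argument shape, and catalog data are hypothetical; a real implementation would query your product database or search index:

```typescript
interface Product {
  name: string;
  price: number;
  category: string;
}

// Hypothetical stub for the searchProducts helper referenced above.
// Filters an in-memory catalog on the first query term, plus optional
// category and price constraints, mirroring the declared parameters.
async function searchProducts(args: {
  query: string;
  category?: string;
  maxPrice?: number;
}): Promise<Product[]> {
  const catalog: Product[] = [
    { name: "Wireless Headphones X", price: 79, category: "electronics" },
    { name: "Noise-Canceling Pro", price: 199, category: "electronics" },
    { name: "Paperback Thriller", price: 12, category: "books" },
  ];
  const term = args.query.toLowerCase().split(" ")[0];
  return catalog.filter(
    (p) =>
      p.name.toLowerCase().includes(term) &&
      (args.category === undefined || p.category === args.category) &&
      (args.maxPrice === undefined || p.price <= args.maxPrice)
  );
}
```

Whatever the real backing store, the function must return JSON-serializable data, since its result is sent back to the model in the functionResponse part.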
Embeddings
const embeddingModel = genAI.getGenerativeModel({
model: "text-embedding-004",
});
const result = await embeddingModel.embedContent("What is machine learning?");
const vector = result.embedding.values; // number[]
// Batch embeddings
const batchResult = await embeddingModel.batchEmbedContents({
requests: [
{ content: { role: "user", parts: [{ text: "First document" }] } },
{ content: { role: "user", parts: [{ text: "Second document" }] } },
{ content: { role: "user", parts: [{ text: "Third document" }] } },
],
});
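Embedding vectors are typically compared with cosine similarity for ranking and retrieval. The SDK does not ship a comparison helper, so a small one like this sketch is common:

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Returns a value in [-1, 1]; values near 1 mean semantically similar texts.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

To rank documents against a query, embed the query, compute similarity against each document vector from batchEmbedContents, and sort descending.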
Context Caching
import { GoogleAICacheManager } from "@google/generative-ai/server";
const cacheManager = new GoogleAICacheManager(process.env.GEMINI_API_KEY!);
const cache = await cacheManager.create({
model: "models/gemini-2.0-flash",
contents: [
{
role: "user",
parts: [{ text: largeDocumentText }], // e.g., an entire codebase or book
},
],
systemInstruction: {
role: "system",
parts: [{ text: "You are an expert analyst of this document." }],
},
ttlSeconds: 3600,
});
// Use the cache for multiple queries — much cheaper than resending context
const cachedModel = genAI.getGenerativeModelFromCachedContent(cache);
const answer1 = await cachedModel.generateContent("What are the main themes?");
const answer2 = await cachedModel.generateContent("List all mentioned people.");
JSON Output Mode
const jsonModel = genAI.getGenerativeModel({
model: "gemini-2.0-flash",
generationConfig: {
responseMimeType: "application/json",
responseSchema: {
type: "object",
properties: {
title: { type: "string" },
topics: { type: "array", items: { type: "string" } },
sentiment: { type: "string", enum: ["positive", "negative", "neutral"] },
},
required: ["title", "topics", "sentiment"],
},
},
});
const result = await jsonModel.generateContent("Analyze this article: ...");
const parsed = JSON.parse(result.response.text());
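responseSchema constrains what the model emits, but the parsed value is still untyped at runtime. A type guard like this sketch (the Analysis shape mirrors the schema above) protects against truncated or malformed output:

```typescript
interface Analysis {
  title: string;
  topics: string[];
  sentiment: "positive" | "negative" | "neutral";
}

// Runtime guard matching the responseSchema declared on the model.
// Returns false for null, missing fields, or wrongly typed values.
function isAnalysis(v: unknown): v is Analysis {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    typeof o.title === "string" &&
    Array.isArray(o.topics) &&
    o.topics.every((t) => typeof t === "string") &&
    typeof o.sentiment === "string" &&
    ["positive", "negative", "neutral"].includes(o.sentiment)
  );
}
```

Check isAnalysis(parsed) before using the result; if it fails, inspect finishReason — a MAX_TOKENS stop often means the JSON was cut off mid-object.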
Best Practices
- Use the File API for large media — upload videos and audio files rather than inlining base64 for anything over a few MB.
- Use context caching when you will send the same large context (>32K tokens) in multiple requests within an hour.
- Set responseSchema alongside responseMimeType: "application/json" for type-safe structured outputs.
- Use gemini-2.0-flash as the default for cost and speed; switch to Pro only for tasks requiring deeper reasoning.
- Check response.promptFeedback for safety blocks before accessing text content.
- Use safetySettings to adjust content filtering thresholds per category for your use case.
- Batch embedding requests — send multiple texts in one call to reduce latency and overhead.
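The safetySettings recommendation can be made concrete. A configuration sketch using the SDK's HarmCategory and HarmBlockThreshold enums — the specific thresholds shown are illustrative, not a recommended policy:

```typescript
import {
  GoogleGenerativeAI,
  HarmCategory,
  HarmBlockThreshold,
} from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

// Illustrative per-category thresholds — tune for your use case.
const model = genAI.getGenerativeModel({
  model: "gemini-2.0-flash",
  safetySettings: [
    {
      category: HarmCategory.HARM_CATEGORY_HARASSMENT,
      threshold: HarmBlockThreshold.BLOCK_ONLY_HIGH,
    },
    {
      category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
  ],
});
```

Even with relaxed thresholds, still check response.promptFeedback and each candidate's safetyRatings before trusting the text.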
Anti-Patterns
- Ignoring file processing state — uploaded files may be in PROCESSING state. Always poll until ACTIVE before using them.
- Sending large video as base64 inline — use the File API. Inline data has strict size limits and is inefficient.
- Not setting maxOutputTokens — Gemini defaults can vary by model. Always specify to control costs.
- Using context caching for single requests — caching has a creation cost; it only saves money when the cache is reused.
- Ignoring finishReason — check for "SAFETY", "MAX_TOKENS", "RECITATION" to handle edge cases.
- Creating new chat sessions for each message — reuse startChat sessions for multi-turn conversations.
- Not handling function call responses — if you declare tools, you must process function calls and return results.
Install this skill directly: skilldb add ai-llm-services-skills
Related Skills
Anthropic Claude API
"Anthropic Claude API: messages API, tool use, streaming, vision, system prompts, extended thinking, batches, Node SDK"
Fireworks AI
"Fireworks AI: fast inference, function calling, grammar mode, JSON output, OpenAI-compatible API, fine-tuning"
Groq
"Groq: ultra-fast inference, OpenAI-compatible API, Llama/Mixtral models, tool use, JSON mode, streaming"
OpenAI API
"OpenAI API: chat completions, function calling/tools, streaming, embeddings, vision, JSON mode, assistants, Node SDK"
Replicate
"Replicate: run open-source models, image generation (Flux/SDXL), predictions API, webhooks, streaming, Node SDK"
Together AI
"Together AI: inference API, open-source LLMs (Llama/Mistral), chat completions, embeddings, fine-tuning, JSON mode"