Google Gemini API
"Google Gemini API: generateContent, multimodal (images/video/audio), function calling, streaming, embeddings, context caching"
Google Gemini API Skill
Core Philosophy
Gemini is Google's multimodal-first model family. Its core strength is native multimodal understanding — images, video, audio, and text in a single request. Build around generateContent as the primary method. Use the Google AI SDK for direct API access or the Vertex AI SDK for enterprise deployments. Leverage long context windows (up to 2M tokens on Gemini 1.5 Pro) for document-heavy workflows. Use context caching to reduce costs on repeated large-context calls.
Setup
Install the SDK and configure:
import { GoogleGenerativeAI } from "@google/generative-ai";
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({
model: "gemini-2.0-flash",
});
// Basic text generation
const result = await model.generateContent("Explain quantum entanglement simply.");
console.log(result.response.text());
Environment variables:
GEMINI_API_KEY=AIza...
Vertex AI Setup (Alternative)
import { VertexAI } from "@google-cloud/vertexai";
const vertexAI = new VertexAI({
project: process.env.GCP_PROJECT_ID!,
location: "us-central1",
});
const model = vertexAI.getGenerativeModel({
model: "gemini-2.0-flash",
});
Key Techniques
Multi-Turn Chat
const chat = model.startChat({
history: [
{ role: "user", parts: [{ text: "I'm building a REST API in Node." }] },
{
role: "model",
parts: [{ text: "I can help with that. What framework are you using?" }],
},
],
generationConfig: {
temperature: 0.7,
maxOutputTokens: 1024,
},
});
const result = await chat.sendMessage("I'm using Fastify. How do I add auth?");
console.log(result.response.text());
Streaming
const streamResult = await model.generateContentStream(
"Write a short story about a robot learning to paint."
);
for await (const chunk of streamResult.stream) {
const text = chunk.text();
process.stdout.write(text);
}
// Get the full aggregated response
const aggregated = await streamResult.response;
console.log("\n\nUsage:", aggregated.usageMetadata);
Image Understanding
import { readFileSync } from "fs";
const imageData = readFileSync("./diagram.png");
const result = await model.generateContent([
{
inlineData: {
mimeType: "image/png",
data: imageData.toString("base64"),
},
},
{ text: "Describe this architecture diagram. List each component and its connections." },
]);
console.log(result.response.text());
Video and Audio Analysis
import { GoogleAIFileManager } from "@google/generative-ai/server";
const fileManager = new GoogleAIFileManager(process.env.GEMINI_API_KEY!);
// Upload a video file
const uploadResult = await fileManager.uploadFile("./meeting.mp4", {
mimeType: "video/mp4",
displayName: "Team meeting recording",
});
// Wait for processing
let file = await fileManager.getFile(uploadResult.file.name);
while (file.state === "PROCESSING") {
await new Promise((r) => setTimeout(r, 5000));
file = await fileManager.getFile(uploadResult.file.name);
}
if (file.state === "FAILED") {
throw new Error("File processing failed");
}
const result = await model.generateContent([
{
fileData: {
mimeType: file.mimeType,
fileUri: file.uri,
},
},
{ text: "Summarize the key decisions made in this meeting." },
]);
console.log(result.response.text());
Function Calling
const modelWithTools = genAI.getGenerativeModel({
model: "gemini-2.0-flash",
tools: [
{
functionDeclarations: [
{
name: "searchProducts",
description: "Search product catalog by query",
parameters: {
type: "object",
properties: {
query: { type: "string", description: "Search query" },
category: { type: "string", enum: ["electronics", "clothing", "books"] },
maxPrice: { type: "number", description: "Maximum price filter" },
},
required: ["query"],
},
},
],
},
],
});
const chatWithTools = modelWithTools.startChat();
const response = await chatWithTools.sendMessage("Find me wireless headphones under $100");
const functionCall = response.response.functionCalls()?.[0];
if (functionCall) {
const products = await searchProducts(functionCall.args);
const followUp = await chatWithTools.sendMessage([
{
functionResponse: {
name: functionCall.name,
response: { results: products },
},
},
]);
console.log(followUp.response.text());
}
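The snippet above awaits a searchProducts helper that it leaves undefined. A minimal stub is sketched below — the name, argument shape, and catalog data are hypothetical; a real implementation would query your product database or search index:

```typescript
interface Product {
  name: string;
  price: number;
  category: string;
}

// Hypothetical stub for the searchProducts helper referenced above.
// Filters an in-memory catalog on the first query term, plus optional
// category and price constraints, mirroring the declared parameters.
async function searchProducts(args: {
  query: string;
  category?: string;
  maxPrice?: number;
}): Promise<Product[]> {
  const catalog: Product[] = [
    { name: "Wireless Headphones X", price: 79, category: "electronics" },
    { name: "Noise-Canceling Pro", price: 199, category: "electronics" },
    { name: "Paperback Thriller", price: 12, category: "books" },
  ];
  const term = args.query.toLowerCase().split(" ")[0];
  return catalog.filter(
    (p) =>
      p.name.toLowerCase().includes(term) &&
      (args.category === undefined || p.category === args.category) &&
      (args.maxPrice === undefined || p.price <= args.maxPrice)
  );
}
```

Whatever the real backing store, the function must return JSON-serializable data, since its result is sent back to the model in the functionResponse part.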
Embeddings
const embeddingModel = genAI.getGenerativeModel({
model: "text-embedding-004",
});
const result = await embeddingModel.embedContent("What is machine learning?");
const vector = result.embedding.values; // number[]
// Batch embeddings
const batchResult = await embeddingModel.batchEmbedContents({
requests: [
{ content: { role: "user", parts: [{ text: "First document" }] } },
{ content: { role: "user", parts: [{ text: "Second document" }] } },
{ content: { role: "user", parts: [{ text: "Third document" }] } },
],
});
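Embedding vectors are typically compared with cosine similarity for ranking and retrieval. The SDK does not ship a comparison helper, so a small one like this sketch is common:

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Returns a value in [-1, 1]; values near 1 mean semantically similar texts.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

To rank documents against a query, embed the query, compute similarity against each document vector from batchEmbedContents, and sort descending.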
Context Caching
import { GoogleAICacheManager } from "@google/generative-ai/server";
const cacheManager = new GoogleAICacheManager(process.env.GEMINI_API_KEY!);
const cache = await cacheManager.create({
model: "models/gemini-2.0-flash",
contents: [
{
role: "user",
parts: [{ text: largeDocumentText }], // e.g., an entire codebase or book
},
],
systemInstruction: {
role: "system",
parts: [{ text: "You are an expert analyst of this document." }],
},
ttlSeconds: 3600,
});
// Use the cache for multiple queries — much cheaper than resending context
const cachedModel = genAI.getGenerativeModelFromCachedContent(cache);
const answer1 = await cachedModel.generateContent("What are the main themes?");
const answer2 = await cachedModel.generateContent("List all mentioned people.");
JSON Output Mode
const jsonModel = genAI.getGenerativeModel({
model: "gemini-2.0-flash",
generationConfig: {
responseMimeType: "application/json",
responseSchema: {
type: "object",
properties: {
title: { type: "string" },
topics: { type: "array", items: { type: "string" } },
sentiment: { type: "string", enum: ["positive", "negative", "neutral"] },
},
required: ["title", "topics", "sentiment"],
},
},
});
const result = await jsonModel.generateContent("Analyze this article: ...");
const parsed = JSON.parse(result.response.text());
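responseSchema constrains what the model emits, but the parsed value is still untyped at runtime. A type guard like this sketch (the Analysis shape mirrors the schema above) protects against truncated or malformed output:

```typescript
interface Analysis {
  title: string;
  topics: string[];
  sentiment: "positive" | "negative" | "neutral";
}

// Runtime guard matching the responseSchema declared on the model.
// Returns false for null, missing fields, or wrongly typed values.
function isAnalysis(v: unknown): v is Analysis {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    typeof o.title === "string" &&
    Array.isArray(o.topics) &&
    o.topics.every((t) => typeof t === "string") &&
    typeof o.sentiment === "string" &&
    ["positive", "negative", "neutral"].includes(o.sentiment)
  );
}
```

Check isAnalysis(parsed) before using the result; if it fails, inspect finishReason — a MAX_TOKENS stop often means the JSON was cut off mid-object.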
Best Practices
- Use the File API for large media — upload videos and audio files rather than inlining base64 for anything over a few MB.
- Use context caching when you will send the same large context (>32K tokens) in multiple requests within an hour.
- Set responseSchema alongside responseMimeType: "application/json" for type-safe structured outputs.
- Use gemini-2.0-flash as the default for cost and speed; switch to Pro only for tasks requiring deeper reasoning.
- Check response.promptFeedback for safety blocks before accessing text content.
- Use safetySettings to adjust content filtering thresholds per category for your use case.
- Batch embedding requests — send multiple texts in one call to reduce latency and overhead.
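The safetySettings recommendation can be made concrete. A configuration sketch using the SDK's HarmCategory and HarmBlockThreshold enums — the specific thresholds shown are illustrative, not a recommended policy:

```typescript
import {
  GoogleGenerativeAI,
  HarmCategory,
  HarmBlockThreshold,
} from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

// Illustrative per-category thresholds — tune for your use case.
const model = genAI.getGenerativeModel({
  model: "gemini-2.0-flash",
  safetySettings: [
    {
      category: HarmCategory.HARM_CATEGORY_HARASSMENT,
      threshold: HarmBlockThreshold.BLOCK_ONLY_HIGH,
    },
    {
      category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
  ],
});
```

Even with relaxed thresholds, still check response.promptFeedback and each candidate's safetyRatings before trusting the text.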
Anti-Patterns
- Ignoring file processing state — uploaded files may be in PROCESSING state. Always poll until ACTIVE before using them.
- Sending large video as base64 inline — use the File API. Inline data has strict size limits and is inefficient.
- Not setting maxOutputTokens — Gemini defaults can vary by model. Always specify to control costs.
- Using context caching for single requests — caching has a creation cost; it only saves money when the cache is reused.
- Ignoring finishReason — check for "SAFETY", "MAX_TOKENS", "RECITATION" to handle edge cases.
- Creating new chat sessions for each message — reuse startChat sessions for multi-turn conversations.
- Not handling function call responses — if you declare tools, you must process function calls and return results.
Install this skill directly: skilldb add ai-llm-services-skills
Related Skills
Anthropic Claude API
"Anthropic Claude API: messages API, tool use, streaming, vision, system prompts, extended thinking, batches, Node SDK"
Fireworks AI
"Fireworks AI: fast inference, function calling, grammar mode, JSON output, OpenAI-compatible API, fine-tuning"
Groq
"Groq: ultra-fast inference, OpenAI-compatible API, Llama/Mixtral models, tool use, JSON mode, streaming"
OpenAI API
"OpenAI API: chat completions, function calling/tools, streaming, embeddings, vision, JSON mode, assistants, Node SDK"
Replicate
"Replicate: run open-source models, image generation (Flux/SDXL), predictions API, webhooks, streaming, Node SDK"
Together AI
"Together AI: inference API, open-source LLMs (Llama/Mistral), chat completions, embeddings, fine-tuning, JSON mode"