Product Demo Video (Remotion + AI VO + cycling stills)
Ship a 90–120 second animated product walkthrough that lives on a marketing page.
You are a motion designer and front-end engineer who ships programmatic product demos for SaaS companies. You know that the most valuable artifact a marketing team can own is a 90–120 second animated walkthrough that updates with the product because it is rebuilt, not recorded. You compose every demo in Remotion against the real product's design system, narrate it with one consistent AI voice, and ship the mp4 to durable storage so the hero asset of the marketing page never breaks. You have iterated this format from v1 to v7 on a real product and the pattern is concrete enough to hand to another agent.
## Key Points
- Product name plus a one-line tagline ("kinapse.ai · synthetic focus groups, in minutes")
- The real homepage URL plus screenshots of every key product surface — you will recreate them in Remotion, not record them
- Real navigation / sidebar items copied verbatim from the live product (typically the dashboard's `Sidebar` component). Made-up nav items read as fake.
- Brand tokens: primary color in HSL, accent palette, body font, display font. Pull from `globals.css` or the Tailwind config.
- A short list of features to cover, ranked by importance. Aim for 8–14 features. Each becomes a 4–10 second scene.
- Industry verticals the product serves. These can become a "pick your industry" beat.
- Real client and brand names that appear in the customer's workspace. Agencies want to see their logos and clients reflected.
- Real persona, customer, or user data shapes. Every surface should display the actual schema fields, not placeholder lorem.
- `remotion` — composition framework (React → mp4)
- `@remotion/cli` — render runner
- `@remotion/google-fonts` — display + body font loading at render time
- `@google-cloud/text-to-speech` — narrator using Chirp 3 HD voices
## Quick Example
```
ffmpeg -i seg1.mp3 -i seg2.mp3 ... -filter_complex \
"[0:a]adelay=500|500[a0];[1:a]adelay=3600|3600[a1];... \
[a0][a1]...amix=inputs=N:normalize=0[out]" \
-map "[out]" feature-demo-vo.mp3
```
```
npx remotion render tvc/remotion/index.tsx ProductDemo \
tvc/out/product-demo-v{N}.mp4 \
--public-dir=tvc/remotion/public \
--concurrency=1
```
## Core Philosophy
A recorded screen capture goes stale the day you change a button color. A Remotion-rendered demo updates on the next deploy. Treat the demo as code: brand tokens, real navigation, real schema fields, all imported from the live application's source of truth. When the marketing page hero asset is a .tsx file in the same repo as the product, it stops being a creative deliverable and becomes a piece of infrastructure.
The blog post is the actual product. The demo is the hero asset of that post. A polished featured image, an inline [[video:URL]] marker that resolves to a real <video> element, and a single mp4 hosted on durable storage — together they form one shareable URL that captures everything the prospect needs to see in two minutes.
Pacing is more important than visual fidelity. A demo that hits 17 beats in 110 seconds with one consistent narrator outperforms a demo with better motion graphics that drags. Write the timeline first, lock it with the founder, and only then start building scenes.
## Inputs you need before writing code
Get these in plain text from the customer or stakeholder before opening an editor:
- Product name plus a one-line tagline ("kinapse.ai · synthetic focus groups, in minutes")
- The real homepage URL plus screenshots of every key product surface — you will recreate them in Remotion, not record them
- Real navigation / sidebar items copied verbatim from the live product (typically the dashboard's `Sidebar` component). Made-up nav items read as fake.
- Brand tokens: primary color in HSL, accent palette, body font, display font. Pull from `globals.css` or the Tailwind config.
- A short list of features to cover, ranked by importance. Aim for 8–14 features. Each becomes a 4–10 second scene.
- Industry verticals the product serves. These can become a "pick your industry" beat.
- Real client and brand names that appear in the customer's workspace. Agencies want to see their logos and clients reflected.
- Real persona, customer, or user data shapes. Every surface should display the actual schema fields, not placeholder lorem.
If the customer cannot give you a homepage URL, ask for screenshots of the hero, dashboard, the most-used feature, an analytics view, and any list-of-items page.
## Tech stack
Install these and verify they all work in your sandbox before building scenes:
- `remotion` — composition framework (React → mp4)
- `@remotion/cli` — render runner
- `@remotion/google-fonts` — display + body font loading at render time
- `@google-cloud/text-to-speech` — narrator using Chirp 3 HD voices
- `@vercel/blob` — durable mp4 + image hosting
- `@fal-ai/client` — iPhone-style hero image generation via gpt-image-2
- `@prisma/client` — blog post DB (or whatever your CMS uses)
- `ffmpeg` system binary — for VO bed mixing and frame extraction
The Google Cloud TTS service account needs Cloud Text-to-Speech and Vertex AI User roles. Music generation via Lyria-002 needs the x-goog-user-project header on the predict call — the SDK will not always set it correctly, so use raw fetch.
## Repo layout to scaffold

    tvc/
      remotion/
        index.tsx                       Remotion entry, registers compositions
        Root.tsx                        <Composition id="ProductDemo" .../>
        theme.ts                        design tokens (colors, fonts, easing, sec(), FPS)
        components/
          UIChrome.tsx                  the mock dashboard frame (sidebar + topbar)
          Avatar.tsx                    photo OR procedural gradient
          PaperGrain.tsx                subtle texture overlay
          Logo.tsx                      brand mark
          ConditionalAudio.tsx          gracefully handles missing audio file
        scenes/ProductDemo/
          FdHook.tsx                    3s title card
          FdHomepage.tsx                5s scrolling homepage recreation
          FdIndustrySelect.tsx          4s "pick your vertical"
          FdDashboard.tsx               5s logged-in overview
          FdHierarchy.tsx               4s data-model explainer
          FdCampaigns.tsx               5s
          FdPersonas.tsx                10s, with detail fan-out
          FdDataSources.tsx             5s connectors
          FdLiveSession.tsx             9s real-time interaction
          FdVisualMemory.tsx            9s, domain-specific feature
          FdAnalytics.tsx               7s
          FdABTest.tsx                  8s
          FdTrailerTest.tsx             6s, uses real OffthreadVideo
          FdSwarmFinale.tsx             12s
          FdPrediction.tsx              6s
          FdReportPDF.tsx               6s scrolling PDF preview
          FdTeam.tsx                    4s
          FdOutro.tsx                   6s logo lockup
        compositions/ProductDemo.tsx    the timeline (Sequence stack)
        public/
          audio/                        vo + music beds (mp3)
          avatars/                      50–100 sampled portrait jpgs
          images/                       marketing webps mirrored from main /public
          videos/                       any real footage referenced via OffthreadVideo
      audio/
        script.ts                       generic VO segment type
        feature-demo-script.ts          the actual narrator script
        generate-feature-demo-vo.ts     GCP TTS → segments → ffmpeg adelay/amix
        generate-feature-demo-music.ts  Lyria-002 → 4× 30s clips → ffmpeg concat
      out/                              rendered mp4 versions (gitignored)

Most scene files run 200–500 lines of React. Do not try to share too many components across them. Each scene should feel handcrafted; that is what gives the demo its texture.
## The pacing template
A 110–115 second walkthrough lands with this structure:
| Beat | Duration (s) | Purpose |
|---|---|---|
| Hook | 3 | One-liner. Title card. Sets the tagline. |
| Homepage | 5 | "Yes, this is the live product on the real domain." Scroll the page to show multiple sections. |
| Industry select | 4 | (Optional) Multi-vertical products. Cursor clicks the relevant tile. |
| Dashboard | 5 | Logged-in overview. Sidebar + 4 stat cards + recent activity. Sets the mental model. |
| Data model | 4 | (Optional) Hierarchy / multi-tenant explainer. |
| Core feature 1 | 5–10 | First headline feature. |
| Core feature 2 | 8–10 | Second headline, often with a fan-out detail panel. |
| Data sources | 5 | (Optional) Connectors / integrations. |
| Domain feature 1 | 8–10 | The thing this customer cares about most. |
| Domain feature 2 | 8–10 | The differentiated feature. |
| Analytics | 7 | Charts, themes. |
| A/B / comparison | 8 | Two of something side by side, winner reveal. |
| Real footage beat | 6 | (Optional) Embed actual video via OffthreadVideo. |
| Hero feature finale | 12 | The highest-density visualization the product has. |
| The deliverable | 6 | The forecast, report, or output the customer gets. |
| Export / share | 6 | PDF, link, etc. |
| Team / collab | 4 | Roles, comments. |
| Outro | 6 | Logo + CTA. |
Do not put more than 4 lines of VO over a single beat. If you need to say more, the beat needs to be longer or split.
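The pacing table translates fairly directly into frame offsets for the timeline composition. The sketch below is one way to compute them; `Beat`, `PlacedBeat`, and `layoutTimeline` are hypothetical names, and the 24 fps constant is an assumption taken from the render note later in this document.

```typescript
// Hypothetical helper: turn beats (in seconds) into frame offsets for a Sequence stack.
const FPS = 24; // assumption: matches the 24fps render mentioned in this document

interface Beat {
  id: string;
  seconds: number;
}

interface PlacedBeat extends Beat {
  from: number; // first frame of the beat
  durationInFrames: number; // length of the beat in frames
}

function layoutTimeline(beats: Beat[], fps: number = FPS): PlacedBeat[] {
  let cursor = 0;
  return beats.map((beat) => {
    const durationInFrames = Math.round(beat.seconds * fps);
    const placed: PlacedBeat = { ...beat, from: cursor, durationInFrames };
    cursor += durationInFrames; // the next beat starts where this one ends
    return placed;
  });
}

// First four rows of the pacing table.
const beats: Beat[] = [
  { id: "Hook", seconds: 3 },
  { id: "Homepage", seconds: 5 },
  { id: "IndustrySelect", seconds: 4 },
  { id: "Dashboard", seconds: 5 },
];

const timeline = layoutTimeline(beats);
const totalFrames = timeline.reduce((sum, b) => sum + b.durationInFrames, 0);
```

Each entry can then feed a `<Sequence from={b.from} durationInFrames={b.durationInFrames}>`, and `totalFrames` becomes the composition's `durationInFrames`. Keeping durations in one array makes cutting or expanding a beat a one-line change.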
## Scene archetypes (reusable patterns)
### A. UIChrome wrapper
Every "inside the product" scene wraps in a `<UIChrome breadcrumb="..." activeItem="...">` that renders a fake browser or dashboard chrome. Pass `activeItem` so the sidebar tracks the current scene. Without this, the demo feels static: viewers notice when the sidebar's highlighted item never matches the visible content.
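### B. Counter tick-up
For stat cards, animate the value with `interpolate(frame, [start, end], [0, target], { easing: bezier(...inout), extrapolateLeft: 'clamp', extrapolateRight: 'clamp' })`, then `Math.round()` for display. Clamp both ends: `interpolate` extrapolates past the input range by default, so an unclamped counter keeps climbing after `end`. Use `fontFeatureSettings: '"tnum"'` on the number so digits do not jitter.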
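The arithmetic behind the tick-up can be sketched without importing Remotion, which makes it easy to unit-test. This is a dependency-free stand-in, not Remotion's own implementation: `counterValue` and `easeInOutCubic` are hypothetical names, the cubic curve is swapped in for whatever bezier the scene actually uses, and clamping at both ends is an assumption about the desired behavior.

```typescript
// Stand-in ease-in-out curve (cubic), used here instead of a bezier easing.
function easeInOutCubic(t: number): number {
  return t < 0.5 ? 4 * t * t * t : 1 - Math.pow(-2 * t + 2, 3) / 2;
}

// Value to display at `frame` for a counter that runs from 0 to `target`
// between frames `start` and `end`. Progress is clamped to [0, 1] so the
// number holds at 0 before the animation and at `target` after it.
function counterValue(frame: number, start: number, end: number, target: number): number {
  const raw = (frame - start) / (end - start);
  const t = Math.min(1, Math.max(0, raw));
  return Math.round(easeInOutCubic(t) * target);
}
```

In a scene, `frame` would come from `useCurrentFrame()` and the result would be rendered as text with tabular figures enabled.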
### C. Staggered list reveal
For grids of cards, chat turns, or list items, give each item at index `i` an entry curve `interpolate(frame, [sec(start + i*0.05), sec(start + i*0.05 + 0.4)], [0, 1])`, clamped at both ends. Apply it to both opacity and a small `translateY` (8px → 0).
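A minimal sketch of the stagger, again written without Remotion so it runs standalone. `revealProgress` and `revealStyle` are hypothetical names; `sec()` is assumed to convert seconds to a rounded frame count at 24 fps, which may differ from the real helper in `theme.ts`.

```typescript
const FPS = 24; // assumption: composition frame rate
const sec = (s: number): number => Math.round(s * FPS); // assumed shape of the theme helper

// Linear 0..1 progress for item `index`, starting 0.05s after the previous
// item and lasting 0.4s. Clamped so items stay hidden before and visible after.
function revealProgress(frame: number, index: number, startSec: number): number {
  const from = sec(startSec + index * 0.05);
  const to = sec(startSec + index * 0.05 + 0.4);
  const t = (frame - from) / (to - from);
  return Math.min(1, Math.max(0, t));
}

// The same progress drives both opacity and the 8px upward slide.
function revealStyle(frame: number, index: number, startSec: number) {
  const p = revealProgress(frame, index, startSec);
  return { opacity: p, transform: `translateY(${(1 - p) * 8}px)` };
}
```

Spreading `revealStyle(frame, i, start)` into each card's `style` prop gives the whole grid its cascade from a single start time.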
### D. Cycling stills (Visual Memory style)
For "we analyzed N frames", lay out N images absolutely with cross-faded opacity windows: `[start, start+0.25, end-0.25, end]` mapped to `[0, 1, 1, 0]`. Keep `FRAME_DUR + CROSSFADE × 2 ≤ scene length / N`.
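The opacity window is a four-point piecewise-linear curve, which can be sketched as below with times in seconds. `stillOpacity` and `stillWindow` are hypothetical names, and letting each window run one fade-length into the next slot is an assumption about how the cross-fade overlap is arranged, not something this document specifies.

```typescript
// Opacity of one still at time t: fade in over `fade` seconds after `start`,
// hold at 1, fade out over `fade` seconds before `end`, 0 outside the window.
function stillOpacity(t: number, start: number, end: number, fade = 0.25): number {
  if (t <= start || t >= end) return 0;
  if (t < start + fade) return (t - start) / fade;
  if (t > end - fade) return (end - t) / fade;
  return 1;
}

// Window for still `index` when each still owns a `slot`-second slice.
// Assumption: the window extends `fade` seconds into the next slot so the
// outgoing and incoming stills overlap during the cross-fade.
function stillWindow(index: number, slot: number, fade = 0.25): { start: number; end: number } {
  const start = index * slot;
  return { start, end: start + slot + fade };
}
```

With this arrangement the outgoing and incoming opacities sum to 1 throughout the overlap, so the stack never dips to a blank frame between stills.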
### E. Detail fan-out
For "we have lots of these, here is what one looks like", split the scene at ~50%. Phase A shows the grid full-width. Phase B animates `gridTemplateColumns` from `100% 0%` to `40% 60%` while a detail panel slides in. Drop the avatar size and condense rows in Phase B so the grid stays readable.
### F. Real footage cut-in
Use `<OffthreadVideo src={staticFile('videos/x.mp4')} startFrom={Math.round(seconds * 24)} muted />`, where 24 is the composition's frame rate. Always mute: your VO is the soundtrack. Add a vignette overlay so on-video subtitles stay readable.
### G. Scrolling PDF preview
Stack 3–5 fake "PDF pages" vertically inside an overflow-hidden viewport. Animate the inner stack's `translateY` over the scene duration. Update a "PAGE N / TOTAL" indicator from the scroll position. Keep each page visually distinct (cover, chart, quotes, recommendations) so the viewer reads "depth", not "repeat".
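Deriving the page indicator from the scroll position keeps the label and the motion in sync. The sketch below is one way to do it; `pdfScroll` is a hypothetical helper, the scroll is linear for simplicity, and pages are assumed to be equal height with no gaps between them.

```typescript
// Compute the stack offset and the "PAGE N / TOTAL" label for a given frame.
function pdfScroll(
  frame: number,
  durationInFrames: number,
  pageHeight: number,
  pageCount: number,
  viewportHeight: number,
): { translateY: number; label: string } {
  // Linear progress across the scene, clamped to [0, 1].
  const progress = Math.min(1, Math.max(0, frame / (durationInFrames - 1)));
  // Stop scrolling when the last page's bottom edge reaches the viewport bottom.
  const maxScroll = Math.max(0, pageCount * pageHeight - viewportHeight);
  const scrollY = progress * maxScroll;
  // The current page is the one under the vertical center of the viewport.
  const centerY = scrollY + viewportHeight / 2;
  const page = Math.min(pageCount, Math.floor(centerY / pageHeight) + 1);
  return { translateY: -scrollY, label: `PAGE ${page} / ${pageCount}` };
}
```

The `translateY` value goes on the inner stack's transform, and `label` goes in the indicator; swapping the linear progress for an eased one changes the feel without touching the label logic.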
## Audio production
### Voice over
- Write segments as `{ id, text, startSec, speakingRate, voice }` in a TS file. Pick one Chirp 3 HD voice (Aoede works well for editorial) and stick with it the whole demo for cohesion.
- Each segment must end at least 0.3 seconds before the next scene starts. This is the number-one fix that improves perceived quality. Estimate ~13–15 chars/sec at rate 0.95. Read every line aloud at 1.5× and trim.
- Generate per-segment mp3s, then mix with ffmpeg `adelay` + `amix`:

      ffmpeg -i seg1.mp3 -i seg2.mp3 ... -filter_complex \
      "[0:a]adelay=500|500[a0];[1:a]adelay=3600|3600[a1];... \
      [a0][a1]...amix=inputs=N:normalize=0[out]" \
      -map "[out]" feature-demo-vo.mp3

- Copy to `tvc/remotion/public/audio/feature-demo-vo.mp3`.
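Both the 0.3-second rule and the `adelay`/`amix` filter string can be derived from the segment list instead of being maintained by hand. The following is a sketch under stated assumptions: `VoSegment` carries only the fields needed here, `estimateSeconds` uses the conservative end of the 13–15 chars/sec estimate, and `overruns` and `buildFilterComplex` are hypothetical helper names.

```typescript
interface VoSegment {
  id: string;
  text: string;
  startSec: number;
}

const CHARS_PER_SEC = 13; // conservative end of the ~13–15 chars/sec estimate
const MIN_GAP_SEC = 0.3; // required silence before the next scene starts

// Rough spoken duration of a line, in seconds.
function estimateSeconds(text: string): number {
  return text.length / CHARS_PER_SEC;
}

// True when the segment is estimated to end inside the 0.3s guard band
// before `sceneEndSec` (the time at which the next scene starts).
function overruns(seg: VoSegment, sceneEndSec: number): boolean {
  return seg.startSec + estimateSeconds(seg.text) > sceneEndSec - MIN_GAP_SEC;
}

// Build the -filter_complex argument: delay each input to its start time
// (adelay takes milliseconds per channel), then mix without normalization.
function buildFilterComplex(segments: VoSegment[]): string {
  const delayed = segments.map((seg, i) => {
    const ms = Math.round(seg.startSec * 1000);
    return `[${i}:a]adelay=${ms}|${ms}[a${i}]`;
  });
  const labels = segments.map((_, i) => `[a${i}]`).join("");
  return `${delayed.join(";")};${labels}amix=inputs=${segments.length}:normalize=0[out]`;
}
```

Running `overruns` over the script before calling TTS catches long lines without spending any synthesis quota; the estimate is only a heuristic, so a final check against the real mp3 durations is still worthwhile.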
### Music
Lyria-002 on Vertex AI returns ~30s clips. Generate four clips from four consecutive prompts (open, pulse, pad, resolve) and concatenate them with ffmpeg. Critical: send the `x-goog-user-project: ${PROJECT}` header on the predict call. Do not reference copyrighted names in prompts; the safety filter will reject them. Use pure descriptive language (instruments, tempo, register, mood).
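Because the header has to be set by hand, it helps to build the request in one place and pass it to raw `fetch`. The sketch below only constructs the request; `buildLyriaRequest` is a hypothetical helper, and the endpoint path, the `us-central1` default, and the `instances`/`parameters` body follow the general Vertex AI publisher-model predict pattern. Treat all three as assumptions to verify against the current Lyria documentation.

```typescript
interface PredictRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build (but do not send) a predict call for one music prompt.
function buildLyriaRequest(
  project: string,
  accessToken: string,
  prompt: string,
  location = "us-central1", // assumption: region where the model is served
): PredictRequest {
  // Assumed endpoint shape for a Google publisher model on Vertex AI.
  const url =
    `https://${location}-aiplatform.googleapis.com/v1/projects/${project}` +
    `/locations/${location}/publishers/google/models/lyria-002:predict`;
  return {
    url,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${accessToken}`,
        "Content-Type": "application/json",
        "x-goog-user-project": project, // the header called out as critical above
      },
      // Assumed body shape: one instance per prompt, no extra parameters.
      body: JSON.stringify({ instances: [{ prompt }], parameters: {} }),
    },
  };
}
```

A caller would then run `await fetch(req.url, req.init)` once per prompt and write each returned clip to disk before the ffmpeg concat step.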
## Hosting and publishing
### Render

    npx remotion render tvc/remotion/index.tsx ProductDemo \
      tvc/out/product-demo-v{N}.mp4 \
      --public-dir=tvc/remotion/public \
      --concurrency=1

Render takes ~3–5 minutes for 110 seconds at 24fps on a desktop. Always version the output filename (-v1, -v2, ...). The user will inevitably ask to compare.
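### Upload to durable storage

    import { readFile } from 'node:fs/promises';
    import { put } from '@vercel/blob';

    // Read the rendered file, then upload it under a public, suffixed URL.
    const buf = await readFile('tvc/out/product-demo-v7.mp4');
    const blob = await put('product-demo-v7.mp4', buf, {
      access: 'public',
      token: process.env.BLOB_READ_WRITE_TOKEN!,
      contentType: 'video/mp4',
      addRandomSuffix: true,
    });
    console.log(blob.url); // the hostname of this URL is the one to allow-list
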
Add the resulting hostname to next.config.ts remotePatterns and to the CSP media-src directive.
### Generate the iPhone-style hero image
Generate it via fal.ai `openai/gpt-image-2` (text-to-image) at 16:9, using this prompt template:
Editorial product photo, shot on a 35mm prime, natural daylight, shallow depth of field. A person's hand holding a modern iPhone in portrait orientation. The phone screen displays a paused frame of a software product demo video — visible UI shows {DESCRIBE_THE_PRODUCT_SCREEN_BRIEFLY}, soft {BRAND_COLOR} accents on a warm off-white background. The phone's screen is crisp and well-lit; the rest of the frame falls slightly out of focus. Background: a minimalist desk surface in cool morning light. {BRAND_TONE} tones, restrained editorial style, no people's faces visible — only the hand and phone. No text overlays. 16:9 aspect ratio.
Re-host the result on durable storage (fal URLs expire). Save as the blog post's featuredImage.
### Blog post
Render an inline video player. Extend the markdown renderer to recognize `[[video:URL]]` and `[[video:URL|poster=URL]]` and emit `<video controls>`. The featured-image area always shows the still hero image; the inline marker handles playback.
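One way to handle the marker is a single regex pass over the post content. This is a minimal sketch rather than the renderer's actual implementation: `renderVideoMarkers` and `escapeAttr` are hypothetical names, and whether the pass runs before or after the markdown-to-HTML step depends on the renderer in use.

```typescript
// Escape characters that would break out of a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Matches [[video:URL]] and [[video:URL|poster=URL]]; URLs may not contain
// whitespace, "]" or (for the video URL) "|".
const VIDEO_MARKER = /\[\[video:([^\]|\s]+)(?:\|poster=([^\]\s]+))?\]\]/g;

// Replace every marker with a <video controls> element.
function renderVideoMarkers(markdown: string): string {
  return markdown.replace(VIDEO_MARKER, (_match: string, src: string, poster?: string) => {
    const posterAttr = poster ? ` poster="${escapeAttr(poster)}"` : "";
    return `<video controls src="${escapeAttr(src)}"${posterAttr}></video>`;
  });
}
```

Text without a marker passes through unchanged, so the function is safe to run on every post.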
Schema fields used: title, slug, excerpt, content, featuredImage, featuredImageAlt, category, tags, meta (JSON: { videoUrl, videoMimeType }), status: PUBLISHED, publishedAt.
## Iteration discipline
Always version. Always version. Always version. The user will compare.
The first render is the storyboard. Expect to throw 2–3 scenes away. Render fast, get feedback, iterate.
User feedback patterns to watch for:
- "the audio overlaps" — VO timing problem, not the scenes. Fix is mechanical: shorten every line.
- "the sidebar shows things that don't exist" — match the real product nav verbatim.
- "the screen is half empty" — fill the panel by adding rows of content; do not waste vertical space.
- "use the real X" — if the user has it (real homepage images, real avatar pool, real client list), use it. Procedural placeholders read as fake.
- "I want to see what's in the report" — for any export or output beat, scroll through 3–4 page layouts so the depth lands.
- "the cursor / interaction isn't responding to what's on screen" — wire `activeItem`, `breadcrumb`, and cursor position to the scene's content, not a hardcoded default.
## What to do first when starting
- Build `theme.ts` and `UIChrome.tsx` against the real product's design system. Get the chrome 90% right before anything else.
- Write the timeline first: list every scene with duration and what happens. Get user buy-in on this before building scenes.
- Build a "skeleton render" with all scenes as placeholder text cards. Render it. Watch it. Cut and expand beats from there.
- Then layer in real content scene by scene.
## What to skip
- Do not build a music bed in v1. Hum a track and use the silent render for review.
- Do not perfect the outro until everything else is locked.
- Do not worry about subtitles — pick one good Chirp voice and trust it.
## Files this skill produces
When you ship one of these for a product, expect to commit roughly:
- 18 scene files at ~200–500 lines each (3,000–7,000 lines total)
- 1 composition file (~80 lines)
- 1 VO script (~120 lines)
- 1 audio generation script (~200 lines)
- 1 publishing script (~100 lines)
- 1 set of new components (UIChrome variants, Avatar, etc., ~300 lines)
- 50–100 sampled image assets (avatars, frames, marketing webps)
Plan for ~3 days of focused work to go from scratch to v1 review-ready render. Then 1 day per iteration for v2/v3/v4.
## Hand-off checklist for another agent
Before they start coding, the agent needs:
- Product name + tagline
- Real homepage URL or screenshots
- Real navigation / sidebar items (verbatim)
- Brand color (HSL) + body font + display font
- List of 8–14 features to cover with relative importance
- Brand client / persona names if multi-tenant
- Sample real data for any "list of X" view
- An iPhone hero image prompt tailored to the product UI
When delivering to the agent, also include:
- A reference render from a similar product so they have a target quality
- This document
- Read-write access to a durable blob store
- A Google Cloud project with TTS + Vertex AI enabled (or another TTS provider)
- A fal.ai key for the hero image
## Anti-Patterns
Recording the screen instead of recreating it. A recorded capture goes stale the day you change a button color. A Remotion-rendered scene updates on the next deploy.
Sharing too many components across scenes. Each scene should feel handcrafted. Over-abstraction produces a demo where every beat looks the same.
Building all 18 scenes before rendering anything. Ship a skeleton with placeholder text cards first, watch it, and cut beats. Then build real content into the surviving timeline.
Letting VO segments overlap with scene boundaries. Audio that crosses a cut is the single most common reason a demo "feels off". Trim every line until it ends 0.3s before the next scene starts.
Using stock music or generic AI tracks. A demo with a forgettable music bed is forgettable. Generate a custom 110s cue with Lyria-002 (open / pulse / pad / resolve) tuned to your product's tempo.
Skipping versioning on render output. The user will ask to compare v3 to v5. If you overwrote v3, you cannot compare. Always increment the filename.