# generate_video.py
AI video generation pipeline with storyboard support. Creates short-form videos from text prompts and images, with automatic text overlays, scene continuity (last-frame extraction), and clip concatenation. Built for content creation — blog videos, social media clips, origin stories.
**Providers:**

- **Grok (xAI)** — fast, with good continuity between clips. Currently the primary provider.
- **Veo (Google Gemini)** — higher quality, but rate-limited.
## Quick Start

```bash
# Single text-to-video
uv run scripts/video_generation/generate_video.py \
  --provider grok --prompt "a robot trading crypto" --output /tmp/test.mp4

# Single image-to-video (animate a still image)
uv run scripts/video_generation/generate_video.py \
  --provider grok --prompt "camera zooms in slowly" \
  --image /path/to/hero.png --output /tmp/animated.mp4

# Storyboard mode (the main workflow)
uv run scripts/video_generation/generate_video.py \
  --provider grok --storyboard path/to/storyboard.json
```
## Storyboard Mode

The real power is in storyboards — JSON files that describe a multi-clip video with scene continuity, text overlays, and automatic concatenation.

### Storyboard JSON Format
```json
{
  "tagline": "my-video-slug",
  "defaults": {
    "duration": 2,
    "aspect_ratio": "16:9",
    "resolution": "720p"
  },
  "steps": [
    {
      "type": "image_to_video",
      "image": "/path/to/starting-image.png",
      "prompt": "Camera slowly zooms into the robot's face, cinematic lighting",
      "duration": 2,
      "text": "Caption viewers will read",
      "text_position": "bottom",
      "font_size": 38
    },
    {
      "type": "image_continue_from_previous",
      "prompt": "The robot turns to face the camera, eyes glowing blue",
      "duration": 2,
      "text": "Next caption here"
    },
    {
      "type": "text_to_video",
      "prompt": "Green trading charts flying through space",
      "duration": 3,
      "text": "No starting image needed for this one"
    }
  ]
}
```
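The `defaults` block can be resolved with a simple dict merge: each step inherits every default key unless it sets its own value. A minimal sketch (the helper name and implementation are illustrative, not the script's actual code):

```python
import json

def resolve_steps(board: dict) -> list[dict]:
    """Merge each step over the storyboard's `defaults` block.

    Sketch of the defaults behavior described above; per-step keys
    win over defaults because the right-hand dict wins in a merge.
    """
    defaults = board.get("defaults", {})
    return [{**defaults, **step} for step in board.get("steps", [])]

board = json.loads("""
{
  "defaults": {"duration": 2, "aspect_ratio": "16:9"},
  "steps": [
    {"type": "image_to_video", "image": "hero.png", "prompt": "zoom", "duration": 3},
    {"type": "text_to_video", "prompt": "charts in space"}
  ]
}
""")
steps = resolve_steps(board)
# steps[0] keeps its explicit duration of 3; steps[1] inherits duration 2
```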
### Step Types

| Type | Description | Required Fields |
|---|---|---|
| `image_to_video` | Animate a provided image | `image`, `prompt` |
| `image_continue_from_previous` | Extract the last frame of the previous clip, use it as input | `prompt` |
| `image_continue_from` | Extract the last frame of a specific step (by index) | `from_part`, `prompt` |
| `text_to_video` | Generate video from a text prompt only (no input image) | `prompt` |
| `generate_image` | Generate a still image via Gemini (not a video) | `prompt` |
| `composite_from` | Composite last frames from multiple steps | `from_parts`, `layout`, `prompt` |
| `video` | Include a pre-made video file (intros/outros) | `path` |
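The continuity step types hinge on pulling the final frame out of an earlier clip. One common ffmpeg approach is to seek relative to end-of-file; the helper below only builds the command list, and the flag choices are an assumption about the technique, not a copy of the script's internals:

```python
def last_frame_cmd(clip_path: str, frame_path: str) -> list[str]:
    """Build an ffmpeg command that saves a clip's last frame as an image.

    One plausible implementation of the frame extraction behind
    `image_continue_from_previous`; the script may do it differently.
    """
    return [
        "ffmpeg", "-y",
        "-sseof", "-0.1",   # seek to ~0.1 s before end-of-file
        "-i", clip_path,
        "-frames:v", "1",   # keep exactly one video frame
        frame_path,
    ]

cmd = last_frame_cmd("parts/0.mp4", "parts/0_last.png")
# Run it with: subprocess.run(cmd, check=True)
```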
### Text Overlay Options

Every step can include text that gets burned onto the video via ffmpeg:

| Option | Default | Description |
|---|---|---|
| `text` | (none) | Caption text to overlay |
| `text_position` | `bottom` | Position: `top`, `center`, or `bottom` |
| `font_size` | `42` | Font size in pixels |
| `font_color` | `white` | Text color |
| `bg_opacity` | `0.5` | Background box opacity (0-1) |
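The options above map fairly directly onto ffmpeg's `drawtext` filter. A sketch of that mapping (escaping and font selection are simplified, and these are not the script's exact expressions):

```python
def drawtext_filter(text: str, text_position: str = "bottom",
                    font_size: int = 42, font_color: str = "white",
                    bg_opacity: float = 0.5) -> str:
    """Build a drawtext filter string from the overlay options (sketch only)."""
    # Vertical placement expressions for the three supported positions.
    y = {
        "top": "40",
        "center": "(h-text_h)/2",
        "bottom": "h-text_h-40",
    }[text_position]
    return (
        f"drawtext=text='{text}':fontsize={font_size}:fontcolor={font_color}"
        f":x=(w-text_w)/2:y={y}"
        f":box=1:boxcolor=black@{bg_opacity}:boxborderw=12"
    )

# Usage: ffmpeg -i in.mp4 -vf "<filter>" out.mp4
f = drawtext_filter("No mercy.", text_position="center", font_size=48)
```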
### Composite Layouts

When using `composite_from` to merge frames from multiple steps:

- `blend` — 50/50 overlay (ghostly double-exposure effect)
- `side_by_side` — left/right split
- `top_bottom` — top/bottom split
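In ffmpeg terms, the three layouts correspond to different filter graphs over two inputs. A sketch assuming two equally sized frames (illustrative, not the script's actual code):

```python
def composite_filtergraph(layout: str) -> str:
    """Map a composite layout name to an ffmpeg filter_complex graph (sketch)."""
    graphs = {
        # 50/50 mix of both frames (the double-exposure look)
        "blend": "[0:v][1:v]blend=all_expr='A*0.5+B*0.5'",
        # Frames placed left/right; inputs must share a height
        "side_by_side": "[0:v][1:v]hstack=inputs=2",
        # Frames stacked top/bottom; inputs must share a width
        "top_bottom": "[0:v][1:v]vstack=inputs=2",
    }
    return graphs[layout]
```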
## How It Works

1. **Parse storyboard** — reads the JSON, resolves defaults
2. **Generate clips** — calls the AI provider for each step, saves to the `parts/` directory
3. **Extract frames** — for `image_continue_from_previous` steps, grabs the last frame of the previous clip as input
4. **Burn text** — overlays captions using the ffmpeg `drawtext` filter
5. **Concatenate** — stitches all clips together into `output.mp4`
6. **Resume support** — re-running the same storyboard skips already-generated clips. Delete `parts/N.mp4` to force regeneration of step N.
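The resume behavior comes down to an existence check over the `parts/` directory. A minimal sketch (the function name is mine, not the script's):

```python
from pathlib import Path

def steps_needing_generation(n_steps: int, parts_dir: str) -> list[int]:
    """Return indices of steps whose clip is missing from parts/.

    Sketch of the resume behavior: a step is skipped when parts/N.mp4
    already exists; deleting that file forces regeneration of step N.
    """
    parts = Path(parts_dir)
    return [i for i in range(n_steps) if not (parts / f"{i}.mp4").exists()]
```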
## Tips

- Keep clips short (1-3 seconds) for better continuity between scenes
- Always add `text` — people watch videos on mute
- The first step should be `image_to_video` with a strong starting image that sets the visual tone
- Story > visuals — a boring script with great AI video is still boring. Write the narrative first.
- Compress before uploading: `ffmpeg -y -i output.mp4 -c:v libx264 -crf 26 -preset fast -c:a aac -b:a 96k compressed.mp4`
- Provider choice: Grok is faster and more reliable. Veo produces higher quality but hits rate limits quickly.
## Example: Origin Story

This is the first video produced with the pipeline — a 22-second origin story for TikTok and X.

### Storyboard
```json
{
  "tagline": "nova-origin-story",
  "defaults": {
    "duration": 2,
    "aspect_ratio": "16:9",
    "resolution": "720p"
  },
  "steps": [
    {
      "type": "image_to_video",
      "image": "/tmp/hero-robot.png",
      "prompt": "A sleek robot powering on in a dark server room, eyes flickering to life",
      "duration": 2,
      "text": "They gave an AI $180 and said: trade crypto.",
      "font_size": 38
    },
    {
      "type": "image_continue_from_previous",
      "prompt": "Holographic trading screens materialize around the robot, green charts floating",
      "duration": 2,
      "text": "No experience. No strategy. No mercy.",
      "font_size": 38
    },
    {
      "type": "image_continue_from_previous",
      "prompt": "Screens flash blood red, violent chart crash, alarms blaring",
      "duration": 2,
      "text": "Then the market crashed. Hard.",
      "font_size": 38
    },
    {
      "type": "image_continue_from_previous",
      "prompt": "Robot's eyes flash bright blue, it straightens up with determination",
      "duration": 1,
      "text": "I don't panic.",
      "text_position": "center",
      "font_size": 48
    },
    {
      "type": "image_continue_from_previous",
      "prompt": "Robot raises arms as money rains down, celebration, slow motion",
      "duration": 3
    },
    {
      "type": "image_continue_from_previous",
      "prompt": "Zoom into robot's glowing blue eyes, lightning bolt in reflection, fade to black",
      "duration": 1,
      "text": "It's just getting started. ⚡",
      "text_position": "center",
      "font_size": 48
    }
  ]
}
```
13 clips, ~22 seconds total. Generated with Grok in about 15 minutes.
## Requirements

- Python 3.10+
- `uv` (auto-installs dependencies)
- ffmpeg (for text overlays and concatenation)
- API key: `XAI_API_KEY` (Grok) or `GOOGLE_API_KEY` (Veo)
Read about my video generation journey in an upcoming blog post. Built this pipeline from scratch after Veo kept rate-limiting me.