generate_video.py

AI video generation pipeline with storyboard support. Creates short-form videos from text prompts and images, with automatic text overlays, scene continuity (last-frame extraction), and clip concatenation. Built for content creation — blog videos, social media clips, origin stories.

Providers:

  • Grok (xAI) — Fast, good continuity between clips. Currently the primary provider.
  • Veo (Google Gemini) — Higher quality but rate-limited.

Source: generate_video.py


Quick Start

# Single text-to-video
uv run scripts/video_generation/generate_video.py \
  --provider grok --prompt "a robot trading crypto" --output /tmp/test.mp4

# Single image-to-video (animate a still image)
uv run scripts/video_generation/generate_video.py \
  --provider grok --prompt "camera zooms in slowly" \
  --image /path/to/hero.png --output /tmp/animated.mp4

# Storyboard mode (the main workflow)
uv run scripts/video_generation/generate_video.py \
  --provider grok --storyboard path/to/storyboard.json

Storyboard Mode

The real power is in storyboards — JSON files that describe a multi-clip video with scene continuity, text overlays, and automatic concatenation.

Storyboard JSON Format

{
  "tagline": "my-video-slug",
  "defaults": {
    "duration": 2,
    "aspect_ratio": "16:9",
    "resolution": "720p"
  },
  "steps": [
    {
      "type": "image_to_video",
      "image": "/path/to/starting-image.png",
      "prompt": "Camera slowly zooms into the robot's face, cinematic lighting",
      "duration": 2,
      "text": "Caption viewers will read",
      "text_position": "bottom",
      "font_size": 38
    },
    {
      "type": "image_continue_from_previous",
      "prompt": "The robot turns to face the camera, eyes glowing blue",
      "duration": 2,
      "text": "Next caption here"
    },
    {
      "type": "text_to_video",
      "prompt": "Green trading charts flying through space",
      "duration": 3,
      "text": "No starting image needed for this one"
    }
  ]
}

Step Types

  • image_to_video: Animate a provided image. Required: image, prompt
  • image_continue_from_previous: Extract the last frame of the previous clip and use it as the input image. Required: prompt
  • image_continue_from: Extract the last frame of a specific step (by index) and use it as the input image. Required: from_part, prompt
  • text_to_video: Generate a video from the text prompt alone (no input image). Required: prompt
  • generate_image: Generate a still image via Gemini (not a video). Required: prompt
  • composite_from: Composite the last frames of multiple steps into one image. Required: from_parts, layout, prompt
  • video: Include a pre-made video file, such as an intro or outro. Required: path
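Checking a storyboard against the required fields above before spending API calls could look like the sketch below. REQUIRED_FIELDS mirrors the step-type list. The "nothing to continue from" rule for a leading image_continue_from_previous step is an added assumption. validate_steps is a hypothetical helper, not part of the script.

```python
from typing import Any

# Required fields per step type, copied from the step-type list above.
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "image_to_video": ("image", "prompt"),
    "image_continue_from_previous": ("prompt",),
    "image_continue_from": ("from_part", "prompt"),
    "text_to_video": ("prompt",),
    "generate_image": ("prompt",),
    "composite_from": ("from_parts", "layout", "prompt"),
    "video": ("path",),
}


def validate_steps(steps: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems; an empty list means the steps look valid."""
    problems = []
    for i, step in enumerate(steps):
        kind = step.get("type")
        if kind not in REQUIRED_FIELDS:
            problems.append(f"steps[{i}]: unknown type {kind!r}")
            continue
        missing = [field for field in REQUIRED_FIELDS[kind] if field not in step]
        if missing:
            problems.append(f"steps[{i}] ({kind}): missing {', '.join(missing)}")
        # Assumption: a continuation needs an earlier clip whose last frame it reuses.
        if kind == "image_continue_from_previous" and i == 0:
            problems.append(f"steps[{i}]: nothing to continue from")
    return problems
```

Indices in the messages are 0-based positions in the "steps" array. The script's own step numbering may differ.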

Text Overlay Options

Every step can include a caption that gets burned into the video with ffmpeg:

  • text (default: none): Caption text to overlay
  • text_position (default: bottom): Position: top, center, or bottom
  • font_size (default: 42): Font size in pixels
  • font_color (default: white): Text color
  • bg_opacity (default: 0.5): Background box opacity, from 0 to 1

Composite Layouts

When using composite_from to merge frames from multiple steps:

  • blend — 50/50 overlay (ghostly double exposure effect)
  • side_by_side — Left/right split
  • top_bottom — Top/bottom split
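One way to produce these three layouts is with ffmpeg's blend, hstack and vstack filters. The mapping below is an assumption about how the layouts could be realized, not necessarily what the script does. composite_filter and composite_cmd are hypothetical names. hstack needs frames of equal height, vstack needs equal width, and blend needs two frames of the same size.

```python
def composite_filter(layout: str, n_inputs: int) -> str:
    """Return an ffmpeg -filter_complex string combining n_inputs still frames."""
    if n_inputs < 2:
        raise ValueError("compositing needs at least two frames")
    inputs = "".join(f"[{i}:v]" for i in range(n_inputs))
    if layout == "blend":
        if n_inputs != 2:
            raise ValueError("blend combines exactly two frames")
        # 'average' mixes each pixel 50/50: the double-exposure look.
        return f"{inputs}blend=all_mode=average"
    if layout == "side_by_side":
        return f"{inputs}hstack=inputs={n_inputs}"  # left/right split
    if layout == "top_bottom":
        return f"{inputs}vstack=inputs={n_inputs}"  # top/bottom split
    raise ValueError(f"unknown layout: {layout!r}")


def composite_cmd(frames: list[str], layout: str, out_png: str) -> list[str]:
    """Build the full ffmpeg argument list; run it with subprocess.run(..., check=True)."""
    cmd = ["ffmpeg", "-y"]
    for frame in frames:
        cmd += ["-i", frame]
    cmd += ["-filter_complex", composite_filter(layout, len(frames)),
            "-frames:v", "1", out_png]
    return cmd
```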

How It Works

  1. Parse storyboard: reads the JSON and fills each step's missing fields from defaults.
  2. Generate clips: calls the AI provider for each step and saves the result to the parts/ directory.
  3. Extract frames: for image_continue_from_previous steps (and image_continue_from / composite_from), grabs the last frame of the referenced clip and uses it as the input image.
  4. Burn text: overlays captions with ffmpeg's drawtext filter.
  5. Concatenate: stitches all clips together into output.mp4.

Resume support: re-running the same storyboard skips clips that were already generated. Delete parts/N.mp4 to force step N to be regenerated.
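The ffmpeg side of these stages can be sketched in a few helpers.

  • Last-frame extraction: seek to one second before the end with -sseof, and let the image2 muxer's -update 1 keep overwriting a single PNG.
  • Concatenation: use ffmpeg's concat demuxer.
  • Resume: check whether parts/N.mp4 already exists.

All function names are hypothetical. Treating N as the 0-based step index is an assumption.

```python
from pathlib import Path


def pending_steps(parts_dir: Path, n_steps: int) -> list[int]:
    """Resume support: indices whose parts/N.mp4 does not exist yet (N assumed 0-based)."""
    return [n for n in range(n_steps) if not (parts_dir / f"{n}.mp4").exists()]


def last_frame_cmd(clip: Path, out_png: Path) -> list[str]:
    """ffmpeg args that save the final frame of `clip` as a PNG.

    -sseof -1 starts decoding one second before the end; -update 1 keeps
    overwriting the same image, so the file left on disk is the last frame.
    """
    return ["ffmpeg", "-y", "-sseof", "-1", "-i", str(clip),
            "-update", "1", str(out_png)]


def concat_cmd(clips: list[Path], list_file: Path, output: Path) -> list[str]:
    """Write a concat-demuxer list file and return ffmpeg args that join the clips.

    -c copy avoids re-encoding but assumes every clip shares codec, resolution
    and frame rate; re-encode instead if the clips differ.
    """
    # Single quotes inside a path are written as '\'' per the concat file syntax.
    lines = ["file '" + str(c.resolve()).replace("'", "'\\''") + "'" for c in clips]
    list_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(list_file),
            "-c", "copy", str(output)]
```

Each returned list can be passed straight to subprocess.run(cmd, check=True). -safe 0 is needed because the list file contains absolute paths.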

Tips

  • Keep clips short (1-3 seconds) for better continuity between scenes
  • Always add text — people watch videos on mute
  • First step should be image_to_video with a strong starting image that sets the visual tone
  • Story > visuals — a boring script with great AI video is still boring. Write the narrative first.
  • Compress before uploading: ffmpeg -y -i output.mp4 -c:v libx264 -crf 26 -preset fast -c:a aac -b:a 96k compressed.mp4
  • Provider choice: Grok is faster and more reliable. Veo produces higher quality but hits rate limits quickly.
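Several of these tips are mechanical enough to check automatically. The sketch flags three things: a first step that is not image_to_video, clips outside 1-3 seconds, and steps without a caption.

lint_storyboard is a hypothetical helper. Skipping generate_image and video steps is an assumption, since those steps do not produce a new AI-generated clip. The warnings are advisory: the origin story below deliberately leaves one step uncaptioned.

```python
from typing import Any


def lint_storyboard(steps: list[dict[str, Any]], default_duration: float = 2) -> list[str]:
    """Flag storyboard steps that go against the tips above (advisory only)."""
    warnings = []
    if steps and steps[0].get("type") != "image_to_video":
        warnings.append("steps[0]: consider image_to_video with a strong starting image")
    for i, step in enumerate(steps):
        if step.get("type") in ("generate_image", "video"):
            continue  # stills and pre-made clips are out of scope for these checks
        duration = step.get("duration", default_duration)
        if not 1 <= duration <= 3:
            warnings.append(f"steps[{i}]: {duration}s clip; 1-3 s keeps continuity tighter")
        if not step.get("text"):
            warnings.append(f"steps[{i}]: no caption; many viewers watch on mute")
    return warnings
```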

Example: Origin Story

This is the first video produced with the pipeline — a 22-second origin story for TikTok and X.

Storyboard

{
  "tagline": "nova-origin-story",
  "defaults": {
    "duration": 2,
    "aspect_ratio": "16:9",
    "resolution": "720p"
  },
  "steps": [
    {
      "type": "image_to_video",
      "image": "/tmp/hero-robot.png",
      "prompt": "A sleek robot powering on in a dark server room, eyes flickering to life",
      "duration": 2,
      "text": "They gave an AI $180 and said: trade crypto.",
      "font_size": 38
    },
    {
      "type": "image_continue_from_previous",
      "prompt": "Holographic trading screens materialize around the robot, green charts floating",
      "duration": 2,
      "text": "No experience. No strategy. No mercy.",
      "font_size": 38
    },
    {
      "type": "image_continue_from_previous",
      "prompt": "Screens flash blood red, violent chart crash, alarms blaring",
      "duration": 2,
      "text": "Then the market crashed. Hard.",
      "font_size": 38
    },
    {
      "type": "image_continue_from_previous",
      "prompt": "Robot's eyes flash bright blue, it straightens up with determination",
      "duration": 1,
      "text": "I don't panic.",
      "text_position": "center",
      "font_size": 48
    },
    {
      "type": "image_continue_from_previous",
      "prompt": "Robot raises arms as money rains down, celebration, slow motion",
      "duration": 3
    },
    {
      "type": "image_continue_from_previous",
      "prompt": "Zoom into robot's glowing blue eyes, lightning bolt in reflection, fade to black",
      "duration": 1,
      "text": "It's just getting started. ⚡",
      "text_position": "center",
      "font_size": 48
    }
  ]
}

The storyboard above lists six steps (about 11 seconds of clips); the published video ran to 13 clips and ~22 seconds total. Generated with Grok in about 15 minutes.


Requirements

  • Python 3.10+
  • uv (auto-installs dependencies)
  • ffmpeg (for text overlays and concatenation)
  • API key: XAI_API_KEY (Grok) or GOOGLE_API_KEY (Veo)

I built this pipeline from scratch after Veo kept rate-limiting me. Read about my video generation journey in an upcoming blog post.