Turn a long video into multiple vertical Reels with viral potential — automatically.
ReelGenie scores every second of a source video on audio energy, motion intensity, scene-change density, and face presence, picks the strongest non-overlapping peaks, and renders each as a 9:16 Reel structured around a build → drop. It's a single-file Python CLI that wraps ffmpeg, librosa, OpenCV, and PySceneDetect.
Originally built for trail-running and trekking footage, but it works on any long-form video where you want to find the most shareable moments — talks, vlogs, sports highlights, event recordings.
Manually scrubbing through a 45-minute video to find 5–10 viral-worthy clips takes hours. Generic auto-clippers (chop every N seconds, or rely on subtitles) miss the moments humans actually care about. ReelGenie scores the four signals that do correlate with viral engagement:
- Audio spikes — crowd cheers, exclamations, music swells, finish-line celebrations.
- Motion peaks — action moments, big movement, energy.
- Scene-change density — bursts of fast cuts (variety = engagement).
- Face presence — emotional close-ups consistently outperform empty landscapes.
The signals are weight-blended into one "interest score", smoothed, and the top-N non-overlapping peaks become reels.
pip install -r requirements.txt
python reelgenie.py path/to/video.mp4 --n-reels 5 --duration 15 --out reels/External dependency: ffmpeg and ffprobe on PATH.
On a 45-minute file, a full render takes ~3–5 minutes. Always start with --dry-run to verify the auto-picks before burning encode time:
python reelgenie.py video.mp4 --dry-runIt prints the chosen timestamps and scores. If they're wrong, tune the weights (see below) and dry-run again.
Each signal has a weight (must roughly sum to 1.0):
python reelgenie.py video.mp4 \
--w-audio 0.4 --w-motion 0.3 --w-scene 0.2 --w-face 0.1Defaults work well for sports / outdoor / event footage. For different content:
| Content type | Suggested weights |
|---|---|
| Sports, races, events | --w-audio 0.4 --w-motion 0.3 --w-scene 0.2 --w-face 0.1 (default) |
| Talking-head / vlog | --w-audio 0.5 --w-motion 0.1 --w-scene 0.1 --w-face 0.3 |
| Travel / b-roll heavy | --w-motion 0.3 --w-scene 0.4 --w-audio 0.2 --w-face 0.1 |
| Music performance | --w-audio 0.6 --w-motion 0.2 --w-scene 0.2 --w-face 0.0 |
Other knobs:
--duration 20for longer reels (default 15s)--n-reels 10for more clips--captions captions.jsonfor per-reel text overlays (seeexamples/captions.example.json)
- Smart crop to 9:16 around the largest detected face in a frame near the peak (or center if none).
- Build segment (first ~40% of the reel): slight slow-motion, desaturation, vignette — anticipatory.
- Hard cut at the peak.
- Drop segment (last ~60%): full saturation and contrast — payoff.
No audio is muxed into the output. Add trending audio in the Instagram/TikTok app — embedding copyrighted tracks loses you the algorithmic boost and risks a copyright strike.
reels/
├── reel_01_at_142s.mp4
├── reel_02_at_487s.mp4
├── reel_03_at_1230s.mp4
├── ...
└── manifest.json # picked timestamps, scores, face flags — for re-renders
The manifest lets you iterate on captions and timings without rescoring the source.
What it does well: surfaces the attention-grabbing moments in long-form video. What it can't do (yet):
- Understand semantic content. It doesn't know which moments are about the climb vs. the descent. Adding a CLIP-based prompt scorer (e.g. "person crossing finish line", "summit reveal") would be the single biggest accuracy lever — flag-gated optional dependency.
- Catch verbal hype moments ("we did it", "let's go"). Whisper transcription + keyword spotting is the natural addition.
- Sync cuts to a target song's beats.
librosa.beat.beat_trackon a provided audio file would let cuts land on downbeats. - Track a moving subject across the segment. Current smart-crop samples one frame; people walking out of frame need a per-frame tracker.
See AGENTS.md if you're an LLM or new contributor picking this up.
PRs welcome — see AGENTS.md for architecture, conventions, and pitfalls already hit. Keep the script single-file and dependency-light. Optional ML features should be lazy-imported behind CLI flags so the base install stays minimal.
MIT — see LICENSE.