Skip to content

thecodearrow/ReelGenie

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ReelGenie

Turn a long video into multiple vertical Reels with viral potential — automatically.

ReelGenie scores every second of a source video on audio energy, motion intensity, scene-change density, and face presence, picks the strongest non-overlapping peaks, and renders each as a 9:16 Reel structured around a build → drop. It's a single-file Python CLI that wraps ffmpeg, librosa, OpenCV, and PySceneDetect.

Originally built for trail-running and trekking footage, but it works on any long-form video where you want to find the most shareable moments — talks, vlogs, sports highlights, event recordings.

Why this exists

Manually scrubbing through a 45-minute video to find 5–10 viral-worthy clips takes hours. Generic auto-clippers (chop every N seconds, or rely on subtitles) miss the moments humans actually care about. ReelGenie scores the four signals that do correlate with viral engagement:

  • Audio spikes — crowd cheers, exclamations, music swells, finish-line celebrations.
  • Motion peaks — action moments, big movement, energy.
  • Scene-change density — bursts of fast cuts (variety = engagement).
  • Face presence — emotional close-ups consistently outperform empty landscapes.

The signals are weight-blended into one "interest score", smoothed, and the top-N non-overlapping peaks become reels.

Quick start

pip install -r requirements.txt
python reelgenie.py path/to/video.mp4 --n-reels 5 --duration 15 --out reels/

External dependency: ffmpeg and ffprobe on PATH.

Dry run first

On a 45-minute file, a full render takes ~3–5 minutes. Always start with --dry-run to verify the auto-picks before burning encode time:

python reelgenie.py video.mp4 --dry-run

It prints the chosen timestamps and scores. If they're wrong, tune the weights (see below) and dry-run again.

Tuning

Each signal has a weight (must roughly sum to 1.0):

python reelgenie.py video.mp4 \
  --w-audio 0.4 --w-motion 0.3 --w-scene 0.2 --w-face 0.1

Defaults work well for sports / outdoor / event footage. For different content:

Content type Suggested weights
Sports, races, events --w-audio 0.4 --w-motion 0.3 --w-scene 0.2 --w-face 0.1 (default)
Talking-head / vlog --w-audio 0.5 --w-motion 0.1 --w-scene 0.1 --w-face 0.3
Travel / b-roll heavy --w-motion 0.3 --w-scene 0.4 --w-audio 0.2 --w-face 0.1
Music performance --w-audio 0.6 --w-motion 0.2 --w-scene 0.2 --w-face 0.0

Other knobs:

  • --duration 20 for longer reels (default 15s)
  • --n-reels 10 for more clips
  • --captions captions.json for per-reel text overlays (see examples/captions.example.json)

How it composes each reel

  1. Smart crop to 9:16 around the largest detected face in a frame near the peak (or center if none).
  2. Build segment (first ~40% of the reel): slight slow-motion, desaturation, vignette — anticipatory.
  3. Hard cut at the peak.
  4. Drop segment (last ~60%): full saturation and contrast — payoff.

No audio is muxed into the output. Add trending audio in the Instagram/TikTok app — embedding copyrighted tracks loses you the algorithmic boost and risks a copyright strike.

Output

reels/
├── reel_01_at_142s.mp4
├── reel_02_at_487s.mp4
├── reel_03_at_1230s.mp4
├── ...
└── manifest.json    # picked timestamps, scores, face flags — for re-renders

The manifest lets you iterate on captions and timings without rescoring the source.

Limitations and roadmap

What it does well: surfaces the attention-grabbing moments in long-form video. What it can't do (yet):

  • Understand semantic content. It doesn't know which moments are about the climb vs. the descent. Adding a CLIP-based prompt scorer (e.g. "person crossing finish line", "summit reveal") would be the single biggest accuracy lever — flag-gated optional dependency.
  • Catch verbal hype moments ("we did it", "let's go"). Whisper transcription + keyword spotting is the natural addition.
  • Sync cuts to a target song's beats. librosa.beat.beat_track on a provided audio file would let cuts land on downbeats.
  • Track a moving subject across the segment. Current smart-crop samples one frame; people walking out of frame need a per-frame tracker.

See AGENTS.md if you're an LLM or new contributor picking this up.

Contributing

PRs welcome — see AGENTS.md for architecture, conventions, and pitfalls already hit. Keep the script single-file and dependency-light. Optional ML features should be lazy-imported behind CLI flags so the base install stays minimal.

License

MIT — see LICENSE.

About

Building a tool that turns long-form YouTube videos into short-form reels automatically. I had hours of unused running and travel footage because editing clips manually took too much time. This tool finds the best moments, creates vertical reels, adds captions/transitions, and generates ready-to-post content with minimal editing effort.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages