AI-powered tool to automatically generate engaging YouTube Shorts from long-form videos. Uses GPT-4o-mini and Whisper to extract highlights, add subtitles, and crop videos vertically for social media.
- 🎬 Flexible Input: Supports both YouTube URLs and local video files
- 🎤 GPU-Accelerated Transcription: CUDA-enabled Whisper for fast speech-to-text
- 🤖 AI Highlight Selection: GPT-4o-mini automatically finds the most engaging 2-minute segments
- ✅ Interactive Approval: Review and approve/regenerate selections with 15-second auto-approve timeout
- 📝 Auto Subtitles: Stylized captions with Franklin Gothic font burned into video
- 🎯 Smart Cropping:
  - Face videos: static face-centered crop (no jerky movement)
  - Screen recordings: half-width display with smooth motion tracking (1 shift/second max)
- 📱 Vertical Format: Perfect 9:16 aspect ratio for TikTok/YouTube Shorts/Instagram Reels
- ⚙️ Automation Ready: CLI arguments, auto-quality selection, timeout-based approvals
- 🔄 Concurrent Execution: Unique session IDs allow multiple instances to run simultaneously
- 📦 Clean Output: Slugified filenames (e.g., `my-video-title_short.mp4`) and automatic temp file cleanup
- Python 3.10+
- FFmpeg with development headers
- NVIDIA GPU with CUDA support (optional, but recommended for faster transcription)
- ImageMagick (for subtitle rendering)
- OpenAI API key
1. Clone the repository:

   ```bash
   git clone https://github.com/SamurAIGPT/AI-Youtube-Shorts-Generator.git
   cd AI-Youtube-Shorts-Generator
   ```

2. Install system dependencies:

   ```bash
   sudo apt install -y ffmpeg libavdevice-dev libavfilter-dev libopus-dev \
       libvpx-dev pkg-config libsrtp2-dev imagemagick
   ```

3. Fix the ImageMagick security policy (required for subtitle rendering):

   ```bash
   sudo sed -i 's/rights="none" pattern="@\*"/rights="read|write" pattern="@*"/' /etc/ImageMagick-6/policy.xml
   ```

4. Create and activate a virtual environment:

   ```bash
   python3.10 -m venv venv
   source venv/bin/activate
   ```

5. Install Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```

6. Set up environment variables by creating a `.env` file in the project root:

   ```
   OPENAI_API=your_openai_api_key_here
   ```
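At startup the pipeline reads the key from this variable. A minimal sketch of a loader, assuming the variable name `OPENAI_API` as shown above (the project may simply use python-dotenv; this hand-rolled reader is only illustrative):

```python
import os

def read_api_key(env_file=".env"):
    """Return OPENAI_API from the environment, falling back to a .env file."""
    key = os.getenv("OPENAI_API")
    if key:
        return key
    if os.path.exists(env_file):
        with open(env_file) as f:
            for line in f:
                line = line.strip()
                if line.startswith("OPENAI_API="):
                    return line.split("=", 1)[1]
    raise RuntimeError("OPENAI_API is not set; check your .env file")
```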
Interactive mode:

```bash
./run.sh
# Then enter the YouTube URL when prompted
# You'll be able to select the video resolution (5s timeout, auto-selects highest)
```

From a YouTube URL:

```bash
./run.sh "https://youtu.be/VIDEO_ID"
```

From a local file:

```bash
./run.sh "/path/to/your/video.mp4"
```

For batch processing, create a `urls.txt` file with one URL per line, then:

```bash
# Process all URLs sequentially with auto-approve
xargs -a urls.txt -I{} ./run.sh --auto-approve {}
```

Or without auto-approve (will prompt for each):

```bash
xargs -a urls.txt -I{} ./run.sh {}
```

When downloading from YouTube, you'll see:
```
Available video streams:
0. Resolution: 1080p, Size: 45.2 MB, Type: Adaptive
1. Resolution: 720p, Size: 28.1 MB, Type: Adaptive
2. Resolution: 480p, Size: 15.3 MB, Type: Adaptive

Select resolution number (0-2) or wait 5s for auto-select...
Auto-selecting highest quality in 5 seconds...
```
- Enter a number to select that resolution immediately
- Wait 5 seconds to auto-select highest quality (1080p)
- Invalid input falls back to highest quality
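The timed prompt above amounts to a read-with-timeout on stdin. A POSIX-only sketch using `select` (the function name is illustrative, not the project's code):

```python
import select
import sys

def choose_stream(n_streams, timeout=5.0, stdin=sys.stdin):
    """Return the chosen stream index, or 0 (highest quality) on timeout
    or invalid input. select() on stdin works on POSIX, not Windows."""
    ready, _, _ = select.select([stdin], [], [], timeout)
    if ready:
        raw = stdin.readline().strip()
        if raw.isdigit() and int(raw) < n_streams:
            return int(raw)
    return 0  # stream 0 is listed first, i.e. the highest resolution
```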
- Download/Load: Fetches from YouTube or loads local file
- Resolution Selection: Choose video quality (5s timeout, auto-selects highest)
- Extract Audio: Converts to WAV format
- Transcribe: GPU-accelerated Whisper transcription (~30s for 5min video)
- AI Analysis: GPT-4o-mini selects most engaging 2-minute segment
- Interactive Approval: Review selection, regenerate if needed, or auto-approve in 15s
- Extract Clip: Crops selected timeframe
- Smart Crop:
  - Detects faces → static face-centered vertical crop
  - No faces → half-width screen recording with motion tracking
- Add Subtitles: Burns Franklin Gothic captions with blue text/black outline
- Combine Audio: Merges audio track with final video
- Cleanup: Removes all temporary files
Output: `{video-title}_{session-id}_short.mp4` with slugified filename and unique identifier
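The extract-audio step typically maps onto an ffmpeg call. A sketch that only builds the command (the flags shown are common Whisper-friendly settings, not necessarily the project's exact ones):

```python
def wav_extract_cmd(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that strips video and writes mono 16-bit WAV.
    16 kHz mono PCM is the input format Whisper expects."""
    return [
        "ffmpeg", "-y", "-i", video_path,
        "-vn",                      # drop the video stream
        "-ac", "1",                 # mono
        "-ar", str(sample_rate),    # resample
        "-c:a", "pcm_s16le",        # 16-bit PCM WAV
        wav_path,
    ]

# Run with: subprocess.run(wav_extract_cmd("in.mp4", "audio.wav"), check=True)
```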
After AI selects a highlight, you'll see:
```
============================================================
SELECTED SEGMENT DETAILS:
Time: 68s - 187s (119s duration)
============================================================

Options:
  [Enter/y] Approve and continue
  [r] Regenerate selection
  [n] Cancel

Auto-approving in 15 seconds if no input...
```
- Press Enter or y to approve
- Press r to regenerate a different selection (can repeat multiple times)
- Press n to cancel
- Wait 15 seconds to auto-approve (perfect for automation)
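The prompt reduces to a small dispatch on the raw input, with `None` standing for the 15-second timeout (a sketch; how the project treats unrecognized input is an assumption here):

```python
def approval_action(raw):
    """Map approval-prompt input to an action; raw is None when the timeout fires."""
    if raw is None:
        return "approve"          # timeout auto-approves (automation-friendly)
    choice = raw.strip().lower()
    if choice in ("", "y"):
        return "approve"
    if choice == "r":
        return "regenerate"       # caller loops back to the AI selection step
    if choice == "n":
        return "cancel"
    return "approve"              # assumed: unrecognized input falls through
```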
Edit `Components/Subtitles.py`:
- Font: line 51 (`font='Franklin-Gothic'`)
- Size: line 47 (`fontsize=80`)
- Color: line 48 (`color='#2699ff'`)
- Outline: lines 49-50 (`stroke_color='black'`, `stroke_width=2`)
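These parameters line up with MoviePy's `TextClip` keyword arguments (rendered through ImageMagick, hence the policy fix during installation). A sketch collecting them in one place so a restyle is a single edit (the `method="caption"` choice is an assumption, not taken from the file):

```python
# Style values mirror Components/Subtitles.py as documented above.
SUBTITLE_STYLE = dict(
    font="Franklin-Gothic",
    fontsize=80,
    color="#2699ff",
    stroke_color="black",
    stroke_width=2,
)

def make_caption(text, width):
    """Build one subtitle clip; needs MoviePy + ImageMagick at runtime."""
    from moviepy.editor import TextClip
    return TextClip(text, method="caption", size=(width, None), **SUBTITLE_STYLE)
```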
Edit `Components/LanguageTasks.py`:
- Prompt: line 29 (adjust what counts as "interesting, useful, surprising, controversial, or thought-provoking")
- Model: line 54 (`model="gpt-4o-mini"`)
- Temperature: line 55 (`temperature=1.0`)
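These settings plug into a standard Chat Completions call. A sketch of the shape of that call (the prompt wording and message layout are assumptions, not the file's actual contents):

```python
def build_messages(transcript):
    system = (
        "From this transcript, pick the most engaging ~2-minute segment: "
        "interesting, useful, surprising, controversial, or thought-provoking. "
        "Reply with start and end times in seconds."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": transcript},
    ]

def select_highlight(client, transcript):
    """client is an openai.OpenAI() instance authenticated via OPENAI_API."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=1.0,
        messages=build_messages(transcript),
    )
    return resp.choices[0].message.content
```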
Edit `Components/FaceCrop.py`:
- Update frequency: line 93 (`update_interval = int(fps)`), currently 1 shift/second
- Smoothing: line 115 (`0.90 * smoothed_x + 0.10 * target_x`), currently 90%/10%
- Motion threshold: line 107 (`motion_threshold = 2.0`)
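The smoothing rule above is exponential averaging gated by a motion threshold; per update it looks roughly like this (the function name is illustrative):

```python
def smooth_center(smoothed_x, target_x, alpha=0.90, motion_threshold=2.0):
    """Move the crop centre a fraction of the way toward the detected target.
    Moves below the threshold are ignored, which suppresses jitter."""
    if abs(target_x - smoothed_x) < motion_threshold:
        return smoothed_x            # hold position: the change is just noise
    return alpha * smoothed_x + (1 - alpha) * target_x
```

With `alpha=0.90`, each update closes 10% of the remaining gap, so large shifts ease in smoothly instead of snapping.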
Edit `Components/FaceCrop.py`:
- Sensitivity: line 37 (`minNeighbors=8`), higher = fewer false positives
- Minimum size: line 37 (`minSize=(30, 30)`), minimum face size in pixels
Edit `Components/Subtitles.py` and `Components/FaceCrop.py`:
- Bitrate: Subtitles.py line 74 (`bitrate='3000k'`)
- Preset: Subtitles.py line 73 (`preset='medium'`)
Final videos are named: `{video-title}_{session-id}_short.mp4`
Example: `my-awesome-video_a1b2c3d4_short.mp4`
- Slugified title: Lowercase, hyphens instead of spaces
- Session ID: 8-character unique identifier for traceability
- Resolution: Matches source video height (720p → 404x720, 1080p → 607x1080)
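The naming scheme can be sketched as follows (the regex-based slugify and uuid-derived session ID are assumptions about the implementation, not code from the repository):

```python
import re
import uuid

def slugify(title):
    """Lowercase; runs of non-alphanumeric characters become single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def output_name(title):
    session_id = uuid.uuid4().hex[:8]   # 8-char unique ID for traceability
    return f"{slugify(title)}_{session_id}_short.mp4"
```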
Run multiple instances simultaneously:
```bash
./run.sh "https://youtu.be/VIDEO1" &
./run.sh "https://youtu.be/VIDEO2" &
./run.sh "/path/to/video3.mp4" &
```

Each instance gets a unique session ID and temporary files, preventing conflicts.
```bash
# Verify CUDA libraries
export LD_LIBRARY_PATH=$(find $(pwd)/venv/lib/python3.10/site-packages/nvidia -name "lib" -type d | paste -sd ":" -)
```

The run.sh script handles this automatically.
Ensure the ImageMagick policy allows file operations:

```bash
grep 'pattern="@\*"' /etc/ImageMagick-6/policy.xml
# Should show: rights="read|write"
```

If faces are not detected:
- The video needs visible faces in the first 30 frames
- For screen recordings, automatic motion tracking applies
- Low-resolution videos may have less reliable detection
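The first-30-frames requirement can be checked with a small helper; here the detector is injected so the sketch stays library-agnostic (in the pipeline it would wrap OpenCV's `detectMultiScale` with the `minNeighbors`/`minSize` values shown earlier):

```python
def has_face_early(frames, detect, max_frames=30):
    """True if detect(frame) reports a face within the first max_frames frames.

    A real detector might be built as:
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        detect = lambda f: len(cascade.detectMultiScale(
            cv2.cvtColor(f, cv2.COLOR_BGR2GRAY),
            minNeighbors=8, minSize=(30, 30))) > 0
    """
    for i, frame in enumerate(frames):
        if i >= max_frames:
            return False
        if detect(frame):
            return True
    return False
```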
Contributions are welcome! Please fork the repository and submit a pull request.
This project is licensed under the MIT License.
