Belarebia, end to end.
Everything you need to translate any video into any language: the web UI, the CLI, the HTTP API, the pipeline, and the cost model. Open source under MIT — the link to the source is at the top right of every page.
Overview
Belarebia is a translation pipeline that takes a video (either a YouTube URL or a local file) and outputs three artefacts: an MP4 with the subtitles burned in, an SRT subtitle file, and a TXT file.
The engine is one Python package (belarebia) that exposes both a terminal CLI and a local FastAPI server. The web app you're reading talks to that server. You can run them on the same machine (the default), or on different machines (set NEXT_PUBLIC_API_BASE).
Quick start
Prerequisites: yt-dlp, ffmpeg (built with libass + videotoolbox on macOS), Python ≥ 3.10, and a free Gemini API key.
# 1. install system tools
brew install yt-dlp ffmpeg   # macOS
# (apt install yt-dlp ffmpeg on Debian/Ubuntu)
# 2. clone + install the engine
git clone https://github.com/miloudbelarebia/belarebia
cd belarebia/api
python -m venv .venv && source .venv/bin/activate
pip install -e .
# 3. set your key + run
export GEMINI_API_KEY=AIza…
belarebia "https://youtube.com/watch?v=…"
Output goes to ~/Desktop/belarebia_<slug>/. Pick the target language with --language "French" (or any other — see the full list in Supported languages).
Web UI
The Next.js dashboard at /tools/youtube drives the same engine over HTTP. Boot both servers at once:
./scripts/dev.sh
# → API on http://127.0.0.1:8765
# → Web on http://localhost:3000
The page has three steps: paste your Gemini key, give a YouTube URL or drop a local file, choose a language. The progress is streamed live via Server-Sent Events. When the job finishes you get three download buttons.
Bring your own key, securely
The browser sends your key in the X-Gemini-Key header on every job. The local FastAPI server holds it in memory only — never on disk, never in a log. When the job finishes the variable is dropped. If you tick "Remember on this device", the key lives in your browser's localStorage only.
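As a rough illustration of the header-based key handling described above, here is a minimal Python sketch that builds the same kind of request the dashboard sends. The helper name `build_job_request` is hypothetical; the endpoint, header name, and JSON fields are the ones documented in the HTTP API section of this page. The key is only passed as an argument and attached to the request object, so nothing is written to disk.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:8765"  # default local bind documented on this page


def build_job_request(gemini_key, url, language, mode="target", quality=1080, context=None):
    """Build (but do not send) a POST /jobs request with the key in X-Gemini-Key.

    Hypothetical helper for illustration; it is not part of the belarebia package.
    """
    payload = {"url": url, "language": language, "mode": mode, "quality": quality}
    if context:
        payload["context"] = context
    return urllib.request.Request(
        f"{API_BASE}/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Gemini-Key": gemini_key},
        method="POST",
    )
```

To actually submit the job, pass the returned object to `urllib.request.urlopen` while `belarebia-web` is running, and read the JSON response for the job ID.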
CLI reference
belarebia URL [options]         # YouTube URL
belarebia local.mp4 [options]   # not yet; upload via the web UI
# Common runs
belarebia URL --language "French"
belarebia URL --language "Modern Standard Arabic" --mode bilingual
belarebia URL --language "Mandarin Chinese (Simplified)" --quality 1440
belarebia URL --context "Sony AI Project Ace announcement"
belarebia URL --list-qualities
# Run the web server
belarebia-web                   # bind 127.0.0.1:8765
All flags
| Flag | What it does |
|---|---|
| --language | Free-form target language (e.g. "French", "Mandarin Chinese (Simplified)"). |
| --mode | target (default) or bilingual (English on top + target underneath). |
| --quality | Max video height. 144 / 240 / 360 / 480 / 720 / 1080 / 1440 / 2160. Default 1080. |
| --list-qualities | Inspect what's available for a URL without downloading. |
| --out-dir | Where to write the result. Default ~/Desktop. |
| --context | Short context hint passed to the LLM. Helps proper-noun spelling. |
| --font | Override the auto-picked subtitle font. |
| --env-file | .env path with GEMINI_API_KEY. Lowest priority (env wins). |
| --keep-intermediates | Keep audio.m4a, .ass, transcription chunks for debugging. |
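The key precedence noted for --env-file (the environment wins, the file is the fallback) can be sketched as follows. This is an assumption-laden illustration, not the engine's actual code: `parse_env_file` and `resolve_gemini_key` are hypothetical names, and the .env parsing is deliberately simplistic.

```python
def parse_env_file(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are ignored."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_gemini_key(environ, env_file_text=None):
    """Return the key from the environment if set, else from the .env text, else None."""
    key = environ.get("GEMINI_API_KEY")
    if key:
        return key  # the environment has the highest priority
    if env_file_text is not None:
        return parse_env_file(env_file_text).get("GEMINI_API_KEY")
    return None
```

In practice `environ` would be `os.environ` and `env_file_text` the contents of the file given to --env-file.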
HTTP API
The FastAPI server exposes everything the web UI uses. It binds to loopback by default, so your key stays on your machine. The X-Gemini-Key header is required on POST endpoints.
| Method · Path | Purpose |
|---|---|
| GET / | Standalone single-file HTML UI (works without the Next.js site). |
| GET /api/languages | Curated list of 85 target languages, grouped. |
| POST /api/probe | { url } → { title, channel, duration, qualities[] } via yt-dlp. |
| POST /jobs | Start a job from a YouTube URL. |
| POST /jobs/upload | Start a job from an uploaded local file (multipart). |
| GET /jobs/{id}/events | Server-Sent Events stream of phase updates. |
| GET /jobs/{id}/download/{kind} | kind is mp4, srt, or txt. |
Example request
curl -X POST http://127.0.0.1:8765/jobs \
-H 'Content-Type: application/json' \
-H 'X-Gemini-Key: AIza…' \
-d '{
"url": "https://www.youtube.com/watch?v=…",
"language": "French",
"mode": "target",
"quality": 1080,
"context": "Sony AI announcement"
}'
# {"job_id": "abc123def456"}
# subscribe to progress
curl -N http://127.0.0.1:8765/jobs/abc123def456/events
# when done, three downloads:
# /jobs/abc123def456/download/mp4
# /jobs/abc123def456/download/srt
# /jobs/abc123def456/download/txt
How it works
One pipeline, five phases:
- Download: yt-dlp grabs the best avc1 + m4a streams at your chosen height, or we accept a multipart upload.
- Audio extract: ffmpeg -acodec copy strips the audio without re-encoding (~5 s per hour of video).
- Speech-to-text: Gemini 2.5 Flash with a JSON-schema-constrained response that emits { start, end, text } segments. Past 25 minutes the engine chunks the audio into 8-min slices, transcribes each, then offsets timestamps and merges.
- Translate: Gemini 2.5 Pro translates segments in batches of 50, ID-mapped so order/length never drifts. The prompt asks for natural spoken target language, not formal translation.
- Burn + encode: libass renders the .ass file (font auto-picked: Geeza Pro for Arabic shaping, PingFang for CJK, Helvetica otherwise), and h264_videotoolbox encodes HD on Apple Silicon at ~7-10× realtime.
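The chunk-and-merge and ID-mapped batching steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the engine's source: the function names are hypothetical, the constants come from the numbers quoted in the list (25 minutes, 8-minute slices, batches of 50), and segment IDs are assumed to be positions in the merged list.

```python
THRESHOLD_S = 25 * 60  # chunk only past 25 minutes
CHUNK_S = 8 * 60       # 8-minute slices
BATCH_SIZE = 50        # segments per translation batch


def plan_chunks(duration_s):
    """Return (start, end) windows in seconds that cover the whole audio."""
    if duration_s <= THRESHOLD_S:
        return [(0.0, float(duration_s))]
    windows, start = [], 0.0
    while start < duration_s:
        windows.append((start, min(start + CHUNK_S, float(duration_s))))
        start += CHUNK_S
    return windows


def merge_chunks(windows, per_chunk_segments):
    """Shift each chunk's local timestamps by its window start, then concatenate."""
    merged = []
    for (offset, _end), segments in zip(windows, per_chunk_segments):
        for seg in segments:
            merged.append({"start": seg["start"] + offset,
                           "end": seg["end"] + offset,
                           "text": seg["text"]})
    return merged


def batches_with_ids(segments, size=BATCH_SIZE):
    """Yield lists of (id, text) pairs; the id is the segment's global position."""
    for i in range(0, len(segments), size):
        yield [(i + j, seg["text"]) for j, seg in enumerate(segments[i:i + size])]


def apply_translations(segments, translated_by_id):
    """Replace text by ID; refuse a partial mapping so order and length cannot drift."""
    missing = [i for i in range(len(segments)) if i not in translated_by_id]
    if missing:
        raise ValueError(f"missing translations for ids: {missing}")
    return [{**seg, "text": translated_by_id[i]} for i, seg in enumerate(segments)]
```

Keying translations by ID rather than by list position means a dropped or reordered item in a model response is detected instead of silently shifting every later subtitle.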
Supported languages
The dropdown ships with 85 languages across 10 regional groups. The --language flag is free-form, so anything Gemini can write in works, including dialects, transliterations, and constructed scripts. The list is in api/belarebia/languages.py.
Coverage
- European: French, Spanish, Portuguese, Italian, German, Dutch, Russian, Polish, Greek, Turkish, plus 10 more
- Asian: Mandarin (Simplified + Traditional), Cantonese, Japanese, Korean, Hindi, Bengali, Tamil, Thai, Vietnamese, Indonesian
- Middle Eastern: Modern Standard Arabic, Persian (Farsi), Hebrew, Urdu, Kurdish
- African: Swahili, Amharic, Hausa, Yoruba, Igbo, Zulu, Wolof, Afrikaans
- Regional dialects + minority scripts: Maghrebi Arabic dialects, Berber (Tifinagh + Latin), Quechua, Haitian Creole, etc.
Pricing model
Self-hosting is free forever: you pay only Gemini for the calls you make. The hosted version charges pass-through cost plus a transparent margin so we can keep the servers running.
Per-hour cost breakdown
Gemini 2.5 Flash speech-to-text $0.19 +20% buffer = $0.23
Gemini 2.5 Pro translation $0.90 +20% buffer = $1.08
AWS Fargate compute (ffmpeg) $0.30 +20% buffer = $0.36
S3 + CloudFront egress $0.05 +20% buffer = $0.06
────────────
Cost (with buffer) $1.73 / hour
+ 20% Belarebia margin +$0.35
────────────
Hosted price $2.08 / hour
Each line carries a 20% buffer to absorb token spikes and infra variability; the 20% margin is what Belarebia keeps after Gemini, AWS, and Stripe fees. Total padding is 1.20 × 1.20 = 1.44× the raw provider cost. If providers drop their prices, ours drop too; see the live math on the pricing page.
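The arithmetic in the breakdown can be reproduced with a short Python sketch. The raw per-hour figures are the ones in the table; rounding each line to the cent before summing is an assumption that happens to match the printed totals. The names `RAW_COSTS` and `hosted_price` are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
BUFFER = Decimal("1.20")  # 20% buffer applied to each line
MARGIN = Decimal("0.20")  # 20% margin applied to the buffered cost

RAW_COSTS = {
    "Gemini 2.5 Flash speech-to-text": Decimal("0.19"),
    "Gemini 2.5 Pro translation": Decimal("0.90"),
    "AWS Fargate compute (ffmpeg)": Decimal("0.30"),
    "S3 + CloudFront egress": Decimal("0.05"),
}


def to_cents(amount):
    """Round to the nearest cent, halves up."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def hosted_price(raw_costs):
    """Return (buffered cost, margin, hosted price) per hour of video."""
    buffered = sum((to_cents(c * BUFFER) for c in raw_costs.values()), Decimal("0"))
    margin = to_cents(buffered * MARGIN)
    return buffered, margin, buffered + margin
```

Calling `hosted_price(RAW_COSTS)` gives 1.73, 0.35, and 2.08, matching the table. Multiplying the raw total of $1.44 by 1.44 directly gives about $2.07; the one-cent difference comes from rounding each line before summing.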
Self-hosting
The engine works on macOS (best — uses h264_videotoolbox) and Linux (CPU encode via libx264, slower). Below is the basic install; for long-running deployments wrap it in systemd or Docker.
macOS
brew install yt-dlp ffmpeg
git clone https://github.com/miloudbelarebia/belarebia
cd belarebia/api
python -m venv .venv && source .venv/bin/activate
pip install -e .
export GEMINI_API_KEY=AIza…
belarebia URL
Linux (Debian/Ubuntu)
sudo apt install yt-dlp ffmpeg python3-venv
git clone https://github.com/miloudbelarebia/belarebia
cd belarebia/api
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
# In api/belarebia/pipeline.py, swap h264_videotoolbox → libx264
# (or set CODEC env var once we ship the abstraction)
export GEMINI_API_KEY=AIza…
belarebia URL
Cross-origin (run the dashboard against a remote API)
belarebia-web binds 127.0.0.1 by default. To run the Next.js site against a remote engine, override the bind and add the site origin to the CORS allowlist in api/belarebia/server.py.
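What such an allowlist entry might look like is sketched below. The actual contents of api/belarebia/server.py are not shown on this page, so `cors_settings` is a hypothetical helper and the origin is a placeholder. The keyword names are those accepted by FastAPI's `CORSMiddleware`. Because the browser sends the key in a custom header, X-Gemini-Key has to be listed in the allowed headers or the preflight request will be rejected.

```python
def cors_settings(site_origins):
    """Build keyword arguments for CORSMiddleware from a list of site origins."""
    return {
        # Origins are compared as exact strings, so drop any trailing slash.
        "allow_origins": [origin.rstrip("/") for origin in site_origins],
        "allow_methods": ["GET", "POST"],
        "allow_headers": ["Content-Type", "X-Gemini-Key"],
    }


# Hypothetical wiring inside the FastAPI app (requires fastapi to be installed):
#
#   from fastapi.middleware.cors import CORSMiddleware
#   app.add_middleware(CORSMiddleware, **cors_settings(["https://dashboard.example.com"]))
```

Listing explicit origins, rather than a wildcard, keeps the set of sites that can submit jobs with a user's key deliberately small.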
Troubleshooting
"Connection to API lost" in the dashboard
The Next.js page expects the FastAPI server at http://127.0.0.1:8765. Run belarebia-web in a separate terminal, or use ./scripts/dev.sh to boot both.
Subtitles render as boxes (tofu)
Your system is missing the auto-picked font. Pass --font "Helvetica Neue" or whatever you have. For Arabic, install "Geeza Pro" or "Noto Naskh Arabic".
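To see why a given subtitle track lands on a particular font, here is a minimal sketch of script-based font picking in the spirit of the rule described in How it works (Geeza Pro for Arabic, PingFang for CJK, Helvetica otherwise). The engine's real selection logic is not shown on this page. The Unicode ranges below are this sketch's own interpretation of "Arabic" and "CJK", and `pick_subtitle_font` is a hypothetical name.

```python
def pick_subtitle_font(text, override=None):
    """Pick a font family from the first character whose script needs a special font."""
    if override:
        return override  # mirrors the effect of passing --font
    for ch in text:
        cp = ord(ch)
        # Arabic and Arabic Supplement blocks.
        if 0x0600 <= cp <= 0x06FF or 0x0750 <= cp <= 0x077F:
            return "Geeza Pro"
        # CJK Unified Ideographs, Hiragana and Katakana, Hangul syllables.
        if 0x4E00 <= cp <= 0x9FFF or 0x3040 <= cp <= 0x30FF or 0xAC00 <= cp <= 0xD7AF:
            return "PingFang"
    return "Helvetica"
```

If the returned family is not installed, libass falls back to whatever it can find, which is typically when the boxes appear; passing an installed font through the override avoids that.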
Timestamps drift on long videos
Past 25 min the engine should chunk automatically. If you tweaked CHUNK_SECONDS_THRESHOLD, lower it or run --keep-intermediates to inspect _chunks/.
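When inspecting intermediates, a quick sanity check on the merged segments can help locate where drift starts. The on-disk format of the files in _chunks/ is not documented here, so this sketch assumes the segments have already been loaded into a list of dicts with start and end in seconds. `find_timestamp_problems` is a hypothetical helper.

```python
def find_timestamp_problems(segments):
    """Return (index, reason) pairs for segments whose timing looks wrong."""
    problems = []
    prev_end = 0.0
    for i, seg in enumerate(segments):
        if seg["end"] < seg["start"]:
            problems.append((i, "end before start"))
        if seg["start"] < prev_end:
            problems.append((i, "overlaps previous segment"))
        # Track the latest end seen so far, so one bad segment does not mask later ones.
        prev_end = max(prev_end, seg["end"])
    return problems
```

A cluster of overlaps beginning right at a chunk boundary would suggest that a chunk's offset was not applied when the pieces were merged.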
Got a permission error from yt-dlp
YouTube changes formats often. Update yt-dlp: pip install -U yt-dlp.
Roadmap & contributing
The repo is MIT and PRs are welcome. The next things on the list:
- Generic --input local.mp4 for the CLI (web UI already does upload).
- Dub-back: TTS the translated track so you also get an audio-translated MP4.
- Hugging Face Whisper as a fallback STT for offline self-hosting.
- Docker image for one-line Linux deployment.
- Speaker diarization for multi-voice videos.
Issues, PRs, or just sharing what you built with it → github.com/miloudbelarebia/belarebia. Direct contact: belarebia@2pidata.fr.
Belarebia is a project by 2pidata.fr.