Documentation · v0.2

Belarebia, end to end.

Everything you need to translate any video into any language: the web UI, the CLI, the HTTP API, the pipeline, and the cost model. Open source under MIT — the link to the source is at the top right of every page.

Overview

Belarebia is a translation pipeline that takes a video — either a YouTube URL or a local file — and outputs three artefacts:

  • HD MP4: subtitles burned in. Ready to upload, react, dub.
  • .srt: standard subtitle file. Re-burn elsewhere if you want.
  • .txt: plain transcript in the target language for search/notes.

The engine is one Python package (belarebia) that exposes both a terminal CLI and a local FastAPI server. The web app you're reading talks to that server. You can run them on the same machine (the default), or on different machines (set NEXT_PUBLIC_API_BASE).

Quick start

You need yt-dlp, ffmpeg (built with libass + videotoolbox on macOS), Python ≥ 3.10, and a free Gemini API key.

# 1. install system tools
brew install yt-dlp ffmpeg                # macOS
# (apt install yt-dlp ffmpeg on Debian/Ubuntu)

# 2. clone + install the engine
git clone https://github.com/miloudbelarebia/belarebia
cd belarebia/api
python -m venv .venv && source .venv/bin/activate
pip install -e .

# 3. set your key + run
export GEMINI_API_KEY=AIza…
belarebia "https://youtube.com/watch?v=…"

Output goes to ~/Desktop/belarebia_<slug>/. Pick the target language with --language "French" (or any other — see the full list in Supported languages).

Web UI

The Next.js dashboard at /tools/youtube drives the same engine over HTTP. Boot both servers at once:

./scripts/dev.sh
# → API   on http://127.0.0.1:8765
# → Web   on http://localhost:3000

The page has three steps: paste your Gemini key, give a YouTube URL or drop a local file, and choose a language. Progress is streamed live via Server-Sent Events. When the job finishes you get three download buttons, one per artefact.

Bring your own key, securely

The browser sends your key in the X-Gemini-Key header on every job. The local FastAPI server holds it in memory only — never on disk, never in a log. When the job finishes the variable is dropped. If you tick "Remember on this device", the key lives in your browser's localStorage only.

CLI reference

belarebia URL [options]                       # YouTube URL
belarebia local.mp4 [options]                 # not yet — upload via the web UI

# Common runs
belarebia URL --language "French"
belarebia URL --language "Modern Standard Arabic" --mode bilingual
belarebia URL --language "Mandarin Chinese (Simplified)" --quality 1440
belarebia URL --context "Sony AI Project Ace announcement"
belarebia URL --list-qualities

# Run the web server
belarebia-web                                  # bind 127.0.0.1:8765

All flags

Flag                  What it does
--language            Free-form target language (e.g. "French", "Mandarin Chinese (Simplified)").
--mode                target (default) or bilingual (English on top + target underneath).
--quality             Max video height. 144 / 240 / 360 / 480 / 720 / 1080 / 1440 / 2160. Default 1080.
--list-qualities      Inspect what's available for a URL without downloading.
--out-dir             Where to write the result. Default ~/Desktop.
--context             Short context hint passed to the LLM. Helps proper-noun spelling.
--font                Override the auto-picked subtitle font.
--env-file            .env path with GEMINI_API_KEY. Lowest priority (env wins).
--keep-intermediates  Keep audio.m4a, .ass, transcription chunks for debugging.
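
If you drive the CLI from a script, the flags above can be assembled programmatically. The following is a minimal sketch, not part of the belarebia package: build_command is a hypothetical helper, and it assumes --quality only accepts the heights listed in the table.

```python
import shlex

# Values copied from the flag table above.
ALLOWED_HEIGHTS = (144, 240, 360, 480, 720, 1080, 1440, 2160)
ALLOWED_MODES = ("target", "bilingual")


def build_command(url, language=None, mode="target", quality=1080,
                  context=None, out_dir=None):
    """Return the argv list for one belarebia run (hypothetical helper)."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}, got {mode!r}")
    if quality not in ALLOWED_HEIGHTS:
        raise ValueError(f"quality must be one of {ALLOWED_HEIGHTS}, got {quality!r}")
    argv = ["belarebia", url]
    if language:
        argv += ["--language", language]
    argv += ["--mode", mode, "--quality", str(quality)]
    if context:
        argv += ["--context", context]
    if out_dir:
        argv += ["--out-dir", out_dir]
    return argv


if __name__ == "__main__":
    # Print a copy-pasteable shell line; pass the list to subprocess.run to execute it.
    print(shlex.join(build_command("URL", language="Modern Standard Arabic", mode="bilingual")))
```

Building a list rather than a single string keeps multi-word language names such as "Modern Standard Arabic" intact without manual quoting.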

HTTP API

The FastAPI server exposes everything the web UI uses. It binds to loopback by default, so your key stays on your machine. The X-Gemini-Key header is required on POST endpoints.

Method · Path                    Purpose
GET /                            Standalone single-file HTML UI (works without the Next.js site).
GET /api/languages               Curated list of 85 target languages, grouped.
POST /api/probe                  { url } → { title, channel, duration, qualities[] } via yt-dlp.
POST /jobs                       Start a job from a YouTube URL.
POST /jobs/upload                Start a job from an uploaded local file (multipart).
GET /jobs/{id}/events            Server-Sent Events stream of phase updates.
GET /jobs/{id}/download/{kind}   kind is mp4, srt, or txt.
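
Since quality is a maximum height, a client can use the qualities[] list from POST /api/probe to predict what it will get. The sketch below assumes qualities[] is a list of integer heights, which this page does not confirm; pick_quality and its fallback rule are hypothetical and may differ from what the engine does.

```python
def pick_quality(available, max_height=1080):
    """Return the tallest available height that does not exceed max_height.

    Assumed fallback: when every available height is above the cap,
    return the smallest one. Returns None for an empty list.
    """
    heights = sorted(set(available))
    if not heights:
        return None
    within_cap = [h for h in heights if h <= max_height]
    return within_cap[-1] if within_cap else heights[0]


if __name__ == "__main__":
    print(pick_quality([360, 720, 1080, 2160], max_height=1440))
```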

Example request

curl -X POST http://127.0.0.1:8765/jobs \
  -H 'Content-Type: application/json' \
  -H 'X-Gemini-Key: AIza…' \
  -d '{
    "url": "https://www.youtube.com/watch?v=…",
    "language": "French",
    "mode": "target",
    "quality": 1080,
    "context": "Sony AI announcement"
  }'

# {"job_id": "abc123def456"}

# subscribe to progress
curl -N http://127.0.0.1:8765/jobs/abc123def456/events

# when done, three downloads:
# /jobs/abc123def456/download/mp4
# /jobs/abc123def456/download/srt
# /jobs/abc123def456/download/txt
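
The same flow can be scripted from Python with the standard library only. This is a rough sketch rather than an official client: the helper names are made up, the event payload is printed as-is because its shape is not documented here, and it assumes the server closes the event stream once the job is done.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:8765"  # default bind from this page


def build_job_request(api_key, url, language="French", mode="target",
                      quality=1080, context=None):
    """Build (but do not send) the POST /jobs request shown above."""
    payload = {"url": url, "language": language, "mode": mode, "quality": quality}
    if context:
        payload["context"] = context
    return urllib.request.Request(
        f"{API_BASE}/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Gemini-Key": api_key},
        method="POST",
    )


def iter_sse_data(lines):
    """Yield the data payload of each Server-Sent Event from text lines.

    Only "data:" fields are kept; comments and other fields are ignored.
    A blank line ends an event, and an unterminated event is dropped.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            yield "\n".join(buffer)
            buffer = []


def download_urls(job_id):
    """Return the three download URLs for a finished job."""
    return {kind: f"{API_BASE}/jobs/{job_id}/download/{kind}"
            for kind in ("mp4", "srt", "txt")}


def run_job(api_key, url, **options):
    """Network part: start a job, print progress payloads, return download URLs."""
    with urllib.request.urlopen(build_job_request(api_key, url, **options)) as resp:
        job_id = json.load(resp)["job_id"]
    with urllib.request.urlopen(f"{API_BASE}/jobs/{job_id}/events") as stream:
        for data in iter_sse_data(line.decode("utf-8") for line in stream):
            print(data)
    return download_urls(job_id)
```

Calling run_job("AIza…", "https://www.youtube.com/watch?v=…", language="French") requires belarebia-web to be running locally; the other three functions work offline.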

How it works

One pipeline, five phases:

  1. Download: yt-dlp grabs the best avc1 + m4a streams at your chosen height, or we accept a multipart upload.
  2. Audio extract: ffmpeg -acodec copy extracts the audio track without re-encoding (~5 s per hour of video).
  3. Speech-to-text: Gemini 2.5 Flash with a JSON-schema-constrained response that emits { start, end, text } segments. Past 25 minutes the engine chunks the audio into 8-min slices, transcribes each, then offsets timestamps and merges.
  4. Translate: Gemini 2.5 Pro translates segments in batches of 50, ID-mapped so order/length never drifts. The prompt asks for natural spoken target language, not formal translation.
  5. Burn + encode: libass renders the .ass file (font auto-picked: Geeza Pro for Arabic shaping, PingFang for CJK, Helvetica otherwise), then h264_videotoolbox encodes HD on Apple Silicon at ~7-10× realtime.
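
The timestamp offsetting in phase 3 and the ID-mapped batching in phase 4 can be illustrated in a few lines. This is a sketch under assumptions, not the code in the repo: it treats timestamps as seconds relative to the start of each chunk, and merge_chunks, batches, and apply_batch are hypothetical names.

```python
CHUNK_SECONDS = 8 * 60  # slice length quoted in phase 3


def merge_chunks(chunks, chunk_seconds=CHUNK_SECONDS):
    """Shift each chunk's segments by the chunk's start offset and concatenate.

    chunks is a list (one entry per slice, in order) of lists of
    {"start", "end", "text"} dicts with chunk-relative times in seconds.
    """
    merged = []
    for index, segments in enumerate(chunks):
        offset = index * chunk_seconds
        for seg in segments:
            merged.append({"start": seg["start"] + offset,
                           "end": seg["end"] + offset,
                           "text": seg["text"]})
    return merged


def batches(segments, size=50):
    """Yield {id: text} dicts of at most size segments, keyed by global index."""
    for first in range(0, len(segments), size):
        last = min(first + size, len(segments))
        yield {i: segments[i]["text"] for i in range(first, last)}


def apply_batch(segments, batch, translated):
    """Write translated text back by ID; reject replies that lost or gained IDs."""
    if set(translated) != set(batch):
        raise ValueError("translated IDs do not match the batch that was sent")
    for seg_id, text in translated.items():
        segments[seg_id] = {**segments[seg_id], "text": text}
```

Keying every line by its index means a reply that drops or reorders lines is detected before it can shift subtitles against their timestamps.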

Supported languages

The dropdown ships with 85 languages across 10 regional groups. The --language flag is free-form, so anything Gemini can write in works, including dialects, transliterations, and constructed scripts. The list is in api/belarebia/languages.py.

Coverage

  • European: French, Spanish, Portuguese, Italian, German, Dutch, Russian, Polish, Greek, Turkish, plus 10 more
  • Asian: Mandarin (Simplified + Traditional), Cantonese, Japanese, Korean, Hindi, Bengali, Tamil, Thai, Vietnamese, Indonesian
  • Middle Eastern: Modern Standard Arabic, Persian (Farsi), Hebrew, Urdu, Kurdish
  • African: Swahili, Amharic, Hausa, Yoruba, Igbo, Zulu, Wolof, Afrikaans
  • Regional dialects + minority scripts: Maghrebi Arabic dialects, Berber (Tifinagh + Latin), Quechua, Haitian Creole, etc.

Pricing model

Self-hosting is free forever; you pay only Gemini for the calls you make. The hosted service charges pass-through cost plus a transparent margin so we can keep the servers running.

Per-hour cost breakdown

Gemini 2.5 Flash speech-to-text   $0.19   +20% buffer = $0.23
Gemini 2.5 Pro   translation      $0.90   +20% buffer = $1.08
AWS Fargate compute (ffmpeg)      $0.30   +20% buffer = $0.36
S3 + CloudFront egress            $0.05   +20% buffer = $0.06
                                          ────────────
Cost (with buffer)                        $1.73 / hour
+ 20% Belarebia margin                    +$0.35
                                          ────────────
Hosted price                              $2.08 / hour

Each line carries a 20% buffer to absorb token spikes and infra variability; the 20% margin is what Belarebia keeps after Gemini, AWS, and Stripe fees. Total padding is 1.20 × 1.20 = 1.44× the raw provider cost. If providers drop their prices, ours drop too — see the live math on the pricing page.
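
The arithmetic in the table can be reproduced as follows. The figures are the ones quoted above; the function name and the choice to round each line to the nearest cent are assumptions made to match the table.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
BUFFER = Decimal("1.20")   # +20% buffer per line
MARGIN = Decimal("0.20")   # 20% Belarebia margin on the buffered cost

# Raw per-hour provider costs from the table above.
RAW = {
    "Gemini 2.5 Flash speech-to-text": Decimal("0.19"),
    "Gemini 2.5 Pro translation": Decimal("0.90"),
    "AWS Fargate compute (ffmpeg)": Decimal("0.30"),
    "S3 + CloudFront egress": Decimal("0.05"),
}


def cents(amount):
    """Round a dollar amount to the nearest cent, halves up."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def hosted_price(raw=RAW):
    """Return (buffered lines, buffered cost, margin, hosted price) per hour."""
    buffered = {name: cents(cost * BUFFER) for name, cost in raw.items()}
    cost = sum(buffered.values(), Decimal("0"))
    margin = cents(cost * MARGIN)
    return buffered, cost, margin, cost + margin


if __name__ == "__main__":
    lines, cost, margin, price = hosted_price()
    print(cost, margin, price)
```

Applying the 1.44× shortcut to the raw total of $1.44 gives $2.07; the table's $2.08 comes from rounding each buffered line to cents before summing.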

Self-hosting

The engine works on macOS (best — uses h264_videotoolbox) and Linux (CPU encode via libx264, slower). Below is the basic install; for long-running deployments wrap it in systemd or Docker.

macOS

brew install yt-dlp ffmpeg
git clone https://github.com/miloudbelarebia/belarebia
cd belarebia/api
python -m venv .venv && source .venv/bin/activate
pip install -e .
export GEMINI_API_KEY=AIza…
belarebia URL

Linux (Debian/Ubuntu)

sudo apt install yt-dlp ffmpeg python3-venv
git clone https://github.com/miloudbelarebia/belarebia
cd belarebia/api
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
# In api/belarebia/pipeline.py, swap h264_videotoolbox → libx264
# (or set CODEC env var once we ship the abstraction)
export GEMINI_API_KEY=AIza…
belarebia URL
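
Until the codec abstraction ships, the swap could look something like the sketch below. It is illustrative only: pick_encoder and encode_args are hypothetical, the CODEC variable is the not-yet-shipped one mentioned in the comment above, and the ffmpeg arguments are a plausible burn-in command rather than the exact one in pipeline.py.

```python
import os
import platform


def pick_encoder(system=None, env=None):
    """Return an ffmpeg video encoder name for this machine.

    A CODEC environment variable (hypothetical, not shipped yet) wins;
    otherwise macOS gets h264_videotoolbox and everything else libx264.
    """
    system = system or platform.system()
    env = os.environ if env is None else env
    override = env.get("CODEC")
    if override:
        return override
    return "h264_videotoolbox" if system == "Darwin" else "libx264"


def encode_args(src, subtitles, dst, encoder):
    """Build an ffmpeg argv that burns an .ass file in and copies the audio.

    Paths containing characters special to ffmpeg filter syntax (such as
    ":" or "'") would need escaping, which this sketch does not do.
    """
    return ["ffmpeg", "-i", src, "-vf", f"ass={subtitles}",
            "-c:v", encoder, "-c:a", "copy", dst]


if __name__ == "__main__":
    print(encode_args("in.mp4", "subs.ass", "out.mp4", pick_encoder()))
```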

Cross-origin (run the dashboard against a remote API)

belarebia-web binds 127.0.0.1 by default. To run the Next.js site against a remote engine, override the bind and add the site origin to the CORS allowlist in api/belarebia/server.py.

Troubleshooting

"Connection to API lost" in the dashboard

The Next.js page expects the FastAPI server at http://127.0.0.1:8765. Run belarebia-web in a separate terminal, or use ./scripts/dev.sh to boot both.

Subtitles render as boxes (tofu)

Your system is missing the auto-picked font. Pass --font "Helvetica Neue" or whatever you have. For Arabic, install "Geeza Pro" or "Noto Naskh Arabic".
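
To see which font a given subtitle line would need, the auto-pick rule from "How it works" (Geeza Pro for Arabic, PingFang for CJK, Helvetica otherwise) can be approximated by looking at Unicode ranges. How the engine actually decides is not documented on this page, so treat pick_font and the ranges it checks as assumptions.

```python
# Unicode blocks used for the guess: Arabic, Arabic Supplement,
# Arabic Presentation Forms A and B.
ARABIC_RANGES = ((0x0600, 0x06FF), (0x0750, 0x077F),
                 (0xFB50, 0xFDFF), (0xFE70, 0xFEFF))
# Hiragana, Katakana, CJK Unified Ideographs.
CJK_RANGES = ((0x3040, 0x309F), (0x30A0, 0x30FF), (0x4E00, 0x9FFF))


def _in_ranges(code, ranges):
    return any(low <= code <= high for low, high in ranges)


def pick_font(text):
    """Guess a subtitle font from the first Arabic or CJK character found."""
    for ch in text:
        code = ord(ch)
        if _in_ranges(code, ARABIC_RANGES):
            return "Geeza Pro"
        if _in_ranges(code, CJK_RANGES):
            return "PingFang"
    return "Helvetica"


if __name__ == "__main__":
    for sample in ("Bonjour", "مرحبا", "你好"):
        print(sample, "->", pick_font(sample))
```

If the font this returns is not installed, that is the case where --font is needed.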

Timestamps drift on long videos

Past 25 min the engine should chunk automatically. If you tweaked CHUNK_SECONDS_THRESHOLD, lower it or run --keep-intermediates to inspect _chunks/.

Got a permission error from yt-dlp

YouTube changes formats often. Update yt-dlp: pip install -U yt-dlp (or brew upgrade yt-dlp if you installed it with Homebrew).

Roadmap & contributing

The repo is MIT and PRs are welcome. The next things on the list:

  • Generic --input local.mp4 for the CLI (web UI already does upload).
  • Dub-back: TTS the translated track so you also get an audio-translated MP4.
  • Hugging Face Whisper as a fallback STT for offline self-hosting.
  • Docker image for one-line Linux deployment.
  • Speaker diarization for multi-voice videos.

Issues, PRs, or just sharing what you built with it → github.com/miloudbelarebia/belarebia. Direct contact: belarebia@2pidata.fr.

Belarebia is a project by 2pidata.fr.