FFmpeg in the Browser: Launching Render Lab with DojoClip (Technical Deep Dive)

Welcome to Render Lab, our series on the tech that keeps DojoClip fast, private, and creator-friendly. This post goes beyond the high-level pitch and into the how: FFmpeg compiled to WebAssembly (WASM), where it shines, where it struggles, and how we pair it with MediaRecorder and WebCodecs for a hybrid, low-latency workflow.
TL;DR
- FFmpeg WASM equals precision plus feature coverage (filters, trims, remux) with local privacy.
- WebCodecs delivers the lowest-latency, hardware-assisted encode and decode, but needs a muxer.
- MediaRecorder provides the simplest real-time capture, perfect for previews but limited in control.
- We use a hybrid pipeline: WASM for exact transforms, WebCodecs and MediaRecorder for speedy previews and exports.
Concepts (2-Minute Primer)
- Container vs. codec: MP4, MKV, and WebM are containers; H.264, H.265, VP9, AV1, AAC, and Opus are codecs. Editing often needs decode then filter then encode then mux.
- Transcode vs. remux: Transcoding re-encodes (quality and size tradeoff). Remuxing changes the container without touching compressed streams (fast and lossless).
- CFR vs. VFR: Constant vs. variable frame rate. Web capture is often VFR; editing can prefer CFR for frame-accurate seeking.
- Keyframes (IDR): Cuts and edits snap to keyframes unless you transcode or use smart-render strategies.
- CRF and bitrate: Quality knobs. Lower CRF increases quality and size. Bitrate caps throughput for streaming targets.
Architecture at a Glance
[File Input]
  └─▶ Browser FS (OPFS / RAM)
        ├─▶ FFmpeg WASM worker (precision filters, remux, waveform)
        ├─▶ WebCodecs (fast decode/encode; preview/export)
        └─▶ MediaRecorder (real-time canvas/tab capture)
              ▼
[Muxer] → MP4/WebM → Download / OPFS / Upload
Why WASM?
It brings most of FFmpeg's CLI to the browser. This unlocks frame-accurate trimming, complex filter graphs, and audio channel operations while keeping processing local for privacy-sensitive media.
Why not only WASM?
Long encodes stress memory and CPU, the first load is several megabytes, and threads or SIMD require cross-origin isolation headers.
Why WebCodecs and MediaRecorder too?
WebCodecs taps platform encoders and decoders for speed. MediaRecorder is trivial for live capture and quick proxies.
Deploying FFmpeg WASM Correctly
1. Enable Threads and SIMD (Real-World Gains)
Threads plus SIMD usually bring 1.5 to 3 times speedups depending on workload. They require cross-origin isolation headers.
// next.config.js (or middleware/headers)
const securityHeaders = [
  { key: 'Cross-Origin-Opener-Policy', value: 'same-origin' },
  { key: 'Cross-Origin-Embedder-Policy', value: 'require-corp' },
  { key: 'Cross-Origin-Resource-Policy', value: 'same-site' },
];

module.exports = {
  async headers() {
    // Apply the isolation headers to every route.
    return [{ source: '/:path*', headers: securityHeaders }];
  },
};
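The headers only help if the page actually ends up cross-origin isolated, so it can be worth checking at runtime before choosing a core. A minimal sketch, assuming two hypothetical core paths (`/wasm/ffmpeg-core-mt.js` and `/wasm/ffmpeg-core-st.js`; neither name comes from this post):

```typescript
// Hypothetical paths: adjust to wherever your build places the two cores.
const MULTI_THREAD_CORE = '/wasm/ffmpeg-core-mt.js';
const SINGLE_THREAD_CORE = '/wasm/ffmpeg-core-st.js';

interface IsolationEnv {
  crossOriginIsolated: boolean;  // value of self.crossOriginIsolated
  hasSharedArrayBuffer: boolean; // typeof SharedArrayBuffer !== 'undefined'
}

// Threads need both isolation and SharedArrayBuffer; otherwise fall back.
function pickCorePath(env: IsolationEnv): string {
  return env.crossOriginIsolated && env.hasSharedArrayBuffer
    ? MULTI_THREAD_CORE
    : SINGLE_THREAD_CORE;
}
```

In a page or worker you would build the argument from `self.crossOriginIsolated` and `typeof SharedArrayBuffer !== 'undefined'`, then hand the result to `corePath`.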
2. Load the Core in a Worker
Avoid blocking the UI thread and keep FFmpeg memory isolated.
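// ffmpeg.worker.ts
// Uses the createFFmpeg API from @ffmpeg/ffmpeg 0.11 and earlier.
import { createFFmpeg, fetchFile } from '@ffmpeg/ffmpeg';

const ffmpeg = createFFmpeg({ log: true, corePath: '/wasm/ffmpeg-core.js' });

self.onmessage = async (event) => {
  const { name, file, args, output } = event.data;
  try {
    if (!ffmpeg.isLoaded()) {
      await ffmpeg.load();
    }
    ffmpeg.FS('writeFile', name, await fetchFile(file));
    await ffmpeg.run(...args);
    const out = ffmpeg.FS('readFile', output);
    // Remove the virtual-FS copies so repeated jobs do not grow the heap.
    ffmpeg.FS('unlink', name);
    ffmpeg.FS('unlink', output);
    // Transfer the buffer to the UI thread instead of copying it.
    self.postMessage({ ok: true, data: out.buffer }, [out.buffer]);
  } catch (error) {
    self.postMessage({ ok: false, error: String(error) });
  }
};

// UI side
const worker = new Worker(new URL('./ffmpeg.worker.ts', import.meta.url));
// Attach the handler first, then post the job.
worker.onmessage = ({ data }) => {
  if (!data.ok) {
    console.error('FFmpeg job failed:', data.error);
    return;
  }
  const blob = new Blob([data.data], { type: 'audio/mp4' });
  download(blob, 'audio.m4a'); // download() is your own save helper
};
worker.postMessage({
  name: 'in.mp4',
  file,
  // FFmpeg picks the muxer from the output extension; '.bin' is not a known format.
  args: ['-i', 'in.mp4', '-vn', '-acodec', 'copy', 'out.m4a'],
  output: 'out.m4a',
});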
3. Store Large Intermediates in OPFS
The Origin Private File System avoids huge RAM spikes.
const root = await navigator.storage.getDirectory();
const fileHandle = await root.getFileHandle('clip.m4a', { create: true });
const writer = await fileHandle.createWritable();
await writer.write(blob);
await writer.close();
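Before writing a large intermediate, it may help to decide where it should live. The sketch below is one possible policy, not DojoClip's actual one: the 64 MB threshold and the `chooseStore` name are assumptions, and the `usage`/`quota` fields mirror what `navigator.storage.estimate()` returns.

```typescript
type IntermediateStore = 'memory' | 'opfs' | 'too-large';

// Matches the fields returned by navigator.storage.estimate().
interface QuotaEstimate {
  usage?: number;
  quota?: number;
}

// Assumed cut-off below which a buffer simply stays in RAM; tune for your app.
const MEMORY_LIMIT_BYTES = 64 * 1024 * 1024;

function chooseStore(sizeBytes: number, estimate: QuotaEstimate): IntermediateStore {
  if (sizeBytes <= MEMORY_LIMIT_BYTES) return 'memory';
  // Remaining origin quota; missing fields are treated as zero.
  const freeBytes = (estimate.quota ?? 0) - (estimate.usage ?? 0);
  return freeBytes >= sizeBytes ? 'opfs' : 'too-large';
}
```

A caller would typically run `chooseStore(blob.size, await navigator.storage.estimate())` and surface the `'too-large'` case to the user instead of attempting the write.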
Everyday Tasks (Copy-Paste FFmpeg Recipes)
Audio extraction (bit-exact if AAC already):
ffmpeg -i input.mp4 -vn -acodec copy audio.m4a
Downscale plus CRF transcode (H.264 proxy):
ffmpeg -i input_4k.mp4 -vf scale=-2:1080 -c:v libx264 -preset veryfast -crf 23 -c:a aac -b:a 160k out_1080p.mp4
Frame-accurate segment (re-encode small window):
ffmpeg -ss 00:00:12.300 -to 00:00:19.000 -i input.mp4 -c:v libx264 -crf 20 -pix_fmt yuv420p -c:a aac slice.mp4
Waveform PNG for a timeline:
ffmpeg -i audio.m4a -lavfi showwavespic=s=1200x200:colors=white waveform.png
In WASM you pass the same arguments to ffmpeg.run(...). For pure remux (no transcode) keep -c copy to preserve quality and speed.
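Since ffmpeg.run(...) takes one string per argument, a small builder keeps recipes readable. The following sketch reproduces the frame-accurate segment recipe; `buildTrimArgs` and `TrimOptions` are illustrative names, not part of any library.

```typescript
interface TrimOptions {
  input: string;
  output: string;
  start: string; // e.g. '00:00:12.300'
  end: string;   // e.g. '00:00:19.000'
  crf?: number;  // defaults to 20, as in the recipe
}

// Returns the argument array for a re-encoded, frame-accurate slice.
function buildTrimArgs({ input, output, start, end, crf = 20 }: TrimOptions): string[] {
  return [
    '-ss', start, '-to', end, '-i', input,
    '-c:v', 'libx264', '-crf', String(crf), '-pix_fmt', 'yuv420p',
    '-c:a', 'aac', output,
  ];
}
```

Usage would look like `await ffmpeg.run(...buildTrimArgs({ input: 'input.mp4', output: 'slice.mp4', start: '00:00:12.300', end: '00:00:19.000' }))`.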
WebCodecs: The Fast Lane
WebCodecs gives you direct access to platform decoders and encoders. You need a muxer to wrap encoded chunks into MP4 or WebM.
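const fps = 30;
// `muxer` is an instance from your chosen browser muxer library (not part of WebCodecs).
const encoder = new VideoEncoder({
  output: handleChunk,
  error: console.error,
});
encoder.configure({
  // Baseline profile, level 3.0 (up to roughly 720x576). Larger canvases need a
  // higher level, for example 'avc1.42E028' (level 4.0) for 1080p at 30 fps.
  codec: 'avc1.42E01E',
  width: canvas.width,
  height: canvas.height,
  bitrate: 3_000_000,
  framerate: fps,
});

const t0 = performance.now();
let frameIndex = 0;
const track = canvas.captureStream(fps).getVideoTracks()[0];
// MediaStreamTrackProcessor is not available in every browser.
const reader = new MediaStreamTrackProcessor({ track }).readable.getReader();

async function pump() {
  while (true) {
    const { value: frame, done } = await reader.read();
    if (done) break;
    // Timestamps are in microseconds and must increase monotonically.
    const timestamp = Math.floor((performance.now() - t0) * 1000);
    const videoFrame = new VideoFrame(frame, { timestamp });
    frame.close();
    // Force a keyframe every two seconds.
    encoder.encode(videoFrame, { keyFrame: frameIndex % (fps * 2) === 0 });
    videoFrame.close();
    frameIndex += 1;
  }
  await encoder.flush();
  muxer.finalize();
}

function handleChunk(chunk, metadata) {
  // metadata carries the decoder config that MP4 muxers typically need.
  muxer.addVideoChunk(chunk, metadata);
}

pump().catch(console.error);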
Muxing note: WebCodecs outputs raw encoded chunks with no container. Use a browser muxer library to produce a downloadable file. For previews, Media Source Extensions can play the result in a video element once the chunks are wrapped in fragmented MP4 or WebM segments.
Compared to FFmpeg WASM, encode latency is often lower and CPU usage smaller, especially on devices with hardware acceleration.
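The H.264 codec string encodes a level, and the level caps resolution and frame rate, so a fixed string will not fit every canvas. The sketch below picks a Baseline-profile string from a simplified subset of the H.264 level limits (macroblocks per frame and per second). The function name is illustrative, intermediate levels are skipped, and the result should still be checked with `VideoEncoder.isConfigSupported` because support varies by device.

```typescript
// [level hex, max macroblocks per frame, max macroblocks per second]
const H264_LEVELS: ReadonlyArray<[string, number, number]> = [
  ['1E', 1620, 40500],    // level 3.0
  ['1F', 3600, 108000],   // level 3.1
  ['28', 8192, 245760],   // level 4.0
  ['2A', 8704, 522240],   // level 4.2
  ['33', 36864, 983040],  // level 5.1
  ['34', 36864, 2073600], // level 5.2
];

// Returns the lowest listed Baseline level that fits, or null if none does.
function baselineCodecString(width: number, height: number, fps: number): string | null {
  const macroblocks = Math.ceil(width / 16) * Math.ceil(height / 16);
  const macroblocksPerSecond = macroblocks * fps;
  for (const [hex, maxFrame, maxRate] of H264_LEVELS) {
    if (macroblocks <= maxFrame && macroblocksPerSecond <= maxRate) {
      return `avc1.42E0${hex}`;
    }
  }
  return null;
}
```

For example, a 1920x1080 canvas at 30 fps maps to `avc1.42E028`, which could replace the hard-coded `codec` value in `encoder.configure`.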
MediaRecorder: Zero-Fuss Proxies
const stream = canvas.captureStream(30);
const recorder = new MediaRecorder(stream, { mimeType: 'video/webm;codecs=vp9' });
const chunks = [];

recorder.ondataavailable = (event) => {
  if (event.data.size) {
    chunks.push(event.data);
  }
};

recorder.onstop = () => {
  const blob = new Blob(chunks, { type: recorder.mimeType });
  download(blob, 'preview.webm');
};

recorder.start();
// ... render frames ...
recorder.stop();
Pros: Dead simple and great for quick reviews or social-media proxies. Cons: Limited control over GOP, CRF, or advanced filtering.
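Not every browser records VP9 in WebM, so it is safer to probe before constructing the recorder. A minimal sketch: the candidate list and the `pickMimeType` helper are assumptions for illustration, and the support check is injected so the logic stays independent of the browser.

```typescript
// Assumed preference order; adjust to the formats your pipeline accepts.
const MIME_CANDIDATES: readonly string[] = [
  'video/webm;codecs=vp9',
  'video/webm;codecs=vp8',
  'video/webm',
  'video/mp4',
];

// Returns the first candidate the predicate accepts, or undefined if none.
function pickMimeType(
  isSupported: (type: string) => boolean,
  candidates: readonly string[] = MIME_CANDIDATES,
): string | undefined {
  return candidates.find((type) => isSupported(type));
}
```

In the browser you would call `pickMimeType((type) => MediaRecorder.isTypeSupported(type))` and pass the result as `mimeType`, letting the recorder choose its default when the result is `undefined`.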
Choosing the Right Pipeline
Task | Precision Needed | Latency Target | Recommended |
---|---|---|---|
Audio extraction, remux | High | Low | FFmpeg WASM (-c copy) |
Waveform or thumbnails | High | Low | FFmpeg WASM (filters) |
Frame-accurate cuts | High | Medium | FFmpeg WASM (small re-encode window) |
Live preview or screen capture | Medium | Very low | MediaRecorder |
Final exports from canvas timeline | Medium | Low | WebCodecs plus muxer |
Long offline transcodes | Medium | High | Native or server fallback |
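The table can be expressed as a small lookup with a capability fallback. This is only a sketch of the idea: the task and pipeline names are made up for illustration, and the single fallback rule (WebCodecs missing, so use MediaRecorder or FFmpeg WASM) follows the Safari note in the pitfalls section.

```typescript
type Pipeline = 'ffmpeg-wasm' | 'webcodecs' | 'mediarecorder' | 'native-or-server';
type Task =
  | 'remux' | 'waveform' | 'frame-accurate-cut'
  | 'live-capture' | 'canvas-export' | 'long-transcode';

// Direct transcription of the "Recommended" column.
const RECOMMENDED: Record<Task, Pipeline> = {
  'remux': 'ffmpeg-wasm',
  'waveform': 'ffmpeg-wasm',
  'frame-accurate-cut': 'ffmpeg-wasm',
  'live-capture': 'mediarecorder',
  'canvas-export': 'webcodecs',
  'long-transcode': 'native-or-server',
};

interface Capabilities {
  webCodecs: boolean;     // e.g. typeof VideoEncoder !== 'undefined'
  mediaRecorder: boolean; // e.g. typeof MediaRecorder !== 'undefined'
}

function choosePipeline(task: Task, caps: Capabilities): Pipeline {
  const preferred = RECOMMENDED[task];
  if (preferred === 'webcodecs' && !caps.webCodecs) {
    return caps.mediaRecorder ? 'mediarecorder' : 'ffmpeg-wasm';
  }
  return preferred;
}
```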
Memory Math (Plan Before You Render)
- Decoded video frame size is roughly width times height times 1.5 bytes (YUV420p). Example: 1920x1080 is about 3.1 MB per frame. A small 120-frame buffer is roughly 373 MB.
- WASM heap growth can spike during filters such as resamplers or scalers. Keep intermediates on disk (OPFS) and stream when possible.
- Audio: 48 kHz stereo 16-bit PCM is about 192 KB per second. Five minutes equals roughly 58 MB if kept uncompressed in memory.
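The arithmetic above is easy to wrap in helpers so budgets can be checked before a render starts. A minimal sketch with illustrative function names; the 1.5 bytes-per-pixel factor assumes 8-bit YUV420p, and all results are in bytes.

```typescript
// 8-bit YUV420p: one luma byte per pixel plus two quarter-size chroma planes.
function yuv420FrameBytes(width: number, height: number): number {
  return width * height * 1.5;
}

// Total size of a buffer holding `frames` decoded frames.
function frameBufferBytes(width: number, height: number, frames: number): number {
  return yuv420FrameBytes(width, height) * frames;
}

// Uncompressed PCM size for a given duration.
function pcmBytes(sampleRate: number, channels: number, bitsPerSample: number, seconds: number): number {
  return sampleRate * channels * (bitsPerSample / 8) * seconds;
}

const frame1080p = yuv420FrameBytes(1920, 1080);     // 3,110,400 bytes
const buffer120 = frameBufferBytes(1920, 1080, 120); // 373,248,000 bytes
const fiveMinutes = pcmBytes(48_000, 2, 16, 300);    // 57,600,000 bytes
```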
Practical tips
- Prefer remux (-c copy) over full transcode whenever possible.
- Split long jobs into segments, persist to OPFS, then concatenate with FFmpeg.
- For previews, use WebCodecs to avoid holding many raw frames in JavaScript memory.
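The split-then-concatenate tip can be sketched as a plan plus a list file for FFmpeg's concat demuxer. File names such as `seg_000.mp4`, `list.txt`, and `joined.mp4` are placeholders, and the helpers are illustrative rather than DojoClip's implementation.

```typescript
interface Segment {
  name: string;     // output file for this segment
  start: number;    // seconds from the start of the source
  duration: number; // seconds
}

// Cuts [0, totalSeconds) into fixed-size windows; the last one may be shorter.
function planSegments(totalSeconds: number, segmentSeconds: number): Segment[] {
  const segments: Segment[] = [];
  for (let start = 0, i = 0; start < totalSeconds; start += segmentSeconds, i += 1) {
    segments.push({
      name: `seg_${String(i).padStart(3, '0')}.mp4`,
      start,
      duration: Math.min(segmentSeconds, totalSeconds - start),
    });
  }
  return segments;
}

// Body of the list file read by the concat demuxer.
function concatList(segments: Segment[]): string {
  return segments.map((s) => `file '${s.name}'`).join('\n') + '\n';
}

// Joins the segments without re-encoding.
const CONCAT_ARGS = ['-f', 'concat', '-safe', '0', '-i', 'list.txt', '-c', 'copy', 'joined.mp4'];
```

Each segment would be encoded with its own `-ss`/`-t` window and persisted to OPFS; once all exist, write `concatList(...)` as `list.txt` and run `CONCAT_ARGS`. Stream-copy concatenation expects all segments to share codec parameters.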
Reproducible Benchmarks (Fill In Before Publishing)
Run these in a clean, production build. Record device and browser versions.
- Audio extraction (remux)
  - Input: MP4 (H.264 plus AAC), 1 to 2 minutes.
  - Command: -i in.mp4 -vn -acodec copy out.m4a
  - Metrics: Wall time (milliseconds), peak JS heap (MB), WASM heap (MB), output size.
- Proxy transcode (4K to 1080p)
  - Command: -vf scale=-2:1080 -c:v libx264 -preset veryfast -crf 23 -c:a aac -b:a 160k
  - Compare: FFmpeg WASM vs. WebCodecs with similar bitrate and framerate. Record time, CPU percentage, dropped frames.
- Canvas timeline export
  - Method A: WebCodecs encode plus MP4 muxer.
  - Method B: MediaRecorder capture at 30 fps.
  - Metrics: Encode FPS, total time, output bitrate or size, visual quality (SSIM or PSNR if you can compare to a ground truth).
Template table (replace N/A with measured values):
Device / Browser | Test | Pipeline | Time (s) | CPU avg (%) | Peak Mem (MB) | Notes |
---|---|---|---|---|---|---|
M2 Pro / Chrome 128 | Audio extract | FFmpeg WASM | N/A | N/A | N/A | Threads plus SIMD |
M2 Pro / Chrome 128 | 4K to 1080p | FFmpeg WASM | N/A | N/A | N/A | Threads plus SIMD |
M2 Pro / Chrome 128 | 4K to 1080p | WebCodecs | N/A | N/A | N/A | H.264 hardware |
Pixel 8 / Chrome | Canvas export | MediaRecorder | N/A | N/A | N/A | VP9 |
Measurement helpers
// Log every performance.measure() entry as it is recorded.
const perf = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log(entry.name, entry.duration);
  }
});
perf.observe({ entryTypes: ['measure'] });

const memory = performance.memory; // Chrome only: usedJSHeapSize and totalJSHeapSize
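The observer only fires when something records a measure, so each benchmark run needs matching marks. A minimal sketch of a wrapper that does this; `timed` and `PerfLike` are illustrative names, and the performance object is passed in so the helper can be exercised outside the browser.

```typescript
// The subset of the Performance API this helper relies on.
interface PerfLike {
  mark(name: string): unknown;
  measure(name: string, startMark: string, endMark: string): unknown;
}

// Runs `job`, recording a measure named `name` even if the job throws.
async function timed<T>(perf: PerfLike, name: string, job: () => Promise<T>): Promise<T> {
  perf.mark(`${name}:start`);
  try {
    return await job();
  } finally {
    perf.mark(`${name}:end`);
    perf.measure(name, `${name}:start`, `${name}:end`);
  }
}
```

In the browser this could be called as `await timed(performance, 'audio-extract', () => ffmpeg.run(...args))`, after which the observer logs the `audio-extract` duration.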
Handling Common Pitfalls
- Slow-motion exports: Supply monotonically increasing timestamps to WebCodecs, or encode CFR at a fixed fps. Avoid relying on requestAnimationFrame timing alone.
- Audio and video drift: Keep audio as the timing source. Clamp video PTS to the audio-derived timeline or resample audio to your project rate.
- Green frames or color shifts: Ensure pixel format (for example, RGBA to I420) and color space info are set consistently when muxing.
- Safari gaps: WebCodecs support is partial; fall back to MediaRecorder or WASM H.264.
- Threading disabled: If SharedArrayBuffer is not available (no COOP or COEP), load the single-threaded WASM core to avoid runtime errors.
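For the slow-motion pitfall, one option is to derive timestamps from the frame index instead of the wall clock, which gives a constant frame rate regardless of how long each frame took to render. A minimal sketch with illustrative helper names; WebCodecs timestamps and durations are in microseconds.

```typescript
// Presentation timestamp of frame `frameIndex` at a constant `fps`.
function frameTimestampUs(frameIndex: number, fps: number): number {
  return Math.round((frameIndex * 1_000_000) / fps);
}

// Duration chosen so consecutive frames tile the timeline without gaps.
function frameDurationUs(frameIndex: number, fps: number): number {
  return frameTimestampUs(frameIndex + 1, fps) - frameTimestampUs(frameIndex, fps);
}
```

When building frames from a canvas, these values can be passed as `new VideoFrame(canvas, { timestamp: frameTimestampUs(i, fps), duration: frameDurationUs(i, fps) })`. Rounding from the index, instead of accumulating a rounded per-frame step, keeps the error from drifting over long exports.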
Our Hybrid in DojoClip (Today)
- FFmpeg WASM for audio extraction, waveform and thumbnails, precise trims, remux, and subtitle prep.
- WebCodecs for canvas timeline previews and final exports where supported.
- MediaRecorder for instant proxies, quick-share videos, and tab capture demos.
- Storage: OPFS for intermediates and resumable exports; memory for small jobs.
We will publish a follow-up with real benchmark tables from multiple devices and browsers once we lock down the current encoder parameters.
Appendix A — Safe Defaults
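- H.264 (web): -preset veryfast -crf 23 -maxrate 4M -bufsize 8M -pix_fmt yuv420p
- Audio: -c:a aac -b:a 160k (music), -b:a 96k (speech-heavy)
- Seek then transcode (faster): put -ss before -i input ... for fast input seeking. With -c copy the cut snaps to a keyframe; when re-encoding, recent FFmpeg versions keep it frame-accurate. Use -i input -ss ... (after the input) to decode and discard up to the exact frame, which is slower.
- Subtitles: Keep text as .srt or .vtt sidecar files where possible; burn in only for final deliverables.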
Appendix B — Feature Map (Partial)
Capability | FFmpeg WASM | WebCodecs | MediaRecorder |
---|---|---|---|
Exact filter graphs | Yes | No | No |
Hardware acceleration | No (CPU) | Yes | Yes |
Precise CFR or VFR control | Yes | Yes | Limited |
Real-time capture | No | Yes (decode or encode) | Yes |
MP4 mux built-in | Yes | No (needs muxer) | No (WebM typical) |
Easiest setup | Moderate | Moderate | Yes |
Want to Try It?
Open Video Compressor, Audio Extractor, or Subtitle Studio in DojoClip to see this stack in action. If you are a developer, clone our benchmark harness (coming in the next post), run the tests, and share your numbers. We will aggregate them in a public table so creators can make informed choices.