DojoClip

FFmpeg in the Browser: Launching Render Lab with DojoClip (Technical Deep Dive)

By: Pansa Legrand | Date: 2025-02-18 | Category: Engineering
[Figure: Diagram of the FFmpeg WebAssembly pipeline inside DojoClip]

Welcome to Render Lab, our series on the tech that keeps DojoClip fast, private, and creator-friendly. This post goes beyond the high-level pitch and into the how: FFmpeg compiled to WebAssembly (WASM), where it shines, where it struggles, and how we pair it with MediaRecorder and WebCodecs for a hybrid, low-latency workflow.


TL;DR

  • FFmpeg WASM offers precision and broad feature coverage (filters, trims, remux) while keeping media local and private.
  • WebCodecs delivers the lowest-latency, hardware-assisted encode and decode, but needs a muxer.
  • MediaRecorder provides the simplest real-time capture, perfect for previews but limited in control.
  • We use a hybrid pipeline: WASM for exact transforms, WebCodecs and MediaRecorder for speedy previews and exports.

Concepts (2-Minute Primer)

  • Container vs. codec: MP4, MKV, and WebM are containers; H.264, H.265, VP9, AV1, AAC, and Opus are codecs. Editing usually means decode, then filter, then encode, then mux.
  • Transcode vs. remux: Transcoding re-encodes (quality and size tradeoff). Remuxing changes the container without touching compressed streams (fast and lossless).
  • CFR vs. VFR: Constant vs. variable frame rate. Web capture is often VFR; editing tools usually prefer CFR for frame-accurate seeking.
  • Keyframes (IDR): Cuts and edits snap to keyframes unless you transcode or use smart-render strategies.
  • CRF and bitrate: Quality knobs. Lower CRF increases quality and size. Bitrate caps throughput for streaming targets.
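
To make the keyframe point concrete, here is a minimal sketch of where a stream-copy cut would land and how far a smart-render strategy would need to re-encode. The function name and the sorted keyframe list are hypothetical; in practice the list would come from probing the file.

```javascript
// Hypothetical helper: keyframeTimes is a sorted array of keyframe
// timestamps in seconds; cutTime is the requested cut point in seconds.
function cutWindow(keyframeTimes, cutTime) {
  let lo = 0;
  let hi = keyframeTimes.length - 1;
  let best = -1;
  // Binary search for the last keyframe at or before cutTime.
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (keyframeTimes[mid] <= cutTime) {
      best = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  if (best === -1) return null; // the cut is before the first keyframe
  const next = best + 1 < keyframeTimes.length ? keyframeTimes[best + 1] : null;
  return {
    previousKeyframe: keyframeTimes[best], // where a -c copy cut snaps to
    nextKeyframe: next,                    // where stream copy can resume
    onKeyframe: keyframeTimes[best] === cutTime,
  };
}

console.log(cutWindow([0, 2, 4, 6], 3.3));
```

With keyframes every two seconds, a cut at 3.3 s snaps back to 2 s under stream copy. A smart-render approach would re-encode only from 3.3 s to the next keyframe at 4 s and copy the rest.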

Architecture at a Glance

[File Input]
   └─▶ Browser FS (OPFS / RAM)
         ├─▶ FFmpeg WASM worker (precision filters, remux, waveform)
         ├─▶ WebCodecs (fast decode/encode; preview/export)
         └─▶ MediaRecorder (real-time canvas/tab capture)
                     ▼
               [Muxer] → MP4/WebM → Download / OPFS / Upload

Why WASM?

It brings most of FFmpeg's CLI to the browser. This unlocks frame-accurate trimming, complex filter graphs, and audio channel operations while keeping processing local for privacy-sensitive media.

Why not only WASM?

Long encodes stress memory and CPU, the first load is several megabytes, and multithreading requires cross-origin isolation headers.

Why WebCodecs and MediaRecorder too?

WebCodecs taps platform encoders and decoders for speed. MediaRecorder is trivial for live capture and quick proxies.


Deploying FFmpeg WASM Correctly

1. Enable Threads and SIMD (Real-World Gains)

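Threads plus SIMD usually yield a 1.5x to 3x speedup, depending on workload. Threads rely on SharedArrayBuffer, which browsers expose only to cross-origin isolated pages, so the headers below are required; SIMD on its own does not need them.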

// next.config.js (or middleware/headers)
const securityHeaders = [
  { key: 'Cross-Origin-Opener-Policy', value: 'same-origin' },
  { key: 'Cross-Origin-Embedder-Policy', value: 'require-corp' },
  { key: 'Cross-Origin-Resource-Policy', value: 'same-site' },
];

module.exports = {
  async headers() {
    return [{ source: '/:path*', headers: securityHeaders }];
  },
};
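
Whether the headers took effect can be checked at runtime before choosing a core. The sketch below assumes two separately hosted builds; both paths are hypothetical placeholders, and the scope object is passed in so the logic can be exercised outside a browser.

```javascript
// Hypothetical core locations; adjust to wherever the builds are hosted.
const CORES = {
  multiThread: '/wasm/ffmpeg-core-mt.js',
  singleThread: '/wasm/ffmpeg-core-st.js',
};

// scope is normally globalThis (window or a worker's self).
function pickCorePath(scope) {
  // crossOriginIsolated is true only when COOP and COEP are both in effect.
  const isolated = scope.crossOriginIsolated === true;
  const hasSharedMemory = typeof scope.SharedArrayBuffer === 'function';
  return isolated && hasSharedMemory ? CORES.multiThread : CORES.singleThread;
}

// Browser usage: const corePath = pickCorePath(globalThis);
console.log(pickCorePath({ crossOriginIsolated: false }));
```

Falling back to the single-threaded build keeps the page working when a CDN or an embedding site strips the headers.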

2. Load the Core in a Worker

Avoid blocking the UI thread and keep FFmpeg memory isolated.

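// ffmpeg.worker.ts
// Uses the createFFmpeg API from @ffmpeg/ffmpeg 0.11.x.
import { createFFmpeg, fetchFile } from '@ffmpeg/ffmpeg';
const ffmpeg = createFFmpeg({ log: true, corePath: '/wasm/ffmpeg-core.js' });

self.onmessage = async (event) => {
  // `output` must match the last entry in `args`.
  const { name, file, args, output } = event.data;
  try {
    if (!ffmpeg.isLoaded()) {
      await ffmpeg.load();
    }
    ffmpeg.FS('writeFile', name, await fetchFile(file));
    await ffmpeg.run(...args);
    const out = ffmpeg.FS('readFile', output);
    // Free the in-memory files so repeated jobs do not grow the WASM heap.
    ffmpeg.FS('unlink', name);
    ffmpeg.FS('unlink', output);
    // Transfer the buffer instead of copying it.
    self.postMessage({ ok: true, data: out.buffer }, [out.buffer]);
  } catch (error) {
    self.postMessage({ ok: false, error: String(error) });
  }
};

// UI side
const worker = new Worker(new URL('./ffmpeg.worker.ts', import.meta.url));
worker.postMessage({
  name: 'in.mp4',
  file,
  // The output extension selects the muxer, so use .m4a for an AAC track.
  args: ['-i', 'in.mp4', '-vn', '-acodec', 'copy', 'out.m4a'],
  output: 'out.m4a',
});

worker.onmessage = ({ data }) => {
  if (!data.ok) {
    console.error(data.error);
    return;
  }
  const blob = new Blob([data.data], { type: 'audio/mp4' });
  download(blob, 'audio.m4a');
};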

3. Store Large Intermediates in OPFS

The Origin Private File System avoids huge RAM spikes.

const root = await navigator.storage.getDirectory();
const fileHandle = await root.getFileHandle('clip.m4a', { create: true });
const writer = await fileHandle.createWritable();
await writer.write(blob);
await writer.close();

Everyday Tasks (Copy-Paste FFmpeg Recipes)

Audio extraction (lossless stream copy when the source audio is already AAC):

ffmpeg -i input.mp4 -vn -acodec copy audio.m4a

Downscale plus CRF transcode (H.264 proxy):

ffmpeg -i input_4k.mp4 -vf scale=-2:1080 -c:v libx264 -preset veryfast -crf 23 -c:a aac -b:a 160k out_1080p.mp4

Frame-accurate segment (re-encode small window):

ffmpeg -ss 00:00:12.300 -to 00:00:19.000 -i input.mp4 -c:v libx264 -crf 20 -pix_fmt yuv420p -c:a aac slice.mp4

Waveform PNG for a timeline:

ffmpeg -i audio.m4a -lavfi showwavespic=s=1200x200:colors=white waveform.png

In WASM you pass the same arguments to ffmpeg.run(...). For pure remux (no transcode) keep -c copy to preserve quality and speed.
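
Because there is no shell in the browser, each CLI token becomes its own array element and no quoting is needed. A minimal sketch of building the proxy recipe above as an argument array; the helper name and option names are illustrative, not part of any library.

```javascript
// Hypothetical helper that mirrors the "Downscale plus CRF transcode" recipe.
function proxyArgs({ input, output, height = 1080, crf = 23, audioBitrate = '160k' }) {
  return [
    '-i', input,
    // -2 keeps the aspect ratio and rounds the width to an even number.
    '-vf', `scale=-2:${height}`,
    '-c:v', 'libx264', '-preset', 'veryfast', '-crf', String(crf),
    '-c:a', 'aac', '-b:a', audioBitrate,
    output,
  ];
}

const args = proxyArgs({ input: 'input_4k.mp4', output: 'out_1080p.mp4' });
console.log(args.join(' '));
// In the worker: await ffmpeg.run(...args);
```

Keeping recipes as functions makes it harder to drop a flag when the same preset is reused across tools.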


WebCodecs: The Fast Lane

WebCodecs gives you direct access to platform decoders and encoders. You need a muxer to wrap encoded chunks into MP4 or WebM.

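// Assumes `canvas` is the render target and `muxer` is an instance of a
// browser muxer library exposing addVideoChunk() and finalize().
const fps = 30;
const encoder = new VideoEncoder({
  output: handleChunk,
  error: console.error,
});

encoder.configure({
  // H.264 Baseline, level 3.0; larger frames such as 1080p need a higher level.
  codec: 'avc1.42E01E',
  width: canvas.width,
  height: canvas.height,
  bitrate: 3_000_000,
  framerate: fps,
});

const t0 = performance.now();
let frameIndex = 0;
const track = canvas.captureStream(fps).getVideoTracks()[0];
// MediaStreamTrackProcessor is not available in every browser; feature-detect it.
const reader = new MediaStreamTrackProcessor({ track }).readable.getReader();

async function pump() {
  // Loop instead of recursing so errors propagate to the caller.
  while (true) {
    const { value: frame, done } = await reader.read();
    if (done) {
      await encoder.flush();
      muxer.finalize();
      return;
    }
    // WebCodecs timestamps are in microseconds.
    const timestamp = Math.floor((performance.now() - t0) * 1000);
    const videoFrame = new VideoFrame(frame, { timestamp });
    // Force a keyframe every two seconds.
    encoder.encode(videoFrame, { keyFrame: frameIndex % (fps * 2) === 0 });
    frameIndex += 1;
    videoFrame.close();
    frame.close();
  }
}

function handleChunk(chunk, metadata) {
  // metadata.decoderConfig carries the codec description most muxers need.
  muxer.addVideoChunk(chunk, metadata);
}

pump().catch(console.error);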

Muxing note: WebCodecs outputs encoded chunks, not a playable file. Use a browser muxer library to wrap them into a downloadable MP4 or WebM. For previews, Media Source Extensions can play the muxed segments (fragmented MP4 or WebM) in a video element.

Compared to FFmpeg WASM, encode latency is often lower and CPU usage smaller, especially on devices with hardware acceleration.
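
One detail that is easy to miss is the H.264 level encoded in the codec string: 'avc1.42E01E' means Constrained Baseline at level 3.0, which does not cover 1080p. The sketch below picks the lowest level whose frame-size and macroblock-rate limits fit a given resolution and frame rate. The helper name is hypothetical and the table is a subset of the limits in the H.264 specification; the result should still be confirmed with VideoEncoder.isConfigSupported() on the target device.

```javascript
// [level_idc, max macroblocks per second, max macroblocks per frame]
const AVC_LEVELS = [
  [30, 40500, 1620],
  [31, 108000, 3600],
  [32, 216000, 5120],
  [40, 245760, 8192],
  [42, 522240, 8704],
  [50, 589824, 22080],
  [51, 983040, 36864],
  [52, 2073600, 36864],
];

// Returns a Constrained Baseline codec string, or null if no level fits.
function avcBaselineCodec(width, height, fps) {
  // H.264 codes frames in 16x16 macroblocks, so round each dimension up.
  const macroblocks = Math.ceil(width / 16) * Math.ceil(height / 16);
  const level = AVC_LEVELS.find(
    ([, maxPerSecond, maxPerFrame]) =>
      macroblocks <= maxPerFrame && macroblocks * fps <= maxPerSecond
  );
  if (!level) return null;
  // The last two hex digits of the codec string are level_idc.
  return 'avc1.42E0' + level[0].toString(16).toUpperCase().padStart(2, '0');
}

console.log(avcBaselineCodec(1920, 1080, 30)); // avc1.42E028
```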


MediaRecorder: Zero-Fuss Proxies

const stream = canvas.captureStream(30);
const recorder = new MediaRecorder(stream, { mimeType: 'video/webm;codecs=vp9' });
const chunks = [];

recorder.ondataavailable = (event) => {
  if (event.data.size) {
    chunks.push(event.data);
  }
};

recorder.onstop = () => {
  const blob = new Blob(chunks, { type: recorder.mimeType });
  download(blob, 'preview.webm');
};

recorder.start();
// ... render frames ...
recorder.stop();

Pros: Dead simple and great for quick reviews or social-media proxies. Cons: Limited control over GOP, CRF, or advanced filtering.
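
Not every browser records VP9 in WebM, so it can help to probe a short preference list before constructing the recorder. This is a sketch under assumptions: the candidate list and helper name are illustrative, and the support check is injected so the same function works with MediaRecorder.isTypeSupported in the browser.

```javascript
// Hypothetical preference order; tune it for the target audience.
const PREFERRED_TYPES = [
  'video/webm;codecs=vp9',
  'video/webm;codecs=vp8',
  'video/webm',
  'video/mp4',
];

// isSupported is a predicate such as (t) => MediaRecorder.isTypeSupported(t).
function pickMimeType(isSupported, candidates = PREFERRED_TYPES) {
  const match = candidates.find((type) => isSupported(type));
  // An empty string means "let the browser choose its default".
  return match || '';
}

// Browser usage:
// const mimeType = pickMimeType((t) => MediaRecorder.isTypeSupported(t));
// const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
console.log(pickMimeType((t) => t.startsWith('video/mp4')));
```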


Choosing the Right Pipeline

Task                               | Precision needed | Latency target | Recommended
Audio extraction, remux            | High             | Low            | FFmpeg WASM (-c copy)
Waveform or thumbnails             | High             | Low            | FFmpeg WASM (filters)
Frame-accurate cuts                | High             | Medium         | FFmpeg WASM (small re-encode window)
Live preview or screen capture     | Medium           | Very low       | MediaRecorder
Final exports from canvas timeline | Medium           | Low            | WebCodecs plus muxer
Long offline transcodes            | Medium           | High           | Native or server fallback
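
The table can be expressed as a small routing function. The sketch below is illustrative only: the task labels, pipeline labels, and capability flags are hypothetical names, and the fallback for canvas exports follows the Safari guidance of dropping to MediaRecorder or WASM when WebCodecs is missing.

```javascript
// caps is a plain object of booleans, for example:
// { webCodecs: 'VideoEncoder' in globalThis, mediaRecorder: 'MediaRecorder' in globalThis }
function choosePipeline(task, caps) {
  switch (task) {
    case 'remux':
    case 'waveform':
    case 'frame-accurate-cut':
      return 'ffmpeg-wasm';
    case 'live-preview':
      return 'media-recorder';
    case 'canvas-export':
      if (caps.webCodecs) return 'webcodecs+muxer';
      return caps.mediaRecorder ? 'media-recorder' : 'ffmpeg-wasm';
    case 'long-transcode':
      return 'native-or-server';
    default:
      throw new Error(`Unknown task: ${task}`);
  }
}

console.log(choosePipeline('canvas-export', { webCodecs: false, mediaRecorder: true }));
```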

Memory Math (Plan Before You Render)

  • Decoded video frame size is roughly width times height times 1.5 bytes (YUV420p). Example: 1920x1080 is about 3.1 MB per frame. A small 120-frame buffer is roughly 373 MB.
  • WASM heap growth can spike during filters such as resamplers or scalers. Keep intermediates on disk (OPFS) and stream when possible.
  • Audio: 48 kHz stereo 16-bit PCM is about 192 KB per second. Five minutes equals roughly 58 MB if kept uncompressed in memory.
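
The arithmetic above is easy to wrap in helpers for budgeting before a render. A minimal sketch; the function names and the 256 MB budget in the last line are illustrative.

```javascript
// YUV420p stores one full-resolution luma plane plus two quarter-size chroma planes.
function yuv420FrameBytes(width, height) {
  return width * height * 1.5;
}

// Uncompressed PCM: samples per second * channels * bytes per sample.
function pcmBytesPerSecond(sampleRate, channels, bitsPerSample) {
  return sampleRate * channels * (bitsPerSample / 8);
}

// How many decoded frames fit in a given memory budget.
function maxBufferedFrames(budgetBytes, width, height) {
  return Math.floor(budgetBytes / yuv420FrameBytes(width, height));
}

console.log(yuv420FrameBytes(1920, 1080));         // 3110400 bytes per frame
console.log(yuv420FrameBytes(1920, 1080) * 120);   // 373248000 bytes for 120 frames
console.log(pcmBytesPerSecond(48000, 2, 16));      // 192000 bytes per second
console.log(pcmBytesPerSecond(48000, 2, 16) * 300); // 57600000 bytes for five minutes
console.log(maxBufferedFrames(256e6, 1920, 1080)); // 82 frames
```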

Practical tips

  • Prefer remux (-c copy) over full transcode whenever possible.
  • Split long jobs into segments, persist to OPFS, then concatenate with FFmpeg.
  • For previews, use WebCodecs to avoid holding many raw frames in JavaScript memory.
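
The segment-then-concatenate tip can be sketched as follows. It assumes the WASM build includes FFmpeg's concat demuxer and that every segment is encoded with identical parameters; the file names and helper names are hypothetical.

```javascript
// Split a job into fixed-length windows; the last one may be shorter.
function planSegments(durationSec, segmentSec) {
  const segments = [];
  for (let start = 0, i = 0; start < durationSec; start += segmentSec, i += 1) {
    segments.push({
      name: `seg_${String(i).padStart(3, '0')}.mp4`,
      start,
      end: Math.min(start + segmentSec, durationSec),
    });
  }
  return segments;
}

// The concat demuxer reads a text file with one "file '<name>'" line per segment.
function concatList(segments) {
  return segments.map((s) => `file '${s.name}'`).join('\n') + '\n';
}

// Stream-copy the segments into one file without re-encoding.
const concatArgs = ['-f', 'concat', '-safe', '0', '-i', 'list.txt', '-c', 'copy', 'joined.mp4'];

const plan = planSegments(150, 60);
console.log(plan);
console.log(concatList(plan));
```

Each segment can be written to OPFS as it finishes, so an interrupted export only needs to redo the segment that was in flight.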

Reproducible Benchmarks (Fill In Before Publishing)

Run these in a clean, production build. Record device and browser versions.

  1. Audio extraction (remux)

    • Input: MP4 (H.264 plus AAC), 1 to 2 minutes.
    • Command: -i in.mp4 -vn -acodec copy out.m4a
    • Metrics: Wall time (milliseconds), peak JS heap (MB), WASM heap (MB), output size.
  2. Proxy transcode (4K to 1080p)

    • Command: -vf scale=-2:1080 -c:v libx264 -preset veryfast -crf 23 -c:a aac -b:a 160k
    • Compare: FFmpeg WASM vs. WebCodecs with similar bitrate and framerate. Record time, CPU percentage, dropped frames.
  3. Canvas timeline export

    • Method A: WebCodecs encode plus MP4 muxer.
    • Method B: MediaRecorder capture at 30 fps.
    • Metrics: Encode FPS, total time, output bitrate or size, visual quality (SSIM or PSNR if you can compare to a ground truth).

Template table (replace N/A with measured values):

Device / Browser    | Test          | Pipeline      | Time (s) | CPU avg (%) | Peak Mem (MB) | Notes
M2 Pro / Chrome 128 | Audio extract | FFmpeg WASM   | N/A      | N/A         | N/A           | Threads plus SIMD
M2 Pro / Chrome 128 | 4K to 1080p   | FFmpeg WASM   | N/A      | N/A         | N/A           | Threads plus SIMD
M2 Pro / Chrome 128 | 4K to 1080p   | WebCodecs     | N/A      | N/A         | N/A           | H.264 hardware
Pixel 8 / Chrome    | Canvas export | MediaRecorder | N/A      | N/A         | N/A           | VP9

Measurement helpers

const perf = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log(entry.name, entry.duration);
  }
});

perf.observe({ entryTypes: ['measure'] });
const memory = performance.memory; // Chrome only: usedJSHeapSize and totalJSHeapSize

Handling Common Pitfalls

  • Slow-motion exports: Supply monotonically increasing timestamps to WebCodecs, or encode CFR at a fixed fps. Avoid relying on requestAnimationFrame timing alone.
  • Audio and video drift: Keep audio as the timing source. Clamp video PTS to the audio-derived timeline or resample audio to your project rate.
  • Green frames or color shifts: Ensure pixel format (for example, RGBA to I420) and color space info are set consistently when muxing.
  • Safari gaps: WebCodecs support is partial; fall back to MediaRecorder or WASM H.264.
  • Threading disabled: If SharedArrayBuffer is not available (no COOP or COEP), load the single-threaded WASM core to avoid runtime errors.
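
For the slow-motion pitfall, one option is to derive timestamps from the frame index rather than the wall clock, which yields constant-frame-rate output by construction. A minimal sketch with hypothetical helper names; WebCodecs expects timestamps in microseconds.

```javascript
// Timestamp of frame N at a fixed frame rate, in microseconds.
function cfrTimestampUs(frameIndex, fps) {
  return Math.round((frameIndex * 1_000_000) / fps);
}

// True when the frame should be forced to a keyframe (every gopSeconds).
function isKeyFrame(frameIndex, fps, gopSeconds = 2) {
  return frameIndex % Math.round(fps * gopSeconds) === 0;
}

for (const i of [0, 1, 2, 30, 60]) {
  console.log(i, cfrTimestampUs(i, 30), isKeyFrame(i, 30));
}
```

Rounding each timestamp from the index, instead of accumulating a per-frame delta, keeps rounding error from building up over long exports.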

Our Hybrid in DojoClip (Today)

  • FFmpeg WASM for audio extraction, waveform and thumbnails, precise trims, remux, and subtitle prep.
  • WebCodecs for canvas timeline previews and final exports where supported.
  • MediaRecorder for instant proxies, quick-share videos, and tab capture demos.
  • Storage: OPFS for intermediates and resumable exports; memory for small jobs.

We will publish a follow-up with real benchmark tables from multiple devices and browsers once we lock down the current encoder parameters.


Appendix A — Safe Defaults

  • H.264 (web): -preset veryfast -crf 23 -maxrate 4M -bufsize 8M -pix_fmt yuv420p
  • Audio: -c:a aac -b:a 160k (music), -b:a 96k (speech-heavy)
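  • Seeking: -ss before -i seeks the input quickly; it is frame-accurate when re-encoding but snaps to a keyframe with -c copy. -ss after -i decodes and discards everything up to the seek point, which is accurate but slower.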
  • Subtitles: Keep text as .srt or .vtt sidecar files where possible; burn in only for final deliverables.
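
The two placements of -ss differ only in argument order, which is easy to get wrong when building arrays by hand. A sketch with a hypothetical helper; it uses -t (duration) rather than -to so the meaning is the same in both placements.

```javascript
// mode 'input' puts -ss before -i (fast seek); 'output' puts it after -i
// (decode from the start and discard until the seek point).
function sliceArgs({ input, output, startSec, durationSec, mode = 'input' }) {
  const seek = ['-ss', String(startSec), '-t', String(durationSec)];
  const encode = ['-c:v', 'libx264', '-crf', '20', '-pix_fmt', 'yuv420p', '-c:a', 'aac'];
  return mode === 'input'
    ? [...seek, '-i', input, ...encode, output]
    : ['-i', input, ...seek, ...encode, output];
}

console.log(sliceArgs({ input: 'input.mp4', output: 'slice.mp4', startSec: 12.3, durationSec: 6.7 }).join(' '));
```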

Appendix B — Feature Map (Partial)

Capability                 | FFmpeg WASM | WebCodecs              | MediaRecorder
Exact filter graphs        | Yes         | No                     | No
Hardware acceleration      | No (CPU)    | Yes                    | Yes
Precise CFR or VFR control | Yes         | Yes                    | Limited
Real-time capture          | No          | Yes (decode or encode) | Yes
MP4 mux built-in           | Yes         | No (needs muxer)       | No (WebM typical)
Easiest setup              | Moderate    | Moderate               | Yes

Want to Try It?

Open Video Compressor, Audio Extractor, or Subtitle Studio in DojoClip to see this stack in action. If you are a developer, clone our benchmark harness (coming in the next post), run the tests, and share your numbers. We will aggregate them in a public table so creators can make informed choices.