Capsule Transcriber: ML Transcription

Explore a full-stack transcription engine for video, comparing ML models for Voice Activity Detection, speech recognition with Whisper, Speaker Diarization, and more. Learn about the challenges of speech recognition.

Overview

I’ll share the details of how Capsule Transcriber works and demonstrate the full-stack analytics app that I’ve built to compare and validate various transcription models and APIs.
Capsule Transcriber combines five ML models. Along the way, you will learn about Voice Activity Detection, Whisper, Speaker Diarization, Timestamps, Language Detection, and more.
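The overview names these components but doesn't say here how their outputs fit together. As a hypothetical illustration only, one common way to combine a transcriber's timestamped segments with diarization output is to label each segment with the speaker whose turns overlap it the most. The `Segment` and `SpeakerTurn` types, the `assign_speakers` function, and the sample data are invented for this sketch. They are not Capsule Transcriber's actual code or data.

```python
from dataclasses import dataclass

# Hypothetical types for this sketch; times are in seconds.
@dataclass
class Segment:
    start: float
    end: float
    text: str

@dataclass
class SpeakerTurn:
    start: float
    end: float
    speaker: str

def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if they are disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(segments, turns, default="UNKNOWN"):
    """Hypothetical helper: tag each transcript segment with the diarization
    speaker whose turns overlap it for the longest total time."""
    labeled = []
    for seg in segments:
        totals: dict[str, float] = {}
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + ov
        # Fall back to a default label when no speaker turn covers the segment.
        speaker = max(totals, key=totals.get) if totals else default
        labeled.append((speaker, seg))
    return labeled

# Made-up example data, not output from any real model.
segments = [
    Segment(0.0, 2.5, "Hello there."),
    Segment(2.5, 5.0, "Hi, how are you?"),
    Segment(9.0, 10.0, "Bye."),
]
turns = [SpeakerTurn(0.0, 2.7, "SPEAKER_00"), SpeakerTurn(2.7, 5.2, "SPEAKER_01")]
labeled = assign_speakers(segments, turns)
for speaker, seg in labeled:
    print(f"[{seg.start:.1f}-{seg.end:.1f}] {speaker}: {seg.text}")
```

Maximum-overlap assignment is simple and tolerates small boundary mismatches between the two models, such as the 0.2 s disagreement around 2.5 s above. A real pipeline might instead assign speakers per word, using word-level timestamps.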


Tech stack