AI Transcription &
Meeting Intelligence
Local speech-to-text with speaker diarization. Transcribe meetings, interviews, and calls with automatic speaker labelling — all processed on your hardware, nothing leaves your machine.
How It Works
Drop in a recording — meeting, interview, call, video — and the AI extracts the audio, transcribes it with OpenAI Whisper running locally, identifies who said what, and delivers a formatted, speaker-labelled transcript with an optional AI-generated summary.
Audio Extraction
Feed in any common audio or video format. The system extracts clean audio from MP4, WebM, and M4A files, or processes raw audio files directly.
Transcribe & Identify
OpenAI Whisper runs locally on GPU/Apple Silicon to transcribe speech. Speaker diarization identifies and labels each speaker.
Summarise & Deliver
Get a timestamped, speaker-labelled transcript plus an optional AI summary with key decisions, action items, and follow-ups.
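The "who said what" step above comes down to merging Whisper's timestamped segments with the diarizer's speaker turns. Here is a minimal sketch of one way to do that merge — the `Segment` type, the maximal-overlap heuristic, and the output format are illustrative assumptions, not the product's actual code:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds
    end: float
    text: str

def assign_speaker(seg, turns):
    """Label a transcript segment with the diarization turn that
    overlaps it the most. `turns` is a list of
    (start, end, speaker_label) tuples from the diarizer."""
    best, best_overlap = "UNKNOWN", 0.0
    for t_start, t_end, speaker in turns:
        overlap = min(seg.end, t_end) - max(seg.start, t_start)
        if overlap > best_overlap:
            best, best_overlap = speaker, overlap
    return best

def label_transcript(segments, turns):
    """Render timestamped, speaker-labelled transcript lines."""
    return [
        f"[{s.start:06.1f}] {assign_speaker(s, turns)}: {s.text}"
        for s in segments
    ]
```

For example, a segment spanning 0.5–3.5 s that falls inside a SPEAKER_00 turn from 0–4 s comes out as `[0000.5] SPEAKER_00: …`. Real segments and turns rarely align exactly, which is why an overlap heuristic is used rather than exact boundary matching.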
Technical Details
Model & Processing Details
Speech Recognition: OpenAI Whisper large-v3 model running locally via MLX (Apple Silicon) or CUDA (NVIDIA GPU).
Speaker Diarization: PyAnnote audio pipeline for speaker segmentation and clustering. Automatically determines the number of speakers.
Audio Processing: FFmpeg for format conversion, noise reduction, and audio track extraction from video files.
Summary Generation: Local LLM (Llama 3 / Qwen) processes transcripts for meeting summaries and action item extraction.
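The FFmpeg extraction step can be sketched as a small wrapper that converts any supported container into the 16 kHz mono WAV that Whisper expects. The exact flags the product uses aren't documented, so treat this as a plausible reconstruction:

```python
import subprocess

def build_extract_cmd(src: str, dst: str) -> list[str]:
    """FFmpeg invocation that pulls a 16 kHz mono WAV out of any
    container FFmpeg can read -- the sample format Whisper expects."""
    return [
        "ffmpeg", "-y",   # overwrite the output file without asking
        "-i", src,        # input: MP4, WebM, M4A, ...
        "-vn",            # drop any video stream
        "-ac", "1",       # downmix to mono
        "-ar", "16000",   # resample to 16 kHz
        dst,
    ]

def extract_audio(src: str, dst: str) -> None:
    """Run the extraction locally; raises if FFmpeg fails."""
    subprocess.run(build_extract_cmd(src, dst), check=True)
```

Running locally via `subprocess` keeps the whole chain on-device, consistent with the privacy guarantee above.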
Hardware Requirements
Apple Silicon: M1 Pro or higher recommended. An M1 Max or M2 Max provides the best performance, with unified memory large enough for the full models.
NVIDIA GPU: RTX 3060 (12GB VRAM) minimum. RTX 3090/4090 for batch processing and larger models.
RAM: 16GB minimum, 32GB+ recommended for concurrent transcription and summarisation.
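The hardware tiers above map onto a simple device-selection rule at startup: prefer an NVIDIA GPU, then Apple Silicon's Metal backend, then CPU as a slow fallback. This sketch models that rule as a pure function; in a real PyTorch-based setup the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`:

```python
def pick_device(has_cuda: bool, has_mps: bool) -> str:
    """Choose the fastest available backend for local inference.
    CUDA (NVIDIA) first, then MPS (Apple Silicon), then CPU."""
    if has_cuda:
        return "cuda"
    if has_mps:
        return "mps"
    return "cpu"
```

Keeping the decision in one place makes it easy to override (e.g. forcing CPU for debugging) without touching the transcription code.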
Who This Is For
Legal Professionals
Client consultations, depositions, mediation recordings. Legally privileged content stays on your machine.
Healthcare
Patient consultations, clinical notes, specialist referral recordings. Privacy-compliant local processing.
Corporate Teams
Board meetings, strategy sessions, client calls. Searchable records with action item tracking.
Researchers
Interview transcription, focus groups, field recordings. Speaker-labelled output for qualitative analysis.
Frequently Asked Questions
How fast is the AI transcription?
Can it identify different speakers in a meeting?
What audio and video formats are supported?
Does the audio data leave my computer?
Can it generate meeting summaries and action items?
Never Lose a Meeting Detail Again
AI transcription that runs on your hardware. Every word captured, every speaker identified, every action item tracked.