Speech Recognition Services
ClickMasters builds speech recognition systems for B2B companies across the USA, Europe, Canada, and Australia. Meeting transcription with speaker diarisation: who said what, and when. Call centre analytics: transcribe, analyse sentiment, and extract action items from thousands of calls daily. Voice command interfaces for mobile and web applications. Real-time and batch transcription in 100+ languages. Built on OpenAI Whisper and Deepgram.

Whisper vs Deepgram for Transcription
OpenAI Whisper and Deepgram are both production-grade ASR systems, but they are optimised for different use cases. Whisper is an open-source model that can be self-hosted (data stays on your infrastructure) or called via the OpenAI API. It has near-human accuracy on English (4.4% WER on standard benchmarks), supports 100+ languages, and is the best choice for batch transcription where latency is not a constraint. Deepgram is a managed API service optimised for real-time streaming transcription. It delivers partial transcripts with <300ms latency, making it the right choice for live captioning, real-time agent assist, and voice interfaces where users see the transcription as they speak. For batch transcription of meeting recordings or call logs, use Whisper; for real-time streaming, use Deepgram. ClickMasters uses both, depending on the latency requirement.
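The selection rule above can be written down as a small routing function. This is a minimal sketch, not ClickMasters' implementation: the function name, the parameter names, and the returned labels are all hypothetical, and the live-plus-private case is deliberately left out because it needs a self-hosted streaming stack.

```python
def choose_engine(live: bool, keep_audio_in_house: bool = False) -> str:
    """Pick an ASR engine label from the latency and privacy requirements.

    The labels returned here are illustrative strings, not product identifiers.
    """
    if live:
        if keep_audio_in_house:
            # Live transcription that must stay private needs a self-hosted
            # streaming stack, which this sketch does not cover.
            raise ValueError("live + in-house audio is outside this sketch")
        # Users watch the text appear as they speak: streaming API.
        return "deepgram"
    # Batch work: latency is not a constraint, so Whisper is the default.
    return "whisper-self-hosted" if keep_audio_in_house else "whisper-api"
```

For example, a nightly job over call recordings that may not leave the company network would be routed with `choose_engine(live=False, keep_audio_in_house=True)`.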
Speaker Diarisation
Speaker diarisation is the process of determining "who spoke when" in a multi-speaker audio recording, segmenting the transcript by speaker identity. Without diarisation, a meeting transcript is a single stream of text with no attribution: "The deadline is Friday. What about the API integration? We need to finish that first." With diarisation: "Speaker 1 (CEO): The deadline is Friday. Speaker 2 (CTO): What about the API integration? Speaker 1 (CEO): We need to finish that first." Diarisation is implemented with pyannote-audio (a speaker segmentation model) applied before transcription: the audio is segmented by speaker, each segment is transcribed, and the transcript is reconstructed with speaker labels. For meeting intelligence, call analytics, and interview transcription, diarisation is essential; without it, the transcript has limited business value.
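The segment, transcribe, and reconstruct flow can be sketched in plain Python. This is an illustration under assumptions: the speaker turns would come from a diarisation model and `transcribe` would call an ASR model, but here both are stand-ins. `Turn`, `build_transcript`, the timestamps, and the lookup table are all hypothetical.

```python
from typing import Callable, List, NamedTuple


class Turn(NamedTuple):
    start: float   # seconds from the start of the recording
    end: float
    speaker: str


def build_transcript(turns: List[Turn],
                     transcribe: Callable[[float, float], str]) -> List[str]:
    """Transcribe each speaker turn and rebuild a speaker-labelled transcript."""
    lines: List[str] = []
    previous_speaker = None
    for turn in sorted(turns, key=lambda t: t.start):
        text = transcribe(turn.start, turn.end).strip()
        if not text:
            continue  # nothing recognised in this turn
        if turn.speaker == previous_speaker:
            lines[-1] += " " + text  # same speaker continues: extend the line
        else:
            lines.append(f"{turn.speaker}: {text}")
            previous_speaker = turn.speaker
    return lines


# Stand-in data: made-up timestamps and a lookup table instead of a real model.
turns = [Turn(0.0, 2.1, "Speaker 1"), Turn(2.1, 4.8, "Speaker 2"),
         Turn(4.8, 7.0, "Speaker 1")]
fake_asr = {
    (0.0, 2.1): "The deadline is Friday.",
    (2.1, 4.8): "What about the API integration?",
    (4.8, 7.0): "We need to finish that first.",
}
transcript = build_transcript(turns, lambda start, end: fake_asr[(start, end)])
```

Consecutive turns from the same speaker are merged into one line, which keeps the output readable when the diarisation model splits a long utterance.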
On-Premises Speech Recognition for Sensitive Data
OpenAI Whisper is fully open-source and can be deployed on your own infrastructure, either on on-premises GPU servers or within your private AWS/GCP/Azure VPC. Audio never leaves your environment. Deployment options: Whisper served via a FastAPI endpoint on an AWS EC2 G5 instance (GPU-accelerated; processes a 60-minute meeting in ~2 minutes), or faster-whisper (a CTranslate2-optimised Whisper implementation, up to 4x faster than the original with the same accuracy) for high-throughput batch transcription. For real-time streaming in a private environment, NVIDIA Riva (enterprise-grade on-premises ASR) or a self-hosted Whisper with streaming chunking can replace Deepgram. ClickMasters deploys self-hosted ASR for healthcare, legal, and financial services clients where audio content cannot be sent to external APIs.
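The "60-minute meeting in ~2 minutes" figure can be turned into a rough capacity estimate. The sketch below is back-of-envelope arithmetic only: it assumes throughput scales linearly with audio length and ignores model loading, queueing, and diarisation time. The function names and the 300-hour workload are hypothetical.

```python
def speedup_over_real_time(audio_minutes: float, processing_minutes: float) -> float:
    """How many minutes of audio are processed per minute of compute."""
    return audio_minutes / processing_minutes


def gpu_hours_needed(audio_hours: float, speedup: float) -> float:
    """Compute time for a backlog, assuming the speed-up holds at scale."""
    return audio_hours / speedup


# Figure quoted above: a 60-minute meeting in roughly 2 minutes.
speedup = speedup_over_real_time(60, 2)      # 30x real time
# Hypothetical workload: 300 hours of recordings per month.
monthly_gpu_hours = gpu_hours_needed(300, speedup)
```

Under these assumptions a single GPU instance would need about 10 hours of compute for 300 hours of audio, which is a starting point for sizing rather than a benchmark.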
Speech Recognition Services We Deliver
ClickMasters operates as a full-stack speech recognition partner. Our team handles every layer of the software delivery lifecycle: product strategy, UI/UX design, backend engineering, cloud infrastructure, QA, and ongoing support.
Meeting Transcription
Batch transcription of recorded meetings (Zoom, Teams, Google Meet exports). Whisper large-v3 for high accuracy. Speaker diarisation via pyannote-audio (identifies each speaker's segments). Structured output: timestamped transcript with speaker labels. Post-processing: punctuation restoration, custom vocabulary.
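A timestamped, speaker-labelled transcript might be serialised along the following lines. This is a sketch of one possible shape, not a documented schema: the field names (`start`, `end`, `speaker`, `text`) and both function names are assumptions made for illustration.

```python
import json
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS, dropping the fractional part."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_structured_output(segments: List[Dict]) -> str:
    """Serialise diarised segments to JSON with human-readable timestamps."""
    payload = {
        "segments": [
            {
                "start": format_timestamp(seg["start"]),
                "end": format_timestamp(seg["end"]),
                "speaker": seg["speaker"],
                "text": seg["text"],
            }
            for seg in segments
        ]
    }
    return json.dumps(payload, indent=2)
```

Keeping the raw second offsets alongside the formatted strings is a reasonable variation when downstream tools need to seek into the audio.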
Real-Time Streaming ASR
Live transcription for video conferencing plugins, contact centre dashboards, or real-time captioning. Deepgram Nova-2 (primary for streaming, <300ms latency) or AWS Transcribe Streaming. WebSocket-based streaming with partial/final transcripts.
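Partial and final transcripts are usually combined on the client: a partial result replaces the previous partial, while a final result is committed and never changes again. The sketch below assumes each incoming WebSocket message has already been reduced to a `(text, is_final)` pair; that shape and the `TranscriptAssembler` class are hypothetical, and real provider message formats differ.

```python
from typing import List


class TranscriptAssembler:
    """Keeps committed text plus the latest in-flight partial result."""

    def __init__(self) -> None:
        self._final_parts: List[str] = []
        self._partial = ""

    def handle(self, text: str, is_final: bool) -> str:
        """Apply one streaming result and return the text to display."""
        if is_final:
            if text:
                self._final_parts.append(text)  # commit: this text is now stable
            self._partial = ""                  # the partial has been superseded
        else:
            self._partial = text                # overwrite the previous partial
        return self.display_text

    @property
    def display_text(self) -> str:
        parts = self._final_parts + ([self._partial] if self._partial else [])
        return " ".join(parts)
```

A caption widget would call `handle` for every message and re-render with the returned string, so the in-flight words can change while committed sentences stay put.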
Call Centre Analytics
Transcribe inbound/outbound call recordings at scale. Post-transcription analysis: sentiment per utterance, topic extraction (LLM-based), action item extraction, compliance phrase detection, silence analysis. Dashboard with agent performance metrics and escalation scoring.
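Two of the post-transcription checks, compliance phrase detection and silence analysis, need nothing more than the timestamped utterances. The following is a simplified sketch: it uses exact substring matching and a fixed gap threshold, and the `Utterance` type, function names, and the 3-second default are assumptions for illustration.

```python
from typing import List, NamedTuple, Tuple


class Utterance(NamedTuple):
    start: float   # seconds from the start of the call
    end: float
    speaker: str
    text: str


def missing_compliance_phrases(utterances: List[Utterance],
                               required_phrases: List[str]) -> List[str]:
    """Return the required phrases that never appear in the call."""
    spoken = " ".join(u.text.lower() for u in utterances)
    return [p for p in required_phrases if p.lower() not in spoken]


def silence_gaps(utterances: List[Utterance],
                 min_gap_seconds: float = 3.0) -> List[Tuple[float, float]]:
    """Return (start, end) of every pause at least min_gap_seconds long."""
    ordered = sorted(utterances, key=lambda u: u.start)
    gaps: List[Tuple[float, float]] = []
    latest_end = ordered[0].end if ordered else 0.0
    for utterance in ordered[1:]:
        if utterance.start - latest_end >= min_gap_seconds:
            gaps.append((latest_end, utterance.start))
        # Track the furthest end seen so overlapping speech is handled.
        latest_end = max(latest_end, utterance.end)
    return gaps
```

Production systems typically need fuzzier phrase matching, since transcripts rarely reproduce a scripted disclosure word for word.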
Voice Command Interface
Embedded voice input for mobile (iOS + Android) and web applications. Architecture: device microphone capture → streaming ASR → intent classification → application action. Wake word detection (Porcupine: lightweight, on-device). Push-to-talk and always-listening modes.
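The last two stages of that architecture, intent classification and application action, can be illustrated with a keyword-based classifier. This is a deliberately simple stand-in for whatever classifier a real product would use: the intents, trigger phrases, and function names are all hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical intents and trigger phrases for an imaginary task app.
INTENT_KEYWORDS = {
    "create_task": ("create task", "add task", "new task"),
    "open_dashboard": ("open dashboard", "show dashboard"),
}


def classify_intent(transcript: str) -> Optional[str]:
    """Map an ASR transcript to an intent name, or None if nothing matches."""
    text = transcript.lower()
    for intent, phrases in INTENT_KEYWORDS.items():
        if any(phrase in text for phrase in phrases):
            return intent
    return None


def dispatch(transcript: str, actions: Dict[str, Callable[[str], str]]) -> str:
    """Run the application action registered for the recognised intent."""
    intent = classify_intent(transcript)
    if intent is None or intent not in actions:
        return "Sorry, I did not understand that."
    return actions[intent](transcript)
```

Swapping `classify_intent` for a trained model or an LLM call leaves `dispatch` unchanged, which is the main reason to keep the two stages separate.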
Audio Processing Pipeline
Pre-processing for optimal ASR accuracy: noise reduction (RNNoise), voice activity detection (Silero VAD, to skip silent segments), audio normalisation, format conversion (→ 16kHz mono WAV), and diarisation (pyannote-audio speaker segmentation).
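The format conversion and normalisation steps are commonly delegated to ffmpeg. The sketch below only builds the command line; it assumes ffmpeg is installed and uses its `loudnorm` filter for normalisation, which is one option among several. The function name and file paths are hypothetical, and noise reduction, VAD, and diarisation are not covered here.

```python
from typing import List


def build_ffmpeg_command(source: str, target: str,
                         normalise: bool = True) -> List[str]:
    """Build an ffmpeg call that converts any input to 16 kHz mono 16-bit WAV."""
    command = ["ffmpeg", "-y", "-i", source]   # -y: overwrite target if present
    if normalise:
        command += ["-af", "loudnorm"]         # loudness normalisation filter
    command += [
        "-ac", "1",                            # downmix to a single channel
        "-ar", "16000",                        # resample to 16 kHz
        "-c:a", "pcm_s16le",                   # 16-bit little-endian PCM
        target,
    ]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`; building it as a list rather than a shell string avoids quoting problems with unusual file names.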
Why Companies Choose ClickMasters
ClickMasters: Whisper for batch (4.4% WER, self-hostable), Deepgram for real-time (<300ms)
Basic: One-size-fits-all ASR choice
ClickMasters: pyannote-audio diarisation ("who spoke when") with speaker labels
Basic: Single-stream transcript only (no speaker attribution)
ClickMasters: Self-hosted Whisper (faster-whisper, up to 4x faster) for data privacy
Basic: Cloud API only (audio leaves environment)
ClickMasters: Sentiment, topics, action items, and compliance phrase detection
Basic: Transcription only (no analysis layer)
ClickMasters: Porcupine on-device wake word detection, no cloud round-trip
Basic: Push-to-talk only (mic always requires user button)
Our Process
Our Speech Recognition Process
A proven methodology that transforms your vision into reality
ASR Scoping
Use case analysis (batch vs real-time, latency requirements, languages, privacy constraints), model selection (Whisper vs Deepgram), diarisation plan, API design. Deliverable: ASR Architecture Plan.
Batch Transcription Pipeline
Whisper large-v3 or faster-whisper (4x faster) deployment. Audio pre-processing (RNNoise noise reduction, Silero VAD). Diarisation (pyannote-audio). S3 ingestion, JSON output, webhook delivery. Deliverable: Batch Transcription Pipeline.
Real-Time Streaming
Deepgram WebSocket or self-hosted streaming. Browser microphone capture (Web Audio API), partial transcript streaming, final transcript assembly. Integration with application UI. Deliverable: Real-Time ASR Integration.
Post-Processing Analytics
Call centre: sentiment analysis per utterance, topic extraction (LLM), action item extraction, compliance phrase detection, dashboard. Deliverable: Analytics Pipeline + Dashboard.
Technology Stack
Modern tools we use to build scalable, secure applications.
Languages & Frameworks
Data Processing
Infrastructure
Industry-Specific Expertise
Deep expertise across various sectors with tailored solutions
Meeting Transcription
Call Centre Analytics
Voice Command Interface
Medical Dictation
Pricing
Speech Recognition Development Pricing
Transparent pricing tailored to your business needs
ASR Scoping
Perfect for businesses that need ASR scoping solutions
one-time project range
Package Includes
- Timeline: 1 week
- Best For: Use case analysis, model selection, diarisation plan, API design
- Budget Range: 2,000 - 5,000 AUD
- Dedicated Project Manager
- Quality Assurance Testing
- Documentation & Training
Batch Transcription Pipeline
Perfect for businesses that need batch transcription pipeline solutions
one-time project range
Package Includes
- Timeline: 3 - 5 weeks
- Best For: Whisper + diarisation, S3 ingestion, JSON output, webhook delivery
- Budget Range: 8,000 - 22,000 AUD
- Dedicated Project Manager
- Quality Assurance Testing
- Documentation & Training
Real-Time Streaming ASR
Perfect for businesses that need real-time streaming ASR solutions
one-time project range
Package Includes
- Timeline: 4 - 7 weeks
- Best For: Deepgram WebSocket, partial transcripts, React/mobile UI
- Budget Range: 10,000 - 28,000 AUD
- Dedicated Project Manager
- Quality Assurance Testing
- Documentation & Training
Call Centre Analytics
Perfect for businesses that need call centre analytics solutions
one-time project range
Package Includes
- Timeline: 5 - 9 weeks
- Best For: Transcription + sentiment + topics + action items + compliance + dashboard
- Budget Range: 15,000 - 45,000 AUD
- Dedicated Project Manager
- Quality Assurance Testing
- Documentation & Training
Voice Command Interface
Perfect for businesses that need voice command interface solutions
one-time project range
Package Includes
- Timeline: 4 - 7 weeks
- Best For: Wake word + streaming ASR + intent classification + app integration
- Budget Range: 10,000 - 28,000 AUD
- Dedicated Project Manager
- Quality Assurance Testing
- Documentation & Training
Custom Vocabulary Fine-Tuning
Perfect for businesses that need custom vocabulary fine-tuning solutions
one-time project range
Package Includes
- Timeline: 2 - 4 weeks
- Best For: Domain vocabulary injection or Whisper fine-tune on specialised audio
- Budget Range: 6,000 - 15,000 AUD
- Dedicated Project Manager
- Quality Assurance Testing
- Documentation & Training
* All prices are estimates and may vary based on requirements.
CEO Vision
To build scalable, intelligent custom software development solutions that empower businesses to grow, automate, and transform in a digital-first world.

We are not building software. We are architecting the infrastructure of tomorrow: systems that think, adapt, and grow alongside the businesses they power. Our mission is to make cutting-edge technology accessible to every ambitious team on the planet.
Amjad Khan
CEO
12+
Years
300+
Projects
98%
Retention
