Streaming Audio Recognition with Flutter and Vosk
Nov 6, 2025



Summary
This tutorial explains how to add streaming, on-device speech recognition to Flutter mobile apps using Vosk. It covers model setup, PCM audio capture, chunked streaming to the recognizer, handling partial and final transcripts, and performance considerations for deployment on Android and iOS.
Key insights:
Setup Vosk Model And Plugin: Initialize a local Vosk model (extracted from assets on first run) and run the recognizer off the UI thread to avoid jank.
Recording And Streaming Audio: Capture 16 kHz mono PCM in small frames (100–200 ms) and stream asynchronously to minimize latency.
Processing Results And Partial Transcripts: Use partial results for live UI updates and append final results when the recognizer returns a final JSON segment.
Performance And Deployment: Balance model size, buffer size, and threading to optimize CPU, memory, and battery for mobile devices.
Introduction
This article shows how to implement streaming, near-real-time speech recognition in Flutter apps using the Vosk offline speech recognition engine. Targeted at mobile development, the approach focuses on capturing audio, feeding PCM buffers into Vosk’s recognizer, and handling partial and final results with low latency. We cover setup options, audio capture considerations, chunking and threading, and lightweight strategies for production mobile apps.
Setup Vosk Model And Plugin
Vosk is an offline ASR engine that runs on-device. On mobile you can integrate it either via an existing Flutter plugin (for example vosk_flutter) or by creating a thin platform-channel wrapper around the Vosk native libraries (Android: AAR, iOS: frameworks). Steps:
Add the plugin or your platform-channel code to the Flutter project.
Bundle a compact Vosk model inside your app (for mobile choose a small model to limit install size). Place the model in app assets and extract to app storage on first run.
Initialize the Recognizer with the model and sample rate (usually 16000 Hz).
Example initialization (pseudo-Flutter plugin usage):
final modelPath = await Vosk.extractModel('assets/model');
final recognizer = await Vosk.createRecognizer(modelPath, sampleRate: 16000);
recognizer.setMaxAlternatives(1);
Keep the model extraction and recognizer initialization off the UI thread. Use compute or an isolate to unpack large models.
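As a minimal sketch, assuming the model ships as a single assets/model.zip (a real extractor would also inflate the archive into a directory), compute keeps the heavy file I/O off the UI isolate:
import 'dart:io';
import 'dart:typed_data';
import 'package:flutter/foundation.dart';
import 'package:flutter/services.dart' show rootBundle;

// Top-level function so compute can run it in a background isolate.
// args: [model bytes, destination path].
Future<String> _writeModel(List<Object> args) async {
  final bytes = args[0] as Uint8List;
  final destPath = args[1] as String;
  await File(destPath).writeAsBytes(bytes, flush: true);
  return destPath;
}

Future<String> prepareModel(String destPath) async {
  // Asset loading must stay on the main isolate; only the heavy
  // write/unpack work moves to the background isolate.
  final data = await rootBundle.load('assets/model.zip');
  return compute(_writeModel, <Object>[data.buffer.asUint8List(), destPath]);
}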
Recording And Streaming Audio
For low-latency streaming, capture raw PCM with a small buffer size and forward each chunk to the recognizer. On Android/iOS choose an audio recorder package that can deliver 16-bit PCM, mono, 16 kHz (or resample). Examples: flutter_sound, audio_streamer, or custom platform code to avoid extra processing overhead.
Key points:
Use short frames (e.g., 100–200 ms) to keep perceived latency low.
Do not await recognition for each chunk on the UI thread; feed the recognizer asynchronously.
If using a plugin with a prebuilt callback, convert any Float32 samples to 16-bit PCM before sending to Vosk.
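If the capture layer hands you normalized Float32 samples, the conversion is a clamp and scale; a minimal sketch:
import 'dart:typed_data';

// Convert normalized Float32 samples ([-1.0, 1.0]) to 16-bit signed PCM.
Int16List floatToPcm16(Float32List samples) {
  final pcm = Int16List(samples.length);
  for (var i = 0; i < samples.length; i++) {
    pcm[i] = (samples[i].clamp(-1.0, 1.0) * 32767).round();
  }
  return pcm;
}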
Minimal chunking example showing pushing buffers into the recognizer:
// audioChunk is Int16List PCM, mono, 16 kHz
Future<void> handleAudioChunk(Int16List audioChunk) async {
  // Send raw bytes to the native recognizer; true signals a final segment.
  final isFinal = await recognizer.acceptWaveForm(audioChunk.buffer.asUint8List());
  if (isFinal) {
    handleFinalResult(recognizer.getResult());
  }
}
If acceptWaveForm returns false, you can call getPartialResult() to display interim text.
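Wiring a capture stream into that handler is then just a non-blocking subscription; a sketch assuming your recorder (plugin or platform channel) exposes a Stream<Int16List> of 16 kHz mono frames:
import 'dart:async';
import 'dart:typed_data';
import 'package:flutter/foundation.dart';

// Forward each PCM frame to the recognizer without awaiting on the UI path.
StreamSubscription<Int16List> startStreaming(Stream<Int16List> pcmFrames) {
  return pcmFrames.listen(
    (frame) => unawaited(handleAudioChunk(frame)),
    onError: (Object e) => debugPrint('audio stream error: $e'),
  );
}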
Processing Results And Partial Transcripts
Vosk provides partial transcripts while streaming, which are crucial for responsive UIs. Recommended pattern:
After each chunk, call getPartialResult() to update a live transcript display.
Call acceptWaveForm for each chunk and periodically (or when it returns true) call getResult() to receive a final recognized segment.
Accumulate final segments to a full transcription and clear partial text when a final segment arrives.
Handle JSON responses from Vosk carefully. Partial results are lightweight; final results include word timings and confidence that you can use to build word-aligned transcripts or highlight words in the UI.
Example handling flow:
On audio chunk: send to recognizer. If recognizer returns a final result, append to transcript and emit an event.
Else read partial result and display it as ephemeral text.
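Concretely, a streaming partial payload carries a partial field, while a final segment carries text (plus per-word result entries when word-level output is enabled); a small decoding sketch:
import 'dart:convert';

// Decode a Vosk JSON payload into (displayText, isFinal). Partial results
// look like {"partial": "..."}; final segments look like {"text": "...",
// "result": [{"word": ..., "start": ..., "end": ..., "conf": ...}]}.
(String, bool) parseVoskJson(String json) {
  final map = jsonDecode(json) as Map<String, dynamic>;
  if (map.containsKey('text')) return (map['text'] as String, true);
  return ((map['partial'] as String?) ?? '', false);
}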
Also plan for out-of-vocabulary phrases: in constrained domains, a grammar or phrase list can noticeably improve accuracy.
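As a sketch against the same pseudo-plugin API used above (the grammar parameter is an assumption; check your plugin or wrapper for the exact name), native Vosk can build a recognizer constrained to a phrase list, with "[unk]" absorbing everything else:
// Hypothetical grammar parameter mapping to Vosk's phrase-list constructor.
final commandRecognizer = await Vosk.createRecognizer(
  modelPath,
  sampleRate: 16000,
  grammar: ['turn on the light', 'turn off the light', '[unk]'],
);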
Performance And Deployment
Mobile constraints require balancing recognition accuracy, latency, and binary size:
Choose the smallest model that meets your accuracy needs; smaller models reduce memory and CPU usage.
Run recognition work on dedicated background threads or isolates. The Vosk native libraries are optimized, but the Flutter-to-native bridge should not be called at high frequency from the UI thread.
Profile battery and CPU on target devices. Real-time ASR can be CPU heavy; consider batching slightly larger frames on low-end devices to reduce overhead (see the sizing sketch after this list).
For iOS, ensure you include the correct architecture slices when bundling native libraries.
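As a quick sizing check for the batching trade-off mentioned above, bytes per frame for 16-bit mono PCM scale linearly with frame duration:
// Bytes per frame = sampleRate * (frameMs / 1000) * 2 bytes per sample.
int frameBytes(int sampleRate, int frameMs) => sampleRate * frameMs ~/ 1000 * 2;
// frameBytes(16000, 100) == 3200; frameBytes(16000, 200) == 6400.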
Testing tips: exercise with realistic background noise and check how partial results behave under intermittent connectivity. Because Vosk operates offline, it’s robust to network loss but sensitive to microphone quality and sample-rate mismatches.
Vibe Studio

Vibe Studio, powered by Steve’s advanced AI agents, is a revolutionary no-code, conversational platform that empowers users to quickly and efficiently create full-stack Flutter applications integrated seamlessly with Firebase backend services. Ideal for solo founders, startups, and agile engineering teams, Vibe Studio allows users to visually manage and deploy Flutter apps, greatly accelerating the development process. The intuitive conversational interface simplifies complex development tasks, making app creation accessible even for non-coders.
Conclusion
Streaming audio recognition with Flutter and Vosk gives you offline, low-latency ASR suitable for privacy-sensitive and real-time mobile apps. The pattern is: initialize a model, capture 16 kHz mono PCM, stream small chunks into the recognizer asynchronously, and display partial results while aggregating finals. Focus on model size, buffer sizing, and threading to achieve responsive mobile experiences.
Build Flutter Apps Faster with Vibe Studio
Vibe Studio is your AI-powered Flutter development companion. Skip boilerplate, build in real-time, and deploy without hassle. Start creating apps at lightning speed with zero setup.