Integrating Google Cloud Speech-to-Text API in Flutter
Nov 19, 2025



Summary
This tutorial explains how to integrate Google Cloud Speech-to-Text with Flutter for mobile development: enable the API, create a service account, use a server-side proxy to protect credentials, record audio in Flutter (WAV/Linear16), send audio to the server, and handle synchronous or streaming transcription with responsive UI updates and best practices for performance and security.
Key insights:
Setting Up Google Cloud: Keep service account keys off the device; use a server-side proxy to call the Speech-to-Text API securely.
Preparing Flutter Project: Use packages like record and http; record mono WAV or Linear16 PCM at 16 kHz for the best recognition accuracy.
Recording And Streaming Audio: For short files use speech:recognize with base64-encoded audio; for real-time use server-side gRPC streaming with a WebSocket connection to the Flutter app.
Handling Transcriptions And UI: Show partial results during streaming, parse alternatives for confidence, and throttle UI updates to avoid jank.
Security And Performance: Resample to 16 kHz, compress audio, restrict service account permissions, and implement retries and quota handling.
Introduction
This tutorial shows how to integrate Google Cloud Speech-to-Text into a Flutter app for robust, low-latency speech recognition on mobile. We'll cover cloud setup, a secure architecture choice, recording audio in Flutter, sending audio for transcription, and updating the UI with results. The guidance focuses on practical mobile development decisions so you can prototype quickly and move to production safely.
Setting Up Google Cloud
1) Enable the Speech-to-Text API in the Google Cloud Console. 2) Create a service account and grant it the least-privileged role that allows calling the API; a broad role such as "Speech-to-Text Admin" grants more than a transcription proxy needs. 3) Create and download a JSON key for development, but never ship this key inside the mobile app.
Recommended architecture: run a small server (Node, Python, Go) that holds your service account key and acts as a proxy. The mobile app sends recorded audio to that server, and the server calls Google Cloud APIs. This avoids exposing credentials and lets you enforce request quotas, authentication, and audio preprocessing.
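To make the architecture concrete, here is a minimal sketch of such a proxy written in Dart with the shelf and http packages (Node, Python, or Go work just as well). The /transcribe route matches the upload snippet later in this tutorial, though for simplicity this sketch accepts raw audio bytes rather than multipart form data. The ACCESS_TOKEN environment variable is a placeholder assumption: production code would mint OAuth2 access tokens from the service account key (for example with package:googleapis_auth).

import 'dart:convert';
import 'dart:io';

import 'package:http/http.dart' as http;
import 'package:shelf/shelf.dart';
import 'package:shelf/shelf_io.dart' as io;

Future<Response> transcribe(Request request) async {
  // Collect the raw WAV/PCM bytes uploaded by the Flutter client.
  final audioBytes = await request.read().expand((chunk) => chunk).toList();

  // Placeholder: in production, mint an OAuth2 access token from the
  // service account key instead of reading one from the environment.
  final token = Platform.environment['ACCESS_TOKEN'];

  final googleResponse = await http.post(
    Uri.parse('https://speech.googleapis.com/v1/speech:recognize'),
    headers: {
      'Authorization': 'Bearer $token',
      'Content-Type': 'application/json',
    },
    body: jsonEncode({
      'config': {
        'encoding': 'LINEAR16',
        'sampleRateHertz': 16000,
        'languageCode': 'en-US',
      },
      'audio': {'content': base64Encode(audioBytes)},
    }),
  );

  // Forward Google's JSON response straight back to the app.
  return Response.ok(googleResponse.body,
      headers: {'Content-Type': 'application/json'});
}

void main() async {
  final handler =
      const Pipeline().addMiddleware(logRequests()).addHandler((request) {
    if (request.method == 'POST' && request.url.path == 'transcribe') {
      return transcribe(request);
    }
    return Response.notFound('Not found');
  });
  await io.serve(handler, '0.0.0.0', 8080);
}

This single endpoint is also the natural place to add authentication, rate limiting, and audio preprocessing before the Google call.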
Enable billing for the project and set quotas and alerts. For streaming recognition in production, consider using gRPC on the server side for lower latency.
Preparing Flutter Project
Add dependencies in pubspec.yaml: a recorder and a simple HTTP client, such as the record and http packages.
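A minimal dependency block might look like this (version constraints are illustrative; check pub.dev for current releases):

dependencies:
  flutter:
    sdk: flutter
  record: ^4.4.4
  http: ^1.1.0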
Install the packages and request microphone permission in AndroidManifest.xml (Android) and Info.plist (iOS). Choose an audio container and encoding that Speech-to-Text accepts; base64-encoded Linear16 PCM or WAV both work. For mobile, recording WAV at 16 kHz, or resampling to 16 kHz Linear16, is a pragmatic choice.
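The permission entries are the standard platform ones. In android/app/src/main/AndroidManifest.xml:

<uses-permission android:name="android.permission.RECORD_AUDIO" />

And in ios/Runner/Info.plist (the usage string is up to you):

<key>NSMicrophoneUsageDescription</key>
<string>This app records audio to transcribe your speech.</string>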
Example recording (using the record package):
import 'package:record/record.dart';
// record 4.x API (record 5.x renames Record to AudioRecorder and moves options into RecordConfig).
final recorder = Record();
if (await recorder.hasPermission()) {
  // Mono 16 kHz WAV matches the Linear16 input Speech-to-Text expects.
  await recorder.start(path: 'audio.wav', encoder: AudioEncoder.wav, samplingRate: 16000, numChannels: 1);
}
// ...later
final path = await recorder.stop();

This produces a WAV file you can upload. If you need raw Linear16, either record raw PCM or convert on the server.
Recording And Streaming Audio
Two common flows: synchronous file recognition for short utterances and streaming recognition for live dictation.
Synchronous (best for audio under one minute): read the WAV file bytes, base64-encode them, and POST to Google Cloud's speech:recognize REST endpoint. For longer audio, use speech:longrunningrecognize.
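The JSON body for speech:recognize pairs a recognition config with the base64 audio; the languageCode shown is just an example:

{
  "config": {
    "encoding": "LINEAR16",
    "sampleRateHertz": 16000,
    "languageCode": "en-US"
  },
  "audio": {
    "content": "<base64-encoded audio bytes>"
  }
}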
Streaming (best for real-time UI): implement streaming on the server side, using gRPC to Google and a WebSocket between the Flutter app and the server. Streaming directly from mobile to Google via gRPC is possible but complex, and it is not recommended because of credential handling.
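Here is a sketch of the Flutter side of that WebSocket link using the web_socket_channel package. The URL and the JSON message shape ({"transcript": ..., "isFinal": ...}) are assumptions about your server's protocol, not a fixed API:

import 'dart:convert';
import 'package:web_socket_channel/web_socket_channel.dart';

/// Opens the streaming socket; push recorder chunks via channel.sink.add.
WebSocketChannel connectTranscriptionSocket(
    void Function(String text, bool isFinal) onTranscript) {
  final channel = WebSocketChannel.connect(
    Uri.parse('wss://your-server.example.com/stream'),
  );
  channel.stream.listen((message) {
    // Assumed message shape: {"transcript": "...", "isFinal": true/false}.
    final data = jsonDecode(message as String) as Map<String, dynamic>;
    onTranscript(
      data['transcript'] as String? ?? '',
      data['isFinal'] as bool? ?? false,
    );
  });
  return channel;
}

Feed PCM chunks from the recorder into channel.sink.add(bytes) as they arrive, and close the sink when recording stops.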
Flutter client: send the recorded file as multipart/form-data or bytes to your server. Example HTTP upload snippet:
import 'package:http/http.dart' as http;
final uri = Uri.parse('https://your-server.example.com/transcribe');
final request = http.MultipartRequest('POST', uri);
request.files.add(await http.MultipartFile.fromPath('file', pathToWav));
final res = await request.send();
final body = await res.stream.bytesToString();

On the server, convert or resample the audio if needed and call the Speech-to-Text API, passing the audio content base64-encoded in the JSON request body for speech:recognize, or use a long-running or streaming method.
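One common way to resample and downmix on the server, assuming ffmpeg is installed:

ffmpeg -i input.wav -ac 1 -ar 16000 -sample_fmt s16 output.wav

This converts to mono (-ac 1), 16 kHz (-ar 16000), 16-bit signed samples (-sample_fmt s16), matching the Linear16 config above.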
Handling Transcriptions And UI
Design the Flutter UI to display partial results during streaming and final results when available. For synchronous requests, show a spinner and then render the returned transcript. For streaming, send partial transcripts from the server over websockets or SSE and update the UI incrementally.
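One way to render incremental results, assuming your networking layer exposes transcripts as a Stream<String> (the TranscriptView name is illustrative):

import 'package:flutter/material.dart';

class TranscriptView extends StatelessWidget {
  const TranscriptView({super.key, required this.transcripts});

  // A stream of partial/final transcript strings from your server connection.
  final Stream<String> transcripts;

  @override
  Widget build(BuildContext context) {
    return StreamBuilder<String>(
      stream: transcripts,
      builder: (context, snapshot) {
        if (!snapshot.hasData) {
          // No transcript yet: show progress while recognition runs.
          return const CircularProgressIndicator();
        }
        return Text(snapshot.data!);
      },
    );
  }
}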
Parse the Google response's results[] and alternatives[] fields; pick the highest confidence alternative and show timestamps if returned. Handle edge cases: silence, network errors, and API quota errors. Implement retry logic and exponential backoff.
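A sketch of that parsing step against the v1 speech:recognize response shape (the bestTranscript helper name is illustrative):

import 'dart:convert';

/// Concatenates the highest-confidence alternative of each result.
String bestTranscript(String responseBody) {
  final json = jsonDecode(responseBody) as Map<String, dynamic>;
  final buffer = StringBuffer();
  for (final result in (json['results'] as List? ?? const [])) {
    final alternatives = List<Map<String, dynamic>>.from(
        (result as Map<String, dynamic>)['alternatives'] as List? ?? const []);
    if (alternatives.isEmpty) continue;
    // Sort descending by confidence and take the top alternative.
    alternatives.sort((a, b) => ((b['confidence'] as num?) ?? 0)
        .compareTo((a['confidence'] as num?) ?? 0));
    buffer.write(alternatives.first['transcript'] as String? ?? '');
  }
  return buffer.toString();
}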
Performance tips for mobile development: compress audio moderately, avoid sending unnecessary channels (mono is fine), and resample to 16 kHz/16-bit if your app targets conversational voice. Throttle updates to the UI to avoid jank; use isolates for heavy audio processing.
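A minimal trailing throttle for a transcript stream, as one way to implement that throttling (the helper name and approach are one option among several):

import 'dart:async';

/// Emits at most one value per [interval], keeping only the latest.
Stream<T> throttleLatest<T>(Stream<T> source, Duration interval) {
  late T latest;
  Timer? timer;
  final controller = StreamController<T>();
  source.listen((value) {
    latest = value;
    // Schedule a single emission; later values inside the window
    // just overwrite `latest`.
    timer ??= Timer(interval, () {
      timer = null;
      controller.add(latest);
    });
  }, onDone: () {
    timer?.cancel();
    controller.close();
  });
  return controller.stream;
}

Wrap the raw stream before handing it to the UI, e.g. throttleLatest(rawTranscripts, const Duration(milliseconds: 150)).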
Security and privacy: inform users and follow platform rules for microphone access. Store or forward audio only as necessary, and allow users to opt out of cloud transcription.
Vibe Studio

Vibe Studio, powered by Steve’s advanced AI agents, is a revolutionary no-code, conversational platform that empowers users to quickly and efficiently create full-stack Flutter applications integrated seamlessly with Firebase backend services. Ideal for solo founders, startups, and agile engineering teams, Vibe Studio allows users to visually manage and deploy Flutter apps, greatly accelerating the development process. The intuitive conversational interface simplifies complex development tasks, making app creation accessible even for non-coders.
Conclusion
Integrating Google Cloud Speech-to-Text into a Flutter mobile app is straightforward when you separate concerns: keep credentials on a server, record and preformat audio in the client, and let the server handle API communication (synchronous, long-running, or streaming). Use the record and http packages to capture and send audio, update the UI with partial/final transcripts, and optimize encoding and resampling for accuracy and performance. This approach speeds development while maintaining security and scalability for production mobile development.
Build Flutter Apps Faster with Vibe Studio
Vibe Studio is your AI-powered Flutter development companion. Skip boilerplate, build in real-time, and deploy without hassle. Start creating apps at lightning speed with zero setup.