Running TensorFlow Lite Models on‑Device in Flutter

Summary

This guide explores advanced mobile ML with Flutter TFLite, covering plugin setup, input preprocessing, model loading with GPU/NNAPI delegates, threaded inference, and robust post-processing. It equips developers to build high-performance, offline-capable Flutter apps with real-time AI features.

Key insights:
  • Efficient Setup: Use tflite_flutter with proper platform configs for delegate support.

  • Preprocessing Matters: Input normalization is crucial for model accuracy.

  • Delegate Optimization: GPU and NNAPI delegates boost inference speed on supported devices.

  • Threaded Execution: Isolate-specific interpreters avoid concurrency issues.

  • Dynamic Model Handling: Models can be hot-swapped at runtime for updates.

  • Scalable Deployment: Patterns support real-time AI features across Flutter apps.

Introduction

Running machine-learning inference directly on mobile devices offers low latency, offline capability, and improved privacy. Flutter TFLite brings TensorFlow Lite’s lightweight models to Flutter, enabling powerful on-device inference. In this advanced tutorial, you’ll learn how to configure the TFLite plugin for Flutter, preprocess inputs, load and optimize a model with delegates and multi-threading, and handle post-processing and edge cases—all without fluff or superficial steps.

Configuring the TFLite Plugin and Model Assets

First, add the TFLite plugin for Flutter to your pubspec.yaml. We’ll use tflite_flutter, a robust TFLite plugin for Flutter with support for Android NNAPI and GPU delegates.

dependencies:
  flutter:
    sdk: flutter
  tflite_flutter: ^0.10.0
  tflite_flutter_helper: ^0.3.1

Place your .tflite model (e.g., mobilenet_v1_1.0_224.tflite) in assets/models and register the folder in pubspec.yaml:

flutter:
  assets:
    - assets/models/

On Android, set minSdkVersion to at least 21; note that the NNAPI delegate only takes effect at runtime on Android 8.1 (API level 27) and newer. For iOS, include use_frameworks! in your Podfile if it is not already present.
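
In a default Flutter project the Android setting lives in android/app/build.gradle. A sketch using the Groovy DSL (adjust the syntax if your project uses the Kotlin DSL):

android {
    defaultConfig {
        // Minimum SDK suggested above for delegate support
        minSdkVersion 21
    }
}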

Preprocessing and Advanced Model Loading

Preprocessing is critical for accuracy. Most image models require a fixed-size, normalized float32 tensor. Use tflite_flutter_helper to convert a decoded image (an Image from package:image, loaded from a file or camera frame) into the required format.

import 'package:tflite_flutter/tflite_flutter.dart';

Future<Interpreter> loadInterpreter() async {
  final options = InterpreterOptions()
    ..threads = 4                    // Use 4 CPU threads
    ..useNnApiForAndroid = true;     // Enable the Android NNAPI delegate
  final interpreter = await Interpreter.fromAsset(
    'assets/models/mobilenet_v1_1.0_224.tflite', // Full asset path (tflite_flutter >= 0.10)
    options: options,
  );
  interpreter.allocateTensors();     // Pre-allocate input/output tensors
  return interpreter;
}
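
A minimal wiring sketch with a hypothetical ClassifierPage widget: load the interpreter once, keep the reference, and release it when the widget is disposed (more on resource cleanup below).

import 'package:flutter/widgets.dart';
import 'package:tflite_flutter/tflite_flutter.dart';

class ClassifierPage extends StatefulWidget {
  const ClassifierPage({super.key});

  @override
  State<ClassifierPage> createState() => _ClassifierPageState();
}

class _ClassifierPageState extends State<ClassifierPage> {
  Interpreter? _interpreter;

  @override
  void initState() {
    super.initState();
    // Reuses the loadInterpreter() helper defined above.
    loadInterpreter().then((i) {
      if (mounted) setState(() => _interpreter = i);
    });
  }

  @override
  void dispose() {
    _interpreter?.close(); // Release the native interpreter and any delegates
    super.dispose();
  }

  @override
  Widget build(BuildContext context) => const SizedBox.shrink(); // UI omitted
}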

Convert your incoming image to a normalized byte buffer:

import 'package:image/image.dart' as img;
import 'package:tflite_flutter/tflite_flutter.dart';
import 'package:tflite_flutter_helper/tflite_flutter_helper.dart';

TensorImage preprocessImage(Interpreter interpreter, img.Image srcImage) {
  final inputShape = interpreter.getInputTensor(0).shape; // e.g. [1, 224, 224, 3]
  final processor = ImageProcessorBuilder()
      .add(ResizeOp(inputShape[1], inputShape[2], ResizeMethod.BILINEAR))
      .add(NormalizeOp(127.5, 127.5)) // Maps [0, 255] pixel values to [-1, 1]
      .build();
  return processor.process(TensorImage.fromImage(srcImage));
}

Running Inference with Delegates and Threading

With the interpreter and preprocessed input ready, execute inference. If you need GPU acceleration on supported devices, swap in a GPU delegate:

import 'dart:io' show Platform;

// Android uses GpuDelegateV2; iOS uses the Metal-based GpuDelegate.
final gpuDelegate = Platform.isAndroid ? GpuDelegateV2() : GpuDelegate();
final options = InterpreterOptions()
  ..addDelegate(gpuDelegate)
  ..threads = 2; // Fewer CPU threads are typically sufficient alongside the GPU
final interpreterGpu = await Interpreter.fromAsset(
  'assets/models/mobilenet_v1_1.0_224.tflite',
  options: options,
);

Run inference:

// Assume `interpreter` (CPU or GPU) is initialized and `image` is a decoded img.Image
final inputBuffer = preprocessImage(interpreter, image).buffer;
final output = List.generate(1, (_) => List.filled(1001, 0.0)); // MobileNet v1: 1001 classes

interpreter.run(inputBuffer, output);

// Post-process: find the index with the highest probability
final scores = output[0];
final maxProb = scores.reduce((a, b) => a > b ? a : b);
final maxIndex = scores.indexOf(maxProb);
print('Predicted label index: $maxIndex with probability $maxProb');
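
To turn the winning index into a readable class name, pair the model with a labels file. A minimal sketch, assuming a hypothetical assets/models/labels.txt (one label per line, registered as an asset) whose order matches the model's output:

import 'package:flutter/services.dart' show rootBundle;

Future<String> labelForIndex(int index) async {
  // Loads the label list on each call; cache it in real code.
  final raw = await rootBundle.loadString('assets/models/labels.txt');
  final labels = raw
      .split('\n')
      .map((line) => line.trim())
      .where((line) => line.isNotEmpty)
      .toList();
  return (index >= 0 && index < labels.length) ? labels[index] : 'unknown ($index)';
}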

Post-Processing and Edge Cases

Output tensors from quantized models may use uint8 or int8. Always read tensor dtype:

final outputTensor = interpreter.getOutputTensor(0);
if (outputTensor.type == TfLiteType.uint8) {
  // Dequantize manually: realValue = (quantizedValue - zeroPoint) * scale,
  // using the scale and zeroPoint from outputTensor.params.
}
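
A minimal sketch of what that looks like for a fully quantized classifier, assuming the same [1, 1001] output shape and that the quantization parameters are exposed via the output tensor's params:

// Integer output buffer matching the quantized tensor shape
final quantized = List.generate(1, (_) => List.filled(1001, 0));
interpreter.run(inputBuffer, quantized);

final params = interpreter.getOutputTensor(0).params;
final probabilities = quantized[0]
    .map((q) => (q - params.zeroPoint) * params.scale)
    .toList();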

Handle scenarios like:

  • Memory constraints: Release native resources when you are done by calling interpreter.close().

  • Concurrency: Avoid sharing one interpreter across isolates; create one per isolate (see the sketch after this list).

  • Model updates: Hot-swap models by closing the old interpreter and loading a new one.
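
As a concrete pattern for the concurrency point, here is a minimal sketch that runs a single inference in a short-lived background isolate with its own interpreter. It assumes Dart 2.19+ (for Isolate.run) and that inputBytes already holds the preprocessed float32 input buffer produced earlier; adjust the output shape to your model.

import 'dart:isolate';
import 'dart:typed_data';

import 'package:flutter/services.dart' show rootBundle;
import 'package:tflite_flutter/tflite_flutter.dart';

Future<List<double>> classifyInBackground(Uint8List inputBytes) async {
  // Load the raw model on the main isolate; byte lists transfer cheaply.
  final data = await rootBundle.load('assets/models/mobilenet_v1_1.0_224.tflite');
  final modelBytes = data.buffer.asUint8List(data.offsetInBytes, data.lengthInBytes);

  return Isolate.run(() {
    // Each isolate builds its own interpreter; never share one across isolates.
    final interpreter = Interpreter.fromBuffer(modelBytes);
    final output = List.generate(1, (_) => List.filled(1001, 0.0));
    interpreter.run(inputBytes, output);
    interpreter.close(); // Free native memory before the isolate exits
    return output[0];
  });
}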

Vibe Studio

Vibe Studio, powered by Steve’s advanced AI agents, is a revolutionary no-code, conversational platform that empowers users to quickly and efficiently create full-stack Flutter applications integrated seamlessly with Firebase backend services. Ideal for solo founders, startups, and agile engineering teams, Vibe Studio allows users to visually manage and deploy Flutter apps, greatly accelerating the development process. The intuitive conversational interface simplifies complex development tasks, making app creation accessible even for non-coders.

Conclusion

This tutorial covered advanced techniques for on-device Flutter TFLite inference: setting up the TFLite plugin for Flutter, precise preprocessing, leveraging CPU/GPU delegates, threading, and robust post-processing. You now have a scalable pattern for integrating TensorFlow Lite in Flutter, whether you’re building real-time vision, speech recognition, or custom ML solutions.

Power your Flutter app with on-device AI

Leverage Vibe Studio to integrate ML models seamlessly—no need for boilerplate or manual setup.
