Performance Regression Testing with Codemagic CI
Oct 20, 2025



Summary
This tutorial explains how to add performance regression testing to Codemagic CI for Flutter mobile development: set up a stable pipeline, write deterministic benchmarks, emit machine-readable results, compare against baselines, and automate alerts and baseline updates to fail builds on regressions.
Key insights:
Set Up the Codemagic Pipeline: Pin Flutter versions, use stable builder types, and isolate a dedicated job for performance tests to reduce environment drift.
Create Reliable Performance Tests: Use deterministic inputs, warm-up iterations, and median measurements to reduce noise in microbenchmarks.
Integrate Benchmarks Into CI: Emit JSON artifacts with median timings and fail builds by comparing measured values to stored baselines.
Automate Baselines And Alerts: Update baselines only from trusted builds, comment PRs with results, and notify teams on regressions via webhooks.
Measure Locally Before CI: Validate tests on a pinned local environment to ensure repeatability and reduce false positives in CI.
Introduction
Performance regressions silently damage user experience and retention in Flutter mobile development. Continuous integration that only runs unit and widget tests misses slowdowns introduced by new code, dependencies, or platform changes. This tutorial shows how to add deterministic performance regression testing to a Codemagic CI pipeline, establish baselines, and fail builds when regressions exceed acceptable thresholds.
Set Up the Codemagic Pipeline
Start with a Codemagic pipeline that builds your app and runs tests. Add a dedicated job for performance tests so they run on a stable machine type and do not interfere with unit-test speed requirements. Use the same SDK version across local runs and CI to avoid jitter from tooling changes.
Key settings:
Use a macOS builder for iOS performance measurements or Linux for Android emulator tests.
Cache the Flutter SDK and pub packages for repeatable builds.
Pin the Flutter channel and version in codemagic.yaml to reduce environment drift.
Example snippet (codemagic.yaml excerpt):
workflows:
  performance_tests:
    name: Performance Tests
    instance_type: mac_mini
    environment:
      flutter: "3.10.0"  # pins the Flutter SDK used by the builder
      vars:
        FLUTTER_VERSION: "3.10.0"
    scripts:
      - name: Install
        script: flutter --version
      - name: Run Performance Tests
        script: flutter test test/perf  # assumes the benchmarks live under test/perf
Create Reliable Performance Tests
Performance tests must be deterministic and isolated. Avoid external network I/O, rely on synthetic inputs, and warm caches and JIT/AOT where appropriate. For microbenchmarks, use the Dart Stopwatch API or package:benchmark_harness for repeatable measurements. Run multiple iterations and compute the median or a trimmed mean to reduce noise.
Example microbenchmark using Stopwatch:
void measureOperation(void Function() op, int iterations) {
  final sw = Stopwatch()..start();
  for (var i = 0; i < iterations; i++) {
    op();
  }
  sw.stop();
  // Average elapsed time per iteration, in milliseconds.
  print('Elapsed (ms): ${sw.elapsedMilliseconds / iterations}');
}
Wrap UI-related measurements with WidgetsBinding.instance.ensureVisualUpdate and pump the necessary frames when using flutter_test. For integration or end-to-end scenarios, use deterministic gestures and mock platform channels.
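As one possible approach, the sketch below times a screen build plus a fixed number of pumped frames inside a flutter_test widget test. MyListScreen is a placeholder standing in for the screen under test, and the 60-frame pump count is an arbitrary choice to tune per scenario.

import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Placeholder widget standing in for the real screen under test.
class MyListScreen extends StatelessWidget {
  const MyListScreen({super.key});

  @override
  Widget build(BuildContext context) => Scaffold(
        body: ListView.builder(
          itemCount: 500,
          itemBuilder: (_, i) => Text('Item $i'),
        ),
      );
}

void main() {
  testWidgets('MyListScreen stays within its frame budget', (tester) async {
    final sw = Stopwatch()..start();

    await tester.pumpWidget(MaterialApp(home: const MyListScreen()));

    // Pump a fixed number of frames so the measurement is deterministic.
    for (var i = 0; i < 60; i++) {
      await tester.pump(const Duration(milliseconds: 16));
    }

    sw.stop();
    // Total wall-clock time for the initial build plus the pumped frames.
    print('MyListScreen frames (ms): ${sw.elapsedMilliseconds}');
  });
}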
Integrate Benchmarks Into CI
Collect numeric results during the CI run and emit a machine-readable file (JSON) so follow-up steps can compare against baselines.
Steps:
Run each benchmark multiple times and write median timings to perf_results.json (a sketch follows these steps).
Use environment variables to tag builds (branch, commit, PR number).
Store the results artifact on Codemagic or push to a small backing store (e.g., S3) for historical comparisons.
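A minimal sketch of the emit step, assuming each benchmark hands back its raw per-iteration timings and that Codemagic's CM_BRANCH and CM_COMMIT build variables are available for tagging.

import 'dart:convert';
import 'dart:io';

/// Median of the raw per-iteration samples.
double median(List<double> samples) {
  final sorted = [...samples]..sort();
  final mid = sorted.length ~/ 2;
  return sorted.length.isOdd ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

/// Writes perf_results.json with one median timing per benchmark.
void writeResults(Map<String, List<double>> samplesByBenchmark) {
  final results = {
    // Tag the run so later steps can attribute results to a build.
    'branch': Platform.environment['CM_BRANCH'] ?? 'unknown',
    'commit': Platform.environment['CM_COMMIT'] ?? 'unknown',
    'benchmarks': {
      for (final entry in samplesByBenchmark.entries)
        entry.key: {'median_ms': median(entry.value)},
    },
  };
  File('perf_results.json').writeAsStringSync(jsonEncode(results));
}

Writing one file per run keeps the comparison step decoupled from the benchmark harness itself.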
A simple test assertion can enforce budgeted timings. Keep thresholds conservative and consider percentage-based thresholds to adapt to hardware variability.
void assertWithin(String name, double measuredMs, double baselineMs, double allowedPct) {
  // Allow the measurement to exceed the baseline by at most allowedPct percent.
  final allowed = baselineMs * (1 + allowedPct / 100);
  if (measuredMs > allowed) {
    throw Exception('$name regression: $measuredMs ms > $allowed ms');
  }
}
Call this during your CI test run and exit non-zero to fail the pipeline on regressions.
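A minimal runner sketch that reuses the assertWithin helper above; baseline.json is an assumed artifact with the same shape as perf_results.json, and the 10% threshold is illustrative.

import 'dart:convert';
import 'dart:io';

void main() {
  final results = jsonDecode(File('perf_results.json').readAsStringSync());
  final baseline = jsonDecode(File('baseline.json').readAsStringSync());

  var failed = false;
  for (final entry in (results['benchmarks'] as Map<String, dynamic>).entries) {
    final measured = (entry.value['median_ms'] as num).toDouble();
    final base = (baseline['benchmarks'][entry.key]?['median_ms'] as num?)?.toDouble();
    if (base == null) continue; // new benchmark with no baseline yet
    try {
      assertWithin(entry.key, measured, base, 10); // allow up to 10% slowdown
    } catch (e) {
      print(e);
      failed = true;
    }
  }
  // A non-zero exit code fails the Codemagic script step and the build.
  if (failed) exit(1);
}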
Automate Baselines And Alerts
Manually maintaining baselines is error-prone. Automate baseline updates and alerting:
For main branch builds: if the build is green and the measured values are within an acceptably small delta, optionally update the baseline artifact. Require PR review for baseline changes greater than a small percentage.
For PRs: compare measurements against the current baseline and fail the build if regressions exceed thresholds. Add comments to PRs with a summary table linking to artifacts and CI logs.
Integrate Codemagic notifications or webhooks to Slack, email, or ticketing systems to surface regressions.
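For example, a short script run from the comparison step can post a summary to a Slack incoming webhook; this is a sketch, assuming the webhook URL is exposed to the build as a SLACK_WEBHOOK_URL environment variable.

import 'dart:convert';
import 'dart:io';

Future<void> notifySlack(String message) async {
  final url = Platform.environment['SLACK_WEBHOOK_URL'];
  if (url == null) return; // no webhook configured, skip silently

  final client = HttpClient();
  final request = await client.postUrl(Uri.parse(url));
  request.headers.contentType = ContentType.json;
  // Slack incoming webhooks accept a simple {"text": "..."} payload.
  request.write(jsonEncode({'text': message}));
  final response = await request.close();
  await response.drain();
  client.close();
}

The same pattern works for any HTTP-based alerting or ticketing endpoint.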
Charting historical results helps identify trends. Export perf_results.json to a simple dashboard (Grafana, BigQuery, or a static site) to visualize regressions over time.
Operational tips:
Run performance tests on dedicated builders with consistent hardware to reduce variance.
Use warm-up iterations and discard first-run timings.
Prefer median over mean and compute interquartile ranges to detect flakiness; a helper sketch follows.
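A small helper sketch for the interquartile-range check; the 0.5 spread-to-median ratio used to flag flaky benchmarks is an arbitrary starting point to tune per project.

/// Returns the interquartile range (Q3 - Q1) of the samples.
double iqr(List<double> samples) {
  final sorted = [...samples]..sort();
  double quantile(double q) {
    final pos = (sorted.length - 1) * q;
    final lower = pos.floor();
    final upper = pos.ceil();
    final weight = pos - lower;
    // Linear interpolation between the two nearest samples.
    return sorted[lower] * (1 - weight) + sorted[upper] * weight;
  }
  return quantile(0.75) - quantile(0.25);
}

/// Flags a benchmark as flaky when its spread is large relative to its median.
bool isFlaky(List<double> samples, double medianMs) =>
    iqr(samples) > medianMs * 0.5;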
Conclusion
Adding performance regression testing to your Codemagic CI pipeline improves confidence in Flutter mobile development by catching slowdowns before they reach users. Implement deterministic microbenchmarks, emit machine-readable results, compare against baselines, and automate alerts. Over time, the dataset you collect will make it easier to spot regressions, attribute root causes, and maintain a fast app.
Build Flutter Apps Faster with Vibe Studio
Vibe Studio is your AI-powered Flutter development companion. Skip boilerplate, build in real-time, and deploy without hassle. Start creating apps at lightning speed with zero setup.











