Running YOLO26 Natively on Apple Silicon with MLX
webAI is open-sourcing YOLO26-MLX, a native MLX implementation of the YOLO26 object detection models that enables faster training and inference on Apple Silicon without a PyTorch runtime.

Running modern computer-vision models locally has historically required heavyweight machine-learning frameworks and specialized infrastructure. While YOLO models are widely used for real-time object detection, most implementations rely on PyTorch and environments optimized for Linux-based ML workflows.
Today we’re open-sourcing YOLO26-MLX, a native implementation of the YOLO26 object detection models built on Apple’s MLX framework and optimized for Apple Silicon.
MLX is Apple’s open-source machine-learning framework for Apple Silicon. With YOLO26-MLX, developers can train and run real-time object detection models directly on Mac hardware without relying on a full PyTorch runtime.
The repository includes a complete training and inference pipeline implemented entirely in MLX, along with tools for benchmarking, validation, and model conversion.
Speed: In internal benchmarks on Apple Silicon, the MLX implementation delivers noticeable performance gains over PyTorch running with the MPS backend; some YOLO26 variants achieve more than 2× faster inference and significantly faster training.
Accuracy: In validation tests on the COCO val2017 dataset, the MLX implementation closely matches official YOLO26 results, with most models within ~0.2% mAP and a maximum deviation of 0.5%, while delivering high-performance inference on Apple Silicon devices.
This implementation builds on infrastructure webAI has used internally for years to run computer-vision workloads natively on Apple Silicon. It has powered object detection inside our products since the YOLOv8 generation. By open-sourcing it, we hope to make it easier for developers to experiment with high-performance vision models directly on their Macs.
What is YOLO26?
YOLO (“You Only Look Once”) is one of the most widely used architectures for real-time object detection. The latest generation, YOLO26, was designed specifically for edge and low-power environments, simplifying deployment and improving performance across a variety of hardware targets.
Unlike earlier detection pipelines that relied on additional post-processing steps like non-maximum suppression (NMS), YOLO26 is designed as an end-to-end detection model that produces predictions directly. This simplifies deployment and reduces latency in real-world systems.
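To make concrete what the end-to-end design removes, here is a minimal sketch of the greedy non-maximum suppression step that earlier YOLO pipelines ran after the network (plain Python, illustrative only; real pipelines vectorize this):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap
    it above the threshold, and repeat until none remain."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

Because YOLO26 emits final predictions directly, this extra pass (and its tuning knobs like the IoU threshold) disappears from the deployment pipeline.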
The model family supports multiple variants, ranging from lightweight models suitable for embedded devices to larger architectures designed for higher-accuracy workloads.
Why MLX and Apple Silicon?
For many developers, running serious machine-learning workloads locally has historically meant switching to Linux or using GPU infrastructure. MLX changes that equation.
MLX was designed from the ground up for Apple Silicon, allowing machine-learning workloads to efficiently utilize both CPU and GPU resources through Apple’s unified memory architecture. This makes it possible to run modern machine-learning workloads directly on Mac hardware without the infrastructure typically associated with large ML frameworks.
By implementing YOLO26 directly in MLX, YOLO26-MLX enables developers to run and train computer-vision models natively on Apple hardware without needing a PyTorch runtime or external GPU infrastructure. Unlike wrappers that adapt existing frameworks, YOLO26-MLX is a ground-up implementation of the model architecture written directly for MLX.
This opens the door to a range of workflows that are particularly useful for experimentation, edge AI development, and local model iteration.
Validation Results
To validate the implementation, we evaluated YOLO26-MLX on the COCO val2017 dataset (5,000 images) and benchmarked training and inference performance on Apple Silicon. The results show that the MLX implementation closely matches the accuracy of the official Ultralytics models while delivering strong performance.
| Model | MLX mAP50-95 | Official mAP50-95 | Gap | MLX FPS |
|---|---|---|---|---|
| YOLO26n | 40.2% | 40.1% | +0.1% | 124.9 |
| YOLO26s | 47.6% | 47.8% | −0.2% | 57.7 |
| YOLO26m | 52.3% | 52.5% | −0.2% | 25.3 |
| YOLO26l | 53.9% | 54.4% | −0.5% | 19.8 |
| YOLO26x | 56.7% | 56.9% | −0.2% | 10.7 |
These results demonstrate that the MLX implementation maintains accuracy parity with the official YOLO26 models while running efficiently on Apple Silicon hardware.
In addition to matching the accuracy of the official YOLO26 models, the MLX implementation delivers meaningful performance improvements on Apple Silicon. In internal benchmarks on an Apple M4 Pro, YOLO26-MLX achieved 1.1×–2.6× faster inference compared to PyTorch running with the MPS backend, and up to ~1.7× faster training depending on the model variant. These gains come from running the model natively in MLX and leveraging Metal GPU acceleration on Apple Silicon.
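For intuition, the throughput figures in the table above translate directly into per-image latency (latency in ms = 1000 / FPS). A quick check using the reported MLX FPS values:

```python
# Per-image latency implied by the measured MLX throughput
# (FPS values taken from the validation table above).
mlx_fps = {
    "YOLO26n": 124.9,
    "YOLO26s": 57.7,
    "YOLO26m": 25.3,
    "YOLO26l": 19.8,
    "YOLO26x": 10.7,
}

for model, fps in mlx_fps.items():
    latency_ms = 1000.0 / fps
    print(f"{model}: {latency_ms:.1f} ms/image")
```

YOLO26n comes out around 8 ms/image, comfortably inside the ~33 ms budget of 30 fps video.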

To provide additional context, the charts below show inference throughput and training time across MLX, PyTorch MPS, and CPU backends on an Apple M4 Pro device.


What’s Included in the Repository
The repository has everything you need to go from model to production: the YOLO26-MLX release includes tools for training, inference, validation, and benchmarking, all implemented in MLX.
| Feature | YOLO26-MLX |
|---|---|
| Runtime framework | MLX |
| Hardware target | Apple Silicon (M-series chips) |
| Training support | Yes |
| Inference support | Yes |
| PyTorch dependency | None at runtime |
| Dataset validation | COCO val2017 |
| Weight conversion support | PyTorch → MLX |
The repository also includes benchmarking tools, validation scripts, and utilities for converting existing YOLO weights into MLX format.
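One detail a PyTorch → MLX converter has to handle, beyond renaming parameter keys, is weight layout: MLX convolutions expect channels-last weights of shape (out, kH, kW, in), while PyTorch stores (out, in, kH, kW). A plain-Python sketch of that axis permutation (illustrative only; the repository's conversion utilities are the reference):

```python
def permute_oihw_to_ohwi(w):
    """Permute a nested-list conv weight from PyTorch's (O, I, kH, kW)
    layout to MLX's channels-last (O, kH, kW, I) layout."""
    n_out, n_in = len(w), len(w[0])
    kh, kw = len(w[0][0]), len(w[0][0][0])
    return [[[[w[o][i][h][k] for i in range(n_in)]  # channels move last
              for k in range(kw)]
             for h in range(kh)]
            for o in range(n_out)]
```

A real converter would do this with an array transpose rather than nested loops, but the index mapping is the same.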
Try It
Getting started with YOLO26-MLX takes only a few steps:
- Clone the repository
- Install dependencies
- Convert YOLO weights to MLX format
- Run inference or training directly on your Mac
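As a taste of what the inference step involves under the hood, YOLO-family models typically letterbox the input: scale the image to fit the model's square input size while preserving aspect ratio, then pad the remainder. A sketch of that arithmetic (the 640px default is the family's usual input size; the repository's pipeline is the reference):

```python
def letterbox_params(img_w, img_h, target=640):
    """Compute the scale and symmetric padding used to fit an image into
    a square target while preserving aspect ratio (standard YOLO-style
    letterbox preprocessing)."""
    scale = min(target / img_w, target / img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    pad_x = (target - new_w) / 2
    pad_y = (target - new_h) / 2
    return scale, pad_x, pad_y
```

For a 1280×720 frame this gives a 0.5 scale with 140 px of vertical padding; the same scale and offsets are used to map predicted boxes back to the original image.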
The project supports the full YOLO26 model family, including the n, s, m, l, and x variants, allowing developers to experiment with models ranging from lightweight edge deployments to larger high-accuracy configurations.

More broadly, this release reflects how we think about AI at webAI: models should run where the work happens, on hardware you control. YOLO26-MLX is another step toward making high-performance computer vision easier to build and run locally.
You can explore the repository and get started here: