YOLO26x (ExecuTorch, XNNPACK, INT8)

This folder contains an ExecuTorch .pte export of ultralytics/yolo26x for CPU inference via the XNNPACK backend.

Contents

  • yolo26x_xnnpack_q8.pte: ExecuTorch program (56.80 MB)

Model Details

  • Task: Object detection
  • Architecture: YOLO26x (YOLO26 family)
  • Parameters: 98.9M
  • Input shape: (1, 3, 640, 640) - NCHW format, float32
  • Output: Detection results with bounding boxes, classes, and confidence scores

Export Details

  • Backend: XNNPACK (CPU)
  • Precision: INT8
  • Quantization scheme: Static INT8 symmetric quantization (per-tensor)
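
As a rough sketch of what static symmetric per-tensor INT8 quantization does to each tensor: one scale is shared across the whole tensor, the zero-point is fixed at 0, and values are rounded and clamped to the signed 8-bit range. The scale below is derived from the tensor itself for illustration; a real exporter derives it from calibration data, and the exact scheme used here may differ in detail.

```python
import numpy as np

def quantize_symmetric(x, num_bits=8):
    # Per-tensor symmetric quantization: a single scale, zero-point = 0,
    # values clamped to the signed integer range.
    qmax = 2 ** (num_bits - 1) - 1          # 127 for INT8
    scale = np.abs(x).max() / qmax          # illustration only; real exporters
                                            # pick the scale from calibration data
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale = quantize_symmetric(x)
x_hat = dequantize(q, scale)  # close to x, up to quantization error <= scale
```

The round-trip error is bounded by the scale, which is where the "some accuracy tradeoff" mentioned in the Notes section comes from.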

Input/Output Format

Input

  • Shape: (1, 3, 640, 640)
  • Format: NCHW (batch, channels, height, width)
  • Dtype: float32
  • Range: [0.0, 1.0] after preprocessing (pixel values divided by 255; see Preprocessing for Real Images below)
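
A quick sanity check against this input signature before calling execute() can catch the most common mistakes (wrong shape or dtype, and the non-contiguous-input pitfall covered under Troubleshooting). This helper is not part of the export, just a suggested guard:

```python
import torch

def check_input(x):
    # Validate against the static input signature of this export.
    assert x.shape == (1, 3, 640, 640), f"expected (1, 3, 640, 640), got {tuple(x.shape)}"
    assert x.dtype == torch.float32, f"expected float32, got {x.dtype}"
    assert x.is_contiguous(), "call .contiguous() before execute()"

check_input(torch.zeros(1, 3, 640, 640))  # passes silently
```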

Output

The model returns a tuple containing:

  1. Detection tensor: Shape (1, 300, 6) - top 300 detections with [x1, y1, x2, y2, conf, cls]
  2. Detection outputs for multiple anchor points (8400 anchors)
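
Assuming the first output really is laid out as (1, 300, 6) rows of [x1, y1, x2, y2, conf, cls] as described above, unpacking it into boxes, scores, and class indices looks like this (the array below is a stand-in for the real output of method.execute(), and the filled-in detection row is hypothetical):

```python
import numpy as np

# Stand-in for the first output tensor: (1, 300, 6), each row
# [x1, y1, x2, y2, conf, cls]. Real values come from method.execute().
dets = np.zeros((1, 300, 6), dtype=np.float32)
dets[0, 0] = [100.0, 150.0, 300.0, 400.0, 0.92, 16.0]  # hypothetical detection

boxes = dets[0, :, :4]                # (300, 4) corner coordinates
scores = dets[0, :, 4]                # (300,) confidence scores
classes = dets[0, :, 5].astype(int)   # (300,) class indices

best = scores.argmax()                # index of the highest-confidence row
x1, y1, x2, y2 = boxes[best]
```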

Running the Model

Python (using ExecuTorch runtime)

import torch
from executorch.runtime import Runtime

# Load the .pte file
with open("yolo26x_xnnpack_q8.pte", "rb") as f:
    pte_buffer = f.read()

# Create runtime and load method
runtime = Runtime.get()
program = runtime.load_program(pte_buffer)
method = program.load_method("forward")

# Prepare input (NCHW float32)
input_tensor = torch.randn(1, 3, 640, 640)

# Run inference
outputs = method.execute([input_tensor])

Preprocessing for Real Images

import cv2
import numpy as np
import torch

def preprocess_image(image_path, input_size=(640, 640)):
    # Load image
    img = cv2.imread(image_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # Resize with padding (letterbox)
    h, w = img.shape[:2]
    scale = min(input_size[0] / h, input_size[1] / w)
    new_h, new_w = int(h * scale), int(w * scale)

    resized = cv2.resize(img, (new_w, new_h))
    padded = np.full((input_size[0], input_size[1], 3), 114, dtype=np.uint8)
    pad_h = (input_size[0] - new_h) // 2
    pad_w = (input_size[1] - new_w) // 2
    padded[pad_h:pad_h+new_h, pad_w:pad_w+new_w] = resized

    # Convert to tensor and normalize
    # Convert to tensor and normalize
    tensor = torch.from_numpy(padded).permute(2, 0, 1).float() / 255.0
    tensor = tensor.unsqueeze(0)  # Add batch dimension
    tensor = tensor.contiguous()  # permute() can leave a non-contiguous tensor; see Troubleshooting

    return tensor

input_tensor = preprocess_image("image.jpg")
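
Boxes predicted in the 640x640 letterboxed frame need to be mapped back to the original image before drawing. A sketch of the inverse of the letterbox transform above (it recomputes the same scale and padding as preprocess_image, so the two must stay in sync):

```python
import numpy as np

def boxes_to_original(boxes, orig_h, orig_w, input_size=(640, 640)):
    # Invert the letterbox transform: undo the centered padding, then the
    # uniform scale. boxes: (N, 4) [x1, y1, x2, y2] in model-input pixels.
    scale = min(input_size[0] / orig_h, input_size[1] / orig_w)
    new_h, new_w = int(orig_h * scale), int(orig_w * scale)
    pad_h = (input_size[0] - new_h) // 2
    pad_w = (input_size[1] - new_w) // 2

    out = boxes.astype(np.float32).copy()
    out[:, [0, 2]] = (out[:, [0, 2]] - pad_w) / scale   # x coordinates
    out[:, [1, 3]] = (out[:, [1, 3]] - pad_h) / scale   # y coordinates
    # Clip to the original image bounds
    out[:, [0, 2]] = out[:, [0, 2]].clip(0, orig_w)
    out[:, [1, 3]] = out[:, [1, 3]].clip(0, orig_h)
    return out
```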

Postprocessing Detections

import numpy as np

def postprocess_detections(outputs, conf_threshold=0.25):
    # Convert model outputs to bounding boxes.
    # Args: outputs (model output tuple), conf_threshold (score cutoff)
    # Returns: (N, 6) array of [x1, y1, x2, y2, confidence, class_id]

    # The first output contains the main detections
    detections = outputs[0][0]  # (300, 6)

    # Filter by confidence
    mask = detections[:, 4] > conf_threshold
    filtered = detections[mask]

    # Simplified: no NMS is applied here. If you need it, run
    # cv2.dnn.NMSBoxes or torchvision.ops.nms on the filtered boxes.
    return filtered.cpu().numpy()
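
The postprocessing above leaves NMS out. A minimal greedy, class-agnostic version in pure NumPy could look like the sketch below; cv2.dnn.NMSBoxes and torchvision.ops.nms are the battle-tested alternatives.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.45):
    # Greedy non-maximum suppression over (N, 4) [x1, y1, x2, y2] boxes:
    # repeatedly keep the highest-scoring box and drop boxes that overlap
    # it with IoU above the threshold.
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the kept box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter + 1e-9)
        order = order[1:][iou <= iou_threshold]
    return np.array(keep)
```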

Performance

  • FP32 parameter size: ~396 MB (98.9M params × 4 bytes)
  • INT8 .pte size: 56.80 MB
  • Speed: XNNPACK provides optimized CPU inference with SIMD acceleration

Notes

  • This export uses a static input shape of (1, 3, 640, 640). Other input sizes require re-exporting the model.
  • The INT8 model uses static symmetric quantization. Some accuracy tradeoff may occur.
  • XNNPACK backend is optimized for ARM and x86 CPUs with NEON/SSE/AVX support.

Original Model

  • Source: Ultralytics YOLO26
  • License: AGPL-3.0
  • Background: YOLO26 is part of the Ultralytics YOLO family of object detection models

Export Tooling

Generated with ExecuTorch export tooling.

Troubleshooting

Low confidence / incorrect outputs with non-contiguous inputs

If your outputs look wrong (for object-detection models this can show up as all confidences capped around ~0.20 / 20% and no detections), ensure the input tensor passed to ExecuTorch is contiguous.

Example:

import torch

# img_hwc: float32 HWC image (e.g. RGB) in [0, 1]
x = torch.from_numpy(img_hwc).permute(2, 0, 1).unsqueeze(0)  # NCHW (often non-contiguous)
x = x.contiguous()  # IMPORTANT

outputs = method.execute([x])

Detection symptom example (before fix):

Confidence range: [0.0004, 0.2012]
Detections: 0

After fix (.contiguous()):

Confidence range: [0.0001, 0.9589]
Detections: 12