SAM2Matting
Generalized Image and Video Matting

Ruiqi Shen¹, Guangquan Jie¹, Chang Liu², Henghui Ding¹

¹Fudan University ²Shanghai University of Finance and Economics
ECCV 2026

Abstract

Despite impressive advances in image matting, video matting remains challenging due to the gap between high-level tracking, which requires frame-wise understanding, and low-level matting, which focuses on extremely fine-grained details. Existing methods attempt to bridge this gap with expensive and narrowly scoped video matting datasets, which may limit out-of-domain generalization and compromise tracking robustness. We rethink the problem with SAM2Matting, a tracker-to-matting framework that advances VOS trackers to (hence the "2") high-fidelity alpha matting. Specifically, it decouples the task by augmenting a foundational tracker (e.g., SAM2, SAM3) with a region-proposal bridge and dedicated matting heads, so that the uncompromised tracker handles temporal consistency while the matting components are dedicated to resolving fine-grained details. Notably, despite being trained only on images, SAM2Matting establishes new state-of-the-art video matting performance, supports diverse prompt types, maintains strong temporal consistency, and exhibits robust generalization across both human-centric and in-the-wild scenarios.

Architecture

SAM2Matting is a generalized matting framework that decouples high-level tracking from dedicated low-level matting components. Specifically, a VOS tracker provides a temporally-consistent target mask for each frame. Given the mask and multi-scale image features, an ROI Detector identifies instance-specific matting-critical regions with fine-grained details or semi-transparency. A Progressive Alpha Predictor then iteratively produces and refines the matte through a coarse-to-fine cascade, with intermediate mattes supervised at each scale to progressively capture finer details.
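The decoupling described above can be illustrated with a small sketch. This is not the SAM2Matting implementation: the learned ROI Detector is replaced here by a simple morphological band around the tracker mask boundary, and the Progressive Alpha Predictor is represented only by an array of predicted alpha values passed in by the caller. All function names (`binary_dilate`, `boundary_roi`, `compose_alpha`) and the `radius` parameter are hypothetical and chosen for illustration. The sketch shows only the general idea that the tracker mask is trusted away from matting-critical regions, while predicted alpha values are used inside them.

```python
import numpy as np


def binary_dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Dilate a boolean mask with a (2*radius+1) x (2*radius+1) square window."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def boundary_roi(mask: np.ndarray, radius: int) -> np.ndarray:
    """Hypothetical stand-in for a learned ROI detector.

    Returns the band of pixels within `radius` of the mask boundary,
    computed as dilation minus erosion of the tracker mask.
    """
    dilated = binary_dilate(mask, radius)
    eroded = ~binary_dilate(~mask, radius)  # erosion via duality
    return dilated & ~eroded


def compose_alpha(mask: np.ndarray, roi: np.ndarray,
                  predicted_alpha: np.ndarray) -> np.ndarray:
    """Use the binary tracker mask outside the ROI and predicted alpha inside it."""
    alpha = mask.astype(np.float32)
    alpha[roi] = np.clip(predicted_alpha[roi], 0.0, 1.0)
    return alpha
```

In a per-frame loop, `mask` would come from the VOS tracker and `predicted_alpha` from whatever matting head is in use; the band-based ROI is only an assumption made to keep the example self-contained.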

Quantitative Results

We provide three SAM2Matting variants, built on the SAM2.1-T, SAM2.1-B+, and SAM3 trackers. The best, second-best, and third-best results are highlighted with red, orange, and yellow backgrounds, respectively. SAM2Matting achieves state-of-the-art performance on both image and video matting benchmarks, with its video matting performance evaluated in a zero-shot manner.

SAM2Matting video matting quantitative results

Qualitative Results

Qualitative comparison on fast motion: SAM2Matting stably tracks challenging targets and recovers intricate details in in-the-wild and rapid-motion scenarios where baselines fail.

Interactive Demo

SAM2Matting supports diverse prompt types and enables robust matting of any open-world target throughout a video sequence.

BibTeX

@inproceedings{SAM2Matting,
  title={{SAM2Matting}: Generalized Image and Video Matting},
  author={Shen, Ruiqi and Jie, Guangquan and Liu, Chang and Ding, Henghui},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2026}
}
