This AI Paper Introduces EdgeSAM: Advancing Machine Learning for High-Speed, Efficient Image Segmentation on Edge Devices

The Segment Anything Model (SAM) is an AI model that segments objects in images and serves as an effective foundation for various computer vision tasks. However, SAM is not optimized for edge devices, where it runs slowly and consumes substantial resources. Researchers from S-Lab, Nanyang Technological University, and Shanghai Artificial Intelligence Laboratory developed EdgeSAM to address this issue. This optimized version of SAM is designed to run efficiently on resource-constrained edge devices without sacrificing accuracy.

The study builds on prior research into efficient CNNs and transformers for visual representation learning, as well as earlier work applying knowledge distillation to dense prediction tasks such as semantic segmentation and object detection. The most closely related works are Mobile-SAM, which distills SAM's image encoder using pixel-wise feature distillation, and Fast-SAM, which trains a YOLACT-based instance segmentation model. The paper also reviews efficient segmentation methods for specific domains and recent efforts to build segmentation models that run on mobile devices.
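Pixel-wise feature distillation, as used by Mobile-SAM, can be pictured as a regression loss that pushes a small student encoder's feature map toward the frozen SAM encoder's feature map at every spatial position. The sketch below assumes a plain mean squared error and represents feature maps as nested Python lists (channels x height x width) to stay dependency-free; the function name is hypothetical, and a real training pipeline would compute this on framework tensors.

```python
from typing import List

# Hypothetical representation: a feature map as channels x height x width nested lists.
FeatureMap = List[List[List[float]]]


def pixelwise_feature_distillation_loss(student: FeatureMap, teacher: FeatureMap) -> float:
    """Mean squared error between student and teacher features, averaged over every
    channel and spatial position. Teacher features act as fixed regression targets."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher must have the same number of channels")
    total, count = 0.0, 0
    for s_ch, t_ch in zip(student, teacher):
        if len(s_ch) != len(t_ch):
            raise ValueError("spatial height mismatch")
        for s_row, t_row in zip(s_ch, t_ch):
            if len(s_row) != len(t_row):
                raise ValueError("spatial width mismatch")
            for s_val, t_val in zip(s_row, t_row):
                # Squared error at this (channel, y, x) position.
                total += (s_val - t_val) ** 2
                count += 1
    return total / count
```

For a one-channel 2x2 map where the student differs from the teacher by 2.0 at a single position, the loss is 4.0 / 4 = 1.0; identical maps give 0.0.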

The research tackles the challenge of deploying the computationally demanding SAM on edge devices such as smartphones for real-time interactive segmentation. EdgeSAM, an optimized SAM variant, runs in real time on such devices while maintaining accuracy. It uses a prompt-aware knowledge distillation approach that aligns the student's output masks with SAM's, and it feeds tailored prompts to the mask decoder during distillation. With a purely CNN-based backbone suited to on-device AI accelerators, EdgeSAM outperforms Mobile-SAM and runs substantially faster than the original SAM, making real-time edge deployment practical.
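In prompt-aware distillation, the teacher (SAM) and the student receive the same prompt, and the student's output mask is trained to match the teacher's. The sketch below assumes a combined binary cross-entropy and Dice loss against the teacher mask binarized at logit 0, a common choice for mask supervision; the loss mix, the threshold, and all names are assumptions for illustration rather than the paper's exact formulation, and masks are nested lists of logits to avoid dependencies.

```python
import math
from typing import List

Mask = List[List[float]]  # height x width mask logits (hypothetical representation)


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def flatten(mask: Mask) -> List[float]:
    return [v for row in mask for v in row]


def mask_distillation_loss(student_logits: Mask, teacher_logits: Mask, eps: float = 1e-6) -> float:
    """BCE + Dice between student mask probabilities and the teacher's binarized mask.

    Both masks are assumed to come from the same point or box prompt given to the
    teacher and to the student, which is what makes the loss prompt-aware."""
    s = [sigmoid(v) for v in flatten(student_logits)]
    # Assumption of this sketch: binarize teacher logits at 0 (probability 0.5).
    t = [1.0 if v > 0 else 0.0 for v in flatten(teacher_logits)]
    if len(s) != len(t):
        raise ValueError("mask shapes differ")
    bce = 0.0
    for si, ti in zip(s, t):
        p = min(max(si, eps), 1.0 - eps)  # clamp to keep log() finite
        bce -= ti * math.log(p) + (1.0 - ti) * math.log(1.0 - p)
    bce /= len(s)
    inter = sum(si * ti for si, ti in zip(s, t))
    dice = 1.0 - (2.0 * inter + eps) / (sum(s) + sum(t) + eps)
    return bce + dice
```

A student whose logits agree in sign with the teacher's yields a loss near zero, while one with flipped signs yields a large loss, so gradients push the student's masks toward the teacher's for that prompt.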

EdgeSAM is tailored for efficient execution on edge devices without a significant loss in quality. It distills the original ViT-based SAM image encoder into a CNN-based architecture suited to edge hardware. To capture SAM's knowledge more fully, the distillation also covers the prompt encoder and mask decoder, with box and point prompts in the loop. A lightweight module is added to address dataset bias. Ablation studies examine the contribution of prompt-in-the-loop knowledge distillation and of a lightweight Region Proposal Network that incorporates granularity priors.
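One way to read "prompts in the loop" is iteratively: after the student predicts a mask for an initial box or point, a new point is sampled from pixels where the student and teacher masks disagree, labeled positive where the student missed foreground and negative where it added spurious foreground, and fed back for another round of distillation. The sketch below assumes that sampling rule and uniform sampling over the disagreement region; the function and type names are hypothetical and this is not EdgeSAM's released code.

```python
import random
from typing import List, Optional, Tuple

BinaryMask = List[List[int]]        # height x width, values 0 or 1
PointPrompt = Tuple[int, int, int]  # (x, y, label): label 1 = positive, 0 = negative


def sample_loop_point(teacher: BinaryMask, student: BinaryMask,
                      rng: random.Random) -> Optional[PointPrompt]:
    """Pick one new point prompt from where the student's mask disagrees with the teacher's."""
    candidates: List[PointPrompt] = []
    for y, (t_row, s_row) in enumerate(zip(teacher, student)):
        for x, (t, s) in enumerate(zip(t_row, s_row)):
            if t == 1 and s == 0:
                candidates.append((x, y, 1))  # student missed foreground: positive point
            elif t == 0 and s == 1:
                candidates.append((x, y, 0))  # student over-segmented: negative point
    if not candidates:
        return None  # masks agree, so no extra prompt is needed
    return rng.choice(candidates)
```

Passing an explicit `random.Random` keeps sampling reproducible across training runs. In a full loop, the returned point would be appended to the existing prompts and both models would be queried again.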

EdgeSAM runs over 40 times faster than the original SAM on an NVIDIA 2080 Ti and about 14 times faster than Mobile-SAM on an iPhone 14. It consistently outperforms Mobile-SAM across diverse prompt combinations and datasets, showing its value for real-world applications. Both the prompt-in-the-loop knowledge distillation and the lightweight Region Proposal Network contribute significant performance gains.
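A speedup factor such as "40 times" is a ratio of latencies measured on the same hardware: the baseline's latency divided by the new model's latency. Because the 40x figure (against SAM on a 2080 Ti) and the 14x figure (against Mobile-SAM on an iPhone 14) use different baselines and devices, they should not be combined into a single number. The minimal timing harness below illustrates the calculation; the helper names, the warm-up count, and the use of the median are assumptions for illustration, not the paper's benchmarking protocol.

```python
import statistics
import time
from typing import Callable, List


def median_latency_ms(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> float:
    """Median wall-clock latency of fn in milliseconds, measured after a few warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline on the same device."""
    if candidate_ms <= 0:
        raise ValueError("candidate latency must be positive")
    return baseline_ms / candidate_ms
```

With made-up latencies of 400 ms for a baseline and 10 ms for a candidate, `speedup(400.0, 10.0)` returns 40.0. The median is used because it is less sensitive than the mean to occasional slow runs.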

In conclusion, the key highlights of the research can be summarized in a few points:

EdgeSAM is an optimized variant of SAM.
It is designed to be deployed on edge devices like smartphones in real time.
It is over 40 times faster than the original SAM (on an NVIDIA 2080 Ti).
It runs about 14 times faster than Mobile-SAM on edge devices (on an iPhone 14).
It significantly improves mIoU over Mobile-SAM on the COCO and LVIS datasets.
EdgeSAM integrates a dynamic prompt-in-the-loop strategy and a lightweight module to address dataset bias.
The study explores various training configurations, prompt types, and freezing approaches.
A lightweight Region Proposal Network is also introduced, leveraging granularity priors.


Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
