Not logged in.

Quick Search - Contribution

Contribution Details

Type	Conference or Workshop Paper
Scope	Discipline-based scholarship
Published in Proceedings	Yes
Title	Revisiting Token Pruning for Object Detection and Instance Segmentation
Organization Unit	Department of Informatics (Burkhard Stiller)
Authors	Yifei Liu Mathias Gehrig Nico Messikommer Marco Cannici Davide Scaramuzza
Presentation Type	paper
Item Subtype	Original Work
Refereed	Yes
Status	Published in final form
Language	English
Page Range	2658 - 2668
Event Title	IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Event Type	conference
Event Location	Waikoloa, Hawaii
Event Start Date	January 4 - 2024
Event End Date	January 8 - 2024
Abstract Text	Vision Transformers (ViTs) have shown impressive performance in computer vision, but their high computational cost, quadratic in the number of tokens, limits their adoption in computation-constrained applications. However, this large number of tokens may not be necessary, as not all tokens are equally important. In this paper, we investigate token pruning to accelerate inference for object detection and instance segmentation, extending prior works from image classification. Through extensive experiments, we offer four insights for dense tasks: (i) tokens should not be completely pruned and discarded, but rather preserved in the feature maps for later use. (ii) reactivating previously pruned tokens can further enhance model performance. (iii) a dynamic pruning rate based on images is better than a fixed pruning rate. (iv) a lightweight, 2-layer MLP can effectively prune tokens, achieving accuracy comparable with complex gating networks with a simpler design. We evaluate the impact of these design choices on COCO dataset and present a method integrating these insights that outperforms prior art token pruning models, significantly reducing performance drop from ~1.5 mAP to ~0.3 mAP for both boxes and masks. Compared to the dense counterpart that uses all tokens, our method achieves up to 34% faster inference speed for the whole network and 46% for the backbone.
Official URL	https://openaccess.thecvf.com/content/WACV2024/papers/Liu_Revisiting_Token_Pruning_for_Object_Detection_and_Instance_Segmentation_WACV_2024_paper.pdf
PDF File	Download from ZORA
Export	BibTeX EP3 XML (ZORA)