Paper
3 January 2025 SHA: sparse adaptive head of attention
Haoyang Yuan
Proceedings Volume 13442, Fifth International Conference on Signal Processing and Computer Science (SPCS 2024); 134421M (2025) https://doi.org/10.1117/12.3053346
Event: Fifth International Conference on Signal Processing and Computer Science (SPCS 2024), 2024, Kaifeng, China
Abstract
As large language models grow deeper and wider, the limitations that key-value (KV) caches impose on LLM inference have become increasingly prominent. This study shows how different input samples change the correlations among attention heads during the computation of transformer attention. By quantifying this context-dependent correlation and dynamically assigning a weight to each attention head accordingly, we propose SHA. SHA identifies and prunes heads that contribute minimally to overall performance, accelerating inference with negligible loss in quality. Experimental results demonstrate that on Llama-7B we remove 30% of the attention heads, reduce KV cache memory requirements by 24.2%, and achieve a throughput improvement of up to 2.04 times.
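The abstract describes scoring attention heads by their context-dependent correlation and pruning the least useful ones. The sketch below is only an illustration of that general idea, not the paper's method: the scoring rule (treating a head's average correlation with the other heads' attention maps as redundancy), the function names, and the keep_ratio parameter are all assumptions made here for the example.

```python
import torch

def head_importance_from_attention(attn_weights):
    """Score attention heads for one input sample.

    attn_weights: tensor of shape (num_heads, seq_len, seq_len) holding the
    softmax attention maps of a single transformer layer. The rule below
    (low importance for heads that are highly correlated with, i.e. redundant
    to, the other heads) is an illustrative choice, not necessarily SHA's
    exact criterion.
    """
    num_heads = attn_weights.shape[0]
    flat = attn_weights.reshape(num_heads, -1)            # one row per head
    flat = flat - flat.mean(dim=1, keepdim=True)
    flat = flat / (flat.norm(dim=1, keepdim=True) + 1e-8)
    corr = flat @ flat.T                                  # head-to-head correlation
    # A head that correlates strongly with the others carries little unique
    # information, so it receives a low importance weight.
    redundancy = (corr.sum(dim=1) - 1.0) / (num_heads - 1)
    return 1.0 - redundancy

def select_heads(importance, keep_ratio=0.7):
    """Keep the top keep_ratio fraction of heads (70% here, matching the 30%
    removal reported in the abstract) and return their indices."""
    k = max(1, int(round(keep_ratio * importance.numel())))
    return torch.topk(importance, k).indices.sort().values

# Example: 32 heads, sequence length 128, random attention maps.
attn = torch.softmax(torch.randn(32, 128, 128), dim=-1)
imp = head_importance_from_attention(attn)
kept = select_heads(imp, keep_ratio=0.7)
print(f"Keeping {kept.numel()} of {imp.numel()} heads:", kept.tolist())
```

In practice such scores would be computed per input (the abstract emphasizes that the correlations are context-dependent), and removing a head lets its key and value projections be skipped, which is where the reported KV cache savings come from.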
(2025) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Haoyang Yuan "SHA: sparse adaptive head of attention", Proc. SPIE 13442, Fifth International Conference on Signal Processing and Computer Science (SPCS 2024), 134421M (3 January 2025); https://doi.org/10.1117/12.3053346
KEYWORDS
Matrices, Mathematical optimization, Transformers, Singular value decomposition, Neural networks