TY - JOUR
T1 - Neural Prompt Search
AU - Zhang, Yuanhan
AU - Zhou, Kaiyang
AU - Liu, Ziwei
N1 - This study is supported in part by the Ministry of Education, Singapore, under its MOE AcRF Tier 2 (MOET2EP20221-0012), NTU NAP, and the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative. Additional support includes cash and in-kind contributions from industry partners. This study is also supported by the Hong Kong Research Grants Council Early Career Scheme (No. 22200824).
Publisher Copyright:
IEEE
PY - 2024/7/30
Y1 - 2024/7/30
N2 - The size of vision models has grown exponentially over the last few years, especially after the emergence of Vision Transformer. This has motivated the development of parameter-efficient tuning methods, such as learning adapter layers or visual prompt tokens, which allow a tiny portion of model parameters to be trained whereas the vast majority obtained from pre-training are frozen. However, designing a proper tuning method is non-trivial: one might need to try out a lengthy list of design choices, not to mention that each downstream dataset often requires custom designs. In this paper, we view the existing parameter-efficient tuning methods as “prompt modules” and propose Neural prOmpt seArcH (NOAH), a novel approach that learns, for large vision models, the optimal design of prompt modules through a neural architecture search algorithm, specifically for each downstream dataset. By conducting extensive experiments on over 20 vision datasets, we demonstrate that NOAH (i) is superior to individual prompt modules, (ii) has good few-shot learning ability, and (iii) is domain-generalizable. The code and models are available at https://github.com/ZhangYuanhan-AI/NOAH.
AB - The size of vision models has grown exponentially over the last few years, especially after the emergence of Vision Transformer. This has motivated the development of parameter-efficient tuning methods, such as learning adapter layers or visual prompt tokens, which allow a tiny portion of model parameters to be trained whereas the vast majority obtained from pre-training are frozen. However, designing a proper tuning method is non-trivial: one might need to try out a lengthy list of design choices, not to mention that each downstream dataset often requires custom designs. In this paper, we view the existing parameter-efficient tuning methods as “prompt modules” and propose Neural prOmpt seArcH (NOAH), a novel approach that learns, for large vision models, the optimal design of prompt modules through a neural architecture search algorithm, specifically for each downstream dataset. By conducting extensive experiments on over 20 vision datasets, we demonstrate that NOAH (i) is superior to individual prompt modules, (ii) has good few-shot learning ability, and (iii) is domain-generalizable. The code and models are available at https://github.com/ZhangYuanhan-AI/NOAH.
KW - Parameter-efficient Tuning
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85200217081&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2024.3435939
DO - 10.1109/TPAMI.2024.3435939
M3 - Journal article
AN - SCOPUS:85200217081
SN - 0162-8828
SP - 1
EP - 14
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
ER -