TY - JOUR
T1 - A pre-trained large generative model for translating single-cell transcriptomes to proteomes
AU - Liu, Linjing
AU - Li, Wei
AU - Wang, Fang
AU - Li, Yiming
AU - Huang, Long Kai
AU - Wong, Ka Chun
AU - Yang, Fan
AU - Yao, Jianhua
N1 - We thank Z. Zheng and D. Tang for their valuable suggestions and discussion during the preparation of this manuscript. This research was substantially sponsored by the research projects (grant numbers 32170654 and 32000464 (K.-C.W.)) supported by the National Natural Science Foundation of China and was substantially supported by the Shenzhen Research Institute, City University of Hong Kong. The work described in this paper was substantially supported by the grant from the Research Grants Council of the Hong Kong Special Administrative Region (CityU 11203723 (K.-C.W.)). This project was substantially funded by the Strategic Interdisciplinary Research Grant of City University of Hong Kong (project number 2021SIRG036 (K.-C.W.)). The work described in this paper was partially supported by the grant from City University of Hong Kong (CityU 9667265 (K.-C.W.)) and Key-Area Research and Development Program of Guangdong Province (2021B0101420005 (K.-C.W.)). F.Y. was supported by the Young Elite Scientists Sponsorship Program by CAST (2023QNRC001).
Publisher Copyright:
© The Author(s), under exclusive licence to Springer Nature Limited 2025.
PY - 2025/11/5
Y1 - 2025/11/5
N2 - Measuring protein abundance at the single-cell level can facilitate a high-resolution understanding of biological mechanisms in cellular processes and disease progression. However, current single-cell proteomic technologies face challenges such as limited coverage, constrained throughput and sensitivity, batch effects, high costs and stringent experimental operations. Inspired by the translation procedure in both natural language processing and the genetic central dogma, we propose a pre-trained, large generative model named single-cell translator (scTranslator). scTranslator can generate multi-omics data by inferring the missing single-cell proteome based on the transcriptome. Through systematic benchmarking and validation on independent datasets, we have confirmed the accuracy, stability and flexibility of scTranslator across various profiling techniques (for example, CITE-seq, spatial CITE-seq, REAP-seq, NEAT-seq), cell types (for example, monocytes, macrophages, T cells, B cells), tissues (for example, blood, lung, brain) and a wide range of disease contexts, including infectious, metabolic and oncologic conditions. Furthermore, scTranslator shows its superiority in assisting various downstream analyses and applications, including gene/protein interaction inference, perturbation prediction, cell clustering, batch correction and cell origin recognition in pan-cancer data.
UR - https://www.scopus.com/pages/publications/105021051337
U2 - 10.1038/s41551-025-01528-z
DO - 10.1038/s41551-025-01528-z
M3 - Journal article
C2 - 41193888
AN - SCOPUS:105021051337
SN - 2157-846X
JO - Nature Biomedical Engineering
JF - Nature Biomedical Engineering
ER -