TY - JOUR
T1 - Full-Length Transcript-Based Proteogenomics of Rice Improves Its Genome and Proteome Annotation
AU - Chen, Mo Xian
AU - Zhu, Fu Yuan
AU - Gao, Bei
AU - Ma, Kai Long
AU - Zhang, Youjun
AU - Fernie, Alisdair R.
AU - Chen, Xi
AU - Dai, Lei
AU - Ye, Neng Hui
AU - Zhang, Xue
AU - Tian, Yuan
AU - Zhang, Di
AU - Xiao, Shi
AU - Zhang, Jianhua
AU - Liu, Ying Gao
N1 - Funding Information:
This work was supported by the Natural Science Foundation of Guangdong Province (grant no. 2018A030313030), the Funds of Shandong “Double Top” Program, the National Natural Science Foundation of China (grant nos. NSFC81401561 and 91535109), the Shenzhen Virtual University Park Support Scheme to the CUHK Shenzhen Research Institute (grant no. YFJGJS1.0), the Natural Science Foundation of Hunan Province (grant no. 2019JJ50263), the Hong Kong Research Grant Council (grant nos. AoE/M-05/12, AoE/M-403/16, GRF14160516, 14177617, and 12100318), and the European Union H2020 project PlantaSYST (SGA-CSA no. 739582 under FPA no. 664620). These authors contributed equally to the article. Senior authors. Author for contact: [email protected]. The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Ying-Gao Liu ([email protected]).
PY - 2020/3
Y1 - 2020/3
N2 - Rice (Oryza sativa) molecular breeding has gained considerable attention in recent years, but inaccurate genome annotation hampers its progress and functional studies of the rice genome. In this study, we applied single-molecule long-read RNA sequencing (lrRNA seq)-based proteogenomics to reveal the complexity of the rice transcriptome and its coding abilities. Surprisingly, approximately 60% of loci identified by lrRNA seq are associated with natural antisense transcripts (NATs). The high-density genomic arrangement of NAT genes suggests their potential roles in the multifaceted control of gene expression. In addition, a large number of fusion and intergenic transcripts have been observed. Furthermore, 906,456 transcript isoforms were identified, and 72.9% of the genes can generate splicing isoforms. A total of 706,075 posttranscriptional events were subsequently categorized into 10 subtypes, demonstrating the interdependence of posttranscriptional mechanisms that contribute to transcriptome diversity. Parallel short-read RNA sequencing indicated that lrRNA seq has a superior capacity for the identification of longer transcripts. In addition, over 190,000 unique peptides belonging to 9,706 proteoforms/protein groups were identified, expanding the diversity of the rice proteome. Our findings indicate that the genome organization, transcriptome diversity, and coding potential of the rice transcriptome are far more complex than previously anticipated.
UR - http://www.scopus.com/inward/record.url?scp=85081146297&partnerID=8YFLogxK
U2 - 10.1104/PP.19.00430
DO - 10.1104/PP.19.00430
M3 - Journal article
C2 - 31857423
AN - SCOPUS:85081146297
SN - 0032-0889
VL - 182
SP - 1510
EP - 1526
JO - Plant Physiology
JF - Plant Physiology
IS - 3
ER -