Abstract
This paper presents RITA (Resource for Italian Tests Assessment), a new dataset of academic exam texts written in Italian by second-language learners seeking CEFR proficiency certification. In addition to the tests, RITA provides a variety of speech elements, annotations, and statistics, including phraseological units and their syntactic dependencies. The dataset consists of two corpora: one containing the task assignment and the other containing the texts produced by the learners in response to the assignment. This work describes the data collection and annotation process, the structure of the dataset, and the statistics computed to facilitate phraseological analysis of the texts. RITA is a valuable resource for researchers and educators interested in Italian phraseology, language assessment, and natural language processing.
Original language | English |
---|---|
Title of host publication | Proceedings - 2023 22nd IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2023 |
Publisher | IEEE |
Pages | 425-430 |
Number of pages | 6 |
ISBN (Electronic) | 9798350309188 |
ISBN (Print) | 9798350309195 |
Publication status | Published - 26 Oct 2023 |
Event | 22nd IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2023 - Hybrid, Venice, Italy. Duration: 26 Oct 2023 → 29 Oct 2023. https://ieeexplore.ieee.org/xpl/conhome/10350035/proceeding |
Publication series
Name | Proceedings - IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT |
---|---|
Conference
Conference | 22nd IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2023 |
---|---|
Country/Territory | Italy |
City | Venice |
Period | 26/10/23 → 29/10/23 |
User-Defined Keywords
- Italian L2
- L2
- Natural Language Processing
- NLP
- text complexity