RITA: A Phraseological Dataset of CEFR Assignments and Exams for Italian as a Second Language

Giulio Biondi, Valentina Franzoni*, Yuanxi Li*, Alfredo Milani, Valentino Santucci

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding, Conference proceeding, peer-reviewed


Abstract

This paper presents RITA (Resource for Italian Tests Assessment), a new dataset of academic exam texts written in Italian by second-language learners seeking CEFR proficiency-level certification. In addition to the tests, RITA provides a variety of speech elements, annotations, and statistics, including phraseological units and their syntactic dependencies. The dataset consists of two corpora: one containing the task assignments and the other containing the texts written by the learners in response to those assignments. This work describes the data collection and annotation process, the dataset structure, and the statistics computed to facilitate phraseological analysis of the texts. RITA is a valuable resource for researchers and educators interested in Italian phraseology, language assessment, and natural language processing.

Original language: English
Title of host publication: Proceedings - 2023 22nd IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2023
Publisher: IEEE
Pages: 425-430
Number of pages: 6
ISBN (Electronic): 9798350309188
ISBN (Print): 9798350309195
Publication status: Published - 26 Oct 2023
Event: 22nd IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2023 - Hybrid, Venice, Italy
Duration: 26 Oct 2023 to 29 Oct 2023
https://ieeexplore.ieee.org/xpl/conhome/10350035/proceeding

Publication series

Name: Proceedings - IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT

Conference

Conference: 22nd IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2023
Country/Territory: Italy
City: Venice
Period: 26/10/23 to 29/10/23

User-Defined Keywords

  • Italian L2
  • L2
  • Natural Language Processing
  • NLP
  • text complexity
