Re-testing the universals: “Mining” interpreting data through re-engineered mega-size corpora

  • 潘珺, Jun (PI)
  • Defrancq, Bart (CoI)
  • Russo, Mariachiara (CoI)
  • Wong, Tak Ming (CoI)

Project: Research project

Project Details


For years, interpreting researchers have been obsessed with possible “universals”, i.e. general language features associated with interpreting. There have been debates over whether “universals” exist in such a complex activity of language transfer and, if so, what they are. More recently, attention has turned to the linguistic features shared across constrained language use. Empirical studies on this topic, although substantial in number, are usually hindered by the limited size and variety of data.

The proposed study, tapping into recent developments in Corpus-based Interpreting Studies (CIS) and text mining, aims to readdress the issue of interpreting “universals” through a collaborative effort to develop re-engineered mega-size corpora. The project consists of two major parts. The first part involves the integration of existing, comparable interpreting corpora through a process called “re-engineering”. The corpora selected for integration include the Chinese/English Political Interpreting Corpus (CEPIC), the European Parliament Interpreting Corpus (EPIC), and the European Parliament Interpreting Corpus Ghent (EPICG), for which the investigators of the proposed project have served as either main developers or co-developers. Together, these corpora cover a variety of European languages in addition to Chinese and English. The re-engineered corpora will share a unified framework that makes data comparable across different sub-corpora and language sets. They will be uploaded to a searchable platform for follow-up analyses.

The second part of the project comprises analyses pivoting on the notion of interpreting “universals”. Apart from examining existing measures of such “universals” and features of constrained language, the project employs text-mining tools and methods to explore the mega-size interpreting corpus data, an undertaking that was unthinkable when the concept of “universals” in interpreting was raised.

Corpus-driven in nature, the project aims to answer three research questions (RQs):
RQ1: What may be the linguistic and text features that make interpreted speeches deviate from their source speeches?
RQ2: What may be the linguistic and text features that make interpreter speech different from non-mediated speech in the same language?
RQ3: What may be the linguistic and text features that make interpreting different from its written counterpart, i.e. translation?

The three RQs provide comparisons on three dimensions. In particular, RQ1 focuses on possible shared linguistic strategies employed by interpreters. RQ2 extends the comparison to a level between mediated and non-mediated language, and RQ3 further extends it to the two modes of mediated language (spoken vs. written).
Effective start/end date: 1/01/23 - 31/12/24

UN Sustainable Development Goals

In 2015, UN member states agreed to 17 global Sustainable Development Goals (SDGs) to end poverty, protect the planet and ensure prosperity for all. This project contributes towards the following SDG(s):

  • SDG 4 - Quality Education
  • SDG 9 - Industry, Innovation, and Infrastructure
  • SDG 10 - Reduced Inequalities
  • SDG 16 - Peace, Justice and Strong Institutions
  • SDG 17 - Partnerships for the Goals

