An investigation of PhD students’ use of corpora in facilitating their writing for publication purposes

Meilin Chen, John Flowerdew

Research output: Contribution to conference › Conference paper › peer-review


Due to the rapid development of globalization and the knowledge economy over the last several decades, universities worldwide have been striving for more research outputs with greater impact in international journals in order to be more visible in the international academic community. This pressure to publish internationally, mostly in English, falls not only on university academic staff but also on research postgraduate students. In response, some EAP practitioners have experimented with data-driven learning (Johns, 1991) to help graduate students with their research writing (e.g. Bianchi & Pazzaglia, 2007; Charles, 2007, 2014; Cortes, 2007, 2014; Diani, 2012; Flowerdew, 2009, 2015; Lee & Swales, 2006). Corpora can indeed be highly beneficial for research postgraduate students’ writing, given that these students can access, or easily create, discipline-specific corpora containing real-life published writing from their own research fields (Charles, 2015, 2018; Chen & Flowerdew, 2018). Apart from notable studies by Charles (2014) and Yoon (2008), however, little research has examined whether, why, or how research students use corpora in their actual research writing over the longer term after being introduced to the approach.

Accordingly, this study investigates PhD students’ corpus use for research writing during the year after they attended a corpus-based research writing workshop. In the workshop, the students were taught to use the BNCweb online corpus and to build and search their own personal corpora using AntFileConverter (Anthony, 2016a) and AntConc (Anthony, 2016b), respectively. The study draws mainly on qualitative data from semi-structured interviews with 13 PhD students from five universities who attended the workshop, and it analyses their corpus use histories. The interviews were conducted either in small groups or individually, depending on the interviewees’ preferences, and in the language of their choice (English, Cantonese, or Putonghua).

Although all the students fully acknowledged the benefits of corpora for their writing, many had difficulty forming a long-term corpus habit (Charles, 2014). One major reason was that they wrote little or nothing during long periods of their study, because their time was taken up with experiments or data collection. Another finding was that, while many participants used wildcards and other complex search techniques rather than simple single-word searches, some found it difficult to remember all the search techniques taught in the workshop. Furthermore, the students’ preferences for corpus tools were strongly affected by the complexity of the interface and the time needed to carry out searches. For this reason, building a personal corpus was the less favoured approach, even though many participants acknowledged that having one is highly beneficial. Of course, this finding, like the others, may reflect shortcomings of the workshop rather than of the corpus tools.

As regards the specific purposes for which the participants used corpus tools in their research writing after the workshop, the interviews revealed five major ones: 1) to check whether an expression or sentence they had produced was native-like; 2) to explore alternative expressions by checking the co-texts of particular key words or phrases; 3) to discover patterns unknown to them, very often the collocates or colligates of particular words or phrases; 4) to find useful example sentences or sentence templates for different rhetorical purposes; and 5) to identify new technical terms specific to their research fields by using AntConc on their personal corpora.

Findings from this study suggest that corpora could become an important resource for PhD students seeking to improve their research writing. Although the benefits of corpora are evident to students, they need prolonged training to develop long-term habits of use. Given that many PhD students, especially those in the sciences, do not write in English regularly, short tutorials or workshops held repeatedly throughout the year would be more beneficial than a single, longer workshop in ensuring a long-lasting effect of the teaching.
Original language: English
Publication status: Published, Jul 2020
Event: The 14th Teaching and Language Corpora Conference (online)
Duration: 13 Jul 2020 to 16 Jul 2020


Conference: The 14th Teaching and Language Corpora Conference
Abbreviated title: TaLC 2020

