
Data colonialism and indigenous languages in AI: a critical review of existing initiatives and their struggles with data sovereignty

Research output: Contribution to journal, Journal article, peer-reviewed

Abstract

This article critically reviews recent initiatives to employ artificial intelligence (AI), particularly large language models (LLMs), for the revitalization of Indigenous languages. Structured by geographical context, the analysis covers Irish Gaelic (Europe), Māori (Aotearoa/New Zealand, Oceania), Guaraní (Paraguay/Bolivia, South America), and Inuktitut (Canada, North America). Applying a theoretical framework grounded in data colonialism and Indigenous data sovereignty, the article examines the key achievements of these regional endeavors and investigates how government-led projects and Big Tech collaborations across these contexts navigate (or fail to navigate) issues of data extraction, community consent, cultural representation, and ownership. Through this lens, the article identifies specific ethical pitfalls that reproduce colonial dynamics as well as commendable practices that empower Indigenous communities. The critique emphasizes regional and contextual nuances, arguing that authentic community agency and rigorous adherence to Indigenous data sovereignty principles are vital to ensuring ethical AI practices and meaningful linguistic revitalization.
Original language: English
Number of pages: 11
Journal: AI and Society
Publication status: E-pub ahead of print - 18 May 2026

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 9 - Industry, Innovation, and Infrastructure

User-Defined Keywords

  • Data colonialism
  • Ethical AI
  • Indigenous community agency
  • Indigenous data sovereignty
  • Indigenous language revitalization
  • Large language models (LLMs)
