CeDMA: Enhancing Memory Efficiency of Heterogeneous Accelerator Systems Through Central DMA Controlling

Ruoshi Li, Long Zheng*, Yu Huang*, Zhiyuan Shao, Amelie Chi Zhou, Xiaofei Liao, Hai Jin, Jingling Xue

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding > Conference proceeding > peer-review

Abstract

As heterogeneous computing systems continue to evolve, emerging workloads increasingly span multiple types of accelerators, resulting in frequent inter-accelerator data transfers. However, traditional CPU-managed memory systems often struggle to coordinate these transfers efficiently, leading to high latency, poor memory bandwidth utilization, and scalability bottlenecks. We propose CeDMA, a centralized and programmable Direct Memory Access (DMA) control architecture that enables high-performance, CPU-decoupled memory coordination across diverse accelerators. CeDMA combines a unified hardware-software co-design: a modular DMA engine with integrated address translation and dual-level arbitration logic on the hardware side, and a lightweight instruction-driven memory management model with adaptive scheduling on the software side. CeDMA enables fine-grained control over memory transfers, minimizes off-chip bandwidth consumption, and exploits memory-level parallelism through dynamic resource partitioning. Cycle-accurate simulation results across a diverse workload suite—including GEMM, Conv2D, and graph traversal kernels—demonstrate up to 75% reduction in external memory access, 60% improvement in performance, and 45% reduction in access latency. Furthermore, CeDMA maintains high throughput and predictable latency at scale, supporting up to 32 concurrent accelerators. These results position CeDMA as a scalable, general-purpose memory management substrate for future heterogeneous SoC architectures.

Original language: English
Title of host publication: Advanced Parallel Processing Technologies
Subtitle of host publication: 16th International Symposium, APPT 2025, Athens, Greece, July 13-16, 2025, Proceedings
Editors: Chao Li, Xuehai Qian, Dimitris Gizopoulos, Boris Grot
Place of Publication: Singapore
Publisher: Springer
Pages: 129-144
Number of pages: 16
ISBN (Electronic): 9789819510214
ISBN (Print): 9789819510207
Publication status: Published - 3 Nov 2025
Event: 16th International Symposium on Advanced Parallel Processing Technologies - Athenaeum Intercontinental hotel, Athens, Greece
Duration: 13 Jul 2025 to 16 Jul 2025
https://link.springer.com/book/10.1007/978-981-95-1021-4 (Conference proceeding)
https://www.appt-conference.com/ (Conference website)
https://www.appt-conference.com/program (Conference program)

Publication series

Name: Lecture Notes in Computer Science
Volume: 16062
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349
Name: APPT: International Symposium on Advanced Parallel Processing Technologies

Conference

Conference: 16th International Symposium on Advanced Parallel Processing Technologies
Abbreviated title: APPT 2025
Country/Territory: Greece
City: Athens
Period: 13/07/25 to 16/07/25
