Ouroboros: Virtualized queues for dynamic memory management on GPUs

Martin Winter, Daniel Mlakar, Mathias Parger, Markus Steinberger

Publication: Contribution to book/report/conference proceedings; Conference paper; Peer-reviewed


Dynamic memory allocation on a single-instruction, multiple-thread (SIMT) architecture such as the Graphics Processing Unit (GPU) is challenging, and implementation guidelines caution against it. Data structures must rise to the challenge of thousands of concurrently active threads trying to allocate memory. Efficient queueing structures have been used in the past to allow for simple allocation and reuse of memory directly on the GPU but do not scale well to different allocation sizes, as each requires its own queue.

In this work, we propose Ouroboros, a virtualized queueing structure, managing dynamically allocatable data chunks, whilst being built on top of these same chunks. Data chunks are interpreted on-the-fly either as building blocks for the virtualized queues or as paged user data. Re-usable user memory is managed in one of two ways, either as individual pages or as chunks containing pages. The queueing structures grow and shrink dynamically: only currently needed queue chunks are held in memory, and freed-up queue chunks can be reused within the system. Thus, we retain the performance benefits of an efficient, static queue design while keeping the memory requirements low. Performance evaluation on an NVIDIA TITAN V with the native device memory allocator in CUDA 10.1 shows speed-ups between 11X and 412X, with an average of 118X. For real-world testing, we integrate our allocator into faimGraph, a dynamic graph framework with proprietary memory management. Throughout all memory-intensive operations, such as graph initialization and edge updates, our allocator shows performance that ranges from similar to improved. Additionally, we show improved algorithmic performance on PageRank and Static Triangle Counting.

Overall, our memory allocator can be efficiently initialized, allows for high-throughput allocation and offers, with its per-thread allocation model, a drop-in replacement for comparable dynamic memory allocators.
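
The idea of a queue that is built from the same chunks it hands out can be illustrated with a small sketch. The following is a single-threaded, host-side C++ illustration only, not the authors' GPU implementation: the names `ChunkPool`, `VirtualQueue`, `kChunkBytes` and `kSlotsPerChunk` are hypothetical, the chunk size is arbitrary, and all synchronization (atomics, per-thread allocation) is deliberately left out. It shows only how queue storage can grow and shrink chunk by chunk while released chunks return to a shared pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical sizes, chosen for illustration only.
constexpr std::size_t kChunkBytes = 64;
constexpr std::size_t kSlotsPerChunk = kChunkBytes / sizeof(std::uint32_t);

// A pool of equally sized chunks. Any chunk may serve as user data
// or as storage for queue entries, depending on who acquires it.
struct ChunkPool {
    std::vector<std::vector<std::uint32_t>> chunks;  // backing storage
    std::vector<std::uint32_t> free_list;            // reusable chunk ids

    std::uint32_t acquire() {
        if (!free_list.empty()) {                    // prefer a recycled chunk
            std::uint32_t id = free_list.back();
            free_list.pop_back();
            return id;
        }
        chunks.emplace_back(kSlotsPerChunk, 0u);     // otherwise create one
        return static_cast<std::uint32_t>(chunks.size() - 1);
    }
    void release(std::uint32_t id) { free_list.push_back(id); }
    std::size_t in_use() const { return chunks.size() - free_list.size(); }
};

// A FIFO queue whose entries live in chunks taken from the pool.
// Only the chunks that currently hold entries are kept.
class VirtualQueue {
public:
    explicit VirtualQueue(ChunkPool& pool) : pool_(pool) {}

    void push(std::uint32_t value) {
        // All held chunks are full (or none is held): grow by one chunk.
        if (back_ == queue_chunks_.size() * kSlotsPerChunk)
            queue_chunks_.push_back(pool_.acquire());
        std::uint32_t chunk = queue_chunks_[back_ / kSlotsPerChunk];
        pool_.chunks[chunk][back_ % kSlotsPerChunk] = value;
        ++back_;
    }

    bool pop(std::uint32_t& value) {
        if (front_ == back_) return false;           // queue is empty
        value = pool_.chunks[queue_chunks_.front()][front_];
        ++front_;
        if (front_ == kSlotsPerChunk) {              // front chunk fully consumed
            pool_.release(queue_chunks_.front());    // shrink: return it to the pool
            queue_chunks_.pop_front();
            front_ = 0;
            back_ -= kSlotsPerChunk;
        }
        return true;
    }

private:
    ChunkPool& pool_;
    std::deque<std::uint32_t> queue_chunks_;  // ids of chunks holding entries
    std::size_t front_ = 0;  // read offset inside the first held chunk
    std::size_t back_ = 0;   // write offset, counted from the first held chunk
};
```

In this sketch a queue that holds 40 entries occupies three 16-slot chunks; once the first 16 entries are popped, the front chunk goes back to the pool and can be reacquired, either by the same queue or by any other user of the pool.
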
Title: Proceedings of the 34th ACM International Conference on Supercomputing, ICS 2020
Publisher: Association for Computing Machinery
ISBN (electronic): 9781450379830
ISBN (print): 9781450379830
Publication status: Published - 29 June 2020
Event: 34th ACM International Conference on Supercomputing - Virtual, Spain
Duration: 29 June 2020 to 2 July 2020


Conference: 34th ACM International Conference on Supercomputing
Short title: ICS '20

ASJC Scopus subject areas

  • Computer Science (all)

