Distributed Computing and Systems Research Group
Distributed Computing and Systems
ABOUT
RESEARCH
EDUCATION
PUBLICATIONS
CONTACT
PEOPLE

GPU Quicksort Library

Daniel Cederman, Philippas Tsigas

DownloadPresentationPapers | Back to Research

This sorting library gives programmers an easy way to increase sorting performance by harnessing the highly parallel computational power available in todays graphics cards.

Tests shows that it can outperform highly optimized CPU-based Quicksort with a factor of 10 on cards commonly available in 2007. Sorting 16 million floating point numbers or integers take less than half a second!

Benchmarks

Benchmarks This graph shows the performance when sorting an array of uniformly distributed floating point numbers on a 8800GTX graphics card. The comparison is made between the GPU Quicksort Library, Radix sort, Radix/Merge sort, GPUSort (bitonic sort) and the Introsort (a combination of Quicksort and Heapsort) algorithm in the standard C++ library (on a Opteron 265 1.8GHz). More results are available in the technical report [1].

Download

The source code of the GPU Quicksort Library is released under a BSD license, see source for details. It is bundled with a testbench for evaluation. It contains a Makefile for Linux and Visual Studio project files for Windows.

Version 1.1 - Released 2008-11-10
Version 1.0 - Released 2007-12-15

Name: [Required]
Email: [Optional]
Comments: [Optional]

To be able to use the GPU Quicksort Library you need to have a CUDA capable graphics card from NVIDIA, display drivers with CUDA support (169.09 and above on Windows, 169.01 and above on Linux) and the CUDA toolkit. If you want to compile the library yourself, you also need to have the CUDA SDK. Everything can be downloaded from the official CUDA website.

Presentation

This presentation was given at the Annual European Symposium on Algorithms in 2008.

Papers

[1] Daniel Cederman and Philippas Tsigas, A Practical Quicksort Algorithm for Graphics Processors,Technical Report 2008-01, Chalmers University of Technology, January 2008. [pdf]

[2] Daniel Cederman and Philippas Tsigas, A Practical Quicksort Algorithm for Graphics Processors. In the Proceedings of the 16th Annual European Symposium on Algorithms (ESA 2008), Lecture Notes in Computer Science Vol.: 5193, pages 246 - 258, Springer-Verlag 2008. [pdf]

[3] Daniel Cederman and Philippas Tsigas, On sorting and load balancing on GPUs. In the ACM SIGARCH Computer Architecture News, Vol. 36, No. 5, pages: 11-18, ACM press 2009. [doi]

[4] Daniel Cederman and Philippas Tsigas, GPU-Quicksort: A Practical Quicksort Algorithm for Graphics Processors. In the ACM Journal of Experimental Algorithmics (JEA), Vol. 14, No. 4, ACM press 2009. [doi]

Danielon


Home © 2003-2012 Distributed Computing and Systems Research Group
Chalmers university of technology, Computing Science Department
Rännvägen 6B, S-412 96, Gothenburg, Sweden (map)
Phone: +46 (0)31-772 1000 (central), +46 (0)31-16 56 55