The CUDA Driver API library for low-level CUDA programming.
The CUDA Runtime API library for high-level CUDA programming, built on
top of the CUDA Driver API.
The cuBLAS library is an implementation of BLAS (Basic Linear
Algebra Subprograms) on top of the NVIDIA CUDA runtime. It allows the user
to access the computational resources of an NVIDIA Graphics Processing Unit
(GPU), but does not automatically parallelize across multiple GPUs.
To use the cuBLAS library, the application must allocate the
required matrices and vectors in GPU memory, fill them with data, call the
desired sequence of cuBLAS functions, and then copy the results from GPU
memory back to the host. The cuBLAS library also provides helper functions
for writing data to and retrieving data from the GPU.
The cuSPARSE library contains a set of basic linear algebra
subroutines used for handling sparse matrices. It is implemented on top of
the NVIDIA CUDA runtime (which is part of the CUDA Toolkit) and is designed
to be called from C and C++. The library routines can be classified into
four categories:
* Level 1: operations between a vector in sparse format and a
vector in dense format
* Level 2: operations between a matrix in sparse format and a
vector in dense format
* Level 3: operations between a matrix in sparse format and a set
of vectors in dense format (which can also usually be viewed as a dense tall
matrix)
* Conversion: operations that allow conversion between different
matrix formats
The cuSOLVER library contains LAPACK-like functions for dense and
sparse linear algebra, including linear solvers, least-squares solvers, and
eigenvalue solvers.
The NVIDIA CUDA Fast Fourier Transform (FFT) product consists of
two separate libraries: cuFFT and cuFFTW. The cuFFT library is designed to
provide high performance on NVIDIA GPUs. The cuFFTW library is provided as a
porting tool to enable users of FFTW to start using NVIDIA GPUs with a
minimum amount of effort.
The FFT is a divide-and-conquer algorithm for efficiently
computing discrete Fourier transforms of complex or real-valued data sets.
It is one of the most important and widely used numerical algorithms in
computational physics and general signal processing. The cuFFT library
provides a simple interface for computing FFTs on an NVIDIA GPU, which
allows users to quickly leverage the floating-point power and parallelism of
the GPU in a highly optimized and tested FFT library.
The cuRAND library provides facilities that focus on the simple
and efficient generation of high-quality pseudorandom and quasirandom
numbers. A pseudorandom sequence of numbers satisfies most of the
statistical properties of a truly random sequence but is generated by a
deterministic algorithm. A quasirandom sequence of n-dimensional points is
generated by a deterministic algorithm designed to fill an n-dimensional
space evenly.
NVIDIA NPP is a library of functions for performing CUDA-accelerated
processing. The initial set of functionality in the library focuses on
imaging and video processing and is widely applicable for developers in
these areas. NPP will evolve over time to encompass more of the
compute-heavy tasks in a variety of problem domains. The NPP library is
written to maximize flexibility while maintaining high performance.
NPP can be used in one of two ways:
* A stand-alone library for adding GPU acceleration to an
application with minimal effort. Using this route allows developers to add
GPU acceleration to their applications in a matter of hours.
* A cooperative library for interoperating with a
developer’s GPU code efficiently.
Either route allows developers to harness the massive compute
resources of NVIDIA GPUs, while simultaneously reducing development
times.
The NVVM library is used by NVCC to compile NVVM IR into PTX code
that runs on NVIDIA GPUs.
The libdevice library is a collection of NVVM bitcode functions
that implement common functions for NVIDIA GPU devices, including math
primitives and bit-manipulation functions. These functions are optimized for
particular GPU architectures, and are intended to be linked with an NVVM IR
module during compilation to PTX.
The CUDA internal libraries for profiling, used by nvprof and the
Visual Profiler.
The NVIDIA Tools Extension Library.