CLUDA is the foundation of reikna. It provides unified access to the basic features of CUDA and OpenCL, such as memory operations, compilation and so on. It can also be used by itself, if you want to write GPU API-independent programs and only need a small subset of the GPU API. The terminology is borrowed from OpenCL, since it is the more general of the two APIs.
Modules for all APIs have the same generalized interface. It is referred to here (and in references from other parts of this documentation) as reikna.cluda.api.
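For example, a program can stay GPU API-independent by requesting whichever backend is available at run time. The following is a minimal sketch assuming the reikna.cluda entry point any_api() and the Thread.create() constructor (a specific backend can be requested with cuda_api() or ocl_api() instead):

    import numpy
    from reikna.cluda import any_api

    # Pick any supported GPGPU API and create a Thread, which wraps
    # a device, a context and a command queue/stream.
    api = any_api()
    thr = api.Thread.create()

    # Basic memory operations exposed by the generalized interface.
    a = numpy.arange(16).astype(numpy.float32)
    a_dev = thr.to_device(a)           # host to device copy
    a_back = thr.from_device(a_dev)    # device to host copy
    assert (a == a_back).all()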
Each Thread contains a special allocator for arrays whose data does not have to be persistent all the time. In many cases you only want an array to keep its contents between several kernel calls. This can be achieved by manually allocating and deallocating such arrays every time, but that slows the program down, and you have to synchronize the queue because allocation commands are not serialized. It is therefore advantageous to use the temp_array() method to get such arrays. It takes a list of dependencies as an optional parameter, which gives the allocator a hint about which arrays should not use the same physical allocation.
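A minimal sketch, using the Thread created above and assuming the dependencies are passed through a keyword parameter named dependencies:

    import numpy

    # These two temporary arrays may end up sharing the same physical
    # allocation, since neither declares a dependency on the other.
    tmp1 = thr.temp_array((1024,), numpy.float32)
    tmp2 = thr.temp_array((1024,), numpy.float32)

    # This temporary array is guaranteed not to alias tmp1, because both
    # will be used in the same kernel call.
    tmp3 = thr.temp_array((1024,), numpy.float32, dependencies=[tmp1])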
The toolbox available to a kernel passed for compilation consists of two parts.
First, there are several objects available at the template rendering stage, namely numpy, reikna.cluda.dtypes (as dtypes), and reikna.helpers (as helpers).
Second, there is a set of macros attached to any kernel, depending on the API it is being compiled for (both parts of the toolbox are demonstrated in the sketches after the list below):
CUDA: If defined, specifies that the kernel is being compiled for the CUDA API.
COMPILE_FAST_MATH: If defined, specifies that the compilation for this kernel was requested with fast_math == True.
LOCAL_BARRIER: Synchronizes threads inside a block.
WITHIN_KERNEL: Modifier for a device-only function declaration.
KERNEL: Modifier for the kernel function declaration.
GLOBAL_MEM: Modifier for a global memory pointer argument.
LOCAL_MEM: Modifier for a statically allocated local memory variable.
LOCAL_MEM_DYNAMIC: Modifier for a dynamically allocated local memory variable.
LOCAL_MEM_ARG: Modifier for a local memory argument in device-only functions.
INLINE: Modifier for inline functions.
SIZE_T: The type of local/global IDs and sizes. Equal to unsigned int for CUDA and size_t for OpenCL (which can be a 32- or 64-bit unsigned integer, depending on the device).
get_local_id(dim), get_group_id(dim), get_global_id(dim), get_local_size(dim), get_num_groups(dim), get_global_size(dim): Local, group and global identifiers and sizes. For CUDA these mimic the behavior of the corresponding OpenCL functions.
VSIZE_T: The type of local/global IDs in the virtual grid. It is separate from SIZE_T because SIZE_T is intended to be equivalent to whatever type the backend uses, while VSIZE_T is a separate type that can be made larger than SIZE_T in the future if necessary.
ALIGN(int): Used to specify an explicit alignment (in bytes) for fields in structures, as in

    typedef struct {
        char ALIGN(4) a;
        int b;
    } MY_STRUCT;
VIRTUAL_SKIP_THREADS: This macro should start any kernel compiled with compile_static(). It skips all the empty threads resulting from fitting the call parameters into the backend's limitations.
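To illustrate both parts of the toolbox, here is a minimal sketch of compiling and running a kernel with Thread.compile(); it follows the usage shown in reikna's examples, and the empty_like() helper and the global_size keyword of the kernel call are assumed to be available:

    import numpy

    # dtypes and numpy are available inside ${...} Mako expressions at the
    # template rendering stage; KERNEL, GLOBAL_MEM, SIZE_T and get_global_id
    # are substituted for the target API when the rendered source is compiled.
    src = """
    KERNEL void multiply_them(
        GLOBAL_MEM ${dtypes.ctype(numpy.float32)} *dest,
        GLOBAL_MEM ${dtypes.ctype(numpy.float32)} *a,
        GLOBAL_MEM ${dtypes.ctype(numpy.float32)} *b)
    {
        const SIZE_T i = get_global_id(0);
        dest[i] = a[i] * b[i];
    }
    """

    program = thr.compile(src)
    multiply_them = program.multiply_them

    a = numpy.random.randn(256).astype(numpy.float32)
    b = numpy.random.randn(256).astype(numpy.float32)
    a_dev = thr.to_device(a)
    b_dev = thr.to_device(b)
    dest_dev = thr.empty_like(a_dev)

    multiply_them(dest_dev, a_dev, b_dev, global_size=256)
    assert numpy.allclose(thr.from_device(dest_dev), a * b)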
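Correspondingly, here is a sketch of a static kernel, assuming compile_static() takes the kernel name and the global size, and that virtual grid IDs are obtained with virtual_global_id():

    import numpy

    static_src = """
    KERNEL void add_one(GLOBAL_MEM int *data)
    {
        VIRTUAL_SKIP_THREADS;  /* drop threads added only to satisfy grid limits */
        const VSIZE_T i = virtual_global_id(0);
        data[i] = data[i] + 1;
    }
    """

    # The requested global size does not have to fit the backend's grid
    # restrictions; the virtual size machinery takes care of that.
    add_one = thr.compile_static(static_src, 'add_one', global_size=(1000,))

    data_dev = thr.to_device(numpy.arange(1000).astype(numpy.int32))
    add_one(data_dev)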
The reikna.cluda.dtypes module contains various convenience functions which operate on numpy.dtype objects.
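For example (a short sketch assuming the ctype(), complex_for() and c_constant() helpers, which respectively map a dtype to its C type name, find the complex counterpart of a real dtype, and render a value as a literal usable in kernel source):

    import numpy
    from reikna.cluda import dtypes

    print(dtypes.ctype(numpy.float32))        # 'float'
    print(dtypes.complex_for(numpy.float64))  # the complex128 dtype
    print(dtypes.c_constant(numpy.float32(1.5)))  # a C literal string, e.g. '1.5f'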