# Computations¶

## Algorithms¶

General purpose algorithms.

### Pure parallel computations¶

class reikna.algorithms.PureParallel(parameters, code, guiding_array=None, render_kwds=None)

Bases: Computation

A general class for pure parallel computations (i.e. with no interaction between threads).

Parameters: parameters – a list of Parameter objects. code – a source code for the computation. Can be a Snippet object which will be passed Indices object for the guiding_array as the first positional argument, and KernelParameter objects corresponding to parameters as the rest of positional arguments. If it is a string, such Snippet will be created out of it, with the parameter names idxs for the first one and the names of parameters for the remaining ones. guiding_array – an tuple with the array shape, or the name of one of parameters. By default, the first parameter is chosen. render_kwds – a dictionary with render keywords for the code.
compiled_signature(*args)
Parameters: args – corresponds to the given parameters.
classmethod from_trf(trf, guiding_array=None)

Creates a PureParallel instance from a Transformation object. guiding_array can be a string with a name of an array parameter from trf, or the corresponding TransformationParameter object.

### Transposition (permutation)¶

class reikna.algorithms.Transpose(arr_t, output_arr_t=None, axes=None, block_width_override=None)

Bases: Computation

Changes the order of axes in a multidimensional array. Works analogous to numpy.transpose.

Parameters: arr_t – an array-like defining the initial array. output_arr_t – an array-like defining the output array. If None, its shape will be derived based on the shape of arr_t, its dtype will be equal to that of arr_t, and any non-default offset or strides of arr_t will be ignored. axes – tuple with the new axes order. If None, then axes will be reversed.
compiled_signature(output:o, input:i)
Parameters: output – an array with all the attributes of arr_t, with the shape permuted according to axes. input – an array with all the attributes of arr_t.

### Reduction¶

class reikna.algorithms.Reduce(arr_t, predicate, axes=None, output_arr_t=None)

Bases: Computation

Reduces the array over given axis using given binary operation.

Parameters: arr_t – an array-like defining the initial array. predicate – a Predicate object. axes – a list of non-repeating axes to reduce over. If None, the whole array will be reduced (in which case the shape of the output array is (1,)). output_arr_t – an output array metadata (the shape must still correspond to the result of reducing the original array over given axes, but offset and strides can be set to the desired ones).
compiled_signature(output:o, input:i)
Parameters: input – an array with the attributes of arr_t. output – an array with the attributes of arr_t, with its shape missing axes from axes.

### Scan¶

class reikna.algorithms.Scan(arr_t, predicate, axes=None, exclusive=False, max_work_group_size=None, seq_size=None)

Bases: Computation

Scans the array over given axis using given binary operation. Namely, from an array [a, b, c, d, ...] and an operation ., produces [a, a.b, a.b.c, a.b.c.d, ...] if exclusive is False and [0, a, a.b, a.b.c, ...] if exclusive is True (here 0 is the operation’s identity element).

Parameters: arr_t – an array-like defining the initial array. predicate – a Predicate object. axes – a list of non-repeating axes to scan over. (Note that the result will depend on the order of the axes). If None, the whole array will be scanned over. exclusive – whether to perform an exclusive scan (see above). max_work_group_size – the maximum workgroup size to be used for the scan kernel. seq_size – the number of elements to be scanned sequentially. If not given, Reikna will attempt to choose the one resulting in the best performance, but sometimes a manual choice may be better.
compiled_signature(output:o, input:i)
Parameters: input – an array with the attributes of arr_t. output – an array with the attributes of arr_t.

### Predicates¶

class reikna.algorithms.Predicate(operation, empty)

A predicate used in some of Reikna algorithms (e.g. Reduce or Scan).

Parameters: operation – a Snippet object with two parameters which will take the names of two arguments to join. empty – a numpy scalar with the empty value of the argument (the one which, being joined by another argument, does not change it).
reikna.algorithms.predicate_sum(dtype)

Returns a Predicate object which sums its arguments.

## Linear algebra¶

Linear algebra algorithms.

### Matrix multiplication (dot product)¶

class reikna.linalg.MatrixMul(a_arr, b_arr, out_arr=None, block_width_override=None, transposed_a=False, transposed_b=False)

Bases: Computation

Multiplies two matrices using last two dimensions and batching over remaining dimensions. For batching to work, the products of remaining dimensions should be equal (then the multiplication will be performed piecewise), or one of them should equal 1 (then the multiplication will be batched over the remaining dimensions of the other matrix).

Parameters: a_arr – an array-like defining the first argument. b_arr – an array-like defining the second argument. out_arr – an array-like definign the output; if not given, both shape and dtype will be derived from a_arr and b_arr. block_width_override – if provided, it will used as a block size of the multiplication kernel. transposed_a – if True, the first matrix will be transposed before the multiplication. transposed_b – if True, the second matrix will be transposed before the multiplication.
compiled_signature(output:o, matrix_a:i, matrix_b:i)
Parameters: output – the output of matrix multiplication. matrix_a – the first argument. matrix_b – the second argument.

### Matrix norms¶

class reikna.linalg.EntrywiseNorm(arr_t, order=2, axes=None)

Bases: Computation

Calculates the entrywise matrix norm (same as numpy.linalg.norm) of an arbitrary order $$r$$:

$||A||_r = \left( \sum_{i,j,\ldots} |A_{i,j,\ldots}|^r \right)^{1 / r}$
Parameters: arr_t – an array-like defining the initial array. order – the order $$r$$ (any real number). axes – a list of non-repeating axes to sum over. If None, the norm of the whole array will be calculated.
compiled_signature(output:o, input:i)
Parameters: input – an array with the attributes of arr_t. output – an array with the attributes of arr_t, with its shape missing axes from axes.

## Fast Fourier transform and related utilities¶

### Fast Fourier Transform¶

class reikna.fft.FFT(arr_t, axes=None)

Bases: Computation

Performs the Fast Fourier Transform. The interface is similar to numpy.fft.fftn. The inverse transform is normalized so that IFFT(FFT(X)) = X.

Parameters: arr_t – an array-like defining the problem array. axes – a tuple with axes over which to perform the transform. If not given, the transform is performed over all the axes.

Note

Current algorithm works most effectively with array dimensions being power of 2 This mostly applies to the axes over which the transform is performed, beacuse otherwise the computation falls back to the Bluestein’s algorithm, which effectively halves the performance.

compiled_signature(output:o, input:i, inverse:s)

output and input may be the same array.

Parameters: output – an array with the attributes of arr_t. input – an array with the attributes of arr_t. inverse – a scalar value castable to integer. If 0, output contains the forward FFT of input, if 1, the inverse one.

### FFT frequency shift¶

class reikna.fft.FFTShift(arr_t, axes=None)

Bases: Computation

Shift the zero-frequency component to the center of the spectrum. The interface is similar to numpy.fft.fftshift, and the output is the same for the same array shape and axes.

Parameters: arr_t – an array-like defining the problem array. axes – a tuple with axes over which to perform the shift. If not given, the shift is performed over all the axes.
compiled_signature(output:o, input:i)

output and input may be the same array.

Parameters: output – an array with the attributes of arr_t. input – an array with the attributes of arr_t.

## Discrete harmonic transform¶

reikna.dht.get_spatial_grid(modes, order, add_points=0)

Returns the spatial grid required to calculate the order power of a function defined in the harmonic mode space of the size modes. If add_points is 0, the grid has the minimum size required for exact transformation back to the mode space.

reikna.dht.harmonic(mode)

Returns an eigenfunction of order $$n = \mathrm{mode}$$ for the harmonic oscillator:

$\phi_{n} = \frac{1}{\sqrt[4]{\pi} \sqrt{2^n n!}} H_n(x) \exp(-x^2/2),$

where $$H_n$$ is the $$n$$-th order “physicists’” Hermite polynomial. The normalization is chosen so that $$\int \phi_n^2(x) dx = 1$$.

class reikna.dht.DHT(mode_arr, add_points=None, inverse=False, order=1, axes=None)

Bases: Computation

Discrete transform to and from harmonic oscillator modes. With inverse=True transforms a function defined by its expansion $$C_m,\,m=0 \ldots M-1$$ in the mode space with mode functions from harmonic(), to the coordinate space ($$F(x)$$ on the grid $$x$$ from get_spatial_grid()). With inverse=False guarantees to recover first $$M$$ modes of $$F^k(x)$$, where $$k$$ is the order parameter.

For multiple dimensions the operation is the same, and the mode functions are products of 1D mode functions, i.e. $$\phi_{l,m,n}^{3D}(x,y,z) = \phi_l(x) \phi_m(y) \phi_n(z)$$.

For the detailed description of the algorithm, see Dion & Cances, PRE 67(4) 046706 (2003)

Parameters: mode_arr – an array-like object defining the shape of mode space. If inverse=False, its shape is used to define the mode space size. inverse – False for forward (coordinate space -> mode space) transform, True for inverse (mode space -> coordinate space) transform. axes – a tuple with axes over which to perform the transform. If not given, the transform is performed over all the axes. order – if F is a function in mode space, the number of spatial points is chosen so that the transformation DHT[(DHT^{-1}[F])^order] could be performed. add_points – a list of the same length as mode_arr shape, specifying the number of points in x-space to use in addition to minimally required (0 by default).
compiled_signature_forward(modes:o, coords:i)
compiled_signature_inverse(coords:o, modes:i)

Depending on inverse value, either of these two will be created.

Parameters: modes – an array with the attributes of mode_arr. coords – an array with the shape depending on mode_arr, axes, order and add_points, and the dtype of mode_arr.

## Counter-based random number generators¶

This module is based on the paper by Salmon et al., P. Int. C. High. Perform. 16 (2011). and the source code of Random123 library.

A counter-based random-number generator (CBRNG) is a parametrized function $$f_k(c)$$, where $$k$$ is the key, $$c$$ is the counter, and the function $$f_k$$ defines a bijection in the set of integer numbers. Being applied to successive counters, the function produces a sequence of pseudo-random numbers. The key is an analogue of the seed of stateful RNGs; if the CBRNG is used to generate random num bers in parallel threads, the key is a combination of a seed and a unique thread number.

There are two types of generators available, threefry (uses large number of simple functions), and philox (uses smaller number of more complicated functions). The latter one is generally faster on GPUs; see the paper above for detailed comparisons. These generators can be further specialized to use words=2 or words=4 bitness=32-bit or bitness=64-bit counters. Obviously, the period of the generator equals to the cardinality of the set of possible counters. For example, if the counter consits of 4 64-bit numbers, then the period of the generator is $$2^{256}$$. As for the key size, in case of threefry the key has the same size as the counter, and for philox the key is half its size.

The CBRNG class sets one of the words of the key (except for philox-2x64, where 32 bit of the only word in the key are used), the rest are the same for all threads and are derived from the provided seed. This limits the maximum number of number-generating threads (size). philox-2x32 has a 32-bit key and therefore cannot be used in CBRNG (although it can be used separately with the help of the kernel API).

The CBRNG class itself is stateless, same as other computations in Reikna, so you have to manage the generator state yourself. The state is created by the create_counters() method and contains a size counters. This state is then passed to, and updated by a CBRNG object.

class reikna.cbrng.CBRNG(randoms_arr, generators_dim, sampler, seed=None)

Bases: Computation

Counter-based pseudo-random number generator class.

Parameters: randoms_arr – an array intended for storing generated random numbers. generators_dim – the number of dimensions (counting from the end) which will use independent generators. For example, if randoms_arr has the shape (100, 200, 300) and generators_dim is 2, then in every sub-array (j, :, :), j = 0 .. 99, every element will use an independent generator. sampler – a Sampler object. seed – None for random seed, or an integer.
classmethod sampler_name(randoms_arr, generators_dim, sampler_kwds=None, seed=None)

A convenience constructor for the sampler sampler_name from samplers. The contents of the dictionary sampler_kwds will be passed to the sampler constructor function (with bijection being created automatically, and dtype taken from randoms_arr).

compiled_signature(counters:io, randoms:o)
Parameters: counters – the RNG “state”. All attributes are equal to the ones of the result of create_counters(). randoms – generated random numbers. All attributes are equal to the ones of randoms_arr from the constructor.
create_counters()

Create a counter array for use in CBRNG.

### Kernel API¶

class reikna.cbrng.bijections.Bijection(module, word_dtype, key_dtype, counter_dtype)

Contains a CBRNG bijection module and accompanying metadata. Supports __process_modules__ protocol.

word_dtype

The data type of the integer word used by the generator.

key_words

The number of words used by the key.

counter_words

The number of words used by the counter.

key_dtype

The numpy.dtype object representing a bijection key. Contains a single array field v with key_words of word_dtype elements.

counter_dtype

The numpy.dtype object representing a bijection counter. Contains a single array field v with key_words of word_dtype elements.

raw_functions

A dictionary dtype:function_name of available functions function_name in module that produce a random full-range integer dtype from a State, advancing it. Available functions: get_raw_uint32(), get_raw_uint64().

module

The module containing the CBRNG function. It provides the C functions below.

COUNTER_WORDS

Contains the value of counter_words.

KEY_WORDS

Contains the value of key_words.

Word

Contains the type corresponding to word_dtype.

Key

Describes the bijection key. Alias for the structure generated from key_dtype.

Word v[KEY_WORDS]
Counter

Describes the bijection counter, or its output. Alias for the structure generated from counter_dtype.

Word v[COUNTER_WORDS]
Counter make_counter_from_int(int x)

Creates a counter object from an integer.

Counter bijection(Key key, Counter counter)

The main bijection function.

State

A structure containing the CBRNG state which is used by samplers.

State make_state(Key key, Counter counter)

Creates a new state object.

Counter get_next_unused_counter(State state)

Extracts a counter which has not been used in random sampling.

uint32

A type of unsigned 32-bit word, corresponds to numpy.uint32.

uint64

A type of unsigned 64-bit word, corresponds to numpy.uint64.

uint32 get_raw_uint32(State *state)

Returns uniformly distributed unsigned 32-bit word and updates the state.

uint64 get_raw_uint64(State *state)

Returns uniformly distributed unsigned 64-bit word and updates the state.

reikna.cbrng.bijections.philox(bitness, counter_words, rounds=10)

A CBRNG based on a low number of slow rounds (multiplications).

Parameters: bitness – 32 or 64, corresponds to the size of generated random integers. counter_words – 2 or 4, number of integers generated in one go. rounds – 1 to 12, the more rounds, the better randomness is achieved. Default values are big enough to qualify as PRNG. a Bijection object.
reikna.cbrng.bijections.threefry(bitness, counter_words, rounds=20)

A CBRNG based on a big number of fast rounds (bit rotations).

Parameters: bitness – 32 or 64, corresponds to the size of generated random integers. counter_words – 2 or 4, number of integers generated in one go. rounds – 1 to 72, the more rounds, the better randomness is achieved. Default values are big enough to qualify as PRNG. a Bijection object.
class reikna.cbrng.samplers.Sampler(bijection, module, dtype, randoms_per_call=1, deterministic=False)

Contains a random distribution sampler module and accompanying metadata. Supports __process_modules__ protocol.

deterministic

If True, every sampled random number consumes the same amount of counters.

randoms_per_call

How many random numbers one call to sample creates.

dtype

The data type of one random value produced by the sampler.

module

The module containing the distribution sampling function. It provides the C functions below.

RANDOMS_PER_CALL

Contains the value of randoms_per_call.

Value

Contains the type corresponding to dtype.

Result

Describes the sampling result.

value v[RANDOMS_PER_CALL]
Result sample(State *state)

Performs the sampling, updating the state.

reikna.cbrng.samplers.gamma(bijection, dtype, shape=1, scale=1)

Generates random numbers from the gamma distribution

$P(x) = x^{k-1} \frac{e^{-x/\theta}}{\theta^k \Gamma(k)},$

where $$k$$ is shape, and $$\theta$$ is scale. Supported dtypes: float(32/64). Returns a Sampler object.

reikna.cbrng.samplers.normal_bm(bijection, dtype, mean=0, std=1)

Generates normally distributed random numbers with the mean mean and the standard deviation std using Box-Muller transform. Supported dtypes: float(32/64), complex(64/128). Produces two random numbers per call for real types and one number for complex types. Returns a Sampler object.

Note

In case of a complex dtype, std refers to the standard deviation of the complex numbers (same as numpy.std() returns), not real and imaginary components (which will be normally distributed with the standard deviation std / sqrt(2)). Consequently, while mean is of type dtype, std must be real.

reikna.cbrng.samplers.uniform_float(bijection, dtype, low=0, high=1)

Generates uniformly distributed floating-points numbers in the interval [low, high). Supported dtypes: float(32/64). A fixed number of counters is used in each thread. Returns a Sampler object.

reikna.cbrng.samplers.uniform_integer(bijection, dtype, low, high=None)

Generates uniformly distributed integer numbers in the interval [low, high). If high is None, the interval is [0, low). Supported dtypes: any numpy integers. If the size of the interval is a power of 2, a fixed number of counters is used in each thread. Returns a Sampler object.

reikna.cbrng.samplers.vonmises(bijection, dtype, mu=0, kappa=1)

Generates random numbers from the von Mises distribution

$P(x) = \frac{\exp(\kappa \cos(x - \mu))}{2 \pi I_0(\kappa)},$

where $$\mu$$ is the mode, $$\kappa$$ is the dispersion, and $$I_0$$ is the modified Bessel function of the first kind. Supported dtypes: float(32/64). Returns a Sampler object.

class reikna.cbrng.tools.KeyGenerator(module, base_key)

Contains a key generator module and accompanying metadata. Supports __process_modules__ protocol.

module

A module with the key generator function:

Key key_from_int(int idx)

Generates and returns a key, suitable for the bijection which was given to the constructor.

classmethod create(bijection, seed=None, reserve_id_space=True)

Creates a generator.

Parameters: bijection – a Bijection object. seed – an integer, or numpy array of 32-bit unsigned integers. reserve_id_space – if True, the last 32 bit of the key will be reserved for the thread identifier. As a result, the total size of the key should be 64 bit or more. If False, the thread identifier will be just added to the key, which will still result in different keys for different threads, with the danger that different seeds produce the same sequences.
reference(idx)

Reference function that returns the key given the thread identifier. Uses the same algorithm as the module.