TRIQS/nda 2.0.0
Multi-dimensional array library for C++
BLAS/cuBLAS interface

Detailed Description

Low-level interface to parts of the BLAS/cuBLAS library.

Functions

template<BlasArray< 1 > X, BlasArrayFor< X, 1 > Y>
auto nda::blas::dot (X const &x, Y const &y)
 Interface to the BLAS/cuBLAS dot and dotu routines.
template<BlasArray< 1 > X, BlasArrayFor< X, 1 > Y>
auto nda::blas::dotc (X const &x, Y const &y)
 Interface to the BLAS/cuBLAS dotc routine.
template<BlasArrayOrConj< 2 > A, BlasArrayOrConjFor< A, 2 > B, BlasArrayFor< A, 2 > C>
void nda::blas::gemm (get_value_t< A > alpha, A const &a, B const &b, get_value_t< A > beta, C &&c)
 Interface to the BLAS/cuBLAS gemm routine.
template<bool is_vbatch = false, BlasArrayOrConj< 2 > A, BlasArrayOrConjFor< A, 2 > B, BlasArrayFor< A, 2 > C>
void nda::blas::gemm_batch (get_value_t< A > alpha, std::vector< A > const &va, std::vector< B > const &vb, get_value_t< A > beta, std::vector< C > &vc)
 Interface to batched versions of the BLAS/cuBLAS gemm routine.
template<BlasArrayOrConj< 3 > A, BlasArrayOrConjFor< A, 3 > B, BlasArrayFor< A, 3 > C>
requires ((has_C_layout<A> or has_F_layout<A>) and (has_C_layout<B> or has_F_layout<B>) and (has_C_layout<C> or has_F_layout<C>))
void nda::blas::gemm_batch_strided (get_value_t< A > alpha, A const &a, B const &b, get_value_t< A > beta, C &&c)
 Interface to batched-strided versions of the BLAS/cuBLAS gemm routine.
template<BlasArrayOrConj< 2 > A, BlasArrayOrConjFor< A, 2 > B, BlasArrayFor< A, 2 > C>
void nda::blas::gemm_vbatch (get_value_t< A > alpha, std::vector< A > const &va, std::vector< B > const &vb, get_value_t< A > beta, std::vector< C > &vc)
 Interface to batched versions of the BLAS/cuBLAS gemm routine for variable sized matrices.
template<BlasArrayOrConj< 2 > A, BlasArrayFor< A, 1 > X, BlasArrayFor< A, 1 > Y>
void nda::blas::gemv (get_value_t< A > alpha, A const &a, X const &x, get_value_t< A > beta, Y &&y)
 Interface to the BLAS/cuBLAS gemv routine.
template<BlasArray< 1 > X, BlasArrayFor< X, 1 > Y, BlasArrayFor< X, 2 > A>
void nda::blas::ger (get_value_t< X > alpha, X const &x, Y const &y, A &&a)
 Interface to the BLAS/cuBLAS ger and geru routines.
template<BlasArray< 1 > X, BlasArrayFor< X, 1 > Y, BlasArrayFor< X, 2 > A>
requires (has_F_layout<A>)
void nda::blas::gerc (get_value_t< X > alpha, X const &x, Y const &y, A &&a)
 Interface to the BLAS/cuBLAS gerc routine.
template<BlasArray< 1 > X>
void nda::blas::scal (get_value_t< X > alpha, X &&x)
 Interface to the BLAS/cuBLAS scal routine.

Function Documentation

◆ dot()

template<BlasArray< 1 > X, BlasArrayFor< X, 1 > Y>
auto nda::blas::dot ( X const & x,
Y const & y )

#include <nda/blas/dot.hpp>

Interface to the BLAS/cuBLAS dot and dotu routines.

This function forms the dot product of two vectors. It calculates \( \mathbf{x}^T \mathbf{y} \).

The first argument is never conjugated. For complex vectors, it calls dotu. Use nda::blas::dotc to conjugate the first argument.

If the input vectors satisfy nda::mem::have_device_compatible_addr_space, the cuBLAS implementation is used.

Template Parameters
    X  nda::blas_lapack::BlasArray<1> type.
    Y  nda::blas_lapack::BlasArrayFor<X, 1> type.
Parameters
    x  Input vector \( \mathbf{x} \).
    y  Input vector \( \mathbf{y} \).
Returns
    Result of \( \mathbf{x}^T \mathbf{y} \).

Definition at line 44 of file dot.hpp.
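
To illustrate the arithmetic only, the following plain C++ sketch computes the same unconjugated product \( \mathbf{x}^T \mathbf{y} \) without nda or BLAS. The helper name dot_ref and the use of std::vector are assumptions of this sketch and are not part of the nda API.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of nda): computes x^T y without
// conjugating either argument, i.e. the arithmetic of BLAS dot (real) and dotu (complex).
template <typename T>
T dot_ref(std::vector<T> const &x, std::vector<T> const &y) {
  assert(x.size() == y.size()); // both vectors must have the same length
  T res{};
  for (std::size_t i = 0; i < x.size(); ++i) res += x[i] * y[i];
  return res;
}
```

For the complex one-element vectors x = y = (i), this helper yields i * i = -1, because neither argument is conjugated.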

◆ dotc()

template<BlasArray< 1 > X, BlasArrayFor< X, 1 > Y>
auto nda::blas::dotc ( X const & x,
Y const & y )

#include <nda/blas/dot.hpp>

Interface to the BLAS/cuBLAS dotc routine.

This function forms the dot product of two vectors. It calculates \( \mathbf{x}^H \mathbf{y} \).

For real vectors, it calls nda::blas::dot and returns a real result.

If the input vectors satisfy nda::mem::have_device_compatible_addr_space, the cuBLAS implementation is used.

Template Parameters
    X  nda::blas_lapack::BlasArray<1> type.
    Y  nda::blas_lapack::BlasArrayFor<X, 1> type.
Parameters
    x  Input vector \( \mathbf{x} \).
    y  Input vector \( \mathbf{y} \).
Returns
    Result of \( \mathbf{x}^H \mathbf{y} \).

Definition at line 72 of file dot.hpp.
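
The conjugated product \( \mathbf{x}^H \mathbf{y} \) can be sketched in plain C++ as follows. This is a reference illustration only: the helper name dotc_ref and the std::vector storage are assumptions of the sketch, not the nda implementation.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of nda): computes x^H y,
// i.e. the first argument is complex-conjugated before multiplying.
template <typename T>
std::complex<T> dotc_ref(std::vector<std::complex<T>> const &x, std::vector<std::complex<T>> const &y) {
  assert(x.size() == y.size()); // both vectors must have the same length
  std::complex<T> res{};
  for (std::size_t i = 0; i < x.size(); ++i) res += std::conj(x[i]) * y[i];
  return res;
}
```

For x = y = (i), the result is conj(i) * i = 1, whereas the unconjugated product would be -1.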

◆ gemm()

template<BlasArrayOrConj< 2 > A, BlasArrayOrConjFor< A, 2 > B, BlasArrayFor< A, 2 > C>
void nda::blas::gemm ( get_value_t< A > alpha,
A const & a,
B const & b,
get_value_t< A > beta,
C && c )

#include <nda/blas/gemm.hpp>

Interface to the BLAS/cuBLAS gemm routine.

This function performs the matrix-matrix operation

\[ \mathbf{C} \leftarrow \alpha \mathbf{A} \mathbf{B} + \beta \mathbf{C} \;, \]

where \( \alpha \) and \( \beta \) are scalars, and \( \mathbf{A} \), \( \mathbf{B} \) and \( \mathbf{C} \) are matrices of size \( m \times k \), \( k \times n \) and \( m \times n \), respectively.

If the input arrays satisfy nda::mem::have_device_compatible_addr_space, the cuBLAS implementation is used.

Note
\( \mathbf{A} \) and \( \mathbf{B} \) are allowed to be lazy conjugate expressions (see nda::blas_lapack::is_conj_array_expr). In this case, they are required to have the opposite memory layout of \(\mathbf{C} \) (see nda::C_layout vs nda::F_layout).
Template Parameters
    A  nda::blas_lapack::BlasArrayOrConj<2> type.
    B  nda::blas_lapack::BlasArrayOrConjFor<A, 2> type.
    C  nda::blas_lapack::BlasArrayFor<A, 2> type.
Parameters
    alpha  Input scalar \( \alpha \).
    a      Input matrix \( \mathbf{A} \) of size \( m \times k \).
    b      Input matrix \( \mathbf{B} \) of size \( k \times n \).
    beta   Input scalar \( \beta \).
    c      Input/Output matrix \( \mathbf{C} \) of size \( m \times n \).

Definition at line 57 of file gemm.hpp.
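
The update \( \mathbf{C} \leftarrow \alpha \mathbf{A} \mathbf{B} + \beta \mathbf{C} \) can be written out as a naive triple loop. The sketch below assumes real, row-major matrices stored in flat std::vector objects. The helper name gemm_ref and the explicit size arguments are hypothetical and serve only to show the arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of nda): C <- alpha * A * B + beta * C
// for row-major matrices in flat vectors. A is m x k, B is k x n, C is m x n.
void gemm_ref(double alpha, std::vector<double> const &a, std::vector<double> const &b, double beta,
              std::vector<double> &c, std::size_t m, std::size_t k, std::size_t n) {
  assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      double acc = 0.0; // (A * B)(i, j)
      for (std::size_t l = 0; l < k; ++l) acc += a[i * k + l] * b[l * n + j];
      c[i * n + j] = alpha * acc + beta * c[i * n + j];
    }
  }
}
```

Note that \( \beta = 0 \) discards the previous content of \( \mathbf{C} \) in this sketch, while \( \beta = 1 \) accumulates into it.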

◆ gemm_batch()

template<bool is_vbatch = false, BlasArrayOrConj< 2 > A, BlasArrayOrConjFor< A, 2 > B, BlasArrayFor< A, 2 > C>
void nda::blas::gemm_batch ( get_value_t< A > alpha,
std::vector< A > const & va,
std::vector< B > const & vb,
get_value_t< A > beta,
std::vector< C > & vc )

#include <nda/blas/gemm_batch.hpp>

Interface to batched versions of the BLAS/cuBLAS gemm routine.

This function performs the matrix-matrix operations

\[ \mathbf{C}_i \leftarrow \alpha \mathbf{A}_i \mathbf{B}_i + \beta \mathbf{C}_i \;, \]

for batches of matrices indexed by \( i \in \{ 0, \ldots, N_b - 1 \} \). Here, \( N_b \) is the batch size and \( \alpha \) and \( \beta \) are scalars. See also nda::blas::gemm.

A batch of matrices is just a std::vector of nda::blas_lapack::BlasArray or nda::blas_lapack::BlasArrayOrConj objects. If is_vbatch is true, the matrices are allowed to have different sizes. Otherwise, they are required to have the same size.

Depending on the types of input matrices, the template parameter is_vbatch and the availability of MAGMA and MKL libraries, the function does the following:

  • If the input matrices satisfy nda::mem::have_device_compatible_addr_space and
    • is_vbatch is false, it calls cuBLAS's cublasXgemmBatched.
    • is_vbatch is true and
      • the matrices are real, it calls cuBLAS's cublasGemmGroupedBatchedEx.
      • the matrices are complex, it tries to call magmablas_Xgemm_vbatched. If nda has not been configured with MAGMA support, an exception is thrown.
  • If the input matrices do not satisfy nda::mem::have_device_compatible_addr_space and
    • nda is linked to MKL, it calls MKL's Xgemm_batch for both is_vbatch true and false.
    • nda is not linked to MKL, it simply loops over all matrices in the batch and calls nda::blas::gemm.
Note
\( \mathbf{A}_i \) and \( \mathbf{B}_i \) are allowed to be lazy conjugate expressions (see nda::blas_lapack::is_conj_array_expr). In this case, they are required to have the opposite memory layout of \(\mathbf{C}_i \) (see nda::C_layout vs nda::F_layout).
Template Parameters
    is_vbatch  Allow variable sized matrices.
    A          nda::blas_lapack::BlasArrayOrConj<2> type.
    B          nda::blas_lapack::BlasArrayOrConjFor<A, 2> type.
    C          nda::blas_lapack::BlasArrayFor<A, 2> type.
Parameters
    alpha  Input scalar \( \alpha \).
    va     std::vector of size \( N_b \) containing the input matrices \( \mathbf{A}_i \).
    vb     std::vector of size \( N_b \) containing the input matrices \( \mathbf{B}_i \).
    beta   Input scalar \( \beta \).
    vc     std::vector of size \( N_b \) containing the input/output matrices \( \mathbf{C}_i \).

Definition at line 101 of file gemm_batch.hpp.
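
The dispatch rules listed for gemm_batch can be summarized as a small decision function. This is only a sketch of the documented rules: the function name gemm_batch_backend and its boolean flags are hypothetical stand-ins for properties that nda determines from the argument types and its build configuration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of nda): returns the name of the routine that the
// documented dispatch rules select. All flags are assumptions of this sketch.
std::string gemm_batch_backend(bool on_device, bool is_vbatch, bool is_complex, bool has_magma, bool has_mkl) {
  if (on_device) {
    if (!is_vbatch) return "cublasXgemmBatched";          // fixed-size batch on the device
    if (!is_complex) return "cublasGemmGroupedBatchedEx"; // variable-size batch, real matrices
    if (!has_magma) throw std::runtime_error("complex variable-size batches on the device require MAGMA");
    return "magmablas_Xgemm_vbatched";                    // variable-size batch, complex matrices
  }
  if (has_mkl) return "Xgemm_batch";                      // host memory with MKL, any is_vbatch
  return "loop over nda::blas::gemm";                     // host memory without MKL
}
```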

◆ gemm_batch_strided()

template<BlasArrayOrConj< 3 > A, BlasArrayOrConjFor< A, 3 > B, BlasArrayFor< A, 3 > C>
requires ((has_C_layout<A> or has_F_layout<A>) and (has_C_layout<B> or has_F_layout<B>) and (has_C_layout<C> or has_F_layout<C>))
void nda::blas::gemm_batch_strided ( get_value_t< A > alpha,
A const & a,
B const & b,
get_value_t< A > beta,
C && c )

#include <nda/blas/gemm_batch.hpp>

Interface to batched-strided versions of the BLAS/cuBLAS gemm routine.

This function performs the matrix-matrix operations

\[ \mathbf{C}_i \leftarrow \alpha \mathbf{A}_i \mathbf{B}_i + \beta \mathbf{C}_i \;, \]

for batches of matrices indexed by \( i \in \{ 0, \ldots, N_b - 1 \} \). Here, \( N_b \) is the batch size and \( \alpha \) and \( \beta \) are scalars. See also nda::blas::gemm.

A batch of matrices is a 3-dimensional array in either nda::C_layout or nda::F_layout. For an nda::C_layout array, the first dimension indexes the individual matrices, so that A(i,:,:) is the \( i \)-th matrix \( \mathbf{A}_i \) in the batch. For an nda::F_layout array, the last dimension indexes them, so that A(:,:,i) is \( \mathbf{A}_i \).

Which backend is called depends on the types of the input arrays and on the availability of the MKL library.

Note
The 3-dimensional arrays \( \mathbf{A} \) and \( \mathbf{B} \) are allowed to be lazy conjugate expressions (see nda::blas_lapack::is_conj_array_expr). In this case, they are required to have the opposite memory layout of \( \mathbf{C} \) (see nda::C_layout vs nda::F_layout).
Template Parameters
    A  nda::blas_lapack::BlasArrayOrConj<3> type.
    B  nda::blas_lapack::BlasArrayOrConjFor<A, 3> type.
    C  nda::blas_lapack::BlasArrayFor<A, 3> type.
Parameters
    alpha  Input scalar \( \alpha \).
    a      3-dimensional input array \( \mathbf{A} \) containing the matrices \( \mathbf{A}_i \).
    b      3-dimensional input array \( \mathbf{B} \) containing the matrices \( \mathbf{B}_i \).
    beta   Input scalar \( \beta \).
    c      3-dimensional input/output array \( \mathbf{C} \) containing the matrices \( \mathbf{C}_i \).

Definition at line 234 of file gemm_batch.hpp.
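
The layout convention for the batch index can be made concrete with flat memory offsets. The sketch below assumes contiguous storage and matrices of size \( m \times n \). The helper names are hypothetical and not part of nda. In both layouts, consecutive matrices of the batch are separated by a constant stride of \( m n \) elements, which is the situation the strided interface addresses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: offset of A(i, r, c) in a contiguous C-layout (row-major)
// array of shape (Nb, m, n), where the first index i selects the matrix.
std::size_t offset_c_layout(std::size_t i, std::size_t r, std::size_t c, std::size_t m, std::size_t n) {
  assert(r < m && c < n);
  return (i * m + r) * n + c;
}

// Hypothetical helper: offset of A(r, c, i) in a contiguous F-layout (column-major)
// array of shape (m, n, Nb), where the last index i selects the matrix.
std::size_t offset_f_layout(std::size_t i, std::size_t r, std::size_t c, std::size_t m, std::size_t n) {
  assert(r < m && c < n);
  return r + m * (c + n * i);
}
```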

◆ gemm_vbatch()

template<BlasArrayOrConj< 2 > A, BlasArrayOrConjFor< A, 2 > B, BlasArrayFor< A, 2 > C>
void nda::blas::gemm_vbatch ( get_value_t< A > alpha,
std::vector< A > const & va,
std::vector< B > const & vb,
get_value_t< A > beta,
std::vector< C > & vc )

#include <nda/blas/gemm_batch.hpp>

Interface to batched versions of the BLAS/cuBLAS gemm routine for variable sized matrices.

It simply calls nda::blas::gemm_batch with is_vbatch set to true.

Template Parameters
    A  nda::blas_lapack::BlasArrayOrConj<2> type.
    B  nda::blas_lapack::BlasArrayOrConjFor<A, 2> type.
    C  nda::blas_lapack::BlasArrayFor<A, 2> type.
Parameters
    alpha  Input scalar \( \alpha \).
    va     std::vector of size \( N_b \) containing the input matrices \( \mathbf{A}_i \).
    vb     std::vector of size \( N_b \) containing the input matrices \( \mathbf{B}_i \).
    beta   Input scalar \( \beta \).
    vc     std::vector of size \( N_b \) containing the input/output matrices \( \mathbf{C}_i \).

Definition at line 194 of file gemm_batch.hpp.
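
A variable-size batch can be pictured as a loop in which every triple \( (\mathbf{A}_i, \mathbf{B}_i, \mathbf{C}_i) \) carries its own dimensions. The following plain C++ sketch mirrors only the loop-over-gemm fallback. The matrix struct and the helper names gemm_ref and gemm_vbatch_ref are assumptions made for illustration and do not exist in nda.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix type used only in this sketch.
struct matrix {
  std::size_t rows, cols;
  std::vector<double> data; // rows * cols entries
};

// C <- alpha * A * B + beta * C for a single triple of matrices.
void gemm_ref(double alpha, matrix const &a, matrix const &b, double beta, matrix &c) {
  assert(a.cols == b.rows && c.rows == a.rows && c.cols == b.cols);
  for (std::size_t i = 0; i < c.rows; ++i) {
    for (std::size_t j = 0; j < c.cols; ++j) {
      double acc = 0.0;
      for (std::size_t l = 0; l < a.cols; ++l) acc += a.data[i * a.cols + l] * b.data[l * b.cols + j];
      c.data[i * c.cols + j] = alpha * acc + beta * c.data[i * c.cols + j];
    }
  }
}

// Variable-size batch: the dimensions may differ from one batch entry to the next.
void gemm_vbatch_ref(double alpha, std::vector<matrix> const &va, std::vector<matrix> const &vb, double beta,
                     std::vector<matrix> &vc) {
  assert(va.size() == vb.size() && va.size() == vc.size());
  for (std::size_t i = 0; i < va.size(); ++i) gemm_ref(alpha, va[i], vb[i], beta, vc[i]);
}
```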

◆ gemv()

template<BlasArrayOrConj< 2 > A, BlasArrayFor< A, 1 > X, BlasArrayFor< A, 1 > Y>
void nda::blas::gemv ( get_value_t< A > alpha,
A const & a,
X const & x,
get_value_t< A > beta,
Y && y )

#include <nda/blas/gemv.hpp>

Interface to the BLAS/cuBLAS gemv routine.

This function performs the matrix-vector operation

\[ \mathbf{y} \leftarrow \alpha \mathbf{A} \mathbf{x} + \beta \mathbf{y} \; , \]

where \( \alpha \) and \( \beta \) are scalars, \( \mathbf{A} \) is an \( m \times n \) matrix and \(\mathbf{x} \) and \( \mathbf{y} \) are vectors of sizes \( n \) and \( m \), respectively.

If the input arrays satisfy nda::mem::have_device_compatible_addr_space, the cuBLAS implementation is used.

Note
\( \mathbf{A} \) is allowed to be a lazy conjugate expression (see nda::blas_lapack::is_conj_array_expr), in which case it is required to be in nda::C_layout.
Template Parameters
    A  nda::blas_lapack::BlasArrayOrConj<2> type.
    X  nda::blas_lapack::BlasArrayFor<A, 1> type.
    Y  nda::blas_lapack::BlasArrayFor<A, 1> type.
Parameters
    alpha  Input scalar \( \alpha \).
    a      Input matrix \( \mathbf{A} \) of size \( m \times n \).
    x      Input vector \( \mathbf{x} \) of size \( n \).
    beta   Input scalar \( \beta \).
    y      Input/Output vector \( \mathbf{y} \) of size \( m \).

Definition at line 55 of file gemv.hpp.
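
The matrix-vector update \( \mathbf{y} \leftarrow \alpha \mathbf{A} \mathbf{x} + \beta \mathbf{y} \) can be spelled out as two nested loops. The sketch assumes a real, row-major matrix in a flat std::vector. The helper name gemv_ref and the explicit size arguments are hypothetical and only illustrate the arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of nda): y <- alpha * A * x + beta * y,
// where A is a row-major m x n matrix, x has n entries and y has m entries.
void gemv_ref(double alpha, std::vector<double> const &a, std::vector<double> const &x, double beta,
              std::vector<double> &y, std::size_t m, std::size_t n) {
  assert(a.size() == m * n && x.size() == n && y.size() == m);
  for (std::size_t i = 0; i < m; ++i) {
    double acc = 0.0; // (A * x)(i)
    for (std::size_t j = 0; j < n; ++j) acc += a[i * n + j] * x[j];
    y[i] = alpha * acc + beta * y[i];
  }
}
```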

◆ ger()

template<BlasArray< 1 > X, BlasArrayFor< X, 1 > Y, BlasArrayFor< X, 2 > A>
void nda::blas::ger ( get_value_t< X > alpha,
X const & x,
Y const & y,
A && a )

#include <nda/blas/ger.hpp>

Interface to the BLAS/cuBLAS ger and geru routines.

This function performs the rank 1 operation

\[ \mathbf{A} \leftarrow \alpha \mathbf{x} \mathbf{y}^T + \mathbf{A} \; , \]

where \( \alpha \) is a scalar, \( \mathbf{x} \) is an \( m \) element vector, \( \mathbf{y} \) is an \( n \) element vector and \( \mathbf{A} \) is an \( m \times n \) matrix.

The vector \( \mathbf{y} \) is never conjugated. For complex vectors, it calls geru. Use nda::blas::gerc to conjugate \( \mathbf{y} \).

If the input arrays satisfy nda::mem::have_device_compatible_addr_space, the cuBLAS implementation is used.

Template Parameters
    X  nda::blas_lapack::BlasArray<1> type.
    Y  nda::blas_lapack::BlasArrayFor<X, 1> type.
    A  nda::blas_lapack::BlasArrayFor<X, 2> type.
Parameters
    alpha  Input scalar \( \alpha \).
    x      Input vector \( \mathbf{x} \) of size \( m \).
    y      Input vector \( \mathbf{y} \) of size \( n \).
    a      Input/Output matrix \( \mathbf{A} \) of size \( m \times n \) to which the outer product is added.

Definition at line 54 of file ger.hpp.
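
The rank 1 update \( \mathbf{A} \leftarrow \alpha \mathbf{x} \mathbf{y}^T + \mathbf{A} \) adds \( \alpha x_i y_j \) to every element \( A_{ij} \). The sketch below shows this for real data with a row-major matrix in a flat std::vector. The helper name ger_ref is hypothetical and not part of nda.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of nda): A <- alpha * x * y^T + A,
// where x has m entries, y has n entries and A is a row-major m x n matrix.
void ger_ref(double alpha, std::vector<double> const &x, std::vector<double> const &y, std::vector<double> &a) {
  std::size_t const m = x.size();
  std::size_t const n = y.size();
  assert(a.size() == m * n);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j) a[i * n + j] += alpha * x[i] * y[j];
}
```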

◆ gerc()

template<BlasArray< 1 > X, BlasArrayFor< X, 1 > Y, BlasArrayFor< X, 2 > A>
requires (has_F_layout<A>)
void nda::blas::gerc ( get_value_t< X > alpha,
X const & x,
Y const & y,
A && a )

#include <nda/blas/ger.hpp>

Interface to the BLAS/cuBLAS gerc routine.

This function performs the rank 1 operation

\[ \mathbf{A} \leftarrow \alpha \mathbf{x} \mathbf{y}^H + \mathbf{A} \; , \]

where \( \alpha \) is a scalar, \( \mathbf{x} \) is an \( m \) element vector, \( \mathbf{y} \) is an \( n \) element vector and \( \mathbf{A} \) is an \( m \times n \) matrix.

For real vectors/matrices, it calls nda::blas::ger.

If the input arrays satisfy nda::mem::have_device_compatible_addr_space, the cuBLAS implementation is used.

Template Parameters
    X  nda::blas_lapack::BlasArray<1> type.
    Y  nda::blas_lapack::BlasArrayFor<X, 1> type.
    A  nda::blas_lapack::BlasArrayFor<X, 2> type with nda::F_layout.
Parameters
    alpha  Input scalar \( \alpha \).
    x      Input vector \( \mathbf{x} \) of size \( m \).
    y      Input vector \( \mathbf{y} \) of size \( n \).
    a      Input/Output matrix \( \mathbf{A} \) of size \( m \times n \) to which the outer product is added.

Definition at line 101 of file ger.hpp.
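
The conjugated rank 1 update \( \mathbf{A} \leftarrow \alpha \mathbf{x} \mathbf{y}^H + \mathbf{A} \) adds \( \alpha x_i \overline{y_j} \) to every element \( A_{ij} \). The following sketch stores \( \mathbf{A} \) in column-major order to match the nda::F_layout requirement. The helper name gerc_ref and the flat std::vector storage are assumptions of this illustration.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical reference helper (not part of nda): A <- alpha * x * y^H + A,
// where A is a column-major m x n matrix, so A(i, j) lives at a[i + m * j].
void gerc_ref(cplx alpha, std::vector<cplx> const &x, std::vector<cplx> const &y, std::vector<cplx> &a) {
  std::size_t const m = x.size();
  std::size_t const n = y.size();
  assert(a.size() == m * n);
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t i = 0; i < m; ++i) a[i + m * j] += alpha * x[i] * std::conj(y[j]);
}
```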

◆ scal()

template<BlasArray< 1 > X>
void nda::blas::scal ( get_value_t< X > alpha,
X && x )

#include <nda/blas/scal.hpp>

Interface to the BLAS/cuBLAS scal routine.

Scales a vector by a constant. This function calculates \( \mathbf{x} \leftarrow \alpha \mathbf{x} \), where \( \alpha \) is a scalar constant and \( \mathbf{x} \) is a vector.

If the input vector satisfies nda::mem::have_device_compatible_addr_space, the cuBLAS implementation is used.

Template Parameters
    X  nda::blas_lapack::BlasArray<1> type.
Parameters
    alpha  Input scalar \( \alpha \).
    x      Input/Output vector \( \mathbf{x} \) to be scaled.

Definition at line 36 of file scal.hpp.
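
For completeness, the in-place scaling \( \mathbf{x} \leftarrow \alpha \mathbf{x} \) corresponds to the loop below. The helper name scal_ref and the std::vector storage are assumptions of this sketch, which shows the arithmetic only and not the nda implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of nda): scales every entry of x by alpha in place.
template <typename T>
void scal_ref(T alpha, std::vector<T> &x) {
  for (std::size_t i = 0; i < x.size(); ++i) x[i] *= alpha;
}
```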