TRIQS/nda 2.0.0
Multi-dimensional array library for C++
Low-level interface to parts of the BLAS/cuBLAS library.
Functions
| template<BlasArray< 1 > X, BlasArrayFor< X, 1 > Y> | |
| auto | nda::blas::dot (X const &x, Y const &y) |
| Interface to the BLAS/cuBLAS dot and dotu routines. | |
| template<BlasArray< 1 > X, BlasArrayFor< X, 1 > Y> | |
| auto | nda::blas::dotc (X const &x, Y const &y) |
| Interface to the BLAS/cuBLAS dotc routine. | |
| template<BlasArrayOrConj< 2 > A, BlasArrayOrConjFor< A, 2 > B, BlasArrayFor< A, 2 > C> | |
| void | nda::blas::gemm (get_value_t< A > alpha, A const &a, B const &b, get_value_t< A > beta, C &&c) |
| Interface to the BLAS/cuBLAS gemm routine. | |
| template<bool is_vbatch = false, BlasArrayOrConj< 2 > A, BlasArrayOrConjFor< A, 2 > B, BlasArrayFor< A, 2 > C> | |
| void | nda::blas::gemm_batch (get_value_t< A > alpha, std::vector< A > const &va, std::vector< B > const &vb, get_value_t< A > beta, std::vector< C > &vc) |
| Interface to batched versions of the BLAS/cuBLAS gemm routine. | |
| template<BlasArrayOrConj< 3 > A, BlasArrayOrConjFor< A, 3 > B, BlasArrayFor< A, 3 > C> requires ((has_C_layout<A> or has_F_layout<A>) and (has_C_layout<B> or has_F_layout<B>) and (has_C_layout<C> or has_F_layout<C>)) | |
| void | nda::blas::gemm_batch_strided (get_value_t< A > alpha, A const &a, B const &b, get_value_t< A > beta, C &&c) |
| Interface to batched-strided versions of the BLAS/cuBLAS gemm routine. | |
| template<BlasArrayOrConj< 2 > A, BlasArrayOrConjFor< A, 2 > B, BlasArrayFor< A, 2 > C> | |
| void | nda::blas::gemm_vbatch (get_value_t< A > alpha, std::vector< A > const &va, std::vector< B > const &vb, get_value_t< A > beta, std::vector< C > &vc) |
| Interface to batched versions of the BLAS/cuBLAS gemm routine for variable sized matrices. | |
| template<BlasArrayOrConj< 2 > A, BlasArrayFor< A, 1 > X, BlasArrayFor< A, 1 > Y> | |
| void | nda::blas::gemv (get_value_t< A > alpha, A const &a, X const &x, get_value_t< A > beta, Y &&y) |
| Interface to the BLAS/cuBLAS gemv routine. | |
| template<BlasArray< 1 > X, BlasArrayFor< X, 1 > Y, BlasArrayFor< X, 2 > A> | |
| void | nda::blas::ger (get_value_t< X > alpha, X const &x, Y const &y, A &&a) |
| Interface to the BLAS/cuBLAS ger and geru routines. | |
| template<BlasArray< 1 > X, BlasArrayFor< X, 1 > Y, BlasArrayFor< X, 2 > A> requires (has_F_layout<A>) | |
| void | nda::blas::gerc (get_value_t< X > alpha, X const &x, Y const &y, A &&a) |
| Interface to the BLAS/cuBLAS gerc routine. | |
| template<BlasArray< 1 > X> | |
| void | nda::blas::scal (get_value_t< X > alpha, X &&x) |
| Interface to the BLAS/cuBLAS scal routine. | |
auto nda::blas::dot (X const &x, Y const &y)
#include <nda/blas/dot.hpp>
Interface to the BLAS/cuBLAS dot and dotu routines.
This function forms the dot product of two vectors. It calculates \( \mathbf{x}^T \mathbf{y} \).
The first argument is never conjugated. For complex vectors, it calls dotu. Use nda::blas::dotc to conjugate the first argument.
If the input vectors satisfy nda::mem::have_device_compatible_addr_space, the cuBLAS implementation is used.
| X | nda::blas_lapack::BlasArray<1> type. |
| Y | nda::blas_lapack::BlasArrayFor<X, 1> type. |
| x | Input vector \( \mathbf{x} \). |
| y | Input vector \( \mathbf{y} \). |
auto nda::blas::dotc (X const &x, Y const &y)
#include <nda/blas/dot.hpp>
Interface to the BLAS/cuBLAS dotc routine.
This function forms the dot product of two vectors. It calculates \( \mathbf{x}^H \mathbf{y} \).
For real vectors, it calls nda::blas::dot and returns a real result.
If the input vectors satisfy nda::mem::have_device_compatible_addr_space, the cuBLAS implementation is used.
| X | nda::blas_lapack::BlasArray<1> type. |
| Y | nda::blas_lapack::BlasArrayFor<X, 1> type. |
| x | Input vector \( \mathbf{x} \). |
| y | Input vector \( \mathbf{y} \). |
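The difference between \( \mathbf{x}^T \mathbf{y} \) (dot/dotu) and \( \mathbf{x}^H \mathbf{y} \) (dotc) only shows up for complex vectors. The following is a minimal plain C++ sketch of the two formulas. It does not use nda, and the helper names `ref_dotu` and `ref_dotc` are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical reference for x^T y: neither argument is conjugated.
inline cplx ref_dotu(std::vector<cplx> const &x, std::vector<cplx> const &y) {
  assert(x.size() == y.size());
  cplx sum{0.0, 0.0};
  for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
  return sum;
}

// Hypothetical reference for x^H y: the first argument is conjugated.
inline cplx ref_dotc(std::vector<cplx> const &x, std::vector<cplx> const &y) {
  assert(x.size() == y.size());
  cplx sum{0.0, 0.0};
  for (std::size_t i = 0; i < x.size(); ++i) sum += std::conj(x[i]) * y[i];
  return sum;
}
```

For \( \mathbf{x} = (1 + 2i, 3 - i) \) and \( \mathbf{y} = (i, 2 + 2i) \), the first helper gives \( 6 + 5i \) and the second gives \( 6 + 9i \). For real input both formulas coincide.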
void nda::blas::gemm (get_value_t< A > alpha, A const &a, B const &b, get_value_t< A > beta, C &&c)
#include <nda/blas/gemm.hpp>
Interface to the BLAS/cuBLAS gemm routine.
This function performs the matrix-matrix operation
\[ \mathbf{C} \leftarrow \alpha \mathbf{A} \mathbf{B} + \beta \mathbf{C} \;, \]
where \( \alpha \) and \( \beta \) are scalars, and \( \mathbf{A} \), \( \mathbf{B} \) and \( \mathbf{C} \) are matrices of size \( m \times k \), \( k \times n \) and \( m \times n \), respectively.
If the input arrays satisfy nda::mem::have_device_compatible_addr_space, the cuBLAS implementation is used.
| A | nda::blas_lapack::BlasArrayOrConj<2> type. |
| B | nda::blas_lapack::BlasArrayOrConjFor<A, 2> type. |
| C | nda::blas_lapack::BlasArrayFor<A, 2> type. |
| alpha | Input scalar \( \alpha \). |
| a | Input matrix \( \mathbf{A} \) of size \( m \times k \). |
| b | Input matrix \( \mathbf{B} \) of size \( k \times n \). |
| beta | Input scalar \( \beta \). |
| c | Input/Output matrix \( \mathbf{C} \) of size \( m \times n \). |
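As a reading aid for the update \( \mathbf{C} \leftarrow \alpha \mathbf{A} \mathbf{B} + \beta \mathbf{C} \), here is a plain C++ sketch of the same arithmetic. It assumes row-major matrices stored in flat `std::vector<double>` objects, and the helper `ref_gemm` is hypothetical. It is not how nda or BLAS implement the routine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference: C <- alpha * A * B + beta * C.
// a is m x k, b is k x n, c is m x n, all row-major in flat vectors.
inline void ref_gemm(double alpha, std::vector<double> const &a, std::vector<double> const &b,
                     double beta, std::vector<double> &c,
                     std::size_t m, std::size_t k, std::size_t n) {
  assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      double acc = 0.0;
      for (std::size_t l = 0; l < k; ++l) acc += a[i * k + l] * b[l * n + j];
      // Scale the old entry by beta and add the scaled product.
      c[i * n + j] = alpha * acc + beta * c[i * n + j];
    }
  }
}
```

Note that \( \mathbf{C} \) is read as well as written, which is why the parameter `c` is documented as input/output.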
void nda::blas::gemm_batch (get_value_t< A > alpha, std::vector< A > const &va, std::vector< B > const &vb, get_value_t< A > beta, std::vector< C > &vc)
#include <nda/blas/gemm_batch.hpp>
Interface to batched versions of the BLAS/cuBLAS gemm routine.
This function performs the matrix-matrix operations
\[ \mathbf{C}_i \leftarrow \alpha \mathbf{A}_i \mathbf{B}_i + \beta \mathbf{C}_i \;, \]
for batches of matrices indexed by \( i \in \{ 0, \ldots, N_b - 1 \} \). Here, \( N_b \) is the batch size and \( \alpha \) and \( \beta \) are scalars. See also nda::blas::gemm.
A batch of matrices is just a std::vector of nda::blas_lapack::BlasArray or nda::blas_lapack::BlasArrayOrConj objects. If is_vbatch is true, the matrices are allowed to have different sizes. Otherwise, they are required to have the same size.
The exact behavior of the function depends on the types of the input matrices, the template parameter is_vbatch and the availability of the MAGMA and MKL libraries.
| is_vbatch | Allow variable sized matrices. |
| A | nda::blas_lapack::BlasArrayOrConj<2> type. |
| B | nda::blas_lapack::BlasArrayOrConjFor<A, 2> type. |
| C | nda::blas_lapack::BlasArrayFor<A, 2> type. |
| alpha | Input scalar \( \alpha \). |
| va | std::vector of size \( N_b \) containing the input matrices \( \mathbf{A}_i \). |
| vb | std::vector of size \( N_b \) containing the input matrices \( \mathbf{B}_i \). |
| beta | Input scalar \( \beta \). |
| vc | std::vector of size \( N_b \) containing the input/output matrices \( \mathbf{C}_i \). |
Definition at line 101 of file gemm_batch.hpp.
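The batched operation can be pictured as an independent gemm per batch index. The sketch below spells this out in plain C++ under stated assumptions: `Mat` is a hypothetical minimal row-major matrix type (not an nda type), and `ref_gemm_batch` is a hypothetical helper. Because shapes are only checked within each batch entry, it corresponds to the variable-size case described above for is_vbatch set to true.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal row-major matrix type; not an nda type.
struct Mat {
  std::size_t rows, cols;
  std::vector<double> data; // rows * cols entries
};

// Hypothetical reference: C_i <- alpha * A_i * B_i + beta * C_i for every i.
inline void ref_gemm_batch(double alpha, std::vector<Mat> const &va, std::vector<Mat> const &vb,
                           double beta, std::vector<Mat> &vc) {
  assert(va.size() == vb.size() && vb.size() == vc.size());
  for (std::size_t i = 0; i < va.size(); ++i) {
    Mat const &a = va[i];
    Mat const &b = vb[i];
    Mat &c = vc[i];
    // Shapes only have to agree within one batch entry.
    assert(a.cols == b.rows && c.rows == a.rows && c.cols == b.cols);
    for (std::size_t r = 0; r < c.rows; ++r) {
      for (std::size_t col = 0; col < c.cols; ++col) {
        double acc = 0.0;
        for (std::size_t l = 0; l < a.cols; ++l) acc += a.data[r * a.cols + l] * b.data[l * b.cols + col];
        c.data[r * c.cols + col] = alpha * acc + beta * c.data[r * c.cols + col];
      }
    }
  }
}
```

The same scalars \( \alpha \) and \( \beta \) are applied to every entry of the batch; only the matrices vary with \( i \).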
void nda::blas::gemm_batch_strided (get_value_t< A > alpha, A const &a, B const &b, get_value_t< A > beta, C &&c)
#include <nda/blas/gemm_batch.hpp>
Interface to batched-strided versions of the BLAS/cuBLAS gemm routine.
This function performs the matrix-matrix operations
\[ \mathbf{C}_i \leftarrow \alpha \mathbf{A}_i \mathbf{B}_i + \beta \mathbf{C}_i \;, \]
for batches of matrices indexed by \( i \in \{ 0, \ldots, N_b - 1 \} \). Here, \( N_b \) is the batch size and \( \alpha \) and \( \beta \) are scalars. See also nda::blas::gemm.
A batch of matrices is a 3-dimensional array in either nda::C_layout or nda::F_layout. For an array in Fortran layout, the last dimension indexes the individual matrices, so that A(:,:,i) is the \( i \)-th matrix \( \mathbf{A}_i \) in the batch. For an array in C layout, the first dimension indexes them, so that A(i,:,:) is the \( i \)-th matrix.
The exact behavior of the function depends on the types of the input arrays and the availability of the MKL library.
| A | nda::blas_lapack::BlasArrayOrConj<3> type. |
| B | nda::blas_lapack::BlasArrayOrConjFor<A, 3> type. |
| C | nda::blas_lapack::BlasArrayFor<A, 3> type. |
| alpha | Input scalar \( \alpha \). |
| a | 3-dimensional input array \( \mathbf{A} \) containing the matrices \( \mathbf{A}_i \). |
| b | 3-dimensional input array \( \mathbf{B} \) containing the matrices \( \mathbf{B}_i \). |
| beta | Input scalar \( \beta \). |
| c | 3-dimensional input/output array \( \mathbf{C} \) containing the matrices \( \mathbf{C}_i \). |
Definition at line 234 of file gemm_batch.hpp.
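To see why the batch index sits in the first dimension for C layout and in the last dimension for Fortran layout, it helps to write out the linear offsets. The sketch below assumes contiguous, unpadded storage and \( m \times n \) matrices. The helper names are hypothetical, and the code does not reflect how nda stores strides internally.

```cpp
#include <cassert>
#include <cstddef>

// Offset of A(i, r, c) in a C-layout (row-major) array of shape (Nb, m, n).
inline std::size_t offset_c_layout(std::size_t i, std::size_t r, std::size_t c,
                                   std::size_t m, std::size_t n) {
  assert(r < m && c < n);
  return (i * m + r) * n + c;
}

// Offset of A(r, c, i) in an F-layout (column-major) array of shape (m, n, Nb).
inline std::size_t offset_f_layout(std::size_t r, std::size_t c, std::size_t i,
                                   std::size_t m, std::size_t n) {
  assert(r < m && c < n);
  return r + m * (c + n * i);
}

// Offset of the first element of matrix i; the same in both layouts.
inline std::size_t batch_start(std::size_t i, std::size_t m, std::size_t n) {
  return i * m * n;
}
```

Under these assumptions each matrix occupies one contiguous block of \( m n \) elements and consecutive matrices are separated by a constant stride of \( m n \), which is the situation a strided batched gemm expects.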
void nda::blas::gemm_vbatch (get_value_t< A > alpha, std::vector< A > const &va, std::vector< B > const &vb, get_value_t< A > beta, std::vector< C > &vc)
#include <nda/blas/gemm_batch.hpp>
Interface to batched versions of the BLAS/cuBLAS gemm routine for variable sized matrices.
It simply calls nda::blas::gemm_batch with is_vbatch set to true.
| A | nda::blas_lapack::BlasArrayOrConj<2> type. |
| B | nda::blas_lapack::BlasArrayOrConjFor<A, 2> type. |
| C | nda::blas_lapack::BlasArrayFor<A, 2> type. |
| alpha | Input scalar \( \alpha \). |
| va | std::vector of size \( N_b \) containing the input matrices \( \mathbf{A}_i \). |
| vb | std::vector of size \( N_b \) containing the input matrices \( \mathbf{B}_i \). |
| beta | Input scalar \( \beta \). |
| vc | std::vector of size \( N_b \) containing the input/output matrices \( \mathbf{C}_i \). |
Definition at line 194 of file gemm_batch.hpp.
void nda::blas::gemv (get_value_t< A > alpha, A const &a, X const &x, get_value_t< A > beta, Y &&y)
#include <nda/blas/gemv.hpp>
Interface to the BLAS/cuBLAS gemv routine.
This function performs the matrix-vector operation
\[ \mathbf{y} \leftarrow \alpha \mathbf{A} \mathbf{x} + \beta \mathbf{y} \; , \]
where \( \alpha \) and \( \beta \) are scalars, \( \mathbf{A} \) is an \( m \times n \) matrix and \( \mathbf{x} \) and \( \mathbf{y} \) are vectors of sizes \( n \) and \( m \), respectively.
If the input arrays satisfy nda::mem::have_device_compatible_addr_space, the cuBLAS implementation is used.
| A | nda::blas_lapack::BlasArrayOrConj<2> type. |
| X | nda::blas_lapack::BlasArrayFor<A, 1> type. |
| Y | nda::blas_lapack::BlasArrayFor<A, 1> type. |
| alpha | Input scalar \( \alpha \). |
| a | Input matrix \( \mathbf{A} \) of size \( m \times n \). |
| x | Input vector \( \mathbf{x} \) of size \( n \). |
| beta | Input scalar \( \beta \). |
| y | Input/Output vector \( \mathbf{y} \) of size \( m \). |
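The matrix-vector update \( \mathbf{y} \leftarrow \alpha \mathbf{A} \mathbf{x} + \beta \mathbf{y} \) can be sketched in plain C++ as follows. The sketch assumes a row-major \( m \times n \) matrix in a flat vector and takes \( m \) and \( n \) from the sizes of `y` and `x`. The helper `ref_gemv` is hypothetical and is not part of nda.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference: y <- alpha * A * x + beta * y.
// a is an m x n row-major matrix with m = y.size() and n = x.size().
inline void ref_gemv(double alpha, std::vector<double> const &a, std::vector<double> const &x,
                     double beta, std::vector<double> &y) {
  std::size_t const m = y.size();
  std::size_t const n = x.size();
  assert(a.size() == m * n);
  for (std::size_t i = 0; i < m; ++i) {
    double acc = 0.0;
    for (std::size_t j = 0; j < n; ++j) acc += a[i * n + j] * x[j];
    y[i] = alpha * acc + beta * y[i];
  }
}
```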
void nda::blas::ger (get_value_t< X > alpha, X const &x, Y const &y, A &&a)
#include <nda/blas/ger.hpp>
Interface to the BLAS/cuBLAS ger and geru routines.
This function performs the rank 1 operation
\[ \mathbf{A} \leftarrow \alpha \mathbf{x} \mathbf{y}^T + \mathbf{A} \; , \]
where \( \alpha \) is a scalar, \( \mathbf{x} \) is an \( m \) element vector, \( \mathbf{y} \) is an \( n \) element vector and \( \mathbf{A} \) is an \( m \times n \) matrix.
The vector \( \mathbf{y} \) is never conjugated. For complex vectors, it calls geru. Use nda::blas::gerc to conjugate \( \mathbf{y} \).
If the input arrays satisfy nda::mem::have_device_compatible_addr_space, the cuBLAS implementation is used.
| X | nda::blas_lapack::BlasArray<1> type. |
| Y | nda::blas_lapack::BlasArrayFor<X, 1> type. |
| A | nda::blas_lapack::BlasArrayFor<X, 2> type. |
| alpha | Input scalar \( \alpha \). |
| x | Input vector \( \mathbf{x} \) of size \( m \). |
| y | Input vector \( \mathbf{y} \) of size \( n \). |
| a | Input/Output matrix \( \mathbf{A} \) of size \( m \times n \) to which the outer product is added. |
void nda::blas::gerc (get_value_t< X > alpha, X const &x, Y const &y, A &&a)
#include <nda/blas/ger.hpp>
Interface to the BLAS/cuBLAS gerc routine.
This function performs the rank 1 operation
\[ \mathbf{A} \leftarrow \alpha \mathbf{x} \mathbf{y}^H + \mathbf{A} \; , \]
where \( \alpha \) is a scalar, \( \mathbf{x} \) is an \( m \) element vector, \( \mathbf{y} \) is an \( n \) element vector and \( \mathbf{A} \) is an \( m \times n \) matrix.
For real vectors/matrices, it calls nda::blas::ger.
If the input arrays satisfy nda::mem::have_device_compatible_addr_space, the cuBLAS implementation is used.
| X | nda::blas_lapack::BlasArray<1> type. |
| Y | nda::blas_lapack::BlasArrayFor<X, 1> type. |
| A | nda::blas_lapack::BlasArrayFor<X, 2> type with nda::F_layout. |
| alpha | Input scalar \( \alpha \). |
| x | Input vector \( \mathbf{x} \) of size \( m \). |
| y | Input vector \( \mathbf{y} \) of size \( n \). |
| a | Input/Output matrix \( \mathbf{A} \) of size \( m \times n \) to which the outer product is added. |
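The two rank 1 updates, \( \alpha \mathbf{x} \mathbf{y}^T + \mathbf{A} \) (ger/geru) and \( \alpha \mathbf{x} \mathbf{y}^H + \mathbf{A} \) (gerc), differ only in whether \( \mathbf{y} \) is conjugated. A plain C++ sketch of both follows. It assumes a column-major (Fortran layout) \( m \times n \) matrix in a flat vector, and the helper `ref_ger` with its `conj_y` flag is hypothetical, not an nda function.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical reference for the rank 1 update on a column-major m x n matrix:
//   conj_y == false: A <- alpha * x * y^T + A
//   conj_y == true:  A <- alpha * x * y^H + A
inline void ref_ger(cplx alpha, std::vector<cplx> const &x, std::vector<cplx> const &y,
                    std::vector<cplx> &a, bool conj_y) {
  std::size_t const m = x.size();
  std::size_t const n = y.size();
  assert(a.size() == m * n);
  for (std::size_t j = 0; j < n; ++j) {
    cplx const yj = conj_y ? std::conj(y[j]) : y[j];
    // Column j of A receives alpha * x * yj.
    for (std::size_t i = 0; i < m; ++i) a[i + m * j] += alpha * x[i] * yj;
  }
}
```

With \( \alpha = 1 \), \( \mathbf{x} = (i, 1) \), \( \mathbf{y} = (1 + i) \) and \( \mathbf{A} = 0 \), the unconjugated update yields the column \( (-1 + i, 1 + i) \) and the conjugated update yields \( (1 + i, 1 - i) \).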
void nda::blas::scal (get_value_t< X > alpha, X &&x)
#include <nda/blas/scal.hpp>
Interface to the BLAS/cuBLAS scal routine.
Scales a vector by a constant. This function calculates \( \mathbf{x} \leftarrow \alpha \mathbf{x} \), where \( \alpha \) is a scalar constant and \( \mathbf{x} \) is a vector.
If the input vector satisfies nda::mem::have_device_compatible_addr_space, the cuBLAS implementation is used.
| X | nda::blas_lapack::BlasArray<1> type. |
| alpha | Input scalar \( \alpha \). |
| x | Input/Output vector \( \mathbf{x} \) to be scaled. |