In this example, we show how to use the Tensor operations in nda to perform tensor addition, contraction, full dot product, reduction to a scalar, in-place scaling and element-wise binary ops.
All the following code snippets are part of the same main function:
#include <complex>
#include <iostream>
int main() {
}
unary_op
Unary element-wise operations for tensor operations.
binary_op
Binary operations for tensor operations.
Includes all relevant headers for the core nda library.
What is a tensor in nda?
In nda, a tensor is simply an nda::array of arbitrary rank. The nda::tensor namespace adds a small set of operations that work uniformly across ranks and use the Einstein summation convention, where dimensions are named by single-letter index strings and repeated indices imply summation.
In this example, all arrays live on the host (the default address space). The tensor operations dispatch to the optimized TBLIS backend whenever it is available, and otherwise fall back to plain nda expressions for the cases where this is possible (identical ranks and identical index orderings).
Prerequisite: building with TBLIS support
Tensor contraction with arbitrary einsum patterns cannot be expressed by the basic nda expression machinery alone. To enable the full feature set demonstrated here, configure nda with -DTblisSupport=ON:
cmake ../nda.src -DCMAKE_BUILD_TYPE=Release -DTblisSupport=ON
Constructing tensors and index strings
We start by creating a rank-3 tensor of shape \( 2 \times 3 \times 4 \) and filling it with consecutive integers:
for (int n = 0; auto &x : A) x = n++;
std::cout << "A = " << A << std::endl;
std::cout << "A.shape() = " << A.shape() << std::endl;
basic_array< ValueType, Rank, Layout, 'A', ContainerPolicy > array
Alias template of an nda::basic_array with an 'A' algebra.
Output:
A = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
A.shape() = (2 3 4)
Note that nda streams the underlying storage of rank-3 (and higher) arrays as a single flat list.
The tensor operations identify dimensions by single-letter labels. For convenience, nda::tensor::default_index returns a default string of length equal to the rank, starting with 'a':
std::string_view default_index()
Generate a default index string ("abc...") of a given length.
Output:
Tensor addition
nda::tensor::add performs an in-place tensor addition with optional scaling:
\[ B_{\text{idx}_B} \leftarrow \alpha \, A_{\text{idx}_A} + \beta \, B_{\text{idx}_B} \;.
\]
B = 1.0;
double a1 = 2.0;
double b1 = 3.0;
std::cout << "B = 2*A + 3 = " << B << std::endl;
void add(get_value_t< A > alpha, A const &a, std::string_view idx_a, get_value_t< A > beta, B &&b, std::string_view idx_b)
Tensor addition with cuTENSOR/TBLIS/nda dispatch.
Output:
B = 2*A + 3 = [3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49]
Every entry of B equals \( 2 A_{ijk} + 3 \).
Tensor contraction
nda::tensor::contract is the central tensor operation: a general einsum-style contraction
\[ C_{\text{idx}_C} \leftarrow \alpha \, A_{\text{idx}_A} \cdot B_{\text{idx}_B} + \beta \, C_{\text{idx}_C} \;,
\]
where indices that appear in both \( A \) and \( B \) but not in \( C \) are summed over.
A rank-3 contraction \( C_{il} = \sum_{jk} A_{ijk} B_{jkl} \):
for (int n = 0; auto &x : A2) x = 0.1 * n++;
for (int n = 0; auto &x : B2) x = 0.1 * n++;
std::cout << "C_il = A_ijk * B_jkl = " << C2 << std::endl;
static basic_array zeros(std::array< Int, Rank > const &shape)
void contract(get_value_t< A > alpha, A const &a, std::string_view idx_a, B const &b, std::string_view idx_b, get_value_t< A > beta, C &&c, std::string_view idx_c)
Tensor contraction with cuTENSOR/TBLIS dispatch.
Output:
C_il = A_ijk * B_jkl =
[[25.3,25.96,26.62,27.28,27.94]
[64.9,67,69.1,71.2,73.3]]
The matrix product \( \mathbf{A}_m \mathbf{B}_m \) is just the special rank-2 case \( C_{ik} = \sum_j A_{ij} B_{jk} \):
std::cout << "A_ij * B_jk = " << Cm << std::endl;
Output:
A_ij * B_jk =
[[58,64]
[139,154]]
This is exactly the result returned by Am * Bm in the matrix algebra of nda::matrix; nda exposes the same operation in einsum form, generalized to arbitrary rank.
Full tensor dot product
nda::tensor::dot computes the scalar
\[ z \leftarrow \sum_{\text{idx}_A = \text{idx}_B} A_{\text{idx}_A} \cdot B_{\text{idx}_B} \;,
\]
i.e. it contracts every dimension of \( A \) against the matching dimension of \( B \).
std::cout << "dot(A2, A2) [indexed] = " << d_indexed << std::endl;
std::cout << "dot(A2, A2) [default] = " << d_default << std::endl;
get_value_t< A > dot(A const &a, std::string_view idx_a, B const &b, std::string_view idx_b)
Full tensor dot product with cuTENSOR/TBLIS/nda dispatch.
Output:
dot(A2, A2) [indexed] = 43.24
dot(A2, A2) [default] = 43.24
The two calls return the same value: the indexed form spells out the pairing explicitly, while the convenience overload uses nda::tensor::default_index for both operands.
Full tensor reduction
nda::tensor::reduce reduces every element of a tensor to a single scalar via one of the operations defined by nda::tensor::binary_op:
get_value_t< A > reduce(A const &a, binary_op op_reduce=binary_op::SUM)
Full tensor reduction with cuTENSOR/TBLIS/nda dispatch.
Output:
reduce(A, SUM) = 276
reduce(A, MAX) = 23
reduce(A, MIN) = 0
reduce(A, NORM_2) = 65.7571
The default reduction is binary_op::SUM. See nda::tensor::binary_op for the full list of supported reductions.
In-place tensor scaling
nda::tensor::scale rescales a tensor in place and optionally applies an element-wise unary operation (see nda::tensor::unary_op):
\[ A \leftarrow \alpha \, \text{op}(A) \;.
\]
for (int n = 0; auto &x : S) x = static_cast<double>(n++ + 1);
std::cout << "2 * S = " << S << std::endl;
std::cout << "sqrt(2 * S) = " << S << std::endl;
void scale(get_value_t< A > alpha, A &&a, unary_op op=unary_op::IDENTITY)
In-place tensor scaling with cuTENSOR/TBLIS/nda dispatch and optional element-wise unary operation.
Output:
2 * S = [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48]
sqrt(2 * S) = [1.41421,2,2.44949,2.82843,3.16228,3.4641,3.74166,4,4.24264,4.47214,4.69042,4.89898,5.09902,5.2915,5.47723,5.65685,5.83095,6,6.16441,6.32456,6.48074,6.63325,6.78233,6.9282]
After the two calls, every entry of S equals \( \sqrt{2 \, n} \) for \( n = 1, 2, \dots, 24 \).
For a complex tensor, unary_op::CONJ flips the sign of the imaginary parts:
using cplx = std::complex<double>;
for (int n = 0; auto &x : Z) {
x = cplx(n, n + 1);
++n;
}
std::cout << "conj(Z) = " << Z << std::endl;
Output:
conj(Z) = [(0,-1),(1,-2),(2,-3),(3,-4),(4,-5),(5,-6),(6,-7),(7,-8)]
Element-wise binary operations
nda::tensor::elementwise applies an arbitrary nda::tensor::binary_op element-wise:
\[ B_{\text{idx}_B} \leftarrow \text{op}(\alpha \, A_{\text{idx}_A}, \, \beta \, B_{\text{idx}_B}) \;.
\]
With binary_op::SUM it reduces to nda::tensor::add:
for (int n = 0; auto &x : E1) x = static_cast<double>(n++);
E2 = 1.0;
std::cout << "E2 <- E1 + E2 = " << E2 << std::endl;
void elementwise(get_value_t< A > alpha, A const &a, std::string_view idx_a, get_value_t< A > beta, B &&b, std::string_view idx_b, binary_op op=binary_op::SUM)
In-place elementwise binary tensor operation with cuTENSOR/nda dispatch.
Output:
E2 <- E1 + E2 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24]
With binary_op::PROD we get an element-wise (Hadamard) product:
F1 = 2.0;
F2 = 3.0;
std::cout << "F2 <- F1 * F2 = " << F2 << std::endl;
Output:
F2 <- F1 * F2 = [6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6]
Every entry of F2 is \( 1 \cdot 2 \cdot 1 \cdot 3 = 6 \).
Full source
The complete, compilable source for this example:
#include <complex>
#include <iostream>
int main() {
for (int n = 0; auto &x : A) x = n++;
std::cout << "A = " << A << std::endl;
std::cout << "A.shape() = " << A.shape() << std::endl;
B = 1.0;
double a1 = 2.0;
double b1 = 3.0;
std::cout << "B = 2*A + 3 = " << B << std::endl;
for (int n = 0; auto &x : A2) x = 0.1 * n++;
for (int n = 0; auto &x : B2) x = 0.1 * n++;
std::cout << "C_il = A_ijk * B_jkl = " << C2 << std::endl;
std::cout << "A_ij * B_jk = " << Cm << std::endl;
std::cout << "dot(A2, A2) [indexed] = " << d_indexed << std::endl;
std::cout << "dot(A2, A2) [default] = " << d_default << std::endl;
for (int n = 0; auto &x : S) x = static_cast<double>(n++ + 1);
std::cout << "2 * S = " << S << std::endl;
std::cout << "sqrt(2 * S) = " << S << std::endl;
using cplx = std::complex<double>;
for (int n = 0; auto &x : Z) {
x = cplx(n, n + 1);
++n;
}
std::cout << "conj(Z) = " << Z << std::endl;
for (int n = 0; auto &x : E1) x = static_cast<double>(n++);
E2 = 1.0;
std::cout << "E2 <- E1 + E2 = " << E2 << std::endl;
F1 = 2.0;
F2 = 3.0;
std::cout << "F2 <- F1 * F2 = " << F2 << std::endl;
}