TRIQS/nda 2.0.0
Multi-dimensional array library for C++
Loading...
Searching...
No Matches
Example 9: Tensor support

In this example, we show how to use the Tensor operations in nda to perform tensor addition, contraction, full dot product, reduction to a scalar, in-place scaling and element-wise binary ops.

All the following code snippets are part of the same main function:

#include <nda/nda.hpp>
#include <complex>
#include <iostream>
int main() {
// code snippets go here ...
}
unary_op
Unary element-wise operations for tensor operations.
Definition tools.hpp:103
binary_op
Binary operations for tensor operations.
Definition tools.hpp:67
Includes all relevant headers for the core nda library.

What is a tensor in nda?

In nda, a tensor is simply an nda::array of arbitrary rank. The nda::tensor namespace adds a small set of operations that work uniformly across ranks and use the Einstein summation convention, where dimensions are named by single-letter index strings and repeated indices imply summation.

In this example, all arrays live on the host (the default address space). The tensor operations dispatch to the optimized TBLIS backend whenever it is available, and otherwise fall back to plain nda expressions for the cases where this is possible (identical ranks and identical index orderings).

Prerequisite: building with TBLIS support

Tensor contraction with arbitrary einsum patterns cannot be expressed by the basic nda expression machinery alone. To enable the full feature set demonstrated here, configure nda with -DTblisSupport=ON:

cmake ../nda.src -DCMAKE_BUILD_TYPE=Release -DTblisSupport=ON

Constructing tensors and index strings

We start by creating a rank-3 tensor of shape \( 2 \times 3 \times 4 \) and filling it with consecutive integers:

// construct a rank-3 tensor of shape 2x3x4 and fill it with consecutive integers
auto A = nda::array<double, 3>(2, 3, 4);
for (int n = 0; auto &x : A) x = n++;
std::cout << "A = " << A << std::endl;
std::cout << "A.shape() = " << A.shape() << std::endl;
basic_array< ValueType, Rank, Layout, 'A', ContainerPolicy > array
Alias template of an nda::basic_array with an 'A' algebra.

Output:

A = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
A.shape() = (2 3 4)

Note that nda streams the underlying storage of rank-3 (and higher) arrays as a single flat list.

The tensor operations identify dimensions by single-letter labels. For convenience, nda::tensor::default_index returns a default string of length equal to the rank, starting with 'a':

std::cout << "default_index<3>() = " << nda::tensor::default_index<3>() << std::endl;
std::string_view default_index()
Generate a default index string ("abc...") of a given length.
Definition tools.hpp:265

Output:

default_index<3>() = abc

Tensor addition

nda::tensor::add performs an in-place tensor addition with optional scaling:

\[ B_{\text{idx}_B} \leftarrow \alpha \, A_{\text{idx}_A} + \beta \, B_{\text{idx}_B} \;. \]

// in-place tensor addition: B_ijk <- alpha * A_ijk + beta * B_ijk
auto B = nda::array<double, 3>(2, 3, 4);
B = 1.0;
double a1 = 2.0;
double b1 = 3.0;
nda::tensor::add(a1, A, "ijk", b1, B, "ijk");
std::cout << "B = 2*A + 3 = " << B << std::endl;
void add(get_value_t< A > alpha, A const &a, std::string_view idx_a, get_value_t< A > beta, B &&b, std::string_view idx_b)
Tensor addition with cuTENSOR/TBLIS/nda dispatch.
Definition add.hpp:63

Output:

B = 2*A + 3 = [3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49]

Every entry of B equals \( 2 A_{ijk} + 3 \).

Tensor contraction

nda::tensor::contract is the central tensor operation: a general einsum-style contraction

\[ C_{\text{idx}_C} \leftarrow \alpha \, A_{\text{idx}_A} \cdot B_{\text{idx}_B} + \beta \, C_{\text{idx}_C} \;, \]

where indices that appear in both \( A \) and \( B \) but not in \( C \) are summed over.

A rank-3 contraction \( C_{il} = \sum_{jk} A_{ijk} B_{jkl} \):

// tensor contraction: C_il = sum_jk A_ijk * B_jkl
auto A2 = nda::array<double, 3>(2, 3, 4);
for (int n = 0; auto &x : A2) x = 0.1 * n++;
auto B2 = nda::array<double, 3>(3, 4, 5);
for (int n = 0; auto &x : B2) x = 0.1 * n++;
auto C2 = nda::array<double, 2>::zeros({2, 5});
nda::tensor::contract(1.0, A2, "ijk", B2, "jkl", 0.0, C2, "il");
std::cout << "C_il = A_ijk * B_jkl = " << C2 << std::endl;
static basic_array zeros(std::array< Int, Rank > const &shape)
void contract(get_value_t< A > alpha, A const &a, std::string_view idx_a, B const &b, std::string_view idx_b, get_value_t< A > beta, C &&c, std::string_view idx_c)
Tensor contraction with cuTENSOR/TBLIS dispatch.
Definition contract.hpp:66

Output:

C_il = A_ijk * B_jkl =
[[25.3,25.96,26.62,27.28,27.94]
[64.9,67,69.1,71.2,73.3]]

The matrix product \( \mathbf{A}_m \mathbf{B}_m \) is just the special rank-2 case \( C_{ik} = \sum_j A_{ij} B_{jk} \):

// matrix-matrix multiplication expressed as a rank-2 contraction: C_ik = A_ij * B_jk
auto Am = nda::array<double, 2>{{1, 2, 3}, {4, 5, 6}};
auto Bm = nda::array<double, 2>{{7, 8}, {9, 10}, {11, 12}};
auto Cm = nda::array<double, 2>::zeros({2, 2});
nda::tensor::contract(1.0, Am, "ij", Bm, "jk", 0.0, Cm, "ik");
std::cout << "A_ij * B_jk = " << Cm << std::endl;

Output:

A_ij * B_jk =
[[58,64]
[139,154]]

This is exactly the result returned by Am * Bm in the matrix algebra of nda::matrix; nda exposes the same operation in einsum form, generalized to arbitrary rank.

Full tensor dot product

nda::tensor::dot computes the scalar

\[ z \leftarrow \sum_{\text{idx}_A = \text{idx}_B} A_{\text{idx}_A} \cdot B_{\text{idx}_B} \;, \]

i.e. it contracts every dimension of \( A \) against the matching dimension of \( B \).

// tensor dot product (sum over all matching indices, returns a scalar)
auto d_indexed = nda::tensor::dot(A2, "ijk", A2, "ijk");
auto d_default = nda::tensor::dot(A2, A2);
std::cout << "dot(A2, A2) [indexed] = " << d_indexed << std::endl;
std::cout << "dot(A2, A2) [default] = " << d_default << std::endl;
get_value_t< A > dot(A const &a, std::string_view idx_a, B const &b, std::string_view idx_b)
Full tensor dot product with cuTENSOR/TBLIS/nda dispatch.
Definition dot.hpp:68

Output:

dot(A2, A2) [indexed] = 43.24
dot(A2, A2) [default] = 43.24

The two calls return the same value: the indexed form spells out the pairing explicitly, while the convenience overload uses nda::tensor::default_index for both operands.

Full tensor reduction

nda::tensor::reduce reduces every element of a tensor to a single scalar via one of the operations defined by nda::tensor::binary_op:

// full tensor reductions to a scalar with different binary operations
std::cout << "reduce(A, SUM) = " << nda::tensor::reduce(A) << std::endl;
std::cout << "reduce(A, MAX) = " << nda::tensor::reduce(A, binary_op::MAX) << std::endl;
std::cout << "reduce(A, MIN) = " << nda::tensor::reduce(A, binary_op::MIN) << std::endl;
std::cout << "reduce(A, NORM_2) = " << nda::tensor::reduce(A, binary_op::NORM_2) << std::endl;
get_value_t< A > reduce(A const &a, binary_op op_reduce=binary_op::SUM)
Full tensor reduction with cuTENSOR/TBLIS/nda dispatch.
Definition reduce.hpp:67

Output:

reduce(A, SUM) = 276
reduce(A, MAX) = 23
reduce(A, MIN) = 0
reduce(A, NORM_2) = 65.7571

The default reduction is binary_op::SUM. See nda::tensor::binary_op for the full list of supported reductions.

In-place tensor scaling

nda::tensor::scale rescales a tensor in place and optionally applies an element-wise unary operation (see nda::tensor::unary_op):

\[ A \leftarrow \alpha \, \text{op}(A) \;. \]

// in-place tensor scaling with an optional element-wise unary operation
auto S = nda::array<double, 3>(2, 3, 4);
for (int n = 0; auto &x : S) x = static_cast<double>(n++ + 1);
std::cout << "2 * S = " << S << std::endl;
nda::tensor::scale(1.0, S, unary_op::SQRT);
std::cout << "sqrt(2 * S) = " << S << std::endl;
void scale(get_value_t< A > alpha, A &&a, unary_op op=unary_op::IDENTITY)
In-place tensor scaling with cuTENSOR/TBLIS/nda dispatch and optional element-wise unary operation.
Definition scale.hpp:55

Output:

2 * S = [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48]
sqrt(2 * S) = [1.41421,2,2.44949,2.82843,3.16228,3.4641,3.74166,4,4.24264,4.47214,4.69042,4.89898,5.09902,5.2915,5.47723,5.65685,5.83095,6,6.16441,6.32456,6.48074,6.63325,6.78233,6.9282]

After the two calls, every entry of S equals \( \sqrt{2 \, n} \) for \( n = 1, 2, \dots, 24 \).

For a complex tensor, unary_op::CONJ flips the sign of the imaginary parts:

using cplx = std::complex<double>;
auto Z = nda::array<cplx, 3>(2, 2, 2);
for (int n = 0; auto &x : Z) {
x = cplx(n, n + 1);
++n;
}
nda::tensor::scale(cplx{1.0, 0.0}, Z, unary_op::CONJ);
std::cout << "conj(Z) = " << Z << std::endl;

Output:

conj(Z) = [(0,-1),(1,-2),(2,-3),(3,-4),(4,-5),(5,-6),(6,-7),(7,-8)]

Element-wise binary operations

nda::tensor::elementwise applies an arbitrary nda::tensor::binary_op element-wise:

\[ B_{\text{idx}_B} \leftarrow \text{op}(\alpha \, A_{\text{idx}_A}, \, \beta \, B_{\text{idx}_B}) \;. \]

With binary_op::SUM it reduces to nda::tensor::add:

// element-wise binary operation: B_ijk <- op(alpha * A_ijk, beta * B_ijk)
auto E1 = nda::array<double, 3>(2, 3, 4);
for (int n = 0; auto &x : E1) x = static_cast<double>(n++);
auto E2 = nda::array<double, 3>(2, 3, 4);
E2 = 1.0;
nda::tensor::elementwise(1.0, E1, "ijk", 1.0, E2, "ijk", binary_op::SUM);
std::cout << "E2 <- E1 + E2 = " << E2 << std::endl;
void elementwise(get_value_t< A > alpha, A const &a, std::string_view idx_a, get_value_t< A > beta, B &&b, std::string_view idx_b, binary_op op=binary_op::SUM)
In-place elementwise binary tensor operation with cuTENSOR/nda dispatch.

Output:

E2 <- E1 + E2 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24]

With binary_op::PROD we get an element-wise (Hadamard) product:

auto F1 = nda::array<double, 3>(2, 3, 4);
F1 = 2.0;
auto F2 = nda::array<double, 3>(2, 3, 4);
F2 = 3.0;
nda::tensor::elementwise(1.0, F1, "ijk", 1.0, F2, "ijk", binary_op::PROD);
std::cout << "F2 <- F1 * F2 = " << F2 << std::endl;

Output:

F2 <- F1 * F2 = [6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6]

Every entry of F2 is \( 1 \cdot 2 \cdot 1 \cdot 3 = 6 \).

Full source

The complete, compilable source for this example:

#include <nda/nda.hpp>
#include <complex>
#include <iostream>
int main() {
// construct a rank-3 tensor of shape 2x3x4 and fill it with consecutive integers
auto A = nda::array<double, 3>(2, 3, 4);
for (int n = 0; auto &x : A) x = n++;
std::cout << "A = " << A << std::endl;
std::cout << "A.shape() = " << A.shape() << std::endl;
// default index strings: one letter per rank, starting from 'a'
std::cout << "default_index<3>() = " << nda::tensor::default_index<3>() << std::endl;
// in-place tensor addition: B_ijk <- alpha * A_ijk + beta * B_ijk
auto B = nda::array<double, 3>(2, 3, 4);
B = 1.0;
double a1 = 2.0;
double b1 = 3.0;
nda::tensor::add(a1, A, "ijk", b1, B, "ijk");
std::cout << "B = 2*A + 3 = " << B << std::endl;
// tensor contraction: C_il = sum_jk A_ijk * B_jkl
auto A2 = nda::array<double, 3>(2, 3, 4);
for (int n = 0; auto &x : A2) x = 0.1 * n++;
auto B2 = nda::array<double, 3>(3, 4, 5);
for (int n = 0; auto &x : B2) x = 0.1 * n++;
auto C2 = nda::array<double, 2>::zeros({2, 5});
nda::tensor::contract(1.0, A2, "ijk", B2, "jkl", 0.0, C2, "il");
std::cout << "C_il = A_ijk * B_jkl = " << C2 << std::endl;
// matrix-matrix multiplication expressed as a rank-2 contraction: C_ik = A_ij * B_jk
auto Am = nda::array<double, 2>{{1, 2, 3}, {4, 5, 6}};
auto Bm = nda::array<double, 2>{{7, 8}, {9, 10}, {11, 12}};
auto Cm = nda::array<double, 2>::zeros({2, 2});
nda::tensor::contract(1.0, Am, "ij", Bm, "jk", 0.0, Cm, "ik");
std::cout << "A_ij * B_jk = " << Cm << std::endl;
// tensor dot product (sum over all matching indices, returns a scalar)
auto d_indexed = nda::tensor::dot(A2, "ijk", A2, "ijk");
auto d_default = nda::tensor::dot(A2, A2);
std::cout << "dot(A2, A2) [indexed] = " << d_indexed << std::endl;
std::cout << "dot(A2, A2) [default] = " << d_default << std::endl;
// full tensor reductions to a scalar with different binary operations
std::cout << "reduce(A, SUM) = " << nda::tensor::reduce(A) << std::endl;
std::cout << "reduce(A, MAX) = " << nda::tensor::reduce(A, binary_op::MAX) << std::endl;
std::cout << "reduce(A, MIN) = " << nda::tensor::reduce(A, binary_op::MIN) << std::endl;
std::cout << "reduce(A, NORM_2) = " << nda::tensor::reduce(A, binary_op::NORM_2) << std::endl;
// in-place tensor scaling with an optional element-wise unary operation
auto S = nda::array<double, 3>(2, 3, 4);
for (int n = 0; auto &x : S) x = static_cast<double>(n++ + 1);
std::cout << "2 * S = " << S << std::endl;
nda::tensor::scale(1.0, S, unary_op::SQRT);
std::cout << "sqrt(2 * S) = " << S << std::endl;
// CONJ acts on a complex tensor
using cplx = std::complex<double>;
auto Z = nda::array<cplx, 3>(2, 2, 2);
for (int n = 0; auto &x : Z) {
x = cplx(n, n + 1);
++n;
}
nda::tensor::scale(cplx{1.0, 0.0}, Z, unary_op::CONJ);
std::cout << "conj(Z) = " << Z << std::endl;
// element-wise binary operation: B_ijk <- op(alpha * A_ijk, beta * B_ijk)
auto E1 = nda::array<double, 3>(2, 3, 4);
for (int n = 0; auto &x : E1) x = static_cast<double>(n++);
auto E2 = nda::array<double, 3>(2, 3, 4);
E2 = 1.0;
nda::tensor::elementwise(1.0, E1, "ijk", 1.0, E2, "ijk", binary_op::SUM);
std::cout << "E2 <- E1 + E2 = " << E2 << std::endl;
auto F1 = nda::array<double, 3>(2, 3, 4);
F1 = 2.0;
auto F2 = nda::array<double, 3>(2, 3, 4);
F2 = 3.0;
nda::tensor::elementwise(1.0, F1, "ijk", 1.0, F2, "ijk", binary_op::PROD);
std::cout << "F2 <- F1 * F2 = " << F2 << std::endl;
}