Detailed Description

User-facing tensor operations dispatched to cuTENSOR, TBLIS, or the nda fallback.

Every function in this group lives in the nda::tensor namespace and follows the same dispatch rules outlined in Tensor support. The available operations are:

nda::tensor::contract — general contraction \( C \leftarrow \alpha\, A \cdot B + \beta\, C \) with Einstein notation. The workhorse: covers gemm, gemv, outer products, batched gemm and arbitrary tensor contractions. Requires cuTENSOR (device) or TBLIS (host); no nda fallback.
nda::tensor::dot — full dot product reducing two tensors to a scalar \( z = \sum A \cdot B \). The nda fallback computes nda::sum(nda::hadamard(a, b)) for same-rank, identical-index inputs.
nda::tensor::add — \( B \leftarrow \alpha\, A + \beta\, B \) (and out-of-place \( C \leftarrow \alpha\, A + \beta\, B \)). cuTENSOR/TBLIS additionally support axis permutation and different-rank broadcast/reduction.
nda::tensor::assign — \( B \leftarrow A \). cuTENSOR/TBLIS additionally support axis permutation and different-rank broadcast/reduction. Use nda::tensor::add to fuse a scalar factor or conjugation with the copy.
nda::tensor::scale — in-place \( A \leftarrow \alpha \cdot \mathrm{op}(A) \) with an optional nda::tensor::unary_op.
nda::tensor::set — fill a tensor with a scalar value.
nda::tensor::reduce — full reduction of a tensor to a scalar. Supported nda::tensor::binary_op values: SUM, PROD, MAX, MIN, SUM_ABS, MAX_ABS, MIN_ABS, NORM_2 (backend coverage varies; MAX/MIN are undefined for complex value types).
nda::tensor::elementwise — in-place elementwise binary operation. Same nda::tensor::binary_op set as nda::tensor::reduce; backend coverage varies.
nda::tensor::elementwise_trinary — in-place elementwise ternary operation \( C \leftarrow \mathrm{op}_{ABC}(\mathrm{op}_{AB}(\alpha\, A,\, \beta\, B),\, \gamma\, C) \) with two nda::tensor::binary_op parameters.

Backend coverage and the existence of an nda fallback differ between operations; the per-function page is authoritative on which dispatch paths and which nda::tensor::binary_op and nda::tensor::unary_op are supported.

Functions
template<BlasArrayOrConj A, BlasArrayFor< A > B>
void	nda::tensor::add (A const &a, B &&b)
	Convenience overload of nda::tensor::add with nda::tensor::default_index strings, \( \alpha = 1 \) and \( \beta = 0 \).
template<BlasArrayOrConj A, BlasArrayFor< A > B>
void	nda::tensor::add (A const &a, std::string_view idx_a, B &&b, std::string_view idx_b)
	Convenience overload of nda::tensor::add with \( \alpha = 1 \) and \( \beta = 0 \).
template<BlasArrayOrConj A, BlasArrayOrConjFor< A > B, BlasArrayFor< A > C>
void	nda::tensor::add (A const &a, std::string_view idx_a, B const &b, std::string_view idx_b, C &&c, std::string_view idx_c)
	Convenience overload of out-of-place nda::tensor::add with \( \alpha = 1 \) and \( \beta = 1 \).
template<BlasArrayOrConj A, BlasArrayFor< A > B>
void	nda::tensor::add (get_value_t< A > alpha, A const &a, get_value_t< A > beta, B &&b)
	Convenience overload of nda::tensor::add with nda::tensor::default_index strings.
template<BlasArrayOrConj A, BlasArrayFor< A > B>
void	nda::tensor::add (get_value_t< A > alpha, A const &a, std::string_view idx_a, get_value_t< A > beta, B &&b, std::string_view idx_b)
	Tensor addition with cuTENSOR/TBLIS/nda dispatch.
template<BlasArrayOrConj A, BlasArrayOrConjFor< A > B, BlasArrayFor< A > C>
void	nda::tensor::add (get_value_t< A > alpha, A const &a, std::string_view idx_a, get_value_t< A > beta, B const &b, std::string_view idx_b, C &&c, std::string_view idx_c)
	Out-of-place tensor addition with cuTENSOR/TBLIS/nda dispatch.
template<MemoryArray A, MemoryArray B> requires (have_same_value_type_v<A, B>)
void	nda::tensor::assign (A const &a, B &&b)
	Convenience overload of nda::tensor::assign with nda::tensor::default_index strings.
template<MemoryArray A, MemoryArray B> requires (have_same_value_type_v<A, B>)
void	nda::tensor::assign (A const &a, std::string_view idx_a, B &&b, std::string_view idx_b)
	Tensor assignment with cuTENSOR/TBLIS/nda dispatch.
template<BlasArrayOrConj A, BlasArrayOrConjFor< A > B, BlasArrayFor< A > C>
void	nda::tensor::contract (A const &a, std::string_view idx_a, B const &b, std::string_view idx_b, C &&c, std::string_view idx_c)
	Convenience overload of nda::tensor::contract with \( \alpha = 1 \) and \( \beta = 0 \).
template<BlasArrayOrConj A, BlasArrayOrConjFor< A > B, BlasArrayFor< A > C>
void	nda::tensor::contract (get_value_t< A > alpha, A const &a, std::string_view idx_a, B const &b, std::string_view idx_b, get_value_t< A > beta, C &&c, std::string_view idx_c)
	Tensor contraction with cuTENSOR/TBLIS dispatch.
template<BlasArrayOrConj A, BlasArrayOrConjFor< A, get_rank< A > > B>
get_value_t< A >	nda::tensor::dot (A const &a, B const &b)
	Convenience overload of nda::tensor::dot with nda::tensor::default_index strings.
template<BlasArrayOrConj A, BlasArrayOrConjFor< A > B>
get_value_t< A >	nda::tensor::dot (A const &a, std::string_view idx_a, B const &b, std::string_view idx_b)
	Full tensor dot product with cuTENSOR/TBLIS/nda dispatch.
template<BlasArrayOrConj A, BlasArrayFor< A > B>
void	nda::tensor::elementwise (A const &a, B &&b, binary_op op=binary_op::SUM)
	Convenience overload of nda::tensor::elementwise with nda::tensor::default_index strings, \( \alpha = 1 \) and \( \beta = 0 \).
template<BlasArrayOrConj A, BlasArrayFor< A > B>
void	nda::tensor::elementwise (A const &a, std::string_view idx_a, B &&b, std::string_view idx_b, binary_op op=binary_op::SUM)
	Convenience overload of nda::tensor::elementwise with \( \alpha = 1 \) and \( \beta = 0 \).
template<BlasArrayOrConj A, BlasArrayFor< A > B>
void	nda::tensor::elementwise (get_value_t< A > alpha, A const &a, get_value_t< A > beta, B &&b, binary_op op=binary_op::SUM)
	Convenience overload of nda::tensor::elementwise with nda::tensor::default_index strings.
template<BlasArrayOrConj A, BlasArrayFor< A > B>
void	nda::tensor::elementwise (get_value_t< A > alpha, A const &a, std::string_view idx_a, get_value_t< A > beta, B &&b, std::string_view idx_b, binary_op op=binary_op::SUM)
	In-place elementwise binary tensor operation with cuTENSOR/nda dispatch.
template<BlasArrayOrConj A, BlasArrayOrConjFor< A > B, BlasArrayFor< A > C>
void	nda::tensor::elementwise_trinary (A const &a, std::string_view idx_a, B const &b, std::string_view idx_b, C &&c, std::string_view idx_c, binary_op op_AB=binary_op::SUM, binary_op op_ABC=binary_op::SUM)
	Convenience overload of nda::tensor::elementwise_trinary with \( \alpha = \beta = 1 \) and \( \gamma = 0 \).
template<BlasArrayOrConj A, BlasArrayOrConjFor< A > B, BlasArrayFor< A > C>
void	nda::tensor::elementwise_trinary (get_value_t< A > alpha, A const &a, std::string_view idx_a, get_value_t< A > beta, B const &b, std::string_view idx_b, get_value_t< A > gamma, C &&c, std::string_view idx_c, binary_op op_AB=binary_op::SUM, binary_op op_ABC=binary_op::SUM)
	In-place elementwise trinary tensor operation with cuTENSOR/nda dispatch.
template<BlasArrayOrConj A>
get_value_t< A >	nda::tensor::reduce (A const &a, binary_op op_reduce=binary_op::SUM)
	Full tensor reduction with cuTENSOR/TBLIS/nda dispatch.
template<BlasArray A>
void	nda::tensor::scale (get_value_t< A > alpha, A &&a, unary_op op=unary_op::IDENTITY)
	In-place tensor scaling with cuTENSOR/TBLIS/nda dispatch and optional element-wise unary operation.
template<BlasArray A>
void	nda::tensor::set (get_value_t< A > alpha, A &&a)
	In-place tensor constant fill with cuTENSOR/TBLIS/nda dispatch.

Function Documentation

◆ add() [1/2]

template<BlasArrayOrConj A, BlasArrayFor< A > B>

void nda::tensor::add	(	get_value_t< A >	alpha,
		A const &	a,
		std::string_view	idx_a,
		get_value_t< A >	beta,
		B &&	b,
		std::string_view	idx_b )

#include <nda/tensor/add.hpp>

Tensor addition with cuTENSOR/TBLIS/nda dispatch.

This function performs a general tensor addition of the form

\[ B_{\text{idx}_B} \leftarrow \alpha \, A_{\text{idx}_A} + \beta \, B_{\text{idx}_B} \;, \]

where \( \alpha \) and \( \beta \) are scalars, and \( A \) and \( B \) are tensors of arbitrary (possibly different) rank. The index strings specify the einsum-style mapping between dimensions of \( A \) and \( B \); indices present in one tensor but absent from the other drive broadcast/reduction on the backend.

Dispatch order and details:

If the input arrays satisfy nda::mem::have_device_compatible_addr_space, cuTENSOR's elementwise binary operation is used.
If TBLIS is available, tblis_tensor_add is used.
Otherwise, fall back to nda expression assignment, i.e. b = alpha * a + beta * b.

The nda host fallback requires identical ranks and identical index strings. For the other backend paths, all inputs are forwarded as-is.

Note: \( A \) is allowed to be a lazy conjugate expression (see nda::blas_lapack::is_conj_array_expr).
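As a reading aid, the update can be written out as explicit loops for a rank-2 case in which the two index strings are transposes of each other (for example "ij" for \( A \) and "ji" for \( B \), assuming single-character einsum labels). The following is a plain C++ sketch of the semantics only, not nda code: the helper add_transposed and the row-major std::vector storage are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference semantics of B_{ji} <- alpha * A_{ij} + beta * B_{ji} for row-major
// storage: A has shape (n, m), B has shape (m, n).
// Hypothetical helper for illustration; not part of nda.
void add_transposed(double alpha, std::vector<double> const &a, double beta,
                    std::vector<double> &b, std::size_t n, std::size_t m) {
  assert(a.size() == n * m && b.size() == n * m);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < m; ++j)
      b[j * n + i] = alpha * a[i * m + j] + beta * b[j * n + i];
}
```

A permutation like this is what the cuTENSOR and TBLIS paths are described as handling; the nda host fallback would reject it because the index strings differ.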

Template Parameters

A	nda::blas_lapack::BlasArrayOrConj type.
B	nda::blas_lapack::BlasArrayFor type.

Parameters

alpha	Input scalar \( \alpha \).
a	Input tensor \( A \).
idx_a	Index string \( \text{idx}_A \) for tensor \( A \).
beta	Input scalar \( \beta \).
b	Input/Output tensor \( B \).
idx_b	Index string \( \text{idx}_B \) for tensor \( B \).

Definition at line 63 of file add.hpp.

◆ add() [2/2]

template<BlasArrayOrConj A, BlasArrayOrConjFor< A > B, BlasArrayFor< A > C>

void nda::tensor::add	(	get_value_t< A >	alpha,
		A const &	a,
		std::string_view	idx_a,
		get_value_t< A >	beta,
		B const &	b,
		std::string_view	idx_b,
		C &&	c,
		std::string_view	idx_c )

#include <nda/tensor/add.hpp>

Out-of-place tensor addition with cuTENSOR/TBLIS/nda dispatch.

This function performs a general out-of-place tensor addition of the form

\[ C_{\text{idx}_C} \leftarrow \alpha \, A_{\text{idx}_A} + \beta \, B_{\text{idx}_B} \;, \]

where \( \alpha \) and \( \beta \) are scalars, and \( A \), \( B \) and \( C \) are tensors of arbitrary (possibly different) rank. The result is written into the separate output tensor \( C \) (any prior contents of \( C \) are overwritten). The index strings specify the einsum-style mapping between operand dimensions; indices present in some operands but absent from others drive broadcast/reduction on the backend.

Dispatch order and details:

If the input arrays satisfy nda::mem::have_device_compatible_addr_space, cuTENSOR's elementwise binary operation is used.
If TBLIS is available, tblis_tensor_add is used twice.
Otherwise, fall back to nda expression assignment, i.e. c = alpha * a + beta * b.

The nda host fallback requires identical ranks and identical index strings. For the other backend paths, all inputs are forwarded as-is.

Note: \( A \) and \( B \) are allowed to be lazy conjugate expressions (see nda::blas_lapack::is_conj_array_expr).
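One way an out-of-place addition can be assembled from two in-place additions is sketched below. This is an assumption about how "used twice" might be sequenced, not a description of the actual implementation; the helpers axpby and add_out_of_place are hypothetical and operate on flat std::vector storage with identical index order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place building block: y <- alpha * x + beta * y (same shape, same index order).
// With beta == 0 the old contents of y are ignored, following the usual
// BLAS-style convention. Hypothetical helper.
void axpby(double alpha, std::vector<double> const &x, double beta, std::vector<double> &y) {
  assert(x.size() == y.size());
  for (std::size_t i = 0; i < x.size(); ++i)
    y[i] = (beta == 0.0) ? alpha * x[i] : alpha * x[i] + beta * y[i];
}

// C <- alpha * A + beta * B expressed as two in-place passes:
//   pass 1: C <- alpha * A + 0 * C   (overwrites C)
//   pass 2: C <- beta  * B + 1 * C   (accumulates B)
void add_out_of_place(double alpha, std::vector<double> const &a, double beta,
                      std::vector<double> const &b, std::vector<double> &c) {
  axpby(alpha, a, 0.0, c);
  axpby(beta, b, 1.0, c);
}
```

The first pass is what makes the prior contents of \( C \) irrelevant, matching the statement that they are overwritten.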

Template Parameters

A	nda::blas_lapack::BlasArrayOrConj type.
B	nda::blas_lapack::BlasArrayOrConjFor type.
C	nda::blas_lapack::BlasArrayFor type.

Parameters

alpha	Input scalar \( \alpha \).
a	Input tensor \( A \).
idx_a	Index string \( \text{idx}_A \) for tensor \( A \).
beta	Input scalar \( \beta \).
b	Input tensor \( B \).
idx_b	Index string \( \text{idx}_B \) for tensor \( B \).
c	Output tensor \( C \).
idx_c	Index string \( \text{idx}_C \) for tensor \( C \).

Definition at line 119 of file add.hpp.

◆ assign()

template<MemoryArray A, MemoryArray B>
requires (have_same_value_type_v<A, B>)

void nda::tensor::assign	(	A const &	a,
		std::string_view	idx_a,
		B &&	b,
		std::string_view	idx_b )

#include <nda/tensor/assign.hpp>

Tensor assignment with cuTENSOR/TBLIS/nda dispatch.

This function performs the assignment

\[ B_{\text{idx}_B} \leftarrow A_{\text{idx}_A} \;, \]

where \( A \) and \( B \) are tensors of arbitrary (possibly different) rank. The index strings specify the einsum-style mapping between dimensions of \( A \) and \( B \), allowing for permutations of axes; when ranks differ, indices present in one tensor but absent from the other drive broadcast/reduction on the backend.

Dispatch order and details:

If both arrays are in a device-compatible address space and cuTENSOR is available, cuTENSOR's permute operation is used.
If both arrays are in a host-compatible address space and TBLIS is available, tblis_tensor_add is used with \( \alpha = 1 \), \( \beta = 0 \).
If both arrays are in a host-compatible address space without TBLIS, fall back to nda expression assignment, i.e. b = a.
Otherwise (cross-memory: device-compatible without cuTENSOR or with unsupported value types, or mixed host/device), fall back to a recursive cross-memory copy (no permutation).

The nda host fallback and the cross-memory copy require identical ranks and identical index strings. The cross-memory copy descends the leading axis until reaching a slice that nda can handle directly (rank-1, contiguous, or strided-1d). For the cuTENSOR and TBLIS dispatch paths, all inputs are forwarded as-is to the backend.

Note: This function can be used to assign across different address spaces, i.e. Host \( \to \) Device and Device \( \to \) Host. Use nda::tensor::add if you need to fuse a scalar factor or a conjugation with the copy.
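The recursive descent along the leading axis can be pictured with the sketch below. It is a host-only illustration under stated assumptions: the View struct and copy_rec are hypothetical, only the rank-1 base case is handled, and the innermost loop stands in for whatever transfer a real cross-memory copy would perform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strided view description: element (i0, i1, ...) lives at offset sum_k i_k * strides[k].
// Hypothetical type for illustration.
struct View {
  std::vector<std::size_t> shape;
  std::vector<std::size_t> strides;
};

// Recursively copy src -> dst by peeling off the leading axis until a rank-1
// slice remains, which is then copied element by element using its strides.
void copy_rec(double const *src, View const &sv, double *dst, View const &dv,
              std::size_t axis = 0) {
  assert(!sv.shape.empty());
  assert(sv.shape == dv.shape); // identical ranks and extents are required
  std::size_t const n = sv.shape[axis];
  if (axis + 1 == sv.shape.size()) {
    for (std::size_t i = 0; i < n; ++i) dst[i * dv.strides[axis]] = src[i * sv.strides[axis]];
    return;
  }
  for (std::size_t i = 0; i < n; ++i)
    copy_rec(src + i * sv.strides[axis], sv, dst + i * dv.strides[axis], dv, axis + 1);
}
```

Source and destination may use different strides (for example row-major versus column-major storage), but the logical index order is the same on both sides, which is why this path performs no permutation.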

Template Parameters

A	nda::MemoryArray type.
B	nda::MemoryArray type with the same value type as \( A \) (rank may differ from A).

Parameters

a	Input tensor \( A \).
idx_a	Index string \( \text{idx}_A \) for tensor \( A \).
b	Output tensor \( B \).
idx_b	Index string \( \text{idx}_B \) for tensor \( B \).

Definition at line 102 of file assign.hpp.

◆ contract()

template<BlasArrayOrConj A, BlasArrayOrConjFor< A > B, BlasArrayFor< A > C>

void nda::tensor::contract	(	get_value_t< A >	alpha,
		A const &	a,
		std::string_view	idx_a,
		B const &	b,
		std::string_view	idx_b,
		get_value_t< A >	beta,
		C &&	c,
		std::string_view	idx_c )

#include <nda/tensor/contract.hpp>

Tensor contraction with cuTENSOR/TBLIS dispatch.

This function performs a general tensor contraction of the form

\[ C_{\text{idx}_C} \leftarrow \alpha \, A_{\text{idx}_A} \cdot B_{\text{idx}_B} + \beta \, C_{\text{idx}_C} \;, \]

where \( \alpha \) and \( \beta \) are scalars, and \( A \), \( B \) and \( C \) are tensors of arbitrary rank. The contraction pattern is specified via index strings (Einstein notation), where repeated indices between \( A \) and \( B \) are summed over.

Dispatch order and details:

If the input arrays satisfy nda::mem::have_device_compatible_addr_space, cuTENSOR's contraction operation is used.
Otherwise, TBLIS support is required and tblis_tensor_mult is used.

Index strings are forwarded as-is to the backend library (cuTENSOR or TBLIS) without any checks. It's the user's responsibility to ensure that they are valid and consistent with the shapes of the input tensors.

Note: \( A \) and \( B \) are allowed to be lazy conjugate expressions (see nda::blas_lapack::is_conj_array_expr).
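To make the index-string semantics concrete, here is a small reference contraction in plain C++. It is a sketch for checking results on tiny inputs, not how cuTENSOR or TBLIS work internally: the Tensor struct, offset and contract_ref are hypothetical, labels are assumed to be single characters, and the sketch sums over every label that does not appear in idx_c (which coincides with "repeated between \( A \) and \( B \)" for ordinary contractions).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string_view>
#include <vector>

// Dense row-major tensor with one extent per axis. Hypothetical stand-in for an nda array.
struct Tensor {
  std::vector<std::size_t> shape;
  std::vector<double> data;
};

// Row-major offset of the element selected by the labels in idx.
std::size_t offset(Tensor const &t, std::string_view idx, std::map<char, std::size_t> const &pos) {
  std::size_t off = 0;
  for (std::size_t k = 0; k < idx.size(); ++k) off = off * t.shape[k] + pos.at(idx[k]);
  return off;
}

// C_{idx_c} <- alpha * sum A_{idx_a} * B_{idx_b} + beta * C_{idx_c}
// Assumes every extent is at least 1.
void contract_ref(double alpha, Tensor const &a, std::string_view idx_a, Tensor const &b,
                  std::string_view idx_b, double beta, Tensor &c, std::string_view idx_c) {
  // Collect the extent of every label and check that operands agree on it.
  std::map<char, std::size_t> extent;
  auto collect = [&](Tensor const &t, std::string_view idx) {
    assert(idx.size() == t.shape.size());
    for (std::size_t k = 0; k < idx.size(); ++k) {
      auto it = extent.emplace(idx[k], t.shape[k]).first;
      assert(it->second == t.shape[k]);
    }
  };
  collect(a, idx_a);
  collect(b, idx_b);
  collect(c, idx_c);

  // Scale the existing output; beta == 0 discards the old contents.
  for (auto &x : c.data) x = (beta == 0.0) ? 0.0 : beta * x;

  // Odometer loop over all joint label assignments.
  std::map<char, std::size_t> pos;
  for (auto const &entry : extent) pos[entry.first] = 0;
  while (true) {
    c.data[offset(c, idx_c, pos)] +=
        alpha * a.data[offset(a, idx_a, pos)] * b.data[offset(b, idx_b, pos)];
    auto it = pos.begin();
    for (; it != pos.end(); ++it) {
      if (++it->second < extent.at(it->first)) break;
      it->second = 0;
    }
    if (it == pos.end()) break;
  }
}
```

With idx_a = "ik", idx_b = "kj" and idx_c = "ij" this reduces to an ordinary matrix product (gemm); a label shared by all three strings behaves like a batch index.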

Template Parameters

A	nda::blas_lapack::BlasArrayOrConj type.
B	nda::blas_lapack::BlasArrayOrConjFor type.
C	nda::blas_lapack::BlasArrayFor type.

Parameters

alpha	Input scalar \( \alpha \).
a	Input tensor \( A \).
idx_a	Index string \( \text{idx}_A \) for tensor \( A \).
b	Input tensor \( B \).
idx_b	Index string \( \text{idx}_B \) for tensor \( B \).
beta	Input scalar \( \beta \).
c	Input/Output tensor \( C \).
idx_c	Index string \( \text{idx}_C \) for tensor \( C \).

Definition at line 66 of file contract.hpp.

◆ dot()

template<BlasArrayOrConj A, BlasArrayOrConjFor< A > B>

get_value_t< A > nda::tensor::dot	(	A const &	a,
		std::string_view	idx_a,
		B const &	b,
		std::string_view	idx_b )

#include <nda/tensor/dot.hpp>

Full tensor dot product with cuTENSOR/TBLIS/nda dispatch.

This function computes the full tensor dot product

\[ z \leftarrow \sum_{\text{idx}_A = \text{idx}_B} A_{\text{idx}_A} \cdot B_{\text{idx}_B} \;, \]

where \( z \) is a scalar of the same value type as \( A \) and \( B \). The tensors may have arbitrary (possibly different) rank. The index strings define the einsum-style pairing of dimensions.

Dispatch order and details:

If the input arrays satisfy nda::mem::have_device_compatible_addr_space, cuTENSOR's contraction operation is used.
Otherwise, if TBLIS is available, tblis_tensor_dot is used.
Otherwise, fall back to nda::sum(nda::hadamard(a, b)). The nda fallback requires identical ranks and identical index strings; it does not support TBLIS's einsum semantics (repeated indices, broadcast).

For the cuTENSOR and TBLIS dispatch paths, index strings are forwarded as-is to the backend library without any checks. It's the user's responsibility to ensure that they are valid and consistent with the shapes of the input tensors.

Note: \( A \) and \( B \) are allowed to be lazy conjugate expressions (see nda::blas_lapack::is_conj_array_expr).
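The formula above contains no complex conjugation, which matters for complex value types. The sketch below contrasts the plain sum of elementwise products with the variant in which the first operand is conjugated, which is what passing a conjugate expression for \( A \) would presumably request. Both helpers are hypothetical and work on flat std::vector storage with identical index order.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// z = sum_i a_i * b_i, i.e. the sum of the elementwise (Hadamard) product.
// No complex conjugation is applied to either operand.
cplx dot_ref(std::vector<cplx> const &a, std::vector<cplx> const &b) {
  assert(a.size() == b.size());
  cplx z{0.0, 0.0};
  for (std::size_t i = 0; i < a.size(); ++i) z += a[i] * b[i];
  return z;
}

// Same sum with the first operand conjugated: z = sum_i conj(a_i) * b_i.
cplx dot_conj_ref(std::vector<cplx> const &a, std::vector<cplx> const &b) {
  assert(a.size() == b.size());
  cplx z{0.0, 0.0};
  for (std::size_t i = 0; i < a.size(); ++i) z += std::conj(a[i]) * b[i];
  return z;
}
```

For real value types the two functions agree; for complex inputs they generally differ, so the choice of whether to wrap an operand in a conjugate expression changes the result.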

Template Parameters

A	nda::blas_lapack::BlasArrayOrConj type.
B	nda::blas_lapack::BlasArrayOrConjFor type.

Parameters

a	Input tensor \( A \).
idx_a	Index string \( \text{idx}_A \) for tensor \( A \).
b	Input tensor \( B \).
idx_b	Index string \( \text{idx}_B \) for tensor \( B \).

Returns: The scalar dot product.

Definition at line 68 of file dot.hpp.

◆ elementwise()

template<BlasArrayOrConj A, BlasArrayFor< A > B>

void nda::tensor::elementwise	(	get_value_t< A >	alpha,
		A const &	a,
		std::string_view	idx_a,
		get_value_t< A >	beta,
		B &&	b,
		std::string_view	idx_b,
		binary_op	op = binary_op::SUM )

#include <nda/tensor/elementwise.hpp>

In-place elementwise binary tensor operation with cuTENSOR/nda dispatch.

This function performs an in-place elementwise binary operation of the form

\[ B_{\text{idx}_B} \leftarrow \text{op}\bigl(\alpha \, A_{\text{idx}_A}, \, \beta \, B_{\text{idx}_B}\bigr) \;, \]

where \( \alpha \) and \( \beta \) are scalars, \( A \) and \( B \) are tensors, and \( \text{op} \) is a binary operation (see nda::tensor::binary_op). The index strings specify how the dimensions of \( A \) map to those of \( B \) (Einstein notation); when ranks differ on the cuTENSOR path, indices present in one tensor but absent from the other drive broadcast/reduction on the backend.

Dispatch order and details:

If the input arrays satisfy nda::mem::have_device_compatible_addr_space, cuTENSOR's elementwise binary operation is used.
Otherwise, fall back to nda expression assignment via nda::map.

The supported binary operations depend on the library backend. The nda host fallback requires identical ranks and identical index strings and supports all nda::tensor::binary_op values.

Note: \( A \) is allowed to be a lazy conjugate expression (see nda::blas_lapack::is_conj_array_expr).
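For the identical-rank, identical-index case the operation is a simple per-element map. The sketch below spells this out for four of the operations; the local binary_op enum only mirrors the names listed for nda::tensor::binary_op, and apply and elementwise_ref are hypothetical helpers rather than nda functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Local stand-in for a subset of the nda::tensor::binary_op values.
enum class binary_op { SUM, PROD, MAX, MIN };

// Apply one binary operation to a pair of scalars.
double apply(binary_op op, double x, double y) {
  switch (op) {
    case binary_op::SUM: return x + y;
    case binary_op::PROD: return x * y;
    case binary_op::MAX: return std::max(x, y);
    case binary_op::MIN: return std::min(x, y);
  }
  throw std::invalid_argument("unsupported binary_op");
}

// b_i <- op(alpha * a_i, beta * b_i) for same-shape operands.
void elementwise_ref(double alpha, std::vector<double> const &a, double beta,
                     std::vector<double> &b, binary_op op = binary_op::SUM) {
  assert(a.size() == b.size());
  for (std::size_t i = 0; i < a.size(); ++i) b[i] = apply(op, alpha * a[i], beta * b[i]);
}
```

Note that the scalars are applied before the operation, so with binary_op::MAX the comparison is between the scaled values, not the raw elements.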

Template Parameters

A	nda::blas_lapack::BlasArrayOrConj type.
B	nda::blas_lapack::BlasArrayFor type.

Parameters

alpha	Input scalar \( \alpha \).
a	Input tensor \( A \).
idx_a	Index string \( \text{idx}_A \) for tensor \( A \).
beta	Input scalar \( \beta \).
b	Input/Output tensor \( B \).
idx_b	Index string \( \text{idx}_B \) for tensor \( B \).
op	Binary operation (default: binary_op::SUM).

Definition at line 64 of file elementwise.hpp.

◆ elementwise_trinary()

template<BlasArrayOrConj A, BlasArrayOrConjFor< A > B, BlasArrayFor< A > C>

void nda::tensor::elementwise_trinary	(	get_value_t< A >	alpha,
		A const &	a,
		std::string_view	idx_a,
		get_value_t< A >	beta,
		B const &	b,
		std::string_view	idx_b,
		get_value_t< A >	gamma,
		C &&	c,
		std::string_view	idx_c,
		binary_op	op_AB = binary_op::SUM,
		binary_op	op_ABC = binary_op::SUM )

#include <nda/tensor/elementwise_trinary.hpp>

In-place elementwise trinary tensor operation with cuTENSOR/nda dispatch.

This function performs an in-place elementwise trinary operation of the form

\[ C_{\text{idx}_C} \leftarrow \text{op}_{ABC} \bigl( \text{op}_{AB} \bigl( \alpha \, A_{\text{idx}_A}, \, \beta \, B_{\text{idx}_B} \bigr), \, \gamma \, C_{\text{idx}_C} \bigr) \;, \]

where \( \alpha \), \( \beta \) and \( \gamma \) are scalars, \( A \), \( B \) and \( C \) are tensors of arbitrary (possibly different) ranks, and \( \text{op}_{ABC} \) and \( \text{op}_{AB} \) are binary operations (see nda::tensor::binary_op). The index strings determine the einsum-style mapping between operands; indices absent from one operand drive broadcast/reduction on the backend.

Dispatch order and details:

If the input arrays satisfy nda::mem::have_device_compatible_addr_space, cuTENSOR's elementwise trinary operation is used.
Otherwise, fall back to nda expression assignment via nda::map.

The supported binary operations depend on the library backend. The nda host fallback requires identical ranks and identical index strings and supports all nda::tensor::binary_op values for both op_AB and op_ABC.

Note: \( A \) and \( B \) are allowed to be lazy conjugate expressions (see nda::blas_lapack::is_conj_array_expr).
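The nesting of the two operations is easiest to see per element. In the sketch below, choosing PROD for op_AB and SUM for op_ABC turns the formula into a fused multiply-add, \( C \leftarrow \alpha\beta\, A B + \gamma\, C \). The local binary_op enum mirrors only the names of the real enumeration, and apply and trinary_ref are hypothetical helpers for same-shape operands.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Local stand-in for a subset of the nda::tensor::binary_op values.
enum class binary_op { SUM, PROD, MAX, MIN };

// Apply one binary operation to a pair of scalars.
double apply(binary_op op, double x, double y) {
  switch (op) {
    case binary_op::SUM: return x + y;
    case binary_op::PROD: return x * y;
    case binary_op::MAX: return std::max(x, y);
    case binary_op::MIN: return std::min(x, y);
  }
  throw std::invalid_argument("unsupported binary_op");
}

// c_i <- op_ABC(op_AB(alpha * a_i, beta * b_i), gamma * c_i)
void trinary_ref(double alpha, std::vector<double> const &a, double beta,
                 std::vector<double> const &b, double gamma, std::vector<double> &c,
                 binary_op op_AB = binary_op::SUM, binary_op op_ABC = binary_op::SUM) {
  assert(a.size() == b.size() && b.size() == c.size());
  for (std::size_t i = 0; i < c.size(); ++i)
    c[i] = apply(op_ABC, apply(op_AB, alpha * a[i], beta * b[i]), gamma * c[i]);
}
```

With both operations left at their SUM defaults the same function computes \( C \leftarrow \alpha A + \beta B + \gamma C \).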

Template Parameters

A	nda::blas_lapack::BlasArrayOrConj type.
B	nda::blas_lapack::BlasArrayOrConjFor type.
C	nda::blas_lapack::BlasArrayFor type.

Parameters

alpha	Input scalar \( \alpha \).
a	Input tensor \( A \).
idx_a	Index string \( \text{idx}_A \) for tensor \( A \).
beta	Input scalar \( \beta \).
b	Input tensor \( B \).
idx_b	Index string \( \text{idx}_B \) for tensor \( B \).
gamma	Input scalar \( \gamma \).
c	Input/Output tensor \( C \).
idx_c	Index string \( \text{idx}_C \) for tensor \( C \).
op_AB	Binary operation between \( A \) and \( B \) (default: binary_op::SUM).
op_ABC	Binary operation between the result of \( \text{op}_{AB} \) and \( C \) (default: binary_op::SUM).

Definition at line 72 of file elementwise_trinary.hpp.

◆ reduce()

template<BlasArrayOrConj A>

get_value_t< A > nda::tensor::reduce	(	A const &	a,
		binary_op	op_reduce = binary_op::SUM )

#include <nda/tensor/reduce.hpp>

Full tensor reduction with cuTENSOR/TBLIS/nda dispatch.

This function applies a binary reduction operation (sum, max, min, ...) over all elements of a tensor and returns the resulting scalar

\[ z \leftarrow \text{op}_{\text{red}}(A) \;, \]

where \( A \) is a tensor of arbitrary rank and \( \text{op}_{\text{red}} \) is a binary reduction operation (see nda::tensor::binary_op).

Dispatch order and details:

If the input array satisfies nda::mem::have_device_compatible_addr_space, cuTENSOR's reduction operation is used.
If TBLIS is available and op_reduce != binary_op::PROD, tblis_tensor_reduce is used.
Otherwise, fall back to nda algorithms, e.g. nda::sum, nda::max_element, etc.

The supported binary operations depend on the library backend. The nda fallback supports all nda::tensor::binary_op values.

Note: \( A \) is allowed to be a lazy conjugate expression (see nda::blas_lapack::is_conj_array_expr).
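The sketch below gives one plausible reading of each reduction for real-valued data, inferred from the operation names (SUM_ABS as \( \sum |a_i| \), NORM_2 as \( \sqrt{\sum a_i^2} \), and so on). It is an assumption-laden illustration: the local binary_op enum and reduce_ref are hypothetical, and the authoritative definitions are those of nda::tensor::binary_op.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Local stand-in mirroring the names listed for nda::tensor::binary_op.
enum class binary_op { SUM, PROD, MAX, MIN, SUM_ABS, MAX_ABS, MIN_ABS, NORM_2 };

// Reduce all elements of a non-empty real-valued tensor (flattened) to a scalar.
double reduce_ref(std::vector<double> const &a, binary_op op = binary_op::SUM) {
  assert(!a.empty());
  // fold(combine, transform): combine the transformed elements left to right.
  auto fold = [&](auto combine, auto transform) {
    double acc = transform(a[0]);
    for (std::size_t i = 1; i < a.size(); ++i) acc = combine(acc, transform(a[i]));
    return acc;
  };
  auto id  = [](double x) { return x; };
  auto mag = [](double x) { return std::fabs(x); };
  auto sq  = [](double x) { return x * x; };
  auto add = [](double x, double y) { return x + y; };
  auto mul = [](double x, double y) { return x * y; };
  auto mx  = [](double x, double y) { return std::max(x, y); };
  auto mn  = [](double x, double y) { return std::min(x, y); };
  switch (op) {
    case binary_op::SUM: return fold(add, id);
    case binary_op::PROD: return fold(mul, id);
    case binary_op::MAX: return fold(mx, id);
    case binary_op::MIN: return fold(mn, id);
    case binary_op::SUM_ABS: return fold(add, mag);
    case binary_op::MAX_ABS: return fold(mx, mag);
    case binary_op::MIN_ABS: return fold(mn, mag);
    case binary_op::NORM_2: return std::sqrt(fold(add, sq));
  }
  throw std::invalid_argument("unsupported binary_op");
}
```

The ordering-based operations (MAX, MIN) have no counterpart for complex values, which is consistent with them being undefined for complex value types.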

Template Parameters

A	nda::blas_lapack::BlasArrayOrConj type.

Parameters

a	Input tensor \( A \).
op_reduce	Binary reduction operation (default: binary_op::SUM).

Returns: The reduced scalar \( z \).

Definition at line 67 of file reduce.hpp.

◆ scale()

template<BlasArray A>

void nda::tensor::scale	(	get_value_t< A >	alpha,
		A &&	a,
		unary_op	op = unary_op::IDENTITY )

#include <nda/tensor/scale.hpp>

In-place tensor scaling with cuTENSOR/TBLIS/nda dispatch and optional element-wise unary operation.

This function performs an in-place scaling of the form

\[ A \leftarrow \alpha \, \text{op}(A) \;, \]

where \( \alpha \) is a scalar, \( A \) is a tensor of arbitrary rank, and op is an element-wise unary operation (see nda::tensor::unary_op).

Dispatch order and details:

If the input array satisfies nda::mem::have_device_compatible_addr_space, cuTENSOR's permute operation is used.
If TBLIS is available and op is unary_op::IDENTITY or unary_op::CONJ, tblis_tensor_scale is used.
Otherwise, fall back to nda expression assignment, e.g. a = alpha * nda::sqrt(a).

The nda fallback supports unary_op::IDENTITY, unary_op::CONJ, unary_op::SQRT, unary_op::ABS, unary_op::EXP, unary_op::LOG, and unary_op::RCP. Other operations raise NDA_RUNTIME_ERROR.
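The per-element meaning of the listed operations can be sketched for real-valued data as follows. The local unary_op enum mirrors only the names given above, RCP is read as the reciprocal \( 1/x \) (an assumption based on the name), and scale_ref is a hypothetical helper; std::runtime_error stands in for the library's NDA_RUNTIME_ERROR.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <vector>

// Local stand-in mirroring the unary_op names supported by the nda fallback.
enum class unary_op { IDENTITY, CONJ, SQRT, ABS, EXP, LOG, RCP };

// Apply one unary operation to a real scalar.
double apply(unary_op op, double x) {
  switch (op) {
    case unary_op::IDENTITY: return x;
    case unary_op::CONJ: return x; // conjugation is a no-op for real values
    case unary_op::SQRT: return std::sqrt(x);
    case unary_op::ABS: return std::fabs(x);
    case unary_op::EXP: return std::exp(x);
    case unary_op::LOG: return std::log(x);
    case unary_op::RCP: return 1.0 / x; // assumed meaning: reciprocal
  }
  throw std::runtime_error("unsupported unary_op");
}

// a_i <- alpha * op(a_i), in place.
void scale_ref(double alpha, std::vector<double> &a, unary_op op = unary_op::IDENTITY) {
  for (auto &x : a) x = alpha * apply(op, x);
}
```

The unary operation is applied first and the scalar afterwards, so scale_ref(2.0, a, unary_op::SQRT) computes \( 2\sqrt{a_i} \), not \( \sqrt{2 a_i} \).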

Template Parameters

A	nda::blas_lapack::BlasArray type.

Parameters

alpha	Input scalar \( \alpha \).
a	Input/Output tensor \( A \).
op	Unary operation to apply element-wise (default: nda::tensor::unary_op::IDENTITY).

Definition at line 55 of file scale.hpp.

◆ set()

template<BlasArray A>

void nda::tensor::set	(	get_value_t< A >	alpha,
		A &&	a )

#include <nda/tensor/set.hpp>

In-place tensor constant fill with cuTENSOR/TBLIS/nda dispatch.

This function sets every element of a tensor to a constant value

\[ A \leftarrow \alpha \;, \]

where \( \alpha \) is a scalar and \( A \) is a tensor of arbitrary rank.

Dispatch order and details:

If the input array satisfies nda::mem::have_device_compatible_addr_space, cuTENSOR's elementwise binary operation is used with a temporary rank-0 tensor holding \( \alpha \).
If TBLIS is available, tblis_tensor_set is used.
Otherwise, fall back to nda expression assignment, i.e. a = alpha.

Template Parameters

A	nda::blas_lapack::BlasArray type.

Parameters

alpha	Input scalar \( \alpha \).
a	Output tensor \( A \).

Definition at line 54 of file set.hpp.
