Detailed Description

Multi-dimensional tensor operations on host (CPU) and device (CUDA).

nda ships a thin C++ layer in the nda::tensor namespace that exposes general tensor operations (contractions, reductions, element-wise operations, permutations, ...) on top of two external backends:

TBLIS for host (CPU) execution, and
cuTENSOR for device (CUDA) execution.

Most operations also provide a generic nda fallback that requires neither backend. nda::tensor::contract is the only operation without a fallback: on the host it requires TBLIS. The fallback path is restricted to the simple case of same-rank tensors with identical index strings; features like permutation, broadcasting and einsum-style reduction remain backend-specific. See the per-operation pages for the exact restrictions.

Einstein-notation index strings. Every operation is parameterised by std::string_view index labels (e.g. "ij", "ijk") that name each axis of the tensors involved. Labels repeated between two tensors are summed (contracted) over, in the spirit of standard Einstein summation. The exact rules depend on the backend that ends up being called. For example, labels repeated within a single tensor are rejected by cuTENSOR but accepted by TBLIS. Refer to the per-operation pages for the precise constraints.
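To make the index-string convention concrete, here is a minimal, self-contained sketch of what the labels "ij", "jk" and "ik" mean for a contraction. It does not use nda or either backend: the Matrix type and the function contract_ij_jk_ik are hypothetical names introduced only for this illustration, and the loops show the reference semantics rather than how TBLIS or cuTENSOR compute the result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix used only for this illustration (not an nda type).
struct Matrix {
  std::size_t rows, cols;
  std::vector<double> data; // element (r, c) is stored at data[r * cols + c]

  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

  double &operator()(std::size_t r, std::size_t c) { return data[r * cols + c]; }
  double operator()(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Reference semantics for the labels A = "ij", B = "jk", C = "ik":
// the label j appears in both inputs, so it is summed over;
// the labels i and k survive and name the axes of the result.
Matrix contract_ij_jk_ik(Matrix const &A, Matrix const &B) {
  assert(A.cols == B.rows); // the shared label j must have the same extent in A and B
  Matrix C(A.rows, B.cols);
  for (std::size_t i = 0; i < A.rows; ++i) {
    for (std::size_t k = 0; k < B.cols; ++k) {
      double sum = 0.0;
      for (std::size_t j = 0; j < A.cols; ++j) sum += A(i, j) * B(j, k);
      C(i, k) = sum;
    }
  }
  return C;
}
```

With these labels the contraction is an ordinary matrix product. Other label combinations (more axes, several shared labels) follow the same pattern of one loop per distinct label.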

Dispatch. Selection of the backend is automatic and uses the address space of the input arrays:

if the address space is device-compatible (see nda::mem::have_device_compatible_addr_space), the call is forwarded to the cuTENSOR-backed implementation in the nda::tensor::device namespace;
otherwise, if TBLIS is available and supports the requested operation, the call goes to the TBLIS-backed implementation in the nda::tensor::tblis namespace;
otherwise, the built-in nda fallback is used (if one exists for the requested operation). This also applies when TBLIS is available but does not handle the requested case (e.g. nda::tensor::reduce with nda::tensor::binary_op::PROD).
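The three dispatch steps above can be modelled as a small decision function. This is only a sketch of the ordering described here, not nda's implementation: the Backend enum, the Request struct and select_backend are hypothetical names, and the inputs are plain booleans standing in for the address-space check and the backend capabilities. The case of a device-compatible address space in a build without cuTENSOR is not described above, so it is not modelled.

```cpp
// Hypothetical model of the dispatch order; none of these names are part of nda.
enum class Backend { CuTensor, Tblis, NdaFallback, None };

struct Request {
  bool device_compatible; // stand-in for the device-compatible address-space check
  bool tblis_available;   // stand-in for "nda was built with TBLIS"
  bool tblis_supports_op; // whether TBLIS handles this particular operation
  bool has_nda_fallback;  // whether a generic nda fallback exists for it
};

constexpr Backend select_backend(Request r) {
  // 1. Device-compatible address space: forward to the cuTENSOR-backed path.
  if (r.device_compatible) return Backend::CuTensor;
  // 2. Host data, TBLIS present and able to handle the operation.
  if (r.tblis_available && r.tblis_supports_op) return Backend::Tblis;
  // 3. Otherwise use the generic fallback, if the operation has one.
  if (r.has_nda_fallback) return Backend::NdaFallback;
  return Backend::None;
}

// A host operation that TBLIS is available for but does not handle
// (the reduce-with-PROD situation mentioned above) lands on the fallback.
static_assert(select_backend({false, true, false, true}) == Backend::NdaFallback);

// A host operation with no fallback and no TBLIS has nowhere to go.
static_assert(select_backend({false, false, false, false}) == Backend::None);
```

The order of the checks is the point of the sketch: the address space is consulted first, and the fallback is only reached when neither backend applies.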

Capability flags. The booleans nda::tensor::have_cutensor and nda::tensor::have_tblis (in tensor/tools.hpp) report at compile time whether nda was built with the corresponding backend, and can be used to gate user code.
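As an illustration of gating user code on such flags, the sketch below uses `if constexpr` on two compile-time booleans. To keep it compilable without nda, the flags live in a hypothetical demo namespace with assumed values; in real code they would be replaced by nda::tensor::have_cutensor and nda::tensor::have_tblis. The function host_contract_status is likewise a made-up helper.

```cpp
#include <string_view>

// Stand-ins for nda::tensor::have_cutensor and nda::tensor::have_tblis so that
// this sketch compiles without nda. The values are assumptions for the example.
namespace demo {
  inline constexpr bool have_cutensor = false;
  inline constexpr bool have_tblis    = true;
} // namespace demo

// Hypothetical helper: reports whether a host contraction can be attempted.
// Per the text above, contract has no generic fallback and requires TBLIS on the host.
constexpr std::string_view host_contract_status() {
  if constexpr (demo::have_tblis) {
    return "host contract available (TBLIS)";
  } else {
    return "host contract unavailable (built without TBLIS)";
  }
}

// The same flags can guard code that must not be compiled into unsupported builds.
static_assert(demo::have_tblis || demo::have_cutensor,
              "this example expects at least one tensor backend");
```

Because the flags are compile-time constants, the branch that does not apply is discarded by the compiler rather than checked at run time.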

Topics
	Tensor operations
	User-facing tensor operations dispatched to cuTENSOR, TBLIS, or the nda fallback.
	Tensor utilities
	Supporting types and helpers used by the tensor operations.
