.. _parallelization:

Parallelization
===============

The code spends most of its time in LAPACK diagonalization routines (``dsyev``, ``dsyevr``, ``zheev``, etc.) and in level-3 BLAS matrix-matrix multiplication routines (``dgemm``, ``zgemm``). The execution time can be significantly reduced by using a high-quality multithreaded BLAS/LAPACK library, such as Intel MKL. Make sure that the relevant environment variables (e.g. ``OMP_NUM_THREADS``, ``MKL_NUM_THREADS``, ``MKL_DYNAMIC``) are set appropriately; 4 or 8 threads are good choices.

The calculations for the different values of the twist parameter :math:`z` (i.e., the different interleaved discretization grids) are performed in parallel using MPI. A good choice for the number of MPI processes is the value of ``Nz``, i.e., the number of different grids.

Finally, the diagonalization of matrices can be OpenMP parallelized. This is, however, only beneficial for large-scale multi-orbital calculations and is rarely used. It is controlled by the parameter ``diagth`` in ``nrg_params_t`` ("low-level NRG parameters"), with default value 1 (diagonalizations performed in series).
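
As an illustration, the following shell snippet sketches how the two levels of parallelism might be combined on a single workstation. The executable name ``nrg`` and the specific values are placeholders and should be adapted to the actual installation and job:

.. code-block:: bash

   # Threads used by the multithreaded BLAS/LAPACK library (e.g. Intel MKL)
   export OMP_NUM_THREADS=4
   export MKL_NUM_THREADS=4
   # Optional: prevent MKL from dynamically reducing the thread count
   export MKL_DYNAMIC=FALSE

   # One MPI process per interleaved discretization grid (here Nz=4).
   # "nrg" stands for the actual NRG executable/driver used in your setup.
   mpirun -np 4 nrg

With these settings the job occupies ``Nz`` × ``OMP_NUM_THREADS`` cores in total (16 in this example), so the thread count per process should be chosen with the number of available cores in mind.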