Mathematical and Physical Background

Many of the methods used to solve problems of interacting quantum systems yield results on the imaginary-frequency (or, equivalently, imaginary-time) axis. However, it is often necessary to obtain results on the real-frequency axis in order to interpret them physically. Obtaining real-frequency results from imaginary-frequency data is achieved by analytic continuation.

Mathematical Formulation of the Problem

The retarded fermionic one-electron Green function \(G(\omega+i0^+)\) and the Matsubara Green function \(G(i \omega_n)\) are related through the analyticity of \(G(z)\) in the whole complex plane with the exception of the poles below the real axis. This connection becomes explicit by writing the Green function \(G(z)\) in terms of the spectral function \(A(\omega)\) as

(1)\[G_{ab}(z) = \int d\omega \frac{A_{ab}(\omega)}{z - \omega}.\]
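As a numerical illustration, Eq. (1) can be evaluated directly for a model spectral function. The following is a minimal sketch in plain NumPy (the Lorentzian model, the grids, and the value of \(\beta\) are assumed example parameters, not part of any particular implementation), evaluating \(G\) on the first fermionic Matsubara frequencies:

    import numpy as np

    # Real-frequency grid and a Lorentzian model spectral function (assumed example).
    omega = np.linspace(-10.0, 10.0, 4001)
    domega = omega[1] - omega[0]
    A = 0.5 / np.pi / ((omega - 1.0)**2 + 0.5**2)

    # Evaluate Eq. (1) on the first few fermionic Matsubara frequencies
    # i*omega_n = i*pi*(2n+1)/beta by a simple Riemann sum over the omega grid.
    beta = 10.0
    iw = 1j * np.pi / beta * (2 * np.arange(10) + 1)
    G_iw = np.array([np.sum(A / (z - omega)) * domega for z in iw])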

In general, both \(G(z)\) and \(A(\omega)\) are matrix-valued (with indices \(a\), \(b\)), but Eq. (1) is valid for each matrix element separately. The reverse relation, i.e. the matrix-valued \(A_{ab}(\omega)\) in terms of the real-frequency Green function \(G_{ab}(\omega)\), is

\[A_{ab}(\omega) = \frac{1}{2\pi} i (G_{ab}(\omega) - G_{ba}^*(\omega)).\]

Note that for matrices, the spectral function is not proportional to the element-wise imaginary part of the Green function. A drawback of the expression in Eq. (1) is that the real and imaginary parts of \(G\) and \(A\) are coupled, because \(z\) is complex-valued. This is avoided by Fourier-transforming \(G(i\omega_n)\) to the imaginary-time Green function \(G(\tau)\) at inverse temperature \(\beta\):

(2)\[G_{ab}(\tau) = \int d\omega \; \frac{-e^{-\omega\tau}}{1+e^{-\omega\beta}}A_{ab}(\omega).\]

The real part of the spectral function is only connected to the real part of \(G(\tau)\), and analogously for the imaginary part. In the following, we will first recapitulate the maximum entropy theory for a real-valued single-orbital problem as presented in Ref. [1] and later generalize it to matrix-valued problems.
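To make Eq. (2) concrete, the following sketch evaluates \(G(\tau)\) on a \(\tau\) grid for a Lorentzian model spectral function (again plain NumPy with assumed example parameters; it is not taken from any specific implementation):

    import numpy as np

    beta = 10.0                                    # inverse temperature (assumed)
    omega = np.linspace(-10.0, 10.0, 2001)         # real-frequency grid
    domega = omega[1] - omega[0]
    A = 0.5 / np.pi / ((omega - 1.0)**2 + 0.5**2)  # Lorentzian model A(omega)

    tau = np.linspace(0.0, beta, 101)
    # Fermionic kernel of Eq. (2); for much larger beta*|omega| a numerically
    # stabilized form of the exponentials would be needed.
    K = -np.exp(-omega[None, :] * tau[:, None]) / (1.0 + np.exp(-beta * omega[None, :]))
    G_tau = K @ A * domega                         # discretized integral of Eq. (2)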

In order to handle this problem numerically, the functions \(G(\tau)\) and \(A(\omega)\) in Eq. (2) can be discretized to vectors \(G_n= G(\tau_n)\) and \(A_m = A(\omega_m)\); then, Eq. (2) can be formulated as

(3)\[\mathbf{G} = K \mathbf{A},\]

where the matrix

\[K_{nm} = \frac{-e^{-\omega_m\tau_n}}{1+e^{-\omega_m\beta}} \Delta \omega_m\]

is the kernel of the transformation [3]. At first glance, inverting Eq. (3), i.e. calculating \(\mathbf{A}\) via \(\mathbf{A} = K^{-1}\mathbf{G}\), seems just as straightforward as calculating \(G(\tau)\) from \(A(\omega)\). However, the inversion is an ill-posed problem: the condition number of \(K\) is very large due to the exponential decay of \(K_{nm}\) with \(\omega_m\) and \(\tau_n\), so a direct inversion of \(K\) is not numerically feasible with standard techniques. For the same reason, a bare minimization of the misfit \(\chi^2 (\mathbf{A}) = (K \mathbf{A} - \mathbf{G})^T C^{-1} (K \mathbf{A} - \mathbf{G})\), with the covariance matrix \(C\), leads to an uncontrollable error [2].
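The ill-conditioning is easy to verify numerically. The sketch below builds the discretized kernel \(K_{nm}\) for assumed example grids and computes its condition number from the singular values, which decay nearly exponentially:

    import numpy as np

    beta = 10.0
    omega = np.linspace(-10.0, 10.0, 200)          # omega_m grid
    domega = omega[1] - omega[0]
    tau = np.linspace(0.0, beta, 100)              # tau_n grid

    # K_nm = -exp(-omega_m tau_n) / (1 + exp(-omega_m beta)) * Delta omega_m
    K = -np.exp(-omega[None, :] * tau[:, None]) \
        / (1.0 + np.exp(-beta * omega[None, :])) * domega

    # The singular values span many orders of magnitude, so K^{-1} (or a
    # pseudo-inverse) amplifies any noise in G far beyond control.
    s = np.linalg.svd(K, compute_uv=False)
    print("condition number of K:", s[0] / s[-1])

In double precision, the smallest singular values are limited by machine accuracy, so the printed condition number (typically of order \(10^{16}\) or larger) is effectively only a lower bound.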

Entropy Regularization

One efficient way to regularize this ill-posed problem is to add an entropic term \(S(A)\). This leads to the maximum entropy method (MEM), where one does not minimize \(\chi^2 (A)\), but

\[\label{eq:Q} Q_\alpha(A) = \frac12 \chi^2 (A) - \alpha S(A).\]

The pre-factor of the entropy, usually denoted \(\alpha\), is a hyper-parameter that is introduced ad hoc and needs to be specified. Several ways of choosing \(\alpha\) have been employed in the literature. The entropic regularization itself was put on a rigorous probabilistic footing by John Skilling in 1989, using Bayesian methods [4]. He showed that the only consistent way to choose the entropy for a non-negative function \(A(\omega)\) is

\[S(A) = \int d\omega \left[A(\omega) - D(\omega) - A(\omega) \log\frac{A(\omega)}{D(\omega)} \right], \label{eq:entropy-conventional}\]

where \(D(\omega)\) is the default model. The default model influences the result in two ways: First, it defines the maximum of the prior distribution, which means that in the limit of large \(\alpha\) one has \(A(\omega) \rightarrow D(\omega)\). Second, it is also related to the width of the distribution, since the variance of the prior distribution is proportional to \(D(\omega)\). Often, a flat default model is used, corresponding to no prior knowledge.
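As a sketch of how the pieces fit together (the function names and the flat default model are our illustrative choices, not a prescription of any specific code), the entropy and the functional \(Q_\alpha\) can be written as:

    import numpy as np

    def entropy(A, D, domega):
        """S(A) = int domega [ A - D - A log(A/D) ] on a uniform omega grid."""
        return np.sum(A - D - A * np.log(A / D)) * domega

    def chi2(A, K, G, C_inv):
        """chi^2(A) = (K A - G)^T C^{-1} (K A - G)."""
        r = K @ A - G
        return r @ C_inv @ r

    def Q(A, alpha, K, G, C_inv, D, domega):
        """Q_alpha(A) = chi^2(A)/2 - alpha S(A); MEM minimizes this over A >= 0."""
        return 0.5 * chi2(A, K, G, C_inv) - alpha * entropy(A, D, domega)

    # A flat default model, normalized to one on the omega interval, would be
    # D = np.ones_like(omega) / (omega[-1] - omega[0]).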

Off-diagonal elements

For off-diagonal matrix elements, the spectral function \(A(\omega)\) is no longer non-negative (as a matrix, the spectral function is Hermitian and positive semi-definite, which constrains only the diagonal elements to be non-negative). Therefore, the entropy given above cannot be used directly. However, the off-diagonal spectral function \(A(\omega)\) can be written as the difference of two non-negative functions, \(A(\omega) = A^+(\omega) - A^-(\omega)\). The conventional entropy expression can then be used for both \(A^+(\omega)\) and \(A^-(\omega)\), and the total entropy is the sum of the two [5].
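Schematically (a minimal sketch with our own function names; how \(A^+\) and \(A^-\) are parametrized and minimized differs between implementations), the total entropy for an off-diagonal element reads:

    import numpy as np

    def entropy(A, D, domega):
        """Conventional entropy for a non-negative function A with default model D."""
        return np.sum(A - D - A * np.log(A / D)) * domega

    def offdiag_entropy(A_plus, A_minus, D, domega):
        """Total entropy S(A_plus) + S(A_minus) for A = A_plus - A_minus,
        with both A_plus and A_minus non-negative."""
        return entropy(A_plus, D, domega) + entropy(A_minus, D, domega)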
