# [triqs/statistics] Tools for statistical analysis: binning, jackknife and autocorrelation time¶

## Introduction¶

Given the statistical samples $$\lbrace x_i\rbrace _{i=0\dots N-1}$$ and $$\lbrace y_i\rbrace _{i=0\dots N-1}$$ of random variables $$X$$ and $$Y$$, one often wants to compute the estimate of the following observables:

$$\langle X \rangle$$, $$\langle X\rangle/\langle Y \rangle$$, $$\langle X \rangle^2$$, or in general $$f(\langle X \rangle , \langle Y \rangle, \dots)$$

as well as the estimate of the errors:

$$\Delta\langle X \rangle$$, $$\Delta\langle X\rangle /\langle Y \rangle$$, $$\Delta\langle X\rangle ^2$$ or $$\Delta f(\langle X \rangle , \langle Y \rangle, \dots)$$

The estimate of the expectation values is the empirical average :

$$\langle X \rangle \approx \frac{1}{N} \sum_{i=0}^{N-1} x_i$$

If the samples are independent from each other and $$f$$ is a linear function of its variables (e.g $$f=Id$$):

$$(\Delta \langle X \rangle)^2 \approx \frac{\frac{N-1}{N} \sigma^2({x})}{N}$$

where $$\sigma^2({x})$$ is the empirical variance of the sample.

In the general case, however,

• the samples are correlated (with a characteristic correlation time): one needs to bin the series to obtain a reliable estimate of the error bar
• $$f$$ is non-linear in its arguments: one needs to jackknife the series

This library allows one to reliably compute the estimates of $$f(\langle X \rangle , \langle Y \rangle, \dots)$$ and its error bar $$\Delta f(\langle X \rangle , \langle Y \rangle, \dots)$$ in the general case.

## Synopsis¶

average_and_error takes an object with the Observable concept (see below) and returns a struct with two members val and error:
• val is the estimate of the expectation value of the random variable for a given sample of it
• error is the estimate of the error on this expectation value for the given sample

## Concepts¶

### TimeSeries¶

An object has the concept of a TimeSeries if it has the following member functions:

Return type Name
value_type operator[](int i)
int size()

and the following member type:

Name Property
value_type belong to an algebra (has +,-,* operators)

### Observable¶

An object has the concept of an observable if it is a TimeSeries and has, additionally, the following member function:

Return type Name
observable& operator<<(T x)

where T belongs to an algebra.

## Example¶

#include <triqs/clef.hpp>
#include <triqs/statistics.hpp>
using namespace triqs::statistics;
int main() {
observable<double> X;
X << 1.0;
X << -1.0;
X << .5;
X << .0;
std::cout << average_and_error(X) << std::endl;
std::cout << average_and_error(X * X) << std::endl;
return 0;
}
---------- Result is -------
0.125 +/- 0.426956
0.0763889 +/- 0.174719


## Histogram¶

histogram is a lightweight object used to represent and to accumulate a histogram of a real random variable.

#include <triqs/statistics.hpp>
using namespace triqs::statistics;

int main() {

// Histogram with 21 bins over [0;10] range
histogram hist{0, 10, 21};

std::cout << "Number of bins = " << hist.size() << std::endl;
auto limits = hist.limits();
std::cout << "Histogram range [" << limits.first << ";" << limits.second << "]" << std::endl;

// Accumulate some value
for (double x : {-10.0, -0.05, 1.1, 2.0, 2.2, 2.9, 3.4, 5.0, 9.0, 10.0, 10.5, 12.1, 32.2}) hist << x;

// Print accumulated histogram
std::cout << "Histogram:\n" << hist << std::endl;

// Accumulated and lost samples
std::cout << "Accumulated data points: " << hist.n_data_pts() << std::endl;
std::cout << "Lost data points: " << hist.n_lost_pts() << std::endl;

std::cout << "Histogram data: " << hist.data() << std::endl;

// Make normalized histogram (PDF)
std::cout << "PDF:\n" << pdf(hist) << std::endl;

// Make integrated and normalized histogram (CDF)
std::cout << "CDF:\n" << cdf(hist) << std::endl;

return 0;
}
---------- Result is -------
Number of bins = 21
Histogram range [0;10]
Histogram:
0  0  0  0
0.5  0  0  0
1  1  0.125  0.125
1.5  0  0  0.125
2  2  0.25  0.375
2.5  0  0  0.375
3  1  0.125  0.5
3.5  1  0.125  0.625
4  0  0  0.625
4.5  0  0  0.625
5  1  0.125  0.75
5.5  0  0  0.75
6  0  0  0.75
6.5  0  0  0.75
7  0  0  0.75
7.5  0  0  0.75
8  0  0  0.75
8.5  0  0  0.75
9  1  0.125  0.875
9.5  0  0  0.875
10  1  0.125  1

Accumulated data points: 8
Lost data points: 5
Histogram data: [0,0,1,0,2,0,1,1,0,0,1,0,0,0,0,0,0,0,1,0,1]
PDF:
0  0  0  0
0.5  0  0  0
1  0.125  0.015625  0.015625
1.5  0  0  0.015625
2  0.25  0.03125  0.046875
2.5  0  0  0.046875
3  0.125  0.015625  0.0625
3.5  0.125  0.015625  0.078125
4  0  0  0.078125
4.5  0  0  0.078125
5  0.125  0.015625  0.09375
5.5  0  0  0.09375
6  0  0  0.09375
6.5  0  0  0.09375
7  0  0  0.09375
7.5  0  0  0.09375
8  0  0  0.09375
8.5  0  0  0.09375
9  0.125  0.015625  0.109375
9.5  0  0  0.109375
10  0.125  0.015625  0.125

CDF:
0  0  0  0
0.5  0  0  0
1  0.125  0.015625  0.015625
1.5  0.125  0.015625  0.03125
2  0.375  0.046875  0.078125
2.5  0.375  0.046875  0.125
3  0.5  0.0625  0.1875
3.5  0.625  0.078125  0.265625
4  0.625  0.078125  0.34375
4.5  0.625  0.078125  0.421875
5  0.75  0.09375  0.515625
5.5  0.75  0.09375  0.609375
6  0.75  0.09375  0.703125
6.5  0.75  0.09375  0.796875
7  0.75  0.09375  0.890625
7.5  0.75  0.09375  0.984375
8  0.75  0.09375  1.07812
8.5  0.75  0.09375  1.17188
9  0.875  0.109375  1.28125
9.5  0.875  0.109375  1.39062
10  1  0.125  1.51562