Distribution#

Helpers for probabilistic distributions, particularly in the exponential family.

pfl.internal.distribution.distribution.any_sum(elements)#

Compute the sum of the elements in the iterable. The elements can be of any type, unlike with the built-in Python sum.

Return type:

TypeVar(Element)

pfl.internal.distribution.distribution.any_product(elements)#

Compute the product of the elements in the iterable. The elements can be of any type.

Return type:

TypeVar(Element)
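
To illustrate why a type-agnostic sum is useful, the sketch below folds an iterable with functools.reduce instead of starting from the integer 0, which is what the built-in sum does by default. The names generic_sum and generic_product are hypothetical, and this is not necessarily how any_sum and any_product are implemented.

```python
from fractions import Fraction
from functools import reduce
import operator


def generic_sum(elements):
    """Add up the elements without assuming a numeric zero (hypothetical helper)."""
    # reduce() raises TypeError on an empty iterable, since there is no start value.
    return reduce(operator.add, elements)


def generic_product(elements):
    """Multiply the elements together (hypothetical helper)."""
    return reduce(operator.mul, elements)


# Built-in sum() would raise TypeError here, because it starts from the int 0.
joined = generic_sum([[1], [2, 3]])
quarter = generic_product([Fraction(1, 2), Fraction(1, 2)])
```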

class pfl.internal.distribution.distribution.Distribution#

A base class representing a (probability) distribution.

This is parameterised by Point, which is the type of a single observation. This is either a float or a Numpy array.

property point_shape#

The shape of points. This is predefined for single-dimensional distributions as an empty tuple. Otherwise, subclasses should override this and specify the shape of the points they expect.

abstract density(point)#
Return type:

LogFloat

Returns:

The density of the distribution at the given point.

log_densities(points)#

Compute the log-density at multiple points. The result is a Numpy array, which cannot contain LogFloat. Subclasses may implement this in a faster way than calling density many times.

Return type:

ndarray

Returns:

Log-densities at many points.

abstract sample(number)#

Draw samples from the distribution.

Parameters:

number (int) – The number of samples to be drawn.

Return type:

ndarray
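
The interface above can be pictured with a small standalone sketch. It is only an illustration under stated assumptions: DistributionSketch and UniformSketch are hypothetical names, the density is returned as a plain natural logarithm rather than a LogFloat, and log_densities returns a list rather than a NumPy array.

```python
import math
import random
from abc import ABC, abstractmethod


class DistributionSketch(ABC):
    """Hypothetical stand-in for the interface described above."""

    @property
    def point_shape(self):
        # Single-dimensional distributions use an empty tuple.
        return ()

    @abstractmethod
    def log_density(self, point):
        """Return the natural log of the density at `point`."""

    @abstractmethod
    def sample(self, number):
        """Return `number` samples."""

    def log_densities(self, points):
        # Default: evaluate one point at a time; subclasses may do this faster.
        return [self.log_density(p) for p in points]


class UniformSketch(DistributionSketch):
    """Uniform distribution on [low, high], for illustration only."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def log_density(self, point):
        inside = self.low <= point <= self.high
        return -math.log(self.high - self.low) if inside else -math.inf

    def sample(self, number):
        return [random.uniform(self.low, self.high) for _ in range(number)]


uniform = UniformSketch(0.0, 2.0)
log_values = uniform.log_densities([0.5, 3.0])
```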

Diagonal Gaussian#

A trainable multivariate Gaussian distribution with a diagonal variance.

class pfl.internal.distribution.diagonal_gaussian.DiagonalGaussian(mean, variance)#

A multivariate Gaussian distribution with a diagonal variance.

Parameters:
  • mean (Union[ndarray, List[float], float]) – The mean, as a Numpy scalar or vector.

  • variance (Union[ndarray, List[float], float]) – The variance, as a Numpy scalar or vector. This must have the same shape as the mean.

property point_shape#

The shape of points. This is predefined for single-dimensional distributions as an empty tuple. Otherwise, subclasses should override this and specify the shape of the points they expect.

density(point)#
Return type:

LogFloat

Returns:

The density of the distribution at the given point.
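
With a diagonal variance, the log-density is a sum of independent per-dimension terms: log p(x) = -1/2 * sum_i [log(2*pi*var_i) + (x_i - mean_i)**2 / var_i]. The following is a minimal sketch of that formula in plain Python. The function name is hypothetical, and it returns a plain float logarithm rather than a LogFloat.

```python
import math


def diagonal_gaussian_log_density(point, mean, variance):
    """Log-density of a diagonal Gaussian; all arguments are equal-length lists."""
    total = 0.0
    for x, mu, var in zip(point, mean, variance):
        total += math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var
    return -0.5 * total


# Standard 2-D Gaussian evaluated at its mean: the density is 1 / (2 * pi).
at_mean = diagonal_gaussian_log_density([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
```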

sample(number)#

Draw samples from the distribution.

Parameters:

number – The number of samples to be drawn.

split(offset=0.1)#

Split up this Gaussian, changing the mean along the direction of the highest variance, and keeping the variance. Note that the sum of the densities of the two new Gaussians is not the same as the density of this one: that is impossible.

In the full-covariance case, this would require finding the first eigenvector, but since this Gaussian has a diagonal (co)variance, the highest entry of the variance vector is used.

Parameters:

offset – The offset as a fraction of the standard deviation in the direction of maximum variance.
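
The splitting rule described above might be sketched as follows. This is an illustration rather than the library's code: the function name is hypothetical, the inputs are plain lists, and it assumes the two new means move symmetrically in opposite directions.

```python
import math


def split_diagonal_gaussian(mean, variance, offset=0.1):
    """Return two (mean, variance) pairs, shifted along the largest-variance axis."""
    # With a diagonal variance, the direction of maximum variance is a coordinate axis.
    axis = max(range(len(variance)), key=lambda i: variance[i])
    shift = offset * math.sqrt(variance[axis])
    lower, upper = list(mean), list(mean)
    lower[axis] -= shift  # assumption: the two means move symmetrically
    upper[axis] += shift
    # The variance is kept unchanged for both halves.
    return (lower, list(variance)), (upper, list(variance))


(first_mean, first_var), (second_mean, second_var) = split_diagonal_gaussian(
    [0.0, 1.0], [1.0, 4.0], offset=0.5)
```

Here the second dimension has the larger variance (standard deviation 2), so only the second entry of each mean moves, by 0.5 * 2 = 1.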

pfl.internal.distribution.diagonal_gaussian.diagonal_standard_gaussian(num_dimensions=1)#

Return a unit Gaussian, i.e. with mean 0 and variance 1.

Parameters:

num_dimensions – The number of dimensions of the Gaussian.

Return type:

DiagonalGaussian

Log float#

Represent real values by their logarithms in floating-point format.

pfl.internal.distribution.log_float.log(value)#
Returns:

The natural logarithm of value, or -math.inf if value==0.

class pfl.internal.distribution.log_float.LogFloat(sign, log_value)#

A real number represented by its logarithm in floating-point format, and a sign. The sign is always either -1 or +1. If the value represented is 0, then the sign is always +1.

This can deal with a much larger dynamic range than a standard float. This is useful when computing likelihoods of high-dimensional data, such as sequences: it prevents underflow (or overflow). Various mathematical functions return natural logarithms to prevent overflow. See e.g. scipy.special.gammaln.
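
The underflow problem is easy to reproduce with plain floats. This small sketch, which does not use LogFloat itself, multiplies 100 probabilities of 1e-5 directly and then does the same computation by summing logarithms.

```python
import math

probability = 1e-5
count = 100

# Multiplying directly underflows: 1e-500 is far below the smallest positive double.
direct_product = 1.0
for _ in range(count):
    direct_product *= probability

# Summing logarithms keeps the same quantity representable.
log_product = count * math.log(probability)
```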

classmethod from_value(value)#

Construct a LogFloat from its value as a floating-point number.

property sign: int#
Returns:

The sign (-1 or +1) of the value.

property log_value: float#
Returns:

The logarithm of the absolute value.

property value: float#
Returns:

The value contained, converted to a plain floating-point representation.
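
A sign-plus-logarithm representation can be sketched in a few lines. The class below is a hypothetical miniature called SignedLog, not the actual LogFloat implementation. It only mirrors the documented constructor arguments and properties, plus multiplication for illustration.

```python
import math


class SignedLog:
    """Hypothetical miniature of a sign-plus-logarithm number."""

    def __init__(self, sign, log_value):
        self.sign = sign            # -1 or +1
        self.log_value = log_value  # log of the absolute value

    @classmethod
    def from_value(cls, value):
        if value == 0:
            return cls(+1, -math.inf)  # zero is stored with sign +1
        return cls(+1 if value > 0 else -1, math.log(abs(value)))

    @property
    def value(self):
        return self.sign * math.exp(self.log_value)

    def __mul__(self, other):
        # Multiplication: logarithms add, signs multiply.
        log_value = self.log_value + other.log_value
        # Keep the convention that zero always has sign +1.
        sign = +1 if log_value == -math.inf else self.sign * other.sign
        return SignedLog(sign, log_value)


product = SignedLog.from_value(-2.0) * SignedLog.from_value(4.0)
zero = SignedLog.from_value(0.0)
```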

Log float functions#

Functions that work on or return LogFloat. For some computations, this makes the results much simpler.

These are mostly simple wrappers for Numpy functions that exist already.

pfl.internal.distribution.log_float_functions.exp(x)#

The exp function that returns a LogFloat. Since LogFloat holds the logarithm of a value, internally this performs no computation. However, it makes the intent clearer in code.

Return type:

LogFloat
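
The point of such an exp can be seen with plain Python. In this sketch a (sign, log_value) pair stands in for LogFloat, which is an assumption made for illustration.

```python
import math

x = -1000.0

# math.exp underflows to 0.0, so the magnitude is lost.
plain = math.exp(x)

# In a (sign, log_value) representation, exp(x) is simply sign +1 and log_value x.
sign, log_value = +1, x
```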

pfl.internal.distribution.log_float_functions.beta_function(alpha, beta)#

Compute the beta function in log space.
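
One common way to evaluate the beta function in log space is through the log-gamma function, using log B(alpha, beta) = lgamma(alpha) + lgamma(beta) - lgamma(alpha + beta). The sketch below uses math.lgamma from the standard library. The name log_beta is hypothetical, and this is not necessarily how beta_function is implemented.

```python
import math


def log_beta(alpha, beta):
    """Natural log of the beta function B(alpha, beta), for positive arguments."""
    return math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)


# B(2, 3) = 1! * 2! / 4! = 1/12
log_b = log_beta(2.0, 3.0)

# For large arguments B itself underflows as a float, but its logarithm is finite.
log_b_large = log_beta(2000.0, 3000.0)
```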

pfl.internal.distribution.log_float_functions.incomplete_beta_function(alpha, beta, x)#

Compute the incomplete beta function in log space.

pfl.internal.distribution.log_float_functions.normal_cdf(x)#

The CDF of a standard normal 𝒩(0,1). The result is returned as a LogFloat, so that it is particularly accurate in the left tail.

Return type:

LogFloat
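
To see why a logarithmic result matters in the left tail, compare a plain-float CDF with a log-space approximation. The sketch below is not the library's method: it uses the standard leading-order asymptotic Phi(x) ~ phi(x) / |x| for very negative x, and the function name is hypothetical.

```python
import math


def log_normal_cdf_left_tail(x):
    """Leading-order approximation of log(Phi(x)), valid only for very negative x."""
    # Phi(x) ~ phi(x) / |x| as x -> -infinity, with phi the standard normal density.
    return -0.5 * x * x - math.log(-x) - 0.5 * math.log(2.0 * math.pi)


x = -50.0

# The plain-float CDF underflows to exactly 0.0, so its logarithm would be -inf.
plain_cdf = 0.5 * math.erfc(-x / math.sqrt(2.0))

# The logarithm itself is a perfectly ordinary number.
log_cdf = log_normal_cdf_left_tail(x)
```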

pfl.internal.distribution.log_float_functions.erfc(value)#

Evaluate the complementary error function, 1-erf(value).

Return type:

LogFloat

Returns:

erfc(value), i.e. 1-erf(value), as a LogFloat, which preserves numerical precision for value>0.
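
The precision issue for positive arguments shows up with the standard library alone. In this sketch, subtracting erf from 1 loses the answer entirely, while evaluating the complementary error function directly does not.

```python
import math

value = 10.0

# erf(10) rounds to exactly 1.0 in double precision, so the difference is 0.0.
via_erf = 1.0 - math.erf(value)

# Evaluating erfc directly keeps the tiny but nonzero result.
direct = math.erfc(value)
```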

pfl.internal.distribution.log_float_functions.binomial_coefficients(exponent)#

Yield the binomial coefficients (n choose k) for fixed nonnegative n=exponent and k=0,1,2,3, ..., as LogFloat.

If exponent is an integer, then the generator will stop after exponent+1 elements, since the remaining elements would be 0.

If exponent is a float, generalized binomial coefficients are produced, and the generator continues indefinitely. These are the coefficients of the series expansion of (1+x)**exponent.

Return type:

Iterable[LogFloat]
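
The behaviour described above follows from the recurrence C(n, k+1) = C(n, k) * (n - k) / (k + 1). Here is a minimal sketch with a hypothetical function name. It yields plain floats instead of LogFloat, so it does not have the same dynamic range.

```python
from itertools import islice


def binomial_coefficients_sketch(exponent):
    """Yield the coefficients of (1 + x) ** exponent as plain floats."""
    coefficient = 1.0
    k = 0
    while True:
        yield coefficient
        # Recurrence: C(n, k + 1) = C(n, k) * (n - k) / (k + 1)
        coefficient = coefficient * (exponent - k) / (k + 1)
        k += 1
        if isinstance(exponent, int) and k > exponent:
            return  # the remaining coefficients would all be 0


integer_case = list(binomial_coefficients_sketch(4))
fractional_case = list(islice(binomial_coefficients_sketch(0.5), 4))
```

For a fractional exponent the coefficients alternate in sign after the first few terms, which is one reason a representation with an explicit sign is needed.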

Mixture#

A mixture of components.

class pfl.internal.distribution.mixture.Mixture(components)#

A mixture model, which has a density that is a weighted sum over “components” from another distribution.

Parameters:

components (Iterable[Tuple[float, Distribution]]) – The components, as pairs of the weight and the component. Weights do not have to add up to 1 (they will be normalized to sum to 1).

property point_shape#

The shape of points. This is predefined for single-dimensional distributions as an empty tuple. Otherwise, subclasses should override this and specify the shape of the points they expect.

property components: List[Tuple[float, Distribution]]#
Returns:

A list of (weight, component). The weights add up to 1.

responsibilities(point)#
Return type:

List[LogFloat]

Returns:

The responsibilities for this point and each of the components. The responsibility is the posterior probability of the component having generated the point.
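
Responsibilities follow from Bayes' rule: each one is the normalised weight times the component density, divided by the mixture density. The sketch below works on plain log-densities and uses the log-sum-exp trick. The function name is hypothetical, and it returns ordinary floats rather than LogFloat.

```python
import math


def responsibilities_sketch(weights, component_log_densities):
    """Posterior probability of each component, given log-densities at one point."""
    total_weight = sum(weights)
    # log(normalised weight * component density) for each component
    joint = [math.log(w / total_weight) + log_density
             for w, log_density in zip(weights, component_log_densities)]
    # Log-sum-exp: subtract the maximum before exponentiating to avoid underflow.
    peak = max(joint)
    log_mixture_density = peak + math.log(sum(math.exp(j - peak) for j in joint))
    return [math.exp(j - log_mixture_density) for j in joint]


# Unnormalised weights 1 and 3; both densities are far below plain float range.
posterior = responsibilities_sketch([1.0, 3.0], [-2000.0, -2000.0])
```

Because both components have the same density at this point, the responsibilities reduce to the normalised weights, 0.25 and 0.75.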

density(point)#
Return type:

LogFloat

Returns:

The density of the distribution at the given point.

sample(number)#

Draw samples from the distribution.

Parameters:

number – The number of samples to be drawn.
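
Sampling from a mixture is usually done in two stages: pick a component with probability proportional to its weight, then sample from that component. This is a sketch under simplifying assumptions (one-dimensional Gaussian components given as hypothetical (weight, mean, std) tuples), not the library's implementation.

```python
import random


def sample_mixture_sketch(components, number, rng=random):
    """Draw samples from a mixture of 1-D Gaussians given as (weight, mean, std)."""
    weights = [weight for weight, _, _ in components]
    samples = []
    for _ in range(number):
        # Stage 1: choose a component; choices() accepts unnormalised weights.
        _, mean, std = rng.choices(components, weights=weights, k=1)[0]
        # Stage 2: sample from the chosen component.
        samples.append(rng.gauss(mean, std))
    return samples


rng = random.Random(0)  # seeded generator for reproducibility
draws = sample_mixture_sketch([(1.0, -5.0, 0.1), (3.0, 5.0, 0.1)], 200, rng=rng)
```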