Skip to content

Probabilistic Library Comparison

Kiran Gopinathan edited this page Jun 9, 2019 · 6 revisions

Probabilistic Library Comparison

To justify the use of the infotheo library for reasoning about the probabilistic properties of the system, an analysis of the 4 main probabilistic libraries for coq were compared:

  • ALEA[1]
  • FCF[2]
  • infotheo[3]
  • Polaris IRIS extension[4]

ALEA[1] Library

The ALEA library[1] defines probability distributions over a type A as a record type contain the following:

  • a monotonic function u mapping functions of the form (A -> [0,1]) to values within 0 to 1
    • (A -> [0,1]) -> [0,1]
  • proofs that the function satisfied certain properties from measure theory
    • stable over addition: u(f1 + f2) = u(f1) + u(f2)
    • stable over multiplication: u(k x f) = k x u(f)
    • stable over inverse: u(1 - f) <= 1 - u(f) The library also introduces an additional type to represent a continuous real range of numbers between 0 and 1.

As can be seen from the definitions, the ALEA library uses the monad over expectations to represent probability, rather than the alternative of using a monad over distributions directly.

  • there are certain additional lemmas that are proven which hint at some probability-like properties, for example:
    • mu_inv_minus: u(\x -> 1 - f(x)) = u(1) - u(f)
    • mu_stable_inv_inv: u(f) <= 1 - u(\x -> 1 - f(x))
    • mu_simpl: (\f -> f x) q = q x

This suggests that the input to the function is some kind of assignment of probability to each element of A, and the distribution combines these probabilities to produce an output probability. i.e.

  • random_coin_dist := \f -> (1/2) * f(head) + (1/2) * f(tail)

This structure allows the distribution to be used both for extracting probabilities (by using a function returning 1 for the element in question), and composing distributions together.

The main benefit of this distribution structure is that it allows for composing distributions without having to manually enumerate and calculate the probability of each intermediate value by simply composing functions. The corresponding paper puts it succinctly as the library sidestepping issues of dealing with complicated probability theory and instead working directly with the algebraic properties of the monad.

This differs from other libraries such as infotheo, wherein distributions are defined by explicitly defining a function mapping elements of the type to probabilities and proving they sum to 1.

This function based definition also means that the library allows program-based distributions to be defined directly within coq rather than having to use a deep embedding, making for simpler definitions for such applications.

The main disadvantage of this definition comes from the fact that as the distributions are not defined in an explicit fashion, additional work must be done to prove that any formed programs are proper distributions and satisfy the properties required.

Additionally, the shallow embedding means that it is not possible to reason about probabilistic programs with general fixpoints in them directly as it can lead to infinite recursion.

FCF[2] Library

Distributions in FCF[2] are defined intuitively as:

  • Dist A : A -> Rat.

FCF uses a shallow embedding (the paper refers to it as a shallow embedding while it is in actuality deep?) of a standard probabilistic computation monad with the operations: - Ret - Bind a f - Rand n (random number between 0 and n) - Repeat c P

Additionally, FCF prove the support of any computation is finite by simply using the properties of an instance of an inductive type and their get_support function. As any instance of a simple inductive type consists of a finite number of applications of the type constructor, a terminating function to retrieve the support of the computation must always produce a finite list of values.

This has the massive benefit of not requiring the output types of distributions to be fintypes while still allowing a standard definition of the evaluation of a distribution.

FCF uses its own definition of Rationals as the output values of probability functions rather than Reals which simplifies some of the reasoning while suffering a negligable loss in reasoning capabilities for the types of probabilities encountered in typical computations.

FCF extends upon this definition of distributions to produce a Probabilistic Relational Hoare Logic system. The main lemmas of interest are: A

  • p ~ q{Q} -> (d := a <-$ p; b <-$ q; ret (a,b)) /\ \forall a,b \in supp(d), (Q a b) - Each element in the support of the joint distribution from distributions p and q satisfies Q

  • {P} p ~ q {Q} -> \forall a b, (P a b) -> (p a) ~ (p q) {Q} - Any pair of elements satisfying P, when fed to computations p and q will produce distributions related by Q

  • p ~ q {Q} -> {Q} r ~ s {S} -> (x <-$ p; r x) ~ (x <-$ q; s x){S}

While FCF uses Rationals to represent probability outcomes, they are still able to reason about notions of negligibly (typically represented as 2^{n}) using a library about asymptotic properties.

Finally FCF provides a mechanism for code extraction that provides a guarantee of equivalence between a model of a probabilistic program and its extracted version by proving that code is equivalent to a form that produces random results by consuming a finite list of random values.

infotheo[3] Library

The infotheo[3] library defines a probability distribution as a record type containing the following: - a function mapping elements of a finite type to positive reals - a proof that the integral of the function over its domain produces 1

In contrast to FCF's definitions, infotheo's requirement of a finType type means that the library can reason about and manipulate opaque distributions directly, whereas FCF's distribution definition provides no information with which distributions can be manipulated directly.

This modification is important for the infotheo library as it was constructed in the development of a library to mechanise Shannon's theorems from information theory in Coq, which often requires manipulating distributions.

Given that when reasoning about probabilistic programs, we rarely work with distributions directly, this modification does not seem to provide any significant advantage for reasoning about programs.

Polaris[4] - IRIS Extension library

The polaris[4] extension for the IRIS library provides distributions as a record with the following contents: - a function mapping countable types to real numbers - a proof that the function is always positive - a proof that the sum of the function over all inputs is equal to 1

The main difference between this definition and infotheo is that Polaris allows for countably infinite supports (only countType requirement), at the cost of requiring more complex proofs to construct the distribution(for example the definition of the bernouli definition).

The library then combines this probabilistic monad with a non-determinism monad to produce a monad for reasoning about concurrent probabilistic executions - additionally, as with FCF they adopt a Probabilistic Hoare relational logic for reasoning about the probabilistic programs.

References

[1] - Philippe Audebaud and Christine Paulin-Mohring. Proofs of randomized algorithms in Coq. Science of Computer Programming, 74(8):568–589, 2009. A preliminary version appeared in the proc. of MPC 06. [2] - Petcher, Adam, and Greg Morrisett. "The foundational cryptography framework." International Conference on Principles of Security and Trust. Springer, Berlin, Heidelberg, 2015. [3] - Affeldt, Reynald, and Manabu Hagiwara. "Formalization of Shannon’s theorems in SSReflect-Coq." International Conference on Interactive Theorem Proving. Springer, Berlin, Heidelberg, 2012. [4] - Tassarotti, Joseph, and Robert Harper. "A separation logic for concurrent randomized programs." Proceedings of the ACM on Programming Languages 3.POPL (2019): 64.