Reference
RestrictedBoltzmannMachines.Binary
— TypeBinary(θ)
Layer with binary units, with external fields θ.
RestrictedBoltzmannMachines.Gaussian
— TypeGaussian(θ, γ)
Gaussian layer, with location parameters θ and scale parameters γ.
RestrictedBoltzmannMachines.Potts
— TypePotts(θ)
Layer with Potts units, with external fields θ. Encodes categorical variables as one-hot vectors. The number of classes is the size of the first dimension.
RestrictedBoltzmannMachines.RBM
— TypeRBM{V,H,W}
RBM, with visible layer of type V, hidden layer of type H, and weights of type W.
RestrictedBoltzmannMachines.ReLU
— TypeReLU(θ, γ)
Layer with ReLU units, with location parameters θ and scale parameters γ.
RestrictedBoltzmannMachines.Spin
— TypeSpin(θ)
Layer with spin units, with external fields θ. The energy of a layer with units $s_i$ is given by:
\[E = -\sum_i \theta_i s_i\]
where each spin $s_i$ takes values $\pm 1$.
RestrictedBoltzmannMachines.BinaryRBM
— MethodBinaryRBM(a, b, w)
BinaryRBM(N, M)
Construct an RBM with binary visible and hidden units, which has an energy function:
\[E(v, h) = -a'v - b'h - v'wh\]
Equivalent to RBM(Binary(a), Binary(b), w).
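The energy function above can be evaluated by hand for a single configuration. The following is a minimal sketch in plain Julia that does not call the package; the helper name `binary_rbm_energy` and the numeric values are hypothetical and chosen only for illustration.

```julia
using LinearAlgebra

# Hypothetical helper (not part of the package): energy of a binary RBM
# for one configuration, E(v, h) = -a'v - b'h - v'wh.
binary_rbm_energy(a, b, w, v, h) = -dot(a, v) - dot(b, h) - dot(v, w * h)

a = [0.5, -1.0]          # visible fields, N = 2
b = [0.25, 0.0, -0.5]    # hidden fields, M = 3
w = [1.0 0.0 -1.0;
     0.5 2.0  0.0]       # N x M weights
v = [1, 0]               # visible configuration
h = [1, 1, 0]            # hidden configuration

E = binary_rbm_energy(a, b, w, v, h)
```

Each of the three terms contributes separately (fields on v, fields on h, and the weight-mediated interaction), which makes it easy to check a result term by term.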
RestrictedBoltzmannMachines.HopfieldRBM
— MethodHopfieldRBM(g, θ, γ, w)
HopfieldRBM(g, w)
Construct an RBM with spin visible units and Gaussian hidden units. If not given, θ = 0 and γ = 1 by default.
\[E(v, h) = -g'v - θ'h + \sum_\mu \frac{γ_\mu}{2} h_\mu^2 - v'wh\]
RestrictedBoltzmannMachines.ais
— Methodais(rbm0, rbm1, v0, βs)
Provided v0 is an equilibrated sample from rbm0, returns F such that mean(exp.(F)) is an unbiased estimator of Z1/Z0, the ratio of partition functions of rbm1 and rbm0.
Tip: Use logmeanexp
logmeanexp(F), using the function logmeanexp provided in this package, tends to give a better approximation of log(Z1) - log(Z0) than mean(F).
RestrictedBoltzmannMachines.aise
— Methodaise(rbm, [βs]; [nbetas], init=rbm.visible, nsamples=1)
AIS estimator of the log-partition function of rbm. It is recommended to fit init to the single-site statistics of rbm (or the data).
Tip: Use large nbetas
For more accurate estimates, use larger nbetas. It is usually better to have large nbetas and small nsamples, rather than large nsamples and small nbetas.
RestrictedBoltzmannMachines.anneal
— Methodanneal(rbm0, rbm1; β)
Returns an RBM that interpolates between rbm0 and rbm1. Denoting by E0(v, h) and E1(v, h) the energies assigned by rbm0 and rbm1, respectively, the returned RBM assigns energies given by:
E(v, h) = (1 - β) * E0(v, h) + β * E1(v, h)
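For RBMs whose energy is linear in the parameters (for example binary visible and hidden units), interpolating the energies is the same as interpolating the parameters. The sketch below illustrates this in plain Julia. It is not the package's implementation, and the names `binary_energy`, `mix`, and `interpolate_params` are hypothetical.

```julia
using LinearAlgebra

# Hypothetical helper: energy of a binary RBM whose parameters live in a NamedTuple.
binary_energy(p, v, h) = -dot(p.a, v) - dot(p.b, h) - dot(v, p.w * h)

# Linear interpolation between two parameter arrays (β = 0 gives x0, β = 1 gives x1).
mix(x0, x1, β) = (1 - β) .* x0 .+ β .* x1

# Interpolate all parameters of two binary RBMs.
interpolate_params(p0, p1, β) =
    (a = mix(p0.a, p1.a, β), b = mix(p0.b, p1.b, β), w = mix(p0.w, p1.w, β))

p0 = (a = zeros(2), b = zeros(3), w = zeros(2, 3))
p1 = (a = [0.5, -1.0], b = [0.25, 0.0, -0.5], w = [1.0 0.0 -1.0; 0.5 2.0 0.0])
v, h, β = [1, 0], [1, 1, 0], 0.3

E_interp = binary_energy(interpolate_params(p0, p1, β), v, h)
E_mixed = (1 - β) * binary_energy(p0, v, h) + β * binary_energy(p1, v, h)
```

Both quantities agree for every (v, h), because each energy term is linear in a, b, or w. For layers with non-linear parameter dependence this shortcut is not guaranteed to apply.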
RestrictedBoltzmannMachines.batch_size
— Methodbatch_size(rbm, v, h)
Returns the batch size that would result from computing energy(rbm, v, h).
RestrictedBoltzmannMachines.batch_size
— Methodbatch_size(layer, x)
Batch sizes of x, with respect to layer.
RestrictedBoltzmannMachines.batchcov
— Methodbatchcov(layer, x; wts = nothing, [mean])
Covariance of x over batch dimensions, weighted by wts.
RestrictedBoltzmannMachines.batchdims
— Methodbatchdims(layer, x)
Indices of batch dimensions in x, with respect to layer.
RestrictedBoltzmannMachines.batchmean
— Methodbatchmean(layer, x; wts = nothing)
Mean of x over batch dimensions, weighted by wts.
RestrictedBoltzmannMachines.batchstd
— Methodbatchstd(layer, x; wts = nothing, [mean])
Standard deviation of x over batch dimensions, weighted by wts.
RestrictedBoltzmannMachines.batchvar
— Methodbatchvar(layer, x; wts = nothing, [mean])
Variance of x over batch dimensions, weighted by wts.
RestrictedBoltzmannMachines.block_matrix_invert
— Methodblock_matrix_invert(A, B, C, D)
Inversion of a block matrix, using the formula:
\[\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix}^{-1} = \begin{bmatrix} \left(\mathbf{A} - \mathbf{B} \mathbf{D}^{-1} \mathbf{C}\right)^{-1} & \mathbf{0} \\ \mathbf{0} & \left(\mathbf{D} - \mathbf{C} \mathbf{A}^{-1} \mathbf{B}\right)^{-1} \end{bmatrix} \begin{bmatrix} \mathbf{I} & -\mathbf{B} \mathbf{D}^{-1} \\ -\mathbf{C} \mathbf{A}^{-1} & \mathbf{I} \end{bmatrix}\]
Assumes that A and D are square and invertible.
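The block-inversion formula can be checked numerically. The following sketch multiplies out the two factors of the formula and compares the result against the full matrix; `block_inverse_sketch` is a hypothetical name, and this is not the package's code.

```julia
using LinearAlgebra

# Hypothetical sketch: inverse of [A B; C D] from the two Schur complements.
function block_inverse_sketch(A, B, C, D)
    SA = inv(A - B * inv(D) * C)   # inverse of the Schur complement of D
    SD = inv(D - C * inv(A) * B)   # inverse of the Schur complement of A
    top = hcat(SA, -SA * B * inv(D))
    bottom = hcat(-SD * C * inv(A), SD)
    return vcat(top, bottom)
end

A = [4.0 1.0; 1.0 3.0]
B = reshape([1.0, 0.0], 2, 1)
C = [0.0 1.0]
D = fill(2.0, 1, 1)

M = [A B; C D]                     # full 3 x 3 matrix
Minv = block_inverse_sketch(A, B, C, D)
```

The sketch uses explicit `inv` calls for readability. When A and D are diagonal, these inverses are cheap, which is the situation where a block formula pays off.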
RestrictedBoltzmannMachines.block_matrix_logdet
— Methodblock_matrix_logdet(A, B, C, D)
Log-determinant of a block matrix using the determinant lemma.
\[\det\left( \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix} \right) = \det(A) \det(D - CA^{-1}B) = \det(D) \det(A - BD^{-1}C)\]
Here we assume that A and D are invertible, and moreover are easy to invert (for example, if they are diagonal). We use this to choose one or the other of the two formulas above.
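Both forms of the determinant identity can be compared against a direct computation. This is a minimal sketch with hypothetical function names; it assumes all determinants involved are positive, because `logdet` on real matrices throws otherwise.

```julia
using LinearAlgebra

# log det(M) = log det(A) + log det(D - C A⁻¹ B)
logdet_via_A(A, B, C, D) = logdet(A) + logdet(D - C * (A \ B))

# log det(M) = log det(D) + log det(A - B D⁻¹ C)
logdet_via_D(A, B, C, D) = logdet(D) + logdet(A - B * (D \ C))

A = [4.0 1.0; 1.0 3.0]
B = reshape([1.0, 0.0], 2, 1)
C = [0.0 1.0]
D = fill(2.0, 1, 1)

M = [A B; C D]
direct = logdet(M)
```

Which form is cheaper depends on which of A and D is easier to solve against, since the remaining determinant has the size of the other block.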
RestrictedBoltzmannMachines.broadlike
— Methodbroadlike(A, B...)
Broadcasts A into the size of A .+ B .+ ... (without actually doing a sum).
RestrictedBoltzmannMachines.categorical_rand
— Methodcategorical_rand(ps)
Randomly draw i with probability ps[i]. You must ensure that ps defines a proper probability distribution.
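One common way to draw from a categorical distribution is inverse-CDF sampling. The sketch below shows that technique; whether the package uses it internally is not stated here, and `categorical_rand_sketch` is a hypothetical name.

```julia
using Random

# Hypothetical sketch: draw index i with probability ps[i] by walking the
# cumulative sum until it exceeds a uniform random number.
function categorical_rand_sketch(rng, ps)
    u = rand(rng)
    c = zero(eltype(ps))
    for (i, p) in enumerate(ps)
        c += p
        u < c && return i
    end
    return length(ps)   # guard against round-off when sum(ps) is slightly below 1
end

rng = MersenneTwister(1)
draws = [categorical_rand_sketch(rng, [0.2, 0.5, 0.3]) for _ in 1:1000]
```

As in the documented function, the caller is responsible for passing non-negative probabilities that sum to one.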
RestrictedBoltzmannMachines.categorical_sample
— Methodcategorical_sample(P)
Given a probability array P of size (q, *), returns an array C of size (*), such that C[i] ∈ 1:q is a random sample from the categorical distribution P[:,i]. You must ensure that P defines a proper probability distribution.
RestrictedBoltzmannMachines.categorical_sample_from_logits
— Methodcategorical_sample_from_logits(logits)
Given a logits array logits of size (q, *) (where q is the number of classes), returns an array X of size (*), such that X[i] is a categorical random sample from the distribution with logits logits[:,i].
RestrictedBoltzmannMachines.categorical_sample_from_logits_gumbel
— Methodcategorical_sample_from_logits_gumbel(logits)
Like categorical_sample_from_logits, but using the Gumbel trick.
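The Gumbel trick adds independent standard Gumbel noise to each logit and takes the argmax, which yields a sample from the softmax distribution of the logits. A minimal sketch for a single vector of logits follows; `gumbel_argmax_sample` is a hypothetical name.

```julia
using Random

# Hypothetical sketch of the Gumbel-max trick for one logits vector.
# If E ~ Exponential(1), then -log(E) is a standard Gumbel variate.
function gumbel_argmax_sample(rng, logits::AbstractVector)
    g = [-log(randexp(rng)) for _ in logits]
    return argmax(logits .+ g)
end

rng = MersenneTwister(0)
samples = [gumbel_argmax_sample(rng, [0.0, 1.0, -1.0]) for _ in 1:1000]
```

The trick avoids normalizing the logits, so no explicit softmax or log-sum-exp is needed.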
RestrictedBoltzmannMachines.cgf
— Functioncgf(layer, inputs = 0)
Cumulant generating function of layer, reduced over layer dimensions.
RestrictedBoltzmannMachines.cold_metropolis
— Methodcold_metropolis(rbm, v; steps = 1)
Samples the rbm at zero temperature, starting from configuration v.
RestrictedBoltzmannMachines.collect_states
— Methodcollect_states(layer)
Returns an array of all states of layer. Only defined for discrete layers.
Use only for small layers. For large layers, the exponential number of states will not fit in memory.
RestrictedBoltzmannMachines.colors
— Methodcolors(layer)
Number of possible states of units in discrete layers.
RestrictedBoltzmannMachines.energies
— Methodenergies(layer, x)
Energies of units in layer (not reduced over layer dimensions).
RestrictedBoltzmannMachines.energy
— Methodenergy(rbm, v, h)
Energy of the rbm in the configuration (v,h).
RestrictedBoltzmannMachines.energy
— Methodenergy(layer, x)
Layer energy, reduced over layer dimensions.
RestrictedBoltzmannMachines.flatten
— Methodflatten(layer, x)
Returns a vectorized version of x.
RestrictedBoltzmannMachines.free_energy
— Methodfree_energy(rbm, v)
Free energy of visible configuration (after marginalizing hidden configurations).
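For binary hidden units the marginalization over h has a closed form, F(v) = -a'v - Σ_μ log(1 + exp(b_μ + (w'v)_μ)). The sketch below checks this against brute-force enumeration of hidden states. It assumes a binary-binary RBM with the energy E(v, h) = -a'v - b'h - v'wh and does not call the package; all function names are hypothetical.

```julia
using LinearAlgebra

# Numerically stable log(1 + exp(x)).
softplus(x) = max(x, zero(x)) + log1p(exp(-abs(x)))

# Closed-form free energy of a binary-binary RBM (hypothetical helper).
free_energy_sketch(a, b, w, v) = -dot(a, v) - sum(softplus, b .+ w' * v)

# Brute force: F(v) = -log Σ_h exp(-E(v, h)), enumerating all 2^M hidden states.
function free_energy_bruteforce(a, b, w, v)
    M = length(b)
    Z = 0.0
    for bits in Iterators.product(ntuple(_ -> 0:1, M)...)
        h = collect(bits)
        Z += exp(dot(a, v) + dot(b, h) + dot(v, w * h))
    end
    return -log(Z)
end

a = [0.5, -1.0]
b = [0.25, 0.0, -0.5]
w = [1.0 0.0 -1.0; 0.5 2.0 0.0]
v = [1, 0]
```

The closed form costs O(N·M) per configuration, while the enumeration grows as 2^M and is only useful as a check on tiny models.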
RestrictedBoltzmannMachines.generate_sequences
— Functiongenerate_sequences(n, A = 0:1)
Returns an iterator over all sequences of length n out of the alphabet A.
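An iterator of this kind can be built from `Iterators.product`. The sketch below is one possible construction under that assumption; the package may order or represent the sequences differently, and `sequences_sketch` is a hypothetical name.

```julia
# Hypothetical sketch: lazily iterate over all length-n tuples drawn from A.
sequences_sketch(n, A = 0:1) = Iterators.product(ntuple(_ -> A, n)...)

seqs = collect(sequences_sketch(3))          # all binary sequences of length 3
dna = collect(sequences_sketch(2, "ACGT"))   # all pairs over a 4-letter alphabet
```

The number of sequences is length(A)^n, so collecting the iterator is only practical for small n.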
RestrictedBoltzmannMachines.initialize!
— Functioninitialize!(rbm, [data]; ϵ = 1e-6)
Initializes the RBM and returns it. If provided, matches average visible unit activities from data.
initialize!(layer, [data]; ϵ = 1e-6)
Initializes a layer and returns it. If provided, matches average unit activities from data.
RestrictedBoltzmannMachines.initialize_w!
— Methodinitialize_w!(rbm, data; λ = 0.1)
Initializes rbm.w such that typical inputs to hidden units are λ.
RestrictedBoltzmannMachines.inputs_h_from_v
— Methodinputs_h_from_v(rbm, v)
Interaction inputs from visible to hidden layer.
RestrictedBoltzmannMachines.inputs_v_from_h
— Methodinputs_v_from_h(rbm, h)
Interaction inputs from hidden to visible layer.
RestrictedBoltzmannMachines.interaction_energy
— Methodinteraction_energy(rbm, v, h)
Weight mediated interaction energy.
RestrictedBoltzmannMachines.log_likelihood
— Methodlog_likelihood(rbm, v)
Log-likelihood of v under rbm, with the partition function computed by extensive enumeration. For discrete layers, this is exponentially slow for large machines.
RestrictedBoltzmannMachines.log_partition
— Methodlog_partition(rbm)
Log-partition of rbm, computed by extensive enumeration of visible states (except for particular cases such as Gaussian-Gaussian RBM). This is exponentially slow for large machines.
If your RBM has a smaller hidden layer, consider mirroring the layers of the rbm first (see mirror).
RestrictedBoltzmannMachines.log_partition_zero_weight
— Methodlog_partition_zero_weight(rbm)
Log-partition function of a zero-weight version of rbm.
RestrictedBoltzmannMachines.log_pseudolikelihood
— Methodlog_pseudolikelihood(rbm, v; exact = false)
Log-pseudolikelihood of v. If exact is true, the exact pseudolikelihood is returned, but this is slow if v consists of many samples. Therefore by default exact is false, in which case the result is a stochastic approximation, where a random site is selected for each sample and its conditional probability is calculated. On average the results with exact = false coincide with the deterministic result, and the estimate becomes more precise as the number of samples increases.
RestrictedBoltzmannMachines.log_pseudolikelihood_exact
— Methodlog_pseudolikelihood_exact(rbm, v)
Log-pseudolikelihood of v. This function computes the exact pseudolikelihood, doing traces over all sites. Note that this can be slow for a large number of samples.
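For a binary visible layer, the conditional probability of one site given the others depends only on the free-energy difference between v and the configuration with that site flipped: log p(v_i | v_{-i}) = -log(1 + exp(F(v) - F(v_flipped))). The sketch below averages this over all sites of a single sample. The averaging convention, the binary-binary closed form for F, and all function names are assumptions made for illustration, not the package's definitions.

```julia
using LinearAlgebra

softplus(x) = max(x, zero(x)) + log1p(exp(-abs(x)))

# Closed-form free energy of a binary-binary RBM (hypothetical helper).
free_energy_sketch(a, b, w, v) = -dot(a, v) - sum(softplus, b .+ w' * v)

# Hypothetical sketch: exact log-pseudolikelihood of one binary sample,
# averaged over sites (assumed convention).
function log_pseudolikelihood_sketch(a, b, w, v)
    F = free_energy_sketch(a, b, w, v)
    total = 0.0
    for i in eachindex(v)
        v_flip = copy(v)
        v_flip[i] = 1 - v[i]                      # flip site i
        F_flip = free_energy_sketch(a, b, w, v_flip)
        total -= softplus(F - F_flip)             # log p(v_i | other sites)
    end
    return total / length(v)
end

lpl = log_pseudolikelihood_sketch(zeros(2), [0.25, 0.0, -0.5], zeros(2, 3), [1, 0])
```

With zero fields and zero weights every site is a fair coin, so each conditional equals 1/2. A stochastic variant would evaluate only one randomly chosen site per sample instead of looping over all of them.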
RestrictedBoltzmannMachines.log_pseudolikelihood_sites
— Methodlog_pseudolikelihood_sites(rbm, v, sites)
Log-pseudolikelihood of a site conditioned on the other sites, where sites is an array of site indices (CartesianIndex), one for each sample. Returns an array of log-pseudolikelihood values, one for each sample.
RestrictedBoltzmannMachines.log_pseudolikelihood_stoch
— Methodlog_pseudolikelihood_stoch(rbm, v)
Log-pseudolikelihood of v. This function computes a stochastic approximation, by doing a trace over random sites for each sample. For a large number of samples, this is on average close to the exact value of the pseudolikelihood.
RestrictedBoltzmannMachines.logmeanexp
— Methodlogmeanexp(A; dims=:)
Computes log.(mean(exp.(A); dims)), in a numerically stable way.
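The usual way to stabilize this computation is to subtract the maximum before exponentiating. The sketch below shows that idea for a reduction over all elements (the dims keyword is omitted); `logmeanexp_sketch` is a hypothetical name and the package's implementation may differ in detail.

```julia
using Statistics

# Hypothetical sketch: stable log(mean(exp.(A))) over all elements of A,
# obtained by factoring out exp(maximum(A)).
function logmeanexp_sketch(A)
    m = maximum(A)
    return m + log(mean(exp.(A .- m)))
end

big = [1000.0, 1000.0]
naive = log(mean(exp.(big)))        # overflows, since exp(1000) is Inf in Float64
stable = logmeanexp_sketch(big)
```

After the shift, the largest exponent is exactly zero, so the sum cannot overflow, and underflow of the smaller terms only loses negligible contributions.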
RestrictedBoltzmannMachines.logstdexp
— Methodlogstdexp(A; dims=:)
Computes log.(std(exp.(A); dims)), in a numerically stable way.
RestrictedBoltzmannMachines.logvarexp
— Methodlogvarexp(A; dims=:)
Computes log.(var(exp.(A); dims)), in a numerically stable way.
RestrictedBoltzmannMachines.mean_h_from_v
— Methodmean_h_from_v(rbm, v)
Mean unit activation values, conditioned on the other layer, <h | v>.
RestrictedBoltzmannMachines.mean_v_from_h
— Methodmean_v_from_h(rbm, h)
Mean unit activation values, conditioned on the other layer, <v | h>.
RestrictedBoltzmannMachines.metropolis!
— Methodmetropolis!(v, rbm; β = 1)
Metropolis-Hastings sampling from rbm at inverse temperature β. Uses v[:,:,..,:,1] as initial configurations, and writes the Monte-Carlo chains in v[:,:,..,:,2:end].
RestrictedBoltzmannMachines.metropolis
— Methodmetropolis(rbm, v; β = 1, steps = 1)
Metropolis-Hastings sampling from rbm at inverse temperature β, starting from configuration v. Moves are proposed by normal Gibbs sampling.
RestrictedBoltzmannMachines.mirror
— Methodmirror(rbm)
Returns a new RBM with visible and hidden layers flipped.
RestrictedBoltzmannMachines.mode_h_from_v
— Methodmode_h_from_v(rbm, v)
Mode unit activations, conditioned on the other layer.
RestrictedBoltzmannMachines.mode_v_from_h
— Methodmode_v_from_h(rbm, h)
Mode unit activations, conditioned on the other layer.
RestrictedBoltzmannMachines.moving_average
— Methodmoving_average(A, m)
Moving average of A with window size m.
RestrictedBoltzmannMachines.onehot_decode
— Methodonehot_decode(X)
Given a one-hot encoded array X of N + 1 dimensions, returns the equivalent categorical array of N dimensions.
RestrictedBoltzmannMachines.onehot_encode
— Functiononehot_encode(A, code)
Given an array A of N dimensions, returns a one-hot encoded BitArray of N + 1 dimensions where single entries of the first dimension are one.
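The round trip between categorical and one-hot arrays can be sketched in plain Julia as follows. The sketch assumes the categories are the integers 1:q and takes q directly, which may differ from how the package's code argument is interpreted; both function names are hypothetical.

```julia
# Hypothetical sketch: one-hot encode integer categories 1:q along a new first dimension.
function onehot_encode_sketch(A::AbstractArray{<:Integer}, q::Integer)
    X = falses(q, size(A)...)
    for I in CartesianIndices(A)
        X[A[I], I] = true          # set the entry of the active class
    end
    return X
end

# Hypothetical sketch: recover the class index as the position of the `true` entry.
onehot_decode_sketch(X::AbstractArray{Bool}) =
    dropdims(map(I -> I[1], argmax(X; dims = 1)); dims = 1)

A = [1 3; 2 2]
X = onehot_encode_sketch(A, 3)     # BitArray of size (3, 2, 2)
```

Exactly one entry along the first dimension is set for each site, which is the layout a Potts layer expects.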
RestrictedBoltzmannMachines.pcd!
— Methodpcd!(rbm, data)
Trains the RBM on data using persistent contrastive divergence (PCD).
RestrictedBoltzmannMachines.raise
— Methodraise(rbm::RBM, βs; v, init)
Reverse AIS estimator of the log-partition function of rbm. While aise tends to underestimate the log of the partition function, raise tends to overestimate it. v must be an equilibrated sample from rbm.
Tip: Use logmeanexp
If F = raise(...), then -logmeanexp(-F), using the function logmeanexp provided in this package, tends to give a better approximation of log(Z) than mean(F).
Tip: Sandwiching the log-partition function
If Rf = aise(...), Rr = raise(...) are the AIS and reverse AIS estimators, we have the stochastic bounds logmeanexp(Rf) ≤ log(Z) ≤ -logmeanexp(-Rr).
RestrictedBoltzmannMachines.randgumbel
— Methodrandgumbel(T = Float64)
Generates a random Gumbel variate.
RestrictedBoltzmannMachines.randnt
— Methodrandnt([rng], a)
Random standard normal lower truncated at a (that is, Z ≥ a).
RestrictedBoltzmannMachines.randnt_half
— Methodrandnt_half([rng], μ, σ)
Samples the normal distribution with mean μ and standard deviation σ truncated to positive values.
RestrictedBoltzmannMachines.reconstruction_error
— Methodreconstruction_error(rbm, v; steps = 1)
Stochastic reconstruction error of v.
RestrictedBoltzmannMachines.rescale_activations!
— Methodrescale_activations!(layer, λ::AbstractArray)
For continuous layers with scale parameters, re-parameterizes such that unit activations are divided by λ, and returns true. For other layers, does nothing and returns false.
rescale_hidden!(rbm, λ::AbstractArray)
For continuous hidden units with a scale parameter, scales parameters such that hidden unit activations are divided by λ, and returns true. For other hidden units, does nothing and returns false. The modified RBM is equivalent to the original one.
RestrictedBoltzmannMachines.rescale_weights!
— Methodrescale_weights!(rbm, λ::AbstractArray)
For continuous hidden units with a scale parameter, scales parameters such that the weights attached to each hidden unit have norm 1.
RestrictedBoltzmannMachines.reshape_maybe
— Methodreshape_maybe(x, shape)
Like reshape(x, shape), except that zero-dimensional outputs are returned as scalars.
RestrictedBoltzmannMachines.sample_h_from_h
— Methodsample_h_from_h(rbm, h; steps=1)
Samples a hidden configuration conditional on another hidden configuration h. Ensures type stability by requiring that the returned array is of the same type as h.
RestrictedBoltzmannMachines.sample_h_from_v
— Methodsample_h_from_v(rbm, v)
Samples a hidden configuration conditional on the visible configuration v.
RestrictedBoltzmannMachines.sample_v_from_h
— Methodsample_v_from_h(rbm, h)
Samples a visible configuration conditional on the hidden configuration h.
RestrictedBoltzmannMachines.sample_v_from_v
— Methodsample_v_from_v(rbm, v; steps=1)
Samples a visible configuration conditional on another visible configuration v. Ensures type stability by requiring that the returned array is of the same type as v.
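One step of this visible-to-visible sampling is a round of block Gibbs sampling: sample h given v, then sample a new v given h. The sketch below writes this out for a binary-binary RBM, where both conditionals are independent Bernoulli distributions with sigmoid means. It is an illustration under that assumption, not the package's implementation, and `gibbs_step_sketch` is a hypothetical name. Unlike the documented function, the sketch returns a BitVector regardless of the type of v.

```julia
using Random

sigmoid(x) = 1 / (1 + exp(-x))

# Hypothetical sketch: one block Gibbs step v -> h -> v for a binary-binary RBM
# with visible fields a, hidden fields b and N x M weights w.
function gibbs_step_sketch(rng, a, b, w, v)
    h = rand(rng, length(b)) .< sigmoid.(b .+ w' * v)   # h ~ P(h | v)
    return rand(rng, length(a)) .< sigmoid.(a .+ w * h) # v ~ P(v | h)
end

rng = MersenneTwister(42)
a = [0.5, -1.0]
b = [0.25, 0.0, -0.5]
w = [1.0 0.0 -1.0; 0.5 2.0 0.0]
v_new = gibbs_step_sketch(rng, a, b, w, [1, 0])
```

Repeating the step several times corresponds to the steps keyword in the documented signature.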
RestrictedBoltzmannMachines.sitedims
— Methodsitedims(layer)
Number of dimensions of layer, with special handling of Potts layer, for which the first dimension doesn't count as a site dimension.
RestrictedBoltzmannMachines.sitesize
— Methodsitesize(layer)
Size of layer, with special handling of Potts layer, for which the first dimension doesn't count as a site dimension.
RestrictedBoltzmannMachines.sqrt1half
— Methodsqrt1half(x)
Accurate computation of sqrt(1 + (x/2)^2) + |x|/2.
RestrictedBoltzmannMachines.substitution_matrix_exhaustive
— Functionsubstitution_matrix_exhaustive(rbm, v)
Returns a q x N x B tensor of free energies F, where q is the number of possible values of each site, N the sequence length, and B the number of data points:
q, N, B = size(v)
Thus F and v have the same size. The entry F[x,i,b] gives the free energy cost of changing site i of v[b] from its original value to x, that is:
F[x,i,b] = free_energy(rbm, v_) - free_energy(rbm, v[b])
where v_ is the same as v[b] in all sites except i, where v_ has the value x.
Note that i can be a set of indices.
RestrictedBoltzmannMachines.substitution_matrix_sites
— Functionsubstitution_matrix_sites(rbm, v, sites)
Returns a q x B matrix of free energies F, where q is the number of possible values of each site, and B the number of data points. The entry F[x,b] equals the free energy cost of changing site[b] of v[b] to x, that is (schematically):
F[x, b] = free_energy(rbm, v_) - free_energy(rbm, v)
where v = v[b], and v_ is the same as v in all sites except site[b], where v_ has the value x.
RestrictedBoltzmannMachines.tnmean
— Methodtnmean(a)
Mean of the standard normal distribution, truncated to the interval (a, +∞).
RestrictedBoltzmannMachines.tnmeanvar
— Methodtnmeanvar(a)
Mean and variance of the standard normal distribution truncated to the interval (a, +∞). Equivalent to tnmean(a), tnvar(a) but saves some common computations. WARNING: tnvar(a) can fail for very large values of a.
RestrictedBoltzmannMachines.tnvar
— Methodtnvar(a)
Variance of the standard normal distribution, truncated to the interval (a, +∞). WARNING: Fails for very large values of a.
RestrictedBoltzmannMachines.total_mean_from_inputs
— Functiontotal_mean_from_inputs(layer, inputs; wts = nothing)
Total mean of unit activations from inputs.
RestrictedBoltzmannMachines.total_meanvar_from_inputs
— Functiontotal_meanvar_from_inputs(layer, inputs; wts = nothing)
Total mean and total variance of unit activations from inputs.
RestrictedBoltzmannMachines.total_var_from_inputs
— Functiontotal_var_from_inputs(layer, inputs; wts = nothing)
Total variance of unit activations from inputs.
RestrictedBoltzmannMachines.var_h_from_v
— Methodvar_h_from_v(rbm, v)
Variance of unit activation values, conditioned on the other layer, var(h | v).
RestrictedBoltzmannMachines.var_v_from_h
— Methodvar_v_from_h(rbm, h)
Variance of unit activation values, conditioned on the other layer, var(v | h).
RestrictedBoltzmannMachines.vstack
— Methodvstack(x)
Stack arrays along a new dimension inserted on the left.
RestrictedBoltzmannMachines.vwiden
— Methodvwiden(x)
Adds a singleton dimension on the left.
RestrictedBoltzmannMachines.wmean
— Methodwmean(A; wts = nothing, dims = :)
Weighted mean of A along dimensions dims, weighted by wts.
\[\frac{\sum_i A_i w_i}{\sum_i w_i}\]
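As a small worked instance of the weighted-mean formula, the sketch below reduces over all elements of a vector (the dims keyword is omitted). The name `wmean_sketch` is hypothetical and the code does not call the package.

```julia
# Hypothetical sketch: weighted mean over all elements, Σ A_i w_i / Σ w_i.
wmean_sketch(A, w) = sum(A .* w) / sum(w)

A = [1.0, 2.0, 3.0]
w = [1.0, 1.0, 2.0]
m = wmean_sketch(A, w)   # (1*1 + 2*1 + 3*2) / (1 + 1 + 2)
```

With all weights equal, the result reduces to the ordinary mean.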
RestrictedBoltzmannMachines.wsum
— Methodwsum(A; wts = nothing, dims = :)
Weighted sum of A along dimensions dims, weighted by wts.
\[\sum_i A_i w_i\]
RestrictedBoltzmannMachines.zerosum!
— Methodzerosum!(rbm)
In-place zero-sum gauge on rbm.
RestrictedBoltzmannMachines.zerosum!
— Methodzerosum!(∂, rbm)
Projects the gradient so that it doesn't modify the zerosum gauge.
RestrictedBoltzmannMachines.zerosum
— Methodzerosum(rbm)
Returns an equivalent rbm in zerosum gauge. Only affects Potts layers. If the rbm doesn't have Potts layers, does nothing.
RestrictedBoltzmannMachines.∂cgf
— Function∂cgf(layer, inputs = 0; wts = 1)
Unit activation moments, conjugate to layer parameters. These are obtained by differentiating cgfs with respect to the layer parameters. Averages over configurations (weighted by wts).
RestrictedBoltzmannMachines.∂energy
— Method∂energy(layer, data; wts = nothing)
Derivative of average energy of data with respect to layer parameters.
RestrictedBoltzmannMachines.∂free_energy
— Method∂free_energy(rbm, v)
Gradient of free_energy(rbm, v) with respect to model parameters. If v consists of multiple samples (batches), then an average is taken.
RestrictedBoltzmannMachines.∂regularize!
— Method∂regularize!(∂, rbm; l2_fields = 0, l1_weights = 0, l2_weights = 0, l2l1_weights = 0)
Updates RBM gradients ∂, with the regularization gradient.