
Piecewise Normalizing Flows

arXiv:2305.02930v1 [stat.ML] 4 May 2023
H. T. J. Bevins^{1,2,*}, Will J. Handley^{1,2}
^1 Cavendish Astrophysics, University of Cambridge, Cavendish Laboratory, JJ Thomson Avenue, Cambridge, CB3 0HE
^2 Kavli Institute for Cosmology, University of Cambridge, Madingley Road, Cambridge, CB3 0HA
^* htjb2@cam.ac.uk
Abstract
Normalizing flows are an established approach for modelling complex probability
densities through invertible transformations from a base distribution. However, the
accuracy with which the target distribution can be captured by the normalizing
flow is strongly influenced by the topology of the base distribution. A mismatch
between the topology of the target and the base can result in poor performance, as
is the case for multi-modal problems. A number of different works have attempted
to modify the topology of the base distribution to better match the target, either
through the use of Gaussian Mixture Models [Izmailov et al., 2020, Ardizzone
et al., 2020, Hagemann and Neumayer, 2021] or learned accept/reject sampling
[Stimper et al., 2022]. We introduce piecewise normalizing flows which divide
the target distribution into clusters, with topologies that better match the standard
normal base distribution, and train a series of flows to model complex multi-modal
targets. The piecewise nature of the flows can be exploited to significantly reduce
the computational cost of training through parallelization. We demonstrate the
performance of the piecewise flows using standard benchmarks and compare the
accuracy of the flows to the approach taken in Stimper et al. [2022] for modelling
multi-modal distributions.
1 Introduction
Normalizing flows (NFs) are an effective tool for capturing probability distributions given a set of
target samples. They are bijective and differentiable transformations between a base distribution,
often a standard normal distribution, and the target distribution [Papamakarios et al., 2019].
NFs have been applied successfully in a number of fields such as approximating Boltzmann distributions [Wirnsberger et al., 2020], cosmology [Alsing and Handley, 2021, Bevins et al., 2022a,
Friedman and Hassan, 2022], image generation [Ho et al., 2019, Grcić et al., 2021], audio synthesis
[van den Oord et al., 2018], density estimation [Papamakarios et al., 2017, Huang et al., 2018] and
variational inference [Rezende and Mohamed, 2015] among others.
In situations where the target distribution is multi-modal, NFs can take a long time to train well,
and often feature ‘bridges’ between the different modes that are not present in the training data (see
Fig. 1). These bridges arise due to differences between the topology of the base distribution and the
often complex target distribution [Cornish et al., 2019] and are a result of the homeomorphic nature of
the NFs [Runde, 2007].
One approach to deal with multi-modal distributions is to modify the topology of the base distribution.
This was demonstrated in Stimper et al. [2022], in which the authors developed a base distribution
[Figure 1 panels, left to right: Target; MAF, Gaussian base (e.g. Papamakarios et al. 2017); Piecewise MAF, Gaussian base (this work).]
Figure 1: A simple demonstration of the piecewise normalizing flow (NF) described in this paper
and a single NF trained on the same target distribution. We train each NF in the piecewise approach,
of which there are three, and the single NF using early stopping. For the single NF we see bridging
between the different clusters that is not present in the piecewise approach. We use the same MAF
architecture for each network in the piecewise MAF and for the single MAF case.
derived using learned rejection sampling. Typically, the bijective nature of normalizing flows is
lost when new tools are developed to learn multi-modal probability distributions; however, this was
not the case for the resampled base distribution approach discussed in Stimper et al. [2022].
There are a number of other approaches to resolving the mismatch between base and target distribution
topologies, such as continuously indexed flows [Cornish et al., 2019], augmenting the space the
model fits in [Huang et al., 2018] and adding stochastic layers [Wu et al., 2020], among others.
A few other works have attempted to use Gaussian mixture models for the base distribution and have
extended various NF structures to perform classification while training. For example, Izmailov et al.
[2020] showed that a high degree of accuracy can be achieved if data are classified into modes in
the latent space based on a handful of labelled samples. Ardizzone et al. [2020] use the information
bottleneck to perform classification via a Gaussian mixture model. Similar ideas were explored in
Hagemann and Neumayer [2021].
Our main contribution is to classify samples from our target distribution using a clustering algorithm
and train normalizing flows on the resultant clusters in a piecewise approach where each piece
constitutes a NF with a Gaussian base distribution. This improves the accuracy of the flow and can in
principle reduce the computational resources required for accurate training for two reasons. Firstly,
it is parallelizable, since each piece can be trained simultaneously; secondly, if sufficient clusters
are used, the topology of each piece of the target distribution is closer to that of the Gaussian base
distribution. We compare the accuracy of our results against the approach put forward in Stimper
et al. [2022] using a series of benchmark multi-modal distributions.
2 Normalizing Flows
If we have a target distribution X and a base distribution Z ∼ N(µ = 0, σ = 1), then via a change of
variables we can write
$$ p(x) = p(z)\left|\det\frac{\delta z}{\delta x}\right|. \tag{1} $$
If there is a bijective transformation f between the two, then z can be approximated by f_θ^{-1}(x) and
the change of variables formula becomes
$$ p(x) = p\!\left(f_\theta^{-1}(x)\right)\left|\det\frac{\delta f_\theta^{-1}(x)}{\delta x}\right|, \tag{2} $$
where θ are the parameters describing the transformation from the base to the target distribution. Here,
the absolute value of the determinant of the Jacobian represents the volume contraction from p_Z to p_X. The
transformation f corresponds to the normalizing flow, and in our work we use Masked Autoregressive
Flows (MAFs) which were proposed for density estimation by Papamakarios et al. [2017] and are
based on the Masked Autoencoder for Distribution Estimation (MADE) outlined in Germain et al.
[2015]. θ are the weights and biases of the MADE networks that make up the flow.
The N-dimensional target distribution is decomposed into a series of one-dimensional conditional
distributions
$$ p(x) = \prod_i p(x_i \mid x_1, x_2, ..., x_{i-1}), \tag{3} $$
where the index i represents the dimension. Each conditional probability is modelled as a Gaussian
distribution with some standard deviation, σ_i, and mean, µ_i, which can be written as
$$ P(x_i \mid x_1, x_2, ..., x_{i-1}) = \mathcal{N}\big(\mu_i(x_1, x_2, ..., x_{i-1}),\ \sigma_i(x_1, x_2, ..., x_{i-1})\big). \tag{4} $$
σ_i and µ_i are functions of θ and are output from the network. The optimum values of the network
weights and biases are arrived at by training the network to minimize the KL-divergence between the
target distribution and the reconstructed distribution
$$ D\big(p(x)\,\|\,p_\theta(x)\big) = -\mathbb{E}_{p(x)}[\log p_\theta(x)] + \mathbb{E}_{p(x)}[\log p(x)]. \tag{5} $$
The second term in the KL divergence is independent of the network, so for a minimization
problem we can ignore it, and
$$ -\mathbb{E}_{p(x)}[\log p_\theta(x)] = -\frac{1}{N}\sum_{j=0}^{N} \log p_\theta(x_j), \tag{6} $$
where j corresponds to a sample in the set of N samples. Our optimization problem becomes
$$ \underset{\theta}{\operatorname{argmax}} \sum_{j=0}^{N} \log p_\theta(x_j), \tag{7} $$
or equivalently
$$ \underset{\theta}{\operatorname{argmin}}\ -\sum_{j=0}^{N} \log p_\theta(x_j), \tag{8} $$
and by a change of variables
$$ \underset{\theta}{\operatorname{argmin}}\ -\sum_{j=0}^{N} \left[ \log p_\theta(z'_j) + \log\left|\frac{\delta z'_j}{\delta x_j}\right| \right], \tag{9} $$
where z' is a function of θ [Alsing et al., 2019, Stimper et al., 2022] given by
$$ z'_j = \frac{x_j - \mu_j}{\sigma_j}. \tag{10} $$
Typically, many MADE architectures are chained together to improve the expressivity of the MAF
[Papamakarios et al., 2017].
3 Clustering and Normalizing Flows
We propose a piecewise approach to learning complex multi-modal distributions, in which we
decompose the target distribution into clusters and train individual MAFs on each cluster.
In practice, we have k clusters, and for each cluster we train a MAF such that we have the sets
$$ x = \{x_1, x_2, ..., x_k\} \quad \mathrm{and} \quad p_\theta = \{p_{\theta_1}, p_{\theta_2}, ..., p_{\theta_k}\}. \tag{11} $$
Typically, if the samples from our target distribution have been drawn with some weights (i.e. using
an MCMC [e.g. Foreman-Mackey et al., 2013] or Nested Sampling [Skilling, 2006, Handley et al.,
2015] algorithm), we write our loss function over N samples as
$$ \underset{\theta}{\operatorname{argmin}}\ -\sum_{j=0}^{N} w_j \log p_\theta(x_j). \tag{12} $$
[Figure 2 panels: k = 2, s = 0.421; k = 4, s = 0.536; k = 6, s = 0.515; k = 8, s = 0.665; k = 10, s = 0.572; k = 12, s = 0.473.]
Figure 2: The figure shows how the silhouette score, s, can be used to determine the number of
clusters needed to describe the data. In the example s peaks for eight clusters corresponding to one
for each Gaussian in the circle.
Each set of samples in each cluster has associated weights
$$ w = \{w_1, w_2, ..., w_k\}. \tag{13} $$
If we have many clusters, then our total loss is given by
$$ \underset{\theta}{\operatorname{argmin}}\ -\left[ \sum_{j=0}^{N_0} w_{0,j} \log p_{\theta_1}(x_{0,j}) + \sum_{j=0}^{N_1} w_{1,j} \log p_{\theta_2}(x_{1,j}) + ... + \sum_{j=0}^{N_k} w_{k,j} \log p_{\theta_k}(x_{k,j}) \right], \tag{14} $$
which can be written as
$$ \underset{\theta}{\operatorname{argmin}}\ -\sum_{k=0}^{N_{\mathrm{cluster}}} \sum_{j=0}^{N_k} w_{k,j} \log p_{\theta_k}(x_{k,j}), \tag{15} $$
where N_cluster corresponds to the number of clusters.
In our analysis each MAF, pθk (xk ) is trained separately using an ADAM optimizer [Kingma and Ba,
2014]. Since each cluster constitutes a simpler target distribution than the whole and we can train
on different clusters in parallel, we find that we need fewer computing resources when training a
piecewise NF in comparison to training a single NF on the same problem.
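As an illustration of this parallelization, the sketch below farms the per-cluster training out to separate processes. Here `train_flow` is a hypothetical function that minimizes the weighted negative log-likelihood of equation (15) for a single cluster and returns the trained MAF; it is not the interface of the code used in this work.

```python
from concurrent.futures import ProcessPoolExecutor

def train_piecewise(cluster_samples, cluster_weights, train_flow):
    """Train one MAF per cluster in parallel.

    cluster_samples[k] and cluster_weights[k] hold the samples and weights
    assigned to cluster k; `train_flow(x_k, w_k)` fits a single MAF.
    """
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(train_flow, x_k, w_k)
                   for x_k, w_k in zip(cluster_samples, cluster_weights)]
        return [future.result() for future in futures]
```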
To determine the number of clusters needed to best describe the target distribution, we use an average
silhouette score, where the silhouette score for an individual sample is a measure of how similar a
sample is with the samples in its cluster compared to samples outside its cluster [Rousseeuw, 1987].
The silhouette score ranges between -1 and 1 and peaks when the clusters are clearly separated and
well-defined, as in Fig. 2.
Our approach is independent of the choice of clustering algorithm, however in the analysis in this
paper we use K-means [Steinhaus, 1957, MacQueen, 1967].
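One possible implementation of this selection step, assuming scikit-learn's KMeans and silhouette_score (the paper does not prescribe a particular library), is:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def choose_n_clusters(samples, k_values=range(2, 13)):
    """Return the k that maximizes the average silhouette score."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(samples)
        scores[k] = silhouette_score(samples, labels)
    return max(scores, key=scores.get), scores
```

For the example in Fig. 2, this procedure would return k = 8, where the average silhouette score peaks.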
Once trained, we can draw samples from the clusters in our piecewise normalizing flow with a weight
given by the number of samples in the target cluster relative to the total number of samples in the
target distribution, w_k = N_k/N. This process is shown in Fig. 3 along with the general flow of
classification and training.
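A minimal sketch of this sampling step, assuming each trained piece exposes a hypothetical `sample(n)` method, is:

```python
import numpy as np

def sample_piecewise(flows, weights, n_samples, rng=None):
    """Draw n_samples from the piecewise flow.

    A cluster k is chosen with probability w_k = N_k / N and the draw is then
    delegated to that cluster's trained flow.
    """
    rng = rng or np.random.default_rng()
    counts = rng.multinomial(n_samples, weights)
    draws = [flows[k].sample(n) for k, n in enumerate(counts) if n > 0]
    return np.concatenate(draws, axis=0)
```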
[Figure 3 schematic: samples x ∼ p(x) are passed to the classifier, split into clusters of sizes N_1, ..., N_4, a MAF is trained on each cluster (MAF_{k=1}, ..., MAF_{k=4}), and samples x ∼ p_θ(x) are drawn with weights w_k = N_k/N.]
Figure 3: The figure shows the training and sampling process for our piecewise NFs. From left to
right, we start with a set of samples x drawn from p(x) and classify the samples into clusters using
a K-means algorithm, determining the number of clusters to use based on the silhouette score. We
then train a Masked Autoregressive Flow on each cluster k using a standard normal distribution as
our base. When drawing samples from pθ (x) we select which MAF to draw from using weights
defined according to the ratio of the number of samples in the cluster and the number of samples in
the original target distribution.
Distribution         | KL-Divergence
                     | MAF, Gaussian Base              | Real NVP, Resampled Base | Piecewise MAF, Gaussian Base
                     | e.g. Papamakarios et al. [2017] | Stimper et al. [2022]    | This work
Two Moons            | 0.026 ± 0.041                   | −0.002 ± 0.012           | 0.001 ± 0.013
Circle of Gaussians  | 0.365 ± 0.352                   | 0.081 ± 0.157            | 0.017 ± 0.022
Two Rings            | 0.071 ± 0.557                   | 0.025 ± 0.103            | 0.005 ± 0.095
Table 1: The table shows the KL-divergence between the target samples for a series of multi-modal
distributions and the equivalent representations with a MAF only, a Real NVP with a resampled base
distribution [e.g. Stimper et al., 2022] and a piecewise MAF. The corresponding distributions are
shown in Fig. 4 and we have used early stopping for training. We find that our piecewise approach is
competitive with the approach detailed in Stimper et al. [2022] for modelling multi-modal probability
distributions. We are effectively modifying the topology of the target distribution to be more like the
base distribution by splitting it into clusters, whereas Stimper et al. [2022] modified the topology of
the base distribution to be more like the target distribution. Negative KL divergences are a result of
the Monte Carlo nature of the calculation.
4 Benchmarking
We test our piecewise normalizing flow on a series of multi-modal problems that are explored in
Stimper et al. [2022]. For each, we model the distribution using a single MAF with a Gaussian
base distribution, a real NVP with a resampled base distribution as in Stimper et al. [2022] and our
piecewise MAF approach. We then calculate the Kullback-Leibler divergence between the original
target distribution and the trained flows.
The log-probability of our piecewise MAF for a sample, x, is given by
$$ \log p_\theta(x) = \log\left[ \sum_{k=0}^{N_{\mathrm{cluster}}} \exp\big(\log p_{\theta_k}(x)\big) \right], \tag{16} $$
from which we can calculate the Kullback-Leibler divergence and an associated Monte Carlo error
$$ D_{\mathrm{KL}} = \left\langle \log\frac{p_\theta(x)}{p(x)} \right\rangle_{p_\theta(x)} \pm \frac{1}{\sqrt{N}}\,\sigma\!\left(\log\frac{p_\theta(x)}{p(x)}\right)_{p_\theta(x)}, \tag{17} $$
where p(x) is the target distribution, x ∼ p_θ(x), and σ(f)_g is the standard deviation of f with respect
to g.
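In code, equations (16) and (17) amount to a log-sum-exp over the per-cluster densities followed by a Monte Carlo average. The sketch below assumes hypothetical `log_prob` evaluators for the trained pieces and the target; it is illustrative rather than the implementation used here.

```python
import numpy as np
from scipy.special import logsumexp

def piecewise_log_prob(x, flows):
    """Eq. (16): log p_theta(x) = log sum_k exp(log p_theta_k(x))."""
    log_pks = np.stack([flow.log_prob(x) for flow in flows], axis=0)
    return logsumexp(log_pks, axis=0)

def kl_divergence(log_p_theta, log_p_target):
    """Eq. (17): Monte Carlo KL estimate and its error from samples x ~ p_theta(x)."""
    diff = log_p_theta - log_p_target
    return diff.mean(), diff.std() / np.sqrt(len(diff))
```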
We show the target distributions and samples drawn from each type of flow, a MAF, a piecewise
MAF and a real NVP with a resampled base distribution, in Fig. 4 and report the corresponding
KL-divergences in Tab. 1. We train each flow ten times and calculate an average KL-divergence
for each flow and distribution. We add the associated Monte Carlo errors in quadrature to get an
error on the mean KL-divergence. We find that our piecewise approach outperforms the resampled
base distribution approach of Stimper et al. [2022] for all but one of our benchmarks, where the
models are consistent. We use the code available at https://github.com/VincentStimper/resampled-base-flows
to fit the Real NVP flows and a modified version of the code MARGARINE [Bevins et al., 2022b,a],
available at https://github.com/htjb/margarine, to fit the masked autoregressive flows and their
piecewise equivalents.
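For completeness, a small helper for the averaging described above (an illustrative sketch, not the exact script used in this work):

```python
import numpy as np

def combine_runs(kl_estimates, kl_errors):
    """Average the per-run KL estimates and combine their Monte Carlo errors in quadrature."""
    kl_estimates = np.asarray(kl_estimates)
    kl_errors = np.asarray(kl_errors)
    mean_kl = kl_estimates.mean()
    mean_err = np.sqrt(np.sum(kl_errors**2)) / len(kl_errors)
    return mean_kl, mean_err
```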
[Figure 4 panel labels: rows Two Moons, Circle of Gaussians, Two Rings; columns Target; MAF, Gaussian base (e.g. Papamakarios et al. 2017); Real NVP, resampled base (Stimper et al. 2022); Piecewise MAF, Gaussian base (this work).]
Figure 4: The graph shows a series of multimodal target distributions as 2D histograms of samples in
the first column and three representations of this distribution produced with three different types of
normalizing flow. We use early stopping, as described in the main text, to train all the normalizing
flows. The second column shows a simple Masked Autoregressive Flow with a Gaussian base
distribution based on the work in Papamakarios et al. [2017], and the latter two columns show two
approaches to improve the accuracy of the model. The first is from Stimper et al. [2022] and uses
a Real NVP flow to model the distributions while resampling the base distribution to better match
the topology of the target distribution. In the last column, we show the results from our piecewise
normalizing flows where we have effectively modified the topology of the target distribution to better
match the Gaussian base distribution by performing clustering.
Each flow in our piecewise approach is composed of a chain of six MADE networks each with two
hidden layers of 50 nodes. For all but the ‘Two Rings’ benchmark, where we increase the number of
MADEs to 10, our individual MAF has the same structure as the networks used in the piecewise flow.
As described, we use the silhouette score to determine the number of clusters needed to describe each
data set and find that we need two clusters to describe the ‘Two Moons’ distribution, eight to describe
the ‘Circle of Gaussians’ and 14 to describe the ‘Two Rings’ distribution.
We use early stopping to train all the flows in the paper, including the real NVP resampled base
models from Stimper et al. [2022]. For the single MAF and real NVP, we use 20% of the samples
from our target distribution as test samples. We early stop when the minimum test loss has not
improved for 2% of the maximum iterations requested, 10,000 for all examples in this paper, and
roll back to the optimum model. For the piecewise MAF, we split each cluster into training and test
samples and perform early stopping on each cluster according to the same criterion, meaning that
each cluster is trained for a different number of epochs.
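A sketch of this early-stopping rule, written against a hypothetical training interface (`train_step`, `test_loss`, `get_state` and `set_state` are stand-ins, not the API of the code used here):

```python
def train_with_early_stopping(flow, train_data, test_data,
                              max_iters=10_000, patience_fraction=0.02):
    """Stop when the best test loss has not improved for 2% of max_iters."""
    patience = int(patience_fraction * max_iters)
    best_loss, best_state, since_best = float("inf"), None, 0
    for _ in range(max_iters):
        flow.train_step(train_data)          # one ADAM update on the training split
        loss = flow.test_loss(test_data)     # loss on the held-out test samples
        if loss < best_loss:
            best_loss, best_state, since_best = loss, flow.get_state(), 0
        else:
            since_best += 1
        if since_best >= patience:
            break
    flow.set_state(best_state)               # roll back to the optimum model
    return flow
```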
All the experiments in this paper were run on an Apple MacBook Pro with an M2 chip and 8GB
memory.
5 Limitations
Normalizing flows are useful because they are bijective and differentiable; however, these properties
are sometimes lost when trying to tackle multi-modal distributions. Our piecewise approach is
differentiable, provided that the optimization algorithm used to classify the samples from the target
distribution into clusters is also differentiable and k, the number of clusters, is fixed during training.
Differentiability is lost if k changes; however, one can imagine an improved k-picking algorithm
where k is determined through minimization of the loss in equation (15). This is left for future work.
Our piecewise approach is bijective if we know the cluster that our sample has been drawn from.
In other words, there is a one-to-one mapping between a set of real numbers drawn from an N-dimensional
base distribution and a set of real numbers on the N-dimensional target distribution plus a
cluster number, n ∈ ℕ,
$$ \mathbb{R}^N_{\mathcal{N}(\mu=0,\sigma=1)} \Longleftrightarrow \mathbb{R}^N_{p_\theta(x)} \times \mathbb{N}. \tag{18} $$
With an improved k-picking algorithm, the dependence on the cluster number could be relinquished.
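As a sketch of the mapping in equation (18), assuming each trained piece exposes hypothetical `forward` and `inverse` transforms:

```python
def to_base(x, n, flows):
    """Given a target-space sample x and its cluster label n, map to the base space."""
    return flows[n].inverse(x)

def to_target(z, n, flows):
    """Given a base-space point z and a cluster label n, map back to the target space."""
    return flows[n].forward(z)
```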
Our approach is also limited by the complexity of the target distribution and the number of clusters
required to describe the distribution. However, the complexity of the target distribution also impacts
other methods, such as the resampled base approach of Stimper et al. [2022], and the number of
required clusters is an issue of computational power which can be overcome by training each NF in
parallel.
6 Conclusions
In this work, we demonstrate a new method for accurately learning multi-modal probability distributions with normalizing flows through the aid of an initial classification of the target distribution into
clusters.
Normalizing flows generally struggle with multi-modal distributions because there is a significant
difference in the topology of the base distribution and the target distribution to be learnt. By dividing
our target distribution up into clusters, we are essentially, for each piece, making the topology of the
target distribution more like the topology of our base distribution, leading to improved expressivity.
Stimper et al. [2022], by contrast, resampled the base distribution so that it and the target distribution
shared a more similar topology. We find that both approaches perform well on our benchmark tests,
but that our piecewise approach is better able to capture the structure of two of the distributions when
training with early stopping. The piecewise nature of our approach allows the MAFs to be trained in
parallel.
Bias in machine learning is currently an active area of concern and research [e.g. Jiang and Nachum,
2020, Mehrabi et al., 2021]. While we have focused on normalizing flows for probability density
estimation in this work, we believe that the general approach has wider applications. One can imagine
dividing a training data set into different classes and training a series of generative models to produce
new data. However, instead of drawing samples from each model based on the number of samples in
the original class split, as is done here, one could draw samples based on user-defined weights that
give equal weight to each generative model. As such, one can begin to overcome some selection
biases when training machine learning models. Of course, the 'user-determined' nature of the weights
could be used to introduce more bias into the modelling; therefore, the method through which they are
derived would have to be explicit and ethically sound.
For the time being, through the combination of clustering and normalizing flows, we are able to
produce accurate representations of complex multi-modal probability distributions.
References
Justin Alsing and Will Handley. Nested sampling with any prior you like. Monthly Notices of the
Royal Astronomical Society, 505(1):L95–L99, July 2021. doi: 10.1093/mnrasl/slab057.
Justin Alsing, Tom Charnock, Stephen Feeney, and Benjamin Wandelt. Fast likelihood-free cosmology with neural density estimators and active learning. Monthly Notices of the Royal Astronomical
Society, 488(3):4440–4458, September 2019. doi: 10.1093/mnras/stz1960.
Lynton Ardizzone, Radek Mackowiak, Carsten Rother, and Ullrich Köthe. Training normalizing
flows with the information bottleneck for competitive generative classification. Advances in Neural
Information Processing Systems, 33:7828–7840, 2020.
Harry Bevins, Will Handley, Pablo Lemos, Peter Sims, Eloy de Lera Acedo, and Anastasia Fialkov.
Marginal Bayesian Statistics Using Masked Autoregressive Flows and Kernel Density Estimators
with Examples in Cosmology. arXiv e-prints, art. arXiv:2207.11457, July 2022a.
Harry T. J. Bevins, William J. Handley, Pablo Lemos, Peter H. Sims, Eloy de Lera Acedo, Anastasia
Fialkov, and Justin Alsing. Removing the fat from your posterior samples with margarine. arXiv
e-prints, art. arXiv:2205.12841, May 2022b. doi: 10.48550/arXiv.2205.12841.
Robert Cornish, Anthony L. Caterini, George Deligiannidis, and A. Doucet. Relaxing bijectivity
constraints with continuously indexed normalising flows. In International Conference on Machine
Learning, 2019.
Daniel Foreman-Mackey, David W. Hogg, Dustin Lang, and Jonathan Goodman. emcee: The MCMC
Hammer. Publications of the Astronomical Society of the Pacific, 125(925):306, March 2013. doi:
10.1086/670067.
Roy Friedman and Sultan Hassan. HIGlow: Conditional Normalizing Flows for High-Fidelity HI
Map Modeling. arXiv e-prints, art. arXiv:2211.12724, November 2022. doi: 10.48550/arXiv.2211.
12724.
Mathieu Germain, Karol Gregor, Iain Murray, and Hugo Larochelle. MADE: Masked Autoencoder
for Distribution Estimation. arXiv e-prints, art. arXiv:1502.03509, February 2015. doi: 10.48550/
arXiv.1502.03509.
Matej Grcić, Ivan Grubišić, and Siniša Šegvić. Densely connected normalizing flows. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in
Neural Information Processing Systems, volume 34, pages 23968–23982. Curran Associates,
Inc., 2021. URL https://proceedings.neurips.cc/paper_files/paper/2021/file/
c950cde9b3f83f41721788e3315a14a3-Paper.pdf.
Paul Hagemann and Sebastian Neumayer. Stabilizing invertible neural networks using mixture
models. Inverse Problems, 37(8):085002, 2021.
W. J. Handley, M. P. Hobson, and A. N. Lasenby. POLYCHORD: next-generation nested sampling.
Monthly Notices of the Royal Astronomical Society, 453(4):4384–4398, November 2015. doi:
10.1093/mnras/stv1911.
Jonathan Ho, Xi Chen, Aravind Srinivas, Yan Duan, and Pieter Abbeel. Flow++: Improving flow-based generative models with variational dequantization and architecture design. In Kamalika
Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on
Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 2722–2730.
PMLR, 09–15 Jun 2019. URL https://proceedings.mlr.press/v97/ho19a.html.
Chin-Wei Huang, David Krueger, Alexandre Lacoste, and Aaron Courville. Neural autoregressive flows. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International
Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research,
pages 2078–2087. PMLR, 10–15 Jul 2018. URL https://proceedings.mlr.press/v80/
huang18d.html.
Pavel Izmailov, Polina Kirichenko, Marc Finzi, and Andrew Gordon Wilson. Semi-supervised
learning with normalizing flows. In International Conference on Machine Learning, pages 4615–
4630. PMLR, 2020.
Heinrich Jiang and Ofir Nachum. Identifying and correcting label bias in machine learning. In
International Conference on Artificial Intelligence and Statistics, pages 702–712. PMLR, 2020.
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In
L. M. Le Cam and J. Neyman, editors, Proc. of the fifth Berkeley Symposium on Mathematical
Statistics and Probability, volume 1, pages 281–297. University of California Press, 1967.
Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. A survey
on bias and fairness in machine learning. ACM Comput. Surv., 54(6), jul 2021. ISSN 0360-0300.
doi: 10.1145/3457607. URL https://doi.org/10.1145/3457607.
George Papamakarios, Theo Pavlakou, and Iain Murray. Masked autoregressive flow for density
estimation. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and
R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper_files/paper/2017/
file/6c1da886822c67822bcf3679d04369fa-Paper.pdf.
George Papamakarios, Eric T. Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji
Lakshminarayanan. Normalizing flows for probabilistic modeling and inference. J. Mach. Learn.
Res., 22:57:1–57:64, 2019.
Danilo Rezende and Shakir Mohamed. Variational inference with normalizing flows. In Francis Bach
and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning,
volume 37 of Proceedings of Machine Learning Research, pages 1530–1538, Lille, France, 07–09
Jul 2015. PMLR. URL https://proceedings.mlr.press/v37/rezende15.html.
Peter J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster
analysis. Journal of Computational and Applied Mathematics, 20:53–65, 1987. ISSN 0377-0427.
doi: https://doi.org/10.1016/0377-0427(87)90125-7. URL https://www.sciencedirect.com/
science/article/pii/0377042787901257.
V. Runde. A Taste of Topology. Springer, 2007.
John Skilling. Nested sampling for general Bayesian computation. Bayesian Analysis, 1(4):833 –
859, 2006. doi: 10.1214/06-BA127. URL https://doi.org/10.1214/06-BA127.
Hugo Steinhaus. Sur la division des corps matériels en parties. Bull. Acad. Pol. Sci., Cl. III, 4:
801–804, 1957. ISSN 0001-4095.
Vincent Stimper, Bernhard Schölkopf, and Jose Miguel Hernandez-Lobato. Resampling base distributions of normalizing flows. Proceedings of The 25th International Conference on Artificial
Intelligence and Statistics, 151:4915–4936, 28–30 Mar 2022. URL https://proceedings.mlr.
press/v151/stimper22a.html.
Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu,
George van den Driessche, Edward Lockhart, Luis Cobo, Florian Stimberg, Norman Casagrande,
Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex
Graves, Helen King, Tom Walters, Dan Belov, and Demis Hassabis. Parallel WaveNet: Fast
high-fidelity speech synthesis. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th
International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning
Research, pages 3918–3926. PMLR, 10–15 Jul 2018. URL https://proceedings.mlr.press/
v80/oord18a.html.
Peter Wirnsberger, Andrew J. Ballard, George Papamakarios, Stuart Abercrombie, Sébastien
Racanière, Alexander Pritzel, Danilo Jimenez Rezende, and Charles Blundell. Targeted free
energy estimation via learned mappings. The Journal of Chemical Physics, 153(14):144112, 2020.
doi: 10.1063/5.0018903. URL https://doi.org/10.1063/5.0018903.
Hao Wu, Jonas Köhler, and Frank Noé. Stochastic normalizing flows. Advances in Neural Information
Processing Systems, 33:5933–5944, 2020.