Note about Markov Chain Monte Carlo (MCMC) and Restricted Boltzmann Machine (RBM)

Recently, I have been working very hard on digging into the Restricted Boltzmann Machine (RBM) in Deep Learning. Even though I can do all the programming tasks, I still do not feel “deeply” and “fully connected” with the theory and the philosophy. Maybe this is because I lack a solid statistics background.
MCMC and Gibbs sampling are very important parts of RBMs and Deep Belief Networks, so it is important for every CS student to master these skills.
In the famous Science paper “Reducing the Dimensionality of Data with Neural Networks”, Hinton and Salakhutdinov used RBMs to initialize the weights of deep neural networks (deep autoencoders, in this paper), which eventually led to very surprising results. With the Contrastive Divergence (CD-k) method, RBMs can be trained much faster, which makes them practical for Deep Belief Networks and other frameworks. Nowadays, after 10 years of development and research, the RBM has been used in various ways to build models.

Abstract: High-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors. Gradient descent can be used for fine-tuning the weights in such “autoencoder” networks, but this works well only if the initial weights are close to a good solution. We describe an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.
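To make the Contrastive Divergence idea mentioned above more concrete, here is a minimal sketch of CD-1 training for a tiny binary RBM in plain Python. It is only an illustration under my own assumptions, not the code from the Science paper: the toy data, the layer sizes, the learning rate, and all function names (`cd1_update`, `train`, and so on) are made up for this note. Using hidden probabilities instead of sampled hidden states in the update is a common simplification.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    """P(h_j = 1 | v) for each hidden unit j."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    """P(v_i = 1 | h) for each visible unit i."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def sample(probs, rng):
    """Draw independent Bernoulli samples from a list of probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]


def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step on a single binary example; updates W, b, c in place."""
    ph0 = hidden_probs(v0, W, c)      # positive phase, driven by the data
    h0 = sample(ph0, rng)
    pv1 = visible_probs(h0, W, b)     # a single Gibbs step: h0 -> v1 -> h1
    v1 = sample(pv1, rng)
    ph1 = hidden_probs(v1, W, c)      # negative phase, driven by the model
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
    # Squared reconstruction error, used for monitoring only.
    return sum((v0[i] - pv1[i]) ** 2 for i in range(len(b)))


def train(data, n_hidden, epochs=200, lr=0.1, seed=0):
    """Train on a list of binary vectors; returns parameters and per-epoch error."""
    rng = random.Random(seed)
    n_visible = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
         for _ in range(n_visible)]
    b = [0.0] * n_visible
    c = [0.0] * n_hidden
    errors = []
    for _ in range(epochs):
        epoch_error = sum(cd1_update(v, W, b, c, lr, rng) for v in data)
        errors.append(epoch_error / len(data))
    return W, b, c, errors


# Toy data (an assumption for this sketch): two complementary 4-bit patterns.
data = [[1, 1, 0, 0], [0, 0, 1, 1]] * 5
W, b, c, errors = train(data, n_hidden=2)
print("first epoch error:", errors[0])
print("last epoch error:", errors[-1])
```

CD-k would simply repeat the v -> h -> v step k times before taking the negative-phase statistics. The chain is started at the data and stopped long before it reaches its stationary distribution, which is exactly why CD is only an approximation to the likelihood gradient.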
Here, I list some questions for future discussion:
  1. How should the number of hidden nodes in each RBM layer be decided?
  2. What is the potential damage when Gibbs sampling and the Markov chain are not strictly followed in practice? For example, we usually apply only CD-1.
  3. How can I know when the RBM has converged to stationarity? In other words, how many iterations do I need?
  4. When evaluating an RBM, why is it OK to use the reconstruction error instead of the log-likelihood?
  5. Is an RBM learning features or learning data distributions?
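
As a small aid for thinking about question 4, the sketch below computes both quantities for a toy RBM. The exact log-likelihood needs the partition function Z, which I get here by brute-force enumeration of all visible states, so this only works for a handful of visible units. Everything in it (the 4-bit data, the hand-set weights `W_fit`, the function names) is an assumption made for illustration, not a trained model or a standard API.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_input(v, W, c, j):
    """Total input to hidden unit j given visible vector v."""
    return c[j] + sum(v[i] * W[i][j] for i in range(len(v)))


def free_energy(v, W, b, c):
    """F(v) = -b.v - sum_j log(1 + exp(c_j + sum_i v_i W_ij))."""
    visible_term = sum(bi * vi for bi, vi in zip(b, v))
    hidden_term = sum(math.log1p(math.exp(hidden_input(v, W, c, j)))
                      for j in range(len(c)))
    return -visible_term - hidden_term


def log_partition(W, b, c):
    """log Z by enumerating all 2^n visible states (toy sizes only)."""
    neg_f = [-free_energy(v, W, b, c)
             for v in itertools.product([0, 1], repeat=len(b))]
    m = max(neg_f)  # log-sum-exp trick for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in neg_f))


def mean_log_likelihood(data, W, b, c):
    log_z = log_partition(W, b, c)
    return sum(-free_energy(v, W, b, c) - log_z for v in data) / len(data)


def mean_reconstruction_error(data, W, b, c):
    """Squared error of the deterministic pass v -> P(h|v) -> P(v|h)."""
    total = 0.0
    for v in data:
        ph = [sigmoid(hidden_input(v, W, c, j)) for j in range(len(c))]
        pv = [sigmoid(b[i] + sum(W[i][j] * ph[j] for j in range(len(c))))
              for i in range(len(b))]
        total += sum((vi - pi) ** 2 for vi, pi in zip(v, pv))
    return total / len(data)


data = [(1, 1, 0, 0), (0, 0, 1, 1)]
b, c = [0.0] * 4, [0.0] * 2
W_zero = [[0.0, 0.0] for _ in range(4)]                          # untrained model
W_fit = [[2.0, -2.0], [2.0, -2.0], [-2.0, 2.0], [-2.0, 2.0]]     # hand-set weights

for name, W in [("untrained", W_zero), ("hand-set", W_fit)]:
    print(name,
          mean_log_likelihood(data, W, b, c),
          mean_reconstruction_error(data, W, b, c))
```

With all parameters at zero the model is uniform over the 16 visible states, so the mean log-likelihood is -4 log 2 and the reconstruction error is 1.0; the hand-set weights improve both numbers. The enumeration cost grows as 2^n in the number of visible units, which is one practical reason people fall back on the reconstruction error as a cheap proxy, even though the two measures are not guaranteed to move together.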


Stay Hungry. Stay Foolish.

— Steven Jobs

PS: I feel very lucky to be able to attend the AMSI-SSA lecture hosted by the Department of Mathematics and Statistics at La Trobe University. You can find the details in the quotation below. I look forward to the statistician’s view of MCMC and RBMs.


You are welcome to attend the following AMSI – SSA Lecture hosted by the Department of Mathematics and Statistics at La Trobe University.
Abstract: Markov chain Monte Carlo (MCMC) algorithms, such as the Metropolis
Algorithm and the Gibbs Sampler, are extremely useful and popular for
approximately sampling from complicated probability distributions
through repeated randomness. They are frequently applied
to such diverse subjects as Bayesian statistics, physical chemistry,
medical research, financial modeling, numerical integration, and more.
This talk will use simple graphical simulations to explain how these
algorithms work, and why they are so useful. It
will also describe how mathematical analysis can provide deeper
insights into their implementation, optimisation, and convergence times,
and can even allow us to “adapt” the algorithms to improve their
performance on the fly.

Professor Rosenthal’s talk is part of the 2016 AMSI SSA Lecture Tour.

AMSI-SSA Lecturer



Jeffrey Rosenthal is a professor in the Department of Statistics at the University of Toronto. He received his BSc in Mathematics, Physics, and Computer Science from the University of Toronto at the age of 20, his Ph.D. in Mathematics from Harvard University at the age of 24, and tenure in the Department of Statistics at the University of Toronto at the age of 29.

For his research, Rosenthal was awarded the 2006 CRM-SSC Prize, and also the 2007 COPSS Presidents’ Award, the most prestigious honor bestowed by the Committee of Presidents of Statistical Societies. For his teaching, he received a Harvard University Teaching Award in 1991, and an Arts and Science Outstanding Teaching Award at the University of Toronto in 1998. He was elected to Fellowship of the Institute of Mathematical Statistics in 2005, and of the Royal Society of Canada in 2012, and was awarded the SSC Gold Medal in 2013.

Rosenthal’s book for the general public, Struck by Lightning: The Curious World of Probabilities, was published in sixteen editions and ten languages and was a bestseller in Canada. It led to numerous media and public appearances, and to his work exposing the Ontario lottery retailer scandal. Rosenthal has also published two textbooks about probability theory, and over ninety refereed research papers, many related to the field of Markov chain Monte Carlo randomised computer algorithms and to interdisciplinary applications of statistics. He has dabbled as a computer game programmer, musical performer, and improvisational comedy performer, and is fluent in French. His website is



Finally, just sharing some photos 😛 ~

Author: Caihao (Chris) Cui

Digital Scientist & Advisory: Translating the modern machine learning, deep learning, and computer vision techniques into engineering and bringing ideas to life to design a better future.
