MNIST L-BFGS dim reduction

[Embedded YouTube video]

MAP inference is run with L-BFGS, simultaneously over the latent variables and the network parameters. The architecture is a 2-100-784 MLP (with a tanh() nonlinearity), where the 2 ‘input units’ act as latent variables: each training image gets its own 2-D latent code, optimized jointly with the weights. In other words, it’s quite a simple architecture, and it’s trained on only 5000 images, but the result looks quite interesting nonetheless.
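The setup described above can be sketched roughly as follows. This is my own minimal reconstruction, not the author's code: a tiny tanh MLP decoder whose per-image 2-D latent codes and weights are packed into one vector and optimized jointly with SciPy's L-BFGS-B, with Gaussian priors (weight decay) supplying the MAP term. Random data stands in for the 5000 MNIST images, and the prior strength `lam` and iteration count are arbitrary choices for the demo.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy stand-in for the data: in the post, X would be 5000 flattened 28x28 images.
N, D, H, L = 200, 784, 100, 2          # images, pixels, hidden units, latent dims
X = rng.random((N, D))

lam = 1e-2  # strength of the Gaussian priors on latents and weights (MAP term)

# All optimized quantities: latent codes Z plus the 2-100-784 MLP parameters.
shapes = [(N, L), (L, H), (H,), (H, D), (D,)]
sizes = [int(np.prod(s)) for s in shapes]

def unpack(theta):
    out, i = [], 0
    for s, n in zip(shapes, sizes):
        out.append(theta[i:i + n].reshape(s))
        i += n
    return out  # Z, W1, b1, W2, b2

def neg_log_posterior(theta):
    Z, W1, b1, W2, b2 = unpack(theta)
    h = np.tanh(Z @ W1 + b1)           # 2 -> 100, tanh hidden layer
    Y = h @ W2 + b2                    # 100 -> 784, linear output
    R = Y - X
    loss = 0.5 * np.sum(R**2) \
         + 0.5 * lam * (np.sum(Z**2) + np.sum(W1**2) + np.sum(W2**2))
    # Manual backprop for the analytic gradient that L-BFGS needs.
    dW2 = h.T @ R + lam * W2
    db2 = R.sum(0)
    da = (R @ W2.T) * (1 - h**2)       # gradient through tanh
    dW1 = Z.T @ da + lam * W1
    db1 = da.sum(0)
    dZ = da @ W1.T + lam * Z
    grad = np.concatenate([g.ravel() for g in (dZ, dW1, db1, dW2, db2)])
    return loss, grad

theta0 = 0.1 * rng.standard_normal(sum(sizes))
res = minimize(neg_log_posterior, theta0, jac=True, method="L-BFGS-B",
               options={"maxiter": 50})
Z_opt = unpack(res.x)[0]               # the learned 2-D embedding of each image
```

Scattering `Z_opt` then gives the 2-D layout of the images; because the codes are free parameters rather than the output of an encoder, this is closer to a GPLVM-style latent variable model than to an autoencoder.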