Distributed Stochastic Neighbor Embedding
11 Jan 2018
If you want to see multidimensionally data, for example the MNIST dataset, there is a machine learning algoritm created by Geofrey Hinton and Laurens van der Maarten by 2008.
Basically comprises two stages:
- calculates the probability distribution of that each point par is similar $p_ij$.
- define a similar probability distribution on a low-dimensional map, and minimizes the diference between the two distributions using gradient descent.
Algorithm is free of use for non commercial use.
Python version (and another languages) is downloadable here
Python version is an example of MNIST dataset that has the following result:
Best way to use it is scikit-learn that has up to eight manifolds (tsne is one of them) to show high dimensional datasets.