The digits dataset is a collection of handwritten digits from 0 to 9. t-SNE is a method for “dimensionality” reduction. This is my own code for t-SNE, so it may not compare exactly with robust libraries designed by computer science doctorates.
Still.
The reason for using t-SNE is that it groups the data on its own: it never sees the labels, yet similar samples end up near each other in the low-dimensional plot.
What’s interesting is that I can get most of the numbers to land in clusters with others of the same digit. That’s a good thing. For example, all of the reds are the number 3 and are mostly clumped together.
The number “1” is an issue – why is it so scattered??
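For anyone who wants a plot along these lines without writing t-SNE from scratch, here is a minimal sketch that uses scikit-learn's `TSNE` on the digits dataset. To be clear, this is not the code that produced my plots: the function name `embed_digits`, the 300-sample subset, and the perplexity and seed values are all assumptions chosen to keep the example quick.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE


def embed_digits(n_samples=300, perplexity=30.0, seed=0):
    """Embed the first n_samples digits into 2-D with scikit-learn's t-SNE.

    Hypothetical helper for illustration; the defaults are assumptions,
    not the settings used for the plots in this post.
    """
    digits = load_digits()
    X = digits.data[:n_samples]    # each row is a flattened 8x8 image (64 values)
    y = digits.target[:n_samples]  # true labels, used only for colouring a plot
    tsne = TSNE(
        n_components=2,
        perplexity=perplexity,
        init="pca",
        random_state=seed,
    )
    # t-SNE never sees y; it only looks at distances between rows of X.
    embedding = tsne.fit_transform(X)
    return embedding, y


embedding, labels = embed_digits()
print(embedding.shape)  # (300, 2)
```

Colouring a scatter plot of `embedding` by `labels` gives the kind of picture discussed here, though the exact layout changes with the seed and perplexity.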

According to my algo, these are the outliers.

To be fair, that fourth number – the “9” – looks nothing like a “9”. That 8? I guess that’s an 8.
t-SNE is not a “deep learning” technique, per se, in that it doesn’t use neurons or convolutional layers. It’s a statistical technique, but it does perform a kind of “training” session: it iteratively adjusts the low-dimensional points by gradient descent.
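To make the “training” idea concrete, here is a stripped-down NumPy sketch of the t-SNE objective and its gradient. It is a simplification, not my actual implementation: it uses one shared Gaussian bandwidth (`sigma`) instead of the per-point perplexity search, plain gradient descent with no momentum or early exaggeration, and made-up toy data. The function names, learning rate, and step count are all assumptions.

```python
import numpy as np


def pairwise_sq_dists(A):
    """Squared Euclidean distances between all rows of A."""
    s = np.sum(A * A, axis=1)
    D = s[:, None] + s[None, :] - 2.0 * A @ A.T
    np.fill_diagonal(D, 0.0)
    return np.maximum(D, 0.0)


def joint_p(X, sigma=1.0):
    """High-dimensional affinities P.

    Simplification (assumption): a single shared bandwidth sigma,
    rather than tuning one per point to match a target perplexity.
    """
    P = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P = P / P.sum(axis=1, keepdims=True)  # conditional p_{j|i}
    P = (P + P.T) / (2.0 * len(X))        # symmetrised joint p_{ij}
    return np.maximum(P, 1e-12)


def kl_and_grad(P, Y):
    """KL(P || Q) and its gradient with respect to the embedding Y."""
    W = 1.0 / (1.0 + pairwise_sq_dists(Y))  # Student-t kernel in low-dim space
    np.fill_diagonal(W, 0.0)
    Q = np.maximum(W / W.sum(), 1e-12)
    kl = np.sum(P * np.log(P / Q))
    M = (P - Q) * W
    # grad_i = 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j)
    grad = 4.0 * (np.diag(M.sum(axis=1)) - M) @ Y
    return kl, grad


# Toy data (made up): two blobs of 20 points each in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 5)), rng.normal(2.0, 0.5, (20, 5))])

P = joint_p(X)
Y = rng.normal(0.0, 1e-2, (len(X), 2))  # small random starting layout

kl_start, _ = kl_and_grad(P, Y)
for _ in range(300):
    _, grad = kl_and_grad(P, Y)
    Y -= 5.0 * grad                      # plain gradient descent step
kl_end, _ = kl_and_grad(P, Y)

print(kl_start, kl_end)
```

There are no weights or layers here; the only things being “trained” are the 2-D coordinates in `Y`.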
Oooo.. I just noticed that I used a different method at some point. Let’s compare!

It is the same problem, but obviously written with slightly different code, and I clearly had some parameters set differently. Here you can see that the clusters are pretty well defined. 7 is sometimes confused with 9, and 1 seems to be the digit most regularly confused.
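One rough way to put numbers on “7 is confused with 9” is to check, for every point in the 2-D embedding, which digit its nearest neighbour is. The sketch below is a hypothetical helper (`neighbor_confusion` is my name for it, not something from a library), shown on a tiny made-up embedding rather than the real plots.

```python
import numpy as np


def neighbor_confusion(embedding, labels, n_classes=10):
    """counts[a, b] = number of points with label a whose nearest
    neighbour in the embedding has label b."""
    emb = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sum(diff * diff, axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    nearest = np.argmin(dist, axis=1)
    counts = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(counts, (labels, labels[nearest]), 1)
    return counts


# Made-up toy embedding: two 7s, two 9s, and one 9 sitting next to the 7s.
toy_embedding = np.array(
    [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.3, 0.0]]
)
toy_labels = np.array([7, 7, 9, 9, 9])

counts = neighbor_confusion(toy_embedding, toy_labels)
print(counts[9, 7])  # 1: one "9" has a "7" as its nearest neighbour
```

Large off-diagonal entries in `counts` point at the digit pairs that overlap in the plot, and a row spread across many columns would be the signature of a scattered digit like 1.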
