Alex Carlin

What is MNIST for biology?

In the field of computer vision, MNIST is a famous dataset. It consists of 70,000 grayscale images of handwritten digits, like the kind you'd find on a piece of mail, each 28 by 28 pixels. Each image is expressed as a vector of 784 pixel values (28 × 28 = 784) and comes with a label telling which digit it shows.
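To make that layout concrete, here is a minimal sketch in Python with NumPy. It does not load the real dataset: the arrays are filled with random placeholder values so the example runs offline, and the names `images`, `labels`, and `first_image` are just illustrative. The 0 to 255 pixel range is the usual 8-bit grayscale encoding, assumed here.

```python
import numpy as np

# Placeholder arrays with MNIST's layout (NOT the real images): random
# values stand in for the handwritten digits so this runs without a download.
rng = np.random.default_rng(0)
n_images = 70_000

# One row per image, one column per pixel (28 * 28 = 784), assumed 8-bit grayscale.
images = rng.integers(0, 256, size=(n_images, 784), dtype=np.uint8)

# One label per image: the digit 0 through 9.
labels = rng.integers(0, 10, size=n_images, dtype=np.uint8)

# Reshaping a flattened row recovers the 28 by 28 pixel grid.
first_image = images[0].reshape(28, 28)

print(images.shape, labels.shape, first_image.shape)
```

With the real data, `images` and `labels` would come from whatever loader you prefer; the shapes would be the same.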

In ML, we often reach for MNIST as a simple, straightforward dataset when testing out a new algorithm or teaching concepts. To be honest, though, I have often wished for something similar with biological relevance.

What is the equivalent of the MNIST dataset for biology?