Alex Carlin

Onramp to deep learning for biologists

If you were a biologist before the combination of new architectures and much better compute made deep learning useful in biology, you might wonder about the vast array of educational resources out there to help you master the concepts and applications of deep learning. You might wonder which resources are really worth your time as you seek to understand classical ML approaches for supervised learning, language models trained on biological data, and the diffusion models currently popular for modeling protein structure.

This is a collection of three complete courses that I think are worth your time if you're a biologist who wants to master the concepts and application of deep learning to biology. Importantly, all three courses offer a rich set of materials—video lectures, lecture notes, assignments, and a final project—that will give you the opportunity to deeply understand the concepts and apply them to solve problems in biology.

I give a minireview and some details about each course below.

Neural Networks: Zero to Hero

Minireview. "Neural Networks: Zero to Hero" is a series of 10 YouTube videos with accompanying code implementations. Each lecture is between 1 and 2 hours, and doing the coding exercises will take about 2x that time. The course begins at the very beginning—what is a neural network, how to implement backpropagation, how to build neural networks form simpler building blocks, the foundations of the field of natural language processing, building different kinds of generative models in increasing complexity, building a GPT-2 clone—and ends with a four-hour demonstration on scaling the GPT-2 clone to modern training techniques (and actually beating the original implementation).

Stanford CS224N: Natural Language Processing with Deep Learning

Minireview. This course traces the development and history of natural language processing, paying special attention to interpretability, starting with the foundations of NLP, moving through recurrent neural networks, and ending up with transformer models. One really nice thing about this course is that it includes interspersed TA-led sections on coding in PyTorch and specialized libraries like Hugging Face Transformers, and it dives deeply into applications. While there's no biology-specific content, these sessions can be super helpful for jumping from implementing everything yourself (as you do in Zero to Hero) to using provided, pre-trained implementations (and seeing how that ecosystem works).

Stanford CS236: Deep Generative Models

Minireview. This course is entirely focused on the details and theoretical basis of deep generative models. After a brief look at autoregressive models in context, the course covers maximum likelihood learning, VAEs, normalizing flows, GANs, energy-based models, and score-based models, and closes with lectures on diffusion models for continuous and discrete modalities. This is a nice way to end an engaging and detailed course on deep generative models and bring you up to speed on the latest in the field (diffusion models for protein structures). The course is engagingly taught, and Stefano Ermon shows particular talent for contextualizing students' questions.

Each of these courses builds on the previous one, and together they provide a comprehensive and nuanced understanding of deep learning that can be applied to solve problems in biology. These courses distinguish themselves from the vast array of material out there by being engagingly taught, sufficiently rich and deep, and completely free for anyone to take. In particular, the teachers Andrej Karpathy, Chris Manning, and Stefano Ermon deserve enormous credit and thanks from the community for building these incredible resources and making them free for anyone to enjoy.

While not specifically about deep learning, the following are some other courses I have enjoyed and think are also worth your time.