Onramp to deep learning for biologists

16 Sep, 2024

If you trained as a biologist before new architectures and much better compute made deep learning useful in biology, you may be facing a vast array of educational resources that promise to teach you the concepts and application of deep learning. You might wonder which of them are really worth your time as you seek to understand classical ML approaches for supervised learning, language models trained on biological data, and the diffusion models that are currently popular for modeling protein structure.

This is a collection of three complete courses that I think are worth your time if you're a biologist who wants to master the concepts and application of deep learning to biology. Importantly, all three courses offer rich material (video lectures, lecture notes, assignments, and a final project) that gives you the opportunity to understand the subject deeply and apply it to problems in biology.

Andrej Karpathy's Neural Networks: Zero to Hero
Stanford CS224N: Natural Language Processing with Deep Learning
Stanford CS236: Deep Generative Models

I give a minireview and some details about each course below.

Neural Networks: Zero to Hero

Minireview. "Neural Networks: Zero to Hero" is a series of 10 YouTube videos with accompanying code implementations. Each lecture runs between 1 and 2 hours, and working through the coding exercises takes about twice that time. The course starts at the very beginning: what a neural network is, how to implement backpropagation, how to build neural networks from simpler building blocks, the foundations of natural language processing, how to build generative models of increasing complexity, and how to build a GPT-2 clone. It ends with a four-hour demonstration of training that GPT-2 clone with modern techniques (and actually beating the original implementation).

Taught by: Andrej Karpathy
Number of lectures: 10
Lecture duration: 1–2 hours
Total lecture duration: around 15 hours
YouTube video lectures: https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ
GitHub code implementation: https://github.com/karpathy/nn-zero-to-hero
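
To give a flavor of what "implementing backpropagation" means in practice, here is a minimal sketch of reverse-mode differentiation on scalars, written in plain Python. It is not the course's code: the `Value` class, its attribute names, and the single-neuron example are illustrative choices made for this sketch, and it supports only addition, multiplication, and tanh.

```python
import math


class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            # d tanh(a)/da = 1 - tanh(a)^2
            self.grad += (1.0 - t * t) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Order the graph so each node comes after all of its inputs,
        # then apply the chain rule from the output back to the leaves.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


# A single "neuron": y = tanh(w * x + b), with made-up numbers.
x = Value(0.5)
w = Value(-2.0)
b = Value(1.0)
y = (w * x + b).tanh()
y.backward()
print(y.data, w.grad, x.grad, b.grad)
```

With these inputs the pre-activation is exactly zero, where tanh has slope 1, so the gradients reduce to dy/dw = x, dy/dx = w, and dy/db = 1. Frameworks like PyTorch do the same bookkeeping on tensors rather than scalars.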

Stanford CS224N: Natural Language Processing with Deep Learning

Minireview. This course traces the history and development of natural language processing, paying special attention to interpretability. It starts with the foundations of NLP, moves through recurrent neural networks, and ends with transformer models. One really nice thing about this course is that it includes interspersed TA-led sections on coding with PyTorch and specialized libraries like Hugging Face Transformers, and it dives deeply into applications. While there's no biology-specific content, these sessions can be super helpful for making the jump from implementing everything yourself (as you do in Zero to Hero) to using provided, pre-trained implementations (and seeing how that ecosystem works).

Taught by: Christopher Manning
Number of lectures: 23
Lecture duration: 1 hour
Total lecture duration: around 24 hours
YouTube video lectures: https://www.youtube.com/playlist?list=PLoROMvodv4rMFqRtEuo6SGjY4XbRIVRd4
Course website: https://web.stanford.edu/class/cs224n/

Stanford CS236: Deep Generative Models

Minireview. This course is entirely focused on the details and theoretical basis of deep generative models. After a brief look at autoregressive models in context, the course covers maximum likelihood learning, VAEs, normalizing flows, GANs, energy-based models, score-based models, and closes with lectures on diffusion models for continuous and discrete modalities. That is a nice way to end a detailed course on deep generative models, and it brings you up to speed on the latest in the field (diffusion models for protein structures). The course is engagingly taught, and Ermon shows particular talent for contextualizing students’ questions.

Taught by: Stefano Ermon
Number of lectures: 18
Lecture duration: 1 hour 15 minutes
Total lecture duration: around 24 hours
YouTube video lectures: https://www.youtube.com/playlist?list=PLoROMvodv4rPOWA-omMM6STXaWW4FvJT8
Course website: https://deepgenerativemodels.github.io

Each of these courses builds on the previous course, and together they provide a comprehensive and nuanced understanding of deep learning that can be applied to solve problems in biology. These courses distinguish themselves from the vast array of material out there by being engagingly taught, sufficiently rich and deep, and completely free for anyone to take. In particular, the teachers Andrej Karpathy, Chris Manning, and Stefano Ermon deserve a huge amount of credit and thanks from the community for building these incredible resources and making them free for anyone to enjoy.

While not specifically about deep learning, these are some other courses I have enjoyed and think are also worth your time:

Stanford CS229: Machine Learning as taught by Andrew Ng includes video lectures and covers classical supervised learning from the basics up to neural networks
MIT 5.08J Biological Chemistry II as taught by Elizabeth Nolan and JoAnne Stubbe includes video lectures and is a rich, recent guide to biochemistry that is well-paced and will be interesting even if you're already a biologist
