Onramp to deep learning for biologists
If you trained as a biologist before new architectures and much better compute made deep learning useful in biology, you may feel overwhelmed by the vast array of educational resources that promise to help you master its concepts and applications. Which of them are really worth your time as you seek to understand classical ML approaches to supervised learning, language models trained on biological data, and the diffusion models that are currently popular for modeling protein structure?
This is a collection of three complete courses that I think are worth your time if you're a biologist who wants to master the concepts and application of deep learning to biology. Importantly, all three courses offer rich material (video lectures accompanied by code, lecture notes, assignments, or projects) that gives you the opportunity to understand the material deeply and apply it to solve problems in biology.
- Andrej Karpathy's Neural Networks: Zero to Hero
- Stanford CS224N: Natural Language Processing with Deep Learning
- Stanford CS236: Deep Generative Models
I give a minireview and some details about each course below.
Neural Networks: Zero to Hero
Minireview. "Neural Networks: Zero to Hero" is a series of 10 YouTube videos with accompanying code implementations. Most lectures run between 1 and 2 hours, and working through the code yourself will take about twice that time. The course starts at the very beginning: what a neural network is, how to implement backpropagation, and how to build neural networks from simpler building blocks. It then covers the foundations of natural language processing, builds generative models of increasing complexity, and constructs a GPT-2 clone. It ends with a four-hour demonstration of training that clone with modern techniques (and actually beating the original implementation).
- Taught by: Andrej Karpathy
- Number of lectures: 10
- Lecture duration: 1–2 hours (the final lecture runs about 4 hours)
- Total lecture duration: around 15 hours
- YouTube video lectures: https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ
- GitHub code implementation: https://github.com/karpathy/nn-zero-to-hero
Stanford CS224N: Natural Language Processing with Deep Learning
Minireview. This course traces the history and development of natural language processing, paying special attention to interpretability. It starts with the foundations of NLP, moves through recurrent neural networks, and ends with transformer models. One really nice feature is the interspersed TA-led sections on coding in PyTorch and with specialized libraries such as Hugging Face Transformers; the course also dives deeply into applications. Although there's no biology-specific content, these sessions can be very helpful for making the jump from implementing everything yourself (as you do in Zero to Hero) to using existing, pre-trained implementations (and seeing how that ecosystem works).
- Taught by: Christopher Manning
- Number of lectures: 23
- Lecture duration: 1 hour
- Total lecture duration: around 24 hours
- YouTube video lectures: https://www.youtube.com/playlist?list=PLoROMvodv4rMFqRtEuo6SGjY4XbRIVRd4
- Course website: https://web.stanford.edu/class/cs224n/
Stanford CS236: Deep Generative Models
Minireview. This course is entirely focused on the details and theoretical basis of deep generative models. After a brief look at autoregressive models, the course covers maximum likelihood learning, VAEs, normalizing flows, GANs, energy-based models, and score-based models, and closes with lectures on diffusion models for continuous and discrete modalities. These final lectures are a fitting end to an engaging, detailed course and bring you up to speed on the approaches behind the latest work in the field, such as diffusion models for protein structure. The course is engagingly taught, and Ermon has a particular talent for putting students' questions in context.
- Taught by: Stefano Ermon
- Number of lectures: 18
- Lecture duration: 1 hour 15 minutes
- Total lecture duration: around 24 hours
- YouTube video lectures: https://www.youtube.com/playlist?list=PLoROMvodv4rPOWA-omMM6STXaWW4FvJT8
- Course website: https://deepgenerativemodels.github.io
Each of these courses builds on the previous one, and together they provide a comprehensive and nuanced understanding of deep learning that can be applied to solve problems in biology. These courses distinguish themselves from the vast array of material out there by being engagingly taught, sufficiently rich and deep, and completely free for anyone to take. The instructors, Andrej Karpathy, Chris Manning, and Stefano Ermon, deserve enormous credit and the community's thanks for building these incredible resources and making them freely available to everyone.
Some courses in related subjects
These courses are not specifically about deep learning, but I have enjoyed them and think they are also worth your time:
- Stanford CS229: Machine Learning, taught by Andrew Ng, includes video lectures and covers classical supervised learning from the basics up to neural networks
- MIT 5.08J: Biological Chemistry II, taught by Elizabeth Nolan and JoAnne Stubbe, includes video lectures and is a rich, recent, well-paced guide to biochemistry that will be interesting even if you're already a biologist