Efficient Deep Learning Systems
This repository contains materials for the Efficient Deep Learning Systems course taught at the Faculty of Computer Science of HSE University and at the Yandex School of Data Analysis.
Syllabus
- Week 1: Introduction
  - Lecture: Course overview and organizational details. Core concepts of the GPU architecture and CUDA API.
  - Seminar: CUDA operations in PyTorch. Introduction to benchmarking.
- Week 2: Basics of distributed ML
  - Lecture: Introduction to distributed training. Process-based communication. Parameter Server architecture.
  - Seminar: Multiprocessing basics. Parallel GloVe training.
- Week 3: Data-parallel training and All-Reduce
  - Lecture: Data-parallel training of neural networks. All-Reduce and its efficient implementations.
  - Seminar: Introduction to PyTorch Distributed. Data-parallel training primitives.
- Week 4: Memory-efficient and model-parallel training
- Week 5: Profiling DL code, training-time optimizations
- Week 6: Basics of Python application deployment
- Week 7: Software for serving neural networks
- Week 8: Optimizing models for faster inference
- Week 9: Experiment tracking, model and data versioning
- Week 10: Testing, debugging and monitoring of models
Grading
There will be four home assignments in total, some of which span several weeks. The final grade is a weighted sum of per-assignment grades. Please refer to the course page of your institution for details.
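As a rough illustration of how a weighted sum of per-assignment grades works, here is a minimal sketch. The function name `final_grade`, the equal weights, and the sample grades are all hypothetical; the actual weights and grading scale are defined on your institution's course page, not here.

```python
def final_grade(grades, weights):
    """Return the weighted sum of per-assignment grades."""
    if len(grades) != len(weights):
        raise ValueError("need exactly one weight per assignment")
    return sum(g * w for g, w in zip(grades, weights))


# Hypothetical values for illustration only: four assignments, equal weights.
grades = [8.0, 6.0, 10.0, 7.0]
weights = [0.25, 0.25, 0.25, 0.25]

print(final_grade(grades, weights))  # prints 7.75
```

With unequal weights, an assignment with a larger weight contributes proportionally more to the final grade.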