Course overview
This course introduces the architectures, algorithms, and techniques used for high-performance and distributed data processing. Students will learn the fundamental principles of parallel computing and of shared-memory and distributed data processing architectures. Building on solid prior knowledge of algorithms and their implementation in sequential processing systems, students will explore parallel programming paradigms, including the Message Passing Interface (MPI); distributed computing frameworks, including Hadoop and Spark; and the capabilities of modern GPU devices for data processing and machine learning. Topics covered include parallel algorithms and the performance and scalability considerations that arise in large-scale data processing. By the end of the course, students will be proficient in designing, implementing, and optimising high-performance computing systems for a variety of applications, preparing them for careers in data science, machine learning, computational science, and engineering.
Course learning outcomes
- Explain the differences between message-passing, shared-memory, distributed, and GPU architectures for high-performance computing and their suitability for different computing tasks
- Apply parallel programming paradigms to develop parallel algorithms for high-performance computing
- Apply distributed programming paradigms to develop distributed algorithms for high-performance computing
- Analyse the performance of parallel and distributed algorithms and identify opportunities to optimise them and improve their scalability