Course overview
This course introduces the fundamental concepts of modern data science. It provides students with tools for working with real, messy data, an understanding of which methods are appropriate, and the ability to use these tools safely. Topics include data structures; regression models, including lasso regression, ridge regression and non-linear modelling with splines; classification models, including logistic regression, linear discriminant analysis, support vector machines and random forests; and unsupervised learning methods such as principal component analysis, k-means and hierarchical clustering. Practical work will focus on data science in R.
Course learning outcomes
- Demonstrate an understanding of the foundational principles of machine learning.
- Recognise which method to use for a given data analysis problem.
- Demonstrate an understanding of the statistical underpinnings of the chosen method.
- Safely implement any chosen method and interpret the results.
- Apply the methods confidently to large datasets.
- Apply the theory in the course to solve a range of problems at an appropriate level of difficulty.