Course overview
This course builds on DATA 7201OL Data Taming to introduce advanced modern techniques for extracting meaningful information from real-world, messy datasets. It covers methods such as generalised linear models, classification, advanced regression techniques, and unsupervised statistical learning. A particular focus is on data wrangling techniques for non-standard, big, messy data, such as text (natural language processing), networks and longitudinal data. The course also teaches advanced R programming techniques for data science.
Course learning outcomes
- Create a predictive model for classification (predict classes) from real data using the tidymodels package in R
- Create a predictive model for regression (predict numbers) from real data using the tidymodels package in R
- Identify when a predictive model gives inaccurate predictions because of overfitting
- Apply cross-validation to avoid overfitting
- Contrast the performance of prediction models to assess their viability
- Analyse unlabelled data with unsupervised learning methods to find patterns and represent them visually
- Communicate the results of predictive modelling analyses and their interpretation
Degree list
The following degrees include this course