Course overview
This course aims to equip learners with the knowledge and skills to analyse textual datasets. Building on concepts of data science practice, it looks at the challenges of working with textual datasets and the methods used to analyse them. Through tokenisation, sentiment analysis and topic modelling, the course shows how patterns in textual data can be identified. This course aligns with the program's intent to provide a comprehensive understanding of methods for finding patterns in real datasets.
Course learning outcomes
- Demonstrate the ability to read text into R and prepare it using tokenisation, stop-word removal and word embeddings
- Understand the importance of representative datasets and the ethics of using text from social sources
- Demonstrate the ability to apply sentiment analysis and topic modelling to real datasets
- Demonstrate the ability to interpret the output from sentiment analysis and topic modelling