Week 11 - Checklist
DS202 - Data Science for Social Scientists
It is our final week and it is our final checklist!
Comprehension Check
By the end of the week, you should be able to:
- Explain what a document-feature matrix (bag of words) is
- Build a corpus of text
- Tokenise and pre-process text data (remove words that are too frequent, punctuation, etc.)
- Search a corpus for keywords
- Filter the text that appears before/after a keyword match
- Use quanteda functions to pre-process text data
- Run several dimensionality reduction techniques, including PCA and LSA, on a dataset
- Run topic modelling on a text corpus
- Identify an "optimal" number of clusters in a dataset based on the output of the NbClust R package
- Conduct an exploratory data analysis of text independently, using the techniques above and those we have been exploring since the start of this course
- Search for alternative R packages online when you need to do something different (new clustering algorithms, for example)
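As a quick self-check for the first few items above, here is a minimal sketch of a corpus → tokens → document-feature matrix → keyword search pipeline in quanteda. The three toy documents and all object names (`texts`, `corp`, `toks`, `dfmat`, `kw`) are invented for illustration and are not taken from the lab roadmap; the lab may well use different data and settings.

```r
library(quanteda)

# Three toy documents, invented purely for illustration
texts <- c(
  doc1 = "Data science is fun. Data is everywhere!",
  doc2 = "Social scientists use data to study society.",
  doc3 = "Text is data too."
)

# 1. Build a corpus from the character vector
corp <- corpus(texts)

# 2. Tokenise, lower-case, and drop punctuation and English stopwords
toks <- tokens(corp, remove_punct = TRUE)
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, pattern = stopwords("en"))

# 3. Document-feature matrix: one row per document, one column per token,
#    each cell counting how often that token occurs in that document
dfmat <- dfm(toks)
print(dfmat)

# 4. Keyword-in-context search: each match of "data" together with
#    up to two tokens before and after it
kw <- kwic(toks, pattern = "data", window = 2)
print(kw)
```

The document-feature matrix ignores word order entirely, which is why it is also called a bag of words. Functions such as `dfm_trim()` can then drop features that are too rare or too frequent before any further modelling.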
Time Management Tips
Here is a suggestion for how to plan your week in relation to this course:
If your lab is on Monday
On Monday:
Download: Before or as soon as you arrive at the classroom, download the DS202_2022MT_w11_lab_rmark.Rmd file that contains the lab roadmap (under the Week 11 section on Moodle). Or browse the webpage version here.
Participate: Actively engage with the material in the lab. Ask your class teacher for help if anything is unclear. Work with others whenever possible and take notes on theoretical concepts or practical coding skills you might want to revisit later in the week.
- Summative 03 is heavily based on W11 lab, so make the most of it!
Tuesday to Thursday
Solve Summative 03
Try to solve it during the week, so that you have time to ask questions on Slack or to set up study groups.
Because Summative 03 will only be released on Monday, 5 December 2022 (during the day), the deadline will also be moved, to Sunday, 18 December 2022, 11:59 PM.
Friday
- Attend the lecture: It is our final lecture, and two guests are coming to talk about applications of Data Science and Machine Learning techniques to text data (Natural Language Processing, to be more precise) and social media data. Visit the Week 11 page to read the titles of their talks.
Any time
You know the drill. Share your questions on the #week11 channel in our Slack group.
Want to talk to someone else about this course? Try reaching out to your course representatives, @Zhang Ruishan (Yoyo) or @Rachitha Raghuram.
If your lab is on Friday
Monday - Thursday:
- If you feel comfortable, take a look at the Week 11 lab content and start solving Summative 03 (which will be released this Monday, 5 December 2022, during the day). Otherwise, keep practising the pre-processing/tidyverse/tidymodels tutorials we shared in last week's checklist before your lab on Friday.
Friday
Download: Before or as soon as you arrive at the classroom, download the DS202_2022MT_w11_lab_rmark.Rmd file that contains the lab roadmap (under the Week 11 section on Moodle). Or browse the webpage version here.
Participate: Actively engage with the material in the lab. Ask your class teacher for help if anything is unclear. Work with others whenever possible and take notes on theoretical concepts or practical coding skills you might want to revisit later in the week.
- Summative 03 is heavily based on W11 lab, so make the most of it!
Attend the lecture: It is our final lecture, and two guests are coming to talk about applications of Data Science and Machine Learning techniques to text data (Natural Language Processing, to be more precise) and social media data. Visit the Week 11 page to read the titles of their talks.
Some time early next week
Solve Summative 03
Try to solve it as soon as possible, so that you have time to ask questions on Slack or to set up study groups.
Because Summative 03 will only be released on Monday, 5 December 2022 (during the day), the deadline will also be moved, to Sunday, 18 December 2022, 11:59 PM.
Any time
You know the drill. Share your questions on the #week11 channel in our Slack group.
Want to talk to someone else about this course? Try reaching out to your course representatives, @Zhang Ruishan (Yoyo) or @Rachitha Raghuram.