Week 9 - Unstructured Data (Text, Audio, Video)
2024/25 Autumn Term
We have been exploring tidy, rectangular data. But now it is time to explore the challenges associated with unstructured data: text, audio and video.
The lecture will be heavily demo-based.
Lecture Slides
Either click on the slide area below or click here to view it in fullscreen. Use your keyboard to navigate the slides. You can also find a PDF version on Moodle.
Today, we'll be using a couple of demos throughout the lecture. You can download the Jupyter notebooks associated with each demo here.
Before moving on to text mining, we'll start by reviewing unsupervised learning (i.e., clustering and anomaly detection), and we'll be using a demo for that.
- You can download the notebook for the demo by clicking on the link below:
- You can download the dataset that is used in the clustering/anomaly detection demo (and the description of the indicators that compose the dataset) by clicking on the buttons below:
For the text mining part of the lecture, we will be using two demo notebooks:
Looking for lecture recordings? You can only find those on Moodle.
Recommended Reading
- Check the end of slides for the list of references cited in the lecture.
- Check the Syllabus for this week's complete list of indicative and recommended readings.
Communication
- Post your reflections, questions, and links on Slack.
- Book office hours if you want to discuss your coursework with either me, Barry or Stuart.
Preparation for this week's class
Within your respective class groups, you will be split into groups of 3 (check Moodle and/or Slack for the announcement).
Each member of the group is assigned an article out of the following three articles to read and review:
- Article 1: Chocolate Consumption, Cognitive Function, and Nobel Laureates
- Article 2: Association of Sugar-Sweetened, Artificially Sweetened, and Unsweetened Coffee Consumption With All-Cause and Cause-Specific Mortality
- Article 3: Higher airborne pollen concentrations correlated with increased SARS-CoV-2 infection rates, as evidenced from 31 countries across the globe
The question each group member is trying to respond to (separately!) is the following:
- if you had been one of the original peer reviewers, would you have accepted or rejected the article you were assigned for review? On what grounds?
To help prepare their review, group members can consult the resources on logical fallacies available here or look at the List 1 table on cognitive biases from (Croskerry 2003). Group members may discuss logical fallacies and cognitive biases among themselves and share tips on how to write a review, but they must not discuss the articles they have been assigned, and each member must prepare their review independently. This page and this page provide some guidance on how to review an article and this page shows some examples of reviews.
Bring the reviews you have prepared to the class on Friday (Nov 29th).