Summative Problem Set 02 | W05-W07
DS105 - Data for Data Science
Welcome to the second summative assessment of this course!
This time, you will continue exploring the world of web scraping. We hope you enjoy completing this one; we tried to make it as entertaining as possible.
Things to know before you start:
- Deadline: you have until 14 November 2022, 23:59 UK time, to complete your solutions and submit a single screenshot via Moodle (see P4).
- You can earn a maximum of 100 points for the whole assignment. The points for each task are shown next to its name.
- This assessment is worth 15% of your final grade.
- Read the instructions carefully and make sure you follow them.
P0: Only for you
Each student gets an individual set of tasks. You will find yours on your cloud machine profile.
🎯 ACTION POINTS
- Go to your cloud machine user.
- Locate the file called `summative_04.txt` and download it.
This file contains your unique assignment.
P1: Loving food (30 points)
In this part, you will collect recipes for various dishes from the BBC Good Food website.
🎯 ACTION POINTS
- Write the code that does what Task 1 in `summative_04.txt` asks for.
- Save the code in a file called `BBC_food_scrapping` (the file extension will depend on the language you use). Make sure you comment your code.
The instructions on submitting your code and data are provided in P4: Upload your solutions below.
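Task 1 itself is defined in your `summative_04.txt`, so the following is only a sketch of one possible starting point. It assumes (as is common on recipe sites) that BBC Good Food recipe pages embed schema.org `Recipe` data in a `<script type="application/ld+json">` tag; check this on the live pages before relying on it. It uses only Python's standard library (`html.parser`, `json`); the class and function names are hypothetical, and fetching the page (for example with `urllib.request` or `requests`) is left out.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)  # script text may arrive in several pieces

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            self.blocks.append("".join(self._chunks))


def extract_recipes(html):
    """Return every schema.org Recipe object found in the page's JSON-LD blocks."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    recipes = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing
        # JSON-LD may be one object, a list of objects, or a {"@graph": [...]} wrapper
        if isinstance(data, dict):
            candidates = data.get("@graph", [data])
        else:
            candidates = data if isinstance(data, list) else []
        for item in candidates:
            if not isinstance(item, dict):
                continue
            item_type = item.get("@type")
            types = item_type if isinstance(item_type, list) else [item_type]
            if "Recipe" in types:
                recipes.append(item)
    return recipes


# Usage on a tiny hand-written page (not real BBC Good Food markup)
sample = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe", "name": "Test soup",
 "recipeIngredient": ["water", "salt"]}
</script></head><body></body></html>"""
print([recipe["name"] for recipe in extract_recipes(sample)])  # ['Test soup']
```

Reading structured data this way tends to survive site redesigns better than selecting elements by CSS class; if the pages you need carry no JSON-LD, fall back to parsing the visible HTML.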
P2: Getting the news (50 points)
🎯 ACTION POINTS
- Register for the NYT Article Search API and acquire an API key.
- Explore the documentation of the API to complete further steps.
- Extract the article titles, publication dates and links, following the instructions in `summative_04.txt`.
- Plot the average article length (measured in number of words) for each month of your assigned period.
- Save the code in a file called `NYT_API_collection` (the file extension will depend on the language you use). Make sure you comment your code.
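A minimal Python sketch of the P2 pipeline, using only the standard library for the HTTP requests. It assumes the v2 Article Search endpoint and the response fields `headline.main`, `pub_date`, `web_url` and `word_count` as described in the API documentation at the time of writing; confirm them against the current docs. The function names are hypothetical, and the real search terms and period must come from your `summative_04.txt`. Plotting needs matplotlib, which is imported only inside `plot_monthly_averages`.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

# Assumed v2 endpoint of the NYT Article Search API; confirm in the current docs.
SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def fetch_page(api_key, query, begin_date, end_date, page=0):
    """Request one page of search results (dates as "YYYYMMDD"). Needs network access."""
    params = {
        "q": query,
        "begin_date": begin_date,
        "end_date": end_date,
        "page": page,
        "api-key": api_key,
    }
    url = SEARCH_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def parse_docs(payload):
    """Keep the title, publication date, link and word count of each article."""
    rows = []
    for doc in payload.get("response", {}).get("docs", []):
        rows.append({
            "title": (doc.get("headline") or {}).get("main"),
            "pub_date": doc.get("pub_date"),
            "url": doc.get("web_url"),
            "word_count": doc.get("word_count"),
        })
    return rows


def monthly_average_word_count(rows):
    """Mean word count per calendar month, keyed by "YYYY-MM" and sorted by month."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        if not row["pub_date"] or row["word_count"] is None:
            continue  # skip articles missing a date or a word count
        month = row["pub_date"][:7]  # pub_date begins with "YYYY-MM-DD"
        totals[month] += row["word_count"]
        counts[month] += 1
    return {month: totals[month] / counts[month] for month in sorted(totals)}


def plot_monthly_averages(averages):
    """Bar chart of the monthly averages (matplotlib is imported only here)."""
    import matplotlib.pyplot as plt

    months = list(averages)
    plt.bar(months, [averages[m] for m in months])
    plt.xlabel("Month")
    plt.ylabel("Average number of words per article")
    plt.title("Average article length per month")
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
```

For example, `rows = parse_docs(fetch_page(API_KEY, "your query", "20220901", "20221031"))` returns the first page of results. In practice you would loop over the `page` parameter, pausing between calls because the API is rate-limited, and then call `plot_monthly_averages(monthly_average_word_count(rows))`. Articles without a word count are left out of the averages rather than counted as zero.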
P3: Simple English (20 points)
🎯 ACTION POINTS
Take the first 20 articles that you collected in the previous exercise.
Go to this dictionary API and explore how to use it for the next tasks.
Using this API, create a JSON file containing the original 20 article titles and their phonetic representations. For instance, if the title is “Data Science”, your JSON record should look like this:
{"Data Science": "ˈdaetə ˈsaɪəns"}
Save all the phonetic representations and the original titles in a file called `NYT_phonetic.json`.
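The dictionary API is not named here, so the sketch below assumes the Free Dictionary API (`api.dictionaryapi.dev`), whose entries carry a `phonetic` field and a `phonetics` list; swap in the endpoint and fields of the API you were given. It transcribes each title word by word, keeps words the dictionary does not know (such as numbers) as written, and writes a single JSON object mapping each original title to its transcription, in the same shape as the record above. All function names are hypothetical.

```python
import json
import string
import urllib.error
import urllib.parse
import urllib.request

# Assumption: the Free Dictionary API; replace with the dictionary API you were given.
DICTIONARY_URL = "https://api.dictionaryapi.dev/api/v2/entries/en/{word}"


def lookup_phonetic(word):
    """Return the first phonetic spelling the API gives for `word`, or None.

    Needs network access; an HTTP error such as 404 (unknown word) gives None.
    """
    url = DICTIONARY_URL.format(word=urllib.parse.quote(word.lower()))
    try:
        with urllib.request.urlopen(url) as response:
            entries = json.load(response)
    except urllib.error.HTTPError:
        return None
    for entry in entries:
        # Prefer the top-level "phonetic" field, then any text in "phonetics"
        if entry.get("phonetic"):
            return entry["phonetic"]
        for phonetic in entry.get("phonetics", []):
            if phonetic.get("text"):
                return phonetic["text"]
    return None


def title_to_phonetic(title, lookup=lookup_phonetic):
    """Transcribe a title word by word; words without a transcription are kept as written."""
    parts = []
    for token in title.split():
        word = token.strip(string.punctuation)
        if not word:
            continue  # token was pure punctuation, e.g. a lone hyphen
        phonetic = lookup(word)
        parts.append(phonetic.strip("/") if phonetic else word)
    return " ".join(parts)


def save_phonetic_titles(titles, path="NYT_phonetic.json", lookup=lookup_phonetic):
    """Write {original title: phonetic title} to `path` as UTF-8 JSON and return it."""
    mapping = {title: title_to_phonetic(title, lookup) for title in titles}
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes IPA symbols as-is instead of \u escapes
        json.dump(mapping, f, ensure_ascii=False, indent=2)
    return mapping
```

Calling `save_phonetic_titles(first_20_titles)` with your own list of titles writes `NYT_phonetic.json`. The `lookup` parameter lets you check the logic offline with a small dictionary of known transcriptions before making real requests.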
P4: Upload your solutions
🎯 ACTION POINTS
- Connect to your user on the cloud machine.
- Create a folder called `summative04` on your cloud machine user.
- Upload all the files you created (both your code and the scraped data) to the folder.
- In the terminal, go to `summative04` and type `ls -lth`. Take a screenshot of the terminal and post it to Moodle.
If you do not complete this stage, we will not mark your assessment.