Summative Problem Set 02 | W05-W07
DS105 - Data for Data Science
The deadline has been extended to 14 November 2022.
Welcome to the second summative assessment of this course!
This time you will continue exploring the world of web-scraping. We hope you will have fun completing this one, as we tried to make it as entertaining for you as possible.
Things to know before you start:
- Deadline: you have until 14 November 2022, 23:59 UK time to complete your solutions and share the single screenshot via Moodle.
- The whole assignment is worth a maximum of 100 points. The points available for each task are shown next to the task's name.
- This assessment is worth 15% of your final grade.
- Read the instructions carefully and make sure you follow them.
P0: Only for you
This assignment has individual tasks for each student. You will find them on your cloud machine profile.
🎯 ACTION POINTS
- Log in to your user on the cloud machine.
- Locate the file called summative_04.txt and download it.
This file contains your unique assignment.
You do not need to write or test your scripts on the cloud.
In fact, we recommend you access the cloud only to get your custom assignment and to submit your responses later. It will make your life easier.
Try prototyping your solutions by using a computational notebook such as Jupyter Notebook, R Notebooks or Google Colab. It is easier to work with code that way, and you can add pieces of text to remind yourself what each part of your code does.
Take a look at the Week 05 - Appendix for useful links.
P1: Loving food (30 points)
In this part, you will collect recipes for various dishes from the BBC Good Food website.
🎯 ACTION POINTS
- Write the code that does what is asked for in Task 1 in summative_04.txt.
- Save the code in a file called BBC_food_scrapping (the file extension will depend on the language you use). Make sure you comment your code.
The instructions on submitting your code and data are provided in P4: Upload your solutions below.
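As a rough illustration of the kind of scraping P1 involves, here is a minimal sketch that pulls link targets and link text out of a page using only Python's standard library. The HTML fragment, the `/recipes/` marker and the names `LinkCollector` and `extract_recipe_links` are all assumptions made for the example; the real page structure and your actual Task 1 may differ, so treat this as a starting point rather than a solution.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for <a> tags whose href contains a marker."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the matching <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if self.marker in href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical fragment standing in for a downloaded recipe listing page.
SAMPLE_HTML = """
<ul>
  <li><a href="/recipes/lemon-drizzle-cake">Lemon drizzle cake</a></li>
  <li><a href="/about">About us</a></li>
  <li><a href="/recipes/chicken-curry"><span>Chicken</span> curry</a></li>
</ul>
"""


def extract_recipe_links(html, marker="/recipes/"):
    """Return the (href, text) pairs of links whose href contains `marker`."""
    parser = LinkCollector(marker)
    parser.feed(html)
    parser.close()
    return parser.links


recipe_links = extract_recipe_links(SAMPLE_HTML)
```

In practice you would first download the page and pass its HTML text to `extract_recipe_links`; a dedicated scraping library may be more convenient than the standard-library parser used here.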
P2: Getting the news (50 points)
🎯 ACTION POINTS
- Register for the NYT Article Search API and acquire an API key.
- Explore the documentation of the API to complete further steps.
- Extract the titles of the articles, their publication dates and the links to them, following the instructions in summative_04.txt.
- Plot the average size of the articles (measured in the number of words) for each month in your period.
- Save the code in a file called NYT_API_collection (the file extension will depend on the language you use). Make sure you comment your code.
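The steps above can be sketched roughly as follows. The sketch builds a request URL, pulls the title, date, link and word count out of a response, and averages word counts by month. The endpoint, query parameters and field names (`headline.main`, `pub_date`, `web_url`, `word_count`) reflect my understanding of the Article Search API and should be checked against its documentation. The function names, the sample payload and `YOUR_KEY` are placeholders invented for this example. No request is actually sent here.

```python
from collections import defaultdict
from urllib.parse import urlencode

# Endpoint as I understand it from the API documentation; verify before use.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(query, begin_date, end_date, page, api_key):
    """Return the URL for one page of results (dates given as YYYYMMDD)."""
    params = {
        "q": query,
        "begin_date": begin_date,
        "end_date": end_date,
        "page": page,
        "api-key": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def extract_records(payload):
    """Keep only title, publication date, link and word count per article."""
    records = []
    for doc in payload.get("response", {}).get("docs", []):
        records.append({
            "title": doc.get("headline", {}).get("main"),
            "pub_date": doc.get("pub_date"),
            "url": doc.get("web_url"),
            "word_count": doc.get("word_count"),
        })
    return records


def average_words_per_month(records):
    """Map 'YYYY-MM' to the mean word count of articles published that month."""
    totals = defaultdict(lambda: [0, 0])  # month -> [sum of words, articles]
    for rec in records:
        if rec["pub_date"] is None or rec["word_count"] is None:
            continue  # skip articles with missing fields
        month = rec["pub_date"][:7]
        totals[month][0] += rec["word_count"]
        totals[month][1] += 1
    return {month: total / count
            for month, (total, count) in sorted(totals.items())}


# Hypothetical payload shaped like a decoded JSON response.
SAMPLE_PAYLOAD = {"response": {"docs": [
    {"headline": {"main": "First story"}, "web_url": "https://example.com/a",
     "pub_date": "2022-01-05T10:00:00+0000", "word_count": 800},
    {"headline": {"main": "Second story"}, "web_url": "https://example.com/b",
     "pub_date": "2022-01-20T08:30:00+0000", "word_count": 1200},
    {"headline": {"main": "Third story"}, "web_url": "https://example.com/c",
     "pub_date": "2022-02-02T12:00:00+0000", "word_count": 500},
]}}

search_url = build_search_url("election", "20220101", "20220228", 0, "YOUR_KEY")
records = extract_records(SAMPLE_PAYLOAD)
monthly_average = average_words_per_month(records)
```

The resulting `monthly_average` dictionary can be passed to whichever plotting library you prefer, with months on the x-axis and average word counts on the y-axis.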
P3: Simple English (20 points)
🎯 ACTION POINTS
Take the first 20 articles that you collected in the previous exercise.
Go to this dictionary API and explore how to use it for the next tasks.
Using this API, create a JSON file containing the original 20 article titles and their phonetic representations. For instance, if the title is “Data Science”, your JSON record should look like this:
{"Data Science": "ˈdaetə ˈsaɪəns"}
Save all the phonetic representations and the original titles in a file called NYT_phonetic.json.
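One possible way to assemble the title-to-phonetics mapping is sketched below. How you obtain the phonetic form of a single word depends on the dictionary API you use, so the sketch takes a `lookup` function as an argument. Here `lookup` is replaced by a small stub dictionary, which is purely an assumption for illustration, as are the function names. Words the lookup cannot resolve are left unchanged, which is one design choice among several.

```python
import json
import string


def title_to_phonetic(title, lookup):
    """Replace each word of `title` by its phonetic form where one is found.

    `lookup` is any callable mapping a lowercase word to a phonetic string,
    or to None when the word is unknown (for example, a wrapper around a
    dictionary API call).
    """
    parts = []
    for word in title.split():
        key = word.strip(string.punctuation).lower()
        phonetic = lookup(key) if key else None
        parts.append(phonetic if phonetic else word)  # fall back to the word
    return " ".join(parts)


def build_phonetic_map(titles, lookup):
    """Map every original title to its phonetic representation."""
    return {title: title_to_phonetic(title, lookup) for title in titles}


# Stub standing in for real API responses (an assumption for this sketch).
STUB_PHONETICS = {"data": "ˈdaetə", "science": "ˈsaɪəns"}

phonetic_map = build_phonetic_map(["Data Science"], STUB_PHONETICS.get)

# ensure_ascii=False keeps the phonetic symbols readable in the output.
json_text = json.dumps(phonetic_map, ensure_ascii=False, indent=2)
```

To produce the file, write `json_text` to NYT_phonetic.json using a file opened with `encoding="utf-8"`.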
P4: Upload your solutions
🎯 ACTION POINTS
- Connect to your user on the cloud machine.
- Create a folder called summative04 under your user.
- Upload all the files you created (both your code and the scraped data) to the folder.
- On the terminal, go to summative04 and run the command ls -lth, then take a screenshot of the terminal and post it to Moodle.
If you do not complete this stage, we will not mark your assessment.
- It is okay to team up with your group/class colleagues to work on the problems together.
- It is ok to use Slack to share links to useful content.
- Share things like “Tip: I had to convert a JSON file to a dataframe and I used this code”.
- It is also ok to ask generic programming-related questions publicly on Slack. For example, you can ask questions like:
“How do I write a loop to convert a list of text to numbers? (R)” or
“Does anyone know how to take just the first 20 items in a list? (Python)” or
“How do I do … in Jupyter Notebooks?” or
“Can I do … in Python without installing any other package?” or
“I am having trouble authenticating with the API. When I write this small piece of code, it doesn’t return anything. Has anyone else had the same problem?”
- What we cannot accept:
- sharing your entire script with others, but it is ok to share small pieces of code to ask for help, like the type of code people share on Stack Overflow
- asking others to do your work for you (LSE regulations on plagiarism)