โœ๏ธ Assessments

Data for Data Science

Published: 21 October 2022

Submissions

Click on the assignments below to go to the Moodle page and submit your responses:

  • โœ๏ธ Formative   Problem Set (01) | W02-W03: Moodle link & associated webpage with instructions.
  • โœ๏ธ Summative Problem Set (01) | W04-W05: Moodle link & associated webpage with instructions.
  • โœ๏ธ Formative Team Contract | W05-W07: Moodle link & associated webpage with instructions.
  • โœ๏ธ Summative Problem Set (02) | W05-W07: Moodle link & associated webpage with instructions.

This course is assessed by a mix of problem sets, group presentations and a final project. You can find the details below:

๐Ÿ“ Problem Sets (25%)

  • A mix of coding tasks and elements of self-assessment, similar to problem sets we solve in the weekly labs
  • Students have until the day before the following class to submit their response to problem sets
  • Summative problem sets will be released on:
    • Week 04 - worth 10% of final mark
    • Week 05 - worth 15% of final mark

๐Ÿ—ฃ๏ธ Group Presentation (35%)

  • Students will form groups prior to Reading Week (Week 06)
    • Pitch your ideas for APIs/datasets in Week 04
    • Form the groups in Week 05
  • Group presentations:
    • Week 08 - worth 15% of final mark
    • Week 11 - worth 20% of final mark

🎇 Final Project (40%)

  • Each group will have to produce a webpage for their project, using GitHub Markdown
  • Description of data, research questions, challenges, statistics and simple plots
  • Think of it as a portfolio project!
  • Submission deadline: Lent Term
    • Exact date to be confirmed
    • (end of Jan/2023 - beginning of Feb/2023)
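
To see how the weights above combine, here is a minimal sketch of the arithmetic. It assumes the final mark is a plain weighted average of component marks on a 0-100 scale; this page does not spell out rounding or any other adjustments, so treat the function as illustrative only. The component names and the example marks are hypothetical.

```python
# Component weights as stated on this page (percent of the final mark).
WEIGHTS = {
    "problem_set_week04": 10,
    "problem_set_week05": 15,
    "presentation_week08": 15,
    "presentation_week11": 20,
    "final_project": 40,
}


def final_mark(marks: dict) -> float:
    """Combine component marks (each on a 0-100 scale) into one final mark.

    Assumption: a plain weighted average, with no rounding or capping.
    """
    if set(marks) != set(WEIGHTS):
        raise ValueError("marks must cover exactly the components in WEIGHTS")
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / 100


# Hypothetical marks, for illustration only.
example_marks = {
    "problem_set_week04": 70,
    "problem_set_week05": 65,
    "presentation_week08": 72,
    "presentation_week11": 68,
    "final_project": 75,
}
print(final_mark(example_marks))
```

Because the weights add up to 100, a mark of 100 on every component gives a final mark of 100 under this assumption.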

๐Ÿ–‹๏ธ Marking Criteria

Week 08 presentation - Marking Criteria

This presentation is worth 15% of your final grade. You will be marked as a group, but we might reward or penalize individuals for aspects of their individual presentation in accordance with the marking criteria outlined below.

What you must do:

  • You are asked to prepare a group presentation of 20 minutes to be presented during your lab sessions.
  • Share your .ppt or .pdf file on Slack #week08 before the start of your lab session.

Marking criteria

You will be marked on a 0-100 points scale.

Criteria | Description | Marks
01: Initial Goals | Tell us about the data source your group has selected (it’s okay if you decide to change data source later). Tell us what makes you curious about this data. | 20
02: Data Collection | Show us a preliminary data collection. You don’t need to have a lot of data, a few samples will do. Show us a snippet of the code you are using to collect your data. | 20
03: Data Types | What does the data look like? What are the types of data? | 20
04: Next steps | What barriers have you spotted so far in the data collection process? If the data collection is straightforward, say your data comes from a very clean API, what do you plan to do with the data? What kind of analysis do you want to present by Week 11? | 20
05: Group organisation and time management | The group used the 20 minutes well. All members of the group presented. The presentation had a nice “flow”, it was not repetitive and there was a storyline. | 20
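
As an illustration of the "Data Types" criterion, here is a minimal sketch of one way to check which types appear in each field of a small sample. The records below are made up; they only stand in for whatever your own API or dataset returns, and `summarise_types` is a hypothetical helper, not part of any library.

```python
import json
from collections import defaultdict

# Hypothetical sample: a few records as they might come back from an API.
SAMPLE_JSON = """
[
  {"id": 1, "title": "First post", "score": 4.5, "tags": ["a", "b"]},
  {"id": 2, "title": "Second post", "score": null, "tags": []},
  {"id": 3, "title": "Third post", "score": 3, "tags": ["c"]}
]
"""


def summarise_types(records):
    """Map each field name to the sorted list of Python types observed for it."""
    seen = defaultdict(set)
    for record in records:
        for field, value in record.items():
            seen[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items()}


records = json.loads(SAMPLE_JSON)
type_summary = summarise_types(records)
for field, types in type_summary.items():
    print(f"{field}: {', '.join(types)}")
```

A field that shows more than one type (here, `score`) is usually worth mentioning in the presentation, since it hints at missing values or inconsistent formats.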

Week 11 presentation - Marking Criteria

This presentation is worth 20% of your final grade. You will be marked as a group, but we might reward or penalize individuals for aspects of their individual presentation in accordance with the marking criteria outlined below.

What you must do:

  • You are asked to prepare a group presentation of 15 minutes to be presented during your lab sessions.
  • Share your .ppt or .pdf file on Slack #week11 before the start of your lab session.

Marking criteria

You will be marked on a 0-100 points scale.

Marking rubric

Here is some clarification on how we are going to assess these presentations, in line with LSE’s grade distribution expectations.

In general terms, you should expect:

  • 100/100: if your presentation not only addresses all the questions highlighted in the marking criteria, but you also went the extra mile with some incredible analysis and showed us a complex/impressive data pre-processing pipeline. Your usage of pandas/tidyverse/plots is truly fantastic, and the slides and your delivery/communication skills have WOWed us.

  • ~70/100: if your work ticks all the boxes of what is detailed in the marking criteria. It’s good!

  • ~50/100: if you didn’t address all the points listed in the marking criteria or the communication was vague/imprecise. It could be much improved.

  • ~30/100: if you showed up to the presentation but didn’t show us enough. Either your analysis was poor, or it wasn’t clear at all what you were trying to do.

Criteria | Description | Marks
01: Dataset | Show us that you have collected enough data to conduct preliminary analysis. Tell us how much data you have managed to collect. Enumerate the sources of your data. Explain the choice of the variables that you have selected and how they will help you achieve your goals. | 20
02: Data wrangling | Tell us how you worked with your data. What tools are you using to make your data suitable for analysis? How did you use pandas/tidyverse to work with your data? How are you dealing with missing data? Did you have to pivot/melt/mutate your data? How did you do that? If you are joining several datasets, how did you do it? | 40
03: Describing data | Describe your data. Calculate the key descriptive statistics. Provide us with statistics and visualisations describing the first insights from your data. | 20
04: Group organisation and time management | Present the results. The group used the allotted time well. All members of the group presented. The presentation had a nice “flow”, it was not repetitive and there was a storyline. | 20
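
To make the data wrangling and description questions more concrete, here is a minimal pandas sketch of reshaping, handling missing data, joining and summarising. The data and column names are invented for illustration, and dropping rows with missing values is only one option among several; your own project may well need a different approach.

```python
import pandas as pd

# Hypothetical wide-format data: one row per city, one column per year.
wide = pd.DataFrame(
    {
        "city": ["London", "Paris", "Berlin"],
        "2020": [10.0, None, 7.0],
        "2021": [12.0, 9.0, None],
    }
)

# Hypothetical lookup table to join on.
countries = pd.DataFrame(
    {"city": ["London", "Paris", "Berlin"], "country": ["UK", "France", "Germany"]}
)

# Reshape from wide to long: one row per (city, year) pair.
long_df = wide.melt(id_vars="city", var_name="year", value_name="value")

# Count the missing values before deciding how to handle them.
n_missing = int(long_df["value"].isna().sum())

# One option among several: drop rows with a missing value.
complete = long_df.dropna(subset=["value"])

# A left join keeps every remaining observation and adds the country column.
merged = complete.merge(countries, on="city", how="left")

# Key descriptive statistics for the numeric column.
summary = merged["value"].describe()
print(merged)
print(summary)
```

In a presentation, a short snippet like this paired with a sentence on why each step was needed tends to communicate more than the code alone.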

Project’s Marking Criteria

While the final project is worth 40% of your final grade, you will be marked on a 0-100 scale.

When you submit your final project, make sure your webpage addresses the questions/requirements below:

Source | Criteria | Description | Marks
Webpage | Motivation | Does the webpage explain what made you curious about this kind of data in the first place? | 5
Webpage | Data | Does the webpage tell the story of how you gathered the data (big picture)? Make sure you describe the challenges you faced. | 5
Webpage | Exploratory Data Analysis (EDA) | What is in the data? What does it look like in general? How big are your datasets? What is the range and distribution of the most relevant variables? | 10
Webpage | Visualisation | Do you have nice plots? Are the labels clear and visible, and are variables clearly identified? Do your plots and tables paint a vivid picture of what the data looks like? Did you use ggplot (R) or plotnine (python) to generate your plots? | 15
Webpage | Storytelling | Find a balance: is your text engaging and clear? Were you able to describe the relevant technical steps of your project without too many details? Did you write a nice conclusion? | 15
Source code | Organisation | Is your source code available in your group’s GitHub repository? Is your source code replicable? Did you create a good structure of files and directories? | 10
Source code | Collaboration | Did you list everyone’s contribution to the project somewhere in your project’s webpage or README file? Did you contribute to your group’s GitHub repository with at least one commit? Note: we do not expect all group members to do the same thing; each person could have a different contribution. For example, one person could focus more on data collection while another takes care of the visualisations, and the other member could focus more on documentation. | 10
Source code | Data cleaning | Did you use pandas (python) or tidyverse (R) to clean up your data? Do the data types of the variables make sense? Have you taken care of the missing values, deleted them or conducted imputation? Have you ensured the variables’ values are consistent and follow the same format? | 15
Source code | Data wrangling | Did you use pandas and/or tidyverse to filter, merge, reshape and pivot your data as needed for your analysis? | 15
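
The data cleaning questions above can be illustrated with a short pandas sketch. Everything here is hypothetical: the column names, the raw values and the choice of median imputation are assumptions made for the example, not requirements of the marking criteria.

```python
import pandas as pd

# Hypothetical raw data with typical problems: numbers stored as text,
# inconsistent category labels, and a value that cannot be parsed.
raw = pd.DataFrame(
    {
        "date": ["2022-10-01", "2022-10-02", "2022-10-03"],
        "price": ["10.5", "n/a", "12"],
        "category": [" Food", "food ", "FOOD"],
    }
)

clean = raw.copy()

# Parse dates so the column has a datetime dtype rather than plain strings.
clean["date"] = pd.to_datetime(clean["date"], format="%Y-%m-%d")

# Convert to numbers; entries that cannot be parsed become NaN.
clean["price"] = pd.to_numeric(clean["price"], errors="coerce")

# Make labels consistent: trim whitespace and lower-case.
clean["category"] = clean["category"].str.strip().str.lower()

# One possible imputation: fill missing prices with the column median.
clean["price"] = clean["price"].fillna(clean["price"].median())

print(clean.dtypes)
print(clean)
```

Whichever approach you take to missing values (deleting or imputing), it helps to say why in your README or webpage so that the choice is easy to follow.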