✋ FAQ

2023/24 Autumn Term

Your frequently asked questions answered. This page will be updated as the course progresses.

Frequently asked questions

Q: What is the final project worth?

  • The final project is worth 40% of your final grade. This includes a presentation (15%) and the submission of the GitHub repository that contains the source code of your project (25%).

    Check out the ✍️ Assessments page.

Q: When is the final project due?

  • The final project is due on Tuesday, 6 February 2024 at 12 pm (noon), which falls in Week 04 of the 2023/24 Winter Term.

Q: What will I submit?

  • The process will be the same as your previous assignments involving code. You will accept a group assignment via GitHub Classroom, and this will automatically create a repository for your group. You will then work on your project in this repository and submit it by pushing your changes to GitHub.

Q: How much data should I use?

  • You don’t need a lot of data. A dataset with a few thousand rows should be fine, although projects that handle larger datasets tend to be more impressive.

Q: What kind of data should I use?

  • You will have to choose one or several data sources. Your main data source must:

    • be collected by your group via web scraping, or
    • be collected by your group via an API, or
    • be a database/dataset of sufficient complexity (for example, the files in IMDb’s non-commercial database).

    The data can be either tabular or unstructured data (e.g., text, images, audio, video).

    Consult us if you are not sure your data source is appropriate.
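
As a rough illustration of the API option, here is a minimal sketch of turning a nested JSON response into flat rows that can be saved as a CSV file. The payload, its field names (`results`, `stats`, `rating`, `votes`) and the helper functions are all hypothetical; a real API will have its own structure, and the text would come from an HTTP request rather than a hard-coded string.

```python
import csv
import io
import json

# Hypothetical JSON payload, shaped like a typical paginated API response.
# In a real project this text would come from an HTTP request to your chosen API.
SAMPLE_RESPONSE = """
{
  "page": 1,
  "results": [
    {"id": 101, "title": "Film A", "stats": {"rating": 7.5, "votes": 1200}},
    {"id": 102, "title": "Film B", "stats": {"rating": 8.1, "votes": 950}}
  ]
}
"""

FIELDNAMES = ["id", "title", "rating", "votes"]


def flatten_results(payload: str) -> list[dict]:
    """Turn the nested 'results' records into flat rows, one dict per item."""
    data = json.loads(payload)
    rows = []
    for item in data["results"]:
        rows.append(
            {
                "id": item["id"],
                "title": item["title"],
                "rating": item["stats"]["rating"],
                "votes": item["stats"]["votes"],
            }
        )
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialise flat rows as CSV text, ready to be written to a file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = flatten_results(SAMPLE_RESPONSE)
csv_text = rows_to_csv(rows)
```

Keeping the flattening step in its own function makes it easier to rerun the collection later and to show in your repository how the raw responses became a table.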

Q: Do I have to pick a ‘serious’ topic?

  • While you can go for a serious, academic topic and therefore collect data from a government API or data-rich portals like the Wikimedia projects, it is absolutely acceptable to choose a more fun, light-hearted subject and collect data from social media, a sports or gaming platform, or a streaming service.

Q: What kind of analysis should I do?

  • MUST-dos:
    • Data cleaning (e.g., using adequate data types, removing missing values, removing duplicates)
    • Data exploration (e.g., summary statistics, histograms)
    • Data visualisation with a grammar-of-graphics package, such as plotnine (static) or altair (interactive)
    • Data analysis (insights from summarising the data, a closer look at certain plots, etc.)
  • CAN-dos (things that are not taught in the course but that would make your project more impressive):
    • Machine Learning (e.g., classification, clustering)
    • Natural Language Processing (e.g., sentiment analysis, topic modelling)
    • Network Analysis (e.g., centrality measures, community detection)
    • Deep Learning (e.g., image classification, object detection)
    • Interactive websites (e.g., using Streamlit)
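
To make the first two MUST-dos more concrete, here is a minimal sketch of data cleaning and exploration, assuming you use pandas. The table, its column names and the `clean` helper are made up for illustration; your own data will need its own cleaning decisions (for example, you may prefer to keep rows with missing values and flag them instead).

```python
import pandas as pd

# Hypothetical raw data, standing in for whatever your group collected.
raw = pd.DataFrame(
    {
        "title": ["Film A", "Film A", "Film B", "Film C", "Film D"],
        "year": ["2019", "2019", "2021", None, "2020"],
        "rating": ["7.5", "7.5", "8.1", "6.0", "not rated"],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: no duplicates, numeric types, no missing values."""
    out = df.drop_duplicates().copy()
    # Convert text columns to numbers; entries that cannot be parsed become NaN.
    out["year"] = pd.to_numeric(out["year"], errors="coerce")
    out["rating"] = pd.to_numeric(out["rating"], errors="coerce")
    # Drop rows that are missing a year or a rating.
    out = out.dropna(subset=["year", "rating"])
    # With the missing values gone, years can be stored as integers.
    out["year"] = out["year"].astype(int)
    return out.reset_index(drop=True)


cleaned = clean(raw)

# Exploration: summary statistics for the numeric rating column.
summary = cleaned["rating"].describe()
print(summary)
```

The cleaned table can then be passed to a grammar-of-graphics package such as plotnine or altair for the visualisation step.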