✋ FAQ
2023/24 Autumn Term
Your frequently asked questions answered. This page will be updated as the course progresses.
Frequently asked questions
Q: What is the final project worth?
The final project is worth 40% of your final grade. This includes a presentation (15%) and the submission of the GitHub repository that contains the source code of your project (25%).
Check out the ✍️ Assessments page.
Q: When is the final project due?
- The final project is due on Tuesday, 6 February 2024 at 12 pm (Week 04 of 2023/24 Winter Term)
Q: What will I submit?
- The process will be the same as your previous assignments involving code. You will accept a group assignment via GitHub Classroom, and this will automatically create a repository for your group. You will then work on your project in this repository and submit it by pushing your changes to GitHub.
Q: How much data should I use?
- You don’t need a lot of data. If your dataset has a few thousand rows, it should be fine. Big data projects are more impressive, though.
Q: What kind of data should I use?
You will have to choose one or several data sources. Your primary data source must either:
- be collected by your group via web scraping
- be collected by your group via an API
- be a database/dataset of sufficient complexity (for example, the files in IMDB’s non-commercial database).
The data can be either tabular or unstructured data (e.g., text, images, audio, video).
Consult us if you are not sure your data source is appropriate.
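As a rough illustration of the API route, the sketch below turns a nested JSON response into flat, table-like rows and writes them out as CSV. The payload, its field names (`results`, `rating`, `votes`) and the helper functions are all made up for this example; a real API will have its own structure, and the response would come from an HTTP request rather than a hard-coded string.

```python
import csv
import io
import json

# Hypothetical API response (assumption): a real one would come from an
# HTTP request rather than a hard-coded string, and would have its own fields.
SAMPLE_RESPONSE = """
{
  "results": [
    {"id": 1, "title": "Film A", "rating": {"average": 7.9, "votes": 1200}},
    {"id": 2, "title": "Film B", "rating": {"average": 6.4, "votes": 310}}
  ]
}
"""


def flatten_record(record):
    """Turn one nested record into a flat dict suitable for a table row."""
    return {
        "id": record["id"],
        "title": record["title"],
        "rating_average": record["rating"]["average"],
        "rating_votes": record["rating"]["votes"],
    }


def response_to_rows(raw_json):
    """Parse the JSON text and flatten every item under the 'results' key."""
    payload = json.loads(raw_json)
    return [flatten_record(item) for item in payload["results"]]


def rows_to_csv(rows):
    """Serialise the flat rows as CSV text, taking the header from the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = response_to_rows(SAMPLE_RESPONSE)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Keeping the flattening step in its own function makes it easier to adapt when the structure of the real response differs from this toy payload.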
Q: Do I have to pick a ‘serious’ topic?
- While you can go for a serious, academic topic and therefore collect data from a government API or data-rich portals like the Wikimedia projects, it is absolutely acceptable to choose a more fun, light-hearted subject and collect data from social media, a sports or gaming platform, or a streaming service.
Q: What kind of analysis should I do?
- MUST-dos:
  - Data cleaning (e.g., using adequate data types, removing missing values, removing duplicates, etc.)
  - Data exploration (e.g., summary statistics, histograms, etc.)
  - Data visualisation with a grammar-of-graphics package, such as plotnine (static) or altair (interactive)
  - Data analysis (insights from summarising the data, a closer look at certain plots, etc.)
- CAN-dos (things that are not taught in the course but that would make your project more impressive):
  - Machine Learning (e.g., classification, clustering, etc.)
  - Natural Language Processing (e.g., sentiment analysis, topic modelling, etc.)
  - Network Analysis (e.g., centrality measures, community detection, etc.)
  - Deep Learning (e.g., image classification, object detection, etc.)
  - Interactive websites (e.g., using Streamlit)
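The first two MUST-dos (data cleaning and data exploration) can be sketched roughly as follows. This is a minimal illustration, not a required approach: the records and the helper names `clean_rows` and `summarise` are invented for the example, and it uses only the Python standard library so that it runs without extra packages. In a real project the same steps would more likely be done on a data frame.

```python
import statistics

# Hypothetical raw records (assumption), as they might arrive from scraping
# or an API: every value is a string, one row is duplicated, two rows have gaps.
raw_rows = [
    {"title": "Film A", "year": "2019", "rating": "7.9"},
    {"title": "Film B", "year": "2021", "rating": "6.4"},
    {"title": "Film B", "year": "2021", "rating": "6.4"},  # exact duplicate
    {"title": "Film C", "year": "2020", "rating": ""},     # missing rating
    {"title": "Film D", "year": "", "rating": "8.1"},      # missing year
    {"title": "Film E", "year": "2018", "rating": "7.0"},
]


def clean_rows(rows):
    """Drop incomplete and duplicate rows, then convert values to proper types."""
    seen = set()
    cleaned = []
    for row in rows:
        # Skip rows where any field is empty or whitespace only.
        if not all(value.strip() for value in row.values()):
            continue
        # Skip rows that are exact copies of one already kept.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(
            {
                "title": row["title"],
                "year": int(row["year"]),        # year as an integer
                "rating": float(row["rating"]),  # rating as a float
            }
        )
    return cleaned


def summarise(values):
    """Return a few basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


clean = clean_rows(raw_rows)
rating_summary = summarise([row["rating"] for row in clean])
print(clean)
print(rating_summary)
```

Whether to drop rows with missing values or fill them in depends on the dataset. Dropping is used here only because it is the simplest option to show.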