Step 1: Forming Groups (W09)
2023/24 Winter Term
During the 💻 Week 09 lab, we’ll ask you to form groups and begin working on a data project together. We’ve set aside some time for you to start forming groups, but if you haven’t done so yet, here’s what you need to do:
What we expect from you
Starting in Week 09, you will begin working in groups on your project. You have until 23 May 2024, 5pm UK time to complete it. The project will be a data science project, focusing more on data collection and manipulation than deep analysis. It will involve the following tasks:
Collecting data by yourself: You can either scrape data from a website (using scrapy, spiders, or selenium) or collect it from an API (as seen in the 🧑🏫 W08 lecture).
- Other than the scrapy library, we will also allow the use of scrapy spiders and selenium if you prefer them for your project.
- If you need extra support for spiders or selenium, let us know so we can plan additional teaching sessions during the Spring Term.
- ⚠️ You cannot use BeautifulSoup, lxml, or any other library to scrape data. You can only use scrapy or selenium for scraping.
- If you decide to collect data from an API, you’re free to choose any API, but you must use the requests library to collect the data.
- ⚠️ You cannot use ready-made libraries like tweepy or praw to collect data from Twitter or Reddit. You must use requests to collect the data from APIs.
Organising the collected data: Adhere to the DS105 style by avoiding for and while loops (unless unavoidable), using list/dict comprehensions to build lists and dictionaries, and using custom functions in .py modules. Aim to use the pandas apply() method for efficient single-column data creation/manipulation.
Saving the data in a database: You must save your data using sqlite3 (covered in the 🧑🏫 W10 lecture).
Manipulating data in a vectorised manner with pandas: Demonstrate that you find opportunities to use groupby()->apply() and pivot(), either in pandas or in SQL (covered in the 🧑🏫 W09 notebook and 🧑🏫 W10 lecture).
Creating plots using the grammar-of-graphics style: Use plotnine (covered in the 💻 W08 lab) or altair.
Cleaning text data using regular expressions: Use the re library mindfully (covered in the 🧑🏫 W11 lecture).
Effective GitHub collaboration: Your group uses branches, issues and pull requests effectively.
Neat website: your group’s website is neat and well-organised, with a clear structure and a good design.
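To make the organising, cleaning and saving tasks above more concrete, here is a minimal sketch using only the standard library. Everything in it is a hypothetical stand-in rather than a required structure: the raw_posts records (imagine them coming from your API or scraper), the clean_title, tidy and save_posts helpers, and the posts table are all names invented for illustration.

```python
import re
import sqlite3

# Hypothetical raw records, standing in for data you collected yourself.
raw_posts = [
    {"id": 1, "title": "  Hello,   World!! ", "score": "10"},
    {"id": 2, "title": "Data\tScience   101", "score": "7"},
]


def clean_title(text):
    """Remove punctuation, collapse whitespace, and lowercase a title."""
    text = re.sub(r"[^\w\s]", "", text)       # drop anything that is not a word char or space
    return re.sub(r"\s+", " ", text).strip().lower()


def tidy(post):
    """Return a cleaned copy of one raw record (a small custom function)."""
    return {
        "id": post["id"],
        "title": clean_title(post["title"]),
        "score": int(post["score"]),
    }


# A list comprehension replaces an explicit for loop.
clean_posts = [tidy(post) for post in raw_posts]


def save_posts(posts, db_path=":memory:"):
    """Save cleaned records to a SQLite table and return the open connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts "
        "(id INTEGER PRIMARY KEY, title TEXT, score INTEGER)"
    )
    # Named placeholders let us pass the dictionaries directly.
    conn.executemany(
        "INSERT OR REPLACE INTO posts (id, title, score) "
        "VALUES (:id, :title, :score)",
        posts,
    )
    conn.commit()
    return conn


conn = save_posts(clean_posts)
rows = conn.execute("SELECT id, title, score FROM posts ORDER BY id").fetchall()
```

In a real project the helper functions would probably live in a .py module, and db_path would point to a file rather than the in-memory database used here.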
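For the pandas-related tasks above, the following sketch shows one possible way to use apply(), groupby() followed by apply(), and pivot() without writing loops. The DataFrame contents, the column names and the summarise helper are made up for illustration; your own data will look different.

```python
import pandas as pd

# Hypothetical data, standing in for a table loaded from your database.
df = pd.DataFrame({
    "subreddit": ["python", "python", "rstats", "rstats"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "title": ["Intro to pandas", "Why loops are slow",
              "Tidyverse tips", "ggplot2 themes"],
    "score": [10, 30, 5, 15],
})

# Single-column creation with apply(): one new value per title.
df["title_length"] = df["title"].apply(len)


def summarise(group):
    """Summarise one group of rows as a small labelled Series."""
    return pd.Series({
        "posts": len(group),
        "mean_score": group["score"].mean(),
    })


# groupby() -> apply(): one summary row per subreddit.
# Selecting [["score"]] passes only that column to the function.
summary = df.groupby("subreddit")[["score"]].apply(summarise)

# pivot(): reshape from long to wide, one column per month.
wide = df.pivot(index="subreddit", columns="month", values="score")
```

Here summary has one row per subreddit and wide has one row per subreddit with a column for each month, which is often a convenient shape for plotting.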
The Process
Speak with others within your class group and try to form a group of three people from the same class group.
- If the number of students in your class doesn’t divide evenly into groups of three, your class teacher may allow a group of four people.
- You must obtain your class teacher’s explicit permission to form a group that isn’t composed of three people.
Reach out to others on Slack or inform your class teacher if you don’t know anyone in your class or can’t form a group of three people. They will help facilitate the process.
What’s next? After forming a group, check out what we expect to see from you in the W10 lab’s pitch presentation (formative).
💡 Tips
General tips
- Create private Slack channels for your group to discuss your project (we don’t have access to them; they’re private). You can also use a different communication tool like WhatsApp.
- Once you’ve formed a group, schedule a meeting with your group members to create a team contract (outlining who will do what and how you’ll hold each other accountable) and a document with your initial data source ideas.