Step 1: Forming Groups (W09)

2023/24 Winter Term

How to form groups for the group project.

During the 💻 Week 09 lab, we’ll ask you to form groups and begin working on a data project together. We’ve set aside some time for you to start forming groups, but if you haven’t done so yet, here’s what you need to do:

What we expect from you

Starting in Week 09, you will work in groups on a project. You have until 23 May 2024, 5pm UK time, to complete the project. It is a data science project that focuses more on data collection and manipulation than on deep analysis. It involves the following tasks:

  • Collecting data by yourself: You can either scrape data from a website (using scrapy, scrapy spiders, or selenium) or collect it from an API (as seen in the 🧑‍🏫 W08 lecture).

    • Besides the basic scrapy library, you may also use scrapy spiders or selenium if they suit your project better.

    • If you need extra support for spiders or selenium, let us know so we can plan additional teaching sessions during the Spring Term.

    • ⚠️ You cannot use BeautifulSoup, lxml, or any other library to scrape data. You can only use scrapy or selenium for scraping.

    • If you decide to collect data from an API, you’re free to choose any API, but you must use the requests library to collect the data.

    • ⚠️ You cannot use ready-made libraries like tweepy or praw to collect data from Twitter or Reddit. You must use requests to collect the data from APIs.

  • Organising the collected data: Follow the DS105 style: avoid for and while loops (unless unavoidable), use list and dict comprehensions to build lists and dictionaries, and put custom functions in .py modules. Aim to use the pandas .apply() method on a column (Series.apply()) for efficient single-column data creation/manipulation.

  • Saving the data in a database: You must save your data using sqlite3 (covered in the 🧑‍🏫 W10 lecture).

  • Manipulating data in a vectorised manner with pandas: Show that you can spot opportunities to use groupby() followed by apply(), and pivot(), either in pandas or in SQL (covered in the 🧑‍🏫 W09 notebook and 🧑‍🏫 W10 lecture).

  • Creating plots using the grammar-of-graphics style: Use plotnine (covered in the 💻 W08 lab) or altair.

  • Cleaning text data using regular expressions: Use the re library mindfully (covered in the 🧑‍🏫 W11 lecture).

  • Effective GitHub collaboration: Use branches, issues and pull requests effectively as a group.

  • Neat website: Make your group’s website neat and well-organised, with a clear structure and a good design.
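
To make the cleaning, organising and saving tasks above more concrete, here is a minimal sketch using only the re and sqlite3 modules from the standard library. The records, the posts table and all function names are made up for illustration; your own data source and schema will look different.

```python
import re
import sqlite3

# Hypothetical raw records, standing in for whatever your scraper or API returns.
raw_posts = [
    {"id": 1, "author": "  Alice ", "title": "Hello   <b>world</b>!"},
    {"id": 2, "author": "bob", "title": "Data\n\nscience <i>rocks</i>"},
]

TAG_PATTERN = re.compile(r"<[^>]+>")  # anything that looks like an HTML tag
SPACE_PATTERN = re.compile(r"\s+")    # runs of whitespace, including newlines


def clean_text(text: str) -> str:
    """Strip HTML-like tags and collapse repeated whitespace."""
    without_tags = TAG_PATTERN.sub("", text)
    return SPACE_PATTERN.sub(" ", without_tags).strip()


def clean_post(post: dict) -> dict:
    """Return a copy of the post with every string field cleaned."""
    return {
        key: clean_text(value) if isinstance(value, str) else value
        for key, value in post.items()
    }


def save_posts(conn: sqlite3.Connection, posts: list) -> None:
    """Create the (hypothetical) posts table if needed and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts "
        "(id INTEGER PRIMARY KEY, author TEXT, title TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO posts (id, author, title) "
        "VALUES (:id, :author, :title)",
        posts,
    )
    conn.commit()


# A list comprehension replaces the explicit for loop over records.
cleaned_posts = [clean_post(post) for post in raw_posts]

# An in-memory database keeps the sketch self-contained; a project would use a file.
conn = sqlite3.connect(":memory:")
save_posts(conn, cleaned_posts)
stored_titles = [row[0] for row in conn.execute("SELECT title FROM posts ORDER BY id")]
```

In a real project, functions like clean_text and save_posts would probably live in a .py module that your notebooks import, and the database would be a file rather than ":memory:".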
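
For the vectorised manipulation task above, the following sketch shows one possible use of a column-level apply(), groupby() followed by apply(), and pivot() in pandas. The DataFrame, its column names and the helper functions are invented for this example and are not taken from the course materials.

```python
import pandas as pd

# Hypothetical data: one row per city and year.
df = pd.DataFrame(
    {
        "city": ["London", "London", "Paris", "Paris"],
        "year": [2022, 2023, 2022, 2023],
        "listings": [120, 150, 80, 100],
        "avg_price": [95.0, 110.0, 70.0, 77.0],
    }
)


def price_band(price: float) -> str:
    """Label a price as 'high' or 'low' (the threshold of 90 is arbitrary)."""
    return "high" if price >= 90 else "low"


def summarise_city(group: pd.DataFrame) -> pd.Series:
    """Summarise one city's rows, assumed to be sorted by year."""
    first_price = group["avg_price"].iloc[0]
    last_price = group["avg_price"].iloc[-1]
    return pd.Series(
        {
            "total_listings": group["listings"].sum(),
            "price_growth": last_price / first_price - 1,
        }
    )


# Single-column creation with Series.apply().
df["price_band"] = df["avg_price"].apply(price_band)

# groupby() followed by apply(): one summary row per city.
city_summary = (
    df.sort_values("year")
    .groupby("city")[["listings", "avg_price"]]
    .apply(summarise_city)
)

# pivot(): reshape from long to wide, with one column per year.
listings_wide = df.pivot(index="city", columns="year", values="listings")
```

Selecting the columns before calling apply() keeps the grouping column out of each group, which avoids a deprecation warning in recent pandas versions.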

The Process

  1. Speak with others in your class group and try to form a group of three people, all from that same class group.

    • If the class cannot be split evenly into groups of three, your class teacher may allow a group of four people.
    • You must obtain your class teacher’s explicit permission to form a group that isn’t composed of three people.
  2. Reach out to others on Slack or inform your class teacher if you don’t know anyone in your class or can’t form a group of three people. They will help facilitate the process.

What’s next? After forming a group, check out what we expect to see from you in the W10 lab’s pitch presentation (formative).

💡 Tips

General tips

  • Create a private Slack channel for your group to discuss your project (we cannot see private channels). You can also use a different communication tool like WhatsApp.
  • Once you’ve formed a group, schedule a meeting with your group members to create a team contract (outlining who will do what and how you’ll hold each other accountable) and a document with your initial data source ideas.