ℹ️ Course Information

All you need to know about DS105A (2024/25)

Cover image created with the AI embedded in MS Designer using the prompt 'abstract salmon pink light blue icon depicting the metaphysical experience of cleaning up, reshaping, pivoting, and manipulating data in search of the purest insights in data science.'

Last updated: 16 November 2024, 5pm (to clarify the W10 Summative deadline)

Are you eager to learn how to clean, reshape, pivot, and manipulate data in search of the purest insights in data science? You’re in the right place! Welcome to LSE DS105 - Data for Data Science, an LSE Data Science Institute course.

What is DS105A about?

DS105A teaches you the fundamentals of data manipulation and analysis using Python. You will learn how to clean, reshape, pivot, and manipulate data so that you can extract and communicate insights from it.

The course is designed to help you develop the skills you need to work with data in a professional setting, whether you want to become a data scientist, data analyst, or simply want to apply data science techniques in your academic research or work.
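
To give you a flavour of what this looks like in practice, here is a minimal sketch of a clean-and-pivot step using pandas, a Python library commonly used for this kind of work. The dataset, column names, and values below are made up purely for illustration:

```python
import pandas as pd

# A tiny, made-up dataset of monthly temperature readings
df = pd.DataFrame({
    "city":   ["London", "London", "Paris", "Paris"],
    "month":  ["Jan", "Feb", "Jan", "Feb"],
    "temp_c": [5.0, None, 6.2, 7.1],  # one missing value on purpose
})

# Clean: fill the missing reading with the column mean
df["temp_c"] = df["temp_c"].fillna(df["temp_c"].mean())

# Reshape/pivot: one row per city, one column per month
wide = df.pivot(index="city", columns="month", values="temp_c")
print(wide)
```

Don’t worry if none of this makes sense yet; the course builds up to examples like this step by step.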

🥅 Intended Learning Outcomes

By the end of this course, you should be able to:

  • Understand the basic structure of data types and common data formats.
  • Show familiarity with international standards for common data types.
  • Manage a typical data cleaning, structuring, and analysis workflow using practical examples.
  • Clean data, diagnose common problems involved in data corruption, and know how to fix them.
  • Understand the concept of databases.
  • Link data from various sources.
  • Use Python for the data manipulation workflow.
  • Recognise how R is used in the data manipulation and data visualisation workflow.
  • Use GitHub, a collaboration platform built on the git version control system.
  • Use markup languages, such as Markdown, to format documents and web pages.
  • Create and maintain simple websites using HTML and CSS.

📓 Syllabus

Check the syllabus for the most up-to-date information on the course structure, schedule, and assessment details.

📟 Communication

At DS105, we don’t think teaching is restricted to the classroom. We believe learning happens anytime, anywhere, and that frequent human-to-human communication is key to a successful learning experience.

With that in mind, we have set up a few channels for you to contact your peers, teaching staff, and administrative staff during this course.

Slack

You can think of Slack as our primary communication hub. We will use it to share resources, run a few polls, post announcements, and answer your questions. (For more serious matters, such as requests for extensions, please use e-mail.)

Most of our informal communication and interactions will happen through Slack. Use this place to ask questions, share resources, and engage with your peers and instructors.

Things we love to see on Slack:

  • People asking questions on the #help channel.
  • People practising what they’ve learned by answering each other’s questions.

🗒️ IMPORTANT NOTE: You can contact your instructors directly on Slack and send us private direct messages (DMs). However, please note that we may not respond immediately or even on the same day. Please don’t expect any responses in the evenings or over weekends. Each week, the DS105 staff will dedicate a set number of hours to answer Slack questions, and we will always prioritise messages posted on public channels such as #help, #social, etc.

🆘 Weekly support sessions

Starting Week 02, we will offer drop-in sessions (no booking required) where you can get help with the week’s exercises, ask questions about the lectures, or chat with your peers and instructors.

Weekly Support Sessions

  • When: Every Tuesday, 3-5pm
  • Where: COL.1.06 (DSI Visualisation Studio)
  • Who: Sara Luxmoore

Always check the 📓 Syllabus for the most up-to-date information. Some sessions might be rescheduled on particular weeks.

We will offer additional sessions in key weeks, such as closer to the deadlines for the graded assignments. We will announce these sessions on Slack, in lectures, and in the 📓 Syllabus.

🗨️ Office Hours

You can book a one-on-one slot with any teaching staff every week to discuss any questions about the course, practice exercises or data science as a career. As a good practice, add a little note to your booking so we can prepare in advance and make the most of our time together.

📅 Book your slot
Search for the name of one of the instructors. Add a little note to your booking so we can prepare in advance.

📧 E-mails

Reserve e-mails for administrative queries (e.g. class changes, requests for extensions) and direct them to Kevin, our Teaching and Assessment Administrator.

📧

👥 People and 📍 Places

Our Team

Name: Dr Jon Cardoso-Silva
Links LSE, GitHub, LinkedIn, 📧
Role at LSE: Assistant Professor (Education)
LSE Data Science Institute
At LSE since 2021
Office Hours: 🗓️ Wednesdays, 1-4pm (book via StudentHub)
Background:
  • PhD in Computer Science (King’s College London)
  • Former roles: Tech Lead, Data Scientist, Software Engineer
Likes to think about:

How Generative AI is influencing the way we learn.¹

COURSE LEADER | LECTURER

Riya Chhikara
Data Scientist
The Economist
📧

CLASS TEACHER

Alex Soldatkin
PhD candidate
Oxford School of Global and Area Studies
📧

CLASS TEACHER

Sara Luxmoore
Research Officer
LSE Data Science Institute and LSE Cities
📧

SUPPORT SESSIONS

Kevin Kittoe
Teaching and Assessment Administrator
LSE Data Science Institute
📧

ADMIN

Lecture Details

LECTURES
🗓️ Thursdays (except Week 06)
⏰ 04.00 pm - 06.00 pm
📍 CLM.5.02
👤 Jon

Class Details

📅 Class Group 1
⏰ 09.00 am - 10.30 am
📍 CKK.2.18 (check map)
👤 Riya Chhikara
(substituted by Sara Luxmoore on W01)

📅 Class Group 2
⏰ 10.30 am - 12.00 pm
📍 CKK.2.18 (check map)
👤 Riya Chhikara
(substituted by Sara Luxmoore on W01)

📅 Class Group 3
⏰ 12.00 pm - 01.30 pm
📍 PAN.3.04 (check map)
👤 Alex Soldatkin

📅 Class Group 4
⏰ 03.00 pm - 04.30 pm
📍 CLM.2.06 (check map)
👤 Alex Soldatkin

📅 Class Group 5
⏰ 04.30 pm - 06.00 pm
📍 CLM.2.06 (check map)
👤 Alex Soldatkin

✍️ Assessment & Feedback

How will I be assessed in this course?

Your grade in this course consists of two main components: Coursework, worth 60%, and the Group Project, worth 40%. These are the two components that show up in your student record, but in reality, they are made up of several smaller parts:

  1. COURSEWORK (60%): This component is made up of two individual coding problem sets (worth 20% and 30% respectively) plus evidence of your individual contribution to the group project (10%). Instructions for each assignment will be provided in due course, and dates are in the 📓 Syllabus and the breakdown table below.

  2. GROUP PROJECT (40%): This component involves a group presentation and the submission of a GitHub repository containing the group’s code and data analysis (in the form of a website). There will also be a formative presentation (not graded) to help you prepare for the graded presentation in W10. More details will be provided in due course, and the dates are in the 📓 Syllabus and the breakdown table below.

📝 The practice exercises

Especially early in the course, you will be given lots of practice exercises (such as the 📝 W01 Formative Exercise) to help you prepare for the graded assignments. Although these are not graded, you are strongly encouraged to engage with all of them.

💡 Given the nature of the course, it is important to keep up with the material and practice regularly. The practice exercises are probably the best way to do so.

Feedback on practice exercises will primarily be provided during the lectures, where I will do a data-driven analysis of the most common mistakes and guide you on improving. However, you can ask any of the instructors for individual feedback during office hours at any time. Just remember to add a note when you book your slot so we can prepare in advance.

The 📝 W04 Formative Exercise will be an exception to this rule. There, we will provide individual feedback so you can prepare for the first graded assignment, due Week 06.

Grade Breakdown

  • 20% | Individual | ✍️ W10 Summative | Reveal: 17 Oct 2024 | Due: 7 Nov 2024, 8pm
  • 30% | Individual | ✍️ W10 Summative | Reveal: 18 Nov 2024 | Due: 3 Dec 2024, 8pm
  • 10% | Group Work | 👥 W11 Presentation | Reveal: 28 Nov 2024 | Due: 13 Dec 2024
  • 10% + 30% | Group Work + Individual parts | 📦 Final Project | Reveal: 28 Nov 2024 | Due: 5 Feb 2025

🤖 Generative AI Policy

Since ChatGPT came out in November 2022, teachers and experts have been thinking about how it might affect tests and assignments. Some worry that letting students use AI for answers could be seen as cheating and should be banned. Others think AI is another tool and should be allowed (Lau and Guo 2023). There’s no agreement on this, so each college or university has to figure it out.

Because of these different opinions, the official rule at LSE is to let each department and course leader choose whether to allow AI tools in tests and assignments.

There are three official positions at LSE:

Position 1: No authorised use of generative AI in assessment. (Unless your Department or course convenor indicates otherwise, the use of AI tools for grammar and spell-checking is not included in the full prohibition under Position 1.)

Position 2: Limited authorised use of generative AI in assessment.

Position 3: Full authorised use of generative AI in assessment.
👉 This is the position we adopt in this course

Source: School position on generative AI, LSE Website, September 2024

Our policy

We subscribe to Position 3, which allows the full authorised use of generative AI in assessments. However, because there are risks to your learning associated with the use of these tools, we have some guidelines to help you use these tools responsibly.

This means that you can use generative AI (GenAI) tools during lectures, labs, and assessments, subject to the following guidelines:

  1. You can use generative AI tools such as ChatGPT, Google Gemini, Claude, Notebook LM, Microsoft Copilot, GitHub Copilot, Grammarly AI, DALL·E, Midjourney, Microsoft Designer or similar during lectures, labs, and assessments.

  2. In particular for assessments, you must acknowledge the use of generative AI tools in your submission. This acknowledgement should identify the tool(s) and describe what you used them for and to what extent.

    💡 TIP: If you’re using ChatGPT or Gemini (or another similar AI service that lets you share a link to your chat history), you can avoid having to write out this explanation if you are disciplined from the start. When you begin working on an assignment, open a new chat window in ChatGPT or Gemini and use that chat for all your questions about the assignment. Then include the link to the chat history in your submission.

    The point of this acknowledgement is not punitive. We want to spot cases where GenAI is influencing your learning negatively early on, so that we can help you improve.

  3. In code-related assessments, if you did not share your chat history, you should specify which tools were used, for what purpose, and to what extent. For example:

    In Task 1, I used ChatGPT to create the skeleton of the function, then I edited the code myself to fix a problem with a variable that did not exist in the dataset. In Task 2, I typed a Python comment and let GitHub Copilot generate the code. The code worked, and it helped me realise what I had to do, but it didn’t follow the ‘no loops’ rule we learned in class. I then edited the code myself to fix this issue.

  4. In written assessments, such as an essay, report, or Jupyter Notebook, you must include a statement at the end of your submission stating precisely how you used generative AI tools. We expect you to be honest and transparent about your use of these tools and as precise as possible. Here is an example:

    I used ChatGPT to come up with an outline of my text, but it was too generic. I added a few paragraphs myself to make it more specific to the topic and then used the Grammarly AI to connect sentences for better readability. I didn’t accept any ‘facts’ or ‘arguments’ suggested directly by ChatGPT or Grammarly AI. I only used it to improve the structure and readability of my text.

  5. GenAI tools are not sources to be cited in the same manner as human-authored sources (books, papers, academic articles, etc.).

🗒️ Good practices for using Generative AI tools

  • DO use GenAI to personalise your learning experience. For example, you can ask it to help you better understand a concept by using an analogy related to an activity you enjoy:

    “Help me better understand the concept of data types in Python using an analogy that involves an activity I like: cooking.”

  • AVOID using GenAI when you are asked questions to check your understanding or when asked to write down your thoughts and opinions. The entire purpose of these exercises is to help you think independently – don’t delegate that to a machine!

  • DO use GenAI to produce code if you already know what you want to achieve and are confident in your coding skills to review and edit the code produced.

  • AVOID using GenAI to create the first draft of code or an essay if you’re unsure where to start. The tool might make the code unnecessarily complex or not follow standard practices. For essays, it could generate text that looks legitimate but makes it obvious to an expert that no real thinking has been done.

  • AVOID asking GenAI questions about topics you only minimally understand if you don’t have enough time or the skills to fact-check the response. AI chatbots can generate plausible-sounding answers that are often wrong (in other words, “bullshit”).

  • ALWAYS give GenAI a lot of context. For example, if you use a tool like NotebookLM, you can attach the public links to the course materials you are studying and ask for a starting point for a task based on the content of those materials. Alternatively, on more traditional chat-based tools, be sure to provide plenty of context about what you’ve learned that is specific to your learning journey in this course.

Our position

The LSE Data Science Institute has been studying the impact of generative AI on education since Summer 2023, when we launched the GENIAL project. You can read more about it on the project page.

What we have learned so far:

Although we have not yet fully analysed all the data, it is fair to summarise the good and bad aspects of using generative AI tools in education in the following way:

  • Good: The students who made the most resourceful use of GenAI remained in control of their learning. They often gave the chatbots a lot of context (“I want to perform web scraping of this website with the library scrapy, the code must contain functions – no classes – and I want to save the data in a CSV file.”) and would always check the code/output generated by GenAI against the course materials or reputable sources. They were able to identify when the AI was suggesting something that was not correct or not following best practices and would never blindly accept the AI’s suggestions.

  • Bad: If you don’t master a subject, GenAI can make you feel like you do. This pattern was frequent, for example, among students who had gaps in their understanding of programming concepts. They would ask the AI to generate code for them, and the AI would produce code that seemed to work but that returned incorrect results or was so complex it was virtually impossible to edit.

Read more about it in our preprint:

Dorottya Sallai, Jonathan Cardoso-Silva, Marcos E. Barreto, Francesca Panero, Ghita Berrada, and Sara Luxmoore. “Approach Generative AI Tools Proactively or Risk Bypassing the Learning Process in Higher Education”. Preprint, July 2024.

How I use GenAI in this course

When creating material for the course:

  • After I devise a plan for what I want to teach in a particular week or session, I draft the headings and subheadings of the lecture notes myself in VSCode, with the GitHub Copilot extension enabled. Very frequently, the AI autocompletes something close to what I wanted to say, so I hit ‘Tab’ and let it complete the sentence.
  • If I get stuck and I can’t think of any coding exercises that would help me illustrate a concept, I go to NotebookLM, import my drafts and query the tool for ideas on how to connect everything. Most of it is generic and I drop it, but sometimes it gives me a good idea that I can use.
  • Once I have finished the draft, I run it through Grammarly AI to improve readability and coherence. I typically highlight a paragraph and ask it to ‘Improve but keep it conversational, jargon-free’. I then review the changes and accept them if they make sense. It is important to me that the text remains accessible to all students, regardless of their background, and that I keep my ‘voice’ in the text.

When grading your work:

  • I don’t upload your work to any commercial AI service.
  • I don’t trust current GenAI tools to understand the type of feedback I want to write; I find they generate more rework than they save me time.
  • But once I write my feedback comments, I tend to run them through Grammarly AI to improve readability and coherence.
  • Not AI, but related: in some cases, we use autograding tools to help us check if your code works as expected. These tools are not AI; they are automated scripts that run your code against a set of tests. They are not perfect, but they help us identify common mistakes quickly (a sketch of what such a check looks like follows below).
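
For the curious, here is a minimal sketch of the kind of check an autograder runs. The function name `count_rows`, the sample input, and the expected answer are purely illustrative and not taken from an actual assignment; the autograders we use may be set up differently:

```python
# A hypothetical autograding check: call the student's function on a known
# input and compare the result against the expected answer.

def count_rows(rows):
    """Stand-in for a student's submission: count the non-empty rows."""
    return sum(1 for row in rows if row)

def test_count_rows():
    sample = [["a", 1], [], ["b", 2]]  # one empty row on purpose
    assert count_rows(sample) == 2, "count_rows should ignore empty rows"

if __name__ == "__main__":
    test_count_rows()
    print("All checks passed.")
```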

References

Lau, Sam, and Philip Guo. 2023. “From "Ban It Till We Understand It" to "Resistance Is Futile": How University Programming Instructors Plan to Adapt as More Students Use AI Code Generation and Explanation Tools Such as ChatGPT and GitHub Copilot.” In Proceedings of the 2023 ACM Conference on International Computing Education Research V.1, 106–21. Chicago IL USA: ACM. https://doi.org/10.1145/3568813.3600138.

Footnotes

  1. Read about the GENIAL project.