ℹ️ Course Information
All you need to know about DS105A (2024/25)
Last updated: 16 November 2024, 5pm (to clarify the W10 Summative deadline)
Are you eager to learn how to clean, reshape, pivot, and manipulate data in search of the purest insights in data science? You’re in the right place! Welcome to LSE DS105 - Data for Data Science, an LSE Data Science Institute course.
What is DS105A about?
DS105A is a course that teaches you the fundamentals of data manipulation and analysis using Python. You will learn how to clean, reshape, pivot, and manipulate data such that you can extract and communicate insights from it.
The course is designed to help you develop the skills you need to work with data in a professional setting, whether you want to become a data scientist, data analyst, or simply want to apply data science techniques in your academic research or work.
🥅 Intended Learning Outcomes
By the end of this course, you should be able to:
- Understand the basic structure of data types and common data formats.
- Show familiarity with international standards for common data types.
- Manage a typical data cleaning, structuring, and analysis workflow using practical examples.
- Clean data, diagnose common problems involved in data corruption, and fix them.
- Understand the concept of databases.
- Link data from various sources.
- Use Python for the data manipulation workflow.
- Be exposed to how R is used in the data manipulation workflow and data visualisation.
- Use the collaboration and version control system GitHub, based on the git version control system.
- Use markup languages and the Markdown format to format documents and web pages.
- Create and maintain simple websites using HTML and CSS.
📓 Syllabus
Check the syllabus for the most up-to-date information on the course structure, schedule, and assessment details.
📟 Communication
At DS105, we don’t think teaching is restricted to the classroom. We believe learning happens anytime, anywhere, and that frequent human-to-human communication is key to a successful learning experience.
With that in mind, we have set up a few channels for you to contact your peers, teaching staff, and administrative staff during this course.
Slack
You can think of Slack as our primary communication hub. We will use it to share resources, run a few polls, post announcements, and answer your questions. (For more serious matters, such as requests for extensions, please use e-mail.)
Most of our informal communication and interactions will happen through Slack. Use this place to ask questions, share resources, and engage with your peers and instructors.
Things we love to see on Slack:
- People asking questions on the #help channel.
- People practising what they’ve learned by answering each other’s questions.
🗒️ IMPORTANT NOTE: You can contact your instructors directly on Slack and send us private direct messages (DMs). However, please note that we may not respond immediately or even on the same day. Please don’t expect any responses in the evenings or over weekends. Each week, the DS105 staff will dedicate a set number of hours to answer Slack questions, and we will always prioritise messages posted on public channels such as #help, #social, etc.
🆘 Weekly support sessions
Starting Week 02, we will offer drop-in sessions (no booking required) where you can get help with the week’s exercises, ask questions about the lectures, or chat with your peers and instructors.
Weekly Support Sessions
- When: Every Tuesday, 3-5pm
- Where: COL.1.06 (DSI Visualisation Studio)
- Who: Sara Luxmoore
Always check the 📓 Syllabus for the most up-to-date information. Some sessions might be rescheduled on particular weeks.
We will offer additional sessions on key weeks, such as closer to the deadlines for the graded assignments. We will announce these sessions on Slack, in lectures, and in the 📓 Syllabus.
🗨️ Office Hours
You can book a one-on-one slot with any member of the teaching staff every week to discuss any questions about the course, practice exercises, or data science as a career. As a good practice, add a little note to your booking so we can prepare in advance and make the most of our time together.
📅 Book your slot
Search for the name of one of the instructors. Add a little note to your booking so we can prepare in advance.
📧 E-mails
Reserve e-mail for administrative queries (e.g. class changes, requests for extensions, etc.) and send them to Kevin, our Teaching and Assessment Administrator.
📧
👥 People and 📍 Places
Our Team
Dr Jon Cardoso-Silva
Assistant Professor (Education)
LSE Data Science Institute (at LSE since 2021)
Links: LSE, GitHub, LinkedIn, 📧
Office Hours: 🗓️ Wednesdays, 1-4pm (book via StudentHub)
Likes to think about: How Generative AI is influencing the way we learn [1]
COURSE LEADER | LECTURER
Riya Chhikara
Data Scientist
The Economist
📧
CLASS TEACHER
Alex Soldatkin
PhD candidate
Oxford School of Global and Area Studies
📧
CLASS TEACHER
Sara Luxmoore
Research Officer
LSE Data Science Institute and LSE Cities
📧
SUPPORT SESSIONS
Kevin Kittoe
Teaching and Assessment Administrator
LSE Data Science Institute
📧
ADMIN
Lecture Details
LECTURES
🗓️ Thursdays (except Week 06)
⏰ 04.00 pm - 06.00 pm
📍 CLM.5.02
👤 Jon
Class Details
📅 Class Group 1
⏰ 09.00 am - 10.30 am
📍 CKK.2.18 (check map)
👤 Riya Chhikara
(substituted by Sara Luxmoore on W01)
📅 Class Group 2
⏰ 10.30 am - 12.00 pm
📍 CKK.2.18 (check map)
👤 Riya Chhikara
(substituted by Sara Luxmoore on W01)
📅 Class Group 3
⏰ 12.00 pm - 01.30 pm
📍 PAN.3.04 (check map)
👤 Alex Soldatkin
📅 Class Group 4
⏰ 03.00 pm - 04.30 pm
📍 CLM.2.06 (check map)
👤 Alex Soldatkin
📅 Class Group 5
⏰ 04.30 pm - 06.00 pm
📍 CLM.2.06 (check map)
👤 Alex Soldatkin
✍️ Assessment & Feedback
How will I be assessed in this course?
Your grade in this course consists of two main components: Coursework (worth 60%) and Group Project (worth 40%). Those two components are what show up in your student record, but in reality, they are made up of several smaller parts:
COURSEWORK (60%): This component is made up of two individual coding problem sets (worth 20% and 30% respectively) plus evidence of your individual contribution to the group project (10%). Instructions for each assignment will be provided in due course, and dates are in the 📓 Syllabus and the breakdown table below.
GROUP PROJECT (40%): This component involves a group presentation and the submission of a GitHub repository containing the group’s code and data analysis (in the form of a website). There will also be a formative presentation (not graded) to help you prepare for the graded presentation in W10. More details will be provided in due course, and the dates are in the 📓 Syllabus and the breakdown table below.
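As a rough illustration of how these weights combine, here is a minimal Python sketch. It assumes the overall mark is a simple weighted average of the component marks, which may not match the official calculation. The component names and the example marks are hypothetical, and the 10%/30% split of the Group Project is read off the Grade Breakdown table.

```python
# Weights in percentage points, taken from the course description.
# The keys are illustrative labels, not official assessment titles.
WEIGHTS = {
    "problem_set_1": 20,            # Coursework (individual)
    "problem_set_2": 30,            # Coursework (individual)
    "individual_contribution": 10,  # Coursework (individual part of the project)
    "group_presentation": 10,       # Group Project
    "group_final_project": 30,      # Group Project
}


def overall_mark(marks):
    """Return the weighted average of component marks (each on a 0-100 scale).

    Assumption: a simple weighted mean; the official calculation may differ.
    """
    missing = set(WEIGHTS) - set(marks)
    if missing:
        raise ValueError(f"Missing marks for: {sorted(missing)}")
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / 100


# Hypothetical marks, purely for illustration.
example_marks = {
    "problem_set_1": 65,
    "problem_set_2": 72,
    "individual_contribution": 70,
    "group_presentation": 68,
    "group_final_project": 75,
}

# Sanity check: the two headline components add up to 60% and 40%.
coursework_weight = sum(
    WEIGHTS[k]
    for k in ("problem_set_1", "problem_set_2", "individual_contribution")
)
group_project_weight = sum(
    WEIGHTS[k] for k in ("group_presentation", "group_final_project")
)

print(coursework_weight, group_project_weight)  # 60 40
print(overall_mark(example_marks))              # 70.9
```

Keeping the weights as whole percentage points makes it easy to check that they sum to 100 before any marks are combined.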
📝 The practice exercises
Especially early in the course, you will be given lots of practice exercises (such as the 📝 W01 Formative Exercise) to help you prepare for the graded assignments. Although these are not graded, you are strongly encouraged to engage with all of them.
💡 Given the nature of the course, it is important to keep up with the material and practice regularly. The practice exercises are probably the best way to do so.
Feedback on practice exercises will primarily be provided during the lectures, where I will do a data-driven analysis of the most common mistakes and guide you on improving. However, you can ask any of the instructors for individual feedback during office hours at any time. Just remember to add a note when you book your slot so we can prepare in advance.
The 📝 W04 Formative Exercise will be an exception to this rule. There, we will provide individual feedback so you can prepare for the first graded assignment, due Week 06.
Grade Breakdown
Weight | Type | Assessment | Dates
20% | Individual | ✍️ W06 Summative | Reveal: 17 Oct 2024. Due: 7 Nov 2024, 8pm
30% | Individual | ✍️ W10 Summative | Reveal: 18 Nov 2024. Due: 3 Dec 2024, 8pm
10% | Group Work | 👥 W11 Presentation | Reveal: 28 Nov 2024. Due: 13 Dec 2024
10% (individual) + 30% (group) | Group Work + Individual parts | 📦 Final Project | Reveal: 28 Nov 2024. Due: 5 Feb 2025
🤖 Generative AI Policy
Since ChatGPT came out in November 2022, teachers and experts have been thinking about how it might affect tests and assignments. Some worry that letting students use AI for answers could be seen as cheating and should be banned. Others think AI is another tool and should be allowed (Lau and Guo 2023). There’s no agreement on this, so each college or university has to figure it out.
Because of these different opinions, the official rule at LSE is to let each department and course leader choose whether to allow AI tools in tests and assignments.
There are three official positions at LSE:
Position 1: No authorised use of generative AI in assessment. (Unless your Department or course convenor indicates otherwise, the use of AI tools for grammar and spell-checking is not included in the full prohibition under Position 1.)
Position 2: Limited authorised use of generative AI in assessment.
Position 3: Full authorised use of generative AI in assessment.
👉 This is the position we adopt in this course
Source: School position on generative AI, LSE Website, September 2024
Our policy
We subscribe to Position 3, which allows the full authorised use of generative AI in assessments. However, because there are risks to your learning associated with the use of these tools, we have some guidelines to help you use these tools responsibly.
This means that you can use generative AI (GenAI) tools such as ChatGPT, Google Gemini, Claude, NotebookLM, Microsoft Copilot, GitHub Copilot, Grammarly AI, DALL·E, Midjourney, Microsoft Designer or similar during lectures, labs, and assessments.
In particular for assessments, you must acknowledge the use of generative AI tools in your submission. This should identify the tool(s), and describe what you used it for and to what extent.
💡 TIP: If you’re using ChatGPT or Gemini (or another similar AI service that lets you share a link to your chat history), you can avoid explaining how you used AI if you are disciplined at the start. When you begin working on an assignment, open a new chat window on ChatGPT or Gemini, and use that chat for all your questions about the assignment. Then, include the link to the chat history in your submission.
The point of this acknowledgement is not punitive. We want to spot, early on, cases where GenAI is negatively influencing your learning so we can help you improve.
In code-related assessments, if you did not share your chat history, you should specify which tools were used, for what purpose, and to what extent. For example:
In Task 1, I used ChatGPT to create the skeleton of the function, then I edited the code myself to fix a problem with a variable that did not exist in the dataset. In Task 2, I typed a Python comment and let GitHub Copilot generate the code. The code worked, and it helped me realise what I had to do, but it didn’t follow the ‘no loops’ rule we learned in class. I then edited the code myself to fix this issue.
In written assessments, such as an essay, report, or Jupyter Notebook, you must include a statement at the end of your submission, stating precisely how you used generative AI tools. We expect you to be honest and transparent about your use of these tools and as precise as possible. Here is an example:
I used ChatGPT to come up with an outline of my text, but it was too generic. I added a few paragraphs myself to make it more specific to the topic and then used the Grammarly AI to connect sentences for better readability. I didn’t accept any ‘facts’ or ‘arguments’ suggested directly by ChatGPT or Grammarly AI. I only used it to improve the structure and readability of my text.
GenAI tools are not sources to be cited in the same manner as human-authored sources (books, papers, academic articles, etc.).
🗒️ Good practices for using Generative AI tools
DO use GenAI to personalise your learning experience. For example, you can ask it to help you better understand a concept by using an analogy related to an activity you enjoy:
“Help me better understand the concept of data types in Python using an analogy that involves an activity I like: cooking.”
AVOID using GenAI when you are asked questions to check your understanding or when asked to write down your thoughts and opinions. The entire purpose of these exercises is to help you think independently – don’t delegate that to a machine!
DO use GenAI to produce code if you already know what you want to achieve and are confident in your coding skills to review and edit the code produced.
AVOID using GenAI to create the first draft of code or an essay if you’re unsure where to start. The tool might make the code unnecessarily complex or not follow standard practices. For essays, it could generate text that seems legitimate but makes it clear to an expert that no real thinking has been done.
AVOID asking GenAI questions about things you barely understand if you don’t have enough time or the skills to fact-check the response. AI chatbots can generate plausible-sounding answers that are often wrong (in other words, “bullshit”).
ALWAYS give GenAI a lot of context. For example, if you use a tool like NotebookLM, you can attach the public links to the course materials you are studying and ask for a starting point for a task that takes the content of those materials into account. Alternatively, on more traditional, chat-based tools, be sure to provide a lot of context about what you’ve learned that is specific to your learning journey in this course.
Our position
The LSE Data Science Institute has been studying the impact of generative AI on education since Summer 2023, when we launched the GENIAL project. You can read more about it on the project page.
What we have learned so far:
Although we have not yet fully analysed all the data, it is fair to summarise the good and bad aspects of using generative AI tools in education in the following way:
Good: The students who made the most resourceful use of GenAI remained in control of their learning. They often gave the chatbots a lot of context (“I want to perform web scraping of this website with the library scrapy, the code must contain functions – no classes – and I want to save the data in a CSV file.”) and would always check the code/output generated by GenAI against the course materials or reputable sources. They were able to identify when the AI was suggesting something that was not correct or not following best practices and would never blindly accept the AI’s suggestions.
Bad: If you don’t master a subject, GenAI can make you feel like you do. This pattern was frequent, for example, among students who had gaps in their understanding of programming concepts. They would ask the AI to generate code for them, and the AI would produce code that seemed to work but that produced an incorrect result or was so complex that it was virtually impossible to edit.
Read more about it in our preprint:
Dorottya Sallai, Jonathan Cardoso-Silva, Marcos E. Barreto, Francesca Panero, Ghita Berrada, and Sara Luxmoore. “Approach Generative AI Tools Proactively or Risk Bypassing the Learning Process in Higher Education”, Preprint, July 2024.
How I use GenAI in this course
When creating material for the course:
- After I devise a plan of what I want to teach on a particular week or session, I draft the headings and subheadings of the lecture notes myself on VSCode, with the GitHub Copilot extension enabled. Very frequently, the AI autocompletes something close to what I already wanted to say, so I hit ‘Tab’ and let it complete the sentence.
- If I get stuck and I can’t think of any coding exercises that would help me illustrate a concept, I go to NotebookLM, import my drafts and query the tool for ideas on how to connect everything. Most of it is generic and I drop it, but sometimes it gives me a good idea that I can use.
- Once I have finished the draft, I run it through Grammarly AI to improve readability and coherence. I typically highlight a paragraph and ask it to ‘Improve but keep it conversational, jargon-free’. I then review the changes and accept them if they make sense. It is important to me that the text remains accessible to all students, regardless of their background, and that I keep my ‘voice’ in the text.
When grading your work:
- I don’t upload your work to any commercial AI service.
- I don’t trust current GenAI tools to understand the type of feedback I want to write. I find they create more rework than the time they save me.
- But once I write my feedback comments, I tend to run them through Grammarly AI to improve readability and coherence.
- Not AI but related: In some cases, we use autograding tools to help us check if your code is working as expected. These tools are not AI but are automated scripts that run your code against a set of tests. They are not perfect, but they help us identify common mistakes quickly.
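To give a sense of what "automated scripts that run your code against a set of tests" can look like, here is a minimal Python sketch. It is not the autograder used in this course: the function `student_mean`, the test cases, and the `run_autograder` helper are all hypothetical names made up for illustration.

```python
def student_mean(values):
    """Hypothetical student submission: arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Each test case: (description, input, expected output). Illustrative only.
TEST_CASES = [
    ("single value", [4], 4.0),
    ("several values", [1, 2, 3], 2.0),
    ("negative values", [-2, 2], 0.0),
]


def run_autograder(func, cases):
    """Run func on every test case and record whether it passed.

    A test fails if the output differs from the expected value
    or if the function raises an exception.
    """
    results = {}
    for description, argument, expected in cases:
        try:
            results[description] = func(argument) == expected
        except Exception:
            results[description] = False
    return results


report = run_autograder(student_mean, TEST_CASES)
for description, passed in report.items():
    print(f"{'PASS' if passed else 'FAIL'}: {description}")
```

A script like this only checks the cases someone thought to write down, which is one reason such tools are "not perfect": code can pass every test and still be wrong for inputs the tests do not cover.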
References
Footnotes
1. Read about the GENIAL project