LSE DS105 (2023)
2023/24 Winter Term
Check this page every week for more information on studying for the course.
Introduction
The first week is all about setting up your computer and getting familiar with the tools we will use in the course.
Week 01
15 Jan 2024 - 19 Jan 2024
Lecture
The Data Science Toolbox and the Terminal
Lab
Setting up your computer and getting familiar with the terminal
Readings
Indicative
- Book Chapter: (Schutt and O'Neil 2013, chap. 1) - What is Data Science?
- Book Chapter: (Shah 2020, chap. 1) - Introduction
Recommended
- Academic Article: "Beyond Unicorns: Educating, Classifying, and Certifying Business Data Scientists" (Davenport 2020)
Behind the Scenes
Weeks 02 and 03 are about the underlying technologies that power the data science tools we will use later in the course. Don't underestimate the importance of these topics. If you master them, you will be a lot more productive in the long run.
Week 02
22 Jan 2024 - 26 Jan 2024
Lecture
Operating Systems and the Cloud
Lab
Running commands on a remote computer.
Assignment Reveal
Release of the W03 Summative (10%) instructions.
Readings
Practice some more with the terminal:
Recommended
- Tutorial: "Using Z Shell on Macs" (Hartl 2020)
- Tutorial: "Install Ubuntu on WSL2 on Windows 11" (Canonical 2022)
- Tutorial: "What is Windows Subsystem for Linux" (Microsoft 2022)
- Blog post: "What is Ubuntu?" (Abubakar 2021)
- Academic article: "Ten simple rules for getting started with command-line bioinformatics" (Brandies and Hogg 2021)
Go deeper
- Book: "Mastering Linux Shell Scripting" (Ebrahim and Mallett 2018)
- Book: "Fundamentals of Linux" (Pelz 2018)
Week 03
29 Jan 2024 - 2 Feb 2024
Lab
A tutorial on Git/GitHub and the Markdown language
Deadline
Submit your W03 Summative (10%) via Moodle by 30 January, 5 pm.
Lecture
Git/GitHub, Markdown, and dev environment setup (VSCode, Jupyter, etc.)
Formative
Release of W04 Formative (XP)
- Practice Jupyter Notebooks (Markdown + Python code)
- Submit on GitHub Classroom for feedback.
Collecting Data
In the next few weeks, we will learn how to collect data from the web. This is a crucial skill for data scientists!
Week 04
5 Feb 2024 - 9 Feb 2024
Lab
Data types and the basics of pandas
Drop-in sessions
We will host drop-in sessions in Week 04 to help you with the W04 Formative (XP) or any set-up issues you might have.
Deadline
Submit your W04 Formative (XP) via GitHub by 7 February 2024, 5 pm.
Lecture
The Internet, its protocols and the Web
(plus scraping data from the web using Python)
Assignment Reveal
Release of W06 Summative (30%).
Deadline: 21 February, 5 pm
Readings
Indicative
- Book: (Duckett 2014, chaps. 1–6) - HTML & CSS - design and build websites
- Software Documentation: "BeautifulSoup library" (beautifulSoup 2023)
Recommended
- Software Documentation: "python requests library" (requests 2023)
- Software Documentation: "Data types - NumPy v1.24 Manual" (numpy 2022a)
- Software Documentation: "Intro to data structures - Pandas v1.5.3 Manual" (pandas 2022b)
- Software Documentation: "How do I subset data? - Pandas v1.5.3 Manual" (pandas 2022a)
Go Deeper
- Software Documentation: "Data type objects (dtype) - NumPy v1.24 Manual" (numpy 2022b)
Week 05
12 Feb 2024 - 16 Feb 2024
Lab
Web scraping
Drop-in sessions
We will host drop-in sessions to help you get up to speed with web scraping.
Lecture
More web scraping: CSS Selectors & XPaths
Week 06
19 Feb 2024 - 23 Feb 2024
Drop-in sessions
There is no lecture or lab this week. Instead, we will hold drop-in sessions to help you with your work.
Check your calendar for the exact times.
Deadline
Submit your W06 Summative (30%) via GitHub by 21 February, 5 pm.
Assignment Reveal
Release of W08 Summative (20%).
Deadline: 6 March 2024, 5 pm (Week 08)
Cleaning and reshaping data
Week 07
26 Feb 2024 - 1 Mar 2024
Lab
Simple data cleaning with pandas
GENIAL Open Lecture
Using AI Chatbots for Learning
Location: CBG Auditorium
Date: 27 February 2024
Time: 18:00 - 20:00
Lecture
Data summarisation and the grammar of graphics
Putting it all together: from web scraping to initial data cleaning
A case study on web scraping: compiling a list of the last few UK general elections from Wikipedia and revisiting the essential concepts we have been covering, such as CSS/XPath selectors, functions, list comprehensions, and the pandas apply() method, with a hands-on approach to Git commands.
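To give a flavour of the concepts named in this case study, here is a minimal sketch that combines an XPath-style query, a small helper function, and a list comprehension. It is only an illustration under stated assumptions: the table below is a hand-written, well-formed snippet (not the real Wikipedia markup), the class name and the function names are made up, and it uses Python's built-in xml.etree.ElementTree so that it runs without installing anything. The lecture itself may rely on other libraries.

```python
import xml.etree.ElementTree as ET

# Hand-written, well-formed snippet standing in for a scraped page.
# It is NOT the real Wikipedia markup; the class name "elections" is made up.
HTML = """
<table class="elections">
  <tr><th>Election</th><th>Polling day</th></tr>
  <tr><td>2015</td><td> 7 May 2015 </td></tr>
  <tr><td>2017</td><td> 8 June 2017 </td></tr>
  <tr><td>2019</td><td> 12 December 2019 </td></tr>
</table>
"""


def clean_cell(cell):
    """Return the text of a table cell without surrounding whitespace."""
    return (cell.text or "").strip()


def parse_elections(html):
    """Turn the data rows of the table into a list of dictionaries."""
    root = ET.fromstring(html.strip())
    # XPath-style query: every <tr> that has at least one <td> child,
    # which skips the header row (it only contains <th> cells).
    rows = root.findall(".//tr[td]")
    # List comprehension: one dictionary per data row.
    return [
        {"year": int(clean_cell(row[0])), "polling_day": clean_cell(row[1])}
        for row in rows
    ]


elections = parse_elections(HTML)
years = [e["year"] for e in elections]
print(elections)
```

Applying the same cleaning function to every value of a column is the idea behind the apply() method mentioned above; the list comprehension here plays that role for a plain Python list.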
Week 08
4 Mar 2024 - 8 Mar 2024
Lab
Recap of data types + the concept of grammar-of-graphics (using the plotnine library)
- We will also mediate the formation of teams for the final project.
Drop-in sessions
Join one of the drop-in sessions throughout Tuesday, 5 March 2024, to help you with your upcoming deadline.
- Tuesday, 5 Mar 10am-12pm (in-person only): COL.1.06 DSI Visualisation Studio. Host: our class teacher, Sara Luxmoore.
- Tuesday, 5 Mar 3pm-4pm (hybrid): KSL.2.02. Host: our colleagues at the Digital Skills Lab (not teachers in the course).
- Tuesday, 5 Mar 4pm-6pm (online only): Microsoft Teams (check your calendar invites). Host: our class teacher, Alexander Soldatkin.
Deadline
Submit your W08 Summative (20%) assessment via GitHub by 8 March 2024, 5 pm.
Lecture
Data summarisation and more grammar-of-graphics
Week 09
11 Mar 2024 - 15 Mar 2024
Assignment Reveal
(Formative)
Groups must start thinking about the Week 10 presentation.
- Write and sign a "team contract".
- Prepare a 10-minute pitch of your project idea.
Lab
Using Git as a team
- Setting up your project board on GitHub
- Git branches and pull requests
Lecture
Databases and join operations with pandas and SQL
Applications
In the final two weeks, the focus is on setting up your projects. The lectures focus on practical applications and tips that closely resemble the problems you are facing in your projects.
For example, if several groups are struggling with merging data from two different data sources, I select a dataset that requires this operation and show you how to do it. If groups are not struggling with anything in particular, I have some content prepared on text mining and network analysis.
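As a rough idea of what merging two data sources can look like, here is a minimal sketch of a join using Python's built-in sqlite3 module. Everything in it is an assumption made for illustration: the table names, column names, and values are invented, and the dataset used in the actual lecture will differ.

```python
import sqlite3

# Toy rows standing in for two separately collected data sources.
# All table names, column names and values are invented for illustration.
people = [(1, "Ada"), (2, "Grace"), (3, "Alan")]
scores = [(1, 90), (3, 75), (4, 60)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE scores (person_id INTEGER, score INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", people)
conn.executemany("INSERT INTO scores VALUES (?, ?)", scores)

# LEFT JOIN keeps every row from people; when a person has no matching
# score, the score column is NULL, which Python receives as None.
merged = conn.execute(
    """
    SELECT p.person_id, p.name, s.score
    FROM people AS p
    LEFT JOIN scores AS s ON s.person_id = p.person_id
    ORDER BY p.person_id
    """
).fetchall()
conn.close()

print(merged)
```

Note that the score row with person_id 4 has no match in people and is dropped by the left join. With pandas, a comparable result can be obtained with DataFrame.merge using how="left" and on="person_id".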
Week 10
18 Mar 2024 - 22 Mar 2024
Lab
GROUP PRESENTATIONS (formative)
Lecture
Applications I: conda environments + pivot tables
Week 11
25 Mar 2024 - 28 Mar 2024
Lab
GROUP PRESENTATIONS (15%)
- Present your progress.
- We don't expect to see much code or any data analysis at this stage.
- The key thing is: have you started collecting data? Compared to the ideas presented in the previous week, does your project still seem feasible?
Lecture
Applications II: Regular expressions and text mining
Final Steps (Spring Term)
After the end of the Winter Term, you will have to submit your final project (25%). The deadline is in Week 04 of the Spring Term. More details about the requirements of the final project, as well as drop-in sessions, will be announced during the Winter Term.