LSE DS105 (2023)
Data for Data Scientists
Check this page every week to see more info on how to study for the course.
Introduction
The first week is all about setting up your computer and getting familiar with the tools we will use in the course.
Week 01
25 Sep 2023 - 29 Sep 2023
Lecture
The Data Science Toolbox and the Terminal
Lab
Set up your computer and meet the Terminal
Readings
Indicative
- Book Chapter: (Schutt and O'Neil 2013, chap. 1) - What is Data Science?
- Book Chapter: (Shah 2020, chap. 1) - Introduction
Recommended
- Academic Article: "Beyond Unicorns: Educating, Classifying, and Certifying Business Data Scientists" (Davenport 2020)
Behind the Scenes
Weeks 02 and 03 are about the underlying technologies that power the data science tools we will use later in the course. Don't underestimate the importance of these topics. If you master them, you will be a lot more productive in the long run.
Week 02
02 Oct 2023 - 06 Oct 2023
Lecture
Operating Systems and the Cloud
Lab
Running commands on a remote computer
Assignment Reveal
Release of Problem Set 01: Shell scripting (10%).
Details: W03 Summative
Readings
Practice some more with the terminal:
Recommended
- Tutorial: "Using Z Shell on Macs" (Hartl 2020)
- Tutorial: "Install Ubuntu on WSL2 on Windows 11" (Canonical 2022)
- Tutorial: "What is Windows Subsystem for Linux" (Microsoft 2022)
- Blog post: "What is Ubuntu?" (Abubakar 2021)
- Academic article: "Ten simple rules for getting started with command-line bioinformatics" (Brandies and Hogg 2021)
Go deeper
- Book: "Mastering Linux Shell Scripting" (Ebrahim and Mallett 2018)
- Book: "Fundamentals of Linux" (Pelz 2018)
Week 03
09 Oct 2023 - 13 Oct 2023
Drop-in sessions
We will host drop-in sessions in early Week 03 to support you with Problem Set 01.
Deadline
Submit your Problem Set 01 via Moodle by the day before the lecture.
Lecture
Original title: Data types, File formats, Git and markdown
Revised title: Git, GitHub & Markdown
Lab
Git tutorial + handling your first Git conflict
Formative
Practice Python for and while loops.
Submit on GitHub Classroom for feedback.
More details in the lecture.
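As a warm-up for this formative, here is a minimal sketch of the two loop types. The function names and the tasks they solve are hypothetical illustrations, not the actual exercise set in the course.

```python
def sum_with_for(numbers):
    """Add up a list of numbers by visiting each element with a for loop."""
    total = 0
    for n in numbers:  # a for loop runs once per element
        total += n
    return total


def halvings_with_while(value):
    """Count how many integer halvings it takes for value to reach 0."""
    steps = 0
    while value > 0:  # a while loop runs for as long as the condition holds
        value //= 2
        steps += 1
    return steps


print(sum_with_for([1, 2, 3, 4]))
print(halvings_with_while(10))
```

A for loop suits cases where you know the collection you want to walk through; a while loop suits cases where you only know the stopping condition.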
Readings
Indicative
- Book: (Duckett 2014, chaps. 1-6) - HTML & CSS - design and build websites
- Software Documentation: "BeautifulSoup library" (beautifulSoup 2023)
Recommended
- Software Documentation: "python requests library" (requests 2023)
- Software Documentation: "Data types - NumPy v1.24 Manual" (numpy 2022a)
- Software Documentation: "Intro to data structures - Pandas v1.5.3 Manual" (pandas 2022b)
- Software Documentation: "How do I subset data? - Pandas v1.5.3 Manual" (pandas 2022a)
Go deeper
- Software Documentation: "Data type objects (dtype) - NumPy v1.24 Manual" (numpy 2022b)
Collecting Data
In the next few weeks, we will spend some time learning how to collect data from the web. This is a crucial skill for data scientists!
Week 04
16 Oct 2023 - 20 Oct 2023
Lecture
Original title: The Internet and the World Wide Web
Revised title: Data types, File formats & Python tricks
Lab
Web Scraping in Python using the requests and scrapy libraries
Assignment Reveal
Release of Problem Set 02: Web Scraping (20%).
Details: W05 Summative
Week 05
23 Oct 2023 - 27 Oct 2023
Deadline
Submit your Problem Set 02 via GitHub Classroom by the day before the lecture.
Lecture
Web APIs and principles of data collection
Lab
Collecting data from APIs in Python using the requests library
Assignment Reveal
Release of Problem Set 03: Web APIs (30%).
Deadline: W07
Details: TBA during the lecture.
Week 06
30 Oct 2023 - 03 Nov 2023
Drop-in sessions
There is no lecture or lab this week. Instead, we will hold drop-in sessions to help you with your Summative 03. The exact times and dates will be announced in the lecture of Week 05.
Cleaning and reshaping data
Here we reach the core of the course. We will spend a lot of time learning how to clean and reshape data.
Week 07
06 Nov 2023 - 10 Nov 2023
Deadline
Submit your Problem Set 03 via GitHub Classroom by the day before the lecture.
Lecture
Data summarisation and the grammar of graphics
Lab
- Dataviz with plotnine
- Form your groups for the project in the lab
Assignment Reveal
(Formative)
For Week 08, each group will have to:
- Write and sign a "team contract"
- Prepare a 10-minute pitch of their project idea.
Details: TBA during the lecture.
Week 08
13 Nov 2023 - 17 Nov 2023
Deadline
Submit your team contracts via GitHub Classroom by the day of the lecture.
Lecture
Databases & data pivoting
Pre-processing and grouping data with pandas, a groupby-apply tutorial
Lab
GROUP PRESENTATIONS (formative)
Formative
This is a group assignment we will do during the lecture.
Practice using GitHub as a team to collaborate on a data reshaping task.
Week 09
20 Nov 2023 - 24 Nov 2023
Lecture
Conda environments, databases and join operations
Lab
GitHub Issues & Pull Requests
Assignment Reveal
Groups must start preparing a group presentation for Week 11.
Details: TBA during the lecture.
Applications
In the final two weeks, the focus is on setting up your projects. The lectures focus on practical applications and tips that closely resemble the problems you are facing in your projects.
For example, if several groups are struggling with merging data from two different data sources, I select a dataset that requires this operation and show you how to do it. If groups are not struggling with anything in particular, I have some content prepared on text mining and network analysis.
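To give a flavour of the merging scenario mentioned above, here is a minimal sketch using pandas. The two tables, their column names, and their values are made up purely for illustration; they are not course data, and the lecture may use a different approach.

```python
import pandas as pd

# Hypothetical source 1: which project group each student belongs to
groups = pd.DataFrame({
    "student_id": [1, 2, 3],
    "group": ["A", "A", "B"],
})

# Hypothetical source 2: presentation scores, collected separately
scores = pd.DataFrame({
    "student_id": [2, 3, 4],
    "score": [70, 65, 80],
})

# An outer join keeps every student from both sources.
# indicator=True adds a _merge column showing where each row came from.
merged = groups.merge(scores, on="student_id", how="outer", indicator=True)
print(merged)
```

Inspecting the `_merge` column is a quick way to spot records that appear in only one of the two sources before deciding how to handle them.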
Week 10
27 Nov 2023 - 01 Dec 2023
Lecture
Applications I: Text Mining/Network Analysis
Lab
Super Tech Support
- We will use the lab to help you with your projects.
Week 11
04 Dec 2023 - 08 Dec 2023
Lecture
Applications II: Text Mining/Network Analysis
Lab
GROUP PRESENTATIONS (15%)
Final Steps (Winter Term)
After the end of the Autumn Term, you will have to submit your final project (25%). The deadline is in Week 04 of the Winter Term. More details about the requirements of the final project, as well as the drop-in sessions, will be announced in the Autumn Term.