LSE DS105 (2023)

2023/24 Winter Term

Check this page every week for more information on studying for the course.

Introduction

The first week is all about setting up your computer and getting familiar with the tools we will use in the course.

🗓️ Week 01
15 Jan 2024 -
19 Jan 2024

🧑‍🏫 Lecture

The Data Science Toolbox and the Terminal

💻 Lab

Setting up your computer and getting familiar with the terminal

📖 Readings

Indicative

Recommended

  • 📃 Academic Article: “Beyond Unicorns: Educating, Classifying, and Certifying Business Data Scientists” (Davenport 2020)

Behind the Scenes

Weeks 02 and 03 are about the underlying technologies that power the data science tools we will use later in the course. Don’t underestimate the importance of these topics. If you master them, you will be a lot more productive in the long run.

🗓️ Week 02
22 Jan 2024 -
26 Jan 2024

🧑‍🏫 Lecture

Operating Systems and the ☁️ Cloud

💻 Lab

Running commands on a remote computer.

📣 Assignment Reveal

Release of 📝 W03 Summative (10%) instructions.

📖 Readings

Practice some more with the terminal:

Recommended

  • 💻 Tutorial: “Using Z Shell on Macs” (Hartl 2020)
  • 💻 Tutorial: “Install Ubuntu on WSL2 on Windows 11” (Canonical 2022)
  • 💻 Tutorial: “What is Windows Subsystem for Linux” (Microsoft 2022)
  • 📄 Blog post: “What is Ubuntu?” (Abubakar 2021)
  • 📃 Academic article: “Ten simple rules for getting started with command-line bioinformatics” (Brandies and Hogg 2021)

Go deeper

🗓️ Week 03
29 Jan 2024 -
2 Feb 2024

💻 Lab

A tutorial on Git/GitHub and the Markdown language

⏲️ Deadline

Submit your 📝 W03 Summative (10%) via Moodle by 30 January 5 pm.

🧑‍🏫 Lecture

Git/GitHub, Markdown, and dev environment setup (VSCode, Jupyter, etc.)

✍️ Formative

Release of ✏️ W04 Formative (XP)

  • Practice Jupyter Notebooks (Markdown + Python code)
  • Submit on GitHub Classroom for feedback.

Collecting Data

In the next few weeks, we will learn how to collect data from the web. This is a crucial skill for data scientists!

🗓️ Week 04
5 Feb 2024 -
9 Feb 2024

💻 Lab

Data types and the basics of pandas

🆘 Drop-in sessions

We will host drop-in sessions in Week 04 to help you with the ✏️ W04 Formative (XP) or any set-up issues you might have.

⏲️ Deadline

Submit your ✏️ W04 Formative (XP) via GitHub by 7 February 2024, 5 pm.

🧑‍🏫 Lecture

The Internet, its protocols and the Web
(plus scraping data from the web using Python)

📣 Assignment Reveal

Release of 📝 W06 Summative (30%).
Deadline: 21 February 5 pm

📖 Readings

Indicative

Recommended

  • 📖 Software Documentation: “python requests library” (requests 2023)
  • 📖 Software Documentation: “Data types - NumPy v1.24 Manual” (numpy 2022b)
  • 📖 Software Documentation: “Intro to data structures - Pandas v1.5.3 Manual” (pandas 2022b)
  • 📖 Software Documentation: “How do I subset data? - Pandas v1.5.3 Manual” (pandas 2022a)

Go Deeper

  • 📖 Software Documentation: “Data type objects (dtype) - NumPy v1.24 Manual” (numpy 2022a)

🗓️ Week 05
12 Feb 2024 -
16 Feb 2024

💻 Lab

Web scraping

🆘 Drop-in sessions

We will host drop-in sessions to help you get up to speed with web scraping.

🧑‍🏫 Lecture

More web scraping: CSS Selectors & XPaths

🗓️ Week 06
19 Feb 2024 -
23 Feb 2024

🆘 Drop-in sessions

There is no lecture or lab this week. Instead, we will hold drop-in sessions to help you with your work.
Check your calendar for the exact times.

⏲️ Deadline

Submit your 📝 W06 Summative (30%) via GitHub by 21 February 2024, 5 pm.

📣 Assignment Reveal

Release of 📝 W08 Summative (20%).
Deadline: 6 March 2024, 5 pm (Week 08)

Cleaning and reshaping data

🗓️ Week 07
26 Feb 2024 -
1 Mar 2024

💻 Lab

Simple data cleaning with pandas

GENIAL Open Lecture

Using AI Chatbots for Learning
📍 Location: CBG Auditorium
🗓️ Date: 27 February 2024
⌚ Time: 18:00 - 20:00
🔗 Details

🧑‍🏫 Lecture

Data summarisation and the grammar of graphics
Putting it all together: from web scraping to initial data cleaning

A case study on web scraping: compiling a list of the last few UK general elections from Wikipedia while revisiting the essential concepts we have covered so far, such as CSS/XPath selectors, functions, list comprehensions, and the pandas apply() method, with a hands-on approach to Git commands.
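
As a rough illustration of how these pieces fit together, the sketch below selects table rows with an XPath expression, tidies each cell with a small function, and assembles the result with a list comprehension. The HTML snippet, the clean_cell helper, and the choice of Python's built-in xml.etree.ElementTree module (which supports only a subset of XPath) are assumptions made for this example; they are not the code used in the lecture.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written stand-in for a scraped page
# (this is NOT the real Wikipedia markup).
HTML = """
<html><body>
  <table class="wikitable">
    <tr><th>Election</th><th>Date</th></tr>
    <tr><td> 2015 </td><td>7 May 2015</td></tr>
    <tr><td> 2017 </td><td>8 June 2017</td></tr>
    <tr><td> 2019 </td><td>12 December 2019</td></tr>
  </table>
</body></html>
"""


def clean_cell(cell):
    """Return the text of a table cell with surrounding whitespace removed."""
    return (cell.text or "").strip()


root = ET.fromstring(HTML.strip())

# XPath: every <tr> that is a direct child of a table with class "wikitable".
rows = root.findall(".//table[@class='wikitable']/tr")

# List comprehension: one dictionary per data row.
# The header row only has <th> cells, so it yields no <td> and is skipped.
elections = [
    {"year": int(clean_cell(cells[0])), "date": clean_cell(cells[1])}
    for cells in (row.findall("td") for row in rows)
    if len(cells) == 2
]

print(elections)
```

With pandas, a similar cleaning function would typically be passed to Series.apply() to tidy a whole column at once.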

🗓️ Week 08
4 Mar 2024 -
8 Mar 2024

💻 Lab

Recap of data types + the concept of grammar-of-graphics
(using the plotnine library)

  • We will also mediate the formation of teams for the final project.

🆘 Drop-in sessions

Join one of the drop-in sessions held throughout Tuesday, 5 March 2024, for help with your upcoming deadline.

  • Tuesday, 5 Mar 10am-12pm (in-person only): COL.1.06 DSI Visualisation Studio. Host: our class teacher, Sara Luxmoore.
  • Tuesday, 5 Mar 3pm-4pm (hybrid): KSL.2.02. Host: our colleagues at the Digital Skills Lab (not teachers in the course).
  • Tuesday, 5 Mar 4pm-6pm (online only): Microsoft Teams (check your calendar invites). Host: our class teacher, Alexander Soldatkin.

⏲️ Deadline

Submit your 📝 W08 Summative (20%) assessment via GitHub by 8 March 2024, 5 pm.

🧑‍🏫 Lecture

Data summarisation and more grammar-of-graphics

🗓️ Week 09
11 Mar 2024 -
15 Mar 2024

📣 Assignment Reveal
(Formative)

Groups must start thinking about the Week 10 presentation.

  • Write and sign a ‘team contract’
  • Prepare a 10-minute pitch of their project idea.

💻 Lab

Using Git as a team

  • Setting up your project board on GitHub
  • Git branches and pull requests

🧑‍🏫 Lecture

Databases and join operations with pandas and SQL

Applications

In the final two weeks, the focus is on setting up your projects. The lectures focus on practical applications and tips that closely resemble the problems you are facing in your projects.

For example, if several groups are struggling with merging data from two different data sources, I select a dataset that requires this operation and show you how to do it. If groups are not struggling with anything in particular, I have some content prepared on text mining and network analysis.
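
To give a flavour of the kind of merge mentioned above, here is a minimal sketch that combines two small tables on a shared key with pandas. The two DataFrames and their column names are hypothetical toy data invented for this illustration, not a dataset from the course.

```python
import pandas as pd

# Hypothetical toy data standing in for two separate data sources.
source_a = pd.DataFrame(
    {"country": ["UK", "FR", "DE"], "capital": ["London", "Paris", "Berlin"]}
)
source_b = pd.DataFrame(
    {"country": ["UK", "FR", "ES"], "currency": ["GBP", "EUR", "EUR"]}
)

# Left join: keep every row of source_a and attach matching rows of source_b.
# indicator=True adds a "_merge" column recording where each row came from.
merged = source_a.merge(source_b, on="country", how="left", indicator=True)

# Rows of source_a that found no match in source_b.
unmatched = merged[merged["_merge"] == "left_only"]

print(merged)
print(unmatched)
```

Checking the unmatched rows after a merge is a quick way to spot keys that are spelled differently in the two sources.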

🗓️ Week 10
18 Mar 2024 -
22 Mar 2024

💻 Lab

🗣️ GROUP PRESENTATIONS (formative)

🧑‍🏫 Lecture

Applications I: conda environments + pivot tables

🗓️ Week 11
25 Mar 2024 -
28 Mar 2024

💻 Lab

🗣️ GROUP PRESENTATIONS (15%)

  • Present your progress.
  • We don’t expect to see much code or any data analysis at this stage.
  • The key thing is: have you started collecting data? Compared to the ideas presented in the previous week, does your project still seem feasible?

🧑‍🏫 Lecture

Applications II: Regular expressions and text mining

Final Steps (Winter Term)

After the end of the Autumn Term, you will have to submit your final project (25%). The deadline is in Week 04 of the Winter Term. More details about the requirements of the final project, as well as the drop-in sessions, will be announced in the Autumn Term.

References

Abubakar, Mohammed. 2021. “What Is Ubuntu?” Blogpost. How-To Geek. https://www.howtogeek.com/763775/what-is-ubuntu/.
beautifulSoup. 2023. “Beautiful Soup Documentation - Beautiful Soup 4.9.0 Documentation.” https://www.crummy.com/software/BeautifulSoup/bs4/doc/.
Brandies, Parice A., and Carolyn J. Hogg. 2021. “Ten Simple Rules for Getting Started with Command-Line Bioinformatics.” PLOS Computational Biology 17 (2): e1008645. https://doi.org/10.1371/journal.pcbi.1008645.
Canonical. 2022. “Install Ubuntu on WSL2 on Windows 11 with GUI Support.” Tutorial. Ubuntu. https://ubuntu.com/tutorials/install-ubuntu-on-wsl2-on-windows-11-with-gui-support.
Davenport, Thomas. 2020. “Beyond Unicorns: Educating, Classifying, and Certifying Business Data Scientists.” Harvard Data Science Review 2 (2). https://doi.org/10.1162/99608f92.55546b4a.
Duckett, Jon. 2014. HTML & CSS: Design and Build Websites. Indianapolis, Indiana: John Wiley & Sons Inc.
Ebrahim, Mokhtar, and Andrew Mallett. 2018. Mastering Linux Shell Scripting: A Practical Guide to Linux Command-Line, Bash Scripting, and Shell Programming, 2nd Edition. 2nd ed. Birmingham: Packt Publishing.
Hartl, Michael. 2020. “Using Z Shell on Macs with the Learn Enough Tutorials.” Online Course. Learn Enough News & Blog. https://news.learnenough.com/macos-bash-zshell.
Microsoft. 2022. “What Is Windows Subsystem for Linux.” Tutorial. What Is Windows Subsystem for Linux. https://docs.microsoft.com/en-us/windows/wsl/about.
numpy. 2022a. “Data Type Objects (Dtype) - NumPy V1.24 Manual.” https://numpy.org/doc/1.24/reference/arrays.dtypes.html#arrays-dtypes.
numpy. 2022b. “Data Types - NumPy V1.24 Manual.” https://numpy.org/doc/1.24/user/basics.types.html.
pandas. 2022a. “How Do I Select a Subset of a DataFrame? - Pandas 1.5.3 Documentation.” https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html.
pandas. 2022b. “Intro to Data Structures - Pandas 1.5.3 Documentation.” https://pandas.pydata.org/pandas-docs/version/1.5/user_guide/dsintro.html#dsintro.
Pelz, Oliver. 2018. Fundamentals of Linux: Explore the Essentials of the Linux Command Line. Birmingham: Packt Publishing Ltd.
requests. 2023. “Requests: HTTP for Humans™ - Requests Documentation.” https://requests.readthedocs.io/en/v3.0.0/.
Schutt, Rachel, and Cathy O’Neil. 2013. Doing Data Science. 1st edition. Beijing ; Sebastopol: O’Reilly Media. https://ebookcentral.proquest.com/lib/londonschoolecons/detail.action?docID=1465965.
Shah, Chirag. 2020. A Hands-on Introduction to Data Science. Cambridge, United Kingdom ; New York, NY, USA: Cambridge University Press. https://librarysearch.lse.ac.uk/permalink/f/1n2k4al/TN_cdi_askewsholts_vlebooks_9781108673907.