📋 Getting Ready

2024/25 Winter Term

On this page, you will find information on how to set up your computer for this course, as well as practical advice on Python programming.

1. Set up your computer

Below you will find links to download and install the software that you will need for this course. If you prefer a more detailed walkthrough, you can follow this guide created by our colleagues from the LSE Digital Skills Lab.

Important

The LSE Digital Skills Lab is also offering in-person workshops at the start of Winter Term. To find them, head to the Training and Development System and search for Python Pre-Sessional Workshops. We highly recommend you take these workshops if you are new to Python programming.

1.1 Checking your operating system (OS) details

Before installing anything, take note of your operating system, i.e. whether you are on Windows, Mac or Linux and which version/flavour you are running (e.g. Windows 10, Windows 11, Mac Ventura 13.6, Ubuntu 23.10 Mantic Minotaur, Debian version 12.2 bookworm).

To find out this information, follow these instructions:

On Windows

This tutorial explains the many different ways you can try to find detailed information about your operating system.

On Mac

For Mac, head this way. In Mac’s case, also take note of the chip information, i.e. whether you have an Intel chip or an Apple silicon chip (M1 or later). Which version of a piece of software you should download (e.g. for Anaconda) also depends on this piece of information!

On Linux

If you are running a Linux distribution (e.g. Ubuntu, Fedora, Debian, Gentoo), open a terminal and follow either this guide or this one.

Once you’ve taken note of your operating system (OS) details, you’re ready to proceed with the installs.
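If a Python interpreter is already available on your machine (otherwise, come back to this once Anaconda is installed in section 1.2), the standard-library platform module offers a quick cross-check of the details above. This is only a minimal sketch: the function name describe_system is ours, for illustration, and the exact strings reported vary from system to system.

```python
import platform


def describe_system() -> dict:
    """Collect the OS details worth noting before installing anything."""
    return {
        # 'Windows', 'Darwin' (macOS) or 'Linux'
        "os": platform.system(),
        # OS release, e.g. '10' or '11' on Windows, a kernel version on Linux
        "release": platform.release(),
        # Longer, more detailed version string
        "version": platform.version(),
        # Chip architecture: typically 'x86_64' or 'AMD64' for Intel/AMD,
        # and 'arm64' for Apple silicon Macs
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    for key, value in describe_system().items():
        print(f"{key}: {value}")
```

The machine entry is the one to look at when an installer offers separate Intel and Apple silicon downloads.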

1.2 Install Anaconda

Anaconda is an open-source distribution of Python for data science and machine learning: it bundles Python, popular libraries, Jupyter Notebook, and the conda package manager to make package and environment management easier. If you want to understand the difference between Anaconda and Python a bit better, have a look at this link. At the time of writing, Anaconda comes bundled with Python 3.12 (for all operating systems), which is the version of Python we’ll use in this course.

To install the latest version of Anaconda for your OS:

  1. Head to the Anaconda download page
  2. Choose the graphical installer that best corresponds to your own OS and download it by clicking on the link on the page
  3. Follow the install instructions corresponding to your OS found here

⚠️ If installing on Windows, we would suggest the following alterations to the instructions given in the Anaconda documentation:

  • If you have admin rights and you are not sharing your computer with anyone, we would recommend that, in step 5, you install Python for All Users instead of Just Me. This saves you from having to remember which account you installed Python for.
  • In step 8, tick the Add Anaconda3 to my PATH environment variable option. If you don’t, the system won’t recognize python as a valid command after the install and you won’t be able to use it on the command line.
  4. Test your installation as shown here. On Mac and Linux, open Terminal. On Windows, open either the Command Prompt (press Windows+R, type cmd and press Enter) or PowerShell (press Windows+R, type powershell and press Enter). Then run the commands conda list and python. If these commands are not recognized, Python and/or Anaconda are not on the PATH and you will need to add them: you can follow this tutorial to do so.

⚠️ Find out in which folder Anaconda is installed on your system and, within that folder, which folders contain the python and conda executables. For example, the python executable might be in C:\ProgramData\anaconda3\ (i.e. C:\ProgramData\anaconda3\python.exe) and the conda executable in C:\ProgramData\anaconda3\Scripts\ (i.e. C:\ProgramData\anaconda3\Scripts\conda.exe). It is those folders you should add to your PATH environment variable. Based on this example, you would add C:\ProgramData\anaconda3\ and C:\ProgramData\anaconda3\Scripts\ to PATH.
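If the python command does start an interpreter but you are unsure which executables your PATH resolves to, a short check along the following lines may help. It is a sketch using the standard-library shutil.which function; the helper name find_on_path is hypothetical, and the locations printed depend entirely on your own install.

```python
import shutil


def find_on_path(commands=("python", "conda")) -> dict:
    """Map each command name to the executable found on PATH (None if absent)."""
    return {name: shutil.which(name) for name in commands}


if __name__ == "__main__":
    for name, location in find_on_path().items():
        # A missing entry means the folder holding that executable
        # still needs to be added to PATH
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
```

If either command is reported as not found, the folder that contains it is the one missing from PATH.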

You are now done with installing Anaconda/Python.

1.3 Install Quarto

In this course, we’ll be using Python within Quarto Markdown files and/or heavily annotating Python notebooks with Markdown and converting them into Quarto Markdown files. We’ll need to install Quarto before we install our IDE of choice (VSCode) to avoid any installation issues.

Install the version of Quarto recommended for your OS.

⚠️ If you encounter issues with the latest version of Quarto (version 1.6.40 at the time of writing), try uninstalling it and installing an older version from this page (tab Older releases), e.g. version 1.4.557. Make sure, however, that you have version 1.4 or later installed: some of the features in the tutorial (e.g. inline Python code) won’t work with earlier versions of Quarto!
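To confirm which Quarto version ended up installed, you can run quarto --version in a terminal. The same check can be sketched in Python as below. The helper names (parse_version, installed_quarto_version) and the MINIMUM constant are ours, for illustration; the script simply assumes that quarto --version prints a dotted version number such as 1.6.40.

```python
import re
import shutil
import subprocess

MINIMUM = (1, 4)  # this guide asks for Quarto 1.4 or later


def parse_version(text: str) -> tuple:
    """Turn a string such as '1.6.40' into the tuple (1, 6, 40)."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def installed_quarto_version():
    """Return the installed Quarto version as a tuple, or None if unavailable."""
    quarto = shutil.which("quarto")  # full path to the executable, if on PATH
    if quarto is None:
        return None
    result = subprocess.run([quarto, "--version"], capture_output=True, text=True)
    version = parse_version(result.stdout)
    return version or None  # an empty tuple means the output was not understood


if __name__ == "__main__":
    version = installed_quarto_version()
    if version is None:
        print("Quarto was not found on PATH (or its version could not be read).")
    elif version >= MINIMUM:
        print("Quarto", ".".join(map(str, version)), "is recent enough.")
    else:
        print("Quarto", ".".join(map(str, version)), "is older than 1.4.")
```

Comparing tuples rather than strings avoids the trap where "1.10" would sort before "1.4" alphabetically.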

1.4 Install an IDE (VSCode)

Once you have installed Anaconda, you can start using Python on the Terminal/Command Line straight away. All you need to do is type python in your terminal/command line to start the Python interpreter (provided you’ve made sure to add Python and Anaconda to PATH during install). You can also write your Python scripts in any text editor and run them from the terminal with the python command. However, it is much more convenient to use an integrated development environment (IDE) that provides a graphical user interface (GUI) to Python.

In this course, we will use Visual Studio Code (VSCode) as an IDE.

VSCode is a source-code editor made by Microsoft for Windows, Linux and macOS. Its features include support for debugging, syntax highlighting, intelligent code completion, snippets, code refactoring, and embedded Git. VSCode users can customize the theme, keyboard shortcuts, and preferences of the editor to suit their needs, as well as install extensions that add functionality to the editor (e.g. extensions that add support for programming languages such as Python). We will primarily use VSCode to write Quarto files [1] but you can also use it (if you so wish) to test snippets of Python code demonstrated in class.

  1. Follow these instructions to install VSCode depending on your OS.

  2. Enable the Python and Jupyter extensions (see here for how). You can add as many extensions as you see fit.

  3. Take some time to configure your workspace settings and get used to the interface.

  4. Try out your new environment with this setup notebook (click the button below to download it). Make sure you have selected the correct Python kernel (Anaconda Python 3.12) before executing the code blocks one by one.

    Also try rendering the .ipynb file into HTML by opening the terminal and typing the command quarto render setup.ipynb --execute. Make sure you run this command from the folder where setup.ipynb is located! For more details on how ipynb files and Quarto work together, see here.
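If you are unsure whether the notebook is really running on the Anaconda kernel, a cell like the following can be used as a quick sanity check. It is only a sketch: the function name check_kernel is hypothetical, and the package list (numpy, pandas, matplotlib) is an assumption about what you may want to confirm, not a list taken from the setup notebook.

```python
import importlib.util
import sys

EXPECTED = (3, 12)  # Python version bundled with Anaconda, per this guide
PACKAGES = ("numpy", "pandas", "matplotlib")  # assumed examples, adjust freely


def check_kernel(packages=PACKAGES) -> dict:
    """Report which interpreter is running and whether packages are importable."""
    return {
        # Path of the interpreter behind the current kernel
        "executable": sys.executable,
        "version": tuple(sys.version_info[:3]),
        "matches_expected": sys.version_info[:2] == EXPECTED,
        # find_spec returns None when a top-level package cannot be found
        "packages": {
            name: importlib.util.find_spec(name) is not None for name in packages
        },
    }


if __name__ == "__main__":
    report = check_kernel()
    print("Interpreter:", report["executable"])
    print("Version:", ".".join(map(str, report["version"])))
    print("Matches Python 3.12:", report["matches_expected"])
    for name, available in report["packages"].items():
        print(f"{name}: {'available' if available else 'missing'}")
```

If the interpreter path does not point inside your Anaconda folder, switch kernels in VSCode and run the cell again.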

2. Learn Python

This course relies a lot on Python programming. You should practice your Python programming skills, especially before the second lab (Week 02) of this course. Here are several ways to do that.

2.1 Take the in-person pre-sessional workshops

Our colleagues at the LSE Digital Skills Lab have put together a great programme of workshops to help students prepare for courses that require Python programming. The courses are free and open to all LSE students and are perfect if you are new to programming in Python, or you need a refresher.

🔗 Link: Find out more about the LSE Digital Skills Lab

2.2 Take the self-paced Dataquest course

If you prefer to learn Python at your own pace, you can take the Dataquest course pre-selected by our colleagues at the Digital Skills Lab.

Warning

As a student at LSE, you can get a free Dataquest subscription but you need to request a premium license first! To do so, please follow the instructions on the Getting access and using Dataquest section of the Moodle page for this course. Our colleagues at the Digital Skills Lab will respond to you as soon as possible.

🔗 Link: Python for Data Science Pre-sessional Course 24/25

2.3 Python books

Perhaps one of the best ways to learn Python on your own is to work through either Python for Data Analysis 3E by Wes McKinney or Python Data Science Handbook by Jake VanderPlas. Both books are available for free online.

More specifically, you can read Python for Data Analysis 3E in the following order:

Main Book

  • Chapter 1: Preliminaries
  • Chapter 2: Python Language Basics, IPython, and Jupyter Notebooks
  • Chapter 3: Built-In Data Structures, Functions, and Files
  • Chapter 4: NumPy Basics: Arrays and Vectorized Computation
  • Chapter 5: Getting Started with pandas
  • Chapter 6: Data Loading, Storage, and File Formats
  • Chapter 7: Data Cleaning and Preparation
  • Chapter 8: Data Wrangling: Join, Combine, and Reshape
  • Chapter 9: Plotting and Visualization
  • Chapter 10: Data Aggregation and Group Operations
  • Chapter 11: Time Series
  • Chapter 12: Introduction to Modeling Libraries in Python
  • Chapter 13: Data Analysis Examples

Appendices

  • A: Advanced NumPy
  • B: More on the IPython System

As for the Python Data Science Handbook, you can read it in the following order:

Part 1: IPython: Beyond Normal Python

  • Help and Documentation in IPython
  • Keyboard Shortcuts in the IPython Shell
  • IPython Magic Commands
  • Input and Output History
  • IPython and Shell Commands
  • Errors and Debugging
  • Profiling and Timing Code
  • More IPython Resources

Part 2: Introduction to NumPy

  • Understanding Data Types in Python
  • The Basics of NumPy Arrays
  • Computation on NumPy Arrays: Universal Functions
  • Aggregations: Min, Max, and Everything In Between
  • Computation on Arrays: Broadcasting
  • Comparisons, Masks, and Boolean Logic
  • Fancy Indexing
  • Sorting Arrays
  • Structured Data: NumPy’s Structured Arrays

Part 3: Data Manipulation with Pandas

  • Introducing Pandas Objects
  • Data Indexing and Selection
  • Operating on Data in Pandas
  • Handling Missing Data
  • Hierarchical Indexing
  • Combining Datasets: Concat and Append
  • Combining Datasets: Merge and Join
  • Aggregation and Grouping
  • Pivot Tables
  • Vectorized String Operations
  • Working with Time Series
  • High-Performance Pandas: eval() and query()
  • Further Resources

Part 4: Visualization with Matplotlib

  • Simple Line Plots
  • Simple Scatter Plots
  • Visualizing Errors
  • Density and Contour Plots
  • Histograms, Binnings, and Density
  • Customizing Plot Legends
  • Customizing Colorbars
  • Multiple Subplots
  • Text and Annotation
  • Customizing Ticks
  • Customizing Matplotlib: Configurations and Stylesheets
  • Three-Dimensional Plotting in Matplotlib
  • Geographic Data with Basemap
  • Visualization with Seaborn
  • Further Resources

Part 5: Machine Learning

  • What Is Machine Learning?
  • Introducing Scikit-Learn
  • Hyperparameters and Model Validation
  • Feature Engineering
  • In Depth: Naive Bayes Classification
  • In Depth: Linear Regression
  • In-Depth: Support Vector Machines
  • In-Depth: Decision Trees and Random Forests
  • In Depth: Principal Component Analysis
  • In-Depth: Manifold Learning
  • In Depth: k-Means Clustering
  • In Depth: Gaussian Mixture Models
  • In-Depth: Kernel Density Estimation
  • Application: A Face Detection Pipeline
  • Further Machine Learning Resources

Footnotes

  1. In a bit of a meta touch, this guide was written on VSCode as a Quarto (i.e. .qmd) file! 😉