Getting Ready
2024/25 Winter Term
On this page, you will find information on how to set up your computer for this course, as well as practical advice on Python programming.
1. Set up your computer
Below you will find links to download and install the software that you will need for this course. If you prefer a more detailed guide, you can follow this guide created by our colleagues from the LSE Digital Skills Lab.
The LSE Digital Skills Lab is also offering in-person workshops at the start of Winter Term (to find them, head to the Training and Development System and search for Python Pre-Sessional Workshops). We highly recommend you take these workshops if you are new to Python programming.
1.1 Checking your operating system (OS) details
Before installing anything, take note of your operating system, i.e. whether you are on Windows, Mac or Linux, and which version/flavour you are running (e.g. Windows 10, Windows 11, macOS Ventura 13.6, Ubuntu 23.10 Mantic Minotaur, Debian 12.2 bookworm).
To find out this information, follow these instructions:
On Windows
This tutorial explains the many different ways you can try to find detailed information about your operating system.
On Mac
For Mac, head this way. In Mac's case, also take note of the chip information, i.e. whether you have an Intel chip or an Apple silicon chip (M1 or later): the version of some software you should download (e.g. Anaconda) also depends on this piece of information!
On Linux
If you are running a Linux distribution (e.g. Ubuntu, Fedora, Debian, Gentoo), open a terminal and follow either this or this.
Once you've taken note of your operating system (OS) details, you're ready to proceed with the installs.
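If a Python interpreter is already available on your machine (otherwise, come back to this after section 1.2), the standard-library platform module offers a quick way to print the same details. This is only a convenience sketch, not a replacement for the instructions above; describe_os is a name made up for this example.

```python
import platform


def describe_os() -> dict:
    """Collect the OS details discussed above into a dictionary."""
    return {
        "system": platform.system(),    # e.g. "Windows", "Darwin" (macOS) or "Linux"
        "release": platform.release(),  # OS release, e.g. "10" or a kernel version
        "version": platform.version(),  # more detailed build information
        "machine": platform.machine(),  # hardware architecture, e.g. "x86_64" or "arm64"
    }


for key, value in describe_os().items():
    print(f"{key}: {value}")
```

On a Mac, a machine value of arm64 typically indicates an Apple silicon chip, while x86_64 typically indicates an Intel chip.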
1.2 Install Anaconda
Anaconda is an open-source distribution of the Python language for data science and machine learning: it bundles Python, popular libraries, Jupyter Notebook, and the conda package manager together for easier package and environment management. If you want to understand the difference between Anaconda and Python a bit better, have a look at this link. As of the time of writing, Anaconda comes bundled with Python 3.12 (for all operating systems), which is the version of Python we'll use in this course.
To install the latest version of Anaconda for your OS:
- Head to the Anaconda download page
- Choose the graphical installer that best corresponds to your own OS and download it by clicking on the link on the page
- Follow the install instructions corresponding to your OS found here
⚠️ If installing on Windows, we would suggest the following alterations to the instructions given in the Anaconda documentation:
- If you have admin rights and you are not sharing your computer with anyone, we would recommend that, in step 5, you install Python for All Users instead of Just Me. This saves you from having to remember which account Python was installed for.
- In step 8, tick the Add Anaconda3 to my PATH environment variable option. If you don't, the system won't recognize python as a valid command after the install, and you won't be able to run it from the command line.
- Test your installation as shown here. On Mac and Linux, you can use Terminal for your testing; on Windows, you can use either the Command Prompt or PowerShell (to open the Command Prompt, press Windows+R, type cmd and press Enter; to open PowerShell, press Windows+R, type powershell and press Enter). In your terminal of choice, test the commands conda list and python. If these commands are not recognized, Python and/or Anaconda are not on the PATH and you will need to add them: you can follow this tutorial to do so.
⚠️ Find out in which folder Anaconda is installed on your system and, within that folder, which folders contain the python and conda executables, e.g. C:\ProgramData\anaconda3\ for the python executable (i.e. C:\ProgramData\anaconda3\python.exe) and C:\ProgramData\anaconda3\Scripts\ for the conda executable (i.e. C:\ProgramData\anaconda3\Scripts\conda.exe). It is those folders you should add to your environment variables/PATH. Based on the previous example, you would be adding C:\ProgramData\anaconda3\ and C:\ProgramData\anaconda3\Scripts\ to PATH.
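As a complement to running conda list and python by hand, the following sketch uses shutil.which from the Python standard library to report where (if anywhere) each command is found on PATH. It assumes you can already start Python in some way, for example from the Anaconda Prompt on Windows; find_on_path is a helper name invented for this example.

```python
import shutil


def find_on_path(commands=("python", "conda")) -> dict:
    """Map each command name to the executable found on PATH, or None if missing."""
    return {name: shutil.which(name) for name in commands}


for name, location in find_on_path().items():
    # shutil.which returns None when the command cannot be found on PATH
    status = location if location else "NOT FOUND on PATH"
    print(f"{name}: {status}")
```

If either command is reported as not found, the folder containing its executable is probably the one still missing from PATH.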
You are now done with installing Anaconda/Python.
1.3 Install Quarto
In this course, we'll be using Python within Quarto Markdown files and/or heavily annotating Python notebooks with Markdown and converting them into Quarto Markdown files. We'll need to install Quarto before we install our IDE of choice (VSCode) to avoid any installation issues.
Install the version of Quarto recommended for your OS.
⚠️ In case you encounter issues with the latest version of Quarto (version 1.6.40 at the time of writing), try uninstalling it and installing an older version from this page (tab Older releases), e.g. version 1.4.557. Make sure, however, that you have a version above 1.4 installed: some of the features in the tutorial (e.g. inline Python code) won't work with earlier versions of Quarto!
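To check which Quarto version ended up installed, a small Python sketch along the following lines may help. It assumes that quarto --version prints a plain version string such as 1.6.40, and it treats "1.4 or later" as the threshold, which is our reading of the requirement above; parse_version, installed_quarto_version and MIN_VERSION are illustrative names, not part of Quarto.

```python
import shutil
import subprocess

MIN_VERSION = (1, 4)  # assumption: read "above 1.4" as "1.4 or later"


def parse_version(text: str) -> tuple:
    """Turn a version string such as '1.6.40' into a tuple of integers (1, 6, 40)."""
    return tuple(int(part) for part in text.strip().split(".") if part.isdigit())


def installed_quarto_version():
    """Return the installed Quarto version as a tuple, or None if quarto is not on PATH."""
    quarto = shutil.which("quarto")
    if quarto is None:
        return None
    result = subprocess.run([quarto, "--version"], capture_output=True, text=True)
    return parse_version(result.stdout)


version = installed_quarto_version()
if version is None:
    print("quarto was not found on PATH")
elif not version:
    print("could not read a version number from quarto's output")
elif version >= MIN_VERSION:
    print("Quarto version looks fine:", ".".join(map(str, version)))
else:
    print("Quarto is older than 1.4; consider installing a newer release")
```

Comparing tuples of integers rather than raw strings avoids mistakes such as treating 1.10 as older than 1.4.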
1.4 Install an IDE (VSCode)
Once you have installed Anaconda, you can start using Python on the Terminal/Command Line straight away. All you need to do is type python in your terminal/command line to start the Python interpreter (provided you've made sure to add Python and Anaconda to PATH during install). You can also write your Python scripts in any text editor and run them from the terminal with the python command. However, it is much more convenient to use an integrated development environment (IDE), which provides a graphical user interface (GUI) for working with Python.
In this course, we will use Visual Studio Code (VSCode) as an IDE.
VSCode (Visual Studio Code) is a source-code editor made by Microsoft for Windows, Linux and macOS. Its features include support for debugging, syntax highlighting, intelligent code completion, snippets, code refactoring, and embedded Git. VSCode users can customize the theme, keyboard shortcuts, and preferences of the editor to suit their needs, as well as install extensions that add functionality to the editor (e.g. extensions that add support for programming languages such as Python). We will primarily use VSCode to write Quarto files [1], but you can also use it (if you so wish) to test snippets of Python code demonstrated in class.
Follow these instructions to install VSCode depending on your OS.
Enable the Python and Jupyter extensions (see here for how). You can add as many extensions as you see fit.
Take some time to configure your workspace settings and get used to the interface.
Try out your new environment with this setup notebook (click the button below to download it). Make sure you have selected the correct Python kernel (Anaconda Python 3.12) before you start executing the code blocks one by one.
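One way to confirm which interpreter the selected kernel is actually using is to run a cell like the sketch below inside the notebook. kernel_summary is a hypothetical helper, and the looks_like_anaconda flag is only a heuristic (it searches for "conda" in the interpreter path and version banner), so treat its answer as a hint rather than a guarantee.

```python
import sys


def kernel_summary() -> dict:
    """Summarise the Python interpreter that is running this code."""
    executable = sys.executable or ""
    info = sys.version_info
    return {
        "executable": executable,  # full path of the running interpreter
        "version": f"{info.major}.{info.minor}.{info.micro}",
        # Heuristic only: Anaconda installs usually have "conda" in the path or banner
        "looks_like_anaconda": "conda" in executable.lower() or "conda" in sys.version.lower(),
    }


for key, value in kernel_summary().items():
    print(f"{key}: {value}")
```

If the version does not start with 3.12, or the executable path points somewhere other than your Anaconda folder, try selecting a different kernel in VSCode.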
Also try rendering the .ipynb file into HTML by opening the terminal and typing the command quarto render setup.ipynb --execute (make sure you are in the right directory, i.e. the folder where setup.ipynb is located, when executing this command!). For more details on how .ipynb files and Quarto work together, see here.
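If you would rather not worry about being in the right directory, the same render command can be launched from Python with the working directory set explicitly. The sketch below is one possible way to do this, assuming quarto is on PATH; build_render_command and render_notebook are names invented for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_render_command(notebook) -> list:
    """Build the render command from the guide, using only the notebook's file name."""
    return ["quarto", "render", Path(notebook).name, "--execute"]


def render_notebook(notebook) -> int:
    """Run the render command from the notebook's own folder and return the exit code."""
    path = Path(notebook).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No such notebook: {path}")
    quarto = shutil.which("quarto")
    if quarto is None:
        raise RuntimeError("quarto was not found on PATH")
    command = [quarto] + build_render_command(path)[1:]
    # cwd=path.parent mirrors "be in the same folder as the notebook"
    return subprocess.run(command, cwd=path.parent).returncode
```

Calling render_notebook("setup.ipynb") should then behave like running the command above from the notebook's folder; an exit code of 0 conventionally signals success.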
2. Learn Python
This course relies a lot on Python programming. You should practice your Python programming skills, especially before the second lab (Week 02) of this course. Here are several ways to do that.
2.1 Take the in-person pre-sessional workshops
Our colleagues at the LSE Digital Skills Lab have put together a great programme of workshops to help students prepare for courses that require Python programming. The courses are free and open to all LSE students and are perfect if you are new to programming in Python, or you need a refresher.
Link: Find out more about the LSE Digital Skills Lab
2.2 Take the self-paced Dataquest course
If you prefer to learn Python at your own pace, you can take the Dataquest course pre-selected by our colleagues at the Digital Skills Lab.
As a student at LSE, you can get a free Dataquest subscription, but you need to request a premium license first! To do so, please follow the instructions in the Getting access and using Dataquest section of the Moodle page for this course. Our colleagues at the Digital Skills Lab will respond to you as soon as possible.
Link: Python for Data Science Pre-sessional Course 24/25
2.3 Python books
Perhaps one of the best ways to learn Python on your own is to work through either Python for Data Analysis 3E by Wes McKinney or Python Data Science Handbook by Jake VanderPlas. Both books are available for free online.
More specifically, you can read Python for Data Analysis 3E in the following order:
Main Book
- Chapter 1: Preliminaries
- Chapter 2: Python Language Basics, IPython, and Jupyter Notebooks
- Chapter 3: Built-In Data Structures, Functions, and Files
- Chapter 4: NumPy Basics: Arrays and Vectorized Computation
- Chapter 5: Getting Started with pandas
- Chapter 6: Data Loading, Storage, and File Formats
- Chapter 7: Data Cleaning and Preparation
- Chapter 8: Data Wrangling: Join, Combine, and Reshape
- Chapter 9: Plotting and Visualization
- Chapter 10: Data Aggregation and Group Operations
- Chapter 11: Time Series
- Chapter 12: Introduction to Modeling Libraries in Python
- Chapter 13: Data Analysis Examples
Appendices
- A: Advanced NumPy
- B: More on the IPython System
As for the Python Data Science Handbook, you can read it in the following order:
Part 1: IPython: Beyond Normal Python
- Help and Documentation in IPython
- Keyboard Shortcuts in the IPython Shell
- IPython Magic Commands
- Input and Output History
- IPython and Shell Commands
- Errors and Debugging
- Profiling and Timing Code
- More IPython Resources
Part 2: Introduction to NumPy
- Understanding Data Types in Python
- The Basics of NumPy Arrays
- Computation on NumPy Arrays: Universal Functions
- Aggregations: Min, Max, and Everything In Between
- Computation on Arrays: Broadcasting
- Comparisons, Masks, and Boolean Logic
- Fancy Indexing
- Sorting Arrays
- Structured Data: NumPy's Structured Arrays
Part 3: Data Manipulation with Pandas
- Introducing Pandas Objects
- Data Indexing and Selection
- Operating on Data in Pandas
- Handling Missing Data
- Hierarchical Indexing
- Combining Datasets: Concat and Append
- Combining Datasets: Merge and Join
- Aggregation and Grouping
- Pivot Tables
- Vectorized String Operations
- Working with Time Series
- High-Performance Pandas: eval() and query()
- Further Resources
Part 4: Visualization with Matplotlib
- Simple Line Plots
- Simple Scatter Plots
- Visualizing Errors
- Density and Contour Plots
- Histograms, Binnings, and Density
- Customizing Plot Legends
- Customizing Colorbars
- Multiple Subplots
- Text and Annotation
- Customizing Ticks
- Customizing Matplotlib: Configurations and Stylesheets
- Three-Dimensional Plotting in Matplotlib
- Geographic Data with Basemap
- Visualization with Seaborn
- Further Resources
Part 5: Machine Learning
- What Is Machine Learning?
- Introducing Scikit-Learn
- Hyperparameters and Model Validation
- Feature Engineering
- In Depth: Naive Bayes Classification
- In Depth: Linear Regression
- In-Depth: Support Vector Machines
- In-Depth: Decision Trees and Random Forests
- In Depth: Principal Component Analysis
- In-Depth: Manifold Learning
- In Depth: k-Means Clustering
- In Depth: Gaussian Mixture Models
- In-Depth: Kernel Density Estimation
- Application: A Face Detection Pipeline
- Further Machine Learning Resources
Footnotes
1. In a bit of a meta touch, this guide was written in VSCode as a Quarto (i.e. .qmd) file!