DS205 – Advanced Data Manipulation
27 Jan 2025
10:00 – 10:03
This lecture covers the transition from data exploration to serving data insights via APIs, focusing on the practical application of FastAPI. You can read more about these core concepts in these two key sources:
This session introduces the core concepts of RESTful APIs using the FastAPI package, as well as the idea of data validation using Pydantic models. The lecture assumes you have a good understanding of the pandas library.
10:03 – 10:10
(Only if anyone has done the W01 Exercise or has questions about pandas)
10:10 – 10:15
API stands for Application Programming Interface. It is a set of rules that allows us to expose/serve data or services to other computers or digital applications.
Examples: Google Maps API, Twitter API, Spotify API.
To make this more concrete: think of an API as a waiter in a restaurant. The waiter takes your order, sends it to the kitchen, and then brings your food back to you. In this analogy, the waiter is the API, the kitchen is the server, and the food is the data you requested.
APIs provide seamless access to data. You don’t need to know how the data is stored or processed, or which table in which database it comes from.
We can set up APIs to communicate with each other without direct human intervention. This is particularly useful when you want to automate a process or make data available to a wide audience. Think: Twitter bots.
10:15 – 11:00
Here is what a typical API architecture looks like (at a conceptual level):
Let’s look at each of these components in more detail 👉🏻
Description
The Web Client is the interface that interacts with the API. You, as a user, can write code/click buttons to send requests and receive responses in structured formats.
Here’s an example of using Python as a client to fetch weather data from an API:
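As a minimal sketch of such a client, the code below uses only Python's standard library and targets the Open-Meteo forecast URL quoted later in this lecture. The function names `build_forecast_url` and `fetch_forecast` are illustrative, and nothing is assumed about the response beyond it being JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.open-meteo.com/v1/forecast"


def build_forecast_url(latitude: float, longitude: float, daily: str) -> str:
    """Append the query parameters to the base URL after a '?'."""
    params = {"latitude": latitude, "longitude": longitude, "daily": daily}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_forecast(latitude: float, longitude: float, daily: str) -> dict:
    """Send a GET request and decode the JSON body (needs network access)."""
    url = build_forecast_url(latitude, longitude, daily)
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Same coordinates as the example URL used in the query-parameter slides.
    print(build_forecast_url(51.51, -0.13, "temperature_2m_min"))
```

Calling `fetch_forecast(51.51, -0.13, "temperature_2m_min")` would send the request; the client only needs the URL and the parameter names, not any knowledge of how the server stores the data.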
Description
The Web Server receives the request from the client, routes it to the appropriate service, and prepares a response. It is the entry point to the API.
FastAPI is a modern, fast (high-performance) Python web framework for building APIs. It’s designed with Python type hints for simplicity and automatic validation.
Some highlights:
FastAPI is not a web server on its own. It’s a framework, a set of tools and coding patterns to build APIs.
👉🏻 To run FastAPI, you need a web server. The most common one is Uvicorn.
Here are three ways to test this app:
Open your browser and go to http://localhost:8000/ to see the response. You should see {"Hello": "World"} displayed on the page.
The same page is also reachable at http://127.0.0.1:8000/.
Via the command line using curl:
We could add more routes to our FastAPI app. Here’s how you can do it:
from fastapi import FastAPI

app = FastAPI()


@app.get("/")
async def read_root():
    return {"Hello": "World"}


@app.get("/alternative")
async def read_alternative():
    return {"Hello": "Alternative World"}
Which you would then access via http://localhost:8000/alternative.
Description
The Service layer processes the request and handles business logic. It communicates with the data layer to fetch or manipulate data.
If the Open-Meteo API were built using FastAPI, the server code might look like this:
from typing import Dict

from fastapi import FastAPI

app = FastAPI()


@app.get("/v1/forecast")
async def get_forecast(latitude: float, longitude: float, daily: str) -> Dict:
    #
    # Some code here to fetch the forecast data
    #
    list_of_min_temperatures = []  # placeholder so that the sketch runs
    return {
        "latitude": latitude,
        "longitude": longitude,
        "daily": {
            "temperature_2m_min": list_of_min_temperatures
        }
    }
Description
The Model layer defines the structure of the data being passed around. This ensures consistency and validation.
FastAPI relies on a library called Pydantic for defining models.
Examples:
Pydantic Model
Usage in FastAPI:
Description
The Data layer interacts with the database or file storage to read or write data.
Code Example: Querying Data from a Database
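As a minimal sketch of what the data layer might do, the code below uses Python's built-in `sqlite3` module with an in-memory database. The `forecasts` table, its columns and the helper `find_forecasts` are hypothetical, and the inserted row is a placeholder rather than real forecast data.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE forecasts ("
    "id INTEGER PRIMARY KEY, latitude REAL, longitude REAL, data TEXT)"
)
connection.execute(
    "INSERT INTO forecasts (latitude, longitude, data) VALUES (?, ?, ?)",
    (51.51, -0.13, '{"temperature_2m_min": []}'),
)


def find_forecasts(conn: sqlite3.Connection, latitude: float, longitude: float) -> list:
    """Return the stored rows for one location, using bound parameters."""
    cursor = conn.execute(
        "SELECT id, latitude, longitude, data FROM forecasts "
        "WHERE latitude = ? AND longitude = ?",
        (latitude, longitude),
    )
    return cursor.fetchall()


rows = find_forecasts(connection, 51.51, -0.13)
print(rows)
```

The `?` placeholders let the database driver handle quoting, which matters once the values come from API users rather than from your own code.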
But it could be as simple as reading a file. Say, an XLSX file:
The Database is the storage layer where the data resides. It could be a relational database, NoSQL database, or even flat files for simpler systems.
Sample SQLAlchemy model:
from sqlalchemy import Column, Float, Integer, String
# SQLAlchemy 1.4+; older releases import this from sqlalchemy.ext.declarative
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class ForecastModel(Base):
    __tablename__ = "forecasts"

    id = Column(Integer, primary_key=True, index=True)
    latitude = Column(Float, index=True)  # numeric, like the float query parameters
    longitude = Column(Float, index=True)
    data = Column(String)  # JSON representation of forecast data
We will keep things simple for now. We won’t use a database this week. We’ll focus on implementing the Service and Model layers in FastAPI. But this layer might be useful in a future assignment.
11:00 – 11:10
11:10 – 11:20
I want to use this time to distinguish two common ways of passing data to APIs: query parameters and path parameters.
Use these slides later as a reference.
Query parameters are key-value pairs which, from the client side, are appended to the URL after a ?
Example:
https://api.open-meteo.com/v1/forecast?latitude=51.51&longitude=-0.13&daily=temperature_2m_min
They are typically used to pass optional data to the server, although an API can also make them mandatory: in this example, latitude, longitude, and daily are all required.
Instead of ?key=value pairs, path parameters are part of the URL path itself. They are used to pass mandatory data to the server.
For example, we could conceive of a path parameter like this (I don’t think this exists on this particular API we’re using as an example):
https://MY-OWN-API/v1/forecast/UK/London
11:20 – 11:25
Resources are identified using URLs, with paths representing different endpoints.
Example:
/v1/forecast # this serves one type of data
/v1/historical # this serves another type of data
GET: Retrieve data (read-only).
POST: Create a new resource.
PUT: Update an existing resource.
DELETE: Remove a resource.
Here’s a different way to explore this.
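To see how the four methods differ from the client's point of view, the sketch below builds (but does not send) one request per method using the standard library. The URL is a hypothetical local endpoint and the payload is a placeholder.

```python
import json
from urllib.request import Request

# Hypothetical resource URL, used only to build (not send) the requests.
URL = "http://localhost:8000/v1/forecast"

payload = json.dumps({"latitude": 51.51, "longitude": -0.13}).encode("utf-8")
headers = {"Content-Type": "application/json"}

requests_by_method = {
    "GET": Request(URL, method="GET"),
    "POST": Request(URL, data=payload, headers=headers, method="POST"),
    "PUT": Request(URL, data=payload, headers=headers, method="PUT"),
    "DELETE": Request(URL, method="DELETE"),
}

for name, request in requests_by_method.items():
    has_body = request.data is not None
    print(f"{name:<6} {request.full_url}  body={has_body}")
```

The URL is the same in all four cases; only the method, and whether a body travels with the request, changes what the server is being asked to do.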
200 OK: Request successful.
404 Not Found: Resource does not exist.
500 Internal Server Error: Something went wrong on the server.
Read more about HTTP status codes here
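Python's standard library already knows these codes, which is handy when writing client code. The helper `describe` below is a hypothetical convenience function that groups codes by their leading digit.

```python
from http import HTTPStatus

for code in (200, 404, 500):
    status = HTTPStatus(code)
    print(status.value, status.phrase)


def describe(code: int) -> str:
    """Classify a status code by its leading digit."""
    if 200 <= code < 300:
        return "success"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"
```

As a rule of thumb, 4xx codes mean the request needs fixing, while 5xx codes mean the problem is on the server's side.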
11:25 – 12:00
OBJECTIVE: Design API endpoints for ASCOR assessments.
Which endpoints seem to make sense for the ASCOR dataset?
🗣️ CLASSROOM DISCUSSION (if time allows)
We will now interact with the following GitHub repository: lse-ds205/ascor-api.
/v1/country-data/{country}
Purpose: Retrieve country-level benchmark data for a specific year.
Compulsory:
country: Name or ISO code of the country (path parameter).
year: Year for benchmark data (query parameter).
Optional:
If none are passed, the API should return all available data for the country.
pillar: Select only data from a specific pillar.
area: Select only data from a specific area.
indicators: Select only data from a specific set of indicators.
metric: Select only data from a specific metric.
{
    "country": "United Kingdom",
    "assessment_year": 2023,
    ..., // Other metadata
    "data": {
        "EP.1": {
            "assessment": "Partial",
            "indicators": {
                "EP1.a": "Yes",
                "EP1.b": "No",
                "EP1.c": "No"
            }
        },
        "EP.2": {
            "assessment": "Partial",
            "indicators": {
                "EP2.a": "Yes",
                "EP2.b": "No",
                "EP2.c": "No",
                "EP2.d": "No"
            },
            "metrics": {
                "EP.2.a.i": "-25%",
                "EP2.b.i": "No or unsuitable disclosure",
                "EP2.c.1": "62%",
                "EP2.d.i": "822%"
            }
        },
        ...
    }
}
💻 W02 Lab Session (Tuesday, 28 January 2025):
Create a FastAPI app skeleton.
Implement the /v1/country-data/{country} endpoint with query and path parameters.
Test the API using curl and Python.
✍🏻 W02 Formative Exercise
Practice GitFlow by creating a new branch for your API project.
Implement the logic for a separate endpoint /v1/indicator-data/.
LSE DS205 (2024/25)