💻 Week 05 - Lab Roadmap (90 min)
DS105 - Data for Data Science
Last week we introduced the basic web-scraping tools. Now it's time to explore collecting data from the web through APIs. API stands for Application Programming Interface: a defined set of rules that lets different pieces of software send requests to, and receive responses from, each other. Today we will see how some websites use APIs to share their data.
In the first part of the lab, we will acquire data from the Ticketmaster API. After that, we will convert the response into a DataFrame and save it.
Part 1: Buying tickets (45 min)
After last week, you might think: “Well, yeah, this is useful for my academic endeavours, but not for my day-to-day life…” Let us show you how you can use these skills to solve your own problems!
It turns out that Ticketmaster (one of the biggest websites that sells tickets) has its own API. This means you can automate your ticket search if you want to! Let's explore this together.
We do not give you the code straight away; however, you will find the solutions below.
As before, we will first complete some tasks together to understand how APIs work, and then you will work independently.
🤝 WORKING TOGETHER
- Go to https://developer.ticketmaster.com/ and register an account. You will only need an email address.
- Acquire an API token using your new account. It is listed as the Consumer Key [1] in your apps' information.
- Make an API call to get all the venues in New York.
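One way to approach the New York venues call is sketched below. It assumes that the Discovery API's venue search lives at .../discovery/v2/venues.json and accepts countryCode and stateCode; because stateCode "NY" covers the whole state, the second helper keeps only venues whose city name is New York. The endpoint, the parameter names and the response layout (_embedded, then venues, then city, then name) are our assumptions, so check them against the documentation; venue_params and venues_in_city are hypothetical helper names.

```python
# Hypothetical helpers: the endpoint, parameter names and response layout are
# assumptions to be checked against the Discovery API documentation.
VENUES_URL = "https://app.ticketmaster.com/discovery/v2/venues.json"


def venue_params(api_key: str) -> dict:
    """Query parameters asking for venues in New York State, USA."""
    return {"countryCode": "US", "stateCode": "NY", "size": 200, "apikey": api_key}


def venues_in_city(resp_json: dict, city: str = "New York") -> list:
    """Return the names of venues whose city name matches `city`."""
    # If nothing matched, the response may have no "_embedded" key at all.
    venues = resp_json.get("_embedded", {}).get("venues", [])
    return [venue.get("name") for venue in venues
            if venue.get("city", {}).get("name") == city]
```

With requests, you would call requests.get(VENUES_URL, params=venue_params(api_key)) and pass response.json() to venues_in_city.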
And now, it's time to get tickets!
🎯 ACTION POINTS
- Using the documentation, find all the music events happening in London.
  - How many events have you found?
  - Do you think you got all the events happening in London?
- Is there a way to show more events?
  - How many events have you found now?
- Let's narrow our search a bit. Imagine you are coming back from holiday on 15 October. Can you find rock music events in London happening after that date?
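For the "how many events" questions, it helps to compare the number of events on the page you received with the total the API reports for your search. The sketch below assumes the parsed response (resp_json in the solution code) contains a page object with a totalElements field next to _embedded; count_events is a hypothetical helper name.

```python
def count_events(resp_json: dict) -> tuple:
    """Return (events on this page, total events the API reports for the search)."""
    # "_embedded" may be missing when a search returns nothing, so default to empty.
    on_page = len(resp_json.get("_embedded", {}).get("events", []))
    total = resp_json.get("page", {}).get("totalElements", 0)
    return on_page, total
```

If the first number is much smaller than the second, you only received one page of results, which is a hint for the question about showing more events.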
OPTIONAL TASKS
- What if you wanted an event with tickets costing less than £30? Can you find one for yourself?
- Go ahead and try to find London events related to data and data science. Are there any? If not, widen your search beyond London and try again.
- What about family-friendly events in London? Are there any?
Solution Code
Python users
All the tasks above are solved with the same API URL. Below, we show the base code once and then mainly the query parameters that produce the desired results.
Task 1
# import the required libraries
import requests
# saving your API key
api_key = "YOUR_API_KEY"
# setting up the API query parameters (they will be changing)
params = {"classificationName": "music",
          "countryCode": "GB",
          "city": "London",
          "apikey": api_key}
# sending a request to the API
response = requests.get("https://app.ticketmaster.com/discovery/v2/events.json",
                        params=params)
# parse the response
resp_json = response.json()
# extract the events
resp_json["_embedded"]["events"]
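If a search matches nothing, the response may not include an _embedded key, in which case the last line of Task 1 raises a KeyError. A slightly more defensive way to inspect the results is sketched below; it assumes each event has a name and a dates, start, localDate chain of fields, and event_summaries is a hypothetical helper name.

```python
def event_summaries(resp_json: dict) -> list:
    """Return (name, local start date) pairs; an empty list if no events came back."""
    events = resp_json.get("_embedded", {}).get("events", [])
    return [(event.get("name"),
             event.get("dates", {}).get("start", {}).get("localDate"))
            for event in events]
```

For example, looping with for name, date in event_summaries(resp_json): print(date, name) gives a quick overview of what the search returned.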
From here on, we only provide the query parameters.
Task 1 (extending search)
# setting up the API query parameters (they will be changing)
params = {"classificationName": "music",
          "countryCode": "GB",
          "city": "London",
          "size": 200,  # feel free to change this number
          "page": 1,  # pages are numbered from 0, so this is the second page; change it to get more results
          "apikey": api_key}
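Changing page by hand works, but you can also loop over pages until the API reports that none are left. The sketch assumes each response carries a page object with a totalPages field and that page numbers start at 0. fetch_page is a hypothetical function you supply (for example, one wrapping requests.get), which keeps the loop testable without network access; max_pages is a safety cap, since we believe the API also limits how deep you can page.

```python
from typing import Callable


def collect_all_events(fetch_page: Callable[[int], dict], max_pages: int = 5) -> list:
    """Gather events page by page; fetch_page(n) must return the parsed JSON for page n."""
    all_events = []
    page_number = 0  # assumed: the first page is page 0
    while page_number < max_pages:
        resp_json = fetch_page(page_number)
        all_events.extend(resp_json.get("_embedded", {}).get("events", []))
        total_pages = resp_json.get("page", {}).get("totalPages", 0)
        page_number += 1
        if page_number >= total_pages:  # no more pages to request
            break
    return all_events
```

With the Task 1 code, fetch_page could be lambda n: requests.get("https://app.ticketmaster.com/discovery/v2/events.json", params={**params, "page": n}).json().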
Task 2 and 3
# setting up the API query parameters (they will be changing)
params = {"classificationName": "music",
          "countryCode": "GB",
          "city": "London",
          "genre_name": "Rock",
          "startDateTime": "2022-10-15T00:00:00Z",
          "size": 200,  # feel free to change this number
          "page": 1,  # we add pages here to show that you can get more results if needed
          "apikey": api_key}
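The startDateTime value is a UTC timestamp string like the one above. Rather than typing it by hand, you can format it from a datetime, as in the sketch below. Both helper names are hypothetical, and the second function assumes each event has a dates, start, localDate chain holding a YYYY-MM-DD string, which lets you double-check the results yourself.

```python
from datetime import datetime, timezone


def to_ticketmaster_datetime(moment: datetime) -> str:
    """Format a datetime as a UTC 'YYYY-MM-DDTHH:MM:SSZ' string."""
    if moment.tzinfo is None:
        # Assumption: a naive datetime is already in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def starts_on_or_after(event: dict, date_str: str) -> bool:
    """Compare ISO dates as strings; 'YYYY-MM-DD' strings sort in date order."""
    local_date = event.get("dates", {}).get("start", {}).get("localDate")
    return local_date is not None and local_date >= date_str
```

For example, to_ticketmaster_datetime(datetime(2022, 10, 15)) returns "2022-10-15T00:00:00Z", the value used above.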
The event search does not have a price parameter, so for the price task you would need to filter the JSON response manually.
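A rough way to do that manual filtering is sketched below. It assumes that some events include a priceRanges list whose entries have min and currency fields; events without price information are simply skipped, so treat the result as a lower bound on what is available. events_under is a hypothetical helper name.

```python
def events_under(events: list, max_price: float, currency: str = "GBP") -> list:
    """Keep events with a listed minimum price below max_price in the given currency."""
    cheap = []
    for event in events:
        # Assumption: price information, when present, sits in event["priceRanges"].
        for price_range in event.get("priceRanges", []):
            if (price_range.get("currency") == currency
                    and price_range.get("min", float("inf")) < max_price):
                cheap.append(event)
                break  # one qualifying price range is enough
    return cheap
```

You would pass it the list extracted with resp_json["_embedded"]["events"], for example events_under(events, 30).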
Task 4
# setting up the API query parameters (they will be changing)
params = {"countryCode": "GB",
          "city": "London",
          "keyword": "data",
          "size": 200,  # feel free to change this number
          "page": 1,  # we add pages here to show that you can get more results if needed
          "apikey": api_key}
# or
params = {"keyword": "data",
          "size": 200,
          "page": 1,
          "apikey": api_key}
Task 5
# setting up the API query parameters (they will be changing)
params = {"countryCode": "GB",
          "city": "London",
          "includeFamily": "only",
          "size": 200,  # feel free to change this number
          "page": 1,  # we add pages here to show that you can get more results if needed
          "apikey": api_key}
R users
All the tasks above are solved with the same API URL. Below, we show the base code once and then mainly the query parameters that produce the desired results.
Task 1
# importing required packages
library("httr")
library("jsonlite")
# saving your API key
api_key <- "YOUR_API_KEY"
# setting up the base URL
base_url <- "https://app.ticketmaster.com/discovery/v2/events.json"
# sending a request
response <- GET(base_url, query = list("classificationName" = "music",
                                       "countryCode" = "GB",
                                       "city" = "London",
                                       "apikey" = api_key))
# parse the response
json <- content(response, "parsed")
From here on, we only provide the query parameters.
Task 1 (extending search)
# sending a request
response <- GET(base_url, query = list("classificationName" = "music",
                                       "countryCode" = "GB",
                                       "city" = "London",
                                       "size" = 200,
                                       "page" = 1,
                                       "apikey" = api_key))
Task 2 and 3
# sending a request
response <- GET(base_url, query = list("classificationName" = "music",
                                       "countryCode" = "GB",
                                       "city" = "London",
                                       "genre_name" = "Rock",
                                       "startDateTime" = "2022-10-15T00:00:00Z",
                                       "size" = 200, # feel free to change this number
                                       "page" = 1, # we add pages here to show that you can get more results if needed
                                       "apikey" = api_key))
The event search does not have a price parameter, so for the price task you would need to filter the JSON response manually.
Task 4
# sending a request
response <- GET(base_url, query = list("countryCode" = "GB",
                                       "city" = "London",
                                       "keyword" = "data",
                                       "size" = 200, # feel free to change this number
                                       "page" = 1, # we add pages here to show that you can get more results if needed
                                       "apikey" = api_key))
# or
response <- GET(base_url, query = list("keyword" = "data",
                                       "size" = 200, # feel free to change this number
                                       "page" = 1, # we add pages here to show that you can get more results if needed
                                       "apikey" = api_key))
Task 5
# sending a request
response <- GET(base_url, query = list("countryCode" = "GB",
                                       "city" = "London",
                                       "includeFamily" = "only",
                                       "size" = 200, # feel free to change this number
                                       "page" = 1, # we add pages here to show that you can get more results if needed
                                       "apikey" = api_key))
Part 2: Let's make it a table (25 min)
Using an API usually means working with JSON responses. However, nested JSON can be tricky to work with. Since we are used to working with tables, let's convert this response into one.
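A small illustration of why flattening helps, using two hand-made events shaped like a tiny part of a Ticketmaster event rather than real API output: pd.DataFrame() keeps a nested object as dictionaries inside a single column, whereas pd.json_normalize() spreads it into dotted column names such as sales.public.startDateTime.

```python
import pandas as pd

# Hand-made stand-in for resp_json["_embedded"]["events"]; real events have many more fields.
events = [
    {"name": "Concert A", "test": False,
     "sales": {"public": {"startDateTime": "2022-09-01T09:00:00Z"}}},
    {"name": "Concert B", "test": False,
     "sales": {"public": {"startDateTime": "2022-09-05T10:00:00Z"}}},
]

nested = pd.DataFrame(events)     # 'sales' stays one column holding dicts
flat = pd.json_normalize(events)  # nested keys become dotted column names

print(list(nested.columns))  # ['name', 'test', 'sales']
print(list(flat.columns))    # ['name', 'test', 'sales.public.startDateTime']
```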
🤝 WORKING TOGETHER
- Get an API response for all music events in London.
- Apply the pd.DataFrame() and pd.json_normalize() functions to the list of events extracted from the response and examine the difference.
- Print the first rows of the DataFrame.
- Examine the column names of the DataFrame.
- Search how to delete columns.
- Delete the column called "test".
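The inspection and deletion steps above can be sketched with standard pandas calls. The events list is hand-made stand-in data, not real API output; with the real response you would start from resp_json["_embedded"]["events"].

```python
import pandas as pd

# Hand-made stand-in for resp_json["_embedded"]["events"].
events = [
    {"name": "Concert A", "test": False,
     "sales": {"public": {"startDateTime": "2022-09-01T09:00:00Z"}}},
    {"name": "Concert B", "test": False,
     "sales": {"public": {"startDateTime": "2022-09-05T10:00:00Z"}}},
]

df = pd.json_normalize(events)
print(df.head())                # first rows (up to 5 by default)
print(df.columns.tolist())      # column names
df = df.drop(columns=["test"])  # drop returns a new DataFrame, so reassign it
```

Note that drop does not change df in place unless you reassign the result, which is a common source of confusion.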
Now it's time for you to work independently. We haven't taught you most of what you will be doing next, so you will have to search for solutions online. We do this on purpose, to give you a taste of what data scientists and programmers do best: googling.
🎯 ACTION POINTS
- Rename the column sales.public.startDateTime to sales_start.
- Select only the column called sales_start.
- Select two columns: sales_start and name.
- Print the 5th name from the column called name.
- Print the 5th observation from the 3rd column.
- Check which data types are present in the DataFrame.
- Check whether there is any missing data. If so, which columns contain it?
- Save the DataFrame to an Excel file.
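Once you have tried these yourself, here is one possible sketch using common pandas methods on hand-made stand-in data (six fake events, the last one deliberately missing its sales information so that a gap appears). save_to_excel is a hypothetical helper; it is not called here because writing .xlsx files needs an extra engine such as openpyxl.

```python
import pandas as pd

# Stand-in events; the last one has no "sales" key, so it produces a missing value.
events = [
    {"name": f"Event {i}", "test": False,
     "sales": {"public": {"startDateTime": f"2022-09-0{i}T09:00:00Z"}}}
    for i in range(1, 6)
]
events.append({"name": "Event 6", "test": False})

df = pd.json_normalize(events)
df = df.rename(columns={"sales.public.startDateTime": "sales_start"})

sales_start = df["sales_start"]         # one column (a Series)
two_cols = df[["sales_start", "name"]]  # two columns (a DataFrame)
fifth_name = df["name"].iloc[4]         # positions start at 0, so 4 is the 5th
fifth_of_third = df.iloc[4, 2]          # 5th row, 3rd column
print(df.dtypes)                        # data type of each column
missing = df.isna().sum()               # number of missing values per column
print(missing[missing > 0])             # only the columns that have gaps


def save_to_excel(frame: pd.DataFrame, path: str = "events.xlsx") -> None:
    """Write the DataFrame to an Excel file (requires an engine such as openpyxl)."""
    frame.to_excel(path, index=False)
```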
Part 3: Avengers, assemble (15 min)
This course involves a group project, which you will complete in groups of 3 people [2]. We have set aside time in this lab for you to find your groupmates. Let's do this now!
🎯 ACTION POINTS
I have already formed a team
- Great! You can start thinking about the next steps.
- Go to Moodle and download the {DS105M} ✍️ Formative Team Contract | W05-W07. This forms the basis of your project, and we will refer to it during the first group presentation in Week 08.
- Look at what you will need to discuss as a group to fill out the team contract. Then either decide when you will talk about it or, if you already have the answers, start filling out the contract.
- The team contract must be submitted by 9 November 2022 23:59 UK time via Moodle by one team member.
I have not formed a team
- Take a look at the ideas that others in your class group have posted on the #dataset-ideas-and-team-formation channel on Slack. Do you spot any ideas that match your interests? Reach out to the people who posted them and try to form a team.
- If you are unable to form a group, ask your class teacher for help. They might ask you to pitch your ideas to other peers who also do not have a group.
- Go to Moodle and download the {DS105M} ✍️ Formative Team Contract | W05-W07. This forms the basis of your project, and we will refer to it during the first group presentation in Week 08.
- Look at what you will need to discuss as a group to fill out the team contract, and decide when you want to talk about it.
- The team contract must be submitted by 9 November 2022 23:59 UK time via Moodle by one team member.
Footnotes
[1] This is related to the concept of SSH keys we talked about in Week 03.
[2] If necessary, we will try to be flexible and accommodate groups of 2 or 4 people. Ideally, however, all groups should have 3 members, and there shouldn't be more than 4 teams per class group; otherwise, you will not have much feedback time during presentations (Weeks 08 & 11).