DS105 Data for Data Science

🗓️ Week 01 - Part I: Structure of the course

Dr. Jon Cardoso Silva

LSE Data Science Institute

9/30/22

What we will cover today

  • Who we are
  • Learning Objectives
  • Structure of this course
  • How did we get here?
  • The possibilities
  • What’s next

Who we are

The Data Science Institute

  • This course is offered by the LSE Data Science Institute (DSI).
  • DSI is the hub for LSE’s interdisciplinary collaboration in data science.

Sign up for DSI events at lse.ac.uk/DSI/Events

The Data Science Institute

Activities of interest to you:

  • CIVICA Seminar Series
  • Careers in Data Science
  • Social events
  • Industry “field trips”
  • Summer projects

Our courses

The DSI offers accessible introductions to Data Science:

DS101

Fundamentals of
Data Science

🎯 Focus:
theoretical concepts of data science

📂 How:
reflections through reading and writing

DS105

Data for
Data Science

🎯 Focus:
collection and handling of real data

📂 How:
hands-on coding exercises and a group project

DS202

Data Science for
Social Scientists

🎯 Focus:
fundamental machine learning algorithms

📂 How:
practical use of ML techniques and metrics

Your lecturer



Dr. Jonathan Cardoso-Silva

  • PhD in Computer Science
  • Background: Computer Science, Engineering, Data Science
  • Research:
    • Networks
    • Optimisation
    • Machine Learning applications
    • Data Science Workflow

Teaching Assistants

Anton Boichenko
Guest Teacher at the DSI
Product Developer at Decoded
MSc in Applied Social Data Science (LSE)

Mustafa Can Ozkan
Guest Teacher at the DSI
PhD cand. in the Spacetime Lab (UCL)
MSc in Transport (Imperial/UCL)

Dr. Stuart Bramwell
ESRC Postdoctoral Fellow
Department of Methodology
PhD in Politics (Oxford)

Xiaowei Gao
Guest Teacher at the DSI
PhD cand. in the Spacetime Lab (UCL)
MSc in Data Science (KCL)

Yijun Wang
Guest Teacher at the DSI
PhD cand. in Health Informatics (KCL)
MSc in Data Science (KCL)

Who are you

Programme                                                    Freq
BSc in Economics                                               11
BSc in Politics and Data Science                                5
BSc in Politics and Economics                                   4
General Course                                                  4
BSc in Philosophy and Economics                                 2
BSc in International Social and Public Policy with Politics     1
BSc in Mathematics, Statistics and Business                     1
BSc in Philosophy, Logic and Scientific Method                  1
BSc in Philosophy, Politics and Economics                       1
BSc in Politics                                                 1

Source: LSE For You. Last Updated: 27 September 2022

Degree Programme vs Year of Study

BSc in Economics - Course Selection Options

Data extracted from the UBEC 2021/2022 degree regulation.

Add the overlaid graph, showing the actual paths these students took.

Learning Objectives

This course will cover the fundamentals of data, with the aim of understanding:

  • how data is generated,
  • how it is collected,
  • how it must be transformed for use and storage,
  • how it is stored, and
  • the ways it can be retrieved and communicated.

Learning Objectives (cont.)

The course will also cover:

  • workflow management of individual and collaborative data science projects
  • setup and tools for typical data pre-processing (data transformation and data cleaning)
    • frequently the starting point and most time-consuming part of any data science project.

Structure of this course

Syllabus

Intro
    Week 01 - Introduction and key tools for data scientists
Behind the scenes
    Week 02 - The Terminal: navigating the command line
    Week 03 - The Cloud: accessing and getting data in and out
    Week 04 - The Internet: protocols + scraping + APIs
Working with data
    Week 05 - The nature and shape of data
    Week 07 - Tabular data: dataframes and databases
    Week 08 - Unstructured data (text, audio & image)
    Week 09 - Text as data, regex and sentiment analysis
Applications
    Week 10 - Topic modelling & document similarities
    Week 11 - Data viz with the grammar of graphics

Structure of lectures 👨🏻‍🏫

Our lectures will be split into two parts:

  • Part I (~ 50 min): Traditional exposition of theoretical content
  • break (~ 10 min): Grab coffee ☕ or relax 🧘
  • Part II (~ 50 min): Live demo
    • Typically, demonstration of terminal usage or Jupyter notebooks
    • Feel free to follow along on your own laptop.

Structure of classes 👩‍💻

  • Students will work on weekly, structured problem sets in the staff-led class sessions.
  • Tips to get the most out of classes:
    • Bring your own laptop 💻 (most tablets are not suitable for programming)
    • Do the recommended reading before the class
    • Attempt to replicate the examples demonstrated in the live demo during the lecture

Class groups


Group 01

  • 📆 Fridays
  • ⌚ 09:00 - 10:30
  • 📍 32L.G.06

Group 02

  • 📆 Fridays
  • ⌚ 12:00 - 13:30
  • 📍 NAB.LG.03

Group 03

  • 📆 Fridays
  • ⌚ 16:00 - 17:30
  • 📍 KSW.1.02

🗺️ Check the LSE campus map

Assessments 📔

The breakdown of assessment for this course will be as follows:

Assessments - Problem sets (25%)

  • These will involve a mix of coding tasks and elements of self-assessment (similar to problem sets we will solve in the labs)
  • You will have until the day before the following class to submit your response
  • Summative problem sets will be released on:
    • Week 03 - worth 10% of final mark
    • Week 04 - worth 15% of final mark

Assessments - Group presentations (35%)

  • You will form groups prior to Reading Week
    • Pitch your API/dataset ideas in Week 04
    • Form the groups in Week 05
  • Group presentations:
    • Week 08 - worth 15% of final mark
    • Week 11 - worth 20% of final mark

Assessments - Final project (40%)

  • Each group will produce a webpage of their project
  • Description of data, research questions, challenges, statistics and simple plots
  • Think of it as a portfolio project!
  • Submission deadline: Lent Term
    • Exact date to be confirmed
    • (end of Jan/2023 - beginning of Feb/2023)
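
As a quick check on the weights listed above, here is a minimal sketch of how the components could combine into a final mark. The names `weights`, `final_mark` and `example_marks` are illustrative assumptions, as is the 0-100 marking scale; this is not an official marking formula.

```python
# Component weights (% of final mark), as listed on the assessment slides.
weights = {
    "Problem set (Week 03)": 10,
    "Problem set (Week 04)": 15,
    "Group presentation (Week 08)": 15,
    "Group presentation (Week 11)": 20,
    "Final project": 40,
}


def final_mark(marks):
    """Weighted average of component marks (assumed to be on a 0-100 scale)."""
    return sum(weights[name] * marks[name] for name in weights) / 100


# The five components should account for the whole mark.
total_weight = sum(weights.values())

# Hypothetical marks, purely for illustration.
example_marks = {name: 70 for name in weights}
print(total_weight, final_mark(example_marks))
```

Grouping by assessment type gives 25% for problem sets, 35% for group presentations and 40% for the final project.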

Office hours

  • It is probably a good idea to book office hours if:
    • you struggled with a technical or theoretical aspect of a problem set in the previous week,
    • you have queries about careers in data science,
    • you want guidance on how to apply data science to other things you are studying outside this course.
  • Come prepared. You only have 15 minutes.
  • Ask for help sooner rather than later.
  • Book slots via StudentHub up to 12 hours in advance.

Communication

  • Join our Slack group (more info here)
  • Use the public Slack channels to talk and to share links, content (or memes) with your colleagues.
  • Our teaching team will dedicate some time during the week to answer questions or other interactions on Slack.
  • Reserve 📧 e-mail for formal requests: extensions, deferrals, etc.
    • No need to e-mail us to say you will miss a class, for example.

Any questions?

Image created with the DALL·E algorithm using the prompt: ‘35mm macro photography of a robot holding a question mark card, white background’

How did we get here?

  • How did we get to the point where we can collect, extract and analyse all of this data?
  • This abundance of data is strongly associated with the dramatic changes in technology we have experienced recently.
  • Take a look at the technology people were using back in 2005.
  • This photo was taken outside St. Peter’s Basilica in the Vatican when Ratzinger was elected pope.
  • Now, fast forward to 2013, only 8 years later, when Pope Francis was elected, and you will see a lot of bright screens.
  • Our habits have changed.

This abundance of data is strongly associated with the dramatic changes in technology in the past few decades.

St. Peter’s Basilica at the Vatican on
📅 19 April 2005
when Ratzinger
was elected the 265th pope.

St. Peter’s Basilica at the Vatican on
📅 13 March 2013
when Pope Francis
was elected the 266th pope.

Source: (Kolawole 2013)

We changed how we consume music 🎧

To interact with this plot, check reference (Fischer-Baum 2017) at the end of this presentation.

We changed how we consume video 🎞️

To interact with this plot, check reference (Fischer-Baum 2017) at the end of this presentation.

Smartphones 📱 are a very recent thing

To interact with this plot, check reference (Fischer-Baum 2017) at the end of this presentation.

We spend a lot more time connected

… and our social media habits keep on changing

💡 Move the slider to explore the chart

The possibilities

  • Humans and machines nowadays generate A LOT of data ALL THE TIME
  • It has become cheap to collect and store this data
  • This abundance of data opens up new possibilities for research & policy-making

New data to answer old questions:

How do rumours spread?

New questions enabled by new data:

Is social media a threat to democracy?

What’s next

After our 10-min break ☕:

  • Given all this, what do we mean by data science?
  • A tale of unicorns
  • Approaching the ocean of data: the concept of data wrangling
  • The data science toolkit
  • What to expect of the rest of this course

References

Fischer-Baum, Reuben. 2017. “What ‘Tech World’ Did You Grow up In?” Washington Post. https://www.washingtonpost.com/graphics/2017/entertainment/tech-generations/.
Kolawole, Emi. 2013. “About Those 2005 and 2013 Photos of the Crowds in St. Peter’s Square.” Washington Post. http://wapo.st/WKKTMh.
