🗓️ Week 01
Structure of this course

DS101 – Fundamentals of Data Science

16 Jan 2023

Who are we

The Data Science Institute

Activities of interest to you:

Our courses

The DSI offers accessible introductions to Data Science:

DS101

Fundamentals of Data Science

🎯 Focus:
theoretical concepts of data science

📂 How:
reflections through reading and writing

DS105

Data for Data Scientists

🎯 Focus:
collection and handling of real data

📂 How:
hands-on coding exercises and a group project

DS202

Data Science for Social Scientists

🎯 Focus:
fundamental machine learning algorithms

📂 How:
practical use of ML techniques and metrics

Your lecturer

Dr. Jon Cardoso-Silva
Assist. Prof. Lecturer
LSE Data Science Institute

  • PhD in Computer Science (King’s College London)
  • Background: Engineering, Bio & Health Informatics
  • Former Lead Data Scientist

networks
optimisation
machine learning applications
data science workflow

Teaching Assistants

Dr. Stuart Bramwell
ESRC Postdoctoral Fellow
Department of Methodology


  • PhD in Politics (Oxford)
  • Background: Political Science
  • Founder of WhoGov data set
    • won the Lijphart/Przeworski/Verba data set award conferred annually by the American Political Science Association

social identity
democratisation
political elites
political economy

Who are you

Course | Year | Department
BSc in Philosophy, Politics and Economics | 3 | Philosophy
BSc in Sociology | 2 | Sociology
BSc in Economics | 1 | Economics
BSc in Philosophy and Economics | 3 | Philosophy
BSc in Philosophy, Logic and Scientific Method | 3 | Philosophy
BSc in Sociology | 2 | Sociology
BSc in Economics | 1 | Economics
General Course | 1 | General Course
BA in Geography | 3 | Geography
General Course | 1 | General Course
BSc in Economics and Economic History | 3 | Economics
BA in History | 1 | History
BSc in Politics and Economics | 3 | Government
BA in History | 3 | History

Department | n
Economics | 3
Philosophy | 3
General Course | 2
History | 2
Sociology | 2
Geography | 1
Government | 1
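
The per-department counts above can be derived from the class list with a few lines of code. The following is a minimal sketch, not necessarily how the slide was produced: the `students` list is copied from the class list above, and the helper name `count_by_department` is made up for this illustration.

```python
from collections import Counter

# (course, year, department) rows copied from the class list above.
students = [
    ("BSc in Philosophy, Politics and Economics", 3, "Philosophy"),
    ("BSc in Sociology", 2, "Sociology"),
    ("BSc in Economics", 1, "Economics"),
    ("BSc in Philosophy and Economics", 3, "Philosophy"),
    ("BSc in Philosophy, Logic and Scientific Method", 3, "Philosophy"),
    ("BSc in Sociology", 2, "Sociology"),
    ("BSc in Economics", 1, "Economics"),
    ("General Course", 1, "General Course"),
    ("BA in Geography", 3, "Geography"),
    ("General Course", 1, "General Course"),
    ("BSc in Economics and Economic History", 3, "Economics"),
    ("BA in History", 1, "History"),
    ("BSc in Politics and Economics", 3, "Government"),
    ("BA in History", 3, "History"),
]


def count_by_department(rows):
    """Return (department, n) pairs, largest group first."""
    counts = Counter(department for _course, _year, department in rows)
    return counts.most_common()


for department, n in count_by_department(students):
    print(f"{department}: {n}")
```

Departments with the same count may print in a different order from the slide, since the sketch only sorts by group size.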

How did we get here?

Today’s abundance of data is strongly associated with the dramatic changes in technology over the past few decades.

St. Peter’s Basilica at the Vatican on 📅 19 April 2005, when Ratzinger was elected the 265th pope.

St. Peter’s Basilica at the Vatican on 📅 13 March 2013, when Pope Francis was elected the 266th pope.

We changed how we consume music 🎧

We changed how we consume video 🎞️

Smartphones 📱 are a very recent thing

We spend a lot more time connected

… and our social media habits keep on changing

The possibilities

  • Humans and machines nowadays generate A LOT of data ALL THE TIME
  • It has become cheap to collect and store this data
  • This abundance of data opens up new possibilities for research & policy-making

New data to answer old questions:

How do rumours spread?

New questions enabled by new data:

Is social media a threat to democracy?

What do we mean by data science?

Data science is…

“[…] a field of study and practice that involves the collection, storage, and processing of data in order to derive important 💡 insights into a problem or a phenomenon.

Such data may be generated by humans (surveys, logs, etc.) or machines (weather data, road vision, etc.),

and could be in different formats (text, audio, video, augmented or virtual reality, etc.).”

The mythical unicorn 🦄

knows everything about statistics

is able to communicate insights perfectly

fully understands businesses like no one else

is a fluent computer programmer

In reality…

We are all jugglers 🤹

  • Everyone brings a different skill set.
  • We need multi-disciplinary teams.
  • Good data scientists know a bit of everything.
    • Not fluent in all things
    • They understand their strengths and weaknesses
    • They know when and where to interface with others

The Data Science Workflow

Start → Gather data → Store it somewhere → Clean & pre-process → Build a dataset → Exploratory data analysis → Machine learning → Obtain insights → Communicate results → End


It is often said that 80% of the time and effort spent on a data science project goes to the abovementioned tasks.


  • This is the bit that typically gets everyone excited about data science.
  • Machine Learning is a sub-field of Artificial Intelligence that focuses on developing algorithms that can learn from data.
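
To make "learning from data" concrete, here is a minimal sketch of one of the simplest learning algorithms, a least-squares straight-line fit, written in plain Python. The data points are made up for illustration and the function name `fit_line` is an assumption of this example, not something from the course materials.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x varies, and how much x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up toy data that follows y = 2x + 1 exactly.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)

# The "learned" parameters can now be used on an input the algorithm never saw.
prediction = slope * 5 + intercept
print(slope, intercept, prediction)
```

Nobody told the algorithm the rule behind the data; it recovered the slope and intercept from the examples alone, which is the sense in which such algorithms "learn".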

What do you think of this image?

An algorithm created this image!

  • Instead of a 🖌️ brush (physical or digital), the author typed a few words into an algorithm
  • It took him ⌛ 80 hours to find the “right words” to express his idea

  • Jason Allen used a generative Artificial Intelligence (AI) algorithm called ‘Midjourney’
    • ⏭️ Let’s look at a few other examples

What about ChatGPT?

(Unsolved) questions about the generative AI revolution

Who owns the rights to the images?


“a horde of octopuses protesting during the French revolution, photojournalism”

What biases do these tools propagate?



“a futuristic octopus-like alien holding a jar that contains the source of true happiness”

Are these tools safe?




“a futuristic octopus-like alien holding a jar that contains the source of true happiness”

Time for coffee ☕

After the break:

  • DS101L website
  • Syllabus
  • Assessments
  • Tools
  • This week’s coursework

References

Davenport, Thomas. 2020. “Beyond Unicorns: Educating, Classifying, and Certifying Business Data Scientists.” Harvard Data Science Review 2 (2). https://doi.org/10.1162/99608f92.55546b4a.
Fischer-Baum, Reuben. 2017. “What ‘Tech World’ Did You Grow up In?” Washington Post. https://www.washingtonpost.com/graphics/2017/entertainment/tech-generations/.
Kolawole, Emi. 2013. “About Those 2005 and 2013 Photos of the Crowds in St. Peter’s Square.” Washington Post. http://wapo.st/WKKTMh.
Sample, Ian. 2023. “ChatGPT: What Can the Extraordinary Artificial Intelligence Chatbot Do?” The Guardian, January. https://www.theguardian.com/technology/2023/jan/13/chatgpt-explainer-what-can-artificial-intelligence-chatbot-do-ai.
Schutt, Rachel, and Cathy O’Neil. 2013. Doing Data Science. First edition. Beijing ; Sebastopol: O’Reilly Media. https://ebookcentral.proquest.com/lib/londonschoolecons/detail.action?docID=1465965.
Shah, Chirag. 2020. A Hands-on Introduction to Data Science. Cambridge, United Kingdom ; New York, NY, USA: Cambridge University Press. https://librarysearch.lse.ac.uk/permalink/f/1n2k4al/TN_cdi_askewsholts_vlebooks_9781108673907.
Weale, Sally. 2023. “Lecturers Urged to Review Assessments in UK Amid Concerns over New AI Tool.” The Guardian, January. https://www.theguardian.com/technology/2023/jan/13/end-of-the-essay-uk-lecturers-assessments-chatgpt-concerns-ai.