🗓️ Week 01
Structure of this course

DS101 – Fundamentals of Data Science

Dr. Ghita Berrada

LSE Data Science Institute

15 Jan 2024

What we will cover today:

  • Who we are
  • How did we get here?
  • What do we mean by data science?
  • The Data Science Workflow
  • (Unsolved) questions about the (generative) AI revolution

Who we are

The Data Science Institute

  • This course is offered by the LSE Data Science Institute (DSI).
  • DSI is the hub for LSE’s interdisciplinary collaboration in data science

Activities of interest to you:

  • CIVICA Seminar Series
  • Careers in Data Science
  • Social events
  • Industry “field trips” (next up: Ekimetrics on February 6th and Lloyds on February 20th)
  • Summer projects

Sign up for DSI events at lse.ac.uk/DSI/Events

Other resources of interest to you

  • LSE LIFE
  • Alan Turing events (e.g. seminars and lectures)

Our courses

DSI offers accessible introductions to Data Science:

DS101: Fundamentals of Data Science
🎯 Focus: theoretical concepts of data science
📂 How: reflections through reading and writing

DS105: Data for Data Scientists
🎯 Focus: collection and handling of real data
📂 How: hands-on coding exercises and a group project

DS202: Data Science for Social Scientists
🎯 Focus: fundamental machine learning algorithms
📂 How: practical use of ML techniques and metrics

Your lecturer

Dr. Ghita Berrada
Assist. Prof. (Education)
LSE Data Science Institute

  • PhD in Computer Science (University of Twente, Netherlands)
  • Background: Engineering, Databases, Health Informatics, ML for cybersecurity
  • Formerly Research Associate at King’s College London and the University of Edinburgh (School of Informatics)

decision support systems
machine learning applications
databases
provenance
ethical AI/XAI

Teaching Assistants

Riya Chhikara
Data Science Trainer
LSE Digital Skills Lab


  • MSc Applied Social Data Science
  • Qualified Assistant Professor in Sociology (MA, Jawaharlal Nehru University, New Delhi)
  • Background: Statistics in social sciences, Data Analytics, Literature, Social Systems

data in public policy
machine learning
cloud computing
quantitative research
causal inference

Who you are

Programme                                                                Freq
General Course                                                              5
BSc in Economics                                                            1
BSc in Finance                                                              1
BSc in International Relations and History                                  1
Exchange Programme for Students from SGH Warsaw School of Economics         1

Source: LSE For You. Last Updated: 15 January 2024

How did we get here?

  • How did we get to the point where we can collect, extract and analyse all of this data?
  • Technology has changed dramatically over the decades, making it increasingly easy and cheap to capture and store data.
  • So, data has become more and more abundant.
  • Our habits have changed as technology has evolved.
  • At the same time, the value of all that data has become increasingly apparent (if only to gain more insights into our habits themselves!).

3.5” floppy disk

Cassette tape

VHS (Video Home System) tape

Cell phone circa 2000 (Nokia 3310)

The current abundance of data is strongly associated with the dramatic changes in technology in the past few decades.

Smartphones 📱 are a very recent thing

To interact with this plot, check reference (Fischer-Baum 2017) at the end of this presentation.

We changed how we consume music 🎧

To interact with this plot, check reference (Fischer-Baum 2017) at the end of this presentation.

We changed how we consume video 🎞️

To interact with this plot, check reference (Fischer-Baum 2017) at the end of this presentation.

We spend a lot more time connected

… and our social media habits keep on changing

The possibilities

  • Humans and machines nowadays generate A LOT of data ALL THE TIME
  • It has become very cheap to collect and store this data

Source: Roser, Ritchie, and Mathieu (2023)
  • This abundance of data opens up new possibilities for research & policy-making

New data to answer old questions:

  • How do rumours spread?
  • How can we predict unemployment rates accurately?

New questions enabled by new data/new technologies:

  • Is social media a threat to democracy?
  • Is generative AI a threat to the job market?

What do we mean by data science?

Data science is…

“[…] a field of study and practice that involves the collection, storage, and processing of data in order to derive important 💡insights into a problem or a phenomenon.

Such data may be generated by humans (surveys, logs, etc.) or machines (weather data, road vision, etc.),

and could be in different formats (text, audio, video, augmented or virtual reality, etc.).”

(Shah 2020, chap. 1). Emphasis and emojis are my own.

The mythical unicorn 🦄

knows everything about statistics

is a fluent computer programmer

understands business like no one else

is able to communicate insights perfectly

Of course, such a person does not exist!

See (Davenport 2020) for a more in-depth discussion about this

In practice…

We are all jugglers 🤹

  • We need and work in multi-disciplinary teams
  • Everyone brings distinct skill sets
  • Good data scientists know a bit of everything.
    • Not fluent in all things
    • Understand their strengths and, more crucially, their weaknesses/shortcomings
    • They know when and where to interface with others and ask for support

See (Schutt and O’Neil 2013, chap. 1) for more on this.

The Data Science Workflow

The Data Science Workflow

Start → Gather data → Store it somewhere → Clean & pre-process → Build a dataset → Exploratory data analysis → Machine learning → Obtain insights → Communicate results → End

⚠️ Note that this is a simplified version of what happens in a data science project.
In practice, the process is hardly linear, and many feedback loops exist.

It is often said that 80% of the time and effort spent on a data science project goes to the early stages of this workflow: gathering, storing, cleaning and pre-processing the data, and building a dataset.
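The workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records (hours studied and exam marks), the function names and the choice of a straight-line fit are all made up for this example and are not part of the course material.

```python
from statistics import mean

# Gather + store: hypothetical raw records, kept as text as they might arrive.
RAW_RECORDS = [
    {"hours": "2", "mark": "52"},
    {"hours": "4", "mark": "61"},
    {"hours": "", "mark": "70"},   # incomplete row, dropped during cleaning
    {"hours": "6", "mark": "70"},
    {"hours": "8", "mark": "79"},
]


def clean(records):
    """Clean & pre-process: drop incomplete rows and convert text to numbers."""
    return [
        (float(r["hours"]), float(r["mark"]))
        for r in records
        if r["hours"] and r["mark"]
    ]


def explore(dataset):
    """Exploratory data analysis: a few summary statistics."""
    hours, marks = zip(*dataset)
    return {"n": len(dataset), "mean_hours": mean(hours), "mean_mark": mean(marks)}


def fit_line(dataset):
    """Machine learning (in the loosest sense): least-squares line mark = a + b * hours."""
    hours, marks = zip(*dataset)
    h_bar, m_bar = mean(hours), mean(marks)
    slope = sum((h - h_bar) * (m - m_bar) for h, m in dataset) / sum(
        (h - h_bar) ** 2 for h in hours
    )
    intercept = m_bar - slope * h_bar
    return intercept, slope


dataset = clean(RAW_RECORDS)          # build a dataset
summary = explore(dataset)            # exploratory data analysis
intercept, slope = fit_line(dataset)  # machine learning

# Obtain insights + communicate results.
print(f"{summary['n']} usable rows; each extra hour adds about {slope:.1f} marks")
```

Even in this toy version, most of the code deals with getting the data into a usable shape rather than with the model itself.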

  • Machine learning is the bit that typically gets everyone excited about data science.
  • Machine Learning is a sub-field of Artificial Intelligence that focuses on developing algorithms that can learn from data.
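To make "learning from data" concrete, here is a minimal sketch of one of the simplest learning algorithms, a 1-nearest-neighbour classifier. The features (message length and number of exclamation marks), the labels and the numbers are hypothetical and chosen only for illustration.

```python
# Hypothetical labelled examples: (message length, exclamation marks) -> label.
TRAINING = [
    ((120, 0), "ham"),
    ((95, 1), "ham"),
    ((30, 5), "spam"),
    ((25, 7), "spam"),
]


def predict(features, examples=TRAINING):
    """Return the label of the closest training example (1-nearest neighbour)."""

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(examples, key=lambda ex: sq_dist(ex[0], features))
    return label


print(predict((28, 6)))   # short message with many exclamation marks
print(predict((110, 0)))  # long message with none
```

Nothing in `predict` is specific to spam: the behaviour comes entirely from the examples, so supplying different examples changes the predictions without changing the code.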

What do you think of this photograph?

Boris Eldagsen’s award-winning picture “Pseudomnesia: The Electrician” at the Sony World Photography Awards.

  • This photograph won first prize in the Creative category of the Sony World Photography Awards
  • What’s the link to Data Science, you say?

An algorithm created this photograph!

  • Instead of using a 📷, the author typed a few words (called a prompt) into an algorithm
  • It took him ⌛ some time and effort to find the “right words” to express his idea: the image was re-edited between 20 and 40 times.

  • Boris Eldagsen used a generative Artificial Intelligence (AI) algorithm called ‘DALL-E 2’

Source: Williams (2023)

Eldagsen’s position on generative AI:

  • he does not consider the process of building an AI image a dehumanised one or one where the human is sidelined:
    • “I don’t see it as a threat to creativity. For me, it really is setting me free. All the boundaries I had in the past – material boundaries, budgets – no longer matter. And for the first time in history, the older generation has an advantage, because AI is a knowledge accelerator. Two thirds of the prompts are only good if you have knowledge and skills, when you know how photography works, when you know art history. This is something that a 20-year-old can’t do.”
  • he instead worries about fake images and the threat to democracy:
    • “The threat is to democracy and to photojournalism; we have so many fake images, we need to come up with a way to show people what is what.”

Source: Grierson (2023)

Eldagsen’s position on generative AI:

  • he refused the photography award and says he “applied as a cheeky monkey” to find out if competitions would be prepared for AI images to enter. “They are not,” according to him.

  • “We, the photo world, need an open discussion. A discussion about what we want to consider photography and what not. Is the umbrella of photography large enough to invite AI images to enter – or would this be a mistake? With my refusal of the award I hope to speed up this debate.”

  • “AI images and photography should not compete with each other in an award like this. They are different entities. AI is not photography. Therefore I will not accept the award.”

  • ⏭️ Let’s look at a few other examples

Treading a fine line with AI…

Source: Yong (2018)

Source: Taylor (2023)

Source: Naughton (2023)

Source: Weale (2023)

Source: Sample (2023)

Source: Kelly (2023)

(Unsolved) questions about the (generative) AI revolution

Who owns the rights to the images?


“An impressionist revolutionary cat on a roof”

What biases do these tools propagate?



“A cat on the moon holding a box with the source of true wisdom and happiness; cubism style”

Do these tools impact on people’s choices, autonomy or human dignity?



“A dignified cat thinking hard about existential questions, Van Gogh painting”

Are these tools safe?




“A silly cat on the moon holding a box with the source of true wisdom and happiness; Dutch realist painting style”

Time for a break 🍵

After the break:

  • DS101W website
  • Syllabus
  • Assessments
  • Tools
  • Course rep
  • This week’s coursework

References

Davenport, Thomas. 2020. “Beyond Unicorns: Educating, Classifying, and Certifying Business Data Scientists.” Harvard Data Science Review 2 (2). https://doi.org/10.1162/99608f92.55546b4a.
Fischer-Baum, Reuben. 2017. “What ‘Tech World’ Did You Grow up In?” Washington Post. https://www.washingtonpost.com/graphics/2017/entertainment/tech-generations/.
Grierson, Jamie. 2023. “Photographer Admits Prize-Winning Image Was AI-Generated.” The Guardian. https://www.theguardian.com/technology/2023/apr/17/photographer-admits-prize-winning-image-was-ai-generated.
Kelly, Jack. 2023. “Goldman Sachs Predicts 300 Million Jobs Will Be Lost or Degraded by Artificial Intelligence.” Forbes. Accessed 16 Apr.
Naughton, John. 2023. “Can AI-Generated Art Be Copyrighted? A US Judge Says Not, but It’s Just a Matter of Time.” The Observer. https://www.theguardian.com/commentisfree/2023/aug/26/ai-generated-art-copyright-law-recent-entrance-paradise-creativity-machine.
Roser, Max, Hannah Ritchie, and Edouard Mathieu. 2023. “Technological Change.” Our World in Data.
Sample, Ian. 2023. “ChatGPT: What Can the Extraordinary Artificial Intelligence Chatbot Do?” The Guardian, January. https://www.theguardian.com/technology/2023/jan/13/chatgpt-explainer-what-can-artificial-intelligence-chatbot-do-ai.
Schutt, Rachel, and Cathy O’Neil. 2013. Doing Data Science. First edition. Beijing ; Sebastopol: O’Reilly Media. https://ebookcentral.proquest.com/lib/londonschoolecons/detail.action?docID=1465965.
Shah, Chirag. 2020. A Hands-on Introduction to Data Science. Cambridge, United Kingdom ; New York, NY, USA: Cambridge University Press. https://librarysearch.lse.ac.uk/permalink/f/1n2k4al/TN_cdi_askewsholts_vlebooks_9781108673907.
Taylor, Luke. 2023. “Colombian Judge Says He Used ChatGPT in Ruling.” The Guardian, February. https://www.theguardian.com/technology/2023/feb/03/colombia-judge-chatgpt-ruling.
Weale, Sally. 2023. “Lecturers Urged to Review Assessments in UK Amid Concerns over New AI Tool.” The Guardian, January. https://www.theguardian.com/technology/2023/jan/13/end-of-the-essay-uk-lecturers-assessments-chatgpt-concerns-ai.
Williams, Zoe. 2023. “Interview: ‘AI Isn’t a Threat’ – Boris Eldagsen, Whose Fake Photo Duped the Sony Judges, Hits Back.” The Guardian. https://www.theguardian.com/artanddesign/2023/apr/18/ai-threat-boris-eldagsen-fake-photo-duped-sony-judges-hits-back.
Yong, Ed. 2018. “A Popular Algorithm Is No Better at Predicting Crimes Than Random People.” The Atlantic. https://www.theatlantic.com/technology/archive/2018/01/equivant-compas-algorithm/550646/.

LSE DS101W (2023/24) – Week 01 | archive
