🗓️ Week 01
Structure of this course

DS101 – Fundamentals of Data Science

30 Sep 2024

Who we are

The Data Science Institute

  • This course is offered by the LSE Data Science Institute (DSI).
  • DSI is the hub for LSE’s interdisciplinary collaboration in data science.
  • ⏭️ Let’s see a few activities that might be of interest to you

CIVICA Seminar Series

Careers in Data Science

Hear from alumni or industry experts about their career paths and how they got to where they are today.

Last event:

🗓️ Keeping London Moving with Data (28 February, 4-5.30pm)

A talk about life in the data world at TfL. Jemima, Graduate Data Scientist at Transport for London (TfL), will talk about her experience as a Data Science Graduate in TfL’s inaugural programme. Lauren Sager Weinstein, Chief Data Officer at TfL, will talk about how she’s leading TfL’s data strategy, and how all the components of data careers (data scientists, data developers, data product managers, and data users) can come together to deliver on TfL’s data vision: to empower its people to make better decisions with data.

Industry “field trips”

Visit at Lloyds (2023)

Other resources of interest to you

Our courses

DSI offers accessible introductions to Data Science:

DS101

Fundamentals of Data Science

🎯 Focus:
theoretical concepts of data science

📂 How:
reflections through reading and writing

DS105

Data for Data Scientists

🎯 Focus:
collection and handling of real data

📂 How:
hands-on coding exercises and a group project

DS202

Data Science for Social Scientists

🎯 Focus:
fundamental machine learning algorithms

📂 How:
practical use of ML techniques and metrics

DS205

Advanced Data Manipulation

🎯 Focus:
more advanced data collection and manipulation techniques than in DS105

📂 How:

  • hands-on coding exercises and a group project
  • partnership with the Grantham Research Institute, whose real business needs are turned into case studies and assignments for the course

Your lecturer

Photo of Ghita Berrada

Dr. Ghita Berrada
Assist. Prof. (Education)
LSE Data Science Institute

  • PhD in Computer Science (University of Twente, Netherlands)
  • Background: Engineering, Databases, Health Informatics, ML for cybersecurity
  • Formerly Research Associate at King’s College London and the University of Edinburgh (School of Informatics)

decision support systems
machine learning applications
databases
provenance
ethical AI/XAI

Teaching Assistants

Photo of Stuart Bramwell
Dr Stuart Bramwell
Guest Lecturer at the Data Science Institute
📧 E-mail
guest teacher
  • DPhil in Politics (Oxford University)
  • Formerly postdoctoral researcher on the New News Project at Royal Holloway
  • Background: Political Science
  • Founder of WhoGov data set
    • won the Lijphart/Przeworski/Verba data set award conferred annually by the American Political Science Association

social identity
democratisation
political elites
political economy

Photo of Barry Ledeatte
Dr. Barry Ledeatte
AI Learning Consultant
📧 E-mail
guest teacher
  • PhD Human Vision (University of Nottingham)
  • Background: electronics, neural computation
  • Industrial experience in software consultancy, product management & technical training: mobile, OS
  • Data science: fraud analytics, fintech, marketing

data science
machine learning
psychology & physiology

Who you are

Programme | Freq
General Course | 12
BSc in International Social and Public Policy | 6
BSc in Economics | 3
BA in History | 2
BSc in Economic History | 2
BSc in Politics | 2
LLB in Laws | 2
BSc in History and Politics | 1
BSc in International Relations | 1
BSc in Philosophy, Logic and Scientific Method | 1
BSc in Psychological and Behavioural Science | 1
Erasmus Reciprocal Exchange Programme of Study | 1
Exchange Programme for Students from IE University Madrid | 1
Exchange Programme for Students from University of California, Santa Cruz | 1

How did we get here?

3.5” floppy disk

Cassette tape

VHS (Video Home System) tape

Cell phone circa 2000 (Nokia 3310)

The current abundance of data is strongly associated with the dramatic changes in technology in the past few decades.

Smartphones 📱 are a very recent thing

We changed how we consume music 🎧

We changed how we consume video 🎞️

We spend a lot more time connected

… and our social media habits keep on changing

The possibilities

  • Humans and machines nowadays generate A LOT of data ALL THE TIME
  • It has become very cheap to collect and store this data

Source: Roser, Ritchie, and Mathieu (2023)
  • This abundance of data opens up new possibilities for research & policy-making

New data to answer old questions:

  • How do rumours spread?
  • How can we predict unemployment rates accurately?

New questions enabled by new data/new technologies:

  • Is social media a threat to democracy/public order?
  • Is generative AI a threat to the job market?

What do we mean by data science?

Data science is…

“[…] a field of study and practice that involves the collection, storage, and processing of data in order to derive important 💡insights into a problem or a phenomenon.

Such data may be generated by humans (surveys, logs, etc.) or machines (weather data, road vision, etc.),

and could be in different formats (text, audio, video, augmented or virtual reality, etc.).”

The mythical unicorn 🦄

knows everything about statistics

is a fluent computer programmer

understands business like no one else

communicates insights perfectly

In practice…

We are all jugglers 🤹

  • We need and work in multi-disciplinary teams
  • Everyone brings distinct skill sets
  • Good data scientists know a bit of everything.
    • Not fluent in all things
    • Understand their strengths and, more crucially, their weaknesses/shortcomings
    • They know when and where to interface with others and ask for support

The Data Science Workflow

[Diagram: Start → Gather data → Store it somewhere → Clean & pre-process → Build a dataset → Exploratory data analysis → Machine learning → Obtain insights → Communicate results → End]

It is often said that 80% of the time and effort spent on a data science project goes to the steps that come before any modelling, such as gathering, storing and cleaning the data.
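To make the workflow steps more concrete, here is a minimal Python sketch of a toy pipeline. Everything in it is an assumption made for illustration: the temperature readings are made up, the function names (`gather`, `clean`, `build_dataset`, `explore`, `communicate`) are hypothetical, and the machine learning step is skipped entirely. It is not how any particular project is organised, only one way the steps could line up.

```python
import csv
import io
import statistics

# Made-up raw data standing in for the "gather" and "store" steps.
# One London reading is missing on purpose.
RAW_CSV = """city,temperature_c
London,14.5
London,
Paris,17.0
Paris,19.0
London,15.5
"""


def gather():
    """Gather/store: read the rows from the in-memory CSV text."""
    return list(csv.DictReader(io.StringIO(RAW_CSV)))


def clean(rows):
    """Clean & pre-process: drop rows with no temperature, convert to float."""
    return [
        {"city": row["city"], "temperature_c": float(row["temperature_c"])}
        for row in rows
        if row["temperature_c"].strip()
    ]


def build_dataset(rows):
    """Build a dataset: group the temperature readings by city."""
    dataset = {}
    for row in rows:
        dataset.setdefault(row["city"], []).append(row["temperature_c"])
    return dataset


def explore(dataset):
    """Exploratory data analysis: mean temperature per city."""
    return {city: statistics.mean(temps) for city, temps in dataset.items()}


def communicate(summary):
    """Communicate results: turn the numbers into readable sentences."""
    return [
        f"{city}: mean temperature {mean:.1f} degrees C"
        for city, mean in sorted(summary.items())
    ]


def run_workflow():
    """Chain the steps in the order shown in the workflow diagram."""
    return communicate(explore(build_dataset(clean(gather()))))


for line in run_workflow():
    print(line)
```

Even in this toy version, most of the code deals with reading, cleaning and reshaping the data rather than with analysing it.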

  • Machine learning is the bit that typically gets everyone excited about data science.
  • Machine Learning is a sub-field of Artificial Intelligence that focuses on developing algorithms that can learn from data.
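As a small illustration of what "learning from data" can mean, the sketch below fits a straight line to a handful of points using ordinary least squares and then uses it to predict a new value. The data (hours studied and scores) are invented, and the function names `fit_line` and `predict` are hypothetical. Real machine learning work would normally rely on an established library rather than hand-written code like this.

```python
def fit_line(xs, ys):
    """Learn a slope and intercept from example (x, y) pairs via least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalised).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Invented training data: hours studied and the score obtained.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

model = fit_line(hours, scores)
print("slope and intercept:", model)
print("predicted score for 5 hours:", predict(model, 5.0))
```

The algorithm is never told the rule behind the numbers. It estimates the rule from the examples, and that is the sense in which it "learns".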

What do you think of this photograph?

Boris Eldagsen’s award-winning picture “Pseudomnesia: The Electrician” at the Sony world photography awards (2023).

An algorithm created this photograph!

  • Instead of using a 📷, the author typed a few words (called a prompt) into an algorithm
  • It took him some time and effort to find the “right words” to express his idea: the image was re-edited between 20 and 40 times.

  • Boris Eldagsen used a generative Artificial Intelligence (AI) algorithm called ‘DALL-E 2’

Source: Williams (2023)

Eldagsen’s position on generative AI:

  • he does not consider the process of building an AI image a dehumanised one or one where the human is sidelined:
    • “I don’t see it as a threat to creativity. For me, it really is setting me free. All the boundaries I had in the past – material boundaries, budgets – no longer matter. And for the first time in history, the older generation has an advantage, because AI is a knowledge accelerator. Two thirds of the prompts are only good if you have knowledge and skills, when you know how photography works, when you know art history. This is something that a 20-year-old can’t do.”
  • he instead worries about fake images and the threat to democracy:
    • “The threat is to democracy and to photojournalism; we have so many fake images, we need to come up with a way to show people what is what.”

Source: Grierson (2023)

  • he refused the photography award and says he “applied as a cheeky monkey” to find out if competitions would be prepared for AI images to enter. “They are not,” according to him.

  • “We, the photo world, need an open discussion. A discussion about what we want to consider photography and what not. Is the umbrella of photography large enough to invite AI images to enter – or would this be a mistake? With my refusal of the award I hope to speed up this debate.”

  • “AI images and photography should not compete with each other in an award like this. They are different entities. AI is not photography. Therefore I will not accept the award.”

  • ⏭️ Let’s look at a few other examples

Treading a fine line with AI…

Source: Yong (2018)

Source: Taylor (2023)

Source: Naughton (2023)

Source: Makortoff (2024)

Source: Weale (2023)

Source: Adams (2024)

Source: Kelly (2023)

Source: Ungoed-Thomas and Abdulahi (2024)

(Unsolved) questions about the (generative) AI revolution

Who owns the rights to the media?


“An impressionist revolutionary cat on a roof”

What biases do these tools propagate?



“A cat on the moon holding a box with the source of true wisdom and happiness; cubism style”

Do these tools impact on people’s choices, autonomy or human dignity?



“A dignified cat thinking hard about existential questions, Van Gogh painting”

Are these tools safe?




“A silly cat on the moon holding a box with the source of true wisdom and happiness; Dutch realist painting style”

Time for a break 🍵

After the break:

  • DS101A website
  • Syllabus
  • Assessments
  • Tools
  • Course rep
  • This week’s coursework

References

Adams, Richard. 2024. “Researchers Fool University Markers with AI-Generated Exam Papers.” The Guardian, June. https://www.theguardian.com/education/article/2024/jun/26/researchers-fool-university-markers-with-ai-generated-exam-papers.
Davenport, Thomas. 2020. “Beyond Unicorns: Educating, Classifying, and Certifying Business Data Scientists.” Harvard Data Science Review 2 (2). https://doi.org/10.1162/99608f92.55546b4a.
Fischer-Baum, Reuben. 2017. “What ‘Tech World’ Did You Grow up In?” Washington Post. https://www.washingtonpost.com/graphics/2017/entertainment/tech-generations/.
Grierson, Jamie. 2023. “Photographer Admits Prize-Winning Image Was AI-Generated.” The Guardian. https://www.theguardian.com/technology/2023/apr/17/photographer-admits-prize-winning-image-was-ai-generated.
Kelly, Jack. 2023. “Goldman Sachs Predicts 300 Million Jobs Will Be Lost or Degraded by Artificial Intelligence.” Forbes. Accessed 16 Apr.
Makortoff, Kalyeena. 2024. “Warning Social Media Videos Could Be Exploited by Scammers to Clone Voices.” The Guardian (London). https://www.theguardian.com/money/2024/sep/18/warning-social-media-videos-exploited-scammers-clone-voices.
Naughton, John. 2023. “Can AI-Generated Art Be Copyrighted? A US Judge Says Not, but It’s Just a Matter of Time.” The Observer. https://www.theguardian.com/commentisfree/2023/aug/26/ai-generated-art-copyright-law-recent-entrance-paradise-creativity-machine.
Roser, Max, Hannah Ritchie, and Edouard Mathieu. 2023. “Technological Change.” Our World in Data.
Schutt, Rachel, and Cathy O’Neil. 2013. Doing Data Science. First edition. Beijing ; Sebastopol: O’Reilly Media. https://ebookcentral.proquest.com/lib/londonschoolecons/detail.action?docID=1465965.
Shah, Chirag. 2020. A Hands-on Introduction to Data Science. Cambridge, United Kingdom ; New York, NY, USA: Cambridge University Press. https://librarysearch.lse.ac.uk/permalink/f/1n2k4al/TN_cdi_askewsholts_vlebooks_9781108673907.
Taylor, Luke. 2023. “Colombian Judge Says He Used ChatGPT in Ruling.” The Guardian 2.
Ungoed-Thomas, Jon, and Yusra Abdulahi. 2024. “Warnings AI Tools Used by Government on UK Public Are ‘Racist and Biased’.” The Guardian (London). https://www.theguardian.com/technology/article/2024/aug/25/register-aims-to-quash-fears-over-racist-and-biased-ai-tools-used-on-uk-public.
Weale, Sally. 2023. “Lecturers Urged to Review Assessments in UK Amid Concerns over New AI Tool.” The Guardian, January. https://www.theguardian.com/technology/2023/jan/13/end-of-the-essay-uk-lecturers-assessments-chatgpt-concerns-ai.
Williams, Zoe. 2023. “Interview: ‘AI Isn’t a Threat’ – Boris Eldagsen, Whose Fake Photo Duped the Sony Judges, Hits Back.” The Guardian. https://www.theguardian.com/artanddesign/2023/apr/18/ai-threat-boris-eldagsen-fake-photo-duped-sony-judges-hits-back.
Yong, Ed. 2018. “A Popular Algorithm Is No Better at Predicting Crimes Than Random People.” The Atlantic. https://www.theatlantic.com/technology/archive/2018/01/equivant-compas-algorithm/550646/.