🗓️ Week 01
Structure of this course

DS101 – Fundamentals of Data Science

25 Sep 2023

Who we are

The Data Science Institute


Activities of interest to you:

Other resources of interest to you

Our courses

DSI offers accessible introductions to Data Science:

DS101

Fundamentals of
Data Science

🎯 Focus:
theoretical concepts of data science

📂 How:
reflections through reading and writing

DS105

Data for
Data Scientists

🎯 Focus:
collection and handling of real data

📂 How:
hands-on coding exercises and a group project

DS202

Data Science for
Social Scientists

🎯 Focus:
fundamental machine learning algorithms

📂 How:
practical use of ML techniques and metrics

Your lecturer


Dr. Ghita Berrada
Assist. Prof. (Education)
LSE Data Science Institute

  • PhD in Computer Science (University of Twente, Netherlands)
  • Background: Engineering, Databases, Health Informatics, ML for cybersecurity
  • Formerly Research Associate at King’s College London and the University of Edinburgh (School of Informatics)

decision support systems
machine learning applications
databases
provenance
ethical AI/XAI

Teaching Assistants


Garima Chaudhary
Clinical Data Engineer, 33N



  • MSc in Health Data Science (LSE)
  • Background: Mathematics, Finance and Health Informatics
  • Former Senior Data Science Engineer
  • Won the Brian Abel-Smith award at LSE for the highest aggregate score in the cohort

Mathematics
Health Informatics
Financial Data

Who you are

Programme | Number of students | Department
--- | --- | ---
General Course | 14 | General Course
BSc in Politics | 3 | Government
BA in History | 2 | International History
BSc in Economics | 2 | Economics
BSc in International Social and Public Policy | 2 | Social Policy
BSc in Politics and Economics | 2 | Government
Exchange Programme for Students from IE University Madrid | 2 |
BSc in International Social and Public Policy and Economics | 1 | Social Policy
BSc in International Social and Public Policy with Politics | 1 | Social Policy
BSc in Philosophy and Economics | 1 | Philosophy, Logic and Scientific Method
MSc in Global Media and Communications (LSE and USC) | 1 | Media and Communications

How did we get here?


3.5” floppy disk

Cassette tape

VHS (Video Home System) tape

Cell phone circa 2000 (Nokia 3310)

The current abundance of data is strongly associated with the dramatic changes in technology in the past few decades.

Smartphones 📱 are a very recent thing

We changed how we consume music 🎧

We changed how we consume video 🎞️

We spend a lot more time connected

… and our social media habits keep on changing


The possibilities

  • Humans and machines nowadays generate A LOT of data ALL THE TIME
  • It has become very cheap to collect and store this data

Source: Roser, Ritchie, and Mathieu (2023)
  • This abundance of data opens up new possibilities for research & policy-making

New data to answer old questions:

  • How do rumours spread?
  • How can we predict unemployment rates accurately?

New questions enabled by new data/new technologies:

  • Is social media a threat to democracy?
  • Is generative AI a threat to the job market?

What do we mean by data science?

Data science is…

“[…] a field of study and practice that involves the collection, storage, and processing of data in order to derive important 💡insights into a problem or a phenomenon.

Such data may be generated by humans (surveys, logs, etc.) or machines (weather data, road vision, etc.),

and could be in different formats (text, audio, video, augmented or virtual reality, etc.).”

The mythical unicorn 🦄

knows everything about statistics

is a fluent computer programmer

fully understands business like no one else

is able to communicate insights perfectly

In practice…

We are all jugglers 🤹

  • We need and work in multi-disciplinary teams
  • Everyone brings distinct skill sets
  • Good data scientists know a bit of everything.
    • They are not fluent in all things
    • They understand their strengths and, more crucially, their weaknesses/shortcomings
    • They know when and where to interface with others and ask for support

The Data Science Workflow


Start → Gather data → Store it somewhere → Clean & pre-process → Build a dataset → Exploratory data analysis → Machine learning → Obtain insights → Communicate results → End


It is often said that 80% of the time and effort spent on a data science project goes into preparing the data: gathering it, storing it, cleaning and pre-processing it, and building a dataset.


  • Machine learning is the stage that typically gets everyone excited about data science.
  • Machine Learning is a sub-field of Artificial Intelligence that focuses on developing algorithms that can learn from data.
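To make "learn from data" a little more concrete, here is a minimal sketch, assuming one of the simplest possible learning algorithms: fitting a straight line by ordinary least squares. The "learned" parameters (an intercept and a slope) are estimated from example data rather than written by hand, and are then reused to make a prediction. The function names `fit_line` and `predict` and the hours/score numbers are hypothetical, invented purely for illustration; they are not part of the course materials.

```python
# A minimal sketch of "learning from data": ordinary least squares for a straight line.
# The data below is hypothetical and made up purely for illustration.

def fit_line(xs, ys):
    """Learn intercept a and slope b of y ≈ a + b*x by minimising squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    b = cov_xy / var_x
    # The fitted line always passes through the point (mean_x, mean_y)
    a = mean_y - b * mean_x
    return a, b

def predict(a, b, x):
    """Apply the learned parameters to a new, unseen input."""
    return a + b * x

# Hypothetical toy data: hours studied vs. quiz score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

a, b = fit_line(hours, scores)
print(f"learned: score ≈ {a:.1f} + {b:.1f} * hours")       # learned: score ≈ 47.7 + 4.1 * hours
print(f"prediction for 6 hours: {predict(a, b, 6):.1f}")  # prediction for 6 hours: 72.3
```

In practice one would typically rely on a library such as scikit-learn rather than hand-written formulas; the point of the sketch is only that the model's parameters come from the data.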

What do you think of this photograph?

Boris Eldagsen’s award-winning picture “Pseudomnesia: The Electrician” at the Sony World Photography Awards.

An algorithm created this photograph!

  • Instead of using a 📷, the author typed a few words (called a prompt) into an algorithm
  • It took him some time and effort to find the “right words” to express his idea: the image was re-edited between 20 and 40 times.

  • Boris Eldagsen used a generative Artificial Intelligence (AI) algorithm called ‘DALL-E 2’

Source: Williams (2023)

Eldagsen’s position on generative AI:

  • he does not consider the process of building an AI image a dehumanised one or one where the human is sidelined:
    • “I don’t see it as a threat to creativity. For me, it really is setting me free. All the boundaries I had in the past – material boundaries, budgets – no longer matter. And for the first time in history, the older generation has an advantage, because AI is a knowledge accelerator. Two thirds of the prompts are only good if you have knowledge and skills, when you know how photography works, when you know art history. This is something that a 20-year-old can’t do.”
  • he instead worries about fake images and the threat to democracy:
    • “The threat is to democracy and to photojournalism; we have so many fake images, we need to come up with a way to show people what is what.”

Source: Grierson (2023)


  • he refused the photography award, saying he “applied as a cheeky monkey” to find out whether competitions would be prepared for AI images to enter. “They are not,” according to him.

  • “We, the photo world, need an open discussion. A discussion about what we want to consider photography and what not. Is the umbrella of photography large enough to invite AI images to enter – or would this be a mistake? With my refusal of the award I hope to speed up this debate.”

  • “AI images and photography should not compete with each other in an award like this. They are different entities. AI is not photography. Therefore I will not accept the award.”

  • ⏭️ Let’s look at a few other examples

Treading a fine line with AI…

Source: Yong (2018)

Source: Taylor (2023)


Source: Naughton (2023)


Source: Weale (2023)

Source: Sample (2023)

Source: Kelly (2023)

(Unsolved) questions about the (generative) AI revolution

Who owns the rights to the images?


“An impressionist revolutionary cat on a roof”

What biases do these tools propagate?



“A cat on the moon holding a box with the source of true wisdom and happiness; cubism style”

Do these tools impact on people’s choices, autonomy or human dignity?



“A dignified cat thinking hard about existential questions, Van Gogh painting”

Are these tools safe?




“A silly cat on the moon holding a box with the source of true wisdom and happiness; Dutch realist painting style”

Time for a break 🍵

After the break:

  • DS101A website
  • Syllabus
  • Assessments
  • Tools
  • Course rep
  • This week’s coursework

References

Davenport, Thomas. 2020. “Beyond Unicorns: Educating, Classifying, and Certifying Business Data Scientists.” Harvard Data Science Review 2 (2). https://doi.org/10.1162/99608f92.55546b4a.
Fischer-Baum, Reuben. 2017. “What ‘Tech World’ Did You Grow up In?” Washington Post. https://www.washingtonpost.com/graphics/2017/entertainment/tech-generations/.
Grierson, Jamie. 2023. “Photographer Admits Prize-Winning Image Was AI-Generated.” The Guardian. https://www.theguardian.com/technology/2023/apr/17/photographer-admits-prize-winning-image-was-ai-generated.
Kelly, Jack. 2023. “Goldman Sachs Predicts 300 Million Jobs Will Be Lost or Degraded by Artificial Intelligence.” Forbes. Accessed 16 Apr.
Naughton, John. 2023. “Can AI-Generated Art Be Copyrighted? A US Judge Says Not, but It’s Just a Matter of Time.” The Observer. https://www.theguardian.com/commentisfree/2023/aug/26/ai-generated-art-copyright-law-recent-entrance-paradise-creativity-machine.
Roser, Max, Hannah Ritchie, and Edouard Mathieu. 2023. “Technological Change.” Our World in Data.
Sample, Ian. 2023. “ChatGPT: What Can the Extraordinary Artificial Intelligence Chatbot Do?” The Guardian, January. https://www.theguardian.com/technology/2023/jan/13/chatgpt-explainer-what-can-artificial-intelligence-chatbot-do-ai.
Schutt, Rachel, and Cathy O’Neil. 2013. Doing Data Science. First edition. Beijing ; Sebastopol: O’Reilly Media. https://ebookcentral.proquest.com/lib/londonschoolecons/detail.action?docID=1465965.
Shah, Chirag. 2020. A Hands-on Introduction to Data Science. Cambridge, United Kingdom ; New York, NY, USA: Cambridge University Press. https://librarysearch.lse.ac.uk/permalink/f/1n2k4al/TN_cdi_askewsholts_vlebooks_9781108673907.
Taylor, Luke. 2023. “Colombian Judge Says He Used ChatGPT in Ruling.” The Guardian 2.
Weale, Sally. 2023. “Lecturers Urged to Review Assessments in UK Amid Concerns over New AI Tool.” The Guardian, January. https://www.theguardian.com/technology/2023/jan/13/end-of-the-essay-uk-lecturers-assessments-chatgpt-concerns-ai.
Williams, Zoe. 2023. “Interview: ‘AI Isn’t a Threat’ – Boris Eldagsen, Whose Fake Photo Duped the Sony Judges, Hits Back.” The Guardian. https://www.theguardian.com/artanddesign/2023/apr/18/ai-threat-boris-eldagsen-fake-photo-duped-sony-judges-hits-back.
Yong, Ed. 2018. “A Popular Algorithm Is No Better at Predicting Crimes Than Random People.” The Atlantic. https://www.theatlantic.com/technology/archive/2018/01/equivant-compas-algorithm/550646/.