LSE DS101A - Fundamentals of Data Science

2023/24 Autumn Term

Published: 10 October 2023


Intro
๐Ÿ—“๏ธ Week 01

25 Sep 2023-
29 Sep 2023
๐Ÿง‘โ€๐Ÿซ Lecture Introduction, Context & Key Concepts
๐Ÿ’ป Class Discussions: the boundaries of personal data
โœ๏ธ Coursework
  • What: Read the indicative reading articles and answer questions about them to prepare for the week 1 class discussion
  • Release date: Week 1 Lecture i.e 25 September 2023
  • When: Throughout the week
  • Deadline: 29 September 2023
๐Ÿ“– Readings Indicative Recommended Go deeper
Basic concepts from Computer Science and Statistics
๐Ÿ—“๏ธ Week 02

02 Oct 2023-
06 Oct 2023
๐Ÿง‘โ€๐Ÿซ Lecture Data types and the concept of tidy data
๐Ÿ’ป Class Live Demo: How data scientists use programming to preprocess data
โญ Formative
  • What: Create a tidy spreadsheet from Wikipedia data
  • Release date: Week 2 Lecture i.e 2 Oct 2023
  • When: Throughout the week
  • Deadline: 5 October 2023
๐Ÿ“– Readings Indicative Go deeper
  • ๐Ÿ•ธ๏ธ Online resource: Basic types in Python (Sturz 2023)
  • ๐Ÿ•ธ๏ธ Online course: Basic types in Python (Jones 2023)
  • ๐Ÿ•ธ๏ธ Online resource: Floating Point Arithmetic in Python: Issues and Limitations (Python documentation 2023)
  • ๐Ÿ•ธ๏ธ Online course: Introduction to Python (Real Python 2023)
๐Ÿ—“๏ธ Week 03

09 Oct 2023-
13 Oct 2023
๐Ÿง‘โ€๐Ÿซ Lecture Computational Thinking and Programming
๐Ÿ’ป Class Live Demo: How data scientists use programming to visualise data
🌟 Summative
  • What: Prepare your group presentation for Week 05
  • Worth: 10% of final marks
  • Release date: 6 October 2023
  • Deadline: 27 October 2023
📖 Readings Indicative Go deeper
🗓️ Week 04

16 Oct 2023-
20 Oct 2023
๐Ÿง‘โ€๐Ÿซ Lecture Statistical Inference I
💻 Class 🌟 Tutorial: Introduction to Zotero & Quarto Markdown
📖 Readings Indicative Recommended Go deeper
🗓️ Week 05

23 Oct 2023-
27 Oct 2023
๐Ÿง‘โ€๐Ÿซ Lecture Statistical Inference II
๐Ÿ’ป Class Group Presentations (worth 10% of final grade)
โญ Formative
  • What: Answer questions about the indicative readings
  • Release date: 23 October 2023
  • When: Throughout Weeks 05 & 06
  • Deadline: 15 November 2023
โœ๏ธ Coursework
  • What: Practice Zotero and Quarto Markdown
  • When: Throughout Weeks 05 & 06
  • Deadline: 10 November 2023
📖 Readings Indicative Recommended Go deeper
🗓️ Week 06

30 Oct 2023-
04 Nov 2023
Reading Week
Machine Learning & AI
๐Ÿ—“๏ธ Week 07

06 Nov 2023-
11 Nov 2023
๐Ÿง‘โ€๐Ÿซ Lecture Machine Learning I: Supervised Learning
๐Ÿ’ป Class Live Demo: Supervised Learning
🆘 Drop-in session: We will host a drop-in session in Week 07 to answer any questions you have about Quarto Markdown and Zotero
⭐ Formative
  • What: Start gathering academic papers in Zotero
  • When: Throughout the week
  • Deadline: 11 November 2023
📖 Readings Indicative Recommended Go deeper
🗓️ Week 08

13 Nov 2023-
17 Nov 2023
๐Ÿง‘โ€๐Ÿซ Lecture Machine Learning II: Unsupervised Learning
๐Ÿ’ป Class Peer-reviewing activity
(Details about the activity will be given in the Week 07 lecture)
⭐ Formative
  • What: Start writing your first (formative) essay using Quarto Markdown
  • Release date: 15 November 2023
  • Deadline: 30 November 2023
📖 Readings Indicative Recommended
🗓️ Week 09

20 Nov 2023-
24 Nov 2023
๐Ÿง‘โ€๐Ÿซ Lecture Unstructured Data (Text, Audio, Video)
๐Ÿ’ป Class In-class activity: exploring Machine Learning metrics (with a case study)
🌟 Summative
  • What: Start writing your first (summative) essay using Quarto Markdown
  • Worth: 30% of your final grade
  • Release date: 30 November 2023
  • Deadline: 20 December 2023
📖 Readings Indicative
Decisions and Implications
🗓️ Week 10

27 Nov 2023-
01 Dec 2023
๐Ÿง‘โ€๐Ÿซ Lecture Prediction vs. Explanation
๐Ÿ’ป Class Live Demo: Unsupervised Learning
🆘 Drop-in session: We will host a drop-in session in Week 11+1 to answer any questions you have about your summative essay
✍️ Coursework
  • What: Work in groups to find examples of data science/AI applications that raise ethical issues, and answer questions about them
  • Release date: 27 November 2023
  • Deadline: 4 December 2023
📖 Readings Indicative Go deeper
🗓️ Week 11

04 Dec 2023-
08 Dec 2023
๐Ÿง‘โ€๐Ÿซ Lecture Ethical issues of AI and ethical AI: an overview
๐Ÿ’ป Class Exploring Generative AI
Deadline approaching ⏲️
Keep working on your essays:
  • Attend drop-in sessions
  • Organise study groups
📖 Readings Indicative Recommended Go deeper
After the Term
๐Ÿ—“๏ธ Week 11+1 Deadline โŒ› Submit your essay by 20 December 2023
๐ŸŒŸ Summative
  • What: Start writing your second (summative) essay using Quarto markdown
  • Worth: 60% of your final grade
  • Release date: 20 December 2023
  • Deadline: 22 February 2024
Dec 2023-
Jan 2024
Winter break
Winter Term (Jan & Feb 2024)
๐Ÿ—“๏ธ Week 03 Deadline
Approaching โฒ๏ธ
Keep working on your essays:
  • Attend drop-in sessions
  • Organise study groups
๐Ÿ—“๏ธ Week 6 Deadline โŒ› Submit your essay by 22 February 2024
The End

References

Aschwanden, Christie. 2015. “Science Isn’t Broken.” FiveThirtyEight. https://fivethirtyeight.com/features/science-isnt-broken/.
Bakir, Vian. 2020. “Psychological Operations in Digital Political Campaigns: Assessing Cambridge Analytica’s Psychographic Profiling and Targeting.” Frontiers in Communication 5 (September): 67. https://doi.org/10.3389/fcomm.2020.00067.
Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–23. Virtual Event Canada: ACM. https://doi.org/10.1145/3442188.3445922.
Bossman, Julia. 2016. “Top 9 Ethical Issues in Artificial Intelligence. World Economic Forum.” October 21, 2016. https://www.weforum.org/agenda/2016/10/top-10-ethical-issues-in-artificial-intelligence/.
Bridle, James. 2023. “The Stupidity of AI.” The Guardian, March. https://www.theguardian.com/technology/2023/mar/16/the-stupidity-of-ai-artificial-intelligence-dall-e-chatgpt.
Broman, Karl W., and Kara H. Woo. 2018. “Data Organization in Spreadsheets.” The American Statistician 72 (1): 2–10. https://doi.org/10.1080/00031305.2017.1375989.
Bruce, Peter C., and Andrew Bruce. 2017. Practical Statistics for Data Scientists: 50 Essential Concepts. First edition. Sebastopol, CA: O’Reilly. https://ebookcentral.proquest.com/lib/londonschoolecons/detail.action?docID=4857224.
Cheng, Lu, Kush R. Varshney, and Huan Liu. 2021. “Socially Responsible AI Algorithms: Issues, Purposes, and Challenges.” J. Artif. Int. Res. 71 (September): 1137–81. https://doi.org/10.1613/jair.1.12814.
D’Ignazio, Catherine, and Lauren F. Klein. 2020. Data Feminism. Strong Ideas Series. Cambridge, Massachusetts: The MIT Press. https://ebookcentral.proquest.com/lib/londonschoolecons/reader.action?docID=6120950.
Denning, Peter J., and Matti Tedre. 2019. Computational Thinking. The MIT Press Essential Knowledge Series. Cambridge, Massachusetts: The MIT Press.
Enders, Craig K. 2022. Applied Missing Data Analysis. Guilford Publications.
Flach, Peter A. 2012. Machine Learning: The Art and Science of Algorithms That Make Sense of Data. Cambridge: Cambridge University Press. https://doi-org.gate3.library.lse.ac.uk/10.1017/CBO9780511973000.
Floridi, Luciano, Josh Cowls, Monica Beltrametti, Raja Chatila, Patrice Chazerand, Virginia Dignum, Christoph Luetge, et al. 2018. “AI4People—an Ethical Framework for a Good AI Society: Opportunities, Risks, Principles, and Recommendations.” Minds and Machines (Dordrecht) 28 (4): 689–707.
Gimlet. n.d. “#177 Gleeks and Gurgles Reply All.” Accessed January 15, 2023. https://gimletmedia.com:443/shows/reply-all/z3h78d6.
Gramegna, Alex, and Paolo Giudici. 2021. “SHAP and LIME: An Evaluation of Discriminative Power in Credit Risk.” Frontiers in Artificial Intelligence 4. https://doi.org/10.3389/frai.2021.752558.
Greshake, Kai, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. “Not What You’ve Signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection.” https://arxiv.org/abs/2302.12173.
Guyan, Kevin. 2022. Queer Data: Using Gender, Sex and Sexuality Data for Action. Bloomsbury Studies in Digital Cultures. London: Bloomsbury Academic. https://web-s-ebscohost-com.gate3.library.lse.ac.uk/ehost/detail/detail?nobk=y&vid=2&sid=a8efeedd-6bfc-459a-9f0c-a67dabcc75d1@redis&bdata=JnNpdGU9ZWhvc3QtbGl2ZQ==#AN=3077276&db=nlebk.
Hofman, Jake M., Amit Sharma, and Duncan J. Watts. 2017. “Prediction and Explanation in Social Systems.” Science 355 (6324): 486–88. https://doi.org/10.1126/science.aal3856.
Hofman, Jake M., Duncan J. Watts, Susan Athey, Filiz Garip, Thomas L. Griffiths, Jon Kleinberg, Helen Margetts, et al. 2021. “Integrating Explanation and Prediction in Computational Social Science.” Nature 595 (7866): 181–88. https://doi.org/10.1038/s41586-021-03659-0.
Hullman, Jessica, Sayash Kapoor, Priyanka Nanayakkara, Andrew Gelman, and Arvind Narayanan. 2022. “The Worst of Both Worlds: A Comparative Analysis of Errors in Learning from Data in Psychology and Machine Learning.” In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, 335–48. Oxford United Kingdom: ACM. https://doi.org/10.1145/3514094.3534196.
Illowsky, Barbara, and Susan L. Dean. 2013. Introductory Statistics. Houston, Texas: OpenStax College. https://openstax.org/details/books/introductory-statistics.
Isaak, Jim, and Mina J. Hanna. 2018. “User Data Privacy: Facebook, Cambridge Analytica, and Privacy Protection.” Computer 51 (8): 56–59. https://doi.org/10.1109/MC.2018.3191268.
Jones, Darren. 2023. “Basic Data Types in Python.” Real Python. https://realpython.com/courses/python-data-types/.
Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014. “The Parable of Google Flu: Traps in Big Data Analysis.” Science 343 (6176): 1203–5. https://doi.org/10.1126/science.1248506.
Li, Bo, Peng Qi, Bo Liu, Shuai Di, Jingen Liu, Jiquan Pei, Jinfeng Yi, and Bowen Zhou. 2023. “Trustworthy AI: From Principles to Practices.” ACM Comput. Surv. 55 (9). https://doi.org/10.1145/3555803.
Lupton, Deborah. 2016. “The Diverse Domains of Quantified Selves: Self-Tracking Modes and Dataveillance.” Economy and Society 45 (1): 101–22. https://doi.org/10.1080/03085147.2016.1143726.
Lupton, Deborah. 2020. “Data Mattering and Self-Tracking: What Can Personal Data Do?” Continuum 34 (1): 1–13. https://doi.org/10.1080/10304312.2019.1691149.
Mehrabi, Ninareh, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. 2021. “A Survey on Bias and Fairness in Machine Learning.” ACM Comput. Surv. 54 (6). https://doi.org/10.1145/3457607.
Nauta, Meike, Jan Trienes, Shreyasi Pathak, Elisa Nguyen, Michelle Peters, Yasmin Schmitt, Jörg Schlötterer, Maurice van Keulen, and Christin Seifert. 2023. “From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI.” ACM Comput. Surv. 55 (13s). https://doi.org/10.1145/3583558.
Parsons, Lian. 2020. “Ethical Concerns Mount as AI Takes Bigger Decision-Making Role. Harvard Gazette.” October 26, 2020. https://news.harvard.edu/gazette/story/2020/10/ethical-concerns-mount-as-ai-takes-bigger-decision-making-role/.
Perkel, Jeffrey M. 2022. “Six Tips for Better Spreadsheets.” Nature 608 (7921): 229–30. https://doi.org/10.1038/d41586-022-02076-1.
Perry, Neil, Megha Srivastava, Deepak Kumar, and Dan Boneh. 2022. “Do Users Write More Insecure Code with AI Assistants?” https://arxiv.org/abs/2211.03622.
Pessach, Dana, and Erez Shmueli. 2022. “A Review on Fairness in Machine Learning.” ACM Comput. Surv. 55 (3). https://doi.org/10.1145/3494672.
Pietsch, Wolfgang. 2022. On the Epistemology of Data Science: Conceptual Tools for a New Inductivism. Philosophical Studies Series, Volume 148. Cham: Springer.
Podoletz, Lena. 2022. “We Have to Talk about Emotional AI and Crime.” AI & SOCIETY, May. https://doi.org/10.1007/s00146-022-01435-w.
Prince, J. Dale. 2014. “The Quantified Self: Operationalizing the Quotidien.” Journal of Electronic Resources in Medical Libraries 11 (2): 91–99. https://doi.org/10.1080/15424065.2014.909145.
Python documentation. 2023. “Floating Point Arithmetic: Issues and Limitations.” Python Documentation. https://docs.python.org/3.10/tutorial/floatingpoint.html.
Rastogi, Charvi, Yunfeng Zhang, Dennis Wei, Kush R. Varshney, Amit Dhurandhar, and Richard Tomsett. 2022. “Deciding Fast and Slow: The Role of Cognitive Biases in AI-Assisted Decision-Making.” Proc. ACM Hum.-Comput. Interact. 6 (CSCW1). https://doi.org/10.1145/3512930.
Real Python. 2023. “Introduction to Python.” Real Python. https://realpython.com/learning-paths/python3-introduction/.
Rettberg, Jill Walker. 2022. “Algorithmic Failure as a Humanities Methodology: Machine Learning’s Mispredictions Identify Rich Cases for Qualitative Analysis.” Big Data & Society 9 (2): 205395172211312. https://doi.org/10.1177/20539517221131290.
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier.” https://arxiv.org/abs/1602.04938.
Sachs, Jeffrey, Rahshemah Wise, and Daniel Karell. 2021. “The TikTok Self: Music, Signaling, and Identity on Social Media.” Preprint. SocArXiv. https://doi.org/10.31235/osf.io/2rx46.
Scheffer, Judi. 2002. “Dealing with Missing Data.”
Schroeder, Stan. 2022. “TikTok’s in-App Browser Can Monitor Your Every Click and Keystroke.” Mashable. https://mashable.com/article/tiktok-browser-monitoring.
Schutt, Rachel, and Cathy O’Neil. 2013. Doing Data Science. First edition. Beijing ; Sebastopol: O’Reilly Media. https://ebookcentral.proquest.com/lib/londonschoolecons/detail.action?docID=1465965.
scikit-learn. 2023. “Scikit Learn User Guide: Clustering.” Scikit-Learn. https://scikit-learn.org/stable/modules/clustering.html.
Segura, Thomas L. 2023. “Yes, GitHub’s Copilot Can Leak (Real) Secrets. GitGuardian Blog - Automated Secrets Detection.” October 12, 2023. https://blog.gitguardian.com/yes-github-copilot-can-leak-secrets/.
Shafer, Douglas S., and Zhiyi Zhang. 2012. Introductory Statistics. Saylor Foundation. https://saylordotorg.github.io/text_introductory-statistics/.
Shah, Chirag. 2020. A Hands-on Introduction to Data Science. Cambridge, United Kingdom ; New York, NY, USA: Cambridge University Press. https://librarysearch.lse.ac.uk/permalink/f/1n2k4al/TN_cdi_askewsholts_vlebooks_9781108673907.
Sturz, John. 2023. “Basic Data Types in Python.” Real Python. https://realpython.com/python-data-types/.
Swan, Melanie. 2013. “The Quantified Self: Fundamental Disruption in Big Data Science and Biological Discovery.” Big Data 1 (2): 85–99. https://doi.org/10.1089/big.2012.0002.
Sweeney, Latanya. 2013. “Discrimination in Online Ad Delivery: Google Ads, Black Names and White Names, Racial Discrimination, and Click Advertising.” Queue 11 (3): 10–29. https://doi.org/10.1145/2460276.2460278.
Verhagen, Mark D. 2022. “A Pragmatist’s Guide to Using Prediction in the Social Sciences.” Socius: Sociological Research for a Dynamic World 8 (January): 237802312210817. https://doi.org/10.1177/23780231221081702.
Verma, Sahil, and Julia Rubin. 2018. “Fairness Definitions Explained.” In Proceedings of the International Workshop on Software Fairness, 1–7. FairWare ’18. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3194770.3194776.
Viswanathan, Giri. 2023. “ChatGPT Struggles to Answer Medical Questions, New Research Finds. CNN.” December 10, 2023. https://www.cnn.com/2023/12/10/health/chatgpt-medical-questions/index.html.
Warne, Russell T. 2021. Statistics for the Social Sciences: A General Linear Model Approach. Second edition. Cambridge, United Kingdom New York, NY Port Melbourne, Australia New Delhi, India Singapore: Cambridge University Press. https://doi.org/10.1017/9781108894319.
“What Is Quantified Self?” n.d. Quantified Self. Accessed January 15, 2023. https://quantifiedself.com/about/what-is-quantified-self/.
Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10). https://doi.org/10.18637/jss.v059.i10.
Wike, Richard, Laura Silver, Janell Fetterolf, Christine Huang, Sarah Austin, Laura Clancy, and Sneha Gubbala. 2022. “Social Media Seen as Mostly Good for Democracy Across Many Nations, but US Is a Major Outlier.” Pew Research Center’s Global Attitudes Project. Retrieved January 27, 2023.
Wong, Julia Carrie. 2019. “The Cambridge Analytica Scandal Changed the World – but It Didn’t Change Facebook.” The Guardian, March. https://www.theguardian.com/technology/2019/mar/17/the-cambridge-analytica-scandal-changed-the-world-but-it-didnt-change-facebook.
Zuboff, Shoshana. 2019. The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. First edition. New York: PublicAffairs. https://www.publicaffairsbooks.com/titles/shoshana-zuboff/the-age-of-surveillance-capitalism/9781610395694/.