LSE DS101L - Fundamentals of Data Science

2022/23 Lent Term

Published: 16 January 2023

Intro
๐Ÿ—“๏ธ Week 01 Lecture Introduction, Context & Key Concepts
Class No class this week.
(But there is a take-home formative assignment plus indicative readings)
Formative
  • What: Personal data
  • When: Throughout the week
  • Deadline: 23 January 2023
Readings: Indicative, Recommended, Go deeper
Basic concepts from Computer Science and Statistics
๐Ÿ—“๏ธ Week 02 Lecture Data types and the concept of tidy data
Class Discussions: the boundaries of personal data
Formative
  • What: Create a tidy spreadsheet
  • When: Throughout the week
  • Deadline: 30 January 2023
Readings: Indicative, Recommended
๐Ÿ—“๏ธ Week 03 Lecture Computational Thinking and Programming
Class Live Demo: How data scientists use programming to clean data
Formative
  • What: Create some charts!
  • When: Throughout the week
  • Deadline: 6 February 2023
Readings: Indicative, Recommended, Go deeper
  • 🏫 Online course: join the PS-R4DS 22/23 Moodle page and read the instructions carefully to obtain a premium licence for a Dataquest course on R.
๐Ÿ—“๏ธ Week 04 Lecture Statistical Inference I
Class Live Demo: How data scientists use programming for data visualisation
Summative ๐ŸŒŸ
  • Worth: 10% of final marks
  • Prepare for your group presentation next week
  • Release date: 6 February 2023
  • Deadline: 13 February 2023
Readings: Indicative, Recommended, Go deeper
๐Ÿ—“๏ธ Week 05 Lecture Statistical Inference II
Class ๐ŸŒŸ Group Presentations (worth 10% of final grade)
Formative
  • What: Answer questions about the indicative readings
  • When: Throughout Weeks 05 & 06
  • Deadline: 27 February 2023
Readings: Indicative, Recommended, Go deeper
🗓️ Week 06
Reading Week
Machine Learning & AI
๐Ÿ—“๏ธ Week 07 Lecture Machine Learning I: Supervised Learning
Class Live Demo: Supervised Learning
Formative
  • What: Start gathering academic papers in Zotero
  • When: Throughout the week
  • Deadline: 6 March 2023
Readings: Indicative, Recommended, Go deeper
๐Ÿ—“๏ธ Week 08 Lecture Machine Learning II: Unsupervised Learning
Class Tutorial: Introduction to Zotero & (& Quarto Markdown)
Formative โญ
  • What: Start writing your first (formative) essay using Quarto markdown
  • Release date: 6 March 2023
  • Deadline: 16 March 2023
Readings: Indicative, Recommended
๐Ÿ—“๏ธ Week 09 Lecture Unstructured Data (Text, Audio, Video)
Class In-class activity: exploring Machine Learning metrics (with a case study)
Readings Indicative Recommended Go deeper
Decisions and Implications
๐Ÿ—“๏ธ Week 10 Lecture Prediction vs. Explanation
Class Live Demo: Unsupervised Learning
Summative ๐ŸŒŸ
  • What: Start writing your first (summative) essay using Quarto markdown
  • Worth: 30% of your final grade
  • Release date: 20 March 2023
  • Deadline: 4 April 2023
Readings: Indicative, Go deeper
๐Ÿ—“๏ธ Week 11 Lecture Privacy and ethical concerns of the current wave of Generative AI (e.g., ChatGPT)
Class Exploring Generative AI
Deadline
Approaching โฒ๏ธ
Keep working on your essays:
  • Attend drop-in sessions
  • Organise study groups
Readings: 📰 Newspaper opinion piece (Bridle 2023)
After the Term
๐Ÿ—“๏ธ Week 11+1 Deadline โŒ› Submit your essay by 4 April 2023
April/2023 Spring Break
Summer Term (May & June 2023)
๐Ÿ—“๏ธ Week 02 Summative ๐ŸŒŸ
  • What: Start writing your second (summative) essay using Quarto markdown
  • Worth: 60% of your final grade
  • Release date: 9 May 2023
  • Deadline: 25 May 2023
๐Ÿ—“๏ธ Week 03 Deadline
Approaching โฒ๏ธ
Keep working on your essays:
  • Attend drop-in sessions
  • Organise study groups
๐Ÿ—“๏ธ Week 11+1 Deadline โŒ› Submit your essay by 25 May 2023
The End

References

Aschwanden, Christie. 2015. “Science Isn’t Broken.” FiveThirtyEight. https://fivethirtyeight.com/features/science-isnt-broken/.
Benoit, Kenneth. 2022. “Quantitative Analysis of Textual Data.” https://quanteda.io/index.html.
Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. 2018. “Quanteda: An R Package for the Quantitative Analysis of Textual Data.” Journal of Open Source Software 3 (30): 774. https://doi.org/10.21105/joss.00774.
Bridle, James. 2023. “The Stupidity of AI.” The Guardian, March. https://www.theguardian.com/technology/2023/mar/16/the-stupidity-of-ai-artificial-intelligence-dall-e-chatgpt.
Broman, Karl W., and Kara H. Woo. 2018. “Data Organization in Spreadsheets.” The American Statistician 72 (1): 2–10. https://doi.org/10.1080/00031305.2017.1375989.
Bruce, Peter C., and Andrew Bruce. 2017. Practical Statistics for Data Scientists: 50 Essential Concepts. First edition. Sebastopol, CA: O’Reilly. https://ebookcentral.proquest.com/lib/londonschoolecons/detail.action?docID=4857224.
D’Ignazio, Catherine, and Lauren F. Klein. 2020. Data Feminism. Strong Ideas Series. Cambridge, Massachusetts: The MIT Press. https://ebookcentral.proquest.com/lib/londonschoolecons/reader.action?docID=6120950.
Denning, Peter J., and Matti Tedre. 2019. Computational Thinking. The MIT Press Essential Knowledge Series. Cambridge, Massachusetts: The MIT Press.
Flach, Peter A. 2012. Machine Learning: The Art and Science of Algorithms That Make Sense of Data. Cambridge: Cambridge University Press. https://doi-org.gate3.library.lse.ac.uk/10.1017/CBO9780511973000.
Gimlet. n.d. “#177 Gleeks and Gurgles Reply All.” Accessed January 15, 2023. https://gimletmedia.com:443/shows/reply-all/z3h78d6.
Grolemund, Garrett. 2014. Hands-on Programming with R. First edition. Sebastopol, CA: O’Reilly. https://rstudio-education.github.io/hopr/index.html.
Guyan, Kevin. 2022. Queer Data: Using Gender, Sex and Sexuality Data for Action. Bloomsbury Studies in Digital Cultures. London: Bloomsbury Academic. https://web-s-ebscohost-com.gate3.library.lse.ac.uk/ehost/detail/detail?nobk=y&vid=2&sid=a8efeedd-6bfc-459a-9f0c-a67dabcc75d1@redis&bdata=JnNpdGU9ZWhvc3QtbGl2ZQ==#AN=3077276&db=nlebk.
Hofman, Jake M., Amit Sharma, and Duncan J. Watts. 2017. “Prediction and Explanation in Social Systems.” Science 355 (6324): 486–88. https://doi.org/10.1126/science.aal3856.
Hofman, Jake M., Duncan J. Watts, Susan Athey, Filiz Garip, Thomas L. Griffiths, Jon Kleinberg, Helen Margetts, et al. 2021. “Integrating Explanation and Prediction in Computational Social Science.” Nature 595 (7866): 181–88. https://doi.org/10.1038/s41586-021-03659-0.
Hullman, Jessica, Sayash Kapoor, Priyanka Nanayakkara, Andrew Gelman, and Arvind Narayanan. 2022. “The Worst of Both Worlds: A Comparative Analysis of Errors in Learning from Data in Psychology and Machine Learning.” In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, 335–48. Oxford, United Kingdom: ACM. https://doi.org/10.1145/3514094.3534196.
Ismay, Chester, and Albert Young-Sun Kim. 2020. Statistical Inference via Data Science: A ModernDive into R and the Tidyverse. Chapman & Hall/CRC The R Series. Boca Raton: CRC Press / Taylor & Francis Group. https://moderndive.com/.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning: With Applications in R. Second edition. Springer Texts in Statistics. New York, NY: Springer. https://www.statlearning.com/.
Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014. “The Parable of Google Flu: Traps in Big Data Analysis.” Science 343 (6176): 1203–5. https://doi.org/10.1126/science.1248506.
Lupton, Deborah. 2016. “The Diverse Domains of Quantified Selves: Self-Tracking Modes and Dataveillance.” Economy and Society 45 (1): 101–22. https://doi.org/10.1080/03085147.2016.1143726.
Lupton, Deborah. 2020. “Data Mattering and Self-Tracking: What Can Personal Data Do?” Continuum 34 (1): 1–13. https://doi.org/10.1080/10304312.2019.1691149.
Perkel, Jeffrey M. 2022. “Six Tips for Better Spreadsheets.” Nature 608 (7921): 229–30. https://doi.org/10.1038/d41586-022-02076-1.
Pietsch, Wolfgang. 2022. On the Epistemology of Data Science: Conceptual Tools for a New Inductivism. Philosophical Studies Series, Volume 148. Cham: Springer.
Prince, J. Dale. 2014. “The Quantified Self: Operationalizing the Quotidien.” Journal of Electronic Resources in Medical Libraries 11 (2): 91–99. https://doi.org/10.1080/15424065.2014.909145.
Rettberg, Jill Walker. 2022. “Algorithmic Failure as a Humanities Methodology: Machine Learning’s Mispredictions Identify Rich Cases for Qualitative Analysis.” Big Data & Society 9 (2): 205395172211312. https://doi.org/10.1177/20539517221131290.
Sachs, Jeffrey, Rahshemah Wise, and Daniel Karell. 2021. “The TikTok Self: Music, Signaling, and Identity on Social Media.” Preprint. SocArXiv. https://doi.org/10.31235/osf.io/2rx46.
Schroeder, Stan. 2022. “TikTok’s in-App Browser Can Monitor Your Every Click and Keystroke.” Mashable. https://mashable.com/article/tiktok-browser-monitoring.
Schutt, Rachel, and Cathy O’Neil. 2013. Doing Data Science. First edition. Beijing; Sebastopol: O’Reilly Media. https://ebookcentral.proquest.com/lib/londonschoolecons/detail.action?docID=1465965.
scikit-learn. 2023. “Scikit Learn User Guide: Clustering.” Scikit-Learn. https://scikit-learn.org/stable/modules/clustering.html.
Shah, Chirag. 2020. A Hands-on Introduction to Data Science. Cambridge, United Kingdom; New York, NY, USA: Cambridge University Press. https://librarysearch.lse.ac.uk/permalink/f/1n2k4al/TN_cdi_askewsholts_vlebooks_9781108673907.
Silge, Julia, and David Robinson. 2017. Text Mining with R: A Tidy Approach. First edition. Beijing; Boston: O’Reilly. https://www.tidytextmining.com/index.html.
Swan, Melanie. 2013. “The Quantified Self: Fundamental Disruption in Big Data Science and Biological Discovery.” Big Data 1 (2): 85–99. https://doi.org/10.1089/big.2012.0002.
Verhagen, Mark D. 2022. “A Pragmatist’s Guide to Using Prediction in the Social Sciences.” Socius: Sociological Research for a Dynamic World 8 (January): 237802312210817. https://doi.org/10.1177/23780231221081702.
Warne, Russell T. 2021. Statistics for the Social Sciences: A General Linear Model Approach. Second edition. Cambridge, United Kingdom; New York, NY; Port Melbourne, Australia; New Delhi, India; Singapore: Cambridge University Press. https://doi.org/10.1017/9781108894319.
“What Is Quantified Self?” n.d. Quantified Self. Accessed January 15, 2023. https://quantifiedself.com/about/what-is-quantified-self/.
Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10). https://doi.org/10.18637/jss.v059.i10.
Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. First edition. Sebastopol, CA: O’Reilly. https://r4ds.had.co.nz/.
Zuboff, Shoshana. 2019. The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. First edition. New York: PublicAffairs. https://www.publicaffairsbooks.com/titles/shoshana-zuboff/the-age-of-surveillance-capitalism/9781610395694/.