LSE DS101L - Fundamentals of Data Science
2022/23 Lent Term
Intro | |||
---|---|---|---|
Week 01 | Lecture | Introduction, Context & Key Concepts | |
Class |
No class this week. (But there is a take-home formative assignment plus indicative readings) |
||
Formative |
|
||
Readings |
Indicative
|
||
Basic concepts from Computer Science and Statistics | |||
Week 02 | Lecture | Data types and the concept of tidy data | |
Class | Discussions: the boundaries of personal data | ||
Formative |
|
||
Readings |
Indicative
|
||
Week 03 | Lecture | Computational Thinking and Programming | |
Class | Live Demo: How data scientists use programming to clean data | ||
Formative |
|
||
Readings |
Indicative
|
||
Week 04 | Lecture | Statistical Inference I | |
Class | Live Demo: How data scientists use programming for data visualisation | ||
Summative |
|
||
Readings |
Indicative
|
||
Week 05 | Lecture | Statistical Inference II | |
Class | Group Presentations (worth 10% of final grade) | ||
Formative |
|
||
Readings |
Indicative
|
||
Week 06 | Reading Week | ||
Machine Learning & AI | |||
Week 07 | Lecture | Machine Learning I: Supervised Learning | |
Class | Live Demo: Supervised Learning | ||
Formative |
|
||
Readings |
Indicative
|
||
Week 08 | Lecture | Machine Learning II: Unsupervised Learning | |
Class | Tutorial: Introduction to Zotero (& Quarto Markdown) | ||
Formative ⭐ |
|
||
Readings |
Indicative
|
||
Week 09 | Lecture | Unstructured Data (Text, Audio, Video) | |
Class | In-class activity: exploring Machine Learning metrics (with a case study) | ||
Readings |
Indicative
|
||
Decisions and Implications | |||
Week 10 | Lecture | Prediction vs. Explanation | |
Class | Live Demo: Unsupervised Learning | ||
Summative |
|
||
Readings |
Indicative
|
||
Week 11 | Lecture | Privacy and ethical concerns of the current wave of Generative AI (e.g., ChatGPT) | |
Class | Exploring Generative AI | ||
Deadline Approaching ⏲️ |
Keep working on your essays:
|
||
Readings | 📰 Newspaper Opinion Piece: (Bridle 2023) | ||
After the Term | |||
Week 11+1 | Deadline | Submit your essay by 4 April 2023 | |
April/2023 | Spring Break | ||
Summer Term (May & Jun 2023) | |||
Week 02 | Summative |
|
|
Week 03 |
Deadline Approaching ⏲️ |
Keep working on your essays:
|
|
Week 11+1 | Deadline | Submit your essay by 25 May 2023 | |
The End |
References
Aschwanden, Christie. 2015. “Science Isn’t Broken.” FiveThirtyEight. https://fivethirtyeight.com/features/science-isnt-broken/.
Benoit, Kenneth. 2022. “Quantitative Analysis of Textual Data.” https://quanteda.io/index.html.
Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. 2018. “Quanteda: An R Package for the Quantitative Analysis of Textual Data.” Journal of Open Source Software 3 (30): 774. https://doi.org/10.21105/joss.00774.
Bridle, James. 2023. “The Stupidity of AI.” The Guardian, March. https://www.theguardian.com/technology/2023/mar/16/the-stupidity-of-ai-artificial-intelligence-dall-e-chatgpt.
Broman, Karl W., and Kara H. Woo. 2018. “Data Organization in Spreadsheets.” The American Statistician 72 (1): 2–10. https://doi.org/10.1080/00031305.2017.1375989.
Bruce, Peter C., and Andrew Bruce. 2017. Practical Statistics for Data Scientists: 50 Essential Concepts. First edition. Sebastopol, CA: O’Reilly. https://ebookcentral.proquest.com/lib/londonschoolecons/detail.action?docID=4857224.
D’Ignazio, Catherine, and Lauren F. Klein. 2020. Data Feminism. Strong Ideas Series. Cambridge, Massachusetts: The MIT Press. https://ebookcentral.proquest.com/lib/londonschoolecons/reader.action?docID=6120950.
Denning, Peter J., and Matti Tedre. 2019. Computational Thinking. The MIT Press Essential Knowledge Series. Cambridge, Massachusetts: The MIT Press.
Flach, Peter A. 2012. Machine Learning: The Art and Science of Algorithms That Make Sense of Data. Cambridge: Cambridge University Press. https://doi-org.gate3.library.lse.ac.uk/10.1017/CBO9780511973000.
Gimlet. n.d. “#177 Gleeks and Gurgles Reply All.” Accessed January 15, 2023. https://gimletmedia.com:443/shows/reply-all/z3h78d6.
Grolemund, Garrett. 2014. Hands-on Programming with R. First edition. Sebastopol, CA: O’Reilly. https://rstudio-education.github.io/hopr/index.html.
Guyan, Kevin. 2022. Queer Data: Using Gender, Sex and Sexuality Data for Action. Bloomsbury Studies in Digital Cultures. London: Bloomsbury Academic. https://web-s-ebscohost-com.gate3.library.lse.ac.uk/ehost/detail/detail?nobk=y&vid=2&sid=a8efeedd-6bfc-459a-9f0c-a67dabcc75d1@redis&bdata=JnNpdGU9ZWhvc3QtbGl2ZQ==#AN=3077276&db=nlebk.
Hofman, Jake M., Amit Sharma, and Duncan J. Watts. 2017. “Prediction and Explanation in Social Systems.” Science 355 (6324): 486–88. https://doi.org/10.1126/science.aal3856.
Hofman, Jake M., Duncan J. Watts, Susan Athey, Filiz Garip, Thomas L. Griffiths, Jon Kleinberg, Helen Margetts, et al. 2021. “Integrating Explanation and Prediction in Computational Social Science.” Nature 595 (7866): 181–88. https://doi.org/10.1038/s41586-021-03659-0.
Hullman, Jessica, Sayash Kapoor, Priyanka Nanayakkara, Andrew Gelman, and Arvind Narayanan. 2022. “The Worst of Both Worlds: A Comparative Analysis of Errors in Learning from Data in Psychology and Machine Learning.” In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, 335–48. Oxford United Kingdom: ACM. https://doi.org/10.1145/3514094.3534196.
Ismay, Chester, and Albert Young-Sun Kim. 2020. Statistical Inference via Data Science: A ModernDive into R and the Tidyverse. Chapman & Hall/CRC the R Series. Boca Raton: CRC Press / Taylor & Francis Group. https://moderndive.com/.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning: With Applications in R. Second edition. Springer Texts in Statistics. New York NY: Springer. https://www.statlearning.com/.
Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014. “The Parable of Google Flu: Traps in Big Data Analysis.” Science 343 (6176): 1203–5. https://doi.org/10.1126/science.1248506.
Lupton, Deborah. 2016. “The Diverse Domains of Quantified Selves: Self-Tracking Modes and Dataveillance.” Economy and Society 45 (1): 101–22. https://doi.org/10.1080/03085147.2016.1143726.
Lupton, Deborah. 2020. “Data Mattering and Self-Tracking: What Can Personal Data Do?” Continuum 34 (1): 1–13. https://doi.org/10.1080/10304312.2019.1691149.
Perkel, Jeffrey M. 2022. “Six Tips for Better Spreadsheets.” Nature 608 (7921): 229–30. https://doi.org/10.1038/d41586-022-02076-1.
Pietsch, Wolfgang. 2022. On the Epistemology of Data Science: Conceptual Tools for a New Inductivism. Philosophical Studies Series, Volume 148. Cham: Springer.
Prince, J. Dale. 2014. “The Quantified Self: Operationalizing the Quotidien.” Journal of Electronic Resources in Medical Libraries 11 (2): 91–99. https://doi.org/10.1080/15424065.2014.909145.
Rettberg, Jill Walker. 2022. “Algorithmic Failure as a Humanities Methodology: Machine Learning’s Mispredictions Identify Rich Cases for Qualitative Analysis.” Big Data & Society 9 (2): 205395172211312. https://doi.org/10.1177/20539517221131290.
Sachs, Jeffrey, Rahshemah Wise, and Daniel Karell. 2021. “The TikTok Self: Music, Signaling, and Identity on Social Media.” Preprint. SocArXiv. https://doi.org/10.31235/osf.io/2rx46.
Schroeder, Stan. 2022. “TikTok’s in-App Browser Can Monitor Your Every Click and Keystroke.” Mashable. https://mashable.com/article/tiktok-browser-monitoring.
Schutt, Rachel, and Cathy O’Neil. 2013. Doing Data Science. First edition. Beijing ; Sebastopol: O’Reilly Media. https://ebookcentral.proquest.com/lib/londonschoolecons/detail.action?docID=1465965.
scikit-learn. 2023. “Scikit Learn User Guide: Clustering.” Scikit-Learn. https://scikit-learn.org/stable/modules/clustering.html.
Shah, Chirag. 2020. A Hands-on Introduction to Data Science. Cambridge, United Kingdom ; New York, NY, USA: Cambridge University Press. https://librarysearch.lse.ac.uk/permalink/f/1n2k4al/TN_cdi_askewsholts_vlebooks_9781108673907.
Silge, Julia, and David Robinson. 2017. Text Mining with R: A Tidy Approach. First edition. Beijing ; Boston: O’Reilly. https://www.tidytextmining.com/index.html.
Swan, Melanie. 2013. “The Quantified Self: Fundamental Disruption in Big Data Science and Biological Discovery.” Big Data 1 (2): 85–99. https://doi.org/10.1089/big.2012.0002.
Verhagen, Mark D. 2022. “A Pragmatist’s Guide to Using Prediction in the Social Sciences.” Socius: Sociological Research for a Dynamic World 8 (January): 237802312210817. https://doi.org/10.1177/23780231221081702.
Warne, Russell T. 2021. Statistics for the Social Sciences: A General Linear Model Approach. Second edition. Cambridge, United Kingdom New York, NY Port Melbourne, Australia New Delhi, India Singapore: Cambridge University Press. https://doi.org/10.1017/9781108894319.
Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10). https://doi.org/10.18637/jss.v059.i10.
Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. First edition. Sebastopol, CA: O’Reilly. https://r4ds.had.co.nz/.
Zuboff, Shoshana. 2019. The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. First edition. New York: PublicAffairs. https://www.publicaffairsbooks.com/titles/shoshana-zuboff/the-age-of-surveillance-capitalism/9781610395694/.