🗓️ Week 02 - Data types and the concept of tidy data

2023/24 Autumn Term

What does data look like? And how do we organise it? This week's lecture will discuss the concept of tidy data and how to organise data well, even when it lives in a spreadsheet. We will also discuss the different data types and how to deal with them.
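To make the idea concrete: in Wickham's (2014) definition, a dataset is tidy when each variable forms a column and each observation forms a row. The sketch below is only an illustration, not material from the lecture. It uses plain Python and a made-up table; the countries "A" and "B", the values, and the helper name `make_tidy` are all hypothetical. It reshapes a spreadsheet-style "wide" table (one column per year) into tidy rows and converts the text values to proper data types along the way.

```python
# Hypothetical wide table, as it might be typed into a spreadsheet:
# one row per country, one column per year, every value stored as text.
wide_rows = [
    {"country": "A", "2021": "1.5", "2022": "1.7"},
    {"country": "B", "2021": "3.0", "2022": "3.2"},
]


def make_tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows so that each output row holds one observation."""
    tidy = []
    for row in rows:
        for column, raw_value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every tidy row
            tidy.append(
                {
                    id_column: row[id_column],
                    variable_name: int(column),    # year becomes an integer
                    value_name: float(raw_value),  # measurement becomes a float
                }
            )
    return tidy


tidy_rows = make_tidy(wide_rows, "country", "year", "population_millions")
for tidy_row in tidy_rows:
    print(tidy_row)
```

Each tidy row now describes a single country in a single year, and the year is stored as a value rather than being hidden in a column header. That makes filtering, grouping, and plotting by year much simpler.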

👨‍🏫 Lecture Slides

Either click on the slide area below or click here to view it in fullscreen. Use your keypad to navigate the slides. You can also find a PDF version on Moodle.

🎥 Looking for lecture recordings? You can only find those on Moodle.

References used in the lecture slides this week

Abeysooriya, Mandhri, Megan Soria, Mary Sravya Kasu, and Mark Ziemann. 2021. “Gene Name Errors: Lessons Not Learned.” PLoS Computational Biology 17 (7): e1008984.
Baraniuk, Chris. 2015. “The Number Glitch That Can Lead to Catastrophe.” BBC Future, May. https://www.bbc.com/future/article/20150505-the-numbers-that-lead-to-disaster.
BBC News. 2014. “Gangnam Style Music Video ‘Broke’ YouTube View Limit.” BBC News, December. https://www.bbc.co.uk/news/world-asia-30288542.
Beales, Richard. 2013. “Blame Microsoft.” New York Times (Online). 2013. https://www.proquest.com/blogs-podcasts-websites/blame-microsoft/docview/2215333604/se-2.
Felbo, Bjarke, Alan Mislove, Anders Søgaard, Iyad Rahwan, and Sune Lehmann. 2017. “Using Millions of Emoji Occurrences to Learn Any-Domain Representations for Detecting Sentiment, Emotion and Sarcasm.” In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 1615–25. Copenhagen, Denmark: Association for Computational Linguistics. https://doi.org/10.18653/v1/D17-1169.
Gibbs, Samuel. 2014a. “Is the Year 2038 Problem the New Y2K Bug?” The Guardian (London). https://www.theguardian.com/technology/2014/dec/17/is-the-year-2038-problem-the-new-y2k-bug.
Gibbs, Samuel. 2014b. “Y2K Bug Triggers Army Conscription Notices Sent to 14,000 Dead Men.” The Guardian (London). https://www.theguardian.com/technology/2014/jul/11/y2k-bug-us-army-conscription-1800s-pennsylvania.
Hern, Alex. 2020. “Covid: How Excel May Have Caused Loss of 16,000 Test Results in England.” The Guardian. https://www.theguardian.com/politics/2020/oct/05/how-excel-may-have-caused-loss-of-16000-covid-tests-in-england.
Kwak, James. 2013. “The Importance of Excel.” "The Baseline Scenario" Blog, February. https://baselinescenario.com/2013/02/09/the-importance-of-excel/.
Leek, Jeff. 2016. “Non-Tidy Data.” "Simply Statistics" Blog. https://simplystatistics.org/posts/2016-02-17-non-tidy-data/.
Oren, Nir. 2019. “If You Think the Millennium Bug Was a Hoax, Here Comes a History Lesson.” The Conversation. http://theconversation.com/if-you-think-the-millennium-bug-was-a-hoax-here-comes-a-history-lesson-129042.
Thomas, Martyn. 2019. “The Millennium Bug Was Real – and 20 Years Later We Face the Same Threats.” The Guardian (Online). https://www.theguardian.com/commentisfree/2019/dec/31/millennium-bug-face-fears-y2k-it-systems.
Uenuma, Francine. 2019. “Years Later, the Y2K Bug Seems Like a Joke–Because Those Behind the Scenes Took It Seriously.” Time. https://time.com/5752129/y2k-bug-history/.
Vincent, James. 2020. “Scientists Rename Human Genes to Stop Microsoft Excel from Misreading Them as Dates.” The Verge, August. https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates.
Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10). https://doi.org/10.18637/jss.v059.i10.
Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. First edition. Sebastopol, CA: O’Reilly. https://r4ds.had.co.nz/.
Williams, Martyn. 2000a. “Computer Problems Hit Three Nuclear Plants in Japan.” CNN. https://edition.cnn.com/2000/TECH/computing/01/03/japan.nukes.y2k.idg/index.html.
Williams, Martyn. 2000b. “Y2K Bug Hits Heating System in Korean Apartments.” CNN. https://edition.cnn.com/2000/TECH/computing/01/03/korea.heat.y2k.idg/index.html.
Ziemann, Mark, Yotam Eren, and Assam El-Osta. 2016. “Gene Name Errors Are Widespread in the Scientific Literature.” Genome Biology 17 (1): 1–3.

✍️ Coursework (Formative)

Collect data from Wikipedia and make it tidy

    • This will help you build a portfolio of your work throughout the course. We might ask you to refer to this material during your summative assessments.

📟 Communication

  • Post your reflections, questions, and links on Slack. Depending on your purpose, use the #general, #help-classes or #help-lectures channel; you might also want to use the #interesting-articles channel.