DS105W – Data for Data Science
18 Jan 2024
networks
optimisation
software engineering
machine learning applications
the impact of generative AI for education
Write an e-mail to Kevin:
Sign up for DSI events at lse.ac.uk/DSI/Events
Follow the seminar series: 🔗 Link
Hear from alumni or industry experts about their career paths and how they got to where they are today.
Upcoming event:
🗓️ Keeping London Moving with Data (28 February, 4-5.30pm)
A talk about life in the data world at TfL. Jemima, Graduate Data Scientist at Transport for London (TfL), will talk about her experience as a Data Science Graduate in our inaugural programme. Lauren Sager Weinstein, Chief Data Officer at Transport for London (TfL), will talk about how she’s leading TfL’s data strategy, and how all the components of data careers (data scientists, data developers, data product managers, and data users) can come together to deliver on our data vision: To empower our people to make better decisions with data.
Programme | Count |
---|---|
BSc in Economics | 31 |
General Course | 8 |
BSc in International Social and Public Policy | 2 |
BSc in Politics and Data Science | 2 |
BSc in Politics and Economics | 2 |
BSc in Psychological and Behavioural Science | 2 |
BSc in Sociology | 2 |
BSc in Economic History | 1 |
BSc in Economics and Economic History | 1 |
BSc in Finance | 1 |
BSc in Philosophy | 1 |
BSc in Politics | 1 |
Year | Count |
---|---|
1 | 35 |
2 | 13 |
3 | 5 |
4 | 1 |
📑 Course Brief
Focus: learn how to collect and handle so-called “real data”
How: hands-on coding exercises and a group project
👨🏻🏫 THE LECTURES
💻 THE LABS
You will encounter these icons:
👉 Now, let’s navigate our Moodle page to see the 📓 Syllabus and to talk about ✍️ Assessments & Feedback.
If you are reading this but you are not an LSE student, the same content is available on the course’s 🌐 public website
Image created with DALL·E via Bing Chat AI bot. Prompt: “An illustration of a person trying to solve a puzzle with pieces that have different symbols and formulas on them. The person is looking at a screen that shows the 📋 Getting Ready guide and has a smile on their face.”
Do you use ChatGPT, GitHub Copilot, or other AI tools?
“LSE takes challenges to academic integrity and to the value of its degrees with the utmost seriousness. The School has detailed regulations and processes for ensuring academic integrity in summative work.
Unless Departments provide otherwise in guidance on the authorised use of generative AI, its use in summative and formative assessment is prohibited. Departmental Teaching Committees are strongly encouraged to define what constitutes authorised use of Generative AI tools (if any) for students taking courses in their Department. Where they do so, they must clearly communicate this to colleagues, and to students.”
Source: LSE (2023) (Emphasis added)
Examples:
“I used ChatGPT to provide an initial solution to Question X. The code ran and worked fine, but as it was not efficient to the standards of vectorisation taught in the course, I had to edit the code myself to fix the issue.”
“I had GitHub Copilot autocomplete on when writing the code for Question X. The code produced was unnecessarily long and didn’t use the pd.merge command I learned in Week 08, so I went back and edited it.”
What do you think of generative AI tools?
Participating Courses:
How will it work:
Create a ChatGPT 3.5 (or a Google Bard) account if you don’t have one already.
Open a new ‘chat window’ inside your selected chatbot and tell the AI: ‘I will use this chat for all things related to DS105W’
Use this same chat window whenever you feel like using a generative AI tool during the course.
Sign up to GENIAL and help us find out!
Image created with DALL·E via Bing Chat AI bot. Prompt: “robots enjoying a coffee break. Circular tables, white room, pops of color, modern, cosy, clean flat design.”
Our first proper lecture will start in a few minutes.
“🧰 The Data Science Toolbox and the Terminal”
In the meantime, fill out the form for the GENIAL project:
“[…] a field of study and practice that involves the collection, storage, and processing of data in order to derive important 💡 insights into a problem or a phenomenon.
Such data may be generated by humans (surveys, logs, etc.) or machines (weather data, road vision, etc.),
and could be in different formats (text, audio, video, augmented or virtual reality, etc.).”
It is often said that 80% of the time and effort spent on a data science project goes to the abovementioned tasks.
And this is what this course is about! You will learn some of the most common tools used during this process.
The data science dilemma: Python or R ??
tidyverse to be more intuitive than Python’s pandas
Python
Many people struggle with programming because they don’t understand what is going on under the hood.
👉 This is why we spend the first weeks of this course learning and practising with the terminal and file systems.
To truly master programming, learn how to master the command line first
Image source: AskUbuntu
Image source: Gortu at English Wikipedia
sh, or the Bourne shell: developed at AT&T labs in the 70s by a guy named Stephen Bourne.
bash, or the Bourne again shell: very popular, compatible with sh shell scripts.
ksh, or the Korn shell: provides enhancements over sh, and it is also compatible with bash.
csh and tcsh: shells that have a syntax similar to the programming language C.
Want to become a shell scripting pro? Check out (Ebrahim and Mallett 2018).
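To check which of these shells are available on your own machine, a minimal Python sketch could look like the one below. It assumes a Unix-like system, where the SHELL environment variable usually records your login shell; the helper name available_shells is made up for this illustration and is not part of the course materials.

```python
import os
import shutil

# Shell names taken from the list above.
SHELL_NAMES = ("sh", "bash", "ksh", "csh", "tcsh")


def available_shells(names=SHELL_NAMES):
    """Map each shell name to its executable path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # SHELL is usually set on macOS and Linux; it may be missing elsewhere.
    print("Login shell:", os.environ.get("SHELL", "unknown"))
    for name, path in available_shells().items():
        print(f"{name:>5}: {path or 'not installed'}")
```

The output depends entirely on what is installed on your machine, so expect it to differ between a Mac, a Linux server, and a Windows laptop.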
CMD
PowerShell
Original product: Microsoft. This animation: Useerup, CC BY-SA 3.0, via Wikimedia Commons.
Read more on (Pelz 2018, chap. 3)
Image created with DALL·E via Bing Chat AI bot. Prompt: “robots sorting and shelving physical files in folders. Circular tables, white room, pops of color, modern, cosy, clean flat design”
0s and 1s
In macOS as well as in Linux, the directory structure typically looks like this:
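As a rough sketch of how you might inspect this structure yourself, the Python snippet below lists the folders directly inside a directory. It assumes a macOS or Linux machine, where / is the root of the file system; list_subdirectories is a hypothetical helper written for this example.

```python
from pathlib import Path


def list_subdirectories(directory):
    """Return the sorted names of the folders directly inside `directory`."""
    return sorted(p.name for p in Path(directory).iterdir() if p.is_dir())


if __name__ == "__main__":
    # "/" is the root directory on macOS and Linux.
    for name in list_subdirectories("/"):
        print("/" + name)
    # Path.home() points at the current user's home directory.
    print("Home directory:", Path.home())
```

The exact folders printed will vary from one machine to another, which is why no expected output is shown here.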
Now for a little demo ⏭️
Let’s go even deeper into the rabbit hole 🐇
For more details, check (Silberschatz, Galvin, and Gagne 2005, chap. 1)
“An operating system is similar to a government. Like a government, it performs no useful function by itself. It simply provides an environment within which other programs can do useful work.”
– (Silberschatz, Galvin, and Gagne 2005, chap. 1)
Terminal
Image created with DALL·E via Bing Chat AI bot. Prompt: “a gigantic wooden question mark looms above the big ben, ultra-realistic awesome painting”
Tip
Let’s face it. You will always encounter puzzling ⚠️ error messages when programming, no matter how senior or skilled you are.
Understanding a little about how everything is tied together will help you get to the core of the problem more quickly.
A computer from the 1950s
(Computer History Museum n.d.)
Source: Wikimedia Commons - Rwoodsmall
Note
GNU stands for “GNU’s Not Unix”. Computer nerds love a recursive joke.
See (Silberschatz, Galvin, and Gagne 2005, Appendix B) for more on Windows.
LSE DS105W (2023/24) – Week 01 | archive