DS105A – Data for Data Science
28 Sep 2023
networks
optimisation
software engineering
data science workflow
machine learning applications
Write an e-mail to Kevin:
Sign up for DSI events at lse.ac.uk/DSI/Events
Follow the seminar series: 🔗 Link
Hear from alumni or industry experts about their career paths and how they got to where they are today.
Example of past event:
🗓️ Data Science Careers Panel and Networking (31 January)
A panel of alumni followed by Q&A and a networking session.
Panel:
Read more about this series of events: 🔗 Link
Programme | Count |
---|---|
BSc in Economics | 25 |
BSc in Politics and Data Science | 14 |
General Course | 8 |
BSc in Philosophy and Economics | 3 |
BA in History | 2 |
BSc in International Social and Public Policy and Economics | 1 |
BSc in Philosophy, Politics and Economics | 1 |
BSc in Politics and Economics | 1 |
BSc in Sociology | 1 |
Exchange Programme for Students from Universidad de Valladolid | 1 |
Exchange Programme for Students from University of California, San Diego | 1 |
Exchange Programme for Students in Anthropology (Tokyo) | 1 |
Year | Count |
---|---|
1 | 46 |
2 | 7 |
3 | 6 |
📑 Course Brief
Focus: learn how to collect and handle so-called “real data”
How: hands-on coding exercises and a group project
👨🏻🏫 THE LECTURES
💻 THE LABS
You will encounter these icons:
👉 Now, let’s navigate our Moodle page to see the 📓 Syllabus and to talk about ✍️ Assessments & Feedback.
If you are reading this but you are not an LSE student, the same content is available on the course’s 🌐 public website
Image created with DALL·E via Bing Chat AI bot. Prompt: “An illustration of a person trying to solve a puzzle with pieces that have different symbols and formulas on them. The person is looking at a screen that shows the 📋 Getting Ready guide and has a smile on their face.”
Do you use ChatGPT, GitHub Copilot, or other AI tools?
“LSE takes challenges to academic integrity and to the value of its degrees with the utmost seriousness. The School has detailed regulations and processes for ensuring academic integrity in summative work.
Unless Departments provide otherwise in guidance on the authorised use of generative AI, its use in summative and formative assessment is prohibited. Departmental Teaching Committees are strongly encouraged to define what constitutes authorised use of Generative AI tools (if any) for students taking courses in their Department. Where they do so, they must clearly communicate this to colleagues, and to students.”
Source: LSE (2023) (Emphasis added)
Examples:
“I used ChatGPT to provide an initial solution to Question X. The code ran and worked fine, but as it was not efficient to the standards of vectorisation taught in the course, I had to edit the code myself to fix the issue.”
“I had GitHub Copilot autocomplete on when writing the code for Question X. The code produced was unnecessarily long and didn’t use the pd.merge command I learned in Week 08, so I went back and edited it.”
What do you think of generative AI tools?
Participating Courses:
Data for Data Science
Data Science for Social Scientists
Databases
Sign up to GENIAL and help us find out!
How will it work:
W01 lab
Normal lab
W02 lab
Normal lab
W03 lab
W04 lab
Participants will be split into two groups:
W05 lab
Normal lab
W07 lab
Participants will be split into two groups at random:
W08 lab
Normal lab
W09 lab
Participants will be split into two groups at random:
W10 lab
Normal lab
W11 lab
Normal lab
Image created with DALL·E via Bing Chat AI bot. Prompt: “robots enjoying a coffee break. Circular tables, white room, pops of color, modern, cosy, clean flat design.”
Our first proper lecture will start in a few minutes.
“🧰 The Data Science Toolbox and the Terminal”
In the meantime, consider signing up for the GENIAL project:
“[…] a field of study and practice that involves the collection, storage, and processing of data in order to derive important 💡 insights into a problem or a phenomenon.
Such data may be generated by humans (surveys, logs, etc.) or machines (weather data, road vision, etc.),
and could be in different formats (text, audio, video, augmented or virtual reality, etc.).”
It is often said that 80% of the time and effort spent on a data science project goes to the abovementioned tasks.
And this is what this course is about! You will learn some of the most common tools used during this process.
The data science dilemma: Python or R?
tidyverse to be more intuitive than Python’s pandas
Python
Many people struggle with programming because they don’t understand what is going on under the hood.
👉 This is why we spend the first weeks of this course learning and practising with the terminal and file systems.
To truly master programming, learn how to master the command line first
Image source: AskUbuntu
Image source: Gortu at English Wikipedia
sh, or the Bourne shell: developed at AT&T labs in the 70s by a guy named Stephen Bourne.
bash, or the Bourne again shell: very popular, compatible with sh shell scripts.
ksh, or the Korn shell: provides enhancements over sh and is also compatible with bash.
csh and tcsh: shells that have a syntax similar to the programming language C.
Want to become a shell scripting pro? Check out (Ebrahim and Mallett 2018).
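To see which of these shells are installed on your own machine, here is a minimal Python sketch. It assumes a standard Python 3 installation; the helper name find_shells is made up for illustration, and shutil.which only searches the directories listed in your PATH, so a shell installed elsewhere would show up as not found.

```python
import shutil

# Shell names taken from the list above
SHELLS = ["sh", "bash", "ksh", "csh", "tcsh"]


def find_shells(names):
    """Map each shell name to its executable path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_shells(SHELLS).items():
        print(f"{name:5} -> {path or 'not found'}")
```

The output will differ from computer to computer: most macOS and Linux systems ship with sh and bash, while ksh, csh and tcsh are often optional extras.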
CMD
PowerShell
Original product: Microsoft. This animation: Useerup, CC BY-SA 3.0, via Wikimedia Commons.
Read more on (Pelz 2018, chap. 3)
Image created with DALL·E via Bing Chat AI bot. Prompt: “robots sorting and shelving physical files in folders. Circular tables, white room, pops of color, modern, cosy, clean flat design”
0s and 1s
In macOS as well as in Linux, the directory structure typically looks like this:
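If you want to explore the top of this structure on your own computer, the sketch below is one way to do it from Python. It is only an illustration: the helper name list_subdirectories is hypothetical, and it assumes a macOS or Linux machine where the root directory is "/".

```python
from pathlib import Path


def list_subdirectories(folder):
    """Return the sorted names of the directories directly inside `folder`."""
    return sorted(p.name for p in Path(folder).iterdir() if p.is_dir())


if __name__ == "__main__":
    root = Path("/")  # the root directory on macOS and Linux
    for name in list_subdirectories(root):
        print(root / name)
```

The exact list depends on your operating system, so do not worry if your output does not match a classmate's.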
Now for a little demo ⏭️
Let’s go even deeper into the rabbit hole 🐇
For more details, check (Silberschatz, Galvin, and Gagne 2005, chap. 1)
“An operating system is similar to a government. Like a government, it performs no useful function by itself. It simply provides an environment within which other programs can do useful work.”
– (Silberschatz, Galvin, and Gagne 2005, chap. 1)
Terminal
Image created with DALL·E via Bing Chat AI bot. Prompt: “a gigantic wooden question mark looms above the big ben, ultra-realistic awesome painting”
Tip
Let’s face it. You will always encounter puzzling ⚠️ error messages when programming, no matter how senior or skilled you are.
Understanding a little about how everything is tied together will help you get to the core of the problem more quickly.
A computer from the 1950s
(Computer History Museum n.d.)
Source: Wikimedia Commons - Rwoodsmall
Note
GNU stands for “GNU’s Not Unix”. Computer nerds love a recursive joke.
See (Silberschatz, Galvin, and Gagne 2005, Appendix B) for more on Windows.
LSE DS105A (2023/24) – Week 01 | archive