📝 W02 Formative Exercise - CSV files & Python practice
2024/25 Autumn Term
⏲️ Due Date:
- 9 October 2024, 8.00 PM (the day before the W02 lecture)
📋 NOTE: If you joined the course late or you would like to have more time to complete this assignment, please let me know (@jonjoncardoso on Slack, or send an e-mail).
It’s absolutely fine to have an extension on this practice assignment, no questions asked.
🤔 Am I ready to start this exercise?
You will benefit more from this exercise if you have completed the following tasks:
- Here we assume you’ve engaged with the pre-sessional material provided. If you haven’t, follow the steps below:
Access the self-paced Moodle course page ‘Introduction to Python for Data Science - Dataquest’, which was created by our colleagues in the LSE Digital Skills Lab.
Carefully read the ‘Get started’ section and request the required license for Dataquest.
Dedicate some time to work on the ‘Python for Data Science: Fundamentals Part I’ section.
Also complete the ‘Dictionaries’ lesson that is part of the ‘Python for Data Science: Fundamentals Part II’ section.
👉 If all you need is just a recap, take a look at this Datacamp Tutorial on Python Data Structures.
⭐ Pro-tip: The LSE Digital Skills Lab also offers Python workshops. Take a look at their Python website.
📑 Useful online resources
- Primitive Data Types vs Non Primitive Data Types in Python | Geeks for Geeks
- Python Data Structures | DataCamp
- Python Data Structures CheatSheet | Cheatography
📃 Submission
- You should submit your solutions entirely via the Nuvolos Cloud platform.
🎯 Main Objectives:
If you complete this assignment successfully, you will have practiced and learned the following skills:
- Basic data structures in Python and manipulation of lists and dictionaries
- Basic string manipulation and arithmetic operations in Python
- Reading files using Pandas and working with JSON
📋 NOTE: We will not provide any scripts. You are expected to create your own files and directories as needed through the Nuvolos Cloud terminal.
Task 1: Access the assignment environment via Nuvolos
You may remember Nuvolos from when we introduced it in the first lecture. Nuvolos is a cloud-based platform that offers an interface for accessing the terminal and writing and running Python scripts directly from your browser. You won’t need to install anything on your local machine.
👉 We will eventually want you to run code directly from your computer, especially to practice working with Git/GitHub. However, Nuvolos will be helpful in the first few weeks of this course for two reasons: 1) You can make progress with the teaching material even if you haven’t fully set up your computer yet, and 2) It helps you learn how to work on a remote machine, a skill that can come in handy if you ever need to run big data tools and algorithms.
Go to Moodle to find the link to the Nuvolos Cloud platform. Nuvolos is a private platform that requires a login and is only available for students currently enrolled in this course. Therefore, you will need to go to our course page on Moodle to access the link. You will find the following button on the Moodle page:
Create an account using your LSE email address. If you have already created an account, log in using your existing credentials.
Access the platform. After logging in, you will be able to access the platform. You will see a screen similar to the one below:
Click on ‘DS105A-2024’ to access the Overview page.
Click on the ‘W02 Formative Practice’ to access the current assignment. Then, click on the ‘Applications’ tab to see the apps you have access to. You should see the Terminal and the VS Code apps. You can use either of them to complete the assignment.
Feel free to click on the other tabs to explore the platform further. When an assignment has pre-set files, you will see them in the ‘Files’ tab, for example.
Run the Terminal emulator app. After a few seconds, you will see a terminal window that you can use to run Python scripts.
💡 You can use the VS Code app for a more user-friendly environment to write your scripts. To open a terminal directly from VS Code, go to Terminal > New Terminal in the top menu bar.
Activate the Terminal history. Still on the terminal, run each one of the lines below:
echo 'HISTTIMEFORMAT="%d/%m/%y %T "' >> ~/.bashrc
echo 'HISTTIMEFORMAT="%F %T "' >> ~/.bashrc
source ~/.bashrc
Then type the following command to confirm that the history is working:
history
You should see a list of commands you’ve run so far. Read more about this in the AskUbuntu forum.
📋 Why? If you need any technical support from us, it will help to look at the history of commands you typed in.
Check that Python is accessible from the terminal. Use your knowledge from last week’s practice to check the Python version installed on the system.
✅ Check your understanding
After completing this task, you should be able to explain the following to a colleague:
💭 Think about it: a good indicator that we have grasped a new concept is when we can clearly explain it to others.
🗣️ Talk about it! If you’re having difficulty articulating answers to the questions above, you can benefit from chatting with others (peers or instructors) to revisit your learning. What if you posted a message in the #help channel on Slack, such as, “I’m still trying to wrap my head around the notion of the HOME directory. Does anyone have a useful trick that helped them truly learn this?”?
Task 2: Create files, directories, and read data from a CSV file
🎯 ACTION POINTS:
Upon entering the ‘Space’ for this course on Nuvolos, you will find an empty folder named ‘W02 Practice’. Please use the terminal (either the standalone Terminal app or the one embedded in VS Code) to create the files and folders needed to replicate the directory structure below.
W02-Practice/
├── README.md
├── data/
└── tasks/
💡 That is, under that ‘W02-Practice’ folder, you should have a plain text README.md file and two subfolders named data and tasks.
Create a file at ./W02-Practice/data/students.csv with the following column headers. Add the data for three students, including yourself, in the following format:
name,age,city,is_lse_student
Alice,20,London,true
Junjie,19,Beijing,false
Jose,22,Barcelona,false
where is_lse_student is set to false if the student is a General Course or Exchange student.
⭐️ Pro-tip: Do you want to be a Terminal wizard? Try editing the file with vim, on the terminal, instead of using VS Code.
Change the data to include your own name, age, and city of origin, as well as those of two other people you know from the class. Ensure that the .csv file is formatted correctly!
By now, you should see the following directory structure:
W02-Practice/
├── README.md
├── data/
│   └── students.csv
└── tasks/
In the tasks sub-folder (create one just like you did with the data folder), create a Python script named read_data.py and add the following code to read each line of data from the students.csv file into a list:
# TODO: document why is this line of code necessary?
lines = []

print("Reading the ./data/students.csv file...")

# TODO: document what this 'with' and 'for' loop are doing
with open('./data/students.csv', 'r') as file:
    for line in file:
        lines.append(line.strip())

"""
TODO: document what is the point of these two print() below?
TODO: describe what the whole f"..." thing represents
TODO: explain where the -'s in the output come from
TODO: describe what the \n represents
"""
print(f"(Number of lines: {len(lines)})\n")
print(f"Content:\n{'-'*50}\n")

# TODO: document what this loop does
for line in lines:
    print(line)
print(f"{'-'*50}")
Still on Nuvolos, go to the terminal and cd to W02-Practice. Run the Python script using the following command:
python ./tasks/read_data.py
You should see something like this, albeit with your own data:
Reading the ./data/students.csv file...
(Number of lines: 4)

Content:
--------------------------------------------------

name,age,city,is_lse_student
Alice,20,London,true
Junjie,19,Beijing,false
Jose,22,Barcelona,false
--------------------------------------------------
💭 If, instead of being at W02-Practice, you were inside the W02-Practice/tasks folder, what would you need to change in the Python script and the way you run it?
Understanding what this code does. Open the Python script on VS Code and solve each one of the TODOs. That is, replace the “TODO:” and everything that follows with a comment that explains what the code is doing.
💡 Feel free to use an AI chatbot such as ChatGPT, Microsoft Copilot, or Google Gemini to assist you with this task. However, be sure not to blindly copy and paste the chatbot’s output without checking your understanding, as this won’t help you learn.
✋ Need help? If you don’t quite understand how to follow the instructions above, send a message to our Slack channel #help saying something like “I am not sure I know what it means to ‘add a comment’ to Python code. Can anybody help me?”
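On the question of running the script from a different folder, it helps to see that a relative path such as ./data/students.csv is resolved against the current working directory of the terminal, not against the location of the script. The sketch below illustrates this with a temporary folder that mimics the layout of this exercise. The helper name where_python_would_look is made up for illustration and is not part of the exercise.

```python
import os
import tempfile
from pathlib import Path


def where_python_would_look(relative_path):
    """Return the absolute path that open(relative_path) would try to use,
    given the current working directory (hypothetical helper)."""
    return (Path.cwd() / relative_path).resolve()


# Build a throwaway folder with the same sub-folders as the exercise.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "data").mkdir()
    (root / "tasks").mkdir()

    original_cwd = Path.cwd()
    try:
        # Case 1: the terminal is at the top-level practice folder.
        os.chdir(root)
        from_root = where_python_would_look("./data/students.csv")

        # Case 2: the terminal is inside the tasks folder.
        os.chdir(root / "tasks")
        from_tasks = where_python_would_look("./data/students.csv")
    finally:
        # Always go back, so the temporary folder can be cleaned up.
        os.chdir(original_cwd)

    print(from_root.relative_to(root).as_posix())
    print(from_tasks.relative_to(root).as_posix())
```

The same relative path points to data/students.csv in the first case and to tasks/data/students.csv in the second, which is why the unchanged script raises FileNotFoundError when it is run from inside the tasks folder.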
✅ Check your Understanding
After completing this task, you should be able to explain the following to a colleague:
💭 Think about it: a good indicator that we have grasped a new concept is when we can clearly explain it to others.
🗣️ Talk about it! If you’re having difficulty articulating answers to the questions above, you can benefit from chatting with others (peers or instructors) to revisit your learning. What if you posted a message in the #help channel on Slack, such as, “I’m still trying to wrap my head around the notion of the HOME directory. Does anyone have a useful trick that helped them truly learn this?”?
Task 3: Manipulating data from the Python shell
When you are writing code, you are actively thinking about the problem you are trying to solve. It helps to break the problem down into smaller parts and test each part individually. Python scripts are great for writing code when you already know precisely what you want to do. However, to build the intuition before writing the final version of the script, it helps to use the Python shell to test small pieces of code.
💡 Tip: Try to make it a habit to use the Python shell to test small pieces of code or to explore the behavior of variables.
Let’s practice some of these skills.
🎯 ACTION POINTS:
Open a Python shell. Run the Python shell by typing python in the terminal. Then, copy-paste the content of the Python script up to and including the with block, and explore the lines variable.
Note that the lines variable is a list of strings. How do I know? If I type type(lines) on the Python shell, I will see that it is of type list. If I select just the first element of the list, I can see that it is a string with the type(lines[0]) command.
⭐ Pro-Tip: Your shell doesn’t look as colourful as the one above? That’s because I’m using the IPython shell, a more user-friendly version of the Python shell. To install it, run python -m pip install ipython on the terminal, then restart the terminal and type ipython.
Deeply explore the lines variable. Now that you have the data in a list, investigate all the things you can do with it.
📋 NOTE: There are better and more efficient ways to perform the operations listed here; for example, we will learn about the pandas package next week. But for now, it’s crucial to focus on getting comfortable with the fundamental data structures in basic Python.
Here’s one example of what you could do:
Convert each string line into a list of values. Because each element of the lines list is a string, we can use many of the String Methods available in Python, such as str.split(), which splits a string into a list of strings based on a delimiter we provide. So, to convert every element separated by a comma into a list, we can use the following code:
# Type the following and press Enter to see the result
lines[0].split(',')
Store the first line of data as a dictionary. I know that the first line just represents the column names. I can use this information to create a dictionary with the keys name, age, city, and is_lse_student and the values from the list you just created.
# Type the following and press Enter to see the result
line = lines[1].split(',')
student = {
    'name': line[0],
    'age': line[1],
    'city': line[2],
    'is_lse_student': line[3]
}
student
We can do better. Instead of hard-coding the keys of this dictionary, we can just repurpose the content of lines[0] to create the dictionary. Let’s do it without using the zip function at all:
# Type line by line in the Python shell and press Enter after each line
print("Transforming the first line into a dictionary...")

# TODO: Explain what the columns variable looks like
columns = lines[0].split(',')

# I will process just the first line of data
current_line = lines[1].split(',')

# TODO: Why do I need to create an empty dictionary?
student = {}

# TODO: What does this loop do?
# TODO: What does `i` represent?
for i in range(len(columns)):
    column_name = columns[i]
    column_value = current_line[i]
    # TODO: What is the point of this line?
    student[column_name] = column_value
Now, if you type student in the Python shell, you should see a dictionary with the first student’s data.
💭 Think about it: Do you see any benefits of storing data as a dictionary as opposed to just keeping it as pure text (string)?
💭 Think about it: How would you expand on the code above to create a list of dictionaries to contain all the data available?
Add your experimental code to the read_data.py script. Copy the code above, plus whatever other changes you’ve made to it, to the end of the read_data.py script.
Fix those TODOs. Go through the script and replace the TODOs with comments that explain what the code is doing. Feel free to add even more comments if you feel it would help you understand the code better.
Run the script again. Run the script using the same command as before:
python ./tasks/read_data.py
You should see the same output as before, but now you have a list of dictionaries with the data from the students.csv file.
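One thing worth noticing while exploring in the shell: str.split() always returns strings, so the age and is_lse_student values read from the file are text, not a number or a boolean. The sketch below uses a hard-coded line in the same format as students.csv (an assumption for illustration, so it runs without the file) and shows one possible way to convert the values explicitly.

```python
# One raw line, in the same format as the rows of students.csv
raw_line = "Alice,20,London,true"

values = raw_line.split(",")
print(values)            # every item is still a string
print(type(values[1]))   # the age is a str, not an int

# Convert explicitly when a number or a boolean is needed.
age = int(values[1])
is_lse_student = values[3].strip().lower() == "true"

print(age + 1)           # arithmetic works once the value is an int
print(is_lse_student)

# Pitfall: bool() on any non-empty string is True, even for "false".
print(bool("false"))
```

Comparing the text against "true" is just one option; it is used here because bool("false") evaluates to True and would silently give the wrong answer.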
✅ Check your Understanding
After completing this task, you should be able to explain the following to a colleague:
💭 Think about it: a good indicator that we have grasped a new concept is when we can clearly explain it to others.
🗣️ Talk about it! If you’re having difficulty articulating answers to the questions above, you can benefit from chatting with others (peers or instructors) to revisit your learning. What if you posted a message in the #help channel on Slack, such as, “I’m still trying to wrap my head around the notion of the HOME directory. Does anyone have a useful trick that helped them truly learn this?”?
Submit your work
We’ve provided a detailed, step-by-step guide to help you complete the tasks, along with the necessary code to add to specific files. When submitting your work, the goal is to demonstrate to us, the instructors, that you have successfully completed the tasks and have provided the correct explanations for the TODOs.
However, just editing the files alone won’t make your changes visible to the instructors. Any modifications made within Nuvolos are only visible to you. To receive feedback and help us understand your progress with the exercises, it’s important to submit your work.
📋 NOTE: I will check all the submissions on Nuvolos. This will help me understand how easy/difficult the exercises and materials are for everyone. While I won’t necessarily give you individualised feedback on your submission, I will compile common misconceptions and best practices to share with the class in the 🗓️ W02 Lecture.
🎯 ACTION POINTS:
Export your bash history. Before you submit your work, export your bash history to a file. This will help you demonstrate to us that you’ve been practising and experimenting with the terminal. To do this, run the following command:
cd "W02 Practice" # ensure you are in the right directory
history > ./bash_history.txt
After this command, your directory structure should look like this:
W02 Practice/
├── README.md
├── data/
│   └── students.csv
├── tasks/
│   └── read_data.py
└── bash_history.txt
Go to the Assignments Tab. There you will find the list of assignments available. Locate the ‘W02 Formative Practice’ assignment and click on the ‘Hand-In’ button.
Hand-in the submission. You will be asked to type an identifier
Re-submit if needed. If you need to make changes to your submission, you can hand it in again. The last submission will be the one considered for grading.
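Before handing in, it can help to confirm that your folder matches the directory tree shown above. The following is a minimal sketch, assuming it is run from inside the practice folder; the helper name missing_paths and the EXPECTED list are made up for illustration and simply mirror that tree.

```python
from pathlib import Path

# Paths copied from the directory tree shown in this exercise.
EXPECTED = [
    "README.md",
    "data/students.csv",
    "tasks/read_data.py",
    "bash_history.txt",
]


def missing_paths(root):
    """Return the expected paths that do not exist under root
    (hypothetical helper, not part of the exercise)."""
    return [p for p in EXPECTED if not (Path(root) / p).exists()]


# Check the folder the terminal is currently in.
missing = missing_paths(Path.cwd())
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All expected files are in place.")
```

If anything is reported as missing, check that you are in the right folder with pwd before recreating files.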
It’s absolutely natural and common to have many questions at this stage. WE WANT TO HEAR ABOUT YOUR QUESTIONS! Do voice them on the #help channel on Slack, attend support sessions or bring them to the lecture.