📝 W02 Formative Exercise - CSV files & Python practice

2024/25 Autumn Term

Author
Image created with the AI embedded in MS Designer using the prompt 'abstract salmon pink light blue icon depicting the metaphysical experience of cleaning up, reshaping, pivoting, and manipulating data in search of the purest insights in data science.'

⏲️ Due Date:

📋 NOTE: If you joined the course late or you would like to have more time to complete this assignment, please let me know (@jonjoncardoso on Slack, or send an e-mail).
It’s absolutely fine to have an extension on this practice assignment, no questions asked.

🤔 Am I ready to start this exercise?

You will benefit more from this exercise if you have completed the following tasks:

  • Here we assume you’ve engaged with the pre-sessional material provided. If you haven’t, follow the steps below:

    • Access the self-paced Moodle course page ‘Introduction to Python for Data Science - Dataquest’, which was created by our colleagues in the LSE Digital Skills Lab.

    • Carefully read the ‘Get started’ section and request the required license for Dataquest.

    • Dedicate some time to work on the ‘Python for Data Science: Fundamentals Part I’ section.

    • Also complete the ‘Dictionaries’ lesson that is part of the ‘Python for Data Science: Fundamentals Part II’ section.

    👉 If all you need is just a recap, take a look at this Datacamp Tutorial on Python Data Structures.

    Pro-tip: The LSE Digital Skills Lab also offers Python workshops. Take a look at their Python website.

📑 Useful online resources

📃 Submission

🎯 Main Objectives:

If you complete this assignment successfully, you will have practiced and learned the following skills:

📋 NOTE: We will not provide any scripts. You are expected to create your own files and directories as needed through the Nuvolos Cloud terminal.

Task 1: Access the assignment environment via Nuvolos

You may remember Nuvolos from when we introduced it in the first lecture. Nuvolos is a cloud-based platform that offers an interface for accessing the terminal and writing and running Python scripts directly from your browser. You won’t need to install anything on your local machine.

👉 We will eventually want you to run code directly from your computer, especially to practice working with Git/GitHub. However, Nuvolos will be helpful in the first few weeks of this course for two reasons: 1) You can make progress with the teaching material even if you haven’t fully set up your computer yet, and 2) It helps you learn how to work on a remote machine, a skill that can come in handy if you ever need to run big data tools and algorithms.

  1. Go to Moodle to find the link to the Nuvolos Cloud platform. Nuvolos is a private platform that requires a login and is only available for students currently enrolled in this course. Therefore, you will need to go to our course page on Moodle to access the link. You will find the following button on the Moodle page:

    Figure 1. Nuvolos Button on Moodle
  2. Create an account using your LSE email address. If you have already created an account, you can log in using your existing credentials.

  3. Access the platform. After logging in, you will be able to access the platform. You will see a screen similar to the one below:

    Figure 2. Your Nuvolos frontpage will have a ‘Space’ dedicated to this course.
  4. Click on ‘DS105A-2024’ to access the Overview page.

    Figure 3. The Overview page has a list of all available (practice) assignments at the top and a bit of information about how to use the platform.
  5. Click on the ‘W02 Formative Practice’ to access the current assignment. Then, click on the ‘Applications’ tab to see the apps you have access to. You should see the Terminal and the VS Code apps. You can use either of them to complete the assignment.

    Figure 4. The Applications tab shows the available applications.

    Feel free to click on the other tabs to explore the platform further. When an assignment has pre-set files, you will see them in the ‘Files’ tab, for example.

  6. Run the Terminal emulator app. After a few seconds, you will see a terminal window that you can use to run Python scripts.

    💡 You can use the VS Code app for a more user-friendly environment to write your scripts. To open a terminal directly from VS Code, go to Terminal > New Terminal in the top menu bar.

  7. Activate the Terminal history. Still on the terminal, run each one of the lines below:

    echo 'HISTTIMEFORMAT="%d/%m/%y %T "' >> ~/.bashrc 
    echo 'HISTTIMEFORMAT="%F %T "' >> ~/.bashrc
    source ~/.bashrc

    Then type the following command to confirm that the history is working:

    history

    You should see a list of commands you’ve run so far. Read more about this in the AskUbuntu forum.

    📋 Why? If you need any technical support from us, it will help to look at the history of commands you typed in.

  8. Check that Python is accessible from the terminal. Use your knowledge from last week’s practice to check the Python version installed on the system.
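
Once the interpreter starts, the same check can also be made from inside Python. This is a minimal sketch using only the standard library; the values it prints depend on the Nuvolos environment, so no expected output is shown here.

```python
import sys

# Full path of the Python executable that is running this code
print(sys.executable)

# Version of that interpreter, split into its major/minor/micro numbers
version = sys.version_info
print(f"Python {version.major}.{version.minor}.{version.micro}")
```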

✅ Check your understanding

After completing this task, you should be able to explain the following to a colleague:

💭 Think about it: a good indicator that we have grasped a new concept is when we can clearly explain it to others.

🗣️ Talk about it! If you’re having difficulty articulating answers to the questions above, you can benefit from chatting with others (peers or instructors) to revisit your learning. What if you posted a message in the #help channel on Slack, such as, “I’m still trying to wrap my head around the notion of the HOME directory. Does anyone have a useful trick that helped them truly learn this?”?

Task 2: Create files, directories, and read data from a CSV file

🎯 ACTION POINTS:

  1. Upon entering the ‘Space’ for this course on Nuvolos, you will find an empty folder named ‘W02 Practice’. Please use the terminal (either the standalone Terminal app or the one embedded in VS Code) to create the files and folders needed to replicate the directory structure below.

    W02-Practice/
    ├── README.md
    ├── data/
    └── tasks/

    💡 That is, under that ‘W02-Practice’ folder, you should have a plain text README.md file and two subfolders named data and tasks.

  2. Create a file at ./W02-Practice/data/students.csv with the following column headers. Add the data for three students, including yourself, in the following format:

    name,age,city,is_lse_student
    Alice,20,London,true
    Junjie,19,Beijing,false
    Jose,22,Barcelona,false

    where is_lse_student is set to false if the student is a General Course or Exchange student.

    ⭐️ Pro-tip: Do you want to be a Terminal wizard? Try editing the file with vim, on the terminal, instead of using VS Code.

    Change the data to include your own name, age, and city of origin, as well as those of two other people you know from the class. Ensure that the .csv file is formatted correctly!

    By now, you should see the following directory structure:

    W02-Practice/
    ├── README.md
    ├── data/
    │   └── students.csv
    └── tasks/
  3. In the tasks sub-folder (create one just like you did with the data folder), create a Python script named read_data.py and add the following code to read each line of data from the students.csv file into a list:

# TODO: document why is this line of code necessary?
lines = []

print("Reading the ./data/students.csv file...")

# TODO: document what this 'with' and 'for' loop are doing
with open('./data/students.csv', 'r') as file:
    for line in file:
        lines.append(line.strip())

"""
TODO: document what is the point of these two print() below?
TODO: describe what the whole f"..." thing represents
TODO: explain where the -'s in the output come from
TODO: describe what the \n represents
"""
print(f"(Number of lines: {len(lines)})\n")
print(f"Content:\n{'-'*50}\n")    

# TODO: document what this loop does
for line in lines:
    print(line)

print(f"{'-'*50}")
  4. Still on Nuvolos, go to the terminal and cd to W02-Practice. Run the Python script using the following command:

    python ./tasks/read_data.py

    You should see something like this, albeit with your own data:

    Reading the ./data/students.csv file...
    (Number of lines: 4)
    
    Content:
    --------------------------------------------------
    
    name,age,city,is_lse_student
    Alice,20,London,true
    Junjie,19,Beijing,false
    Jose,22,Barcelona,false
    --------------------------------------------------

    💭 If you were inside the W02-Practice/tasks folder instead of W02-Practice, what would you need to change in the Python script and in the way you run it?

  5. Understanding what this code does. Open the Python script in VS Code and solve each one of the TODOs. That is, replace the “TODO:” and everything that follows with a comment that explains what the code is doing.

    💡 Feel free to use an AI chatbot such as ChatGPT, Microsoft Copilot, or Google Gemini to assist you with this task. However, be sure not to blindly copy and paste the chatbot’s output without checking your understanding, as this won’t help you learn.

    Need help? If you don’t quite understand how to follow the instructions above, send a message to our Slack channel #help saying something like “I am not sure I know what it means to ‘add a comment’ to Python code. Can anybody help me?”
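
The 💭 question in the previous step hinges on one idea: a relative path such as ./data/students.csv is resolved against the directory you are currently in, not against the location of the script. The sketch below illustrates this on a throwaway copy of the folder layout. The temporary directory and the variable names are assumptions made for illustration only, and the sketch does not touch your real files.

```python
from pathlib import Path
import os
import tempfile

# Build a throwaway copy of the folder layout so the sketch can run anywhere
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "W02-Practice"
    (project / "data").mkdir(parents=True)
    (project / "tasks").mkdir()
    (project / "data" / "students.csv").write_text("name,age,city,is_lse_student\n")

    original_cwd = Path.cwd()
    relative_path = Path("./data/students.csv")

    # From W02-Practice, the relative path points at the real file
    os.chdir(project)
    found_from_project = relative_path.exists()

    # From W02-Practice/tasks, the same relative path points somewhere else
    os.chdir(project / "tasks")
    found_from_tasks = relative_path.exists()

    # Go back before the temporary folder is deleted
    os.chdir(original_cwd)

print(found_from_project)  # True
print(found_from_tasks)    # False
```

The same string gives two different answers depending on where you stand, which is why both the path inside the script and the command used to run it have to be considered together.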

✅ Check your Understanding

After completing this task, you should be able to explain the following to a colleague:

💭 Think about it: a good indicator that we have grasped a new concept is when we can clearly explain it to others.

🗣️ Talk about it! If you’re having difficulty articulating answers to the questions above, you can benefit from chatting with others (peers or instructors) to revisit your learning. What if you posted a message in the #help channel on Slack, such as, “I’m still trying to wrap my head around the notion of the HOME directory. Does anyone have a useful trick that helped them truly learn this?”?

Task 3: Manipulating data from the Python shell

When you are writing code, you are actively thinking about the problem you are trying to solve. It helps to break the problem down into smaller parts and test each part individually. Python scripts are great for writing code when you already know precisely what you want to do. However, to build the intuition before writing the final version of the script, it helps to use the Python shell to test small pieces of code.

💡 Tip: Try to make it a habit to use the Python shell to test small pieces of code or to explore the behavior of variables.
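
As an illustration of this habit, here is the kind of small experiment one might try, using a hard-coded sample string (an assumption, standing in for a line read from the CSV file) to see what strip() does in the script from Task 2.

```python
# A line read from a text file usually ends with a newline character
raw_line = "Alice,20,London,true\n"

# repr() makes the hidden \n visible
print(repr(raw_line))

# strip() returns a new string without leading/trailing whitespace
cleaned = raw_line.strip()
print(repr(cleaned))

# The cleaned string is exactly one character shorter (the newline)
print(len(raw_line), len(cleaned))
```

In the Python shell you would type these lines one at a time and inspect each result before moving on.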

Let’s practice some of these skills.

🎯 ACTION POINTS:

  1. Open a Python shell. Run the Python shell by typing python in the terminal. Then, copy-paste the content of the Python script up to and including the with block (so that the file is actually read) and explore the lines variable.

    Figure 5. Exploring the lines variable on the Python shell

    Note that the lines variable is a list of strings. How do I know? If I type type(lines) on the Python shell, I will see that it is of type list. If I select just the first element of the list, I can see that it is a string with the type(lines[0]) command.

    Pro-Tip: Your shell doesn’t look as colourful as the one above? That’s because I’m using the IPython shell, a more user-friendly version of the Python shell. To install it, run python -m pip install ipython on the terminal, then restart the terminal and type ipython.

  2. Deeply explore the lines variable. Now that you have the data in a list, investigate all the things you can do with it.

    📋 NOTE: There are better and more efficient ways to perform the operations listed here – we will learn about the pandas package next week, for example.
    But for now, it’s crucial to focus on getting comfortable with the fundamental data structures in basic Python.

    Here’s one example of what you could do:

    1. Convert each string line into a list of values. Because each element of the lines list is a string, we can use many of the String Methods available in Python, such as str.split() which splits a string into a list of strings based on a delimiter we provide. So, to convert every element separated by a comma into a list, we can use the following code:

      # Type the following and press Enter to see the result
      lines[0].split(',')
    2. Store the first line of data as a dictionary. I know that the first line of the file, lines[0], just holds the column names. I can use this information to create a dictionary with the keys name, age, city, and is_lse_student, and take the values from the first line of actual data, lines[1], after splitting it in the same way.

      # Type the following and press Enter to see the result
      line = lines[1].split(',')
      student = {
          'name': line[0],
          'age': line[1],
          'city': line[2],
          'is_lse_student': line[3]
      }
      student
    3. We can do better. Instead of hard-coding the keys of this dictionary, we can just repurpose the content of lines[0] to create the dictionary. Let’s do it without using the zip function at all:

      # Type line by line in the Python shell and press Enter after each line
      
      print("Transforming the first line into a dictionary...")
      
      # TODO: Explain what the columns variable looks like
      columns = lines[0].split(',')
      
      # I will process just the first line of data
      current_line = lines[1].split(',')
      
      # TODO: Why do I need to create an empty dictionary?
      student = {}
      
      # TODO: What does this loop do?
      # TODO: What does `i` represent?
      for i in range(len(columns)):
      
          column_name  = columns[i]
          column_value = current_line[i]
      
          # TODO: What is the point of this line?
          student[column_name] = column_value

      Now, if you type student in the Python shell, you should see a dictionary with the first student’s data.

      💭 Think about it: Do you see any benefits of storing data as a dictionary as opposed to just keeping it as pure text (string)?

      💭 Think about it: How would you expand on the code above to create a list of dictionaries to contain all the data available?

  3. Add your experimental code to the read_data.py script. Copy the code above, plus whatever other changes you’ve made to it, to the end of the read_data.py script.

  4. Fix those TODOs. Go through the script and replace the TODOs with comments that explain what the code is doing. Feel free to add even more comments if you feel it would help you understand the code better.

  5. Run the script again. Run the script using the same command as before:

    python ./tasks/read_data.py

    You should see the same output as before, but now you have a list of dictionaries with the data from the students.csv file.
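
The loop in this task deliberately avoids zip. For comparison, here is a minimal sketch of the zip-based alternative on a single row, using hard-coded sample data (an assumption, so that the snippet runs on its own). It also shows that every value read from a CSV file starts out as a string, so any conversion to a number or a boolean is a separate, explicit step.

```python
# Sample data hard-coded so the sketch runs without the CSV file
lines = [
    "name,age,city,is_lse_student",
    "Alice,20,London,true",
]

columns = lines[0].split(",")
current_line = lines[1].split(",")

# zip() pairs each column name with the value in the same position,
# and dict() turns those pairs into key/value entries
student = dict(zip(columns, current_line))
print(student)

# Every value is still a string at this point; convert explicitly if needed
student["age"] = int(student["age"])
student["is_lse_student"] = student["is_lse_student"] == "true"
print(student)
```

Whether you prefer the explicit loop or zip is a matter of readability; both build the same dictionary.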

✅ Check your Understanding

After completing this task, you should be able to explain the following to a colleague:

💭 Think about it: a good indicator that we have grasped a new concept is when we can clearly explain it to others.

🗣️ Talk about it! If you’re having difficulty articulating answers to the questions above, you can benefit from chatting with others (peers or instructors) to revisit your learning. What if you posted a message in the #help channel on Slack, such as, “I’m still trying to wrap my head around the notion of the HOME directory. Does anyone have a useful trick that helped them truly learn this?”?

Submit your work

We’ve provided a detailed, step-by-step guide to help you complete the tasks, along with the necessary code to add to specific files. When submitting your work, the goal is to demonstrate to us, the instructors, that you have successfully completed the tasks and have provided the correct explanations for the TODOs.

However, just editing the files alone won’t make your changes visible to the instructors. Any modifications made within Nuvolos are only visible to you. To receive feedback and help us understand your progress with the exercises, it’s important to submit your work.

📋 NOTE: I will check all the submissions on Nuvolos. This will help me understand how easy/difficult the exercises and materials are for everyone. While I won’t necessarily give you individualised feedback on your submission, I will compile common misconceptions and best practices to share with the class in the 🗓️ W02 Lecture.

🎯 ACTION POINTS:

  1. Export your bash history. Before you submit your work, export your bash history to a file. This will help you demonstrate to us that you’ve been practising and experimenting with the terminal. To do this, run the following commands:

    cd "W02 Practice" # ensure you are in the right directory
    history > ./bash_history.txt

    After this command, your directory structure should look like this:

    W02 Practice/
    ├── README.md
    ├── data/
    │   └── students.csv
    ├── tasks/
    │   └── read_data.py
    └── bash_history.txt
  2. Go to the Assignments Tab. There you will find the list of assignments available. Locate the ‘W02 Formative Practice’ assignment and click on the ‘Hand-In’ button.

    Figure 6. In the assignments page, you will find the ‘Hand-In’ button
  3. Hand in the submission. You will be asked to type an identifier.

    Figure 7. You will be asked to type an identifier. Please add your LSE candidate number.
  4. Re-submit if needed. If you need to make changes to your submission, you can hand it in again. The last submission will be the one considered for grading.
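
Before handing in, it can help to confirm that the folder matches the directory tree shown in step 1. The sketch below is one possible way to do that from Python. The helper name missing_items and the demonstration on a temporary folder are assumptions made for illustration; they are not part of the assignment.

```python
from pathlib import Path
import tempfile

# Paths taken from the directory tree shown in the hand-in steps
EXPECTED = [
    "README.md",
    "data/students.csv",
    "tasks/read_data.py",
    "bash_history.txt",
]

def missing_items(root):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [item for item in EXPECTED if not (root / item).exists()]

# Demonstration on a throwaway folder that only contains a README.md
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "README.md").write_text("demo")
    demo_missing = missing_items(tmp)

print(demo_missing)
```

To check your own work, you could call missing_items(".") from inside the practice folder; an empty list would mean nothing from the expected structure is missing.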

It’s absolutely natural and common to have many questions at this stage. WE WANT TO HEAR ABOUT YOUR QUESTIONS! Do voice them on the #help channel on Slack, attend support sessions or bring them to the lecture.