✅ Solution & Analysis of the W02 Formative Exercise

2024/25 Autumn Term

Published: 21 October 2024

    Image created with the AI embedded in MS Designer using the prompt 'abstract salmon pink light blue icon depicting the metaphysical experience of cleaning up, reshaping, pivoting, and manipulating data in search of the purest insights in data science.'

This page contains comments and solutions to the tasks in the 📝 W02 Formative Exercise.

📊 Statistics

Here are some big numbers from the data you submitted:

A total of 106 students were enrolled at the time of the deadline
Of those, 48 made a submission on Nuvolos before the deadline
Of those, 38 had a valid data/bash_history.csv
Of those, 32 have also created the code/ folder correctly
Of those, 24 had a non-empty commands_analysis.py file
A total of 15 had a full script (more than just reading the JSON)
And only 2 attempted the 🏆 Challenge (it was a challenge after all)

When did people start working on this exercise?

Click here to see how much time most people spent on this exercise

The plot below is very rich in details and meant to be seen on a large screen (not mobile-friendly). Hover your mouse to discover more about the data points!

I can teach you how to create plots like this. Remind me during or after the 🧑‍🏫 W05 Lecture (link not yet available).

🗨️ Commentary & Analysis

In the sub-headings below, you will find examples of the quickest way to complete the tasks in the formative exercise. I will also provide some analysis of the data you submitted.

🗨️ Don’t be scared to make mistakes

Mistakes are expected when you use bash commands for the first time on a Linux machine, and it would be normal if you needed more commands than shown here to complete the tasks.

In fact, I secretly hoped that many of you would accidentally create files and folders in the wrong places, leading to the need to use other commands featured in the 📝 W01 Formative and Terminal Cheatsheet pages, like rm or mv, to correct those errors.

This process would have helped you understand why those mistakes happened and encouraged you to think more deeply about how to fix them, ultimately making it easier to grasp the concepts in the long run.

Task 1

How would I approach this task?

When I access the Terminal app on Nuvolos for the first time, I see this:

Figure 1. The welcome message in the Terminal app.

From the instructions, I learned that my goal is to create the following directory structure:

W02-Practice/
├── data/
└── code/

I would then follow the steps below in the Terminal app:

# Does the W02-Practice folder already exist?
ls

# Seeing that it does exist, I would navigate into it
cd W02-Practice

# Is the data folder already there?
ls

# Here, I'd see that the W02-Practice folder is empty

# I'd then create the data folder
mkdir data

# I also need to create a code folder
mkdir code

# I'd check to see if the folders were created
ls

# All good. Let me test the history command
history

# What about the print-history command?
print-history

Which would have printed out something like this:

1, YYYY-MM-DD-HH:mm:SS, ls
2, YYYY-MM-DD-HH:mm:SS, cd W02-Practice
3, YYYY-MM-DD-HH:mm:SS, ls
4, YYYY-MM-DD-HH:mm:SS, mkdir data
5, YYYY-MM-DD-HH:mm:SS, mkdir code
6, YYYY-MM-DD-HH:mm:SS, ls
7, YYYY-MM-DD-HH:mm:SS, history

The last ls command I typed gave me the confidence that I did the right thing - I saw that my current working directory was set to /files/W02-Practice and that I had the data and code folders in there 1. From here on, I just had to follow the steps to submit my work, i.e. create the bash_history.csv file and press the hand-in button.

By the way, the reason I kept asking you to submit your work after completing every task is to make it easier for you to internalise the Git Rituals that would later be introduced in 📝 W04 Formative Exercise.

Analysis of your data

The above shows the quickest route to completing Task 1, with 7 (5 if you don’t use ls) commands to the first point of submission. However, I expected most of you to explore the Terminal on Nuvolos a bit first. After all, it was your first time accessing a remote machine, with a file system unlike the one on your local machine.

I will use your data as a pretext to show you some curiosity-driven data analysis, something I mentioned in 🧑‍🏫 Week 01 Lecture.

Q: How many commands did most of you type before creating the CSV file?

A:

How I created this boxplot

The code below uses concepts you will only learn in Weeks 04 & 05 of DS105A

I first created a df DataFrame with the combined data from all the bash_history.csv files. Then, I used a function to count the number of commands each student typed before creating the CSV file for the first time. I then plotted the distribution of the number of commands using the lets-plot library.

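import pandas as pd
from lets_plot import *

def what_came_before(rows):
    # Keep every command up to and including the first one
    # that mentions bash_history.csv
    commands = rows['command']
    # regex=False so the dot in the file name is matched literally
    first_print_history = commands.str.contains('bash_history.csv', regex=False).idxmax()
    cmds = commands.loc[:first_print_history]
    return cmds.tolist()

# One row per student, with the number of commands in that list
plot_df = (
    df.groupby(['username'])
    .apply(what_came_before, include_groups=False)
    .apply(lambda x: pd.Series({'count': len(x)}))
)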


g = (
    ggplot(plot_df, aes(y='count')) +
    geom_boxplot(fill='#5d9ebc') +
    coord_flip() +
    scale_y_continuous(name="Number of commands", 
                       breaks=[3, 8, 11.5, 15, 25, 32],) +
    theme(axis_text_x=element_blank(),
          axis_title_x=element_blank(),
          axis_text_y=element_text(size=15),
          panel_grid_major=element_blank(),
          panel_grid_minor=element_blank(),
          axis_title_y=element_blank(),
          plot_title=element_text(size=20)) +
    ggsize(width=700, height=100) +
    labs(title=("Most people typed between 8 and 15"
                " commands before completing Task 1"))
)

ggsave(g, filename='w02-formative-task1.svg', path='.')
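
To make the counting step easier to follow, here is a minimal sketch on made-up data. The usernames and commands are invented, and count_until_first_csv is a hypothetical helper written for this illustration; it is a simplified stand-in for the counting idea described above, not the exact code used for the plot.

```python
import pandas as pd

# Made-up command histories for two hypothetical students
df = pd.DataFrame({
    "username": ["alice", "alice", "alice", "bob", "bob", "bob", "bob"],
    "command": [
        "ls",
        "print-history > ./data/bash_history.csv",
        "ls data",
        "pwd",
        "mkdir data",
        "print-history > data/bash_history.csv",
        "history",
    ],
})

def count_until_first_csv(commands):
    """Count commands up to and including the first one that mentions the CSV."""
    mentions_csv = commands.str.contains("bash_history.csv", regex=False)
    if not mentions_csv.any():
        # Assumption: if the CSV was never created, count every command
        return len(commands)
    first_position = mentions_csv.to_numpy().argmax()
    return int(first_position) + 1

# One count per student
counts = df.groupby("username")["command"].apply(count_until_first_csv)
print(counts.to_dict())
```

Note that the command that creates the CSV is itself included in the count, which is why the shortest history in the boxplot has 3 commands rather than 2.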

Q: Did the person who typed only 3 commands get it right?

A:

Here’s the command history of that person up until they created the bash_history.csv file:

pyton # a typo
python # corrected the typo
print-history > ./data/bash_history.csv

The remaining commands show that they cd’ed into the W02-Practice/data folder and created the bash_history.csv file, seemingly without errors. The folders were created correctly, so it is likely that this person was already knowledgeable about Python and created the folder inside the Python shell using os.mkdir() or os.makedirs().
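
For the curious, creating the folders from Python might look something like the sketch below. This is only a guess at the general approach, not what that person actually typed. It works inside a temporary directory (an assumption made so the example is safe to run anywhere) instead of /files.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing real is touched
base = tempfile.mkdtemp()
project = os.path.join(base, "W02-Practice")

# makedirs also creates missing parent folders;
# exist_ok=True avoids an error if the folder is already there
os.makedirs(os.path.join(project, "data"), exist_ok=True)
os.makedirs(os.path.join(project, "code"), exist_ok=True)

created = sorted(os.listdir(project))
print(created)
```

Unlike os.mkdir(), which fails if the parent folder is missing, os.makedirs() creates W02-Practice and the sub-folder in one call.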

Q: What about the people who typed more than 15 commands?

A:

Invariably, they were all trying to figure out which commands to type or to understand the instructions. This is normal, expected and good! The more you explore, the more you learn.

There are many cases of people creating folders in the wrong place, but then they managed to use rm -r to delete them and start again. This is also great to see!

Don’t let the 🤖 machines take over your thoughts

There were several cases of people typing mkdir [code] and mkdir [data] (with the square brackets) instead of mkdir code or mkdir data. This is likely down to people using AI to figure out the commands. The AI output probably used these square brackets to mark a placeholder showing where the folder name should go, something like ‘Use mkdir [folder_name] to create a new folder on Linux’. You were meant to replace everything (including the square brackets) with the actual folder name.

If this is the case, it is an example of how AI can sometimes lead you astray, making you ignore the course material instructions and leaving you confused and not in control of your learning. Thankfully, most of you realised the mistake and corrected it.
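
If you are wondering how such cases could be spotted in a list of commands, one possible approach is a regular expression that looks for an argument wrapped in literal square brackets. The commands below are made up for illustration, and the pattern is just one way to do it.

```python
import re

# Made-up commands; the bracketed ones mirror the pattern described above
commands = ["mkdir [code]", "mkdir data", "mkdir [data]", "ls", "cd W02-Practice"]

# A space followed by an argument wrapped in literal square brackets, e.g. [code]
placeholder = re.compile(r"\s\[[^\]\s]+\]")

flagged = [c for c in commands if placeholder.search(c)]
print(flagged)
```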

Q: How many people used rm to delete the files and folders they created in the wrong place?

A:

There were 4 such cases. Curiously, one thing they all had in common was that they ended the W02 Exercise with a data/bash_history.csv file, but none of them had a code/ folder in the right place. Either they ran out of time

Task 2

There is no need for a solution for Task 2. I didn’t get any questions about it, as it was simply a matter of typing the code from the instructions.

Analysis

Q: How many people explored with the wc command?

A:

23 people followed the Task 2 instructions!

Q: How many were brave enough to follow through with the optional step in Task 2?

I found that 10 people played with the cut & uniq commands!! We have a select group of distinct explorers!
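
Counts like these can be obtained by extracting the program name from each command and counting distinct users. The sketch below uses invented data and assumes a DataFrame with username and command columns; the program column is a hypothetical name introduced here.

```python
import pandas as pd

# Made-up data for three hypothetical students
df = pd.DataFrame({
    "username": ["alice", "alice", "bob", "bob", "carol", "carol"],
    "command": [
        "wc -l data/bash_history.csv",
        "cut -d',' -f3 data/bash_history.csv | sort | uniq",
        "wc data/bash_history.csv",
        "ls",
        "rm -r dta",
        "mkdir data",
    ],
})

# The first word of each command line is the program being run
df["program"] = df["command"].str.split().str[0]

# How many distinct students ran each program at least once?
users_per_program = df.groupby("program")["username"].nunique()
print(users_per_program.to_dict())
```

One limitation of this simple approach: it only sees the first program on each line, so a uniq that appears after a pipe (|) is not counted. Searching the full command text would be needed to catch those.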

Task 3

Once again, we don’t need a solution for this task. All the code is given in the instructions.

However, it is worth pointing out that most people who made it to Task 3 got stuck on two things:

1. What is the difference between the bash shell and the python shell?

I have also addressed similar questions in the 🧑‍🏫 Week 02 Lecture, and you can read a summary of my lecture notes here.

When you access the Terminal app on Nuvolos, you are running the bash shell. The Terminal app is the graphical interface that lets us talk to the computer, while bash is the underlying program that awaits and responds to your commands. The default shell might be different on your local machine (for example, PowerShell or zsh).

When you type python in the Terminal app, you are no longer talking to bash. You cannot access bash commands or other apps from the Python shell. You can only run Python commands. You can tell you are in the Python shell because the prompt changes from $ to >>>. To leave the Python shell, type exit() and hit Enter, or press Ctrl + D.

Similarly, when you are in the bash shell and type nano, you are taken to the nano text editor program. This app, which runs directly from the Terminal window, lets you edit text files. You’d have to follow the instructions to learn how to exit nano and then go back to the bash shell.

2. “I simply copy-pasted the code but Python is throwing some errors!”

Nearly all of the remaining errors seen inside the Python shell had to do with something called indentation.

Several Python statements mark the start of a block of code. For instance, if, for, while, def, class, and with all let you write one or more lines of code "inside" them. To signal what is inside the block, you must indent those lines, that is, add spaces or tabs at the beginning of each line.

In the best-case scenario, if you forget to indent, Python will throw an IndentationError. In the worst-case scenario, Python will interpret the code as being outside of the block and you might get unexpected results.
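
Here is a small sketch of both scenarios. The variable names are made up for illustration, and compile() is used only so the broken snippet can be checked without stopping the rest of the example.

```python
# Case 1: a missing indent after `for` is rejected before anything runs
broken = "for i in range(3):\nprint(i)\n"
try:
    compile(broken, "<example>", "exec")
    error_name = None
except IndentationError as err:
    error_name = type(err).__name__

# Case 2: the line is indented, so it belongs to the loop
total_inside = 0
for i in range(3):
    total_inside += 1   # runs three times

# Case 3: valid code, but the last line sits outside the loop
total_outside = 0
for i in range(3):
    pass
total_outside += 1      # runs only once, after the loop ends

print(error_name, total_inside, total_outside)
```

Case 3 is the tricky one: Python raises no error, yet the result differs from what you probably intended.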

To illustrate the problem, I copy here a response I gave someone on Slack.


When you type this:

with open('./data/bash_history.json', 'w') as file:

You will notice that the symbol on the Python shell changes from >>> to ... . This is because the Python shell understands that you started a section of code, and you are still not done. You might want to type multiple other lines before you close this section.

Now, when you type in the next line, the leading spaces (typically two or four spaces) tell Python that this line of code is inside the with statement:

    json.dump(full_data, file)

When you press Enter again, you will see that the Python shell symbol is still ... because Python still 'thinks' you have more commands to type. If you press Enter once more on an empty line, then (and only then), the Python shell will execute all the code you typed above in one go.
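
Put together as a script, the same idea might look like the sketch below. The contents of full_data are made up, and the file is written to a temporary folder rather than ./data/ (both are assumptions made so the example can run anywhere).

```python
import json
import os
import tempfile

# Stand-in for the full_data variable from the exercise (contents are made up)
full_data = [{"index": 1, "command": "ls"}, {"index": 2, "command": "mkdir data"}]

# Write to a throwaway folder instead of ./data/ so the example is safe to run
path = os.path.join(tempfile.mkdtemp(), "bash_history.json")

with open(path, "w") as file:
    json.dump(full_data, file)   # indented: runs while the file is open

# Back at the left margin: the with block has ended and the file is closed
file_is_closed = file.closed

# Read the file back to confirm the data was written
with open(path) as file:
    reloaded = json.load(file)

print(file_is_closed, reloaded == full_data)
```

In a script there is no ... prompt and no need for an extra Enter. The indentation alone tells Python where the with block starts and ends.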


Footnotes

  1. If this conclusion is not obvious to you, reach out to me or any other class teacher and ask for clarification. Understanding this is crucial for success in the course.