💻 Week 03 - Lab Roadmap (90 min)
DS105 - Data for Data Science
Last week we explored how to navigate your computer using the bash shell. Now, it is time to go beyond your machine and get to the cloud!
As we have seen in the lectures, accessing the cloud opens a variety of possibilities for us as data scientists. This week we will explore some of them. Feel free to explore more on your own. We have provided you with some support resources.
Step 0: Acquire your credentials (2 min)
You will need a username to connect to the cloud. It is student_<LSE_ID>, where <LSE_ID> is your LSE ID. For instance, if your ID is 20220000, your username will be student_20220000. You will use it to connect to the virtual machine.
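The username rule above can be written as a tiny function. This is only a sketch: the helper name username_for is hypothetical, and the digits-only check is an assumption based on the example ID rather than a documented rule.

```python
def username_for(lse_id: str) -> str:
    """Build the cloud username from an LSE ID (hypothetical helper)."""
    lse_id = lse_id.strip()
    # Assumption: LSE IDs consist of digits only, as in the example 20220000.
    if not lse_id.isdigit():
        raise ValueError(f"LSE ID should contain digits only, got {lse_id!r}")
    return f"student_{lse_id}"


print(username_for("20220000"))  # prints student_20220000
```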
Step 1: Connecting to the cloud (5-15 min)
First, we will connect to the cloud using an SSH connection. SSH works somewhat differently on UNIX-like systems than on Windows, so choose the operating system you are using and follow the matching steps.
Windows users
There are various SSH clients that you can use for Windows. Here we explore the one called PuTTY. You will need to follow the steps below to establish an SSH connection.
🎯 ACTION POINTS
- Go to this website and download the installation file.
- Install PuTTY using the installation file.
- Launch PuTTY.
- Navigate to the Host Name (or IP address) box and enter the host address of our machine, which is ec2-18-170-39-14.eu-west-2.compute.amazonaws.com
- Click Open.
If you see a security alert pop up, simply click Accept.
- Enter your username and your password (which is at this point your username).
The first time you log in, you need to give a new password for your account. Type your current password and then make a new password as instructed. Note that it may fail to update the password because the password is too simple. Try to make a more complex pattern in this case. It is recommended to have at least 8 characters with a combination of letters and numbers.
Once the password is updated, the connection will be closed and you need to reopen a connection using PuTTY. If your connection has been successful, you will see a message starting with “Welcome to Ubuntu”. Ask your teacher to check if it is done correctly.
Voilà! Now you are connected to the cloud!
If you wish to close your connection to the cloud, simply type exit and hit Enter.
macOS, Linux or WSL (Windows Subsystem for Linux) users
Luckily for macOS and Linux users, connecting over SSH is much simpler, as you will see in the steps below.
🎯 ACTION POINTS
- Open the terminal.
- Type the following command after replacing <username> with your actual username.
ssh <username>@ec2-18-170-39-14.eu-west-2.compute.amazonaws.com
- Enter your password (which is at this point your username).
The first time you log in, you need to give a new password for your account. Type your current password and then make a new password as instructed. Note that it may fail to update the password because the password is too simple. Try to make a more complex pattern in this case. It is recommended to have at least 8 characters with a combination of letters and numbers.
Voilà! Now you are connected to the cloud!
If you wish to close your connection to the cloud, simply type exit and hit Enter.
Step 2: Exploring the machine (7 min)
Now that you have connected to the cloud machine, you can explore it! You are now inside a computer that runs on a cloud server, which means that we can run the same commands as we ran locally last week.
If you follow the instructions below you will see that the cloud machine is very similar to your local one.
🎯 ACTION POINTS
- Use the pwd command and the $HOME variable (for example, echo $HOME) to check whether you are in your home directory.
- Does your cloud machine have any files stored? Maybe any hidden files? If you forgot how to answer those questions, feel free to use the materials from last week.
- Can you access the root directory of the machine? If so, what are the folders inside the root directory?
- Come back to your home directory to continue the exercises.
If you experience any difficulties, ask your teacher for help.
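The same checks can also be made from Python, which may help once you start running scripts on the machine. The sketch below uses only the standard library; the function names in_home_directory and list_entries are ours, chosen for illustration.

```python
from pathlib import Path


def in_home_directory() -> bool:
    """Compare the working directory (what pwd prints) with the home directory."""
    return Path.cwd().resolve() == Path.home().resolve()


def list_entries(directory: Path, include_hidden: bool = False) -> list[str]:
    """Return sorted entry names; dot-files are included only on request (similar to ls -A)."""
    names = sorted(entry.name for entry in directory.iterdir())
    if include_hidden:
        return names
    return [name for name in names if not name.startswith(".")]


home = Path.home()
print("Working directory:", Path.cwd())
print("Home directory:   ", home)
print("In home directory?", in_home_directory())
if home.is_dir():
    print("Home contents, hidden files included:", list_entries(home, include_hidden=True))
```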
Step 3: File management in the cloud (7 min)
Now, you can see that the cloud machine is very similar to your local one. It means that we can also create files and directories there!
Let’s try doing it together!
🎯 ACTION POINTS
- Make sure you are in your home directory using cd and pwd.
- Create a folder called <username>_files. Make sure to replace <username> with your actual username.
- Go to this folder and create a file called secret_name.txt.
- Save the name of your favourite place to eat in the file. It must be kept very secret. We don’t need this place to be too crowded, do we?
- Create another file called secret_address.txt.
- Save the address of the same place there.
Perfect! Now we have two (very secret) files in our system. In the next step, we will explore how to send these files to your machine.
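For comparison, here is how the same folder and files could be created from a Python script instead of the shell. It is a sketch only: create_secret_files is a hypothetical helper, the café name and address are placeholders, and the demo writes into a temporary directory so that nothing in your real home directory is touched.

```python
import tempfile
from pathlib import Path


def create_secret_files(base_dir: Path, username: str, name: str, address: str) -> Path:
    """Create <username>_files inside base_dir and write the two secret files."""
    folder = base_dir / f"{username}_files"
    folder.mkdir(exist_ok=True)  # the shell equivalent is mkdir
    (folder / "secret_name.txt").write_text(name + "\n")
    (folder / "secret_address.txt").write_text(address + "\n")
    return folder


# Demo in a throwaway directory; all values below are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    folder = create_secret_files(Path(tmp), "student_20220000", "Example Cafe", "1 Example Street")
    print(sorted(path.name for path in folder.iterdir()))
```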
Step 4: A bridge between the worlds (15 min)
In the first steps of this lab, we established a secure connection between our computer and the cloud machine. Using this channel, we can not only send commands but also send and receive files. To do that, we will use the scp command, which stands for “secure copy”. Follow the steps below to save the secret files you created above to your computer.
Generally, scp commands work in the following way:
scp location_1/file location_2/file
It means that we are copying a file from location_1 to location_2. Let’s see how it’s done with the cloud.
🎯 ACTION POINTS
- Exit the cloud machine using the following command:
exit
- To copy the secret_name.txt file from your cloud machine, use the following command. Make sure you have replaced every <username> with your actual username.
scp <username>@ec2-18-170-39-14.eu-west-2.compute.amazonaws.com:/home/<username>/<username>_files/secret_name.txt .
Important
Pay attention to two things here:
- There is a period at the end of the command. It stands for your current location on your local machine. Remember that the scp command takes two locations: the one we copy the file from and the one we copy the file to. The period here basically says “Copy it right here”.
- The path to the file on your virtual machine is specified after the hostname, separated from it by a colon. First we specify the username, then the hostname, and then the path inside the machine.
- Make sure the file is copied to the chosen directory.
- But what if we wanted to copy all the files from the folder, or the folder itself? To copy all the files, we do not need to name a specific file: simply replace the file name with an asterisk:
scp <username>@ec2-18-170-39-14.eu-west-2.compute.amazonaws.com:/home/<username>/<username>_files/* .
This way we copy all the files from the <username>_files folder.
- Should you wish to copy the folder itself, use the path to the folder and add the -r (for recursive) option, as shown below.
scp -r <username>@ec2-18-170-39-14.eu-west-2.compute.amazonaws.com:/home/<username>/<username>_files .
- Make sure both commands worked for you.
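To make the anatomy of these commands explicit, the sketch below assembles the three download variants from their parts: username, hostname, remote path, and local destination. It only builds and prints the commands; it does not connect anywhere. The helper names remote_location and scp_download are hypothetical, and student_20220000 is the example username from Step 0.

```python
HOST = "ec2-18-170-39-14.eu-west-2.compute.amazonaws.com"


def remote_location(username: str, remote_path: str) -> str:
    """Join the parts in scp's order: username, @, hostname, colon, path on the machine."""
    return f"{username}@{HOST}:{remote_path}"


def scp_download(username: str, remote_path: str, local_dest: str = ".",
                 recursive: bool = False) -> list[str]:
    """Build the argument list for copying from the cloud machine to local_dest."""
    command = ["scp"]
    if recursive:
        command.append("-r")  # needed when the source is a folder
    command += [remote_location(username, remote_path), local_dest]
    return command


user = "student_20220000"  # example username from Step 0
folder = f"/home/{user}/{user}_files"

examples = [
    scp_download(user, f"{folder}/secret_name.txt"),  # a single file
    scp_download(user, f"{folder}/*"),                # every file in the folder
    scp_download(user, folder, recursive=True),       # the folder itself
]
for command in examples:
    # A plain join is enough here because none of the parts contain spaces.
    print(" ".join(command))
```

To actually run one of these from Python, the list could be passed to subprocess.run(command, check=True); scp should then ask for your password in the terminal as usual.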
Step 5: Sending it back (10 min)
We have just learned how to copy files from the cloud. The last task for us today is to send files to the cloud machine.
🎯 ACTION POINTS
- Choose any directory you want on your local machine and cd there.
- Create a file called secret_dish.txt and save the name of your favourite dish there.
- 🤔 Stop for a second. Can you guess how to copy this file to your cloud machine using what you have already learned? We hid the solution so that you can experiment first.
Solution
Use the command below to copy your file to the cloud machine. Make sure to replace every <username> with your actual username.
scp secret_dish.txt <username>@ec2-18-170-39-14.eu-west-2.compute.amazonaws.com:/home/<username>/<username>_files
- Log in to your virtual machine and check if the file is there.
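Uploading is the same command with the two locations swapped: the local file comes first and the remote location second. The sketch below shows this by building the upload command from its parts without running it. The helper name scp_upload is hypothetical, and student_20220000 is the example username from Step 0.

```python
HOST = "ec2-18-170-39-14.eu-west-2.compute.amazonaws.com"


def scp_upload(username: str, local_file: str, remote_dir: str) -> list[str]:
    """Build the argument list for copying a local file to a folder on the cloud machine."""
    # Source first (local), destination second (remote): the reverse of a download.
    return ["scp", local_file, f"{username}@{HOST}:{remote_dir}"]


user = "student_20220000"  # example username from Step 0
command = scp_upload(user, "secret_dish.txt", f"/home/{user}/{user}_files")
print(" ".join(command))
```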
Step 6: It can wait (10 min)
Great job so far! You have managed to connect to the cloud machine, navigate it and even exchange files with it. Now it’s time to get to the most exciting part: running code on the cloud! This is what you would usually use the cloud for. Imagine you need to process millions of rows of data and your computer would take ages to do that. The cloud can help here by executing the code for you without using the resources of your computer.
Let’s do it, but before that, we will learn a very interesting trick.
🎯 ACTION POINTS
Open a new bash shell window on your local machine.
Navigate to a directory of your choice or create one.
Create a new file called waiting.py and include the following code inside:
import time
time.sleep(10)
print('The waiting is complete.')
This script waits for 10 seconds and then prints “The waiting is complete.”. You might think that this is pointless. However, let us show you something…
Run this Python script in the following way:
python waiting.py
What did it do? Hopefully, exactly what was expected. It waited for 10 seconds and then printed one sentence. You might have noticed that you could not execute commands while it was running. But what if we could?
Try running the following code:
python waiting.py &
Do you see the difference? Can you now run whoami or ls while we wait for the code to run?
Well done! Now you have learned how to create and run Python scripts in your terminal and also do things in parallel. Shall we try it in the cloud?
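The same idea exists inside Python itself. In the sketch below, subprocess.Popen starts a child process and returns immediately, which is roughly what appending & does in the shell, while communicate() waits for the child to finish. The child is a shortened stand-in for waiting.py (2 seconds instead of 10) so that the demo finishes quickly; that shortening is our choice, not part of the lab.

```python
import subprocess
import sys
import time

# Shortened stand-in for waiting.py: sleeps 2 seconds instead of 10.
CHILD_CODE = "import time; time.sleep(2); print('The waiting is complete.')"

start = time.monotonic()
# Popen does not wait for the child to finish, much like adding & in the shell.
child = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdout=subprocess.PIPE,
    text=True,
)
launch_time = time.monotonic() - start

print(f"Child launched after {launch_time:.2f}s; the parent is free to keep working.")
print("Is the child still running?", child.poll() is None)

# communicate() blocks until the child exits and returns what it printed.
output, _ = child.communicate()
total_time = time.monotonic() - start
print(output.strip())
print(f"Total elapsed: {total_time:.1f}s")
```

By contrast, subprocess.run would block until the child exits, which corresponds to running python waiting.py without the &.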
Step 7: Getting closer to software engineering (15 min)
Let’s now explore the same operations in the cloud.
🎯 ACTION POINTS
- Connect to the virtual machine.
- Create a folder called test_code.
- Create a file called test.py or test.R, depending on which language you want to use (Python or R, respectively).
- In the file:
  - create a variable called age
  - assign your age to it
  - make the machine wait for 5 seconds (time.sleep(5) in Python or Sys.sleep(5) in R)
  - use the print() function to print your age
- Use the following command to execute your script on the cloud:
python test.py
or
Rscript test.R
- Does it print your age?
- Go ahead and experiment with the & operator. It really comes in handy if you want your cloud machine to continue running without you constantly monitoring it.
You can check out the tutorial on how to run Python scripts or R scripts to help you.
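If you would like something to compare your own attempt against, here is one possible shape for test.py (Python only). It is a sketch rather than the expected answer: the logic is wrapped in a function, report_age, which is our own naming choice, and the age value is a placeholder to replace.

```python
import time


def report_age(age: int, wait_seconds: float = 5) -> str:
    """Wait for wait_seconds, then print and return a message containing the age."""
    time.sleep(wait_seconds)
    message = f"My age is {age}"
    print(message)
    return message


if __name__ == "__main__":
    age = 21  # placeholder value; replace with your own age
    report_age(age)
```

Saved as test.py, it can be run with python test.py, or with python test.py & to keep the terminal free while it waits.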