🔖 Week 01 - Appendix

If you want to go deeper into the topics covered in this week’s lecture and the lab, here are some resources you can check out.

📖 Some simple definitions of common computer terms

A handy glossary of terms you might encounter when learning/studying computer programming.

Source: (Brandies and Hogg 2021)

| Term | Definition |
|---|---|
| Algorithm | The set of rules or calculations performed by a computer program. Specific algorithms may be better suited to particular datasets and can differ in performance (e.g., in speed or accuracy). |
| Central processing unit (CPU) | The chip that performs the actual computation on a compute node or VM. |
| Compute node | An individual computer with some CPUs and associated RAM. |
| Core | Part of a CPU. Single-core processors contain one core per CPU, which is why the terms CPU and core are often used interchangeably. |
| CPU time | The time CPUs have spent actually processing data (often \(\text{CPU time} \approx \text{Walltime} \times \text{Number of CPUs}\)). |
| Dependency | Software that another tool or pipeline requires in order to run successfully. |
| Executable | The file that contains a tool/program. Some software has a single executable, while other software has multiple executables for different commands/steps. |
| High performance computer (HPC) | A collection of connected compute nodes. |
| Operating system (OS) | The base software that supports a computer’s essential functions. Some of the most common Linux-based operating systems belong to the Debian family (e.g., Ubuntu) and the Red Hat family (e.g., Fedora and CentOS). |
| Pipeline | A workflow consisting of a series of steps (commands) and tools that process a given set of inputs to create the desired output files. |
| Programming languages | Specific syntax and rules for instructing a computer to perform specific tasks. Some common programming languages include Bash, Python, Perl, R, C, and C++. |
| Random access memory (RAM) | Temporarily stores the information the CPUs require while they work (it can be accessed by all of the CPUs on the associated node or VM). |
| Scheduler | Manages jobs (scripts) running on shared HPC environments. Some common schedulers include SLURM, PBS, Torque, and SGE. |
| Script | A file containing code, written in a single programming language, to be executed. |
| Thread | A single sequence of computations within a program. How many threads a program can run concurrently depends on the number of cores (usually 1 core = 1 thread). |
| Tool | A software program that analyses an input dataset to extract meaningful outputs/information. Tool, software, and program are often used interchangeably; all refer to the core components of bioinformatics pipelines. |
| VM | Virtual machine. Similar to a compute node, it behaves as a single computer and contains a desired number of CPUs and associated RAM (usually associated with cloud computing). |
| Walltime | The time a program takes to run as measured by a clock on the wall, i.e., the elapsed real-world time. |

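
To make the difference between walltime and CPU time concrete, here is a minimal Python sketch (our own illustration, not taken from Brandies and Hogg 2021). It relies only on the standard library: `time.perf_counter()` measures elapsed wall-clock time, `time.process_time()` measures the CPU time used by the current process, and `os.cpu_count()` reports the number of logical CPUs (which can exceed the number of physical cores on machines with hyper-threading). The helpers `busy_work` and `measure` are hypothetical names chosen for this example. As a quick arithmetic check of the approximation in the table: a job that keeps 8 CPUs busy for 2 hours of walltime accrues roughly 2 × 8 = 16 CPU-hours.

```python
import os
import time


def busy_work(n: int) -> int:
    """Hypothetical CPU-bound task: keeps one core busy summing squares."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure(task, *args):
    """Hypothetical helper: return (walltime, cpu_time) in seconds for task(*args)."""
    wall_start = time.perf_counter()   # clock-on-the-wall time
    cpu_start = time.process_time()    # user + system CPU time of this process
    task(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    # May print None if the number of CPUs cannot be determined.
    print(f"Logical CPUs on this machine: {os.cpu_count()}")

    # CPU-bound: one CPU is busy the whole time, so CPU time is close to walltime.
    wall, cpu = measure(busy_work, 5_000_000)
    print(f"busy_work: walltime={wall:.2f}s, CPU time={cpu:.2f}s")

    # Waiting, not computing: walltime passes but CPU time stays near zero.
    wall, cpu = measure(time.sleep, 1)
    print(f"sleep:     walltime={wall:.2f}s, CPU time={cpu:.2f}s")
```

The exact numbers depend on your machine, but the pattern should hold: for the single-threaded loop, CPU time is roughly walltime × 1 CPU, while for `sleep` the process spends its walltime waiting and uses almost no CPU time.
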
🐧 Linux & the Terminal

📃 Text editors

vim

Emacs

🪟 Do you want to master Windows PowerShell instead?

If you use Windows and could not get Linux installed, or you would rather learn the shell that Windows provides, you may like the book/tutorials below:

References

Brandies, Parice A., and Carolyn J. Hogg. 2021. “Ten Simple Rules for Getting Started with Command-Line Bioinformatics.” PLOS Computational Biology 17 (2): e1008645. https://doi.org/10.1371/journal.pcbi.1008645.