🔖 Week 01 - Appendix
If you want to go deeper into the topics covered in this week’s lecture and the lab, here are some resources you can check out.
📖 Some simple definitions of common computer terms
A handy glossary of terms you might encounter when learning/studying computer programming.
Source: (Brandies and Hogg 2021)
Term | Definition |
---|---|
Algorithm | The set of rules or calculations that are performed by a computer program. Specific algorithms may be more suitable for particular datasets and may have differences in performance (e.g., in speed or accuracy). |
Central processing unit (CPU) | The chip that performs the actual computation on a compute node or VM. |
Compute node | An individual computer with some CPUs and associated RAM. |
Core | Part of a CPU. Single-core processors contain one core per CPU, meaning CPUs and cores are often interchangeable terms. |
CPU time | The time CPUs have spent actually processing data (often \(\text{CPU time} \approx \text{Walltime} \times \text{Number of CPUs}\)). |
Dependency | Software that another tool or pipeline requires for successful execution. |
Executable | The file that contains a tool/program. Some software has a single executable, while others have multiple executables for different commands/steps. |
High performance computer (HPC) | A collection of connected compute nodes. |
Operating system (OS) | The base software that supports a computer’s essential functions. Some of the most common Linux-based operating systems include those of the Debian distribution (Ubuntu) and RedHat distribution (Fedora and CentOS). |
Pipeline | A pipeline is a workflow consisting of a variety of steps (commands) and tools that process a given set of inputs to create the desired output files. |
Programming languages | Specific syntax and rules for instructing a computer to perform specific tasks. Common programming languages include Bash, Python, Perl, R, C, and C++. |
Random access memory (RAM) | Temporarily stores all the information the CPUs require (can be accessed by all of the CPUs on the associated node or VM). |
Scheduler | Manages jobs (scripts) running on shared HPC environments. Some common schedulers include SLURM, PBS, Torque, and SGE. |
Script | A file that contains code to be executed in a single programming language. |
Thread | Number of computations that a program can perform concurrently—depends on the number of cores (usually 1 core = 1 thread). |
Tool | A software program that analyses an input dataset to extract meaningful outputs/information—Tool, software, and program are often used interchangeably but refer to the core components of bioinformatics pipelines. |
VM | Virtual machine—Similar to a compute node, it behaves as a single computer and contains a desired number of CPUs and associated RAM (usually associated with cloud computing). |
Walltime | The time a program takes to run in our clock-on-the-wall time. |
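
To make the difference between walltime and CPU time from the glossary above more concrete, here is a minimal Python sketch. The function names (`measure`, `busy_loop`, `idle_wait`) are made up for illustration and are not part of the lecture material. It assumes only the standard library: `time.perf_counter()` is used as the clock-on-the-wall timer and `time.process_time()` as the CPU timer for the current process.

```python
import time


def measure(task):
    """Run task() and return (walltime, cpu_time) in seconds."""
    wall_start = time.perf_counter()   # clock-on-the-wall timer
    cpu_start = time.process_time()    # CPU time used by this process
    task()
    cpu_time = time.process_time() - cpu_start
    walltime = time.perf_counter() - wall_start
    return walltime, cpu_time


def busy_loop():
    """Keeps the CPU working: CPU time should be close to walltime."""
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def idle_wait():
    """Mostly waits: walltime passes, but almost no CPU time is used."""
    time.sleep(0.2)


if __name__ == "__main__":
    for name, task in [("busy_loop", busy_loop), ("idle_wait", idle_wait)]:
        wall, cpu = measure(task)
        print(f"{name}: walltime={wall:.3f}s, CPU time={cpu:.3f}s")
```

On a single core, CPU time is at most roughly the walltime. In a program that keeps several CPUs busy at once, the CPU time adds up across them, which is where the approximation CPU time ≈ Walltime × Number of CPUs comes from.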
🐧 Linux & the Terminal
📃 Text editors
vim
Emacs
🪟 Do you want to master Windows PowerShell instead?
If you use Windows and could not get Linux installed, or if you would rather learn the shell that Windows provides, you will like the book/tutorials below:
References
Brandies, Parice A., and Carolyn J. Hogg. 2021. “Ten Simple Rules for Getting Started with Command-Line Bioinformatics.” PLOS Computational Biology 17 (2): e1008645. https://doi.org/10.1371/journal.pcbi.1008645.