Week 02 - Appendix
DS105 - Data for Data Science
Some simple definitions of common computer terms
A handy glossary of terms that you might encounter when learning/studying computer programming.
Source: (Brandies and Hogg 2021)
Term | Definition |
---|---|
Algorithm | The set of rules or calculations that are performed by a computer program. Certain algorithms may be more suitable for particular datasets and may differ in performance (e.g., in speed or accuracy). |
Central processing unit (CPU) | The chip that performs the actual computation on a compute node or VM. |
Compute node | An individual computer that contains a number of CPUs and associated RAM. |
Core | Part of a CPU. Single-core processors contain 1 core per CPU, which is why the terms CPU and core are often used interchangeably. |
CPU time | The time CPUs have spent actually processing data (often \(\text{CPU time} \approx \text{walltime} \times \text{number of CPUs}\)). |
Dependency | Software that is required by another tool or pipeline for successful execution. |
Executable | The file that contains a tool/program. Some software has a single executable, while other software has multiple executables for different commands/steps. |
High performance computer (HPC) | A collection of connected compute nodes. |
Operating system (OS) | The base software that supports a computer's basic functions. Some of the most common Linux-based operating systems include those of the Debian family (e.g., Ubuntu) and those of the Red Hat family (e.g., Fedora and CentOS). |
Pipeline | A workflow consisting of a series of steps (commands) and/or tools that process a given set of inputs to create the desired output files. |
Programming language | A specific syntax and set of rules for instructing a computer to perform specific tasks. Common programming languages used in bioinformatics include Bash, Python, Perl, R, C, and C++. |
Random access memory (RAM) | Temporarily stores all the information the CPUs require; it can be accessed by all of the CPUs on the associated node or VM. |
Scheduler | Manages jobs (scripts) running on shared HPC environments. Some common schedulers include SLURM, PBS, Torque, and SGE. |
Script | A file containing code, written in a single programming language, to be executed. |
Thread | The number of computations that a program can perform concurrently; this depends on the number of cores (usually 1 core = 1 thread). |
Tool | A software program that performs an analysis on an input dataset to extract meaningful outputs/information. "Tool", "software", and "program" are often used interchangeably; all refer to the core components of bioinformatics pipelines. |
Virtual machine (VM) | Similar to a compute node, as it behaves as a single computer and contains a chosen number of CPUs and associated RAM (usually associated with cloud computing). |
Walltime | The elapsed time a program takes to run, as measured by a clock on the wall (i.e., real-world time). |
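To make the Algorithm entry concrete, here is a minimal sketch comparing two standard search algorithms on the same sorted list: a linear search that checks every element in turn, and a binary search built on Python's standard-library `bisect` module. Both return the same answer, but binary search needs far fewer steps; it only works because the data are sorted, which illustrates how an algorithm can be more suitable for a particular dataset. The function names `linear_search` and `binary_search` are illustrative choices, and the printed timings will vary from machine to machine.

```python
import bisect
import time


def linear_search(values, target):
    """Check every element in turn: work grows in proportion to len(values)."""
    for index, value in enumerate(values):
        if value == target:
            return index
    return -1  # not found


def binary_search(sorted_values, target):
    """Repeatedly halve the search range; only correct if the input is sorted."""
    index = bisect.bisect_left(sorted_values, target)
    if index < len(sorted_values) and sorted_values[index] == target:
        return index
    return -1  # not found


if __name__ == "__main__":
    data = list(range(0, 2_000_000, 2))  # one million sorted even numbers
    target = data[-1]                    # last element: worst case for linear search
    for search in (linear_search, binary_search):
        start = time.perf_counter()
        result = search(data, target)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{search.__name__}: index {result}, {elapsed_ms:.3f} ms")
```

If the list were not sorted, `binary_search` could silently return the wrong answer while `linear_search` would still work, so the "faster" algorithm is only better when its assumptions about the data hold.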
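The CPU time and Walltime entries can be compared directly in Python. This is a minimal sketch, assuming only the standard library: `time.perf_counter()` measures elapsed wall-clock time, and `time.process_time()` measures the CPU time used by the current process. The helpers `measure`, `busy_work` and `waiting_work` are hypothetical names chosen for illustration. A CPU-bound task keeps one core busy, so its CPU time should be close to its walltime (the glossary formula with one CPU); a task that mostly waits accumulates walltime but very little CPU time.

```python
import os
import time


def measure(task):
    """Run `task` once and return (walltime, cpu_time) in seconds."""
    wall_start = time.perf_counter()  # wall-clock timer
    cpu_start = time.process_time()   # CPU time (user + system) of this process
    task()
    walltime = time.perf_counter() - wall_start
    cpu_time = time.process_time() - cpu_start
    return walltime, cpu_time


def busy_work(n=2_000_000):
    """CPU-bound: keeps one core busy for the whole run."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def waiting_work(seconds=0.5):
    """Mostly idle: the program just waits, so the CPU is barely used."""
    time.sleep(seconds)


if __name__ == "__main__":
    # Number of logical CPUs the operating system reports (may be None).
    print(f"CPUs reported by the OS: {os.cpu_count()}")
    for name, task in [("busy", busy_work), ("waiting", waiting_work)]:
        wall, cpu = measure(task)
        print(f"{name:>8}: walltime = {wall:.2f} s, CPU time = {cpu:.2f} s")
```

This script runs on a single core, so its CPU time cannot meaningfully exceed its walltime. A program that spreads work across several cores (for example, with separate processes on an HPC node) can accumulate more total CPU time than walltime, which is where the "number of CPUs" factor in the formula comes from.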
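The Pipeline entry can be sketched in a few lines of Python: each step is a function, and the output of one step becomes the input of the next. This is a toy word-count workflow under stated assumptions: the step functions (`read_text`, `tokenise`, `count_words`, `write_counts`), the runner `run_pipeline`, and the file names are all hypothetical, and real bioinformatics pipelines usually chain separate command-line tools rather than Python functions.

```python
import tempfile
from collections import Counter
from pathlib import Path


def read_text(path):
    """Step 1: load the raw input file as a single string."""
    return Path(path).read_text(encoding="utf-8")


def tokenise(text):
    """Step 2: lower-case the text and split it into words on whitespace."""
    return text.lower().split()


def count_words(words):
    """Step 3: count how often each word appears."""
    return Counter(words)


def write_counts(counts, output_path):
    """Final step: save the counts as 'word,count' lines, most frequent first."""
    lines = [f"{word},{count}" for word, count in counts.most_common()]
    Path(output_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def run_pipeline(data, steps):
    """Pass the data through each step in order; each output feeds the next step."""
    for step in steps:
        data = step(data)
    return data


if __name__ == "__main__":
    # Work in a temporary folder so the example leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        input_path = Path(tmp) / "input.txt"
        input_path.write_text("The cat sat on the mat\n", encoding="utf-8")

        counts = run_pipeline(input_path, [read_text, tokenise, count_words])

        output_path = Path(tmp) / "word_counts.csv"
        write_counts(counts, output_path)
        print(output_path.read_text(encoding="utf-8"))
```

Keeping each step small and separate makes it easy to swap one step (for example, a different tokeniser) without touching the rest of the workflow.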
Linux & the Terminal
Text editors
vim
Emacs
Do you want to master Windows PowerShell instead?
If you use Windows and could not get Linux installed, or you would rather learn the shell that Windows provides, you will like the book/tutorials below:
References
Brandies, Parice A., and Carolyn J. Hogg. 2021. "Ten Simple Rules for Getting Started with Command-Line Bioinformatics." PLOS Computational Biology 17 (2): e1008645. https://doi.org/10.1371/journal.pcbi.1008645.