🔖 Week 02 - Appendix

DS105 - Data for Data Science

Published: 07 October 2022

📖 Some simple definitions of common computer terms

A handy glossary of terms that you might encounter when learning/studying computer programming.

Source: (Brandies and Hogg 2021)

| Term | Definition |
| --- | --- |
| Algorithm | The set of rules or calculations that a computer program performs. Certain algorithms may be more suitable for particular datasets and may differ in performance (e.g., in speed or accuracy). |
| Central processing unit (CPU) | The chip that performs the actual computation on a compute node or VM. |
| Compute node | An individual computer that contains a number of CPUs and associated RAM. |
| Core | Part of a CPU. Single-core processors contain 1 core per CPU, so "CPU" and "core" are often used interchangeably. |
| CPU time | The time CPUs have spent actually processing data (often \(\text{CPU time} \approx \text{Walltime} \times \text{Number of CPUs}\)). |
| Dependency | Software that is required by another tool or pipeline for successful execution. |
| Executable | The file that contains a tool/program. Some software has a single executable, while other software has multiple executables for different commands/steps. |
| High performance computer (HPC) | A collection of connected compute nodes. |
| Operating system (OS) | The base software that supports a computer’s basic functions. Some of the most common Linux-based operating systems include those of the Debian family (e.g., Ubuntu) and those of the Red Hat family (e.g., Fedora and CentOS). |
| Pipeline | A workflow consisting of a variety of steps (commands) and/or tools that process a given set of inputs to create the desired output files. |
| Programming languages | Specific syntax and rules for instructing a computer to perform specific tasks. Common programming languages used in bioinformatics include Bash, Python, Perl, R, C, and C++. |
| Random access memory (RAM) | Temporarily stores all the information the CPUs require (can be accessed by all of the CPUs on the associated node or VM). |
| Scheduler | Manages jobs (scripts) running on shared HPC environments. Some common schedulers include SLURM, PBS, Torque, and SGE. |
| Script | A file that contains code to be executed, written in a single programming language. |
| Thread | The number of computations that a program can perform concurrently; this depends on the number of cores (usually 1 core = 1 thread). |
| Tool | A software program that performs an analysis on an input dataset to extract meaningful outputs/information. "Tool", "software", and "program" are often used interchangeably; all refer to the core components of bioinformatics pipelines. |
| VM | Virtual machine. Similar to a compute node in that it behaves as a single computer and contains a desired number of CPUs and associated RAM (usually associated with cloud computing). |
| Walltime | The time a program takes to run, measured in real ("clock-on-the-wall") time. |

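
To make the difference between walltime and CPU time more concrete, here is a minimal Python sketch. It is an illustration only, not part of the original glossary: the helper names `measure`, `busy`, and `idle` are made up for this example, and the exact numbers printed will vary from machine to machine. It uses `time.perf_counter()` for walltime and `time.process_time()` for the CPU time consumed by the current process.

```python
import os
import time


def measure(task):
    """Run task() and return (walltime, cpu_time) in seconds."""
    wall_start = time.perf_counter()  # real elapsed ("clock-on-the-wall") time
    cpu_start = time.process_time()   # CPU time used by this process
    task()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def busy():
    """CPU-bound work: the CPU is computing for the whole duration."""
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def idle():
    """Waiting rather than computing: walltime passes, CPU time barely moves."""
    time.sleep(0.5)


if __name__ == "__main__":
    # Number of logical CPUs the operating system reports (may be None).
    print("Logical CPUs reported by the OS:", os.cpu_count())
    for name, task in [("busy", busy), ("idle", idle)]:
        wall, cpu = measure(task)
        print(f"{name}: walltime = {wall:.3f} s, CPU time = {cpu:.3f} s")
```

For the single-process `busy` task, CPU time should come out close to walltime (that is, roughly walltime × 1 CPU), whereas for `idle` the walltime is about half a second while the CPU time stays near zero. This is why the relation in the table is only an approximation: it holds when all the CPUs are kept busy for the whole run.
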

🐧 Linux & the Terminal

📃 Text editors

vim

Emacs

🪟 Do you want to master Windows PowerShell instead?

If you use Windows and could not get Linux installed, or you would rather learn the shell that Windows provides, you will like the book/tutorials below:

References

Brandies, Parice A., and Carolyn J. Hogg. 2021. “Ten Simple Rules for Getting Started with Command-Line Bioinformatics.” PLOS Computational Biology 17 (2): e1008645. https://doi.org/10.1371/journal.pcbi.1008645.