Software
A common task that every user encounters at least once while using HPC systems is installing software. Whether that means running an install script or configuring, compiling, and installing from source code, success depends on understanding the Linux environment, knowing where software components belong within the filesystem, and knowing when and how to isolate individual software titles in their own space.
The Linux environment
The filesystem
Inherited from Unix is the basic layout of the Linux filesystem: a hierarchy of containers called directories descends from the root directory, which is named simply /. Each directory can contain files of varying types as well as other directories (hence the hierarchical nature of the filesystem).
Some directory names carry special significance in Unix and Linux:
Directory name | Purpose |
---|---|
bin | Contains executables — programs (compiled or scripted) that the user can run (execute) |
etc | Contains configuration files that influence how programs execute |
include | Contains C/C++ header files associated with libraries that are present |
lib | Contains libraries — compiled subroutine/function bundles — that are used by executables |
lib64 | A variant of lib containing code that was compiled for 64-bit execution |
libexec | Contains executables the user is not meant to run directly, but will be executed by some other program or library |
share | Contains support files that an executable or library may use: help and documentation pages, data tables, etc. |
sbin | Contains executables that are meant to be run by someone with higher privileges (e.g. the root user) |
Most of the directories named above can be found in the root directory (/bin, /lib64, and /etc, for example) as well as within other parent directories. The /usr directory contains /usr/bin and /usr/lib64 (amongst others). The /usr/local directory contains /usr/local/bin and /usr/local/lib64, which are meant to hold components that are not integral to the operating system itself.
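A quick way to see this on a given system is a bash loop that checks which of the conventional names exist beneath a few parent directories. This is a sketch: the parents checked (/, /usr, /usr/local) and the helper name survey_dirs are illustrative choices, and the output will vary by distribution. On many current distributions /bin and /lib64 are symbolic links into /usr, which the -d test follows.

```shell
#!/usr/bin/env bash
# Report which conventional component directories exist beneath a few
# parent directories. survey_dirs is a hypothetical helper name.
survey_dirs() {
  local parent name candidate
  for parent in / /usr /usr/local; do
    for name in bin etc include lib lib64 libexec share sbin; do
      # "${parent%/}" drops a trailing slash, so "/" + "bin" gives "/bin"
      candidate="${parent%/}/${name}"
      # -d is true for directories and for symlinks that point to directories
      if [ -d "$candidate" ]; then
        echo "present: $candidate"
      fi
    done
  done
}

survey_dirs
```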
The GNU Autoconf and CMake build systems default to installing the components they build under the /usr/local directory, in subdirectories named according to the table above. If a different installation prefix is chosen, the same layout is applied beneath it: for example, an installation prefix of /opt/shared/program/version will see executables installed in /opt/shared/program/version/bin and libraries in /opt/shared/program/version/lib.
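To make the prefix-to-layout mapping concrete, the sketch below prints where an Autoconf-based build would place each kind of component for a given prefix. The locations mirror the defaults that ./configure --help lists for its installation-directory options (with exec_prefix defaulting to the prefix itself); show_autoconf_layout is a hypothetical helper, not part of Autoconf.

```shell
#!/usr/bin/env bash
# Print the default Autoconf installation directories beneath a prefix.
# show_autoconf_layout is a hypothetical helper name.
show_autoconf_layout() {
  local prefix="${1:-/usr/local}"   # /usr/local is Autoconf's default prefix
  printf 'bindir      %s/bin\n'     "$prefix"
  printf 'sbindir     %s/sbin\n'    "$prefix"
  printf 'libexecdir  %s/libexec\n' "$prefix"
  printf 'sysconfdir  %s/etc\n'     "$prefix"
  printf 'includedir  %s/include\n' "$prefix"
  printf 'libdir      %s/lib\n'     "$prefix"
  printf 'datarootdir %s/share\n'   "$prefix"
}

show_autoconf_layout /opt/shared/program/version
```

The prefix itself is chosen at configure time with ./configure --prefix=/opt/shared/program/version; a CMake project takes the equivalent setting as -DCMAKE_INSTALL_PREFIX=/opt/shared/program/version. CMake projects that use the GNUInstallDirs module may place libraries in lib64 rather than lib on some 64-bit distributions.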
Finding programs: the PATH variable
Whenever you want to execute a program, the shell needs to find that program in the filesystem. Providing the absolute path to the executable makes that easy:
$ /usr/bin/date
Mon Feb 25 16:03:57 EST 2019
Rather than making you type that /usr/bin prefix every time, Unix/Linux shells (your user interface to the OS) let you type just the final part of the path (just date, for example); the shell then checks the directories listed in the PATH variable for a file with that name. PATH consists of a sequence of zero or more directory names separated by colons, and the search proceeds from the leftmost directory to the rightmost. A typical PATH might be:
$ echo $PATH
/usr/local/bin:/usr/bin:/bin
When the user types the date command, the shell checks for these files, in order:
- /usr/local/bin/date
- /usr/bin/date
- /bin/date
The first of those files that exists (and is executable) is the one that gets executed. So when installing a new program, copying it into /usr/local/bin (or either of the other two directories listed) would make it available for use:
$ cp new_program /usr/local/bin
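The left-to-right search described above can be sketched as a small bash function. This is a simplification under stated assumptions: find_in_path is a hypothetical name, and the real shell also consults aliases, functions, builtins, and its cache of previously found commands before walking PATH. In bash, command -v reports what the shell itself would actually run.

```shell
#!/usr/bin/env bash
# Emulate the PATH search: walk the colon-separated directories from left
# to right and print the first executable file with the given name.
# find_in_path is a hypothetical helper name.
find_in_path() {
  local name="$1" dir
  # A trailing ':' is appended so read also sees the last entry
  while IFS= read -r -d ':' dir; do
    # POSIX treats an empty PATH entry as the current directory
    [ -z "$dir" ] && dir='.'
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done < <(printf '%s:' "$PATH")
  return 1
}

find_in_path date
```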
You may not always have permission to copy files into /usr/local/bin, though. You can, however, always edit the PATH variable in your own shell:
$ export PATH="/home/1001/programs/bin:$PATH"
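The effect of prepending a directory can be checked with a throwaway directory standing in for a personal programs/bin. In this sketch the directory comes from mktemp and hello-hpc is a hypothetical program; the point is only that a name becomes runnable once its directory is on PATH.

```shell
#!/usr/bin/env bash
# Show that prepending a directory to PATH makes its programs runnable by name.
# demo_root stands in for a personal programs directory; hello-hpc is hypothetical.
demo_root="$(mktemp -d)"
trap 'rm -rf "$demo_root"' EXIT
mkdir -p "$demo_root/bin"
cat > "$demo_root/bin/hello-hpc" <<'EOF'
#!/bin/sh
echo "hello from a personal bin directory"
EOF
chmod +x "$demo_root/bin/hello-hpc"

command -v hello-hpc || echo "hello-hpc not found yet"

# Prepend (rather than append) so this directory is searched first
export PATH="$demo_root/bin:$PATH"
command -v hello-hpc
hello-hpc
```

An export like this lasts only for the current shell and the programs it starts; to make the change permanent, the export line can be added to a shell startup file such as ~/.bashrc.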
Finding libraries: the LD_LIBRARY_PATH variable
Compiled subroutine/function libraries that are dynamically linked into a program at runtime exist as files, just like executables. Just as executables are typically found in a bin directory, these libraries are often found in a lib64 or lib directory. Similarly, the user may not always be allowed to copy new libraries into /usr/local/lib64 or /usr/lib64, where they would typically be found by default. The LD_LIBRARY_PATH variable is used by the runtime linker the same way PATH is used by the shell: the colon-separated list of directories is searched in order for the library an executable requests. So when installing into /usr/local/lib64 is not an option, adding to LD_LIBRARY_PATH is:
$ export LD_LIBRARY_PATH="/home/1001/programs/lib64:$LD_LIBRARY_PATH"
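One detail worth guarding against: if LD_LIBRARY_PATH is unset or empty, the command above leaves a trailing colon, and glibc's ld.so documentation describes a zero-length entry as meaning the current working directory (POSIX gives an empty PATH entry the same meaning). A minimal sketch of a guard, assuming bash; path_prepend is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Prepend a directory to a colon-separated search list without leaving an
# empty entry behind when the list starts out empty or unset.
# path_prepend is a hypothetical helper name.
path_prepend() {
  local dir="$1" current="$2"
  # ${current:+:$current} expands to ":$current" only when current is non-empty
  printf '%s\n' "${dir}${current:+:$current}"
}

export LD_LIBRARY_PATH="$(path_prepend "$HOME/programs/lib64" "${LD_LIBRARY_PATH:-}")"
echo "$LD_LIBRARY_PATH"
```

The same expansion also works inline, without a helper: export LD_LIBRARY_PATH="/home/1001/programs/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}".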
Organizing software
Unix/Linux uses different directories to hold the various components of software, and the PATH and LD_LIBRARY_PATH variables can alter where the shell and runtime linker look for executables and libraries. Together, these give you all the tools necessary to organize the software you maintain: any directory can have bin, lib64, include, and the other standard directories created within it, just as /usr/local does. Adopting the same filesystem layout makes the purpose of each file therein clear. Adding <prefix>/bin to the PATH makes any executables within that directory available to you.
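The steps above can be bundled into one shell function that activates any software directory laid out like /usr/local. This is a sketch, not a standard tool: use_prefix is a hypothetical name, and it assumes libraries live in <prefix>/lib64 and/or <prefix>/lib.

```shell
#!/usr/bin/env bash
# use_prefix is a hypothetical helper: it adds a software directory laid out
# like /usr/local to the current shell's search paths.
use_prefix() {
  local prefix="$1" libdir
  if [ ! -d "$prefix" ]; then
    echo "use_prefix: no such directory: $prefix" >&2
    return 1
  fi
  # Executables: <prefix>/bin goes to the front of PATH
  if [ -d "$prefix/bin" ]; then
    PATH="$prefix/bin${PATH:+:$PATH}"
  fi
  # Libraries: prepend lib, then lib64, so lib64 ends up searched first
  for libdir in lib lib64; do
    if [ -d "$prefix/$libdir" ]; then
      LD_LIBRARY_PATH="$prefix/$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    fi
  done
  export PATH LD_LIBRARY_PATH
}

# Demonstration with a throwaway prefix that mimics the /usr/local layout
demo_prefix="$(mktemp -d)"
trap 'rm -rf "$demo_prefix"' EXIT
mkdir -p "$demo_prefix"/{bin,include,lib64,share}
use_prefix "$demo_prefix"
echo "PATH=$PATH"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```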
Source code
Many Linux distributions have a /usr/local/src directory present. This directory is meant to hold source code packages that are built and installed under /usr/local. For any software directory you manage, adopt the same strategy:
- If the software directory holds components of a single source package, unpack that source code as the <prefix>/src directory
- For a software directory containing components from multiple source packages (like /usr/local), unpack each constituent source package as a directory under the <prefix>/src directory
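Both strategies can be sketched with tar's -C option, which extracts into a chosen directory. Everything here is a stand-in built on the fly: example-1.0 is a hypothetical package, and the two prefixes live in a temporary directory. The single-package case uses --strip-components=1 (supported by GNU tar and bsdtar) so that the archive's top-level directory becomes <prefix>/src itself.

```shell
#!/usr/bin/env bash
# Unpack a source archive as <prefix>/src (single package) and as
# <prefix>/src/<package> (multiple packages). All names are stand-ins.
work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT

# Build a stand-in archive whose top-level directory is example-1.0
mkdir -p "$work/example-1.0"
echo "stand-in source file" > "$work/example-1.0/README"
tar -C "$work" -cf "$work/example-1.0.tar" example-1.0

# Single package: its source tree becomes <prefix>/src itself
single="$work/single-prefix"
mkdir -p "$single/src"
tar -C "$single/src" --strip-components=1 -xf "$work/example-1.0.tar"

# Multiple packages: each source tree gets its own directory under <prefix>/src
multi="$work/multi-prefix"
mkdir -p "$multi/src"
tar -C "$multi/src" -xf "$work/example-1.0.tar"

ls "$single/src" "$multi/src"
```

GNU tar's -xf also recognizes compressed archives such as .tar.bz2 on extraction, so the same commands apply to a downloaded release tarball.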
Versioning
When maintaining software, it is often necessary to keep more than one version of it present. Each distinct version is built from a different release of the source code. A single version may also exist in several variants when, for example, different compilers are used to build it.
In these cases, installing into a common software directory like /usr/local is inadvisable, since it can hold only a single version or variant of the software, and that copy will likely be replaced as time goes by. Using a unique software directory for each version or variant (and adopting the Unix/Linux filesystem layout within it) keeps each copy isolated from the others, and any one of them is easily added to the shell by altering the PATH and LD_LIBRARY_PATH variables, for example.
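The isolation this buys can be demonstrated with two versions of a hypothetical tool installed side by side, each in its own software directory. In this sketch the temporary root stands in for something like ~/programs/<tool>, and each subshell selects one version purely through its own PATH.

```shell
#!/usr/bin/env bash
# Two versions of a hypothetical "tool", each in its own software directory.
root="$(mktemp -d)"   # stands in for something like ~/programs/<tool>
trap 'rm -rf "$root"' EXIT

for version in 1.0 2.0; do
  mkdir -p "$root/$version/bin"
  printf '#!/bin/sh\necho "tool version %s"\n' "$version" > "$root/$version/bin/tool"
  chmod +x "$root/$version/bin/tool"
done

# Each subshell gets its own PATH, so the two versions never collide
(PATH="$root/1.0/bin:$PATH"; tool)   # prints: tool version 1.0
(PATH="$root/2.0/bin:$PATH"; tool)   # prints: tool version 2.0
```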
Putting it all together
I need to manage multiple versions and variants of the Open MPI tools. To start, I create a directory that will hold all of the versions and variants I build:
$ mkdir -p ~/programs/open-mpi
$ cd ~/programs/open-mpi
I will download the official source code and store each downloaded version in a directory, so that if I ever need to build another variant of that version, its source code is still available:
$ mkdir attic
$ cd attic
$ wget 'https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.3.tar.bz2'
$ ls -l
total 1
-rw-r--r-- 1 frey everyone 9442937 Oct 29 17:51 openmpi-3.1.3.tar.bz2
To build a variant of the 3.1.3 version with standard system compilers, I'll create a directory and unpack the source code inside it:
$ cd ..
$ mkdir 3.1.3
$ cd 3.1.3
$ tar -xf ../attic/openmpi-3.1.3.tar.bz2
$ mv openmpi-3.1.3 src
$ cd src
Open MPI uses the GNU Autoconf tools, so configuring the build to install the software into this variant's software directory is very straightforward:
$ ./configure --prefix="$(realpath ~/programs/open-mpi/3.1.3)"
   :
$ make
$ make install
$ cd ..
$ pwd
/home/1001/programs/open-mpi/3.1.3
$ ls -l bin
total 474
-rwxr-xr-x 1 frey sysadmin 1740 Oct 8 11:04 aggregate_profile.pl
lrwxrwxrwx 1 frey sysadmin 12 Oct 8 11:05 mpic++ -> opal_wrapper
lrwxrwxrwx 1 frey sysadmin 12 Oct 8 11:05 mpicc -> opal_wrapper
lrwxrwxrwx 1 frey sysadmin 12 Oct 8 11:05 mpiCC -> opal_wrapper
   :
lrwxrwxrwx 1 frey sysadmin 12 Oct 8 11:05 shmemcxx -> opal_wrapper
lrwxrwxrwx 1 frey sysadmin 12 Oct 8 11:05 shmemfort -> opal_wrapper
lrwxrwxrwx 1 frey sysadmin 6 Oct 8 11:05 shmemrun -> mpirun
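With the installation in place, this version becomes usable the same way as any other software directory: prepend its bin to PATH and its library directory to LD_LIBRARY_PATH. This sketch assumes the default Autoconf libdir of <prefix>/lib and uses $HOME in place of /home/1001. mpicc --showme is an Open MPI wrapper-compiler option that prints the underlying compiler command line instead of running it.

```shell
#!/usr/bin/env bash
# Point the current shell at the Open MPI 3.1.3 build installed above.
# Assumes the prefix given to configure and the default <prefix>/lib libdir.
ompi_prefix="$HOME/programs/open-mpi/3.1.3"

export PATH="$ompi_prefix/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$ompi_prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# If the installation is present, confirm which wrapper compiler is found
# and show the compiler command line it would use.
if [ -x "$ompi_prefix/bin/mpicc" ]; then
  command -v mpicc
  mpicc --showme
else
  echo "no Open MPI installation found under $ompi_prefix" >&2
fi
```

Building a second variant (with a different compiler, say) would repeat the same steps in a separate directory beside 3.1.3, and only the ompi_prefix value above would change.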