Caviness
The Caviness cluster was built in late 2018 according to a rolling upgradeable design strategy. As with previous community clusters, additional nodes will be added over time. Starting in the sixth year of its life, however, nodes exiting their warranty period will be replaced and the infrastructure surrounding them reused.
This larger share of reusable infrastructure is afforded by the cluster's Open Compute Project (OCP) design. Penguin Computing's Tundra design uses centralized DC power conversion in the racks to keep node sleds as small and efficient as possible: old sleds are simply removed and new ones inserted.
Many Engineering research groups own capacity in Caviness.
Resources
The design of Caviness is similar to previous community clusters.
Networking
There are two private Ethernet networks in the cluster. A dedicated 1 Gbps network carries management traffic (remote power control of nodes, console access, etc.). A dedicated 10 Gbps network carries all data traffic (NFS, job scheduling, SSH access) to the nodes.
A 100 Gbps Intel Omni-Path network also connects all nodes. The OPA network carries Lustre filesystem traffic as well as most MPI inter-node communications.
Storage
Each rack of compute equipment added to Caviness is designed to add storage capacity to the cluster:
- Lustre Object Storage Targets (OSTs) and Servers (OSSs)
- NFS server
The addition of OSTs/OSSs increases the aggregate capacity and bandwidth of the `/lustre/scratch` filesystem. Individual NFS servers provide distinct capacity and bandwidth but do not aggregate with existing capacity or bandwidth; in short, they are simply "more space."
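As a minimal sketch, the current size and usage of the Lustre scratch filesystem can be checked with standard tools (the mount point shown is the `/lustre/scratch` path named above; availability of the Lustre client utilities is an assumption):

```bash
# Show total size, used, and available space on the Lustre scratch filesystem
df -h /lustre/scratch

# If the Lustre client utilities are installed, this reports the same
# information broken down per OST
lfs df -h /lustre/scratch
```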
Home directories
Each user is granted a home directory with a 20 GiB limit (quota). Typically users will build software in their home directory. The relatively low quota often means that users cannot (and should not) submit computational jobs from their home directories. Home directories are mounted at the path `/home/<uid_number>`, where `<uid_number>` is the user's Unix UID number (an integer value; use the `id` command to determine it).
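For example, the following commands show how a user might determine their UID number and confirm the home directory path (the UID mentioned in the comment is hypothetical):

```bash
# Print your numeric UID; this integer appears in your home directory path
id -u

# Print the full path of your home directory
# (e.g. /home/1201 for a hypothetical UID of 1201)
echo "$HOME"
```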
The Bash shell allows you to reference your home directory as `~/` in most commands. For example, `ls -al ~/` displays a long listing of all the hidden and normally-visible files and directories inside your home directory.
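A brief sketch of the tilde shorthand in practice (both commands list the same directory):

```bash
# List everything in the home directory, including hidden "dot" files
ls -al ~/

# Equivalent form using the HOME environment variable
ls -al "$HOME"
```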
The home directory is also the location of a user's login files:
| File | Description |
|---|---|
| `.bashrc` | Commands executed by any new Bash shell spawned for the user |
| `.bash_profile` | Commands executed specifically by a new login shell for the user |
| `.bash_history` | Saved sequence of commands the user has interactively entered at the shell prompt |
| `.bash_udit` | Configuration file controlling UD-specific behaviors of the Bash shell |
| `.valet/` | Directory containing a user's personal VALET package definitions; does not exist by default and should be created by the user if wanted (see the sketch after this table) |
| `.zfs/snapshot/` | Directory containing historical snapshots of the home directory |
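If you want personal VALET package definitions, a minimal sketch of setting up the directory might look like this (the definition file name is hypothetical; see the VALET documentation for the definition file format):

```bash
# Create the personal VALET definition directory (it does not exist by default)
mkdir -p ~/.valet

# Package definition files are then placed inside it,
# e.g. a hypothetical ~/.valet/mypackage definition
```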
The use of ZFS snapshots as backup copies of a home directory is discussed elsewhere.
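As a brief illustration of the `.zfs/snapshot/` directory listed in the table above (snapshot names vary from system to system; the one shown here is hypothetical):

```bash
# List the read-only snapshots of your home directory
ls ~/.zfs/snapshot/

# Copy a file back out of a snapshot (the snapshot name is hypothetical)
cp ~/.zfs/snapshot/2024-01-15/important_file.txt ~/
```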
Workgroup directories
Each workgroup that purchases capacity in the cluster receives a workgroup directory with a quota in proportion to its level of investment in the cluster: the more compute capacity purchased, the more space granted. Workgroup directories are mounted at the path `/work/<workgroup-id>` on all nodes in the cluster.
Once you've started a shell in a workgroup using the `workgroup -g <workgroup-id>` command, the `WORKDIR` environment variable contains the path to the workgroup directory. This allows you to reference it in commands like `ls -l ${WORKDIR}/users`.
Adding the `-c` flag to the `workgroup` command automatically starts the workgroup shell in `$WORKDIR`.
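Putting these pieces together, a short sketch using a hypothetical workgroup ID of `it_css`:

```bash
# Start a shell in the workgroup (the workgroup ID here is hypothetical)
workgroup -g it_css

# The commands below run inside the new workgroup shell;
# WORKDIR now points at the workgroup directory, e.g. /work/it_css
echo "$WORKDIR"
ls -l ${WORKDIR}/users

# Alternatively, add -c so the workgroup shell starts with $WORKDIR
# as its working directory
workgroup -c -g it_css
```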