The larger amount of reusable infrastructure is afforded by the Open Compute Project (OCP) design of the cluster. Penguin Computing's //Tundra// design uses centralized DC power conversion in the racks to make node sleds as small and efficient as possible. Remove the old sleds and insert the new.
Many Engineering research groups own capacity in Caviness. UD IT maintains [[http://docs.hpc.udel.edu/abstract/caviness/caviness|its own Caviness documentation]]; on this site you will find summaries specific to:

  * [[caviness:login|logging in]] to the cluster
  * [[caviness:resources#compute|compute resources]] present in the cluster
  * [[caviness:resources#storage|storage resources]] available to you, when to use them, and how, including:
    * your personal [[caviness:resources#home_directories|home directory]]
    * [[caviness:resources#workgroup_directories|workgroup storage]] available to members of a workgroup
    * high-speed shared [[caviness:resources#lustre_scratch|Lustre scratch]] storage
  * [[caviness:valet|using VALET]] to manage your environment

===== Resources =====

The design of Caviness is similar to that of previous community clusters.

==== Networking ====

There are two private Ethernet networks in the cluster. A dedicated 1 Gbps network carries management traffic (remote power control of nodes, console access, etc.). A dedicated 10 Gbps network carries all data traffic (NFS, job scheduling, SSH access) to the nodes.

A 100 Gbps Intel Omni-Path network also connects all nodes. The OPA network carries Lustre filesystem traffic as well as most MPI internode communications.
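
If you are curious which interface on a node belongs to which of these networks, a quick read-only check can be made with the ''ip'' utility. This is only a sketch; the interface names and addresses you see will vary from node to node.

<code bash>
# List the network interfaces on the current node in brief form.
# The 1 Gbps management, 10 Gbps data, and Omni-Path (OPA) interfaces
# all appear here; names and addresses vary by node.
$ ip -br addr
</code>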

==== Storage ====

Each rack of compute equipment added to Caviness is designed to add storage capacity to the cluster:

  * Lustre Object Storage Targets (OSTs) and Servers (OSSs)
  * NFS server

The addition of OSTs/OSSs increases the aggregate capacity and bandwidth of the ''/lustre/scratch'' filesystem. Individual NFS servers provide distinct capacity and bandwidth but do not aggregate with existing capacity or bandwidth; in short, they're just "more space."

A discussion of each distinct kind of storage available to users is found below. General usage scenarios for each include (a quick way to see where each is mounted is shown after this list):

  * Home directory
    * User's personal software builds and installs
    * VALET package definitions for those software installs
  * Workgroup directory
    * Software builds/installs used by one or more members of a workgroup
    * VALET package definitions for those software installs
    * Members' data sets, job working directories, job summary directories
    * Project data sets, job working directories, job summary directories
  * Lustre scratch
    * Temporary storage of large data sets
    * Temporary storage of job working directories
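
As noted above, a quick way to see where each kind of storage is mounted (and how full it is) from any node is a standard ''df'' query against the documented paths; ''<workgroup-id>'' below is a placeholder for your own workgroup's identifier.

<code bash>
# Report size and usage for the home, workgroup, and Lustre scratch
# storage areas.  Substitute your own workgroup identifier.
$ df -h "$HOME" /work/<workgroup-id> /lustre/scratch
</code>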

=== Home directories ===

Each user is granted a home directory with a 20 GiB limit (quota). Typically users will build software in their home directory. The relatively low quota often means that users cannot (and should not) submit computational jobs from their home directories. Home directories are mounted at the path ''/home/<uid_number>'', where ''<uid_number>'' is a user's Unix UID number (an integer value; use the ''id'' command to determine it).

<WRAP center round info 60%>
The Bash shell allows you to reference your home directory as ''~/'' in most commands. For example, ''ls -al ~/'' displays a long listing of all the hidden and normally-visible files and directories inside your home directory.
</WRAP>
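
For example, the following minimal sketch shows how to look up your UID number and inspect the filesystem backing your home directory; the output is, of course, different for every user.

<code bash>
# Print your numeric UID; your home directory is /home/<uid_number>.
$ id -u

# Show where your home directory lives and how full the filesystem
# backing it is.
$ echo "$HOME"
$ df -h "$HOME"
</code>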

The home directory is the location of a few important files and directories:

^File^Description^
|''.bashrc''|Commands executed by any new Bash shell spawned for the user|
|''.bash_profile''|Commands executed specifically by a new login shell for the user|
|''.bash_history''|Saved sequence of commands the user has interactively entered at the shell prompt|
|''.bash_udit''|Configuration file controlling UD-specific behaviors of the Bash shell|
|''.valet/''|Directory containing a user's personal VALET package definitions; does not exist by default, should be created by the user if wanted (automatically added to ''VALET_PATH'' by UD Bash login scripts)|
|''.zfs/snapshot/''|Directory containing historical //snapshots// of the home directory|

The use of [[abstract:zfs-snapshots|ZFS snapshots]] as backup copies of a home directory is discussed elsewhere. In general, editing ''.bashrc'' and ''.bash_profile'' is discouraged, especially to alter the ''PATH'' and ''LD_LIBRARY_PATH'' environment variables.
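
Rather than editing those files, personal software is better described with VALET package definitions stored under ''~/.valet''. A minimal sketch of setting that up (the directory does not exist by default, as noted in the table above) is:

<code bash>
# Create the personal VALET package-definition directory.
$ mkdir -p ~/.valet

# UD Bash login scripts add it to VALET_PATH automatically at your
# next login; verify with:
$ echo "$VALET_PATH"
</code>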

=== Workgroup directories ===

Each //workgroup// that purchases capacity in the cluster receives a //workgroup directory// with a quota in proportion to its level of investment in the cluster: the more compute capacity purchased, the more space granted. Workgroup directories are mounted at the path ''/work/<workgroup-id>'' on all nodes in the cluster.

<WRAP center round info 60%>
Once you've started a shell in a workgroup using the ''workgroup -g <workgroup-id>'' command, the ''WORKDIR'' environment variable contains the path to the workgroup directory. This allows you to reference it in commands like ''ls -l ${WORKDIR}/users''.

Adding the ''-c'' flag to the ''workgroup'' command automatically starts the workgroup shell in ''$WORKDIR''.
</WRAP>
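
Putting those pieces together, a workgroup session might begin as in the sketch below; ''my_workgroup'' is a hypothetical workgroup identifier, so substitute your own.

<code bash>
# Start a workgroup shell and change into the workgroup directory.
$ workgroup -g my_workgroup -c

# The WORKDIR variable is now set; confirm it and inspect the
# per-user area.
$ echo "$WORKDIR"
$ ls -l "${WORKDIR}/users"
</code>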

The typical layout of a workgroup directory often includes:

^Subdirectory^Description^
|''/work/<workgroup-id>/.zfs/snapshot/''|Always present, contains historical //snapshots// of the workgroup directory|
|''/work/<workgroup-id>/sw/''|A directory to hold software used by multiple members of the workgroup|
|''/work/<workgroup-id>/sw/valet/''|Directory for VALET package definitions (automatically added to ''VALET_PATH'' by UD Bash login scripts)|
|''/work/<workgroup-id>/users/''|Directory to contain per-user storage rather than having it exist directly under ''/work/<workgroup-id>''|
|''/work/<workgroup-id>/projects/''|Directory to contain per-project storage rather than having it exist directly under ''/work/<workgroup-id>''|

None of these directories are mandatory, but they do tend to make management of a workgroup's resources easier. In particular, because ''/work/<workgroup-id>/sw/valet'' is automatically added to the VALET search path, workgroup users do not need to alter ''VALET_PATH'' manually in their ''.bashrc''/''.bash_profile'' or in their job scripts.
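
Creating that layout is straightforward; the sketch below assumes a workgroup shell has already been started (so ''WORKDIR'' is set) and uses hypothetical usernames.

<code bash>
# Create the conventional (optional) workgroup subdirectories.
$ mkdir -p "${WORKDIR}/sw/valet" "${WORKDIR}/users" "${WORKDIR}/projects"

# Give each member a personal area under users/ (usernames are
# hypothetical).
$ mkdir -p "${WORKDIR}/users/jdoe" "${WORKDIR}/users/asmith"
</code>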

The use of [[abstract:zfs-snapshots|ZFS snapshots]] as backup copies of a workgroup directory is discussed elsewhere.

=== Lustre scratch ===

The ''/lustre/scratch'' file system is a high-speed parallel file system accessible from all nodes in the cluster. Users/groups are free to create their own top-level directories under ''/lustre/scratch'' (a sketch follows the list below) and are responsible for:

  * setting appropriate permissions to restrict access accordingly
  * removing files/directories that are no longer needed to keep capacity available over time
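
A sketch of both responsibilities is shown below; the directory name and the 90-day retention period are hypothetical choices, not cluster policy, so adjust them to suit your workgroup.

<code bash>
# Create a top-level scratch directory (name is hypothetical) and
# restrict it to your workgroup's Unix group.
$ mkdir /lustre/scratch/my_workgroup
$ chgrp my_workgroup /lustre/scratch/my_workgroup
$ chmod 2770 /lustre/scratch/my_workgroup

# Periodically list, review, and then remove files untouched for
# 90 days.
$ find /lustre/scratch/my_workgroup -type f -mtime +90 -print
$ find /lustre/scratch/my_workgroup -type f -mtime +90 -delete
</code>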

The total capacity can be checked using the ''lfs df'' command:

<code bash>
$ lfs df
UUID 1K-blocks Used Available Use% Mounted on
scratch-MDT0000_UUID 2989410560 977815040 2011593472 33% /lustre/scratch[MDT:0]
scratch-OST0000_UUID 51093819392 12543377408 38549933056 25% /lustre/scratch[OST:0]
scratch-OST0001_UUID 51093630976 11691081728 39402001408 23% /lustre/scratch[OST:1]
scratch-OST0002_UUID 51093489664 12198661120 38894347264 24% /lustre/scratch[OST:2]
scratch-OST0003_UUID 51093485568 12000202752 39092807680 23% /lustre/scratch[OST:3]

filesystem_summary: 204374425600 48433323008 155939089408 24% /lustre/scratch
</code>

Note that this command displays both aggregate capacity and the capacity of each OST and MDT (MetaData Target) component of the file system. Users can determine their current occupied Lustre scratch capacity:

<code bash>
$ lfs quota -u $(id -u) /lustre/scratch
Disk quotas for usr 1001 (uid 1001):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch 313298 0 0 - 78 0 0 -
</code>

Likewise, capacity associated explicitly with a workgroup can be checked:

<code bash>
$ lfs quota -g $(id -g) /lustre/scratch
Disk quotas for grp 1001 (gid 1001):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch 267784 0 0 - 16 0 0 -
</code>

<WRAP center round important 60%>
UD IT staff reserve the right to perform emergency removal of data from ''/lustre/scratch'' if occupied capacity reaches unsafe levels. Periodic automated cleanup policies may become necessary if such levels persist.
</WRAP>