The larger amount of reusable infrastructure is afforded by the Open Compute Project (OCP) design of the cluster.  Penguin Computing's //Tundra// design uses centralized DC power conversion in the racks to make node sleds as small and efficient as possible.  Remove the old sleds and insert the new.
  
Many Engineering research groups own capacity in Caviness.

===== Resources =====

The design of Caviness is similar to previous community clusters.

==== Networking ====

There are two private Ethernet networks in the cluster.  A dedicated 1 Gbps network carries management traffic (remote power control of nodes, console access, etc.).  A dedicated 10 Gbps network carries all data traffic (NFS, job scheduling, SSH access) to the nodes.

A 100 Gbps Intel Omni-Path network also connects all nodes.  The OPA network carries Lustre filesystem traffic as well as most MPI internode communications.
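
If you are curious which of these networks a given node can see, a quick way to list its interfaces is sketched below.  This is a minimal sketch using standard Linux tooling only; the interface names you will see are node-specific and are not taken from this page.

<code bash>
# List all network interfaces on the current node; the 1 Gbps management,
# 10 Gbps data, and Omni-Path fabrics typically appear as separate devices
# (interface names vary by node type).
ip link show
</code>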

==== Storage ====

Each rack of compute equipment added to Caviness is designed to add storage capacity to the cluster:

  * Lustre Object Storage Targets (OSTs) and Servers (OSSs)
  * NFS server

The addition of OSTs/OSSs increases the aggregate capacity and bandwidth of the ''/lustre/scratch'' filesystem.  Individual NFS servers provide distinct capacity and bandwidth but do not aggregate with existing capacity or bandwidth; in short, they're just "more space."

A discussion of each distinct kind of storage available to users is found below.  General usage scenarios for each include (a sketch of a typical job layout follows the list):

  * Home directory
    * User's personal software builds and installs
    * VALET package definitions for those software installs
  * Workgroup directory
    * Software builds/installs used by one or more members of a workgroup
    * VALET package definitions for those software installs
    * Members' data sets, job working directories, job summary directories
    * Project data sets, job working directories, job summary directories
  * Lustre scratch
    * Temporary storage of large data sets
    * Temporary storage of job working directories
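
The sketch below ties these scenarios together in a single batch job: it keeps inputs and results in a workgroup directory and uses Lustre scratch only for large temporary files.  This is an illustrative sketch, assuming the Slurm scheduler in use on Caviness; the workgroup id ''my_workgroup'' and the directory names are placeholders, not values taken from this page.

<code bash>
#!/bin/bash
#SBATCH --job-name=storage-example
#SBATCH --ntasks=1

# Hypothetical result location in workgroup storage -- substitute your own:
RESULT_DIR="/work/my_workgroup/users/${USER}/storage-example"

# Do the heavy temporary I/O in Lustre scratch, one directory per job:
SCRATCH_DIR="/lustre/scratch/${USER}/${SLURM_JOB_ID}"
mkdir -p "${RESULT_DIR}" "${SCRATCH_DIR}"
cd "${SCRATCH_DIR}"

# ... run the computation here, writing large intermediate files in place ...

# Copy the (comparatively small) results back to workgroup storage and
# remove the scratch directory so shared capacity stays available:
cp -r output "${RESULT_DIR}/" && rm -rf "${SCRATCH_DIR}"
</code>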

=== Home directories ===

Each user is granted a home directory with a 20 GiB limit (quota).  Typically users will build software in their home directory.  The relatively low quota often means that users cannot (and should not) submit computational jobs from their home directories.  Home directories are mounted at the path ''/home/<uid_number>'', where ''<uid_number>'' is a user's Unix UID number (an integer value; use the ''id'' command to determine it).

<WRAP center round info 60%>
The Bash shell allows you to reference your home directory as ''~/'' in most commands.  For example, ''ls -al ~/'' displays a long listing of all the hidden and normally-visible files and directories inside your home directory.
</WRAP>
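
A few commands for locating and sizing your home directory are sketched below.  They use only the ''id'' command mentioned above plus standard shell utilities; the example UID is a placeholder.

<code bash>
# Your numeric UID determines the mount point of your home directory
id -u                # prints an integer, e.g. 1001
echo "$HOME"         # the same location, /home/<uid_number>

# Estimate how much of the 20 GiB quota you are currently using
du -sh ~/
</code>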

The home directory is the location of a few important files and directories:

^File^Description^
|''.bashrc''|Commands executed by any new Bash shell spawned for the user|
|''.bash_profile''|Commands executed specifically by a new login shell for the user|
|''.bash_history''|Saved sequence of commands the user has interactively entered at the shell prompt|
|''.bash_udit''|Configuration file controlling UD-specific behaviors of the Bash shell|
|''.valet/''|Directory containing a user's personal VALET package definitions; does not exist by default, should be created by the user if wanted (automatically added to ''VALET_PATH'' by UD Bash login scripts)|
|''.zfs/snapshot/''|Directory containing historical //snapshots// of the home directory|

The use of [[abstract:zfs-snapshots|ZFS snapshots]] as backup copies of a home directory is discussed elsewhere.  In general, the editing of ''.bashrc'' and ''.bash_profile'' is discouraged, especially for the alteration of the ''PATH'' and ''LD_LIBRARY_PATH'' environment variables.
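
Instead of editing ''.bashrc'' or ''.bash_profile'', per-user software is normally exposed through VALET package definitions placed in ''~/.valet/''.  Since that directory does not exist by default, a minimal first step is sketched below; nothing beyond the directory name comes from this page.

<code bash>
# One-time setup: create the directory that UD login scripts add to VALET_PATH
mkdir -p ~/.valet

# Verify it exists; definition files you place here are picked up automatically
# in a fresh login shell.
ls -a ~/.valet
</code>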

=== Workgroup directories ===

Each //workgroup// that purchases capacity in the cluster receives a //workgroup directory// with a quota in proportion to its level of investment in the cluster:  the more compute capacity purchased, the more space granted.  Workgroup directories are mounted at the path ''/work/<workgroup-id>'' on all nodes in the cluster.

<WRAP center round info 60%>
Once you've started a shell in a workgroup using the ''workgroup -g <workgroup-id>'' command, the ''WORKDIR'' environment variable contains the path to the workgroup directory.  This allows you to reference it in commands like ''ls -l ${WORKDIR}/users''.

Adding the ''-c'' flag to the ''workgroup'' command automatically starts the workgroup shell in ''$WORKDIR''.
</WRAP>
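
Putting those pieces together, a typical session looks like the sketch below; the workgroup id ''my_workgroup'' is a placeholder for your own.

<code bash>
# Start a workgroup shell and change into the workgroup directory in one step
workgroup -g my_workgroup -c

# Inside that shell, WORKDIR points at /work/my_workgroup
echo "$WORKDIR"
ls -l "${WORKDIR}/users"
</code>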

The typical layout of a workgroup directory often includes:

^Subdirectory^Description^
|''/work/<workgroup-id>/.zfs/snapshot/''|Always present, contains historical //snapshots// of the workgroup directory|
|''/work/<workgroup-id>/sw/''|A directory to hold software used by multiple members of the workgroup|
|''/work/<workgroup-id>/sw/valet/''|Directory for VALET package definitions (automatically added to ''VALET_PATH'' by UD Bash login scripts)|
|''/work/<workgroup-id>/users/''|Directory to contain per-user storage, rather than having per-user directories sit directly under ''/work/<workgroup-id>''|
|''/work/<workgroup-id>/projects/''|Directory to contain per-project storage, rather than having per-project directories sit directly under ''/work/<workgroup-id>''|
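
If these directories do not already exist in your workgroup directory, they can be created from a workgroup shell (so that ''WORKDIR'' is set), as sketched below.  The layout simply mirrors the table above; ''.zfs/snapshot/'' is maintained by ZFS itself and is not created by hand.

<code bash>
# Create the conventional (optional) workgroup layout; safe to re-run,
# since mkdir -p skips directories that already exist.
mkdir -p "${WORKDIR}/sw/valet" "${WORKDIR}/projects" "${WORKDIR}/users/${USER}"
</code>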

None of these directories are mandatory, but they do tend to make management of a workgroup's resources easier.  In particular, the fact that ''/work/<workgroup-id>/sw/valet'' will be automatically added to the VALET search path means workgroup users do not need to alter ''VALET_PATH'' manually, in their ''.bashrc''/''.bash_profile'', or in their job scripts.

The use of [[abstract:zfs-snapshots|ZFS snapshots]] as backup copies of a workgroup directory is discussed elsewhere.
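
The snapshots themselves can be browsed like ordinary read-only directories.  A minimal sketch follows, assuming a placeholder workgroup id and file path; the snapshot names you will actually see depend on the snapshot schedule described on the linked page.

<code bash>
# List the snapshots available for a workgroup directory
ls /work/my_workgroup/.zfs/snapshot

# Recover an older copy of a file by copying it out of a snapshot
cp /work/my_workgroup/.zfs/snapshot/SNAPSHOT_NAME/projects/demo/input.dat \
   /work/my_workgroup/projects/demo/input.dat.recovered
</code>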

=== Lustre scratch ===

The ''/lustre/scratch'' file system is a high-speed parallel file system accessible from all nodes in the cluster.  Users/groups are free to create their own top-level directories under ''/lustre/scratch'' (see the sketch after this list) and are responsible for:

  * setting appropriate permissions to restrict access accordingly
  * removing files/directories that are no longer needed to keep capacity available over time
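
A minimal sketch of both responsibilities follows.  The directory names, the workgroup id, and the 90-day retention are placeholders chosen for illustration, not policies taken from this page.

<code bash>
# A personal scratch area readable and writable only by you
mkdir -p "/lustre/scratch/${USER}"
chmod 700 "/lustre/scratch/${USER}"

# A workgroup-shared scratch area: group-owned, group-writable, with the
# setgid bit so new files inherit the workgroup's group
mkdir -p /lustre/scratch/my_workgroup
chgrp my_workgroup /lustre/scratch/my_workgroup
chmod 2770 /lustre/scratch/my_workgroup

# Periodically prune files you no longer need (example: older than 90 days)
find "/lustre/scratch/${USER}" -type f -mtime +90 -print
# ...review the list, then add -delete (or pipe to xargs rm) once you are sure
</code>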

The total capacity can be checked using the ''lfs df'' command:

<code bash>
$ lfs df
UUID                   1K-blocks        Used    Available Use% Mounted on
scratch-MDT0000_UUID   2989410560   977815040   2011593472  33% /lustre/scratch[MDT:0]
scratch-OST0000_UUID  51093819392 12543377408  38549933056  25% /lustre/scratch[OST:0]
scratch-OST0001_UUID  51093630976 11691081728  39402001408  23% /lustre/scratch[OST:1]
scratch-OST0002_UUID  51093489664 12198661120  38894347264  24% /lustre/scratch[OST:2]
scratch-OST0003_UUID  51093485568 12000202752  39092807680  23% /lustre/scratch[OST:3]

filesystem_summary:  204374425600 48433323008 155939089408  24% /lustre/scratch
</code>

Note that this command displays both the aggregate capacity and the capacity of each OST and MDT (MetaData Target) component of the file system.  Users can determine their currently occupied Lustre scratch capacity:

<code bash>
$ lfs quota -u $(id -u) /lustre/scratch
Disk quotas for usr 1001 (uid 1001):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
/lustre/scratch  313298       0       0       -      78       0       0       -
</code>

Likewise, capacity associated explicitly with a workgroup can be checked:

<code bash>
$ lfs quota -g $(id -g) /lustre/scratch
Disk quotas for grp 1001 (gid 1001):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
/lustre/scratch  267784       0       0       -      16       0       0       -
</code>

<WRAP center round important 60%>
UD IT staff reserve the right to perform emergency removal of data from ''/lustre/scratch'' if occupied capacity reaches unsafe levels.  Periodic automated cleanup policies may become necessary if such levels persist.
</WRAP>
  
UD IT maintains [[http://docs.hpc.udel.edu/abstract/caviness/caviness|its own Caviness documentation]]; on this site you will find summaries specific to:

  * [[caviness:login|logging in]] to the cluster
  * [[caviness:resources#compute|compute resources]] present in the cluster
  * [[caviness:resources#storage|storage resources]] available to you, when to use them, and how, including:
    * your personal [[caviness:resources#home_directories|home directory]]
    * [[caviness:resources#workgroup_directories|workgroup storage]] available to members of a workgroup
    * high-speed shared [[caviness:resources#lustre_scratch|Lustre scratch]] storage
  * [[caviness:valet|using VALET]] to manage your environment