====== Resources ======

The design of Caviness is similar to previous community clusters.

===== Compute =====

Designed as a multi-generational system, Caviness will over time host a variety of nodes, differing not only in memory size and the presence of coprocessors but also in processor microarchitecture.
+ | |||
+ | ==== Generation 1 ==== | ||
+ | |||
+ | The baseline node specification comprises: | ||
+ | |||
+ | |**CPU**|(2) Intel Xeon E5-2695v4| | ||
+ | |**Cores**|18 per CPU, 36 per node| | ||
+ | |**Clock rate**|2.10 GHz| | ||
+ | |**CPU cache**|32 KiB L1 data and instruction caches per core; 256 KiB L2 cache per core; 45 MiB L3 cache| | ||
+ | |**Local storage**|910 GB ''/tmp'' partition on a 960 GB SSD| | ||
+ | |**Network**|(1) 1 Gbps ethernet port; (1) 100 Gbps Intel Omni-path port| | ||
+ | |||
+ | Three RAM sizes are present: | ||
+ | |||
+ | ^ ^Qty^Specification^ | ||
+ | |**RAM**|64|128 GiB (8 x 16 GiB) DDR4, 2400 MHz| | ||
+ | |**RAM**|55|256 GiB (8 x 32 GiB) DDR4, 2400 MHz| | ||
+ | |**RAM**|7|512 GiB (16 x 32 GiB) DDR4, 2133 MHz| | ||
+ | |||
+ | Some nodes include GPU coprocessors: | ||
+ | |||
+ | ^ ^Qty^Specification^ | ||
+ | |**Coprocessor**|10|(2) nVidia P100 GPGPU, 12 GiB, PCIe| | ||
+ | |||
+ | Additionally, two nodes with additional local storage are present to facilitate user testing of enhanced local storage media: | ||
+ | |||
+ | ^ ^Qty^Specification^ | ||
+ | |**Enhanced storage**|2|6.1 GB ''/nvme'' partition, RAID0 across (2) 3.2 TB Micron 9200 NVMe| | ||
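
Once on a compute node (for example, inside an interactive job), the hardware actually present can be confirmed with standard Linux tools. A minimal sketch, assuming ''lscpu'', ''free'', ''df'', and (on GPU nodes) ''nvidia-smi'' are available in the node's default environment:

<code>
# CPU model and socket/core counts (Generation 1: 2 x E5-2695 v4, 36 cores per node)
$ lscpu | grep -E 'Model name|Socket|Core'

# Installed memory (128, 256, or 512 GiB depending on the node)
$ free -h

# Local scratch partitions (910 GB /tmp; /nvme exists only on the enhanced-storage nodes)
$ df -h /tmp
$ df -h /nvme

# GPU inventory (only meaningful on the P100 nodes)
$ nvidia-smi -L
</code>
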
===== Networking =====
There are two private Ethernet networks in the cluster. A dedicated 1 Gbps network carries management traffic (remote power control of nodes, console access, etc.). A dedicated 10 Gbps network carries all data traffic (NFS, job scheduling, SSH access) to the nodes.
A 100 Gbps Intel Omni-Path network also connects all nodes. The OPA network carries Lustre filesystem traffic as well as most MPI internode communications. The network uses a fat tree topology employing six spine switches. Each leaf switch (two per rack) provides 12 leaf-to-spine uplink ports and 36 host ports (3:1 oversubscription).
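
On a given node, the state of these interfaces can be checked directly. A minimal sketch, assuming the usual ''iproute2'' tools and the verbs utility ''ibv_devinfo'' are installed on the compute nodes (the Omni-Path host fabric interface appears as an ''hfi1'' device):

<code>
# Ethernet interfaces and their link state
$ ip -brief link

# Omni-Path HFI: device name, port state, active MTU
$ ibv_devinfo | grep -E 'hca_id|state|active_mtu'
</code>
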
===== Storage =====
The total capacity can be checked using the ''lfs df'' command:
<code>
$ lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
scratch-MDT0000_UUID  2989410560   977815040  2011593472  33% /lustre/scratch[MDT:0]
</code>
Note that this command displays both aggregate capacity and the capacity of each OST and MDT (MetaData Target) component of the file system. Users can determine their current occupied Lustre scratch capacity:
<code>
$ lfs quota -u $(id -u) /lustre/scratch
Disk quotas for usr 1001 (uid 1001):
</code>
Likewise, capacity associated explicitly with a workgroup can be checked:
<code>
$ lfs quota -g $(id -g) /lustre/scratch
Disk quotas for grp 1001 (gid 1001):
</code>
UD IT staff reserve the right to perform emergency removal of data from ''/lustre/scratch'' if occupied capacity reaches unsafe levels. Periodic automated cleanup policies may become necessary if such levels persist.
</WRAP>
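
To stay ahead of any such cleanup, users can locate their own aging files on the scratch filesystem themselves. A minimal sketch using ''lfs find'' (the 90-day threshold is only an example; adjust it as appropriate, and note this assumes the installed Lustre client supports these standard filters):

<code>
# List regular files you own under /lustre/scratch not modified in the last 90 days
$ lfs find /lustre/scratch --user $(id -un) --type f --mtime +90
</code>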