====== Resources ======
  
The design of Caviness is similar to that of previous community clusters.

===== Compute =====

Designed as a multi-generational system, Caviness will over time host a variety of nodes that differ not only in memory size and the presence of coprocessors but also in processor microarchitecture.

==== Generation 1 ====

The baseline node specification comprises:

|**CPU**|(2) Intel Xeon E5-2695 v4|
|**Cores**|18 per CPU, 36 per node|
|**Clock rate**|2.10 GHz|
|**CPU cache**|32 KiB L1 data and instruction caches per core; 256 KiB L2 cache per core; 45 MiB L3 cache|
|**Local storage**|910 GB ''/tmp'' partition on a 960 GB SSD|
|**Network**|(1) 1 Gbps ethernet port; (1) 100 Gbps Intel Omni-path port|
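
The figures above can be confirmed from a shell on a Generation 1 compute node using standard Linux tools; this is a minimal sketch and is not specific to Caviness.

<code>
# Sockets, cores per socket, clock rate, and cache sizes
lscpu | grep -E 'Model name|Socket|Core\(s\)|MHz|cache'

# Installed memory
free -h

# Size and usage of the node-local SSD-backed /tmp partition
df -h /tmp
</code>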

Three RAM sizes are present:

^ ^Qty^Specification^
|**RAM**|64|128 GiB (8 x 16 GiB) DDR4, 2400 MHz|
|**RAM**|55|256 GiB (8 x 32 GiB) DDR4, 2400 MHz|
|**RAM**|7|512 GiB (16 x 32 GiB) DDR4, 2133 MHz|
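
Because nodes differ in RAM size, a job's memory requirement should be stated at submission time so the scheduler can place it on a suitably sized node. The sketch below assumes Slurm is the scheduler and uses a placeholder script name; the flags shown are standard Slurm options rather than Caviness-specific settings.

<code>
# Hypothetical submission asking for one node with roughly 200 GB of memory;
# only nodes in the 256 GiB or 512 GiB pools can satisfy this request.
sbatch --nodes=1 --mem=200G job_script.sh
</code>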

Some nodes include GPU coprocessors:

^ ^Qty^Specification^
|**Coprocessor**|10|(2) NVIDIA P100 GPGPU, 12 GiB, PCIe|
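
GPU nodes are typically reached through the scheduler's generic-resource (GRES) mechanism, and the devices can be inspected with ''nvidia-smi''. The GRES name and type below are assumptions for illustration, not confirmed Caviness settings.

<code>
# Hypothetical interactive allocation of one P100 on a GPU node
# (the "gpu:p100" GRES name is assumed, not confirmed)
salloc --nodes=1 --gres=gpu:p100:1

# On the allocated node, list the visible GPUs and their memory
nvidia-smi
</code>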

Additionally, two nodes with extra local storage are present to let users test enhanced local storage media:

^ ^Qty^Specification^
|**Enhanced storage**|2|6.1 TB ''/nvme'' partition, RAID0 across (2) 3.2 TB Micron 9200 NVMe|
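
On those two nodes the additional capacity appears as a separate ''/nvme'' partition. A job with heavy local I/O might stage its working set there, as in the following sketch; the file names and directory layout are placeholders.

<code>
# Confirm the enhanced local partition is present and sized as expected
df -h /nvme

# Stage data onto the NVMe partition, run the I/O-heavy step, then clean up
mkdir -p /nvme/$USER/work
cp big_input.dat /nvme/$USER/work/
# ... run the application against /nvme/$USER/work ...
rm -rf /nvme/$USER/work
</code>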
  
===== Networking =====

There are two private ethernet networks in the cluster. A dedicated 1 Gbps network carries management traffic (remote power control of nodes, console access, etc.). A dedicated 10 Gbps network carries all data traffic (NFS, job scheduling, SSH access) to the nodes.
  
A 100 Gbps Intel Omni-path network also connects all nodes. The OPA network carries Lustre filesystem traffic as well as most MPI internode communications. The network uses a fat tree topology employing six spine switches. Each leaf switch (two per rack) features 12 leaf-to-spine uplink ports and 36 host ports (3:1 oversubscription).
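
From any node, the ethernet and Omni-path interfaces can be listed with standard Linux tools; interface names vary by node, and the OPA port usually shows up as an "ib" interface, so the output below is illustrative only.

<code>
# Brief listing of all network interfaces and their link state
ip -br link
</code>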
  
===== Storage =====

The total capacity can be checked using the ''lfs df'' command:
  
<code>
lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
scratch-MDT0000_UUID  2989410560   977815040  2011593472  33% /lustre/scratch[MDT:0]
...
</code>

Note that this command displays both aggregate capacity and the capacity of each OST (Object Storage Target) and MDT (MetaData Target) component of the file system. Users can determine their current occupied Lustre scratch capacity:
  
<code>
$ lfs quota -u $(id -u) /lustre/scratch
Disk quotas for usr 1001 (uid 1001):
...
</code>

Likewise, capacity associated explicitly with a workgroup can be checked:
  
<code>
$ lfs quota -g $(id -g) /lustre/scratch
Disk quotas for grp 1001 (gid 1001):
...
</code>

UD IT staff reserve the right to perform emergency removal of data from ''/lustre/scratch'' if occupied capacity reaches unsafe levels. Periodic automated cleanup policies may become necessary if such levels persist.
</WRAP>