caviness:slurm-manual:intro [2019/08/08 17:39] frey
caviness:slurm-manual:intro [2019/08/23 19:13] (current) frey [What are resources?]
Getting work done on your laptop or desktop computer usually involves a graphical user interface where your key presses, gestures, and taps or clicks are interpreted to execute programs and enter data. A less intuitive -- but far more powerful -- command-line interface relies on your entering textual commands to accomplish the same tasks.
===== The command-line interface =====
The default command-line interface (CLI) on our HPC systems is the Bash shell. The syntax and grammar of Bash encompass most of the typical constructs of computer programming languages, but the main purpose of Bash is to execute other programs rather than to perform computation or data processing itself. The programs Bash executes on your behalf implement the computation and data processing tasks most closely associated with your work.
Getting work done on our HPC systems requires knowledge of the Bash shell, and your efficiency and productivity grow with your familiarity with it. Many excellent tutorials exist online that introduce the Bash shell: see [[https://swcarpentry.github.io/shell-novice/|this Software Carpentry tutorial]], for example.
===== Representation of work =====
If the work you do on a computer system consists of a series of Bash commands typed on a keyboard, then saving those commands in a file and telling Bash to read from that file (rather than the keyboard) also gets the job done. Creating such a //Bash script// allows the work to be repeated at any time in the future simply by having a Bash shell read commands from that file.
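As a minimal sketch of the idea, the commands below are first saved to a file and then read by a new Bash shell, either by naming the file or by redirecting the file to the shell's standard input in place of the keyboard. The file name ''my_work.sh'' and the ''results'' directory are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Save commands you might otherwise type at the prompt into a file.
# (my_work.sh and the results directory are hypothetical names.)
cat > my_work.sh <<'EOF'
mkdir -p results
date > results/timestamp.txt
echo "Work finished"
EOF

# Have a new Bash shell read its commands from the file instead of the
# keyboard; either form below runs the same commands.
bash my_work.sh
bash < my_work.sh
```

Because the commands now live in a file, the same work can be repeated later just by running that file again.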
The work you wish to get done on Caviness should be encapsulated in a Bash script. Such a //job script// can then be executed at some arbitrary time in the future. Job scripts should require no interaction with a user, so that your work can be completed even when you are not logged in to the cluster.
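One way to keep a job script free of interaction, sketched here under illustrative assumptions, is to take every input from the script itself or from its arguments and to fail with a message rather than prompt for a reply. The script name ''count_job.sh'', the input file ''input.dat'', and the line-counting task are hypothetical stand-ins for real work.

```shell
#!/bin/bash
# A hypothetical job script: everything it needs is given in the file or
# as an argument, so it never stops to wait for someone to type a reply.
cat > count_job.sh <<'EOF'
#!/bin/bash
INPUT_FILE="${1:-input.dat}"   # hypothetical default input file name
if [ ! -r "$INPUT_FILE" ]; then
    # Fail with a message instead of prompting for a different file
    echo "ERROR: cannot read $INPUT_FILE" >&2
    exit 1
fi
echo "lines: $(( $(wc -l < "$INPUT_FILE") ))"
EOF

printf 'a\nb\nc\n' > input.dat   # sample input for this demonstration

# Detach standard input from the keyboard, as when no one is logged in;
# a script that tried to read a reply would get end-of-file instead.
bash count_job.sh < /dev/null
```

Running the script with its standard input redirected from ''/dev/null'' is a quick way to check, before submitting it, that it never waits for keyboard input.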
===== Job scheduling =====

At any time, the hundreds of users of our HPC systems have more work prepared than there are resources in the system to run it. All of those job scripts are submitted to a piece of software that is responsible for:

  * storing and managing all of the job scripts
  * prioritizing all of the job scripts
  * matching the resources requested by each job to available resources
  * executing job scripts when and where resources are available
  * reacting to the completion of each job

The Slurm //job scheduler// handles these tasks on the Caviness cluster.
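The sketch below shows the hand-off from your shell to the scheduler: a minimal job script is written to a file and submitted with the standard Slurm command ''sbatch'', which returns as soon as Slurm has stored the job; prioritizing, matching resources, and execution then happen later without you. The file name ''hello_job.sh'' and the five-minute time limit are hypothetical, and site-specific steps that Caviness may require before submission (for example selecting a workgroup or partition) are assumed away here.

```shell
#!/bin/bash
# Write a minimal job script (hello_job.sh is a hypothetical name).
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --time=0-00:05:00
echo "Hello from $(hostname)"
EOF

if command -v sbatch >/dev/null 2>&1; then
    # Submit the job; --parsable makes sbatch print only "jobid" or
    # "jobid;cluster" so the ID can be captured.
    if job_id=$(sbatch --parsable hello_job.sh); then
        # The job waits in Slurm's queue until resources are matched.
        squeue -j "${job_id%%;*}"
    fi
else
    echo "sbatch not found: run this on a Slurm login node"
fi
```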

==== What are resources? ====

On Caviness, the important resources you must consider for each job are:

  * Traditional CPU cores
  * System memory (RAM)
  * Coprocessors (NVIDIA GPUs)
  * Wall time((wall time = elapsed real time))

Although default values exist for each, you are encouraged to always state explicitly the level of each resource a job requires. In general, requesting more resources than your job can effectively (or efficiently) use:
  - can delay the start of your job (e.g. it takes longer for 10 nodes to become free at the same time than for a single node)
  - may decrease your workgroup's job priority relative to other workgroups (further delaying future jobs)
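A job script can state each of these resources explicitly with ''#SBATCH'' directives placed before its first command. The header below is a sketch: the values are illustrative only, and the generic ''--gres=gpu:1'' form is an assumption, since the GPU resource name used on Caviness may differ. Bash treats every ''#SBATCH'' line as an ordinary comment, so the script still runs unchanged outside Slurm.

```shell
#!/bin/bash
#
# Hypothetical job script header that states every resource explicitly.
# All values are illustrative; request only what your job will use.
#
# Traditional CPU cores: one task using 4 cores
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
# System memory (RAM) per node
#SBATCH --mem=8G
# Coprocessor: one GPU (omit if unused; the GRES name on Caviness may differ)
#SBATCH --gres=gpu:1
# Wall time limit: 0 days, 2 hours
#SBATCH --time=0-02:00:00

# Slurm reads the #SBATCH lines at submission time and exports some of
# the granted values to the running job as environment variables.
cores="${SLURM_CPUS_PER_TASK:-unknown (not running under Slurm)}"
echo "CPU cores granted: $cores"
```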

==== Queues and partitions ====

With other job schedulers, a //queue// is an ordered list of work to be performed. There are one or more queues, and each job is submitted to one or more specific queues. Each queue has a set of hardware resources associated with it on which the queue can execute jobs.

Slurm approaches this from the other direction and uses a //partition// to represent a set of hardware resources on which jobs can execute. A single queue contains all jobs, and the partition selected for each job constrains which hardware resources it can use.
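The standard Slurm commands ''sinfo'' and ''squeue'' make this split visible: ''sinfo'' lists the partitions, while ''squeue'' lists the single job queue, with a PARTITION column recording each job's hardware constraint. A job script selects a partition with a directive such as ''#SBATCH --partition=<name>''; the partition names available on Caviness are not given here. The helper function name below is hypothetical.

```shell
#!/bin/bash
# Show the partitions (sets of hardware) and the single Slurm job queue.
# show_partitions_and_queue is a hypothetical helper name.
show_partitions_and_queue() {
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "Slurm commands not found: run this on a Slurm login node"
        return 0
    fi
    sinfo --summarize                       # one summary line per partition
    squeue --user="${USER:-$(id -un)}"      # your jobs; note the PARTITION column
}

show_partitions_and_queue
```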