Getting work done on your laptop or desktop computer usually involves a graphical user interface where your key presses, gestures, and taps or clicks are interpreted to execute programs and enter data.  A less intuitive -- but far more powerful -- command-line interface relies on your entering textual commands to accomplish the same tasks.
  
===== The command-line interface =====
  
The default command-line interface (CLI) on our HPC systems is the Bash shell.  The syntax and grammar of Bash encompass most of the typical constructs of computer programming languages, but Bash's purpose is primarily to execute other programs rather than to perform computation or data processing itself.  The programs executed by Bash on your behalf implement the computation and data-processing tasks most closely associated with your work.
  
Getting work done on our HPC systems requires knowledge of the Bash shell.  Your efficiency and productivity are, to an extent, directly proportional to your familiarity with the Bash shell.  Many excellent tutorials exist online that introduce the Bash shell: see [[https://swcarpentry.github.io/shell-novice/|this Software Carpentry tutorial]], for example.
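As a quick illustration, here are a few commands you might type at a Bash prompt; the directory and file names are hypothetical:

<code bash>
# list the contents of the current directory
ls -l

# change into a (hypothetical) project directory
cd my_project

# count the lines in a (hypothetical) data file
wc -l results.csv

# chain two programs together: find lines containing "error" and count them
grep "error" run.log | wc -l
</code>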
  
===== Representation of work =====
  
If the work you do on a computer system consists of a series of Bash commands typed on a keyboard, then saving those commands in a file and telling Bash to read from that file (rather than the keyboard) also gets the job done.  Creating such a //Bash script// allows the work to be repeated at any time in the future simply by having a Bash shell read commands from that file.
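For example, a handful of commands could be saved in a file (the name ''run_analysis.sh'' and the program it calls are purely illustrative) and replayed later by Bash:

<code bash>
#!/bin/bash
#
# run_analysis.sh -- a minimal example of a Bash script
# (the directory, program, and file names are hypothetical)

# move into the directory containing the data
cd my_project

# run the analysis program and capture its output in a file
./analyze results.csv > analysis_output.txt
</code>

Running ''bash run_analysis.sh'' then has Bash read its commands from the file instead of the keyboard.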
  
The work you wish to get done on Caviness should be encapsulated in a Bash script.  In this way, a //job script// can be executed at some arbitrary time in the future.  Job scripts should require no interaction with a user, so that the work completes even when you are not logged in to the cluster.
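One common source of interaction is a program that prompts for input while it runs.  In a job script, that input should be supplied up front instead, for example via command-line options or input redirection; the program, options, and file names below are assumptions for illustration only:

<code bash>
# rather than running a program that stops and prompts for answers,
# pass everything it needs as command-line options ...
./analyze --input results.csv --output analysis_output.txt

# ... or feed prepared answers to it from a file via input redirection
./analyze < answers.txt
</code>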
  
===== Job scheduling =====

At any time, the hundreds of users of our HPC systems have more work prepared than there are resources in the system.  All of those job scripts are submitted to a piece of software that has the job of:

  * storing and managing all of the job scripts
  * prioritizing all of the job scripts
  * matching the resources requested by the job to available resources
  * executing job scripts when and where resources are available
  * reacting to completion of the job

The Slurm //job scheduler// handles these tasks on the Caviness cluster.
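Day to day, you interact with the scheduler through a few Slurm commands.  A minimal sketch (the job script name is hypothetical, and the job id shown is a placeholder):

<code bash>
# submit a job script to the scheduler; Slurm replies with a job id
sbatch my_job.qs

# list your own pending and running jobs
squeue -u $USER

# cancel a job that is no longer needed (12345 is a placeholder job id)
scancel 12345
</code>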
==== What are resources? ====

On Caviness, the important resources you must consider for each job are:

  * Traditional CPU cores
  * System memory (RAM)
  * Coprocessors (nVidia GPUs)
  * Wall time((wall time = elapsed real time))

Though default values exist for each, you are encouraged to always make explicit the levels required by a job (see the sketch after this list).  In general, requesting more resources than your job can effectively (or efficiently) use:
  - can delay the start of your job (e.g. it takes longer to coordinate 10 nodes' being free versus a single node)
  - may decrease your workgroup's relative job priority versus other workgroups (further delaying future jobs)
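As a sketch of what an explicit request might look like, the directives below ask for specific amounts of each resource at the top of a job script; the particular values, and the GPU line, are illustrative assumptions rather than recommendations:

<code bash>
#!/bin/bash
#SBATCH --ntasks=1              # one task ...
#SBATCH --cpus-per-task=4       # ... using 4 CPU cores
#SBATCH --mem=8G                # 8 GiB of system memory
#SBATCH --gres=gpu:1            # one GPU (omit if the job uses no coprocessor)
#SBATCH --time=0-02:00:00       # wall time limit of 2 hours

# ... the commands that do the actual work go here ...
</code>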
==== Queues and partitions ====

With other job schedulers, a //queue// is an ordered list of work to be performed.  There are one or more queues, and jobs are submitted to specific queue(s).  Each queue has a set of hardware resources associated with it on which the queue can execute jobs.

Slurm starts from the other end and uses a //partition// to represent a set of hardware resources on which jobs can execute.  A single queue contains all jobs, and the partition selected for each job constrains which hardware resources can be used.
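For example, the available partitions can be listed and a specific one requested at submission time; the partition name ''standard'' is only a placeholder for whatever partitions actually exist on the cluster:

<code bash>
# show the partitions defined on the cluster and their current state
sinfo

# submit a job to a specific partition (the partition name is a placeholder)
sbatch --partition=standard my_job.qs
</code>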
  