Middlebury

Difference between revisions of "High Performance Computing (HPC)/Training"

Revision as of 16:12, 3 September 2019

Overview of the Ada Cluster

How is a cluster different from my laptop/desktop?

Architecture

Logging in

ssh username@ada
  • "username" is your Middlebury username. If your username on the computer you're logging in from is also your Middlebury username (e.g. if you're using a college-owned computer), then you can just use the command "ssh ada".
  • You will be prompted for your Middlebury password. After you enter it, you will have a Linux command prompt on the head node "ada".
  • You are now in your home directory on ada. From here you can access the filesystem using standard Linux commands. For example, we can make a directory:
mkdir test_job
  • While it's not necessary, for convenience you can consider setting up public key authentication from your laptop or desktop; this will allow you to log in securely without entering your password.
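As a rough sketch of the public key setup mentioned above, the commands below generate a key pair and copy the public key to ada. This assumes an OpenSSH client that provides "ssh-keygen" and "ssh-copy-id" (typical on Linux and macOS); the function name "setup_key_login" and the key path are illustrative choices, not something prescribed for the cluster.

```shell
#!/usr/bin/env bash
# Sketch: set up key-based login to ada from your own machine.
# Wrapped in a function so nothing runs until you call it.
setup_key_login() {
    local user="$1"                      # your Middlebury username
    local key="${HOME}/.ssh/id_ed25519"  # assumed key location

    # Make sure the .ssh directory exists with safe permissions.
    mkdir -p "${HOME}/.ssh" && chmod 700 "${HOME}/.ssh"

    # Create a key pair only if one does not already exist.
    if [ ! -f "${key}" ]; then
        ssh-keygen -t ed25519 -f "${key}"
    fi

    # Append the public key to ~/.ssh/authorized_keys on ada.
    ssh-copy-id -i "${key}.pub" "${user}@ada"
}

# Example (run on your laptop or desktop, not on ada):
# setup_key_login username
```

You will be asked for your Middlebury password once while the key is copied; later logins with "ssh username@ada" should then use the key instead.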

Submitting jobs via the Slurm scheduler

Basic slurm script

  • We have the basic slurm script shown below in the text file "slurm_serial.sh":
#!/usr/bin/env bash
# slurm template for serial jobs

# Set SLURM options
#SBATCH --job-name=serial_test                  # Job name
#SBATCH --output=serial_test-%j.out             # Standard output and error log
#SBATCH --mail-user=username@middlebury.edu     # Where to send mail	
#SBATCH --mail-type=NONE                        # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mem=100mb                             # Job memory request
#SBATCH --partition=standard                    # Partition (queue) 
#SBATCH --time=00:05:00                         # Time limit hrs:min:sec

# Print SLURM environment variables
echo "Job ID: ${SLURM_JOB_ID}"
echo "Node: ${SLURMD_NODENAME}"
echo "Starting: "`date +"%D %T"`

# Your calculations here
printf "\nHello world from ${SLURMD_NODENAME}!\n\n"

# End of job info
echo "Ending:   "`date +"%D %T"`

Submitting jobs

  • Jobs are submitted to the slurm scheduler via the "sbatch" command:
sbatch slurm_serial.sh
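When a job is accepted, sbatch normally prints a confirmation line of the form "Submitted batch job 12345". If you want to keep the job ID for later use, one possible approach is sketched below; the helper name "job_id_from_sbatch" is made up for this example.

```shell
#!/usr/bin/env bash
# Extract the numeric job ID from sbatch's confirmation line,
# which normally looks like: "Submitted batch job 12345".
job_id_from_sbatch() {
    local line="$1"
    # Strip everything up to and including the last space.
    echo "${line##* }"
}

# Only call sbatch where it is available (i.e. on the cluster).
if command -v sbatch >/dev/null 2>&1; then
    job_id=$(job_id_from_sbatch "$(sbatch slurm_serial.sh)")
    echo "Submitted job ${job_id}"
fi
```

With the template above, the output of this job would then be written to the file "serial_test-<job ID>.out", since "%j" in the --output option is replaced by the job ID.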

Monitoring jobs

  • You can monitor the status of jobs in the queue via the "squeue" command:
squeue
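On a busy cluster the full queue can be long. A small sketch of two common ways to narrow it down is shown here, using the squeue options "-u" (filter by user) and "-j" (filter by job ID); the function names are only illustrative.

```shell
#!/usr/bin/env bash
# List only the jobs that belong to one user (default: yourself).
my_jobs() {
    local user="${1:-${USER:-$(id -un)}}"
    squeue -u "${user}"
}

# Show a single job by its job ID.
job_status() {
    squeue -j "$1"
}

# Only query the scheduler where squeue is available.
if command -v squeue >/dev/null 2>&1; then
    my_jobs
fi
```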

Parallel Jobs

Array jobs

If a serial job can easily be broken into several (or many) independent pieces, then it's most efficient to submit an array job, which is a set of closely related serial jobs that will all run independently.

  • To submit an array job, use the slurm option "--array". For example, "--array=0-4" will run 5 independent tasks, labeled 0-4 by the environment variable SLURM_ARRAY_TASK_ID.
  • To allow each array task to perform a different calculation, you can use SLURM_ARRAY_TASK_ID as an input parameter to your calculation.
  • Each array task will appear as an independent job in the queue and run independently.
  • An entire array job can be canceled at once or each task can be canceled individually.
#!/usr/bin/env bash
# slurm template for array jobs

# Set SLURM options
#SBATCH --job-name=array_test                   # Job name
#SBATCH --output=array_test-%A-%a.out           # Standard output and error log
#SBATCH --mail-user=username@middlebury.edu     # Where to send mail    
#SBATCH --mail-type=NONE                        # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mem=100mb                             # Job memory request
#SBATCH --partition=standard                    # Partition (queue) 
#SBATCH --time=00:05:00                         # Time limit hrs:min:sec
#SBATCH --array=0-4                             # Array range

# Print SLURM environment variables
echo "Job ID: ${SLURM_JOB_ID}"
echo "Array ID: ${SLURM_ARRAY_TASK_ID}"
echo "Node: ${SLURMD_NODENAME}"
echo "Starting: "`date +"%D %T"`

# Your calculations here
printf "\nHello world from array task ${SLURM_ARRAY_TASK_ID}!\n\n"

# End of job info
echo "Ending:   "`date +"%D %T"`
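One way to use SLURM_ARRAY_TASK_ID as an input parameter is to keep one parameter per line in a text file and let each task read its own line. The sketch below assumes a hypothetical file "params.txt" and a made-up helper "param_for_task"; it could replace the "Your calculations here" part of the array template.

```shell
#!/usr/bin/env bash
# Pick the parameter for this array task from a text file with one
# parameter per line. Task IDs start at 0, sed line numbers start at 1.
param_for_task() {
    local task_id="$1" params_file="$2"
    sed -n "$((task_id + 1))p" "${params_file}"
}

# Hypothetical file name; outside Slurm the task ID falls back to 0.
params_file="params.txt"
task_id="${SLURM_ARRAY_TASK_ID:-0}"
if [ -f "${params_file}" ]; then
    param=$(param_for_task "${task_id}" "${params_file}")
    echo "Array task ${task_id} will use parameter: ${param}"
fi
```

With "--array=0-4", the file would need at least five lines, one for each of the tasks 0-4. To cancel, "scancel" followed by the job ID stops the whole array, while "scancel" followed by the job ID, an underscore, and the task ID (for example "scancel 12345_2") stops a single task.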

Shared memory or multi-threaded jobs

Multi-node (MPI) jobs

GPU jobs

Large Memory jobs

Storage

Local scratch storage

Checkpointing

Sample jobs

Serial Stata job

Parallel Stata job

Git repository

Best practices
