High Performance Computing (HPC)/Training

Overview of the Ada Cluster

How is a cluster different from my laptop/desktop?

Architecture

Logging in

ssh username@ada
  • "username" is your Middlebury username. If your username on the computer you're logging in from is also your Midd username (e.g. if you're using a college owned computer), then you can just use the command ("ssh ada").
  • You will be prompted for your Middlebury password--after you enter your password, you will now have a linux command prompt for the head node "ada".
  • You are now in your home directory on ada. From here you can access the filesystem in your home directory, using standard linux commands. For example, we can make a directory:
mkdir test_job
  • While it's not necessary, for convenience you can consider setting up public key authentication from your laptop or desktop; this will allow you to login securely without entering your password.
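
For example, a minimal sketch of setting up key-based login from a Linux or macOS client (exact prompts may vary on your system):

ssh-keygen -t ed25519          # generate a key pair; accept the defaults and optionally set a passphrase
ssh-copy-id username@ada       # copy your public key to ada (enter your Middlebury password one last time)
ssh username@ada               # subsequent logins should no longer prompt for a password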

Submitting jobs via the Slurm scheduler

Basic slurm script

  • We have the basic slurm script shown below in the text file "slurm_serial.sh":
#!/usr/bin/env bash
# slurm template for serial jobs

# Set SLURM options
#SBATCH --job-name=serial_test                  # Job name
#SBATCH --output=serial_test-%j.out             # Standard output and error log
#SBATCH --mail-user=username@middlebury.edu     # Where to send mail	
#SBATCH --mail-type=NONE                        # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mem=100mb                             # Job memory request
#SBATCH --partition=standard                    # Partition (queue) 
#SBATCH --time=00:05:00                         # Time limit hrs:min:sec

# print SLURM environment variables
echo "Job ID: ${SLURM_JOB_ID}"
echo "Node: ${SLURMD_NODENAME}"
echo "Starting: "`date +"%D %T"`

# Your calculations here
printf "\nHello world from ${SLURMD_NODENAME}!\n\n"

# End of job info
echo "Ending:   "`date +"%D %T"`

Submitting jobs

  • Jobs are submitted to the slurm scheduler via the "sbatch" command:
sbatch slurm_serial.sh

Monitoring jobs

  • You can monitor the status of jobs in the queue via the "squeue" command:
squeue
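
To list only your own jobs rather than the entire queue, you can filter by username with the standard squeue option, and a queued or running job can be removed with scancel (shown here with a hypothetical job ID):

squeue -u $USER
scancel 12345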

Parallel Jobs

Array jobs

If a serial job can easily be broken into several (or many) independent pieces, then it is most efficient to submit an array job, which is a set of closely related serial jobs that all run independently.

  • To submit an array job, use the slurm option "--array". For example "--array=0-4" will run 5 independent tasks, labeled 0-4 by the environment variable SLURM_ARRAY_TASK_ID.
  • To allow each array task to perform a different calculation, you can use SLURM_ARRAY_TASK_ID as an input parameter to your calculation.
  • Each array task will appear as an independent job in the queue and run independently.
  • An entire array job can be canceled at once or each task can be canceled individually.
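
For example, using Slurm's scancel command with a hypothetical array job ID of 12345:

scancel 12345      # cancel the entire array job
scancel 12345_2    # cancel only array task 2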

Here is a simple example of a slurm array job script:

#!/usr/bin/env bash
# slurm template for array jobs

# Set SLURM options
#SBATCH --job-name=array_test                   # Job name
#SBATCH --output=array_test-%A-%a.out           # Standard output and error log
#SBATCH --mail-user=username@middlebury.edu     # Where to send mail    
#SBATCH --mail-type=NONE                        # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mem=100mb                             # Job memory request
#SBATCH --partition=standard                    # Partition (queue) 
#SBATCH --time=00:05:00                         # Time limit hrs:min:sec
#SBATCH --array=0-4                             # Array range

# print SLURM environment variables
echo "Job ID: ${SLURM_JOB_ID}"
echo "Array ID: ${SLURM_ARRAY_TASK_ID}"
echo "Node: ${SLURMD_NODENAME}"
echo "Starting: "`date +"%D %T"`

# Your calculations here
printf "\nHello world from array task ${SLURM_ARRAY_TASK_ID}!\n\n"

# End of job info
echo "Ending:   "`date +"%D %T"`

An example of how a serial job can be broken into an array job is on the HPC Github repository (see below).
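
As a generic illustration of the idea (not the repository example itself), an array task can use SLURM_ARRAY_TASK_ID to pick out its own piece of the work, e.g. one line of a hypothetical input file "inputs.dat":

# inside the slurm script, after the #SBATCH lines
LINE=$((SLURM_ARRAY_TASK_ID + 1))        # array IDs start at 0, file lines at 1
PARAM=$(sed -n "${LINE}p" inputs.dat)    # read this task's work item
./my_program "${PARAM}" > result-${SLURM_ARRAY_TASK_ID}.out    # my_program is a hypothetical executable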

Shared memory or multi-threaded jobs

If your code can take advantage of multiple CPU cores via multi-threading, you can request multiple CPU cores for your job in the slurm script via the "--cpus-per-task" option. For example, specifying:

#SBATCH --cpus-per-task=8    # Number of CPU cores for this job

in the slurm script would request 8 CPU cores for the job. The standard CPU compute nodes have 36 cores per node, so you can request up to 36 cores per job. All cores will be on the same node and share memory, as if the calculation were running on a single standalone workstation.

Note that your code must be able to take advantage of the additional CPU cores that slurm allocates--if you request multiple cores for a purely serial code (i.e. that can only use 1 CPU core) the additional CPU cores will remain idle.
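
For example, an OpenMP application is typically told how many threads to use via the OMP_NUM_THREADS environment variable, which you can set from the core count that slurm allocated (a sketch with a hypothetical executable):

#SBATCH --cpus-per-task=8    # Number of CPU cores for this job

# match the thread count to the allocated cores
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
./my_threaded_program        # hypothetical multi-threaded executable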

Multi-node (MPI) jobs

The cluster is currently not configured to allow for multi-node (e.g. MPI) jobs.

GPU jobs

There is a single GPU compute node which is accessible via the gpu-standard, gpu-short, and gpu-long queues. All GPU jobs must be submitted to one of these queues via the --partition option. E.g.

#SBATCH --partition=gpu-standard                    # Partition (queue)
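
A minimal GPU job script might look like the sketch below. On some Slurm configurations the GPU must also be requested explicitly with --gres; check whether this is required on ada.

#!/usr/bin/env bash
#SBATCH --job-name=gpu_test                     # Job name
#SBATCH --output=gpu_test-%j.out                # Standard output and error log
#SBATCH --partition=gpu-standard                # Partition (queue)
#SBATCH --gres=gpu:1                            # Request one GPU (may not be required on all configurations)
#SBATCH --time=00:05:00                         # Time limit hrs:min:sec

# confirm the GPU is visible to the job
nvidia-smi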

Large Memory jobs

Standard CPU compute nodes have a total of 96 GB of RAM, so you can request up to 96 GB for jobs submitted to the standard, short or long queues. In your slurm submit script you should specify the amount of memory needed via the --mem option. For example, include the line:

#SBATCH --mem=2gb

to request 2 GB for a job. If your job requires more than 96 GB of RAM, you will need to use the high-memory node, which has 768 GB of RAM. To access the high-memory node, you need to submit to one of the himem-standard, himem-short, or himem-long queues. For example, including the options:

#SBATCH --partition=himem-standard              # Partition (queue) 
#SBATCH --mem=128gb                             # Job memory request

would request 128GB of RAM using the himem-standard queue.

Storage

  • Each user has a home directory located at /home/$USER, where $USER is your Middlebury username; it is also accessible via the $HOME environment variable. Each user has a quota of 50 GB in their home directory.
  • Additionally, each user has a storage directory located at /storage/$USER, which is also accessible via the $STORAGE environment variable. The quota on each user's storage directory is 400 GB.
  • The home directory has a fairly small quota as it is only intended for storage of scripts, code, executables, and small parameter files, NOT for data storage.
  • Data files should be stored in the storage directory.
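
For example, a job script can write its large output directly to the storage directory (a sketch using a hypothetical project subdirectory and results file):

mkdir -p $STORAGE/my_project           # create a project directory under your storage quota
cp results.dat $STORAGE/my_project/    # copy large output out of your home directory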

Local scratch storage

Checkpointing

Checkpointing your jobs running on ada is recommended. Checkpointing stores the internal state of your calculation periodically so the job can be restarted from that state, e.g. if the node goes down or the wall clock limit is reached. Ideally, checkpointing is done internally in your application (it is built into many open source and commercial packages); if your application doesn't support checkpointing internally, you can use an external checkpointing tool such as dmtcp. Here we'll illustrate an example of using external checkpointing via dmtcp, found in the directory "ckpt-example" on the GitHub repository.

  • We'll illustrate checkpointing using a simple counter. First compile the executable "count" from the source code "counter.c" via:
gcc counter.c -o count
  • Now submit the slurm script "slurm_ckpt_start.sh"
sbatch slurm_ckpt_start.sh
  • Once that job has completed, you should see a checkpoint file of the form "ckpt_count_*.dmtcp". Your job can be restarted using the "dmtcp_restart" command, as is done in "slurm_ckpt_restart.sh":
sbatch slurm_ckpt_restart.sh

(Note: you will get a warning message for this sample job on the initial restart--this will not cause a problem)

  • You can restart and continue the job any number of times via the restart script, e.g. try submitting the restart script a second time:
sbatch slurm_ckpt_restart.sh
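
For reference, the core of a dmtcp start/restart pair typically looks like the sketch below; the actual scripts in the repository may differ in their options and checkpoint interval.

# in slurm_ckpt_start.sh: run the program under dmtcp, writing a checkpoint every 60 seconds
dmtcp_launch --interval 60 ./count

# in slurm_ckpt_restart.sh: resume from the most recent checkpoint file
dmtcp_restart ckpt_count_*.dmtcp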

Sample jobs

Breaking a serial job into an array job

An example of using array jobs is in the directory "array_job_example" on the HPC Github repository.

  • The python script factor_list.py will find the prime factors of a list of integers, e.g. the 12-digit numbers in the file "sample_list_12.dat":
python factor_list.py sample_list_12.dat
  • To factor all 20 numbers in "sample_list_12.dat" as a single serial job (which will take several minutes), submit the slurm script "serial_factor.sh":
sbatch serial_factor.sh
  • The factors will be stored in "serial_factors_out.dat"
  • The slurm script "array_factor.sh" breaks the calculation up into a 10 task array job:
sbatch array_factor.sh
  • Each array task stores the results in the file "array_factors_out-${SLURM_ARRAY_TASK_ID}.dat" where the task array ID runs from 0-9.
  • After all the array tasks are complete, the data can be combined into a single file, e.g. array_factors_out.dat:
cat array_factors_out-?.dat > array_factors_out.dat
  • You can check that both methods give you the same result via diff:
diff serial_factors_out.dat array_factors_out.dat

Serial Stata job

Parallel Stata job

Git repository

Best practices
