The latedays cluster is organized as a single head node and 17 worker nodes that process jobs submitted to a job queue. You log into the head node, latedays.andrew.cmu.edu, just like any other remote Linux box. You can develop and test code on this machine, but you should never run long-running jobs on it: doing so slows down the machine for everyone logged into it, and that's basically the entire class!

From the head node you'll submit jobs, and when your job reaches the front of the queue, it will run on the cluster's worker nodes. Note that when your job runs on worker nodes, your code is the only program running on these machines. This performance isolation is very helpful when testing and analyzing the performance of your programs.

Each node in the cluster has the same CPU and memory specs:

Two six-core Xeon E5-2620 v3 processors
- 2.4 GHz, 15 MB L3 cache, hyper-threading, AVX2 instruction support
- 16 GB RAM
- http://ark.intel.com/products/83352/Intel-Xeon-Processor-E5-2620-v3-15M-Cache-2_40-GHz
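
As a quick sanity check on the specs above, the number of hardware threads available on one node follows from the processor, core, and hyper-threading counts. The sketch below assumes two hardware threads per core, which is how Intel hyper-threading works on this processor; the variable names are illustrative only.

```shell
#!/bin/bash
# Illustrative arithmetic only; these variable names are not part of any cluster tool.
sockets=2            # two processors per node
cores_per_socket=6   # six cores per processor
threads_per_core=2   # assumption: hyper-threading exposes 2 threads per core

logical_cpus=$((sockets * cores_per_socket * threads_per_core))
echo "Expected logical CPUs per node: ${logical_cpus}"

# nproc (GNU coreutils) reports what the current machine actually exposes.
if command -v nproc >/dev/null 2>&1; then
    echo "This machine reports: $(nproc)"
fi
```

If the two numbers disagree on a worker node, the node may be configured differently from what is described here.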

Although the CPU resources in all nodes are the same, the nodes differ in terms of what compute accelerators are present. 14 of the worker nodes (and also the head node) contain:

An NVIDIA K40 GPU (4.3 TFLOPS)
- 12 GB RAM (288 GB/sec memory bandwidth)
- http://international.download.nvidia.com/pdf/kepler/TeslaK80-datasheet.pdf
A 60-core Xeon Phi 5110P (2 TFLOPS)
- 60 cores (1 GHz, 4 threads per core, 512-bit ("16-wide") SIMD instruction support)
- 8 GB RAM (320 GB/sec memory bandwidth)
- http://ark.intel.com/products/71992/Intel-Xeon-Phi-Coprocessor-5110P-8GB-1_053-GHz-60-core

The other three worker nodes do not contain a Phi, but contain:

An NVIDIA Titan X GPU (6 TFLOPS)
- 12 GB RAM (336 GB/sec memory bandwidth)
- http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-x/specifications

The rest of this document is a tutorial on how to submit jobs on the latedays.andrew.cmu.edu cluster.

Step 1: Create your executable

For this tutorial we'll create a simple script that prints hello. Name this file hello.sh.

#!/bin/bash
echo "Hello ${1}!"

Make it executable by running chmod +x hello.sh.
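
Since this script finishes instantly, it is reasonable to try it on the head node before involving the queue. A minimal sketch, assuming you are happy to work in a throwaway directory created with mktemp (the directory and the argument "world" are illustrative choices, not part of the tutorial):

```shell
#!/bin/bash
# Work in a throwaway directory so nothing in your home directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write hello.sh as shown above (the quoted heredoc keeps ${1} literal).
cat > hello.sh <<'EOF'
#!/bin/bash
echo "Hello ${1}!"
EOF

chmod +x hello.sh

# Run it with a sample argument and keep the output for inspection.
greeting=$(./hello.sh world)
echo "$greeting"
```

If the chmod step is skipped, the ./hello.sh call fails with a permission error, which is the same failure the job would hit later on a worker node.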

Step 2: Make a job script

The job script describes how to run your executable. Each job will be run on one of the worker nodes, so your job script also needs to copy your executable to the scratch space on the worker node.

For example:

#!/bin/bash

# Move to my $SCRATCH directory.
cd $SCRATCH

execdir=/home/kku/hello  # Directory containing your hello.sh script
exe=hello.sh             # The executable to run
args="kku"               # Arguments to the executable

# Copy executable to $SCRATCH.
cp ${execdir}/${exe} ${exe}

# Run my executable
./${exe} ${args}
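
A mistyped execdir or a failed copy is cheaper to catch if the script stops with a clear message instead of carrying on. The variant below is only a sketch of that idea, not a required template. Two things in it are assumptions added so it can be tried away from the cluster: scratch falls back to a temporary directory when $SCRATCH is unset, and execdir is a temporary stand-in (for a path like /home/kku/hello) that gets a generated hello.sh.

```shell
#!/bin/bash
# Sketch of a job script with explicit error checks.
# Assumption: off-cluster, $SCRATCH is unset, so fall back to a temp directory.
scratch="${SCRATCH:-$(mktemp -d)}"
execdir=$(mktemp -d)   # stand-in for your real directory; illustrative only
exe=hello.sh
args="kku"

# Create a stand-in hello.sh so the sketch is self-contained.
printf '#!/bin/bash\necho "Hello ${1}!"\n' > "${execdir}/${exe}"
chmod +x "${execdir}/${exe}"

# Stop early, with a message on stderr, if a step fails.
cd "$scratch" || { echo "cannot enter ${scratch}" >&2; exit 1; }

# cp -p keeps the file's mode (including the executable bit) and timestamps.
cp -p "${execdir}/${exe}" "${exe}" || { echo "copy failed" >&2; exit 1; }

# args is left unquoted on purpose so several arguments split into words.
job_output=$(./"${exe}" ${args})
echo "$job_output"
```

In a real job script you would drop the stand-in lines and set execdir to your own directory, as in the example above.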

Step 3: Submit your job!

Submit your job by running

qsub [your job script]

qsub prints the ID of your job. You can monitor the status of your job by running

qstat -u [username]

The S field shows the current status of your job: Q means it is still queued, R means it is running, and C means it is complete.
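
If you would rather not read the table by eye, the state letter can also be pulled out of the full job listing. The sketch below assumes that qstat -f [job ID] prints a line of the form "job_state = C", as Torque-style PBS systems do. job_state_from_listing is a hypothetical helper name, and the sample text fed to it is made up for illustration rather than captured from latedays.

```shell
#!/bin/bash
# Hypothetical helper: read a `qstat -f` style listing on stdin and print the
# one-letter job state (assumes a "job_state = X" line is present).
job_state_from_listing() {
    awk -F' = ' '/job_state/ { gsub(/[[:space:]]/, "", $2); print $2; exit }'
}

# Made-up sample input, standing in for real `qstat -f` output.
sample_listing='    Job_Name = kku.job
    job_state = C
    queue = tesla'

state=$(printf '%s\n' "$sample_listing" | job_state_from_listing)
echo "state: ${state}"

# On the cluster you would pipe the real command instead, for example:
#   qstat -f 28298 | job_state_from_listing
```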

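Once your job is complete, you'll find its stdout and stderr output in two files in the directory from which you ran qsub. The file whose name ends in .o followed by the job ID holds stdout, and the one ending in .e holds stderr. For example: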

-rw------- 1 kku users   0 Feb  5 14:46 kku.job.e28298
-rw------- 1 kku users  11 Feb  5 14:46 kku.job.o28298
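
The sizes in that listing are a small correctness check in themselves: the .e file (stderr) is empty, and the 11 bytes in the .o file (stdout) match the greeting for the argument kku plus a trailing newline. This can be confirmed without touching the cluster:

```shell
#!/bin/bash
# Count the bytes hello.sh would print for the argument "kku".
# "Hello kku!" is 10 characters; echo appends a newline, giving 11.
expected_bytes=$(printf 'Hello kku!\n' | wc -c | tr -d '[:space:]')
echo "expected stdout size: ${expected_bytes} bytes"
```

A non-empty .e file, or a .o file of a different size, is a hint to open the files with cat and see what went wrong.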

Advanced: Selecting the Job Queue

Different worker nodes in the cluster have different hardware configurations. For example, some worker nodes have NVIDIA Tesla K40 GPUs and Intel Xeon Phis, while others "only" have a GeForce Titan X. Therefore, if your application requires a specific hardware resource, you should submit your job to the corresponding job queue. You can view a list of all queues by running qstat -q.

For example, if you want to run your CUDA code on a Tesla K40 GPU, your job script should look like:

#!/bin/bash

# Select job queue with worker nodes that have Nvidia Tesla K40 GPUs.
# You can view all the available queues on latedays by running `qstat -q`
#PBS -q tesla

# Move to my $SCRATCH directory.
cd $SCRATCH

execdir=/home/kku/hello  # Directory containing your hello.sh script
exe=hello.sh             # The executable to run
args="kku"               # Arguments to the executable

# Copy executable to $SCRATCH.
cp ${execdir}/${exe} ${exe}

# Run my executable
./${exe} ${args}
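
Because the queue is selected by a specially formatted comment, a small typo in the #PBS line is likely to be read as an ordinary comment, leaving the job in the default queue. The sketch below shows one way to check which queue a job script asks for before submitting it. requested_queue is a hypothetical helper name, and the demo writes its own three-line job script to a temporary file so it can run anywhere.

```shell
#!/bin/bash
# Hypothetical helper: print the queue requested by a job script, if any.
requested_queue() {
    # Match lines that start with "#PBS -q <name>" and print <name>.
    sed -n 's/^#PBS[[:space:]]*-q[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$1" | head -n 1
}

# Self-contained demo: write a small job script to a temporary file.
job_script=$(mktemp)
cat > "$job_script" <<'EOF'
#!/bin/bash
#PBS -q tesla
cd $SCRATCH
EOF

queue=$(requested_queue "$job_script")
echo "requested queue: ${queue:-none (default queue)}"
rm -f "$job_script"
```

qsub also accepts the queue on the command line, as in qsub -q tesla [your job script], which is convenient when the same script should be tried on more than one queue.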