How to Use the Intel Xeon Phi (KNL) Cluster

Accessing the Intel Cluster

You must be on the CMU network to log in to the Intel cluster without two-factor authentication. (The VPN does not currently seem to satisfy this requirement; we are working on this with Intel.) By this point you should have received an email with your username and password. To log in, run:

ssh [username]@ssh-iam.intel-research.net

To log in from off campus, first install Duo Mobile (the same app that CMU uses for 2FA). Then SSH as above; the server will disconnect you and give you a URL for setting up Duo. Follow the instructions there. The next time you SSH in, you will be prompted for two-factor authentication. After you accept the login request, you should be good to go.

Once you are logged in, you can develop and test code on this machine, but you should never run long-running jobs on it. Doing so slows the machine down for everyone logged into it, which is basically the entire class plus any other researchers using the cluster!

Compiling and Submitting Jobs to the Job Queue

From the head node (the node you log into via SSH), you submit jobs to the cluster job scheduler. When your job reaches the front of the queue, it runs on the cluster's worker nodes. Note that while your job runs on the worker nodes, your code is the only program running on those machines. This performance isolation is very helpful when testing and analyzing the performance of your programs.

You need to compile specifically for the cluster: run make intel for part 2 (just make for part 1) with the supplied Makefile. To submit a job, run ./submit.sh [uniform_random|clustered] for part 2 (just ./submit for part 1) from the directory in which you built bfs/pagerank. The script prints a job ID, which identifies the output files written once your job finishes. Two files will be created, with names that look roughly like bfs_job.sh.e1698 and bfs_job.sh.o1698. These hold the stderr and stdout output of job 1698, respectively. The corresponding files for pagerank start with pr_job.sh.
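
The file naming pattern above can be captured in a small shell helper. This is only a sketch: the function names job_output_files and show_job_output are hypothetical and not part of the supplied scripts, and it assumes the output files are written to the current directory with the names described above.

```shell
# Hypothetical helpers (not part of the course materials).
# Assumes output files are named <job script>.o<job id> (stdout)
# and <job script>.e<job id> (stderr), as described above.

# Print the stdout and stderr file names for a job, one per line.
job_output_files() {
    local script="$1" job_id="$2"
    printf '%s.o%s\n' "$script" "$job_id"
    printf '%s.e%s\n' "$script" "$job_id"
}

# Print the contents of both output files, noting any that do not exist yet.
show_job_output() {
    local script="$1" job_id="$2" f
    for f in "$script.o$job_id" "$script.e$job_id"; do
        if [ -f "$f" ]; then
            printf '== %s ==\n' "$f"
            cat "$f"
        else
            printf 'not found (job may still be queued or running): %s\n' "$f"
        fi
    done
}
```

For example, show_job_output bfs_job.sh 1698 would print the stdout file followed by the stderr file for job 1698, if they exist.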

Checking Job Status

Once your job has been submitted, you can see the status of a job by running

qstat -u [username]

or delete accidental job submissions with

qdel [job id]
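
If you would rather not rerun qstat by hand, a short polling loop can wait for a job to leave the queue. The sketch below is an assumption-laden illustration, not part of the supplied scripts: job_in_queue and wait_for_job are hypothetical names, and the pattern assumes each job line in the qstat output begins with the numeric job ID (optionally followed by a dot and a server name), which may differ on this cluster's scheduler.

```shell
# Hypothetical helpers (not part of the course materials).
# Assumption: job lines printed by `qstat -u <user>` start with the job ID.

# Succeed if the given job ID still appears in the user's qstat listing.
job_in_queue() {
    local job_id="$1" user="${2:-${USER:-}}"
    qstat -u "$user" 2>/dev/null |
        grep -Eq "^[[:space:]]*${job_id}([.[:space:]]|\$)"
}

# Poll until the job no longer appears, sleeping between checks.
wait_for_job() {
    local job_id="$1" interval="${2:-30}"
    while job_in_queue "$job_id"; do
        sleep "$interval"
    done
    echo "job $job_id is no longer in the queue"
}
```

For example, wait_for_job 1698 60 would check once a minute. Keep the interval generous so the polling itself does not load the head node.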

The current submission script allows you to run a maximum of one job at a time. If you resubmit while you have pending jobs in the queue, the script will first delete those jobs, and then submit a new job. This design is intended to make everyone play nice with other people using the cluster. (Don't hog the queue!)
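
The supplied submission script is not reproduced here. Purely as an illustration of the one-job-at-a-time policy, the logic might look something like the sketch below. Everything in it is an assumption: pending_job_ids and resubmit are hypothetical names, the parsing assumes job lines in the qstat output start with a numeric job ID, and it assumes jobs are submitted with qsub, which this guide does not confirm.

```shell
# Hypothetical sketch of "delete pending jobs, then submit a new one".
# This is NOT the actual submit.sh; the qstat parsing and the use of
# qsub are assumptions about the scheduler.

# Print the IDs of this user's queued or running jobs, one per line.
pending_job_ids() {
    qstat -u "${1:-${USER:-}}" 2>/dev/null |
        awk '$1 ~ /^[0-9]+/ { sub(/\..*$/, "", $1); print $1 }'
}

# Delete any jobs still in the queue, then submit the given job script.
resubmit() {
    local job_script="$1" id
    for id in $(pending_job_ids); do
        qdel "$id"
    done
    qsub "$job_script"
}
```

In practice you should keep using the provided script; the sketch only shows why a second submission replaces the first instead of queuing behind it.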