Using Latedays for Assignment 2

To provide additional machines to test your Assignment 2 on, we're giving you access to the latedays cluster.

To use it, first pull the latest updates from the assignment repo by running:

git pull

If you have already made changes to your code, you may want to commit them first and then pull with rebase, i.e.

git add -A
git commit -m "Saving current progress"
git pull --rebase

Getting Started on the Latedays Cluster

The latedays cluster is organized as a single head node and 17 worker nodes that process jobs submitted to a job queue. You ssh into the head node latedays.andrew.cmu.edu with your andrew login, just like any other remote Linux box. You can develop and test code on this machine, but you should never run long-running jobs on it. Doing so slows down the machine for everyone logged into it, and that's basically the entire class!

From the head node you'll submit jobs, and when your job reaches the front of the queue, it will run on the cluster's worker nodes. Note that when your job runs on worker nodes, your code is the only program running on these machines. This performance isolation is very helpful when testing and analyzing the performance of your programs.

Each node in the cluster has the same CPU and memory specs:

Although the CPU resources in all nodes are the same, the nodes differ in terms of what compute accelerators are present. 14 of the worker nodes (and also the head node) contain:

The other three worker nodes do not contain a Phi, but contain:

At the moment, you have a home directory on latedays that is not your andrew home directory. (You have a 2GB quota.) However, your andrew home directory is mounted as ~/AFS.

At this time you cannot work out of your ~/AFS directory, because that directory is not mounted when your job runs on the worker nodes of the cluster. (It is only mounted and accessible on the head node.) For now, we recommend developing your code in /home/USERNAME/. If you are using a private repository, consider cloning it directly into your home directory on latedays.

To develop on latedays, if using bash, add the following to your .bashrc:

module load gcc-4.9.2
export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:${LD_LIBRARY_PATH}
export PATH=/usr/local/cuda/bin:${PATH}
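Because .bashrc can be sourced more than once (for example in nested shells), the exports above may add the same directory repeatedly. The following is a minimal sketch of one way to avoid that; path_prepend is a hypothetical helper name, not part of the course setup, and the CUDA paths are the ones given above.

```shell
# Hypothetical helper (not part of the course setup): prepend a directory to a
# colon-separated path variable only if it is not already listed there.
path_prepend() {
    # $1: name of the variable (e.g. PATH), $2: directory to prepend
    local var_name="$1" dir="$2" current=""
    eval "current=\${$var_name:-}"          # read the variable named by $1
    case ":${current}:" in
        *":${dir}:"*) ;;                    # already present, leave unchanged
        *) export "${var_name}=${dir}${current:+:${current}}" ;;
    esac
}

path_prepend PATH /usr/local/cuda/bin
path_prepend LD_LIBRARY_PATH /usr/local/cuda/lib64/
```

If the variable is empty or unset, the helper sets it to the directory alone, so no stray leading or trailing colon is introduced.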

If using csh, add the following to your .cshrc:

module load gcc-4.9.2
setenv PATH /usr/local/cuda/bin:${PATH}
setenv LD_LIBRARY_PATH /usr/local/cuda/lib64/:${LD_LIBRARY_PATH}

To build the starter code for render on latedays:

git clone https://github.com/cmu15418/assignment2
cd assignment2/render
module load gcc-4.9.2
make

To submit a job to the job queue, use the default submission script we've provided you: run_latedays.sh. This submits the default job, latedays.qsub, which will run make check on the node it's assigned to, and record the output.

./run_latedays.sh

After a successful submission, the script will echo the name of your job. For example:

1337.latedays.andrew.cmu.edu

Now that you've submitted a job, you can check the status of the queue via one of the following commands:

showq
qstat

For example, the following output indicates that amellon's job is completed (C), acarnegie's job is running (R), and haxxor's job is enqueued (Q), waiting for an available machine. (In practice, more than one job would run at a time; this is just an example.)

[haxxor@latedays render]$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1335.latedays             latedays.qsub    amellon         00:00:52 C tesla
1336.latedays             latedays.qsub    acarnegie       00:00:43 R tesla
1337.latedays             latedays.qsub    haxxor                 0 Q tesla

You can filter for just your job using:

qstat -u [username]
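If you only want the state letter for one job, the plain qstat listing can be filtered with awk. This is a rough sketch that assumes the six-column layout shown in the sample output above (the output of qstat -u may be laid out differently); job_state is a hypothetical helper name.

```shell
# Hypothetical helper: read plain `qstat` output on stdin and print the state
# letter (e.g. Q, R or C) for one job. Assumes the column layout shown in the
# sample listing, where the state is the fifth whitespace-separated field.
job_state() {
    # $1: numeric job ID, e.g. 1337
    awk -v id="$1" '
        {
            split($1, parts, ".")           # "1337.latedays" -> "1337"
            if (parts[1] == id) { print $5; exit }
        }'
}
```

On the head node this would be used as qstat | job_state 1337. It prints nothing if the job no longer appears in the listing.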

When your job is complete, log files from stdout and stderr will be placed in your working directory as:

latedays.qsub.o1337
latedays.qsub.e1337
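The log file names follow from the job name echoed at submission: the number before the first dot is appended to the job script name after .o or .e. A small sketch of that mapping, assuming the naming shown above (job_log_files is a hypothetical helper name):

```shell
# Hypothetical helper: derive the stdout/stderr log file names for a job.
job_log_files() {
    # $1: job name as echoed at submission, e.g. 1337.latedays.andrew.cmu.edu
    # $2: job script name (defaults to latedays.qsub)
    local job_name="$1" script="${2:-latedays.qsub}"
    local job_id="${job_name%%.*}"          # keep the text before the first dot
    printf '%s.o%s\n%s.e%s\n' "$script" "$job_id" "$script" "$job_id"
}
```

For the example job, job_log_files 1337.latedays.andrew.cmu.edu prints latedays.qsub.o1337 and latedays.qsub.e1337 on separate lines.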

Sample output of latedays.qsub.o1337 is:

Hostname is  compute-0-17.local
mkdir -p objs/
g++ -m64 -O3 -Wall -g -o render objs/main.o objs/display.o ...
./checker.pl

--------------
Running tests:
--------------

Scene : rgb
Correctness passed!
Your time : 0.2485
Reference Time: 0.2496
... [followed by the rest of autograder output]
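To compare your times against the reference across scenes, the log can be summarized with awk. The sketch below relies only on the three line formats visible in the sample above ("Scene :", "Your time :", "Reference Time:"); the rest of the autograder output is not shown here, so treat this as an assumption about its layout. scene_timings is a hypothetical helper name.

```shell
# Hypothetical helper: read a job's stdout log on stdin and print one line per
# scene: scene name, your time, reference time, and the ratio yours/reference.
scene_timings() {
    awk -F':' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        /^Scene/          { scene = trim($2) }
        /^Your time/      { mine = trim($2) }
        /^Reference Time/ {
            ref = trim($2)
            if (ref + 0 > 0)                # skip malformed or zero reference times
                printf "%s %s %s %.3f\n", scene, mine, ref, mine / ref
        }'
}
```

A typical invocation would be scene_timings < latedays.qsub.o1337. A ratio below 1 means your renderer was faster than the reference on that scene.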

The maximum wall-clock time for any job is currently limited to six minutes. (See the comment in latedays.qsub about changing the wall-clock time of your job if you need to increase it temporarily for debugging.)

If the stdout file (e.g. latedays.qsub.o1337) doesn't look right, try looking at the corresponding stderr file (e.g. latedays.qsub.e1337) for errors.

For more information on the job queue, and creating your own jobs, see this article.

Also, note that latedays requires a different build than GHC machines, which is what render_ref_latedays is for. If you transition from GHC machines to latedays, or vice versa, you will want to run make clean && make to fully rebuild the executable.
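One way to avoid forgetting the full rebuild is to record which environment last built the tree. The following is only a sketch under assumptions: build_env_changed, the stamp file name, and the environment labels are all hypothetical and not part of the starter code.

```shell
# Hypothetical helper: succeed (exit status 0) when the stamp file is missing or
# names a different environment than the current one, then update the stamp.
build_env_changed() {
    # $1: stamp file path
    # $2: label for the current environment, e.g. "latedays" or "ghc"
    local stamp="$1" env_label="$2" previous=""
    if [ -f "$stamp" ]; then
        previous=$(cat "$stamp")
    fi
    printf '%s\n' "$env_label" > "$stamp"
    [ "$previous" != "$env_label" ]
}
```

In the render directory on latedays, this could be used as: if build_env_changed .build_env latedays; then make clean; fi; make. The stamp file lives in the working tree, so it moves with your checkout between machines.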