Blacklight is a different development environment from what most of you are used to. This article should help ease the pain of learning how to use Blacklight. Note that it is not a substitute for reading the information about Blacklight.
Creating a Blacklight account
If you haven't created a Blacklight account, do so immediately. There can be a delay of several business days between signing up for an account and getting access.
Setting up your Blacklight environment
Blacklight uses the module system to manage the developer configuration. To set everything up, you just need to run:
module load openmpi/1.6/gnu
module load python
module load mpt
You might want to add these lines to your ~/.bashrc or equivalent.
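As a minimal sketch of what that ~/.bashrc addition might look like, the snippet below wraps the three module load commands in a function and skips them on machines where the module command is not defined. The function name load_blacklight_modules is made up for illustration; the module names are the ones listed above.

```shell
# Hypothetical helper for ~/.bashrc; the function name is illustrative.
load_blacklight_modules() {
    # Skip quietly on machines where the `module` command is not defined.
    if ! type module >/dev/null 2>&1; then
        return 0
    fi
    module load openmpi/1.6/gnu
    module load python
    module load mpt
}

load_blacklight_modules
```

The guard is there so that a ~/.bashrc shared between Blacklight and other machines does not print errors at login.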
Running on Blacklight
Blacklight lets you connect via ssh to a head node, where you can do some small development and add jobs to the job queue. Since anything you do on the head node disrupts work done in the queues, it is really impolite to let any process run there for any significant length of time. Instead, you can use a special debug queue with (relatively) quick turnaround times. Note that jobs in the debug queue are limited and can run with only 16 processors.
Here is an example job:
#!/bin/bash
#Allocate only as many resources as you need
#It helps reduce queueing time for your group and other groups
#ncpus must be a multiple of 16
#PBS -l ncpus=16
#pmem can be up to 8gb
#use 8gb when you are working with large data sets
#PBS -l pmem=2gb
#PBS -l walltime=5:00
# Merge stdout and stderr into one output file
#PBS -j oe
#PBS -q batch
# use the name prog1.job
#PBS -N prog1.job
# Load mpi.
source /usr/share/modules/init/bash
module load openmpi/1.6/gnu
module load mpt
# Move to my $SCRATCH directory.
cd $SCRATCH
# Set this to the important directory.
execdir=PROGDIR
exe=parallelSort
# Set the argument for running your program
args="-s 10000000 -d exp -p 5"
# Copy executable to $SCRATCH.
cp $execdir/$exe $exe
# Run my executable
mpirun -np NCORES ./$exe $args
Pay attention to a couple of things:
- The top of the file has a lot of comments of the form
  #PBS flag_to_qsub
  These control some parameters of your job. You can also pass the same parameters directly as arguments to qsub. Either submit the job file as it is:
  qsub jobs/your.job
  or override parameters on the command line:
  qsub -l ncpus=16 -q debug jobs/your.job
- You use the $SCRATCH directory to store your temporary input files, etc.
- On Blacklight, we use omplace to control OpenMP. omplace wraps the program and pins each thread to a unique processor. Also remember that on Blacklight you effectively have dedicated access to each CPU.
- All paths are relative to execdir. If you are not using make jobs to generate job files, you will need to modify this variable.
Using the queue
There are a couple of useful commands for dealing with the job queue. You should probably run man on every one of these.
qsub: qsub is how you launch jobs. Feel free to learn more, but the basic usage is pretty simple:
$ qsub my_job_file.job
231778.tg-login1.blacklight.psc.teragrid.org
On a successful launch, it reports to you the job identifier.
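If you want to reuse the identifier in a script, one possible approach is to keep only the numeric part before the first dot. The helper name job_number below is hypothetical, and the sample identifier is the one from the example above.

```shell
# Strip the server suffix from a qsub job identifier.
# job_number is a hypothetical helper name.
job_number() {
    echo "${1%%.*}"
}

# Example with the identifier from the text; on Blacklight you would
# instead capture it with something like: full_id=$(qsub my_job_file.job)
full_id="231778.tg-login1.blacklight.psc.teragrid.org"
job_number "$full_id"
```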
qstat: qstat is how you get information about the job queue. It is used in the following way:
$ qstat -a
tg-login1.blacklight.psc.teragrid.org:
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
228983.tg-login1     blood    res_2    RMI_RNA_st 328959  --   192  --     400:0 R 342:5
228988.tg-login1     blood    res_2    BAB_RNA_al 336758  --   128  --     400:0 R 342:5
... <snip> ...
230995.tg-login1     gcommuni batch_l1 nz2        --      --   16   19000m 95:30 Q --
230996.tg-login1     gcommuni batch_l1 nz3        --      --   16   19000m 95:30 Q --
Total cpus requested from running jobs: 3536
Note that you can use this to see your approximate place in the queue:
qstat -a | awk '{print NR" "$0}' | grep $USER
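The one-liner above counts header lines too, which is why the position is only approximate. A slightly stricter sketch is shown below: it counts only rows that start with a numeric job ID and compares the Username column exactly. The function name queue_position is hypothetical, and the column layout is assumed to match the sample output above.

```shell
# Read `qstat -a` output on stdin and print "position job_id" for each job
# owned by the given user, counting only job rows (not header lines).
# queue_position is a hypothetical helper name.
queue_position() {
    awk -v user="$1" '
        $1 ~ /^[0-9]+\./ {          # job rows start with a numeric job ID
            pos++
            if ($2 == user) print pos, $1
        }'
}
```

It would be used as: qstat -a | queue_position "$USER". Running jobs are counted as well as queued ones. The Username column in the sample output looks truncated to 8 characters, so a longer user name may need to be shortened by hand before it matches.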
qdel: qdel is used to cancel a job. If you accidentally launch a job you didn't mean to, you can always cancel it with:
$ qdel 231778
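When several jobs were launched by mistake, it can help to list the ones that are still waiting before cancelling them one by one. The sketch below only prints job numbers and never calls qdel itself. The name queued_job_ids is hypothetical, and it assumes the qstat -a column layout shown earlier, with the state in the tenth column.

```shell
# Read `qstat -a` output on stdin and print the numeric IDs of the given
# user's jobs that are still queued (state column "Q").
# queued_job_ids is a hypothetical helper name.
queued_job_ids() {
    awk -v user="$1" '
        $1 ~ /^[0-9]+\./ && $2 == user && $10 == "Q" {
            split($1, parts, ".")   # keep only the number before the first dot
            print parts[1]
        }'
}
```

It would be used as: qstat -a | queued_job_ids "$USER". Review the printed numbers, then pass the ones you really want to cancel to qdel.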