Using the Xeon Phi on Latedays
By caryy

Setting up your environment

Several environment variables must be set before you compile or run programs built with the Intel C Compiler. Fortunately, they can all be set at once with the following command:

source /opt/intel/bin/compilervars.sh intel64

Writing and compiling programs for the Xeon Phi

Because the Xeon Phi is a very new piece of hardware, documentation is fairly limited. However, Intel provides numerous fully compilable sample programs under

/opt/intel/composerxe/Samples/en_US/C++/mic_samples/

These samples can be copied into your home directory and compiled to test out the Phis. Note, however, that the head node (latedays.andrew.cmu.edu) does not have a Xeon Phi, so the samples cannot be run directly from there. You can either SSH into a compute node and run them manually, or submit a job to the PBS job queue (see the next section).

For a brief overview of Xeon Phi, we'll take a look at intro_sampleC/sampleC03.c:

#include <stdio.h>  // needed for printf below

typedef double T;

#define SIZE 1000

// Variables may be decorated as a group using the push/pop method

#pragma offload_attribute(push, target(mic))
static T in1_03[SIZE];
static T in2_03[SIZE];
static T res_03[SIZE];
#pragma offload_attribute(pop)

static void populate_03(T* a, int s);

void sample03()
{
    int i;

    populate_03(in1_03, SIZE);
    populate_03(in2_03, SIZE);

    #pragma offload target(mic) optional in(in1_03, in2_03) out(res_03)
    {
        for (i=0; i<SIZE; i++)
        {
            res_03[i] = in1_03[i] + in2_03[i];
        }
    }

    if (res_03[0] == 0 && res_03[SIZE-1] == 2*(SIZE-1))
        printf("PASS Sample03\n");
    else
        printf("*** FAIL Sample03\n");
}

static void populate_03(T* a, int s)
{
    int i;

    for (i=0; i<s; i++)
    {
        a[i] = i;
    }
}

As you can see, programming for the Xeon Phi is directive-based: you annotate ordinary C code with #pragma statements, much as you do with OpenMP.

There are a couple of important parts of this program to look at:

#pragma offload_attribute(push, target(mic))
...
#pragma offload_attribute(pop)

This section tells the compiler that all of the variables declared between the two statements should be visible on the Phi (identified as mic, which stands for Many Integrated Core). Note that variables declared this way are always visible on the host. Their contents are copied to the Phi only when an offload computation begins, and copied back to the host only when it ends.
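The same push/pop decoration also covers functions, and any function called from inside an offloaded block needs it. The following is a minimal sketch, assuming the Intel compiler's offload support is enabled. The names N_03, buf_03, scale_03 and run_scale_03 are hypothetical and are not taken from Intel's samples.

```c
#define N_03 8   /* hypothetical size, chosen only for this sketch */

/* Everything declared between push and pop is built for both host and Phi. */
#pragma offload_attribute(push, target(mic))
static double buf_03[N_03];

/* Hypothetical helper: doubles every element of a. */
static void scale_03(double *a, int n)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] *= 2.0;
}
#pragma offload_attribute(pop)

/* Fills buf_03 on the host, scales it inside an offload block,
   and returns the last element so the result can be checked. */
double run_scale_03(void)
{
    int i;
    for (i = 0; i < N_03; i++)
        buf_03[i] = i;

    /* inout: copy buf_03 to the Phi before the block and back afterwards */
    #pragma offload target(mic) inout(buf_03)
    {
        scale_03(buf_03, N_03);
    }

    return buf_03[N_03 - 1];
}
```

A compiler that does not recognize these pragmas will typically ignore them, in which case the block simply runs on the host and produces the same result.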

#pragma offload target(mic) optional in(in1_03, in2_03) out(res_03)
{
...
}

All statements in this scope will be offloaded to the Phi, if possible.

  • optional tells the runtime to run the code in this block on the host if no Phi is available. You should probably leave it out of your programs, so that it is obvious when the code did not run on the Phi.
  • in(in1_03, in2_03) tells the runtime to copy in1_03 and in2_03 to the Phi at the beginning of the offload execution.
  • out(res_03) tells the runtime to copy res_03 from the Phi to the host at the end of the block.

No other data will be copied!
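The sample above uses statically sized arrays, so the runtime knows how much data to move. For data reached through a pointer, the Intel offload extensions accept a length(...) modifier giving the number of elements to copy. The sketch below is an illustration under that assumption, not part of Intel's samples: add_vectors and ran_on_phi are hypothetical names. It also uses the __MIC__ macro, which the Intel compiler defines only when compiling code for the coprocessor, so that a program that keeps optional can still tell where the block ran.

```c
/* Hypothetical helper: res[i] = a[i] + b[i] for n elements, offloaded if possible.
   Returns 1 if the block ran on the Phi, 0 if it ran on the host. */
int add_vectors(const double *a, const double *b, double *res, int n)
{
    int i;
    int ran_on_phi = 0;

    /* Pointers carry no size, so each clause says how many elements to copy. */
    #pragma offload target(mic) optional \
            in(a : length(n)) in(b : length(n)) \
            out(res : length(n)) out(ran_on_phi)
    {
#ifdef __MIC__
        ran_on_phi = 1;   /* this branch is compiled only for the coprocessor */
#else
        ran_on_phi = 0;   /* host fallback */
#endif
        for (i = 0; i < n; i++)
            res[i] = a[i] + b[i];
    }

    return ran_on_phi;
}
```

Calling add_vectors(a, b, res, n) from host code fills res either way. The return value only reports which side did the work.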

This style of programming also works really well with OpenMP to launch multiple threads on the Phi. An example of this is in intro_sampleC/sampleC08.c.
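As a rough illustration of the combination (this is not the contents of sampleC08.c), the sketch below places an OpenMP parallel loop inside an offload block. It assumes OpenMP is enabled in the compiler. The names N_OMP, x_omp, y_omp and scaled_copy_omp are hypothetical.

```c
#define N_OMP 1000   /* hypothetical size for this sketch */

#pragma offload_attribute(push, target(mic))
static double x_omp[N_OMP];
static double y_omp[N_OMP];
#pragma offload_attribute(pop)

/* Computes y = 2*x + 1 with the loop divided among threads on the Phi,
   then returns the sum of y so the result is easy to check. */
double scaled_copy_omp(void)
{
    int i;
    double sum = 0.0;

    for (i = 0; i < N_OMP; i++)
        x_omp[i] = i;

    #pragma offload target(mic) in(x_omp) out(y_omp)
    {
        /* Iterations are independent, so OpenMP can split them across threads. */
        #pragma omp parallel for
        for (i = 0; i < N_OMP; i++)
            y_omp[i] = 2.0 * x_omp[i] + 1.0;
    }

    /* Back on the host: y_omp has been copied out of the Phi. */
    for (i = 0; i < N_OMP; i++)
        sum += y_omp[i];

    return sum;
}
```

The offload pragma decides where the block runs and the OpenMP pragma decides how the loop is divided among threads there, so the two can be added or removed independently.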

More information about the Intel-specific #pragma statements can be found in Intel's compiler documentation.

Running the compiled programs using the job queue

To run your compiled programs through the job queue on latedays, use the simple job script below, replacing exe_name with the name of your binary:

#!/bin/sh
#PBS -l nodes=1:ppn=24
#PBS -l walltime=0:6:00

cd $PBS_O_WORKDIR
pbsdsh -u $PBS_O_WORKDIR/exe_name

This job script should be copied into a file named something like latedays.qsub and submitted to the job queue with qsub latedays.qsub.

The script is deliberately simple: OpenMP and offload programs run on a single node and cannot communicate between nodes on their own. Using OpenMPI alongside OpenMP and offload programming is possible, but it is outside the scope of this article.

Further Reading

More information about programming on the Xeon Phi:

OpenCL is also an option for programming the Phi: