UMBC High Performance Computing Facility
Please note that this page is under construction. We are documenting the 240-node cluster maya that will be available after Summer 2014. Currently, the 84-node cluster tara still operates independently, until it becomes part of maya at the end of Summer 2014. Please see the 2013 Resources Pages under the Resources tab for tara information.
How to Compile C programs on maya

Introduction

In this tutorial we will demonstrate compilation of C source code on maya. First we start with a simple serial example, then work our way to compiling parallel code. Once the code is compiled we will see how to run it. We will assume that you know some basic C, so the code will not be explained in much detail. Working on a distributed cluster like maya is fundamentally different from working on a standard server (like gl.umbc.edu) or a personal computer, so please make sure to read and understand this material. More details can be found in manual pages on the system (e.g. try the command "man mpicc").

A convenient way to save the example code to your account is as follows. There is a "download" link under each code example. You can copy this link from your browser and issue the following command in your maya terminal session.

[araim1@maya-usr1 ~]$ wget <paste_the_link_here>

For example

[araim1@maya-usr1 ~]$ wget http://www.umbc.edu/hpcf/code/hello_serial/hello_serial.c
--16:08:24--  http://www.umbc.edu/hpcf/code/hello_serial/hello_serial.c
Resolving www.umbc.edu... 130.85.12.11
Connecting to www.umbc.edu|130.85.12.11|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 183 [text/plain]
Saving to: `hello_serial.c'

100%[======================================================================================>] 183         --.-K/s   in 0s     

16:08:24 (29.1 MB/s) - `hello_serial.c' saved [183/183]

[araim1@maya-usr1 ~]$ ls
hello_serial.c
[araim1@maya-usr1 ~]$

We have shown the prompt in the examples above to emphasize that a command is being issued. When following the examples, your prompt may look a bit different (e.g. your own username will be there!). When following along, be careful to only issue the command part, and not the prompt or the example output.

Serial Hello World

We will write a simple "hello world" program that prints the name of the host machine. Here is the code
#include <stdio.h>
#include <unistd.h>

int main(int argc, char* argv[])
{
    char hostname[256];
    gethostname(hostname, 256);

    printf("Hello world from %s\n", hostname);

    return 0;
}

Download: ../code/hello_serial/hello_serial.c

Once you have saved this code to your account, try to compile it. There are several C compilers on the system. We will demonstrate the Intel C compiler, which is the default on maya.

[araim1@maya-usr1 hello_serial]$ icc hello_serial.c -o hello_serial
[araim1@maya-usr1 hello_serial]$

If successful, no warnings will appear and an executable "hello_serial" will have been created.

[araim1@maya-usr1 hello_serial]$ ls
hello_serial  hello_serial.c

To see how to run your serial executable on the cluster, jump to how to run serial programs.


Parallel Hello World

Now we will compile a "hello world" program which can be run in parallel on multiple processors. Save the following code to your account.
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int id, np;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int processor_name_len;

    MPI_Init(&argc, &argv);

    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Get_processor_name(processor_name, &processor_name_len);

    printf("Hello world from process %03d out of %03d, processor name %s\n", 
        id, np, processor_name);

    MPI_Finalize();
    return 0;
}

Download: ../code/hello_parallel/hello_parallel.c

This version of the "hello world" program collects several pieces of information at each MPI process: the MPI processor name (i.e., the hostname), the process ID (its rank within MPI_COMM_WORLD), and the number of processes in our job. Notice that we needed a new header file mpi.h to get access to the MPI functions. We also needed to call MPI_Init before using any of them, and MPI_Finalize at the end to clean up. Try to compile the code with the following command.

[araim1@maya-usr1 hello_parallel]$ mpicc hello_parallel.c -o hello_parallel
hello_parallel.c:
[araim1@maya-usr1 hello_parallel]$

After a successful compilation with no warnings, an executable "hello_parallel" should have been created

[araim1@maya-usr1 hello_parallel]$ ls
hello_parallel  hello_parallel.c
[araim1@maya-usr1 hello_parallel]$ 

To see how to run your parallel executable on the cluster, jump to how to run parallel programs.

In this example, we've written output from our MPI program to stdout. As a general guideline, stdout and stderr should be used for reporting status information, and not for returning large datasets. If your program does need to write out a lot of data, it would be more appropriate to use file I/O instead.

Choosing a Compiler and MPI Implementation

In the parallel code example, we've used a special compilation command, "mpicc", that knows how to generate a parallel executable. "mpicc" is actually not a new compiler, but a wrapper command that calls one of the underlying C compilers (such as icc, pgcc, or gcc). It also relies on a specific MPI implementation, several of which are available on maya.

When you compile MPI programs, the compiler needs information about where to find the MPI libraries and which libraries to link to. Fortunately you don't have to worry about this since the MPI implementations provide wrapper scripts which call the compiler for you. These scripts are mpicc (for C), mpicxx (for C++), mpif90 (for Fortran 90), and mpif77 (for Fortran 77). In order to successfully compile or run any MPI program, you must have your PATH, LD_LIBRARY_PATH, and other pieces of your environment set correctly so that your shell can find the wrapper script and the MPI libraries. This configuration is set by loading the appropriate modules.

By default, your account is set up to use the Intel compiler with the MVAPICH2 MPI implementation. To verify this, issue the following command.

[araim1@maya-usr1 ~]$ module list
  1) gcc/4.8.1                     3) intel/2013_sp1
  2) slurm/2.5.7                   4) mvapich2/intel/2013_sp1/1.9
[araim1@maya-usr1 ~]$ 

Generally, the Intel and Portland Group compilers produce more highly optimized programs which take advantage of the specific architecture, while GCC produces programs that are more likely to be portable across architectures (e.g. so a program copied from another machine is likely to still run). Therefore, the Intel and Portland Group compilers are recommended for programs requiring the best possible performance.

Other compilers and MPI implementations can be loaded via the module command; see Using modules on maya for more information.

Users of tara should note that the module system replaces the switcher. Configurations on maya are accessed by loading and unloading modules.

Above we saw that we had a specific compiler/MPI combination loaded. The compiler and MPI implementation are combined because the MPI libraries your code uses at runtime should have been compiled with the same compiler you're now using to compile your code. That means you have to pick one of the combinations first, before both compiling and running your program. It also means that if you change to a different combination, you'll need to recompile your code before running it.

In the tutorials, we always assume you are using the default module setting, Intel + MVAPICH2. MVAPICH2 and OpenMPI are about equally easy to use, but we've found in preliminary benchmarks that MVAPICH2 consistently performs better for jobs with larger numbers of processes.

Another useful thing to mention is the "-show" flag for the MPI compiler commands. It displays the underlying compiler command line, including all include and library options, that the wrapper would run. For example:
[araim1@maya-usr1 hello-parallel]$ mpicc -show
icc -I/cm/shared/apps/slurm/2.5.7/include -L/cm/shared/apps/slurm/2.5.7/lib64 -L
/cm/shared/apps/slurm/2.5.7/lib -L/cm/shared/apps/slurm/2.5.7/lib64 -L/cm/shared
/apps/slurm/2.5.7/lib -I/usr/cluster/mvapich2/1.9/intel/2013_sp1/include -L/usr/
cluster/mvapich2/1.9/intel/2013_sp1/lib -lmpich -lpmi -lopa -lmpl -libmad -lrdma
cm -libumad -libverbs -lrt -lpmi -lpthread
[araim1@maya-usr1 ~]$ 
We can see that a call to "mpicc" will invoke the icc compiler with MVAPICH2 options set, as we would expect from our module setup.

Logging which nodes are used

For a parallel program, it's always a good idea to log which compute nodes you've used. We can modify our parallel hello world a little bit to accomplish this. Running this example on the cluster works the same way as for the Parallel Hello World program; see how to run parallel jobs.

Logging which nodes are used - Version 1

First we'll start by logging some information to stdout. If you ran the Parallel Hello World example above, you probably noticed that the processes reported back in a random order. This is a bit difficult to read, so let's try sorting the responses before we print them out. To do this, we'll have the process with ID 0 receive greeting messages from every other process, in order by process ID. Process 0 will handle writing the messages to stdout.
#include <stdio.h>
#include <mpi.h>
#include <string.h>

int main(int argc, char* argv[])
{
    int id, np, processor_name_len;
    int j;
    int dest;
    int tag = 0;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    char message[100];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Get_processor_name(processor_name, &processor_name_len);

    snprintf(message, sizeof(message), 
        "Process %03d out of %03d running on processor %4s", 
        id, np, processor_name);

    if (id == 0)
    {
        printf("%s\n", message);

        for (j = 1; j < np; j++)
        {
            MPI_Recv(message, 100, MPI_CHAR, j, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }
    else
    {
        dest = 0;
        MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

Download: ../code/hello_send_recv/hello_send_recv.c
Message sending is accomplished using the MPI_Send function, and receiving with the MPI_Recv function. Each process prepares its own message, then execution varies depending on the current process. Process 0 writes its own message first, then receives and writes the others in order by process ID. All other processes simply send their message to process 0.

Logging which nodes are used - Version 2

Now we'll make a few improvements to the first version, (1) to make it look more like a real project and (2) to create a useful utility function. Download the following files
#include <stdio.h>
#include <mpi.h>
#include "nodes_used.h"

int main(int argc, char* argv[])
{
    int id, np, processor_name_len;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Get_processor_name(processor_name, &processor_name_len);

    FILE* log_fp = NULL;
    if (id == 0)
    {
        log_fp = fopen("nodes_used.log", "w");
    }

    /* Log our MPI processes to the log file. We could also have
       specified the special FILE* names "stdout" or "stderr" here */
    log_processes(log_fp, id, np, processor_name);

    if (id == 0)
    {
        fclose(log_fp);
    }

    MPI_Finalize();
    return 0;
}


Download: ../code/hello_send_recv-2/hello_send_recv.c
#ifndef NODES_USED_H
#define NODES_USED_H

#include <stdio.h>
#include <string.h>
#include <mpi.h>

void log_processes(FILE* fp, int id, int np, char* processor_name);

#endif

Download: ../code/hello_send_recv-2/nodes_used.h
#include "nodes_used.h"

void log_processes(FILE* fp, int id, int np, char* processor_name)
{
    int j, dest;
    char message[100];
    int tag = 0;
    MPI_Status status;

    snprintf(message, sizeof(message),
        "Process %03d out of %03d running on processor %4s",
        id, np, processor_name);

    if (id == 0)
    {
        fprintf(fp, "%s\n", message);

        for (j = 1; j < np; j++)
        {
            MPI_Recv(message, 100, MPI_CHAR, j, tag, MPI_COMM_WORLD, &status);
            fprintf(fp, "%s\n", message);
        }
    }
    else
    {
        dest = 0;
        MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }
}

Download: ../code/hello_send_recv-2/nodes_used.c
OBJS := nodes_used.o hello_send_recv.o

EXECUTABLE := hello_send_recv

DEFS := 
CFLAGS := -g -O3 

INCLUDES :=
LDFLAGS := -lm 

CC := mpicc

%.o: %.c %.h
	$(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@

$(EXECUTABLE): $(OBJS)
	$(CC) $(CFLAGS) $(INCLUDES) $(OBJS) -o $@ $(LDFLAGS)

clean:
	rm -f *.o $(EXECUTABLE)


Download: ../code/hello_send_recv-2/Makefile
Warning: if you copy and paste the Makefile text from your browser, you will lose some of the formatting. Namely, some of the lines need to begin with tabs. The easiest way to avoid this problem is to download the file using the link.
Now it should be a simple matter to compile the program.
[araim1@maya-usr1 hello_send_recv-2]$ make
mpicc -g -O3   -c nodes_used.c -o nodes_used.o
mpicc -g -O3    -c -o hello_send_recv.o hello_send_recv.c
mpicc -g -O3   nodes_used.o hello_send_recv.o -o hello_send_recv -lm 
[araim1@maya-usr1 hello_send_recv-2]$ ls
Makefile                                nodes_used.c
hello_send_recv                          nodes_used.h
hello_send_recv.c                        nodes_used.o
hello_send_recv.o
[araim1@maya-usr1 hello_send_recv-2]$
It's also simple to clean up the project (object files and executables)
[araim1@maya-usr1 hello_send_recv-2]$ make clean
rm -f *.o hello_send_recv
[araim1@maya-usr1 hello_send_recv-2]$ ls
Makefile  hello_send_recv.c  nodes_used.c  nodes_used.h
[araim1@maya-usr1 hello_send_recv-2]$
Now the nodes_used.h and nodes_used.c files can be copied to other projects, and used as a utility.

Compiling C programs on other hardware

Compiling on GPUs

For instructions on how to compile code for the GPU, see CUDA for GPU.

Compiling on Intel Phi coprocessors

For instructions on how to compile code for the Intel Phi coprocessor, see Intel Phi.