UMBC High Performance Computing Facility
Please note that this page is under construction. We are documenting the 240-node cluster maya that will be available after Summer 2014. Currently, the 84-node cluster tara still operates independently, until it becomes part of maya at the end of Summer 2014. Please see the 2013 Resources Pages under the Resources tab for tara information.
How to run MATLAB programs on maya

Introduction

Running MATLAB on the cluster's compute nodes is similar to running any other serial job. Make sure you've read the tutorial for C programs first to understand the basics. We will not demonstrate any parallel code here, so reading just the serial section is enough for now. A basic introduction to running MATLAB in the computer labs at UMBC is available on the CIRC webpage.

For more information about the software, see the MATLAB website. To use MATLAB on the cluster, the MATLAB module must be loaded.

[araim1@maya-usr1 ~]$ module load matlab

Performing Calculations on the Cluster Nodes

Let's try to run this sample MATLAB program, given below
% Generate two 100x100 matrices with random contents:
A=rand(100);
B=rand(100);

% Multiply the two matrices:
AB=A*B;

% Calculate the sum of the contents:
sumAB=sum(AB(:));

% Save the AB and sumAB variables to the Matlab save file out.mat:
save out.mat AB sumAB;


Download: ../code/matrixmultiply-matlab/matrixmultiply.m
As always, we will need a batch script too
#!/bin/bash
#SBATCH --job-name=matrixmultiply
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop

matlab -nodisplay -r "matrixmultiply, exit"

Download: ../code/matrixmultiply-matlab/run.slurm
Note that we've requested a single core of one node for our job, since the batch script does not ask for any additional tasks. This helps to yield the best throughput of MATLAB jobs on the cluster. See the technical report HPCF-2009-1 (Sharma & Gobbert) on the publications page for more details. We can run our batch script in the usual way
[araim1@maya-usr1 matrixmultiply-matlab]$ sbatch run.slurm
sbatch: Submitted batch job 2621
[araim1@maya-usr1 matrixmultiply-matlab]$
After your job completes, you should see an out.mat MATLAB save file in your directory. Later on, if you want to get the data out of that file, you can use the load command in MATLAB:
>> load out.mat
which will load in the AB and sumAB variables that you saved using your save command. Your directory should also contain slurm.out and slurm.err files. The slurm.err file should be empty and the slurm.out file should contain something like this
[araim1@maya-usr1 matrixmultiply-matlab]$ cat slurm.out
                            < M A T L A B (R) >
                  Copyright 1984-2008 The MathWorks, Inc.
                         Version 7.6.0.324 (R2008a)
                             February 10, 2008


  To get started, type one of these: helpwin, helpdesk, or demo.
  For product information, visit www.mathworks.com.

[araim1@maya-usr1 matrixmultiply-matlab]$
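As a quick interactive check, the results saved by the batch job can be loaded back into MATLAB and inspected; the following is a minimal sketch (the value of sumAB will differ from run to run since the matrices are random)
% Load the variables saved by the batch job and look at them:
load out.mat
whos AB sumAB
disp(sumAB)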
You should be able to use any of the usual non-graphical MATLAB functionality if you follow the directions in this section. If you want to generate graphics in your MATLAB jobs, continue to the next section.

Generating Plots on the Cluster Nodes

As with all cluster jobs, you will need a batch script in order to run MATLAB
#!/bin/bash
#SBATCH --job-name=plotsine
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop

matlab -nodisplay -r "plotsine, exit"

Download: ../code/plotsine-matlab/run.slurm
Now you'll need the plotsine.m file that the script tries to run
% Sample the interval [0, 2*pi] and evaluate sin(x) at those points:
zero_to_2pi=linspace(0,2*pi,1000);
them_sine=sin(zero_to_2pi);

% Plot the curve and save it in three image formats:
plot(zero_to_2pi,them_sine);
print -dpng sine.png
print -deps sine.eps
print -djpeg sine.jpeg


Download: ../code/plotsine-matlab/plotsine.m
Now submit the batch script and wait for it to finish. After it finishes, you should see the following files
[araim1@maya-usr1 plotsine-matlab]$ ls
run.slurm      plotsine.m  sine.eps        sine.jpeg     
sine.png       slurm.err   slurm.out
[araim1@maya-usr1 plotsine-matlab]$ 
The sine.eps, sine.jpeg and sine.png files contain a plot of sin(x) from x=0..2*pi. The files are Encapsulated PostScript (.eps), JPEG (.jpeg) and Portable Network Graphics (.png) files, respectively. The slurm.err file should be empty and the slurm.out file should contain the same text as in the previous section. The three images you made should look something like this

PNG sine plot

The Encapsulated PostScript file (sine.eps) will be in grayscale since we used -deps instead of -depsc.
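If you want a color EPS with labeled axes, the plotting commands can be extended along the following lines; this is just a sketch (the -depsc device is the color counterpart of -deps, and the label text and output file name are our own choices)
% Add axis labels and a title, then save a color EPS:
plot(zero_to_2pi,them_sine);
xlabel('x');
ylabel('sin(x)');
title('Plot of sin(x) on [0, 2\pi]');
print -depsc sine_color.eps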

Checking memory in MATLAB programs

On the How to check memory usage page, we discuss various ways of monitoring memory usage, including logging it directly from your C code. In MATLAB, there doesn't seem to be a built-in way to do this (at least not in the Linux version). But with a small amount of work, we can add the capability ourselves.

First grab the C files memory.c and memory.h, which are also used in How to check memory usage

The following C code is written as a MEX function, the specific form that MATLAB can interface with. It calls the get_memory_usage_kb function defined in the C files above. The code retrieves the VmRSS and VmSize quantities for the current process (see How to check memory usage for more information), and returns them as a pair to MATLAB.
#include <sys/types.h>
#include <unistd.h>
#include "mex.h"
#include "memory.h"

/* MEX gateway function: returns [vmrss, vmsize] in KB for the current MATLAB process */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    int* data;
    long vmrss;
    long vmsize;

    if (nrhs > 0)
    {
        mexErrMsgTxt("Too many input arguments.");
    }

    /* Query VmRSS and VmSize for the current process */
    get_memory_usage_kb(&vmrss, &vmsize);

    /* First output argument: resident set size (VmRSS) */
    plhs[0] = mxCreateNumericMatrix(1, 1, mxUINT32_CLASS, mxREAL);
    data = mxGetData(plhs[0]);
    data[0] = vmrss;

    /* Second output argument: virtual memory size (VmSize) */
    plhs[1] = mxCreateNumericMatrix(1, 1, mxUINT32_CLASS, mxREAL);
    data = mxGetData(plhs[1]);
    data[0] = vmsize;
}
 

Download: ../code/check_memory-matlab/getmemusage.c
To compile this code, we need to use the Matlab MEX compiler. This is already installed on the cluster. We can use it as follows to compile our code. If the compilation succeeds, the file getmemusage.mexa64 is created.
[araim1@maya-usr1 check_memory-matlab]$ mex getmemusage.c memory.c
[araim1@maya-usr1 check_memory-matlab]$ ls
getmemusage.c  getmemusage.mexa64  memory.c  memory.h
[araim1@maya-usr1 check_memory-matlab]$
Now we can start up MATLAB and call our new getmemusage function just like any other function
[araim1@maya-usr1 check_memory-matlab]$ matlab -nodisplay

                                                   < M A T L A B (R) >
                                         Copyright 1984-2009 The MathWorks, Inc.
                                       Version 7.9.0.529 (R2009b) 64-bit (glnxa64)
                                                     August 12, 2009

 
  To get started, type one of these: helpwin, helpdesk, or demo.
  For product information, visit www.mathworks.com.
 
>> [vmrss, vmsize] = getmemusage

vmrss =

      101140


vmsize =

      933132
>> A = rand(5000, 5000);
>> [vmrss, vmsize] = getmemusage 

vmrss =

      298572


vmsize =

     1128448

>> 
Note that this approach has a few limitations. It can only keep track of memory used in the current process; MATLAB may invoke external processes for some tasks, and their memory usage will not be counted by this method.
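For convenience, the two return values can be wrapped in a small helper that prints a labeled snapshot; the following is a minimal sketch (the function name log_memusage and its output format are our own choices, not part of the cluster software)
function log_memusage(label)
% Hypothetical helper: print the current VmRSS and VmSize (in KB) with a label
[vmrss, vmsize] = getmemusage();
fprintf('%s: VmRSS = %d KB, VmSize = %d KB\n', label, vmrss, vmsize);
end
Saving this as log_memusage.m and calling, for example, log_memusage('after rand(5000)') at points of interest in a script then produces a simple memory trace in slurm.out.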

Parallel Programming

Access to the Parallel Computing Toolbox is now available on maya. This allows simple multicore programming; however, it is limited to single-node jobs. Several programming constructs are available in the Parallel Computing Toolbox; we will provide a simple example below. Detailed documentation on the use of the Parallel Computing Toolbox is available from MathWorks. Consider the following program, which is a multicore Hello World.
poolobj = parpool(8);
spmd
    msg = sprintf('Hello world from process %d of %d', labindex, numlabs);
end

for i=1:poolobj.NumWorkers
    disp(msg{i});
end
delete(poolobj);

Download: ../code/matlab-pct-hello/driver.m
The code starts up a parallel pool with 8 workers on the local machine and creates a parallel.Pool object which we call poolobj. Within the spmd block, the string msg is built on each worker. Notice the special variables labindex (the ID of the parallel worker) and numlabs (the number of parallel workers).

After the spmd block, the msg data is available on the client as a Composite that can be manipulated in serial. Here it consists of eight strings; we loop through and print each one. Notice that the number of workers can be accessed as poolobj.NumWorkers.
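As another small illustration of spmd, each worker can use labindex and numlabs to operate on its own slice of a problem; this is only a sketch (the variable names and the interleaved slicing are our own choices)
poolobj = parpool(8);
N = 1000;
spmd
    % Each worker sums an interleaved slice of 1..N:
    chunk = labindex:numlabs:N;
    partial = sum(chunk);
end
% Combine the per-worker results (partial is a Composite) on the client:
total = 0;
for k = 1:poolobj.NumWorkers
    total = total + partial{k};
end
total    % should equal N*(N+1)/2 = 500500
delete(poolobj);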

#!/bin/bash
#SBATCH --job-name=matlab-pct
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --ntasks-per-node=8

matlab -nodisplay -r "driver, exit"

Download: ../code/matlab-pct-hello/run.slurm
A run of the code is shown below.
[araim1@maya-usr1 matlab-pct-hello]$ sbatch run.slurm
[araim1@maya-usr1 matlab-pct-hello]$ cat slurm.err
[araim1@maya-usr1 matlab-pct-hello]$ cat slurm.out

                            < M A T L A B (R) >
                  Copyright 1984-2014 The MathWorks, Inc.
                    R2014a (8.3.0.532) 64-bit (glnxa64)
                             February 11, 2014

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
Starting parallel pool (parpool) using the 'local' profile ... connected to 8 workers.
Hello world from process 1 of 8
Hello world from process 2 of 8
Hello world from process 3 of 8
Hello world from process 4 of 8
Hello world from process 5 of 8
Hello world from process 6 of 8
Hello world from process 7 of 8
Hello world from process 8 of 8
Parallel pool using the 'local' profile is shutting down.


[araim1@maya-usr1 matlab-pct-hello]$
Next is a simple example of parfor ("parallel for"), which lets us replace ordinary loops with parallel loops with minimal programming effort.
poolobj = parpool(8);
x = zeros(1, 40);
parfor i = 1:40
  x(i) = i;
end
delete(poolobj);

x

Download: ../code/matlab-pct-parfor/driver.m
#!/bin/bash
#SBATCH --job-name=matlab-parallel-toolkit
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --ntasks-per-node=8

matlab -nodisplay -r "driver, exit"

Download: ../code/matlab-pct-parfor/run.slurm
[araim1@maya-usr1 matlab-pct-parfor]$ sbatch run.slurm
[araim1@maya-usr1 matlab-pct-parfor]$ cat slurm.err
[araim1@maya-usr1 matlab-pct-parfor]$ cat slurm.out

                            < M A T L A B (R) >
                  Copyright 1984-2014 The MathWorks, Inc.
                    R2014a (8.3.0.532) 64-bit (glnxa64)
                             February 11, 2014

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
Starting parallel pool (parpool) using the 'local' profile ... connected to 8 workers.
Parallel pool using the 'local' profile is shutting down.

x =

  Columns 1 through 13

     1     2     3     4     5     6     7     8     9    10    11    12    13

  Columns 14 through 26

    14    15    16    17    18    19    20    21    22    23    24    25    26

  Columns 27 through 39

    27    28    29    30    31    32    33    34    35    36    37    38    39

  Column 40

    40

[araim1@maya-usr1 matlab-pct-parfor]$
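A parfor loop can also accumulate results across iterations through a reduction variable; the following is a small sketch of this pattern (the sum-of-squares computation is our own choice of example)
poolobj = parpool(8);
s = 0;
parfor i = 1:40
    % s is a reduction variable, accumulated across the workers:
    s = s + i^2;
end
delete(poolobj);

s    % sum of squares 1^2 + ... + 40^2 = 22140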

GPU Computing with MATLAB

Access to GPUs through MATLAB programming is now available on maya. While GPUs can provide greatly increased throughput, GPU programming carries additional costs: data must be sent from the CPU to the GPU before the calculation and retrieved from it afterwards. Because of this memory access pattern, only a subset of applications or algorithms are suitable for speedup via GPU computing. We will provide a simple example below. The example sets up two matrices in GPU memory, multiplies them, then copies the result back to CPU memory and displays it.
% Report the number of visible GPUs and the properties of the current one:
gpuDeviceCount

gpuDevice

% Create two matrices directly in GPU memory and multiply them on the GPU:
A = ones(10, 'single', 'gpuArray');
B = 5 .* eye(10, 'single', 'gpuArray');
C = A * B;

% Copy the result back to CPU (host) memory so it can be displayed:
C_host = gather(C);

C_host

Download: ../code/matlab-gpu/driver_gpu.m
The slurm file to submit this example is below:
#!/bin/bash
#SBATCH --job-name=matlab-gpu
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu

matlab -nodisplay -r "driver_gpu, exit"

Download: ../code/matlab-gpu/run_gpu.slurm
A run of the code is shown below.
[hu6@maya-usr1 sec6_gpu]$ cat slurm.out 

                            < M A T L A B (R) >
                  Copyright 1984-2014 The MathWorks, Inc.
                    R2014a (8.3.0.532) 64-bit (glnxa64)
                             February 11, 2014

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
ans =

     1


ans = 

  CUDADevice with properties:

                      Name: 'Tesla K20m'
                     Index: 1
         ComputeCapability: '3.5'
            SupportsDouble: 1
             DriverVersion: 6
            ToolkitVersion: 5.5000
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 5.0327e+09
                FreeMemory: 4.9211e+09
       MultiprocessorCount: 13
              ClockRateKHz: 705500
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1


C_host =

     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5

The command gpuDeviceCount returns the number of GPUs you currently have access to. Our slurm file requested one node and one GPU, but you can request two GPUs with --gres=gpu:2.
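If more than one GPU has been requested, a particular device can be selected by index before any gpuArray data is created; this is only a sketch (the choice of index 2 is for illustration)
% Select among the visible GPUs (only useful when gpuDeviceCount > 1):
if gpuDeviceCount > 1
    gpuDevice(2);   % make the second GPU the current device
end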

The command gpuDevice returns the properties of the GPU, including its type, the maximum block and thread sizes supported, the GPU memory, etc.

In the computation part, we set up matrix A with all entries equal to 1 and matrix B as a diagonal matrix with diagonal entries equal to 5. The two matrices are created directly in GPU memory since we specified gpuArray in the constructor calls, but you can also copy existing data from the CPU to the GPU. One thing to notice is that matrix C is computed on the GPU and therefore stays in GPU memory; it has to be copied to CPU memory by calling gather() before it can be displayed or used in other code that executes on the CPU.
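For completeness, here is a small sketch of the other direction mentioned above: copying an existing CPU array to the GPU with gpuArray and bringing the result back with gather (the variable names are our own choices)
X = rand(1000);       % ordinary array in CPU (host) memory
Xg = gpuArray(X);     % copy it to the current GPU
Yg = Xg * Xg';        % computed on the GPU; the result stays in GPU memory
Y = gather(Yg);       % copy the result back to CPU memory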

Detailed documentation on GPU programming with MATLAB is available from GPU Computing.