User's Guide to Parallel Computers

Department of Computer Engineering, Chulalongkorn University

Veera Muangsin

http://space.cp.eng.chula.ac.th/documents/parallelguide.html
Revision 1.01
25 April 2003

This document is a user guide for students and faculty who want to use the parallel computer facilities at the department. It was originally written for students of the 2110412 and 2110732 Parallel Computing courses. It is divided into the following sections:

  1. How to connect to the parallel machines
  2. High-Throughput Computing: How to submit batch jobs
  3. Parallel Computing: How to run MPI programs
  4. References
At the time of writing, we have two parallel computer systems: the Apollo Cluster (Linux) and the Sun machines (athena and zeus). Problems and discussion about the parallel machines can be posted in the forum (http://space.cp.eng.chula.ac.th/forum/index.html).


I. How to connect to the parallel machines

The parallel machines use the department's shared user accounts and shared file system. Anyone who has a user account at the department can log into both parallel machines and has the usual access to their home directory. There are several ways to connect to the machines:

1. Telnet

This is the easiest way. Telnet is a terminal emulator program running in text mode. A telnet program is available on all Windows and Linux machines. To connect, just run telnet <hostname>. Replace <hostname> with apollo1, athena, or zeus. Use the full host name, e.g. apollo1.cp.eng.chula.ac.th, if necessary.
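For example (the login banner and prompts will vary by machine):

telnet apollo1.cp.eng.chula.ac.th

Then log in with your department username and password.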

2. Ssh

ssh (Secure Shell) is similar to telnet but much more secure. There are many free ssh programs for Windows; PuTTY is recommended. You can download PuTTY from the PuTTY web site. Just download all the files into a directory such as C:\Program Files\PuTTY. (On Linux, no extra program is needed; see the OpenSSH example after the steps below.)
To connect to a machine:
  1. Run putty.exe
  2. Choose the SSH protocol
  3. Enter the host name in the host name field, e.g. apollo1 or apollo1.cp.eng.chula.ac.th
  4. Enter your username and password when prompted
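From a Linux machine you do not need PuTTY; the OpenSSH client installed on virtually every distribution can be used directly (replace <username> with your department account name):

ssh <username>@apollo1.cp.eng.chula.ac.th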

3. VNC

If you need to run graphics applications remotely, try VNC. VNC (Virtual Network Computing) is a remote display system which allows you to view a graphical 'desktop' environment from anywhere on the Internet and from a wide variety of machine architectures. Therefore, you can run programs on these machines in graphics mode from your PC. You can download it from the VNC web site.

To run VNC,

  1. Connect to the machine in text mode using telnet or ssh.
  2. Make sure that you have /usr/X/bin (for the Sun machines, i.e. athena and zeus) or /usr/X11R6/bin (for the Linux machines, i.e. apollo1) in your PATH environment variable.
  3. Start a vncserver process. On athena or zeus, run /usr/local/bin/vncserver. On apollo1, run /usr/bin/vncserver.
  4. The first time you run vncserver, it will ask you to set up a password. This is the password for making a connection to your vncserver process, and it can be different from your login password. Once you have set your password, run vncserver again. If you have already set a password, vncserver will give you the display name, for example, "zeus:1". You can change the password later by running vncpasswd.
  5. Run the vncviewer program on your PC. Enter the given display name, e.g. "zeus:1", and then enter the password for vncserver. VNC will bring the graphical desktop environment of the remote machine onto your PC.
  6. When you want to disconnect, just close the VNC viewer window on your PC. You do not need to log out from the desktop environment on the remote machine. You can reconnect to your vncserver at any time later and the desktop environment will be the same as when you left. You can stop vncserver on the remote machine by running "vncserver -kill <display name>", e.g. "vncserver -kill zeus:1" (see the example session after these steps).
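Here is what such a session might look like; the exact messages, the display number, and the process ID are illustrative and will vary:

/usr/local/bin/vncserver

New 'X' desktop is zeus:1

Starting applications specified in ~/.vnc/xstartup
Log file is ~/.vnc/zeus:1.log

vncserver -kill zeus:1
Killing Xvnc process ID 12345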
By default, VNC creates a startup file ~/.vnc/xstartup that launches a window manager program called twm. On the Sun machines, if you find twm unattractive, you may wish to use CDE (Common Desktop Environment) and its window manager, dtwm, instead. Here is how to switch to CDE:
Edit ~/.vnc/xstartup. It looks like this:

    #!/bin/sh

    xrdb $HOME/.Xresources
    xsetroot -solid grey
    xterm -geometry 80x24+10+10 -ls -title "$VNCDESKTOP Desktop" &
    twm &

Change the last line to:

    /usr/dt/bin/dtwm &


II. High-Throughput Computing: How to submit batch jobs

If you have many computers and you have to run a lot of programs, or the same program with different parameters, you may wish you could enter all the commands at once and have them automatically assigned to available computers. You do not want to make an individual program run faster on a parallel machine; what you want is for the whole lot to finish earlier. This is called "high-throughput computing".
This can be achieved with a "batch queueing system" or "workload management system". On the Apollo Cluster, a batch queueing system called OpenPBS has been installed.

1. Create a job file

A job file, or a PBS script, is a shell script that runs your job. It must be made executable (use chmod +x <filename>).
Here is an example of a job file:

#!/bin/bash
#PBS -l cput=00:01:00
#PBS -o test.out
#PBS -e test.err
#PBS -m ea
echo Working directory is $PBS_O_WORKDIR > test.out
cd $PBS_O_WORKDIR
echo `date` `hostname` `pwd` >> output
./myprogram

Here is the explanation, line by line.

#!/bin/bash
Like all shell scripts, this line tells which shell should run the script. In this case, it is the Bash shell.

#PBS -l cput=00:01:00
A line that begins with #PBS is a directive for PBS. If the script is run directly in a shell, it is interpreted as a comment. This line tells PBS that your job is expected to use at most one minute of CPU time. PBS requires it to determine the size of your job.
You should estimate the duration of your job as accurately as possible for efficient scheduling. If you underestimate, PBS will terminate your job when the time runs out. If you overestimate, your job may start later than it should.

#PBS -o test.out
#PBS -e test.err
These two lines redirect standard output (-o) and standard error (-e) to the specified files. Since the job runs in the background, screen output must be written to files instead.

#PBS -m ea
This tells PBS to send you an e-mail when the job ends (e) or aborts (a).

echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
PBS provides some built-in environment variables; for example, $PBS_O_WORKDIR is the directory from which the job was submitted.

echo `date` `hostname` `pwd` >> output
The output of this line is appended to a file named 'output'.

2. Submit the job file

Enter qsub <jobfile>, for example, qsub test.pbs. It will print the ID of your job, for example,  1234.apollo1.cp.eng.chula.ac.th.
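A typical submission looks like this (the job ID is assigned by PBS and will differ for your job):

qsub test.pbs
1234.apollo1.cp.eng.chula.ac.th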

3. Check the status of the job queue

You can check the status of the job queue with the qstat command. It will print something like this:

Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
1233.apollo1     hahaha.pbs       user001          01:25:05 R short
1234.apollo1     test.pbs         user002          00:00:10 R short
1235.apollo1     hohoho.pbs       user003                 0 Q short

It shows that the jobs of user001 and user002 are running (state R), while the job of user003 is waiting in the queue (state Q).

4. How to cancel a job

You can cancel your job before it finishes with the qdel <jobid> command, for example, qdel 1234.
 

5. PBS commands

PBS commands include qsub, qstat, and qdel, among others. For the usage of these commands, please consult the on-line manual pages (man <command>).


III. Parallel Computing: How to run MPI programs

If you want an individual program to run faster on a parallel machine, you have to divide it into smaller pieces and run them concurrently on multiple processors or multiple computers. This is called "parallel computing".
The program must be written in such a way that the task is divided into smaller subtasks, a number of processes are created, the subtasks are assigned to the processes, and communication may occur among the subtasks.
MPI (Message Passing Interface) is a set of library functions for developing parallel programs in the message-passing programming model. It is suitable for clusters, but also works on multiprocessors. There are many implementations of MPI, for example, MPICH and LAM. We have MPICH installed on the Apollo Cluster and the Sun machines. It is configured to communicate with the Secure Shell protocol.

1. Set up for secure shell

Before running your first MPI program, please do the following setup. After the setup, you will not have to enter a password when you ssh between the machines. First, generate a key pair:

ssh-keygen -t dsa

When you are asked to enter a passphrase, just hit return. Then install the public key as an authorized key:

cp ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys2
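To verify the setup, ssh from one machine to another and run a command; it should not ask for a password. For example (apollo2 here is just one of the cluster nodes):

ssh apollo2 hostname

If everything is set up correctly, this prints apollo2 without a password prompt.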

2. Add /usr/local/mpich/bin into your PATH
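How to do this depends on your shell. With the Bash shell, for example, you can append a line like this to your ~/.bashrc (the path is the MPICH location given above; adjust it if your installation differs):

export PATH=$PATH:/usr/local/mpich/bin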

3. Create a machine file

A machine file is a text file containing a list of machines on which you want to run your program. By default, MPICH allocates processes to the machines in the list on a round-robin basis.

For example, to make MPICH allocate a process on each processor of the Sun machines (each machine has four processors), create a machine file named machines.sun which contains:

zeus
zeus
zeus
zeus
athena
athena
athena
athena

For the Apollo cluster, create a file named machines.apollo which contains:

apollo2
apollo3
apollo4
apollo5
apollo6

4. Edit and compile an MPI program

Use mpicc instead of gcc to compile an MPI program:
mpicc <source_program> -o <executable_program>
Since we have both Linux and Sun machines, you may wish to name your executable according to where it was compiled. For example:
When compiling on Sun, use mpicc myprog.c -o myprog.sun
When compiling on Linux, use mpicc myprog.c -o myprog.linux

5. Run an MPI program

mpirun [mpirun_options...] <executable_program> [options...]

The most frequently used form is:
mpirun -machinefile <machine_file> -np <number_of_processes> <executable_program>

For example:
mpirun -machinefile machines.sun -np 8 myprog.sun
mpirun -machinefile machines.apollo -np 8 myprog.linux


A Simple MPI Program: Just say hi (greeting.c)

#include <stdio.h>
#include <string.h>     /* for strlen() */
#include <mpi.h>

int main(int argc, char** argv) {
    int my_rank;                /* Rank of process */
    int p;                      /* Number of processes */
    int source;                 /* Rank of sender */
    int dest;                   /* Rank of receiver */
    int tag = 50;               /* Tag for messages */
    char message[100];  /* Storage for the message */
    char name[MPI_MAX_PROCESSOR_NAME];  /* Processor name */
    int name_len;
    MPI_Status status;  /* Return status for receive */

    printf("start\n");
    MPI_Init(&argc, &argv);
    printf("MPI_Init OK\n");

    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Get_processor_name(name, &name_len);
    if (my_rank != 0) {
        sprintf(message, "Greetings from process %d at %s!", my_rank, name);
        dest = 0;
        MPI_Send(message, strlen(message)+1, MPI_CHAR, dest,
            tag, MPI_COMM_WORLD);
    } else {
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 100, MPI_CHAR, source, tag,
                MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }

    MPI_Finalize();

    return 0;
} /* main */
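As a sketch of what to expect, compiling and running this program on the Apollo cluster with four processes might look like the following (the process-to-host assignment, and hence the exact output and its ordering, will vary; the "start" and "MPI_Init OK" trace lines printed by each process are omitted here):

mpicc greeting.c -o greeting.linux
mpirun -machinefile machines.apollo -np 4 greeting.linux

Greetings from process 1 at apollo3!
Greetings from process 2 at apollo4!
Greetings from process 3 at apollo5!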
 

Another Program: Summation (sum.c)

#include "mpi.h"
#include <stdio.h>
#include <math.h>

#define MAXSIZE 1000

void compute(void);     /* forward declaration; defined below */

int main(int argc, char *argv[])
{
    int myid, numprocs;
    int data[MAXSIZE], i, n, x, low, high, myresult = 0, result;
    double start, stop;

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);

    n = MAXSIZE;
    if (myid == 0) {  /* initialize data */
        for(i = 0; i < n; i++) {
            data[i] = 1;
        }
    }

    if (myid == 0 ) start = MPI_Wtime();

    /* broadcast data */
    MPI_Bcast(data, n, MPI_INT, 0, MPI_COMM_WORLD);

    /* Add my portion of data */
    x = n/numprocs;
    low = myid * x;
    high = low + x;
    for(i = low; i < high; i++) {
        compute();           /* Do some computation. */
        myresult += data[i];
    }
    printf("I got %d from %d\n", myresult, myid);

    /* Compute global sum */
    MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0) printf("The sum is %d.\n", result);

    if (myid == 0) {
        stop = MPI_Wtime();
        printf("parallel section time %f", stop-start);
    }

    MPI_Finalize();

    return 0;
}
 

void compute(void)
{
    int i;
    for (i=0; i<1000000; i++);
    /* Nothing useful.  Just to make timing more interesting */
}
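As a quick sanity check: with MAXSIZE = 1000 and 8 processes, each process sums 1000/8 = 125 elements, all equal to 1, so a run might print something like this (line order will vary; sum.linux is just an example executable name, and the remaining "I got ..." lines are omitted). Note that this simple decomposition assumes the number of processes divides MAXSIZE evenly; leftover elements would otherwise be skipped.

mpicc sum.c -o sum.linux
mpirun -machinefile machines.apollo -np 8 sum.linux

I got 125 from 0
I got 125 from 3
...
The sum is 1000.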
 

One more: Calculating Pi (pi.c)

#include "mpi.h"
#include <stdio.h>
#include <math.h>

double f( double );
double f( double a )
{
    return (4.0 / (1.0 + a*a));
}

int main( int argc, char *argv[])
{
    int done = 0, n, myid, numprocs, i;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x;
    double startwtime = 0.0, endwtime;
    int  namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
    MPI_Get_processor_name(processor_name,&namelen);

    fprintf(stderr,"Process %d on %s\n",
            myid, processor_name);

    n = 0;
    while (!done)
    {
        if (myid == 0)
        {
/*
            printf("Enter the number of intervals: (0 quits) ");
            scanf("%d",&n);
*/
            if (n==0) n=100; else n=0;

            startwtime = MPI_Wtime();
        }
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (n == 0)
            done = 1;
        else
        {
            h   = 1.0 / (double) n;
            sum = 0.0;
            for (i = myid + 1; i <= n; i += numprocs)
            {
                x = h * ((double)i - 0.5);
                sum += f(x);
            }
            mypi = h * sum;

            MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

            if (myid == 0)
            {
                printf("pi is approximately %.16f, Error is %.16f\n",
                       pi, fabs(pi - PI25DT));
                endwtime = MPI_Wtime();
                printf("wall clock time = %f\n",
                       endwtime-startwtime);
            }
        }
    }
    MPI_Finalize();

    return 0;
}


IV. References