Usage

General
Connecting from Windows
MPI
Matlab
Mathematica

General

There is more than one way to connect to the server. One is to use ssh:

$ssh -X [server name] -l [user name]

The -X switch enables X11 forwarding, which allows you to run applications that have a graphical user interface (GUI). The above command connects you to the head node of the cluster. From the head node you can connect to any of the other 19 nodes, called slave nodes, without being prompted for a password. On the internal network of the cluster, the head node is named swt100 and the slave nodes are named swt101 through swt119. To connect to any of the slave nodes from the head node, you can simply type

$ssh swt101

This connects you to the slave node swt101.

The /home directory of the head node is exported to all slave nodes, so accessing /home on any slave node really accesses the /home directory on the head node. If you run a program on a slave node and it writes to a file under /home, the program is therefore using the hard drive of the head node over the network. To avoid the performance problems this can cause, run your program from a temporary directory located outside of /home. For this purpose you can create your own directory under /var/tmp on any of the slave nodes,

$mkdir /var/tmp/mydirectory

Your program can then use this directory for its temporary files. To delete the directory after your program has finished, use

$rm -r /var/tmp/mydirectory

The -r switch in the above command makes the deletion recursive, removing the directory together with everything in it.
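The create, run, and clean-up cycle described above can be wrapped in a small shell function. The following is only a sketch under stated assumptions: the function name run_in_scratch, the SCRATCH_BASE variable, and the step that copies results to a destination directory afterwards are made up for illustration and are not part of the cluster setup.

```shell
#!/bin/sh
# Sketch only: run_in_scratch and SCRATCH_BASE are illustrative names.
# Usage: run_in_scratch DEST_DIR COMMAND [ARGS...]
run_in_scratch() {
    dest=$1
    shift
    base=${SCRATCH_BASE:-/var/tmp}

    # Create a uniquely named scratch directory under /var/tmp.
    scratch=$(mktemp -d "$base/${USER:-user}.XXXXXX") || return 1

    # Run the command inside the scratch directory; the subshell keeps
    # the caller's working directory unchanged.
    ( cd "$scratch" && "$@" )
    status=$?

    # Copy everything the command wrote to DEST_DIR, and remove the
    # scratch directory only if that copy succeeded.
    if mkdir -p "$dest" && cp -R "$scratch/." "$dest/"; then
        rm -r "$scratch"
    else
        echo "copy failed; results left in $scratch" >&2
    fi
    return "$status"
}
```

For example, run_in_scratch "$HOME/results" ./a.out would run a.out in a fresh directory under /var/tmp and then copy its output files to $HOME/results, so the head node's disk is touched only once, at the end.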

In addition, you can check whether anyone else is using the node you are logged in to by typing

$users

You can check how many resources the current users are consuming by typing

$top

The top utility lists the currently running processes together with their owners. The summary area at the top of the display also shows how much of the CPU is in use. After starting top you can press 1 to list both CPUs separately, since our systems have two per node. You can kill one of your processes by pressing k and entering its PID (process ID), which is listed in the left column. To quit top, press q.

Alternatively, you may create a simple shell script that quickly runs the ps command on each node. An example of such a script, called rps, can be found here. Remember to run chmod +x rps before you use it, to make the file executable.
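The script referred to above is not reproduced in this text. The following is a minimal sketch of what such a script might look like; it is not the original rps. It assumes the node names swt101 through swt119 and the password-free ssh described earlier. The RPS_REMOTE variable and the function names are hypothetical, introduced only so the loop can be tried without a cluster.

```shell
#!/bin/sh
# rps (sketch): show your processes on every slave node.
# RPS_REMOTE is a hypothetical override, useful for trying this without ssh.
RPS_REMOTE=${RPS_REMOTE:-ssh}

# Print the node names swtFIRST .. swtLAST, one per line.
node_names() {
    i=$1
    while [ "$i" -le "$2" ]; do
        echo "swt$i"
        i=$((i + 1))
    done
}

# Run ps on each slave node in turn and label the output.
rps_all() {
    me=$(id -un)
    for node in $(node_names 101 119); do
        echo "== $node =="
        $RPS_REMOTE "$node" ps -u "$me" -o pid,pcpu,etime,args
    done
}

# Run automatically only when this file is saved and executed as "rps".
case ${0##*/} in
    rps) rps_all ;;
esac
```

Saved as rps on the head node and made executable, this prints one labelled ps listing per slave node, restricted to your own processes.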

Preventing Job Termination after Log-Out
There are two ways to prevent your process from terminating after you log out. One is to use

$nohup process_name &

This makes your process ignore the hangup signal (SIGHUP) that is sent when you log out. When the command is started from a terminal, the standard output of your process is redirected to the file nohup.out, where it can be read after you log back in. The drawback of this method is that you cannot interact with your process while it is running.
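Because every job started this way from the same directory writes to the same nohup.out, it can help to give each job its own log file and to note its PID so that it can be found again later. The helper below is a sketch only; the name start_detached is an assumption made for illustration.

```shell
#!/bin/sh
# Sketch only: start_detached is an illustrative helper name.
# Usage: start_detached LOGFILE COMMAND [ARGS...]
# Starts COMMAND under nohup in the background, sends both standard
# output and standard error to LOGFILE instead of nohup.out, and
# prints the PID of the new process.
start_detached() {
    log=$1
    shift
    nohup "$@" > "$log" 2>&1 < /dev/null &
    echo $!
}
```

For example, start_detached run1.log ./a.out prints a PID that can later be passed to kill or looked up in top.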

Another way is to use

$screen process_name

This opens a new window in your terminal and runs the process in it. To go back to the command line, press Ctrl-a followed by d, which detaches the window. The process continues to run even after you log out. You may reattach the window at any time using

$screen -d -r

The -d switch first detaches the window if it is still attached elsewhere, and then reattaches it. This is useful if your connection or local system went down before you managed to detach the window.

If you use the screen command to run more than one process, executing screen -d -r prints a list of the currently running sessions and asks you to specify the id of the session you want to reattach. It may be more convenient to give your session a meaningful name that you can use instead. This is done by passing the -S switch to the screen command, in addition to your process_name, when you start the session,

$screen -S session_name process_name

To reattach this session you would then run

$screen -d -r session_name

For more information on the screen command, check the man pages.
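The named-session commands above can be collected into two small helpers. This is a sketch only: the function names and the SCREEN_BIN variable are hypothetical. It also uses the -d -m combination, a standard screen option not covered in the text above, which starts the session already detached so that no Ctrl-a d is needed.

```shell
#!/bin/sh
# Sketch only: the function names and SCREEN_BIN are illustrative.
SCREEN_BIN=${SCREEN_BIN:-screen}

# start_session NAME COMMAND [ARGS...]
# Start COMMAND in a new session called NAME without attaching to it.
start_session() {
    name=$1
    shift
    "$SCREEN_BIN" -d -m -S "$name" "$@"
}

# resume_session NAME
# Detach the session elsewhere if necessary, then reattach it here.
resume_session() {
    "$SCREEN_BIN" -d -r "$1"
}
```

With these definitions, start_session fit ./a.out followed later by resume_session fit mirrors the screen -S and screen -d -r commands shown above.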

Printing
Only one printer is available to the system, and it is attached over the network. The printer name is HP_Color_LaserJet_2830. Make sure that your application uses this printer name. For example, in emacs, go to Options -- Customize Emacs -- Top Level Customization Group -- Wp -- Lpr and check that the printer name is as above. Then, instead of Lpr, choose Ps Print -- Printer, and again check that the printer name is as above.

Connecting from Windows

To connect from Windows and use a graphical user interface, you have two options: install some form of SSH tools together with an X11 server, or install an NX client.

For the first option, you may use commercial software such as Exceed or XWin32. Free alternatives are PuTTY in conjunction with Xming, or Cygwin/X, the X server of the Linux-like Cygwin environment.

For the second option, you may use the free NX Client by NoMachine.

The most important difference between the first and the second option is the protocol, which is X11 in the first case and NX in the second. NX compresses and caches the X11 traffic, so it is usually much faster than plain X11, particularly over slow connections. In addition, the NX Client by NoMachine gives you more functionality, such as suspending a session and resuming it on a different system.

If you decide to go with NX, then download the NX Client, and request the client key from an administrator. After you obtain the key, run the client, click "Configure" and input the server information. Then click "Key..."->"Import" and import the client key. Also, it will probably put less strain on the head node if you run the console rather than Gnome (default), KDE or other sessions at login. To do this select "Custom" in the pull down menu of the "Desktop" section in the "General" tab. Then click "Settings" and "Run the console".

If you decide to go with X11, note that Cygwin differs from the other software (including the NX software) in that it provides a complete Linux-like environment. It is probably the closest thing to Linux you can have running under Windows, which gives you an opportunity to get familiar with a Linux-style system on your own machine.

The following are some notes on installing Cygwin. When running the Cygwin installer, make sure to select OpenSSH and xterm as packages to be installed, since they are not selected by default. After the installation is finished, click on the Cygwin icon and type

$startx &

at the command prompt. After this, an xterm window should pop up, and you may use ssh as discussed in the first section above.

You may find that startx does not always work as it should, and crashes before ever running xterm. A possible solution is to first close Cygwin completely; you may have to use the task manager to make sure that all Cygwin-related processes are closed. Then go to Start->Run and run cmd. There, type c:\cygwin\bin\ash.exe to start the ash shell. In that shell, run rebaseall -v, and when it is done, leave the shell by typing exit. This often solves the problem.

MPI

The head node is the only node that has PGI compilers installed. Therefore, to use these compilers, you will have to use the head node. Once your code is compiled you have to move it to a slave node in order to run it using MPI. This is because MPI is disabled on the head node, to prevent an overload.

First, make sure that your environment variables are set as follows,

PGI=/usr/pgi
PATH=$PATH:$PGI/linux86-64/6.1/bin
MANPATH=$MANPATH:/usr/pgi/linux86-64/6.1/man

LD_LIBRARY_PATH=/usr/local/lib:/usr/pgi/linux86-64/6.1/lib:$LD_LIBRARY_PATH
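One way to make these settings permanent is to export them from your shell startup file. The following sketch assumes a Bourne-type shell such as bash and a startup file such as ~/.bashrc; both are assumptions, so adapt them if your login shell differs. The values are the ones listed above.

```shell
# Sketch for a Bourne-type shell startup file such as ~/.bashrc
# (the choice of file is an assumption; the values are those listed above).
export PGI=/usr/pgi
export PATH="$PATH:$PGI/linux86-64/6.1/bin"
export MANPATH="$MANPATH:$PGI/linux86-64/6.1/man"

# Append the old value only if it is non-empty, so that no empty entry
# (which the dynamic linker treats as the current directory) is added.
export LD_LIBRARY_PATH="/usr/local/lib:$PGI/linux86-64/6.1/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

After logging in again, echo $PATH should show the PGI bin directory at the end of the list.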

Before you run your code, you must boot mpd on each node you wish to use. The mpd daemon is responsible for managing the running processes. A script called mpdboot does this for you. To run it, type

$mpdboot -n [number of mpds to start]

This will not work without a file that specifies the host names. The default name of this file is mpd.hosts. You should create it using a text editor of your choice. Here is a sample,

$cat mpd.hosts
swt101
swt102
swt103

You can put all of the slave nodes in this list, not just the first three. Note, however, that swt100 cannot be in this file, because mpd is disabled on the head node, as mentioned above. Including swt100 in the list will therefore cause mpdboot to generate an error.
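Instead of typing the host names by hand, the file can be generated. The function below is a sketch only; its name and arguments are hypothetical. It relies on the node naming described earlier (swt101 through swt119) and refuses any range that would include the head node swt100.

```shell
#!/bin/sh
# Sketch only: write_mpd_hosts is an illustrative helper name.
# Usage: write_mpd_hosts FILE FIRST LAST
# Writes swtFIRST .. swtLAST to FILE, one host name per line.
write_mpd_hosts() {
    file=$1
    first=$2
    last=$3

    # Only slave nodes are allowed; swt100 (the head node) is not.
    if [ "$first" -lt 101 ] || [ "$last" -gt 119 ] || [ "$first" -gt "$last" ]; then
        echo "node numbers must lie within 101..119" >&2
        return 1
    fi

    : > "$file"                      # start with an empty file
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "swt$i" >> "$file"
        i=$((i + 1))
    done
}
```

For example, write_mpd_hosts mpd.hosts 101 103 produces the three-line sample file shown above.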

After you create this file, you can run mpdboot. The script mpdboot will begin booting mpd on each node, beginning with the first one on the list. By default, only one mpd will be started per node.

Finally, to run your program, say a.out, type

$mpiexec -n [number of processes] a.out
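The boot and run steps can be combined into one function that also shuts the mpd ring down when the program ends. This is a sketch under stated assumptions: the function name run_mpi is hypothetical, it must be run on a slave node in a directory containing mpd.hosts, and it passes the hosts file explicitly with the -f option of mpdboot rather than relying on the default name.

```shell
#!/bin/sh
# Sketch only: run_mpi is an illustrative helper name.
# Usage: run_mpi NUM_MPDS NUM_PROCESSES PROGRAM [ARGS...]
run_mpi() {
    nmpds=$1
    nprocs=$2
    shift 2

    # Boot the mpd ring on the hosts listed in ./mpd.hosts.
    mpdboot -n "$nmpds" -f mpd.hosts || return 1

    # Run the program and remember its exit status.
    mpiexec -n "$nprocs" "$@"
    status=$?

    # Shut the ring down again so no daemons are left behind.
    mpdallexit
    return "$status"
}
```

For example, run_mpi 3 6 ./a.out boots three mpds, runs six processes of a.out, and then stops the daemons.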

This is a very brief outline of how to use MPICH2. For more information, and to download a manual, visit the Documentation section of the MPICH2 website, at http://www-unix.mcs.anl.gov/mpi/mpich2/

Common Problems
The errors (handle_mpd_output 359) and (handle_mpd_output 368) are sometimes generated by mpdboot when it is run after the mpd ring has crashed. Apparently, some processes are left over after the crash and should be killed before you try to run mpdboot again. Neither mpdallexit nor mpdcleanup kills these processes; it has to be done manually. Doing this seems to eliminate the errors. The processes in question may be mpiexec (a python script) and python itself.
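To find the leftover processes, it may help to list your own processes on each node and pick out those whose command line mentions mpiexec, mpd, or python. The following is a sketch only: the helper names and the search pattern are assumptions, and the output should be checked by eye before anything is killed, because the pattern also matches unrelated python programs.

```shell
#!/bin/sh
# Sketch only: the helper names and the pattern are illustrative.

# Read "PID COMMAND..." lines on standard input and keep those whose
# command line mentions mpiexec, mpd, or python.
stale_candidates() {
    grep -E 'mpiexec|mpd|python'
}

# List your own processes on the current node that match the pattern.
# The ps output is captured first so the grep process cannot match itself.
list_stale() {
    procs=$(ps -u "$(id -un)" -o pid= -o args=)
    printf '%s\n' "$procs" | stale_candidates
}
```

Run list_stale on each node that was part of the crashed ring, then use kill with the PIDs of the processes that really belong to it.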

Matlab

To use Matlab's GUI, you need to pass the -X switch to the ssh command when you connect. As long as you remember to do this, you should have no problem running Matlab.

Mathematica

Before running Mathematica, you need to install the Mathematica fonts on your system. Download these fonts from Wolfram. If you are using Exceed from Hummingbird, read "How do I get Exceed version 8 or later to recognize the Mathematica fonts?". If you are using Linux or Cygwin, read "I am connecting from my Linux box to the server to run Mathematica, but the Mathematica fonts are not displayed. How do I resolve this issue?". Also, after installing the fonts and starting Mathematica, you may find that pressing the Backspace key produces a small square instead of deleting a character. To fix this, make sure that NumLock is off.