Primer to using the Big Red Supercomputer


Table of Contents

  1. Introduction to Big Red

  2. Requesting An Account (or Software)

  3. Login to Big Red using SSH

  4. Softenv to Setup Your Software Environment

  5. Big Red's File Storage Options

  6. Compiling and Running Your Own Programs

  7. Submit Simple Jobs on Big Red

  8. Running BioInformatics Software (BLAST, MEME, etc.)

  9. More Information about LoadLeveler (LL) and Migration from PBS

  10. File Transfers To and From Big Red using GridFTP



Introduction to Big Red


The Big Red cluster consists of 768 IBM JS-21 compute nodes, each with 8 GB of memory. See the technical schematic and the associated hardware details below to understand how the cluster is organized.

[Figure: Big Red technical schematic]

For more detail, see the Big Red Hardware Information page.

Supercomputer/cluster vs. Personal computer/cluster

There are some fundamental differences between a user-owned computer (or a small user-owned cluster within a research lab) and a supercomputer (or a cluster). Usage of the latter systems needs to be moderated, i.e., there is a need for a traffic cop who ensures the best performance and utilization.

More on Need for a Traffic Cop

To further explain the need for a traffic cop, let us consider a rental car analogy. Assume you have a sedan that you use for day-to-day travel. You have full rights to drive your own car at any time and to any place. Now consider a scenario where you're moving, and your car is not big enough. You might need a truck. If you wanted to rent one from a rental company, you would go through a process of making a reservation, picking up the truck, paying for it, and so forth.

The situation is pretty similar with supercomputers and clusters, just that as an IU researcher, using IU's research systems does not require you to pay :-).

Note to TeraGrid users: If you are a TeraGrid user, you still don't pay monetarily, but you need to get an allocation through the TeraGrid project allocations committee before you can use IU systems that are part of TeraGrid.

Supercomputers and clusters still need a traffic cop to direct traffic so that users do not step on each other's toes because:

  • A large number of users want to use the system at the same time, to do research that might be unrelated to each other.
  • Given the large number of processors and the large amount of storage, a large number of jobs belonging to multiple users may run at the same time (and multiple users may store large amounts of research data).

This is where a piece of software called a job manager, which is used to submit and monitor jobs on a supercomputer, comes in.

LoadLeveler/PBS as a traffic cop on IU research systems

On IU research systems, the following job submission systems are used:

Job Manager                    Systems that use it
IBM LoadLeveler (LL)           Big Red, Libra
Portable Batch System (PBS)    AVIDD clusters

All IU clusters use the MOAB scheduler for job scheduling, incorporating a fair-share mechanism into the mix (based on research system time used by each user trying to run a job).

Fair-share Enforcement on Big Red

As with any other supercomputer or cluster, a fair-share mechanism is incorporated within the Big Red system. This mechanism does not allow users to run jobs on the login node or on the compute node outside of the LL job submission system. Any job you submit outside of LoadLeveler is killed off if it uses more than 20 minutes of CPU time. This includes globus jobs that use the jobmanager/fork.
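
As a rough self-check against the 20-minute CPU limit described above, the sketch below converts a ps-style CPU time into seconds and tests it against 1200 seconds. The function names cpu_seconds and over_limit are illustrative and are not Big Red tools; the sketch assumes the time is printed as [HH:]MM:SS with no day prefix.

```shell
#!/bin/bash
# Hypothetical helper (not a Big Red tool): convert a CPU time such as
# "00:21:05" or "21:05" into seconds. Each colon-separated field is folded
# into the total, so both HH:MM:SS and MM:SS are handled.
cpu_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Succeeds when the given CPU time exceeds 20 minutes (1200 seconds).
over_limit() {
    [ "$(cpu_seconds "$1")" -gt 1200 ]
}
```

The TIME column of `ps -u "$USER" -o pid,time,comm` is the kind of value these helpers expect, for example `over_limit 00:21:05 && echo "over the limit"`.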



Requesting An Account (or Software)


Account Requests

TeraGrid Allocations (Both IU and Non-IU Researchers): Non-IU users are encouraged to request time on Big Red through the TeraGrid allocations process. More information is available in the following article: How do I apply for a TeraGrid allocation?.

IU Researchers: If you are an IU researcher and wish to apply for your own account on any of the UITS-Research Technologies (RT) division's supercomputing resources (Big Red, Libra), visit the Research and Technical Services (RATS) application web page and submit a request.

Software Requests

If you already have an account on one (or more) of the UITS-RT supercomputing resources (Big Red, Libra) and would like to request that software be installed, visit the application web page and submit a request.

For the purpose of this workshop...

Workshop attendees: STOP! If you want to try out anything you learn in today's workshop, we have already created a training account for you on Big Red.



Login to Big Red


Users can log in to the Big Red cluster using:

  • MyProxy (Recommended): Use a TeraGrid resource that allows password-based login (NCSA/SDSC/etc.) as a bridge, then use MyProxy to get a proxy, and then hop onto Big Red.

  • SSH Keys: Use SSH keys to log in from your own workstation; first-time users will have to send their SSH public key to the TeraGrid helpdesk so it can be added to their Big Red account.

  • Globus Certificate (Not recommended for new users): Use the default NCSA certificate or another TeraGrid-approved Globus certificate to log in. You will need your own installation of the Globus toolkit to use this method.

  • IU Kerberos Password (ONLY applicable to local IU users): Local IU users can also log in using their IU Kerberos password.

NOTE: For demonstration purposes in the workshop, and for attendees who want to try things out after the workshop, we will use local passwords to log onto Big Red.

For more information on how to use MyProxy (the recommended way for TeraGrid users to log in to Big Red), which lets you authenticate once and then log in anywhere, check out: Login to Big Red using MyProxy.

Microsoft Windows users:

Open an SSH client and use your login-id to log in.

IU users:           Use bigred.teragrid.iu.edu as hostname.
TeraGrid users: Use login.bigred.iu.teragrid.org as hostname.

[Figure: Windows SSH client, main window]

Default Shell on Big Red: Bash

  • The default shell when you get a Big Red account is bash; this tutorial assumes you continue to use bash as your shell. If you use another shell, just type bash to switch to the bash shell. You can also keep using your favorite shell, but you'll have to adapt the shell commands used in this tutorial to work for that shell.

    To change your shell permanently (to, say, tcsh), use the changeshell command.

    ag@BigRed:~> changeshell
    This program will assist you in changing your login
          shell on all nodes of the Big Red cluster.
          . . .
          1) bash
          2) tcsh
          . . .
          5) quit
          Select 1-5: 2
          Changing login shell for ag
          Password:
          Shell changed.
          Your shell has been changed to the Cornell tcsh shell
          This will take effect on all nodes within 15 minutes
          . . . 

Useful notes

The following sub-section contains useful information about Big Red. You can skip ahead to the section Submit Simple Jobs on Big Red (Using LoadLeveler Job Manager) and revisit this later if you prefer.

  1. Note to Microsoft Windows users: You cannot open tools that need a graphical user interface (GUI), like the Totalview debugger and the Vampir-NG profiler, if you use the SSH client on MS Windows. You'll need some sort of X-window emulation software like Cygwin. We recommend using XLiveCD created by IU-UITS-RC.

  2. Intra-cluster logins: When you log in to your Big Red account for the first time, passphrase-less SSH keys are automatically created in your home directory. Those keys should enable you to log in to compute nodes that you have gained access to through LoadLeveler (see below if you'd like to know how) without typing in a password or a passphrase; i.e., parallel jobs should run seamlessly on multiple compute nodes without any manual intervention.

    However, if you see an error message, when you try to access LL-assigned compute nodes:

    Permission denied (publickey,password,keyboard-interactive)
    ... it is an indication that the intra-cluster RSA keypair in your home directory is either missing or damaged.

    If you face the above problem, run gensshkeys; it will generate a passphrase-less keypair for you, allowing seamless intra-cluster logins from/to any node in the cluster (that has been assigned for your use by LoadLeveler!).

    $ gensshkeys

  3. About forwarding email address for job-related messages: The Big Red and AVIDD clusters send email about your jobs to the address specified in the ~/.forward file (note the "." preceding the filename) in your home directory. By default, this file is set up at account creation time with the email address you provided when you requested your account.

    However, if you'd like to change the address to which job emails are sent, you can do so as shown below:

     hpctrn01@BigRed:~> echo "myemailid@hotmail.com" > ~/.forward 

    Warning: You should use a valid email id! Failing to do so will result in inability to get email notifications about the status of your jobs (and also will annoy the sys-admins with bounced emails; you don't want to earn their wrath ;-))
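
To reduce the chance of writing a malformed address, the sketch below wraps the echo command above in a small function with a very loose sanity check. The function name set_forward_address and the optional second argument (an alternative file path for trying it out) are illustrative assumptions, not something Big Red provides.

```shell
#!/bin/bash
# Hypothetical helper: write a job-notification address to a forward file.
# The second argument defaults to ~/.forward; pass another path to test safely.
set_forward_address() {
    local addr=$1 file=${2:-$HOME/.forward}
    # Loose check only: no spaces, and something@something.something.
    case "$addr" in
        *" "*)    echo "invalid address: $addr" >&2; return 1 ;;
        ?*@?*.?*) ;;
        *)        echo "invalid address: $addr" >&2; return 1 ;;
    esac
    printf '%s\n' "$addr" > "$file"
}
```

For example, `set_forward_address myemailid@hotmail.com` overwrites ~/.forward, while an argument without an "@" leaves the file untouched and returns a non-zero status.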



Softenv to Setup Your Software Environment


The Big Red cluster uses SoftEnv, an environment management system, to permit users to customize their environment i.e. configure what software packages they need, through the use of symbolic keywords.

Default settings in ~/.soft file

At your first login, a .soft file (under your home directory) will be created with a set of defaults. Note that the default content is different for IU users and TeraGrid users. Shown below is what you should see when you login for the first time to Big Red (as an IU and TeraGrid user respectively):

IU Users      TeraGrid Users
@bigred       @teragrid-basic
              @globus-4.0
              @teragrid-dev

What software packages are available?

To see all the keywords that can be used on the system (i.e., the software that is installed), either use the softenv command on the command line, or refer to The Big Red Cluster: Available Software webpage.

Keywords are listed with a preceding '+' while macros (pre-defined lists of keywords) are listed with a preceding '@'.
ag@BigRed:~>$ softenv | less
<<---- Try doing this!

SoftEnv version 1.4.2
      . . .
These are the macros available:
    @bigred                         Default Environment for Big Red users
      . . .
These are the keywords explicitly available:
    +acrobat                       Adobe Acrobat Reader 7.0
    +gcc-4.1.0                     gcc-4.1.0
    . . .

What does a key or a macro change in the environment?

So, now you know how to look for all the available softenv keys on the system. But what does each key or macro mean? Use the soft-dbq command to find out:

$  soft-dbq @namd-ibm-64
This is all the information associated with
the key or macro @namd-ibm-64.
      . . .
  @namd-ibm-64 contains the following
  keywords and macros:

  +mpich-mx-1.2.7..1-ibm-64 +namd-ibm-64-internal
      . . .
$ soft-dbq  +mpich-mx-1.2.7..1-ibm-64
This is all the information associated with
the key or macro +mpich-mx-1.2.7..1-ibm-64.
      . . .
On the linux-sles9-ppc64 architecture,
the following will be done to the environment:

  The following environment changes will be made:
    LD_LIBRARY_PATH = ${LD_LIBRARY_PATH}:${TG_APPS_PREFIX}/mpich-mx-1.2.7..1-ibm-64/lib
    MANPATH = ${MANPATH}:${TG_APPS_PREFIX}/mpich-mx-1.2.7..1-ibm-64/man
    MPICH_HOME = ${TG_APPS_PREFIX}/mpich-mx-1.2.7..1-ibm-64
    PATH = ${PATH}:${TG_APPS_PREFIX}/mpich-mx-1.2.7..1-ibm-64/bin
      . . .

Temporary changes to environment

To temporarily add a keyword (software) to your environment type:

$ soft add +keyword
To temporarily remove a keyword (software) from your environment type:
$ soft delete +keyword

For example, let us assume you want to use the NCBI toolkit (defined by the +ncbi key), and also that you do not want the nano editor (defined by the +nano key and occurs in your environment by default). Initially, checking for nano and blastall (one of the NCBI tools) leads to the expected result:

ag@BigRed:~>$ which nano
/N/soft/linux-sles9-ppc64/nano-1.2.5/bin/nano
ag@BigRed:~>$ which blastall
[No output!]

Now, use the syntax described above to get rid of nano and add NCBI toolkit into your environment:

ag@BigRed:~>$ soft delete +nano
ag@BigRed:~>$ soft add +ncbi

Now, you should be able to verify that nano is not in your environment any more but you can access the blastall program:

ag@BigRed:~>$ which nano
[No output!]
ag@BigRed:~>$ which blastall
/N/soft/linux-sles9-ppc64/ncbi-2.2.12/bin/blastall
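
The same before-and-after check can be scripted. The sketch below is a minimal example with illustrative function names (have_cmd and report_cmd are not SoftEnv commands); it only asks the shell whether a program is reachable through PATH.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a program is reachable through PATH,
# e.g. after "soft add +ncbi" or "soft delete +nano".
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print a one-line report for a program name.
report_cmd() {
    if have_cmd "$1"; then
        echo "$1: available"
    else
        echo "$1: not in environment"
    fi
}
```

Calling `report_cmd blastall` before and after `soft add +ncbi` should show the change, assuming the key behaves as in the transcript above.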

Reverting to default setting

To revert back to default settings, as defined by your ~/.soft file, do:

ag@BigRed:~>$ resoft

Permanent changes to environment

If you want to make permanent changes to your environment, edit your ~/.soft file accordingly. For example, if you never plan to use the nano editor and always expect to use the NCBI toolkit, your .soft file would look something like this:

@remove +nano
+ncbi
@bigred

You will need to do resoft or log out and log back in for the changes to take effect.

Note: @remove lines and any other software paths, that you wish to pre-pend to your environment, should be entered before the default @bigred line.
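
The ordering rule in the note can be checked mechanically. The sketch below writes the three-line example file to a path of your choice and verifies that no @remove line follows the @bigred line. Both function names are illustrative assumptions; try them on a scratch path before touching your real ~/.soft.

```shell
#!/bin/bash
# Hypothetical helper: write the example SoftEnv file (remove +nano, add +ncbi,
# then the @bigred defaults) to the given path.
write_soft_file() {
    cat > "$1" <<'EOF'
@remove +nano
+ncbi
@bigred
EOF
}

# Succeed only if every @remove line appears before the @bigred line.
soft_order_ok() {
    awk '
        /^@bigred[[:space:]]*$/ { seen = 1; next }
        seen && /^@remove/      { bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

After copying a checked file into place, run resoft (or log out and back in) as described above.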

Don't want to use Softenv?

In the event that you would prefer not to use softenv, place an empty .nosoft file in your home directory.

More information:

To learn more about softenv type:

ag@BigRed:~>$ man softenv-intro



Big Red's File Storage Options


Home directories on Big Red are stored on a low-performance NFS mounted file system. But there is also a GPFS file system (work bench) which provides excellent performance.

There are a few options as far as where you want to store what kind of data.

Where do I store data?

  • Datasets on GPFS: A good (blind) rule of thumb is "ALL data (sets) live on GPFS": /N/gpfsbr/your_username/.

    TeraGrid users should use the directory named by the environment variable $TG_CLUSTER_SCRATCH instead.

    Big Red GPFS Technical Information

    • GPFS is a large file system with 266 TB of space, accessible from all Big Red nodes.
    • Is the data backed up? NO! Data is not backed up. (See Things to Remember below for more information.)
    • Also, note that data on /N/gpfsbr/ is cleared out periodically. For up-to-date information on how often it is cleared out, see the Big Red Home Directories and Other Disk Space webpage.
  • Home directory only for programs (source code and executables): Store things like source code, batch scripts (LoadLeveler-related or otherwise), executables, and possibly user-guide type documents in your home directory ${HOME}. In other words, do not use your home directory to read in or write out datasets. It is a low-performance NFS file system that is not designed to take that kind of load.

    • Is the data backed up? YES.

    Note: Always use ~/ or ${HOME} to reference your home directory unless use of absolute path names (i.e. /N/u/username/BigRed) is necessary.
    Why? Because your home directory might be moved from /N/u/username/BigRed to a different location, but ~/ and ${HOME} will always point to the current home directory!

  • Local scratch: Apart from GPFS and your NFS mounted home directory, you can also use local scratch on each compute node if your program does not require processes to use data across nodes. These are available as /scratch (which is basically a symbolic link to /tmp), on all compute nodes and are usually smallish in size (currently 67 GB).

    • Is the data backed up? NO! Data is not backed up. (See Things to Remember below for more information.)
    • Also, note that data on /scratch is cleared out periodically, much more often than GPFS: anywhere from every 1-2 days to every 15 days. Unless your program is actively using the data, there is a risk it will be cleared out.
  • For more information on your diskspace options, see the Big Red Home Directories and Other Disk Space webpage.
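
The choice of dataset directory described above can be captured in one small function. This is a sketch under the assumption that $TG_CLUSTER_SCRATCH is set only for TeraGrid users; the function name data_dir is illustrative.

```shell
#!/bin/bash
# Hypothetical helper: print the directory to use for datasets.
# Uses $TG_CLUSTER_SCRATCH when it is set and non-empty; otherwise falls back
# to the GPFS path /N/gpfsbr/<username>.
data_dir() {
    local user=${USER:-$(id -un)}
    echo "${TG_CLUSTER_SCRATCH:-/N/gpfsbr/$user}"
}
```

A job script could then start with `cd "$(data_dir)"` instead of hard-coding a path.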

Things to remember

  • You need to back up your data: It is also important to remember that GPFS is not backed up and is also cleaned up regularly (every 60 days or so, but possibly much more often, beware!). So any data that you deem important (or need in the long term) should be backed up onto the HPSS mass store system.

    Once again, for up-to-date information on your diskspace options, see the Big Red Usage Policies webpage.

    More information on getting a Mass Store account and using HPSS from Big Red is available at the Using HPSS from IU Research Systems webpage.

    We will be using the following powerpoint slideshows during our workshop:

  • Beware /tmp users: If you write parallel code, remember that you cannot use /tmp to store data that will be used across nodes. /tmp is usually mounted off a local disk, so each compute node has its own /tmp (just like /scratch).
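
One way to spot files that may be at risk of being cleared is to list those that have not been modified for a while. The sketch below is only an approximation: the function name stale_files and the 30-day default are assumptions, and the actual purge policy may use a different age or look at access time rather than modification time.

```shell
#!/bin/bash
# Hypothetical helper: list regular files under a directory that have not been
# modified for more than N days (default 30), as candidates for backup.
stale_files() {
    local dir=$1 days=${2:-30}
    find "$dir" -type f -mtime +"$days" -print
}
```

For example, `stale_files /N/gpfsbr/$USER 45` prints files older than 45 days under your GPFS directory.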



Compiling and Running Your Own Programs


In this section, we'll show how you can compile your own programs and then run them. But before that, a brief introduction to the two compiler families available on Big Red: IBM and GCC.

GCC (GNU Compilers):

  1. Presently at version 3.3.3.
  2. Works the same as gcc everywhere else.
  3. 64-bit compilations are possible with the "-mpowerpc64" compiler switch.
  4. Some other handy compiler options for Big Red:
    • -mcpu=970: Identifies our processor type.
    • -mabi=altivec: Needed to enable the next switch (-maltivec).
    • -maltivec: For C codes using 4-byte real numbers, will attempt to vector stream the computations.
    • -mfused-madd: Turns on the IBM-specific "multiply-add" instruction during optimizations (where possible).

IBM Compilers

  1. Generally compile faster executables than gcc for the same level of optimization.
  2. A little stricter about the language standards than gcc.
  3. One basic compiler in two packages, different front ends for each language (C, C++, Fortran variants). So, many optimization switches are the same across all supported languages.

C compiler

The compiler is called "xlc". There is also a cc compiler. On AIX systems, cc is embedded in the OS and is a K&R (pre-ANSI) C compiler, so unexpected things can happen if it is used. On Big Red, cc has been soft-linked to gcc (see above). The preferred C compiler is xlc.

I categorize three different switch functions for the compiler: machine switches, user switches and optimization switches.

Machine switches: These switches are always the same for the platform. For Big Red, "-qarch=ppc970 -qtune=ppc970 -qenablevmx -qaltivec" are a good choice. If the altivec processors are unusable, there is no penalty for setting the switch.

User switches: These switches vary with the job. For instance, if addressing is expected above 2 GB, then a 64-bit compile will be needed with the "-q64" switch. If the C preprocessor is all that is desired, then the "-E" switch should be used. Please note that the general rule of thumb with compiler switches is that switches prefixed with "-q" are sent to the compiler, while those prefixed with "-b" are sent to the loader. Some switches (like "-E" and "-o") are exceptions due to historical precedent.

Optimization switches: These are the switches most people think about when they think about compiler switches. Optimizations can be both positive (faster executables) or negative (slower executables). Some switches, like -Q are almost always positive ("-Q" attempts to place functions in-line with the main body and remove the overhead of the function call). Others, like "-g" are almost always negative ("-g" creates debug symbol tables). Various levels of "-O" are offered. Without a number following, "-O" defaults to "-O2". Generally, the higher the number, the more risks the compiler is willing to take. At "-O0" absolutely no optimization occurs. At "-O5" everything from loop fusion to cross-source code (for multiple source code files) optimizations are considered. Often, more is not better. Testing and experimenting is about the only way to properly tune an executable.
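
To make the three groups of switches concrete, the sketch below assembles an xlc flag string from the machine switches quoted above, an optional "-q64" user switch, and an "-O" level. The function name xlc_flags and its argument order are illustrative assumptions; it only builds a string and does not call the compiler.

```shell
#!/bin/bash
# Hypothetical helper: build an xlc flag string.
# Arguments: address size (32 or 64, default 32) and optimization level
# (default 2, matching the plain "-O" default described above).
xlc_flags() {
    local bits=${1:-32} opt=${2:-2}
    # Machine switches: always the same on Big Red.
    local flags="-qarch=ppc970 -qtune=ppc970 -qenablevmx -qaltivec"
    # User switch: 64-bit addressing only when requested.
    if [ "$bits" = 64 ]; then
        flags="$flags -q64"
    fi
    # Optimization switch.
    echo "$flags -O$opt"
}
```

A possible use is `xlc $(xlc_flags 64 3) -o myprog myprog.c`, where myprog.c stands for your own source file. Comparing timings across a few "-O" levels is in keeping with the advice above to test and experiment.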

Man Page: On Big Red, even if you have MANPATH problems, the following command will find the xlc man page for other options:

man -M /opt/ibmcmp/vacpp/8.0/man/en_US/  xlc

C++ compiler

The IBM compiler for C++ codes is called xlC. This compiler used to have real trouble with many C++ source codes. It has improved quite a bit in the last several years (as has C++ standardization). Almost all of the switches that are available to xlc are also available with xlC. In addition, many switches have been added to help with gcc compatibility. One example is "-qlanglvl=gnu_complex". This switch "instructs the compiler to recognize GNU complex data types and related keywords".

The man page path is the same as for xlc.

Fortran compiler

The compiler recently dropped FORTRAN IV ('66) support. There is no excuse for FORTRAN 77 any more either; use "xlf90 -qfixed" in its place. The dropped syntax includes, for example, computed GOTOs.

Support for Fortran 2003 is nearly complete.

Machine switches:

For Big Red: "-qarch=ppc970 -qtune=ppc970" are the same as for xlc/xlC. However, the altivec switches are not directly available. Instead, Altivec-enabled C routines (such as FFTW) are linked to the Fortran programs.

User switches:

For Fortran, a 64-bit compile is highly recommended in most cases. Fortran assumes memory management responsibilities, so why not use them? Once again 64-bit compilations are designated with the "-q64" switch. However, additional "safety" switches, such as "-C " for testing array parameters are also available.

Optimization switches:

The xlf90 optimization switches are nearly identical to the xlc optimizations. Once again, switches like "-Q" are almost always positive and switches like "-g" are almost always negative. The same levels of "-O" are offered but with slightly different effects. One throttle on risky optimizations is the "-qstrict" option. This is only available at "-O3" or higher. Generally, peace of mind comes at a significant performance cost. The warning messages are very prolific. Again, testing and experimenting is about the only way to properly tune an executable.

Man Page: On Big Red, even if you have MANPATH problems, the following command will find the xlf90 man page for other options:

man -M /opt/ibmcmp/xlf/10.1/man/en_US/  xlf90

Threads and System Math Libraries:

On all compilers, adding a "_r" suffix enables threads. An example: xlC_r. Often, threads that run "behind the scenes" need enabling.

An example is compiling and linking in IBM's Engineering and Scientific Subroutine Library (ESSL). If using actual threads (like OpenMP or POSIX), then the compile switch is "-qesslsmp". Even serial code requires the "_r" suffix, as in "xlf90_r -q64 -qessl..." or "xlc_r -qessl...". For xlC_r, you must also add "-lessl -qnocinc=/usr/include/essl" to redefine the include files.


Parallel Applications -- Message Passing Libraries

Introduction

  • Big Red is a parallel machine! It is not structured for serial codes!
  • Serial codes waste 75% of the processor cores and the interconnect switch (which is about 1/3 the cost of the machine by itself).
  • Note to IU Users: Serial codes should run on Libra! It is almost the same processor set as Big Red.

How to use them?

  • Message passing packages are selected in the SoftEnv environment.
  • Note to IU Users: TeraGrid users get a default MPICH library as of the date of writing this document; no MPI libraries exist in the default environment for IU users.
  • The library you link must be consistent with the address precision (32 or 64) chosen for the compile.
  • Once an MPI library is added, compiles are made through a wrapper to the IBM/GNU compiler that built the library.
    • For instance, mpif90 is actually just a wrapper to xlf90_r. The same switches used by xlf90 are available to mpif90.

MPICH:

  • Argonne Original.
  • MPICH 1 is available; MPICH 2 could be.
  • Uses mpirun.
  • Limited runtime environment.

OpenMPI:

  • Replaced LAM.
  • No more lamboot.
  • Has look and feel of MPICH to the user.
  • Improving with each new release.

Parallel Job for kicks!

NOTE: Feel free to skip this sub-section and proceed to the next section. Detailed information on how to compile, link, and run parallel jobs on Big Red is available on the Submitting Parallel (MPI) Jobs on Big Red webpage.

But, if you're very curious about how you could use multiple processors/cores (nodes) at the same time, then try compiling the simple (example) parallel helloWorlds and try submitting a job to run it as shown below:

ag@BigRed:~/> cp -r ~hpc/examples ${HOME}/examples
ag@BigRed:~/> cd ${HOME}/examples/
ag@BigRed:~/examples> soft add +mpich-mx-ibm-64
ag@BigRed:~/examples> mpicc -q64 -o helloWorlds helloWorlds.c
ag@BigRed:~/examples> llsubmit submit_parallel.sh
llsubmit: Processed command file through Submit Filter:. . .
llsubmit: The job "s10c2b5.dim.2571" has been submitted.
ag@BigRed:~/examples> cat submit_parallel.sh.2571.out
Hello, parallel worlds! This is processor s9c4b9.dim and my rank is 1!
Hello, parallel worlds! This is processor s9c4b9.dim and my rank is 2!
Hello, parallel worlds! This is processor s9c4b9.dim and my rank is 0!
Hello, parallel worlds! This is processor s9c4b8.dim and my rank is 6!
Hello, parallel worlds! This is processor s9c4b8.dim and my rank is 5!
Hello, parallel worlds! This is processor s9c4b8.dim and my rank is 4!
Hello, parallel worlds! This is processor s9c4b8.dim and my rank is 7!
Hello, parallel worlds! This is processor s9c4b9.dim and my rank is 3!

What you basically did in the above steps was:

  1. Added the MPICH parallel library using softenv (see the section on use of Softenv above)
  2. Compiled and linked your parallel program helloWorlds.c using the mpicc compiler
  3. Finally, submitted a parallel job to LoadLeveler, using a script that asked for 2 nodes and 4 tasks on each of those nodes

For more information about parallel jobs on Big Red, please refer to the Submitting Parallel (MPI) Jobs on Big Red webpage.



Submit Simple Jobs on Big Red (NAMD Example)


In this section, we'll show how you can run a simple job on Big Red that runs NAMD.

Writing a LoadLeveler script

  • The job script is divided into a keyword stanza and an execution section. If any lines exist in the script other than keywords (including just #!/bin/bash), the script is executed. Otherwise, it is sourced.

  • Best practice is to write your job in its own script file and tell LL (LoadLeveler) to execute that.

  • All keywords are prefaced with "#@ ".

  • There are two basic types of keywords, those you might expect and those you might not:

    • Expected keywords include: output, error, executable, notification etc.
    • Unexpected keywords include: node_usage, node, job_type, checkpoint, queue

Typical script explained (Template)


# Pretty typical file designator.  Will take absolute paths or try to write
#  to your initial directory.  $(Cluster) is the unique job id.
#@ output = test_namd.$(Cluster).out
#@ error = test_namd.$(Cluster).err

# When do you want the job to send an email?  Must put address in your
#  .forward file.
#@ notification = complete

# How long to run?  The default is hh:mm:ss or you could specify in seconds.
#@ wall_clock_limit = 1:00:00

# Lots of choices.  "COPY_ALL" sources your current environment for the
#  job's environment.
#@ environment = COPY_ALL

# What queue to run the job in?
#@ class = MED

#  Where to start running the job.
#@ initialdir =  /N/gpfsbr/namd_example
#
## Teragrid Account number goes in the line below.  "NONE" is the account
#  number for IU users.
#@ account_no = NONE

# The name of the executable to be run.
#@ executable = My_execution_script.ksh

# Whatever else the executable wants put on its command line.
#@ arguments = fee fi foe fum

# Job control keywords.
#@ node = 4
#@ tasks_per_node = 2
#@ job_type = parallel
#@ node_usage = not_shared

# What to do if the job halts.  Checkpoint is not totally supported yet.
#@ checkpoint = no
#@ restart = no

# The last keyword in the list.  Sort of a "stop" key.  Anything below this
#  keyword will be executed.
#@ queue
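
If you create many similar jobs, a keyword stanza like the template above can be generated instead of edited by hand. The sketch below is one possible approach, not a Big Red tool: the function name write_ll_script, its argument order, and the job.$(Cluster) file names are assumptions, and it uses only keywords that appear in the template.

```shell
#!/bin/bash
# Hypothetical helper: write a minimal LoadLeveler keyword stanza to a file.
# Arguments: output file, work directory, executable, nodes, tasks per node.
write_ll_script() {
    local out=$1 workdir=$2 exe=$3 nodes=${4:-1} tpn=${5:-1}
    # \$(Cluster) is escaped so that LoadLeveler, not the shell, expands it.
    cat > "$out" <<EOF
#@ output = job.\$(Cluster).out
#@ error = job.\$(Cluster).err
#@ wall_clock_limit = 1:00:00
#@ environment = COPY_ALL
#@ class = MED
#@ initialdir = $workdir
#@ account_no = NONE
#@ executable = $exe
#@ node = $nodes
#@ tasks_per_node = $tpn
#@ job_type = parallel
#@ node_usage = not_shared
#@ queue
EOF
}
```

For example, `write_ll_script myjob.ll /N/gpfsbr/$USER/run1 run.bash 4 2` writes a stanza requesting 4 nodes with 2 tasks each; the file and directory names here are placeholders.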

Example NAMD Job

ag@BigRed:~/examples> cat execute_script.ll

#@ output = test_namd.$(Cluster).out
#@ error = test_namd.$(Cluster).err
#@ notification = complete
#@ wall_clock_limit = 1:00:00
#@ environment = COPY_ALL
#
## FAST is small debug queue, MED is 2 weeks, BIG is large jobs for 48 hour max.
## Type llclass on command line for up to date information
#@ class = MED
#
## Please change this line to your work directory.
#@ initialdir =  /N/gpfsbr/namd_example
#
#@ executable = mpich_namd.bash
#
# Teragrid Account number goes in the line below.  I used none for test.
#@ account_no = NONE
#
#@ node_usage = not_shared
#@ node = 4
#@ tasks_per_node = 2
#@ job_type = parallel
#@ checkpoint = no
#@ queue

The mpich_namd.bash script used by the LL script above:

ag@BigRed:~/examples> cat mpich_namd.bash
#------------------------------------------------------------------
## Once again, this should point to your work directory if you use it.
cd /N/gpfsbr/namd_example
#
export NAMD2=`which namd2`
#
## Get machine list (the list of nodes where your job will run on Big Red) and
##  then write the list to /tmp/machinelist.$LOADL_STEP_ID so it can be passed
##  into mpirun.
llmachinelist
#
## Make sure number of tasks is <= (node * tasks_per_node)
mpirun -np 8  -machinefile /tmp/machinelist.$LOADL_STEP_ID $NAMD2 apoa1.namd

## Clean up temporary machine list file
rm /tmp/machinelist.$LOADL_STEP_ID
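
The rule that the mpirun task count must not exceed node * tasks_per_node can be checked before submitting. The sketch below reads both keywords from an LL script and prints their product. The function name ll_total_tasks is illustrative, and treating a missing keyword as 1 is an assumption made for the sketch, not documented LoadLeveler behavior.

```shell
#!/bin/bash
# Hypothetical helper: print node * tasks_per_node for a LoadLeveler script.
# "node_usage" is not matched because the pattern requires "node" to be
# followed directly by optional spaces and "=".
ll_total_tasks() {
    awk -F'=' '
        /^#@[[:space:]]*node[[:space:]]*=/           { gsub(/[[:space:]]/, "", $2); n = $2 }
        /^#@[[:space:]]*tasks_per_node[[:space:]]*=/ { gsub(/[[:space:]]/, "", $2); t = $2 }
        END { print (n ? n : 1) * (t ? t : 1) }
    ' "$1"
}
```

For the example job above, `ll_total_tasks execute_script.ll` should print 8, which matches the `-np 8` passed to mpirun.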

Submit LoadLeveler batch job

ag@BigRed:~/examples> llsubmit execute_script.ll
llsubmit: Processed command file through Submit Filter:. . .
llsubmit: The job "s10c2b5.dim.888" has been submitted.
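
When scripting around llsubmit, it helps to capture the job name from its message. The sketch below parses a message of the form shown above and extracts the trailing number, which the transcripts in this tutorial suggest is what $(Cluster) expands to in output file names. Both function names are illustrative, and the message format is assumed to match the example exactly.

```shell
#!/bin/bash
# Hypothetical helper: read llsubmit output on stdin and print the job name
# found between the double quotes.
job_name_from_submit() {
    sed -n 's/.*The job "\([^"]*\)" has been submitted.*/\1/p'
}

# Hypothetical helper: print the part of a job name after the last dot,
# e.g. "888" for "s10c2b5.dim.888".
cluster_number() {
    echo "${1##*.}"
}
```

A possible use is `name=$(llsubmit execute_script.ll | job_name_from_submit)` followed by `cluster_number "$name"`.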


Check Job Status

Use the llq command to check job status (based on your username):

ag@BigRed:~/examples> llq | grep ${USER}
s10c2b5.888.0            hpc         9/5  15:04 I  50  MED
ag@BigRed:~/examples> llq -u ${USER}
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
s10c2b5.888.0            hpc         9/5  15:04 R  50  MED          s9c4b9
1 job step(s) in query, 0 waiting, 0 pending, 1 running, 0 held, 0 preempted

Cancel a queued/running job

If you want to cancel a job that is in the queue or is running -- for example, because you found a bug in your code or an error in the datafile -- then, use the llcancel command.

ag@BigRed:~/examples> llcancel s10c2b5.888.0

llcancel: Cancel command has been sent to the central manager

Now, try looking at the status of the same job -- do you notice the difference?

ag@BigRed:~/examples> llq -u hpc

Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
s10c2b5.888.0            hpc         9/5  15:04 CA  50  MED          
0 job step(s) in query, 0 waiting, 0 pending, 0 running, 0 held, 0 preempted

What does the above output from llq mean? Status Codes

If you look at the above output of llq carefully, you will notice one of the following status codes being used to describe your job's status:

Status Code    What does it mean?
I              Queued, waiting for free nodes
R              Running
C              Completed
CA             Cancelled by user
H              Put on hold by user or due to invalid requirements
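
The table can be turned into a small lookup for use in scripts. The sketch below assumes the llq column layout shown in the examples above, where the status is the fifth whitespace-separated field; the function names are illustrative and only the five codes from the table are covered.

```shell
#!/bin/bash
# Hypothetical helper: translate an llq status code into the meaning listed
# in the table above.
ll_status_meaning() {
    case "$1" in
        I)  echo "Queued, waiting for free nodes" ;;
        R)  echo "Running" ;;
        C)  echo "Completed" ;;
        CA) echo "Cancelled by user" ;;
        H)  echo "Put on hold by user or due to invalid requirements" ;;
        *)  echo "Unknown status code: $1" ;;
    esac
}

# Hypothetical helper: read llq-style output on stdin and print the ST column
# for the given job step id.
ll_status_of() {
    awk -v id="$1" '$1 == id { print $5 }'
}
```

For example, `ll_status_meaning "$(llq -u ${USER} | ll_status_of s10c2b5.888.0)"` prints a readable status for that job step.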

Job still in queue? When will my job start?

If your job isn't running, use the job number with showstart to find out when it is scheduled. For example:

ag@BigRed:~/examples> showstart s10c2b5.10488.0
job s10c2b5.10488.0 requires 1 proc for 1:00:00

Estimated Rsv based start in                00:00:00 on Fri Jun  1 14:20:04
Estimated Rsv based completion in           10:00:00 on Sat Jun  2 00:20:04

Best Partition: base

Want even more information about job?

A more in-depth command is checkjob. Here is an example:

ag@BigRed:~/examples> checkjob s10c2b5.10692.0
job s10c2b5.10692.0

AName: 0
State: Running
Creds:  user:rsheppar  group:hpc  account:NONE  class:MED
WallTime:   00:00:00 of 00:20:00
SubmitTime: Wed May 30 19:00:03
  (Time Queued  Total: 00:00:41  Eligible: 00:00:41)

StartTime: Wed May 30 19:00:44
Total Requested Tasks: 8

Req[0]  TaskCount: 8  Partition: base
Memory >= 0  Disk >= 0  Swap >= 0
Opsys:   Linux2  Arch: PPC64  Features: ---

Allocated Nodes:
[s6c2b10.dim:2][s6c2b11.dim:2][s6c2b12.dim:2][s6c2b13.dim:2]

IWD:            /N/gpfsbr/namd_example
Executable:     /N/gpfsbr/namd_example/mpich_namd.bash
StartCount:     1
Flags:          BACKFILL,RESTARTABLE
Attr:           BACKFILL
StartPriority:  3804
Reservation 's10c2b5.10692.0' (-00:00:09 -> 00:19:51  Duration: 00:20:00)

Try the NAMD Example

  1. Set up your environment. In your .soft file, add @namd-ibm-64, then run the command resoft.
  2. Make a directory in /N/gpfsbr/${USER} and copy the contents of /N/gpfsbr/namd_example into it.
    ag@BigRed:~/examples> cp -r /N/gpfsbr/namd_example /N/gpfsbr/${USER}/.
  3. Edit the LL script and mpich_namd.bash script for your specific situation:
    • In particular, make sure you change the #@ initialdir line and the #@ account_no line in the LL script.
    • Also change the cd /N/gpfsbr/namd_example line in mpich_namd.bash to point to your GPFS directory, as shown above.
  4. Submit the script and wait for the results!
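
The path edits in step 3 can be done with sed rather than by hand. The sketch below replaces every occurrence of the example directory in a script with a directory you supply. The function name retarget_workdir is illustrative; the sketch assumes the new path contains no "|" or "&" characters, and it does not touch the account_no line.

```shell
#!/bin/bash
# Hypothetical helper: replace /N/gpfsbr/namd_example with your own work
# directory in a script (covers both the "#@ initialdir" and the "cd" lines).
retarget_workdir() {
    local file=$1 newdir=$2 tmp
    tmp=$(mktemp) || return 1
    # "|" is the sed delimiter because the paths contain "/".
    sed "s|/N/gpfsbr/namd_example|$newdir|g" "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy the contents back so the original file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example, run `retarget_workdir execute_script.ll /N/gpfsbr/$USER/namd_example` and then the same for mpich_namd.bash.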

Output and Error Files

Assuming your job runs to completion, you can find messages it tried to print on the console in an output file.

For example, if LoadLeveler assigned job id s10c2b5.dim.888 to your job and you used test_namd.$(Cluster).out/err as filenames in your LL script, then the output file would be named test_namd.job-id.out and the error file would be named test_namd.job-id.err.

ag@BigRed:~/examples> ls *.out
test_namd.9999.out

Go ahead and check out the output file; it should look similar to the output file you got using serialjob:
ag@BigRed:~/examples> cat test_namd.9999.out
. . . Lots of Numbers, and such ... that the NAMD program output . . .
	



Running BioInformatics Software (BLAST, MEME, etc.)


There are pre-canned scripts that can be used to run jobs that in turn run BLAST and other BioInformatics software, on Big Red. For more details refer to the Bioinformatics Support webpage.



More Information about LoadLeveler (LL) and Migration from PBS




File Transfers To and From Big Red using tgcp


This section requires that you have authenticated using grid credentials. See the File Transfer to/from Big Red using GridFTP (tgcp) page for more details.