Big Red Workshop, Bioinformatics'07 Conference, Indianapolis (2007-05-31)
Introduction to Big Red
The Big Red cluster consists of 768 IBM JS-21 compute nodes, each with 8 GB of memory. See the technical schematic and the associated hardware details for Big Red to understand how the cluster is organized.
For detailed hardware information, see the Big Red Hardware Information page.
Supercomputer/cluster vs. Personal computer/cluster
There are some fundamental differences between a user-owned computer (or a small user-owned cluster within a research lab) and a supercomputer (or a cluster). Usage of the latter systems needs to be moderated, i.e., there is a need for a traffic cop that guarantees the best performance and utilization.
More on Need for a Traffic Cop
To further explain the need for a traffic cop, let us consider a rental car analogy. Assume you have a sedan that you use for day-to-day travel. You have full rights to drive your own car at any time and to any place. Now consider a scenario where you're moving, and your car is not big enough. You might need a truck. If you wanted to rent one from a rental car company, you would go through a process of reserving the truck, picking it up, paying for it, and so forth.
The situation is pretty similar with supercomputers and clusters, except that, as an IU researcher, using IU's research systems does not require you to pay :-).
Note to TeraGrid users: If you are a TeraGrid user, you still don't pay monetarily, but you need to get an allocation through the TeraGrid project allocations committee before you can use IU systems that are part of TeraGrid.
Supercomputers and clusters still need a traffic cop to direct traffic so that users do not step on each other's toes because:
- A large number of users want to use the system at the same time, to do research that might be unrelated.
- Given the large number of processors and the large amount of storage, a large number of jobs belonging to multiple users may run at the same time (and multiple users may store large amounts of research data).
This is where a piece of software called a job manager, which is used to submit and monitor jobs on a supercomputer, comes in.
LoadLeveler/PBS as a traffic cop on IU research systems
On IU research systems, the following job submission systems are used:
| Job Manager | Systems that use it |
|---|---|
| IBM LoadLeveler (LL) | Big Red, Libra |
| Portable Batch System (PBS) | AVIDD clusters |
All IU clusters use the MOAB scheduler for job scheduling, incorporating a fair-share mechanism into the mix (based on research system time used by each user trying to run a job).
Fair-share Enforcement on Big Red
As with any other supercomputer or cluster, a fair-share mechanism is incorporated within the Big Red system. This mechanism does not allow users to run jobs on the login node or on the compute node outside of the LL job submission system. Any job you submit outside of LoadLeveler is killed off if it uses more than 20 minutes of CPU time. This includes globus jobs that use the jobmanager/fork.
Requesting An Account (or Software)
Account Requests
TeraGrid Allocations (Both IU and Non-IU Researchers): Non-IU users are encouraged to request time on Big Red through the TeraGrid allocations process. More information is available in the following article: How do I apply for a TeraGrid allocation?.
IU Researchers: If you are an IU researcher and you wish to apply for your own account on any of the UITS-Research Technologies (RT) division's supercomputing resources (Big Red, Libra), visit the Research and Technical Services (RATS) application web page and submit a request.
Software Requests
If you already have an account on one (or more) of UITS-RT supercomputing resources (Big Red, Libra), and would like to request software to be installed, then visit the application web page and submit a request.
For the purpose of this workshop...
Workshop attendees - STOP! For the purpose of this workshop, if you want to try any of the information you learn today in this workshop, we have already created a training account for you on Big Red.
Login to Big Red
Users can log in to the Big Red cluster using:
MyProxy (Recommended): Use a TeraGrid resource that allows password-based login (NCSA/SDSC/etc.) as a bridge, then use MyProxy to get a proxy, and then hop onto Big Red.
SSH Keys: Use SSH keys to log in from your own workstation; first-time users will have to send their SSH public key to the TeraGrid helpdesk so it can be added to their Big Red account.
Globus Certificate (Not recommended for new users): Use the default NCSA certificate or another TeraGrid-approved Globus certificate to log in. You will need your own installation of the Globus toolkit to use this method.
IU Kerberos Password (ONLY applicable to local IU users): Local IU users can also log in using their IU Kerberos password.
NOTE: For demonstration purposes in the workshop, and also for attendees who want to try things out following the workshop, we will use local passwords to log on to Big Red.
For more information on how to use MyProxy (the recommended way for TeraGrid users to log in to Big Red) to authenticate once and then be able to log in anywhere, check out: Login to Big Red using MyProxy.
Microsoft Windows users: Open an SSH client and use your login ID to log in.
IU users: Use bigred.teragrid.iu.edu as hostname.
TeraGrid users: Use login.bigred.iu.teragrid.org as hostname.
Default Shell on Big Red: Bash
- The default shell when you get a Big Red account is bash;
this tutorial assumes you're continuing to use bash as your shell.
If you use another shell: just type bash to switch to the
bash shell. You can also continue using your favorite shell,
but you'll have to customize shell commands used in this tutorial to work
for that shell.
To change your shell permanently (to, say, tcsh), use the changeshell command.
ag@BigRed:~> changeshell
This program will assist you in changing your login shell on all nodes of the Big Red cluster.
. . .
1) bash
2) tcsh
. . .
5) quit
Select 1-5: 2
Changing login shell for ag
Password:
Shell changed.
Your shell has been changed to the Cornell tcsh shell
This will take effect on all nodes within 15 minutes
. . .
Useful notes
The following sub-section contains useful information about Big Red. You could skip ahead to the next section, Submit Simple Jobs on Big Red (Using LoadLeveler Job Manager), and revisit this later if you'd prefer to do so.
Softenv to Set Up Your Software Environment
The Big Red cluster uses SoftEnv, an environment management system, to let users customize their environment, i.e., configure which software packages they need, through the use of symbolic keywords.
Default settings in ~/.soft file
At your first login, a .soft file (under your home directory) will be
created with a set of defaults. Note that the default content is
different for IU users and TeraGrid users. Shown below is what you should see
when you login for the first time to Big Red (as an IU and TeraGrid user respectively):
| IU Users | TeraGrid Users |
|---|---|
| @bigred | @teragrid-basic @globus-4.0 @teragrid-dev |
What software packages are available?
To see all the keywords that can be used on the system (i.e. software
that is installed), either use the softenv command on the command line,
or refer to
The Big Red Cluster: Available Software webpage.
Keywords are
listed with a preceding '+' while macros (pre-defined lists of keywords)
are listed with a preceding '@'.
ag@BigRed:~>$ softenv | less
<<---- Try doing this!
SoftEnv version 1.4.2
. . .
These are the macros available:
@bigred Default Environment for Big Red users
. . .
These are the keywords explicitly available:
+acrobat Adobe Acrobat Reader 7.0
+gcc-4.1.0 gcc-4.1.0
. . .
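The '+' and '@' convention described above can be checked mechanically. The following is a minimal sketch in plain POSIX shell; the function name soft_token_kind is hypothetical and is not part of SoftEnv.

```shell
# soft_token_kind is a hypothetical helper, not part of SoftEnv.
# It classifies a SoftEnv token by its leading character.
soft_token_kind() {
    case "$1" in
        +*) echo "keyword" ;;   # a single software package
        @*) echo "macro" ;;     # a pre-defined list of keywords
        *)  echo "unknown" ;;
    esac
}

soft_token_kind "+gcc-4.1.0"   # prints: keyword
soft_token_kind "@bigred"      # prints: macro
```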
What does a key or a macro change in the environment?
So, now you know how to look for all the available softenv keys on the system. But what does each key or macro mean? Use the soft-dbq command to find out:
$ soft-dbq @namd-ibm-64
This is all the information associated with
the key or macro @namd-ibm-64.
. . .
@namd-ibm-64 contains the following
keywords and macros:
+mpich-mx-1.2.7..1-ibm-64 +namd-ibm-64-internal
. . .
$ soft-dbq +mpich-mx-1.2.7..1-ibm-64
This is all the information associated with
the key or macro +mpich-mx-1.2.7..1-ibm-64.
. . .
On the linux-sles9-ppc64 architecture,
the following will be done to the environment:
The following environment changes will be made:
LD_LIBRARY_PATH = ${LD_LIBRARY_PATH}:${TG_APPS_PREFIX}/mpich-mx-1.2.7..1-ibm-64/lib
MANPATH = ${MANPATH}:${TG_APPS_PREFIX}/mpich-mx-1.2.7..1-ibm-64/man
MPICH_HOME = ${TG_APPS_PREFIX}/mpich-mx-1.2.7..1-ibm-64
PATH = ${PATH}:${TG_APPS_PREFIX}/mpich-mx-1.2.7..1-ibm-64/bin
. . .
Temporary changes to environment
To temporarily add a keyword (software) to your environment, type:
$ soft add +keyword
To temporarily remove a keyword (software) from your environment, type:
$ soft delete +keyword
For example, let us assume you want to use the NCBI toolkit (defined by the +ncbi key), and also that you do not want the nano editor (defined by the +nano key and occurs in your environment by default). Initially, checking for nano and blastall (one of the NCBI tools) leads to the expected result:
ag@BigRed:~>$ which nano
/N/soft/linux-sles9-ppc64/nano-1.2.5/bin/nano
ag@BigRed:~>$ which blastall
[No output!]
Now, use the syntax described above to get rid of nano and add the NCBI toolkit to your environment:
ag@BigRed:~>$ soft delete +nano
ag@BigRed:~>$ soft add +ncbi
Now, you should be able to verify that nano is not in your environment any more but you can access the blastall program:
ag@BigRed:~>$ which nano
[No output!]
ag@BigRed:~>$ which blastall
/N/soft/linux-sles9-ppc64/ncbi-2.2.12/bin/blastall
Reverting to default setting
To revert back to default settings, as defined by your ~/.soft file, do:
ag@BigRed:~>$ resoft
Permanent changes to environment
If you want to make permanent changes to your environment, you
can edit your ~/.soft file to that effect. For example, if you never plan to
use the nano editor and always expect to use the NCBI toolkit, then your .soft file
would look something like this:
@remove +nano
+ncbi
@bigred
You will need to run
resoft
or log out and log back in
for the changes to take effect.
Note: @remove lines and
any other software paths that you wish to prepend to your environment
should be entered before the default
@bigred line.
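The ordering rule in the note above can be sketched as a small script. This is only an illustration: write_soft_file is a hypothetical helper, the one-entry-per-line layout is an assumption, and the file is written to a scratch path rather than to your real ~/.soft.

```shell
# write_soft_file is a hypothetical helper. It writes the example to the path
# given as $1 (a scratch file here), so your real ~/.soft is not touched.
# Assumed layout: one entry per line, with @bigred as the last line.
write_soft_file() {
    cat > "$1" <<'EOF'
@remove +nano
+ncbi
@bigred
EOF
}

demo_soft="${TMPDIR:-/tmp}/soft-example.$$"
write_soft_file "$demo_soft"
cat "$demo_soft"
rm -f "$demo_soft"
```

Once you are happy with the content, you would copy it into ~/.soft by hand and run resoft.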
Don't want to use Softenv?
In the event that you would prefer not to use softenv, place an empty
.nosoft file in your home directory.
More information:
To learn more about softenv type:
ag@BigRed:~>$ man softenv-intro
Big Red's File Storage Options
Home directories on Big Red are stored on a low-performance NFS mounted file system. But there is also a GPFS file system (work bench) which provides excellent performance.
There are a few options as far as where you want to store what kind of data.
Where do I store data?
Datasets on GPFS: A good (blind) rule of thumb is "ALL data (sets) live on GPFS": /N/gpfsbr/your_username/.
TeraGrid users should use the environment variable $TG_CLUSTER_SCRATCH instead.
Big Red GPFS Technical Information
- GPFS is a large file system with 266 TB of space, accessible from all Big Red nodes.
- Is the data backed up? NO! Data is not backed up. (See Things to Remember below for more information.)
- Also, note that data on /N/gpfsbr/ is cleared out periodically. For up-to-date information on how often it is cleared out, see the Big Red Home Directories and Other Disk Space webpage.
Home directory only for programs (source code and executables): Store things like source code, batch scripts (LoadLeveler related or otherwise), executables, and possibly user-guide type documents in your home directory ${HOME}. In other words, do not use your home directory to read in or write out datasets. It is a low-performance NFS file system that is not designed to take that kind of load.
- Is the data backed up? YES.
Note: Always use ~/ or ${HOME} to reference your home directory unless use of absolute path names (i.e. /N/u/username/BigRed) is necessary.
Why? Because your home directory might be moved from /N/u/username/BigRed to a different location, but ~/ and ${HOME} will always point to the accurate and current home directory!
Local scratch: Apart from GPFS and your NFS mounted home directory, you can also use local scratch on each compute node if your program does not require processes to use data across nodes. Local scratch is available as /scratch (which is basically a symbolic link to /tmp) on all compute nodes and is usually smallish in size (currently 67 GB).
- Is the data backed up? NO! Data is not backed up. (See Things to Remember below for more information.)
- Also, note that data on /scratch is cleared out much more frequently than GPFS: anywhere from every 1-2 days to every 15 days. Unless your program is actively using the data, there is a risk it will be cleared out.
For more information on your diskspace options, see the Big Red Home Directories and Other Disk Space webpage.
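The rule of thumb above (datasets on GPFS, programs in the home directory) can be captured in a job script with a small helper. This is a sketch only: bigred_data_dir is a hypothetical name, and it assumes $TG_CLUSTER_SCRATCH is set for TeraGrid users and unset for IU users.

```shell
# bigred_data_dir is a hypothetical helper. It prefers $TG_CLUSTER_SCRATCH
# (TeraGrid users) and otherwise falls back to the GPFS path for IU users.
bigred_data_dir() {
    if [ -n "${TG_CLUSTER_SCRATCH:-}" ]; then
        echo "${TG_CLUSTER_SCRATCH%/}"          # strip one trailing slash
    else
        echo "/N/gpfsbr/${USER:-$(id -un)}"
    fi
}

echo "Datasets go under: $(bigred_data_dir)"
echo "Source code and scripts stay under: ${HOME}"
```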
Things to remember
You need to backup your data: It is also important to remember that GPFS is not backed up and is also cleaned up regularly (60 days or so, but could be much shorter amount of time, beware!). So any data that you deem as important (or as required in the long term) should be backed up onto the HPSS mass store system.
Once again, for up-to-date information on your diskspace options, see the Big Red Usage Policies webpage.
More information on getting a Mass Store account and using HPSS from Big Red is available at the Using HPSS from IU Research Systems webpage.
Beware /tmp users: If you write parallel code, then remember you cannot use /tmp to store data that'll be used across nodes. /tmp is usually mounted off a local disk, so each compute node has its own /tmp (just like /scratch).
Compiling and Running Your Own Programs
In this section, we'll show how you can compile your own programs and then run them. But before that, a brief introduction to the two compiler families available on Big Red: IBM and GCC.
GCC (Gnu Compilers):
- Presently at version 3.3.3.
- Works the same as gcc everywhere else.
- 64-bit compilations are possible with the "-mpowerpc64" compiler switch.
- Some other handy compiler options for Big Red:
- -mcpu=970: Identifies our processor type.
- -mabi=altivec: Needed to enable the next switch (-maltivec).
- -maltivec: For C codes using 4-byte real numbers, will attempt to vector stream the computations.
- -mfused-madd: Turns on the IBM specific "multiply-add" instruction during optimizations (where possible).
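As an illustration, the switches listed above could be combined on one gcc command line. This sketch only prints the command; gcc_cmd is a hypothetical helper and myprog.c / myprog are placeholder names.

```shell
# Flags taken from the list above; the source and output names are placeholders.
GCC_FLAGS="-mpowerpc64 -mcpu=970 -mabi=altivec -maltivec -mfused-madd"

# gcc_cmd is a hypothetical helper that only prints the command line,
# so it can be inspected before being run on Big Red.
gcc_cmd() {
    echo "gcc $GCC_FLAGS -o $2 $1"
}

gcc_cmd myprog.c myprog
```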
IBM Compilers
- Generally compile faster executables than gcc for the same level of optimization.
- A little stricter to the Standards than gcc.
- One basic compiler in two packages, different front ends for each language (C, C++, Fortran variants). So, many optimization switches are the same across all supported languages.
C compiler
The compiler is called "xlc". There is also a cc compiler. On AIX systems, cc is embedded in the OS and is K&R C, so unexpected things can happen if it is used. On Big Red, cc has been soft-linked to gcc (see above). The preferred C compiler is xlc.
I categorize three different switch functions for the compiler: machine switches, user switches and optimization switches.
Machine switches: These switches are always the same for the platform. For Big Red: "-qarch=ppc970 -qtune=ppc970 -qenablevmx -qaltivec" are a good choice. If the altivec processors are unusable, there is no penalty for setting the switch.
User switches: These switches vary with the job. For instance, if addressing is expected above 2 GB, then a 64 bit compile will be needed with the "-q64" switch. If the C preprocessor is all that is desired, then the "-E" switch should be used. Please note that the general rule of thumb with compiler switches is that switches prefixed with "-q" are sent to the compiler, while those prefixed with "-b" are sent to the loader. Some switches (like "-E" and "-o") are exceptions due to historical precedence.
Optimization switches: These are the switches most people think about when they think about compiler switches. Optimizations can be both positive (faster executables) or negative (slower executables). Some switches, like -Q are almost always positive ("-Q" attempts to place functions in-line with the main body and remove the overhead of the function call). Others, like "-g" are almost always negative ("-g" creates debug symbol tables). Various levels of "-O" are offered. Without a number following, "-O" defaults to "-O2". Generally, the higher the number, the more risks the compiler is willing to take. At "-O0" absolutely no optimization occurs. At "-O5" everything from loop fusion to cross-source code (for multiple source code files) optimizations are considered. Often, more is not better. Testing and experimenting is about the only way to properly tune an executable.
Man Page: On Big Red, even if you have MANPATH problems, the following command will find the xlc man page for other options:
man -M /opt/ibmcmp/vacpp/8.0/man/en_US/ xlc
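The three switch categories described above can be kept apart in a build script, which makes it easy to experiment with one group at a time. This is a sketch under assumptions: xlc_cmd is a hypothetical helper, the choice of "-O3" is only an example starting point, and the file names are placeholders.

```shell
MACHINE="-qarch=ppc970 -qtune=ppc970 -qenablevmx -qaltivec"  # same for every build on Big Red
USER_SW="-q64"                                               # job-specific: 64-bit addressing
OPT="-O3"                                                    # example level; tune by experiment

# xlc_cmd is a hypothetical helper that prints the assembled command line.
xlc_cmd() {
    echo "xlc $MACHINE $USER_SW $OPT -o $2 $1"
}

xlc_cmd myprog.c myprog
```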
C++ compiler
The IBM compiler for C++ codes is called xlC. This compiler used to have real trouble with many C++ source codes. It has improved (as has C++ standardization) quite a bit in the last several years. Almost all of the switches that are available to xlc are also available with xlC. In addition, many switches have been added to help with gcc compatibility. One example is "-qlanglvl=gnu_complex". This switch "instructs the compiler to recognize GNU complex data types and related keywords".
The man page path is the same as for xlc.
Fortran compiler
The compiler recently dropped FORTRAN IV ('66) support, and there is no excuse for FORTRAN 77 any more either. Use "xlf90 -qfixed" in its place. The few pieces of dropped syntax include "computed goto's".
Support for Fortran 2003 is nearly complete.
Machine switches:
For Big Red: "-qarch=ppc970 -qtune=ppc970" are the same as for xlc/xlC. However, the altivec switches are not directly available. Altivec-enabled C routines (such as FFTW) are linked to the Fortran programs.
User switches:
For Fortran, a 64-bit compile is highly recommended in most cases. Fortran assumes memory management responsibilities, so why not use them? Once again 64-bit compilations are designated with the "-q64" switch. However, additional "safety" switches, such as "-C " for testing array parameters are also available.
Optimization switches:
The xlf90 optimization switches are nearly identical to the xlc optimizations. Once again, switches like "-Q" are almost always positive and switches like "-g" are almost always negative. The same levels of "-O" are offered but with slightly different effects. One throttle on risky optimizations is the "-qstrict" option. This is only available at "-O3" or higher. Generally, peace of mind comes at a significant performance cost. The warning messages are very prolific. Again, testing and experimenting is about the only way to properly tune an executable.
Man Page: On Big Red, even if you have MANPATH problems, the following command will find the xlf90 man page for other options:
man -M /opt/ibmcmp/xlf/10.1/man/en_US/ xlf90
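The rule that "-qstrict" applies only at "-O3" and above can be encoded in a build script. The sketch below is illustrative: xlf_opt is a hypothetical helper, and myprog.f90 is a placeholder file name.

```shell
# xlf_opt is a hypothetical helper: it prints optimization switches for a
# given -O level and adds -qstrict only at -O3 or above, as described above.
xlf_opt() {
    level=$1
    want_strict=$2
    flags="-O$level"
    if [ "$want_strict" = "yes" ] && [ "$level" -ge 3 ]; then
        flags="$flags -qstrict"
    fi
    echo "$flags"
}

# Print a 64-bit compile line with the machine switches from this section.
echo "xlf90 -q64 -qarch=ppc970 -qtune=ppc970 $(xlf_opt 3 yes) -o myprog myprog.f90"
```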
Threads and System Math Libraries:
On all compilers, adding a "_r" suffix enables threads. An example: xlC_r. Often, threads that run "behind the scenes" need enabling.
An example is compiling and linking in IBM's Engineering and Scientific Subroutine Libraries (ESSL). If using actual threads (like OpenMP or POSIX), then the compile switch is "-qesslsmp". Even serial code requires the "_r" suffix as in "xlf90_r -q64 -qessl..." or "xlc_r -qessl...". For xlC_r, must add "-lessl -qnocinc=/usr/include/essl" as well to redefine the include files.
Parallel Applications -- Message Passing Libraries
Introduction
- Big Red is a parallel machine! It is not structured for serial codes!
- Serial codes waste 75% of the processor cores and the interconnect switch (which is about 1/3 the cost of the machine by itself).
- Note to IU Users: Serial codes should run on Libra! It is almost the same processor set as Big Red.
How to use them?
- Message Passing Packages are selected in the Softenv environment.
- Note to IU Users: TeraGrid users get a default MPICH library as of the date of writing this document; no MPI libraries exist in the default environment for IU users.
- You must link the chosen library consistent with the address precision (32 or 64) chosen for the compile.
- Once an MPI library is added, compiles are made through a wrapper to the IBM/Gnu compiler that built the library.
- For instance, mpif90 is actually just a wrapper to xlf90_r. The same switches used by xlf90 are available to mpif90.
MPICH:
- Argonne Original.
- MPICH 1 is available; MPICH 2 could be.
- Uses mpirun.
- Limited runtime environment.
OpenMPI:
- Replaced LAM.
- No more lamboot.
- Has look and feel of MPICH to the user.
- Improving with each new release.
Parallel Job for kicks!
NOTE: Feel free to skip this sub-section and proceed to the next section. Refer to the following link for detailed information on how to compile, link, and run parallel jobs on Big Red.
But, if you're very curious about how you could use multiple
processors/cores (nodes) at the same time, then
try compiling the simple (example) parallel helloWorlds and try submitting a job to run it
as shown below:
ag@BigRed:~/> cp -r ~hpc/examples ${HOME}/examples
ag@BigRed:~/> cd ${HOME}/examples/
ag@BigRed:~/examples> soft add +mpich-mx-ibm-64
ag@BigRed:~/examples> mpicc -q64 -o helloWorlds helloWorlds.c
ag@BigRed:~/examples> llsubmit submit_parallel.sh
llsubmit: Processed command file through Submit Filter:. . .
llsubmit: The job "s10c2b5.dim.2571" has been submitted.
ag@BigRed:~/examples> cat submit_parallel.sh.2571.out
Hello, parallel worlds! This is processor s9c4b9.dim and my rank is 1!
Hello, parallel worlds! This is processor s9c4b9.dim and my rank is 2!
Hello, parallel worlds! This is processor s9c4b9.dim and my rank is 0!
Hello, parallel worlds! This is processor s9c4b8.dim and my rank is 6!
Hello, parallel worlds! This is processor s9c4b8.dim and my rank is 5!
Hello, parallel worlds! This is processor s9c4b8.dim and my rank is 4!
Hello, parallel worlds! This is processor s9c4b8.dim and my rank is 7!
Hello, parallel worlds! This is processor s9c4b9.dim and my rank is 3!
What you basically did in the above steps was:
- Added the MPICH parallel library using softenv (see the section on use of Softenv above)
- Compiled and linked your parallel program helloWorlds.c using the mpicc compiler
- Finally, submitted a parallel job to LoadLeveler, using a script that asked for 2 nodes and 4 tasks on each of those nodes
For more information about parallel jobs on Big Red, please refer to the Submitting Parallel (MPI) Jobs on Big Red webpage.
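To confirm that the tasks really were spread across nodes as requested, you could count the ranks reported per node in the job's output file. This is a minimal sketch: ranks_per_node is a hypothetical helper, and it assumes each greeting is on its own line in the format shown in the helloWorlds output.

```shell
# ranks_per_node is a hypothetical helper: it reads helloWorlds output on
# stdin and prints "<node> <number of ranks>" for each node, sorted by name.
ranks_per_node() {
    awk '/This is processor/ { count[$7]++ }
         END { for (n in count) print n, count[n] }' | sort
}

# Two sample lines in the format shown in the example output.
printf '%s\n' \
    'Hello, parallel worlds! This is processor s9c4b9.dim and my rank is 1!' \
    'Hello, parallel worlds! This is processor s9c4b8.dim and my rank is 6!' |
    ranks_per_node
```

On Big Red, the output file itself would be piped in instead of the sample lines, for example: ranks_per_node < submit_parallel.sh.2571.out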
Submit Simple Jobs on Big Red (NAMD Example)
In this section, we'll show how you can run a simple job on Big Red that runs NAMD.
Writing a LoadLeveler script
The job script is divided into a keyword stanza and an execution section. If any lines exist in the script other than keywords (including just #!/bin/bash), the script is executed. Otherwise, it is sourced.
Best practice is to write your job in its own script file and tell LL (LoadLeveler) to execute that.
All keywords are prefaced with "#@ ".
There are two basic types of keywords, those you might expect and those you might not:
- Expected keywords include: output, error, executable, notification etc.
- Unexpected keywords include: node_usage, node, job_type, checkpoint, queue
Typical script explained (Template)
# Pretty typical file designator. Will take absolute paths or try to write
# to your initial directory; $(Cluster) is unique job id
#@ output = test_namd.$(Cluster).out
#@ error = test_namd.$(Cluster).err
# When do you want the job to send an email? Must put address in your
# .forward file.
#@ notification = complete
# How long to run? The default is hh:mm:ss or you could specify in seconds.
#@ wall_clock_limit = 1:00:00
# Lots of choices. "COPY_ALL" sources your current environment for the
# job's environment.
#@ environment = COPY_ALL
# What queue to run the job in?
#@ class = MED
# Where to start running the job.
#@ initialdir = /N/gpfsbr/namd_example
#
## Teragrid Account number goes in the line below. "NONE" is the account
# number for IU users.
#@ account_no = NONE
# The name of the executable to be run.
#@ executable = My_execution_script.ksh
# Whatever else the executable wants put on its command line.
#@ arguments = fee fi foe fum
# Job control keywords.
#@ node = 4
#@ tasks_per_node = 2
#@ job_type = parallel
#@ node_usage = not_shared
# What to do if the job halts. Checkpoint is not totally supported yet.
#@ checkpoint = no
#@ restart = no
# The last keyword in the list. Sort of a "stop" key. Anything below this
# keyword will be executed.
#@ queue
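Before submitting, it can help to check that a script still contains the keywords you rely on. The sketch below is not a LoadLeveler tool: ll_check is a hypothetical helper that looks only for the "#@ queue" line and a wall_clock_limit keyword, both taken from the template above, and makes no claim about which keywords LoadLeveler itself requires.

```shell
# ll_check is a hypothetical helper, not a LoadLeveler command. It reports
# whether a script contains "#@ queue" and a wall_clock_limit keyword.
ll_check() {
    status=0
    grep -Eq '^#@[[:space:]]*queue[[:space:]]*$' "$1" ||
        { echo "missing: #@ queue"; status=1; }
    grep -Eq '^#@[[:space:]]*wall_clock_limit[[:space:]]*=' "$1" ||
        { echo "missing: wall_clock_limit"; status=1; }
    return $status
}

# Demonstration on a scratch file containing two template lines.
demo="${TMPDIR:-/tmp}/ll-demo.$$"
printf '%s\n' '#@ wall_clock_limit = 1:00:00' '#@ queue' > "$demo"
ll_check "$demo" && echo "looks complete"
rm -f "$demo"
```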
Example NAMD Job
ag@BigRed:~/examples> cat execute_script.ll
#@ output = test_namd.$(Cluster).out
#@ error = test_namd.$(Cluster).err
#@ notification = complete
#@ wall_clock_limit = 1:00:00
#@ environment = COPY_ALL
#
## FAST is small debug queue, MED is 2 weeks, BIG is large jobs for 48 hour max.
## Type llclass on command line for up to date information
#@ class = MED
#
## Please change this line to your work directory.
#@ initialdir = /N/gpfsbr/namd_example
#
#@ executable = mpich_namd.bash
#
# Teragrid Account number goes in the line below. I used none for test.
#@ account_no = NONE
#
#@ node_usage = not_shared
#@ node = 4
#@ tasks_per_node = 2
#@ job_type = parallel
#@ checkpoint = no
#@ queue
The mpich_namd.bash script that the above LL script uses:
ag@BigRed:~/examples> cat mpich_namd.bash
#------------------------------------------------------------------
## Once again, this should point to your work directory if you use it.
cd /N/gpfsbr/namd_example
#
export NAMD2=`which namd2`
#
## Get machine list (list of the nodes where your job will run in Big Red and
## then write the list to /tmp/machinelist.$LOADL_STEP_ID so it can be passed
## into mpirun.
llmachinelist
#
## Make sure number of tasks is <= to (node * tasks_per_node)
mpirun -np 8 -machinefile /tmp/machinelist.$LOADL_STEP_ID $NAMD2 apoa1.namd
## Clean up temporary machine list file
rm /tmp/machinelist.$LOADL_STEP_ID
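The comment about keeping the number of tasks at or below node * tasks_per_node can be checked against the LL script itself. This is a sketch: ll_max_tasks is a hypothetical helper that reads the two keywords and prints their product, which is the largest value that should be passed to mpirun -np.

```shell
# ll_max_tasks is a hypothetical helper: it reads the "#@ node" and
# "#@ tasks_per_node" keywords from a LoadLeveler script and prints their product.
ll_max_tasks() {
    awk -F= '
        /^#@[ \t]*node[ \t]*=/           { node = $2 + 0 }
        /^#@[ \t]*tasks_per_node[ \t]*=/ { tpn  = $2 + 0 }
        END { print node * tpn }' "$1"
}

# Demonstration on a scratch file with the values from the example LL script.
demo="${TMPDIR:-/tmp}/ll-np-demo.$$"
printf '%s\n' '#@ node_usage = not_shared' '#@ node = 4' \
    '#@ tasks_per_node = 2' '#@ queue' > "$demo"
echo "mpirun -np should be at most $(ll_max_tasks "$demo")"
rm -f "$demo"
```

With node = 4 and tasks_per_node = 2 the product is 8, which matches the -np 8 used in mpich_namd.bash.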
Submit LoadLeveler batch job
ag@BigRed:~/examples> llsubmit execute_script.ll
llsubmit: Processed command file through Submit Filter:. . .
llsubmit: The job "s10c2b5.dim.888" has been submitted.
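If you want to use the job name in later commands, it can be pulled out of the llsubmit message. A minimal sketch, assuming the message format shown above; ll_job_id is a hypothetical helper, not a LoadLeveler command.

```shell
# ll_job_id is a hypothetical helper: it reads llsubmit output on stdin and
# prints the quoted job name from the "has been submitted" line.
ll_job_id() {
    sed -n 's/.*The job "\([^"]*\)" has been submitted.*/\1/p'
}

echo 'llsubmit: The job "s10c2b5.dim.888" has been submitted.' | ll_job_id
# On Big Red this could be used as:  jobid=$(llsubmit execute_script.ll | ll_job_id)
```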
Check Job Status
Use the llq command to check job status (based on your username):
ag@BigRed:~/examples> llq | grep ${USER}
s10c2b5.888.0 hpc 9/5 15:04 I 50 MED
ag@BigRed:~/examples> llq -u ${USER}
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
s10c2b5.888.0            hpc        9/5 15:04   R  50  MED          s9c4b9

1 job step(s) in query, 0 waiting, 0 pending, 1 running, 0 held, 0 preempted
Cancel a queued/running job
If you want to cancel a job that is in the queue or is running -- for example, because you found a bug in your code or an error in the datafile -- then, use the llcancel command.
ag@BigRed:~/examples> llcancel s10c2b5.888.0
llcancel: Cancel command has been sent to the central manager
Now, try looking at the status of the same job -- do you notice the difference?
ag@BigRed:~/examples> llq -u hpc
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
s10c2b5.888.0            hpc        9/5 15:04   CA 50  MED

0 job step(s) in query, 0 waiting, 0 pending, 0 running, 0 held, 0 preempted
What does the above output from llq mean? Status Codes
If you look at the above output of llq carefully, you will notice one of the following status codes being used to describe your job's status:
| Status Code | What does it mean? |
|---|---|
| I | Queued, waiting for free nodes |
| R | Running |
| C | Completed |
| CA | Cancelled by user |
| H | Put on hold by user or due to invalid requirements |
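The table can be turned into a small lookup for use in your own monitoring scripts. This is a sketch: ll_status_meaning is a hypothetical helper that covers only the codes listed in the table, and the field position is taken from the llq lines shown earlier.

```shell
# ll_status_meaning is a hypothetical helper covering only the codes in the
# table above; any other code is reported as not listed.
ll_status_meaning() {
    case "$1" in
        I)  echo "Queued, waiting for free nodes" ;;
        R)  echo "Running" ;;
        C)  echo "Completed" ;;
        CA) echo "Cancelled by user" ;;
        H)  echo "Put on hold by user or due to invalid requirements" ;;
        *)  echo "Not listed in this tutorial" ;;
    esac
}

# In the llq lines shown earlier, the status is the fifth whitespace-separated field.
line='s10c2b5.888.0 hpc 9/5 15:04 R 50 MED s9c4b9'
set -- $line
ll_status_meaning "$5"
```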
Job still in queue? When will my job start?
If your job isn't running, use the job number with showstart to find out when it is scheduled. For example:
ag@BigRed:~/examples> showstart s10c2b5.10488.0
job s10c2b5.10488.0 requires 1 proc for 1:00:00
Estimated Rsv based start in 00:00:00 on Fri Jun 1 14:20:04
Estimated Rsv based completion in 10:00:00 on Sat Jun 2 00:20:04
Best Partition: base
Want even more information about job?
A more in-depth command is checkjob. Here is an example:
ag@BigRed:~/examples> checkjob s10c2b5.10692.0
job s10c2b5.10692.0
AName: 0
State: Running
Creds: user:rsheppar group:hpc account:NONE class:MED
WallTime: 00:00:00 of 00:20:00
SubmitTime: Wed May 30 19:00:03
(Time Queued Total: 00:00:41 Eligible: 00:00:41)
StartTime: Wed May 30 19:00:44
Total Requested Tasks: 8
Req[0] TaskCount: 8 Partition: base
Memory >= 0 Disk >= 0 Swap >= 0
Opsys: Linux2 Arch: PPC64 Features: ---
Allocated Nodes:
[s6c2b10.dim:2][s6c2b11.dim:2][s6c2b12.dim:2][s6c2b13.dim:2]
IWD: /N/gpfsbr/namd_example
Executable: /N/gpfsbr/namd_example/mpich_namd.bash
StartCount: 1
Flags: BACKFILL,RESTARTABLE
Attr: BACKFILL
StartPriority: 3804
Reservation 's10c2b5.10692.0' (-00:00:09 -> 00:19:51 Duration: 00:20:00)
Try the NAMD Example
- Set up your environment. Add @namd-ibm-64 to your .soft file, then type the command: resoft
- Make a directory in /N/gpfsbr/${USER} and copy the contents of
/N/gpfsbr/namd_example into it.
ag@BigRed:~/examples> cp -r /N/gpfsbr/namd_example /N/gpfsbr/${USER}/.
- Edit the LL script and mpich_namd.bash script for your specific situation:
- In particular, make sure you change the #@ initialdir line and the #@ account_no line in the LL script.
- Also change the cd /N/gpfsbr/namd_example line in mpich_namd.bash to point to your GPFS directory, as shown above.
- Submit the script and wait for the results!
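The directory edits in the steps above can be done with sed instead of by hand. The sketch below is only an illustration: retarget_workdir is a hypothetical helper, it writes the edited text to stdout so the original file is left untouched, and it assumes the new path contains no "|" or "&" characters. The #@ account_no line still has to be edited separately.

```shell
# retarget_workdir is a hypothetical helper: it replaces every occurrence of
# the example directory with the directory given as $2 in the file named by $1.
retarget_workdir() {
    sed "s|/N/gpfsbr/namd_example|$2|g" "$1"
}

# Demonstration on a scratch file holding one line from the LL script.
demo="${TMPDIR:-/tmp}/namd-demo.$$"
printf '%s\n' '#@ initialdir = /N/gpfsbr/namd_example' > "$demo"
retarget_workdir "$demo" "/N/gpfsbr/${USER:-username}/namd_example"
rm -f "$demo"
```

The same call works on mpich_namd.bash, where it rewrites the cd line.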
Output and Error Files
Assuming your job runs to completion, you can find messages it tried to print on the console in an output file.
For example, if LoadLeveler assigned job id s10c2b5.dim.888 to your job and you used test_namd.$(Cluster).out and test_namd.$(Cluster).err as filenames in your LL script, then the output file would be named test_namd.job-id.out and the error file would be named test_namd.job-id.err.
ag@BigRed:~/examples> ls *.out
test_namd.9999.out
Go ahead and check out the output file; it should look similar to the output file you got using serialjob:
ag@BigRed:~/examples> cat test_namd.9999.out
. . .
Lots of Numbers, and such ... that the NAMD program output
. . .
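If you know the job name, the expected file names can be derived from it. This sketch rests on an assumption: that $(Cluster) expands to the trailing numeric part of the job name, which matches the parallel example earlier (job s10c2b5.dim.2571 wrote submit_parallel.sh.2571.out). ll_output_names is a hypothetical helper.

```shell
# ll_output_names is a hypothetical helper. It assumes $(Cluster) is the
# trailing numeric component of the job name reported by llsubmit.
ll_output_names() {
    cluster=${1##*.}    # strip everything up to the last dot
    echo "test_namd.$cluster.out test_namd.$cluster.err"
}

ll_output_names s10c2b5.dim.888
```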
Running BioInformatics Software (BLAST, MEME, etc.)
There are pre-canned scripts that can be used to run jobs that in turn run BLAST and other BioInformatics software, on Big Red. For more details refer to the Bioinformatics Support webpage.
More Information about LoadLeveler (LL) and Migration from PBS
- For more information on LoadLeveler (LL) commands, or if you're a user trying to migrate from PBS to LL, please refer to the Migrating from LL to PBS and Migrating from PBS to LL documents.
File Transfers To and From Big Red using tgcp
This section requires that you have authenticated using grid credentials. See the File Transfer to/from Big Red using GridFTP (tgcp) page for more details.




