Using HPSS from IU Research Systems

Indiana University has a High Performance Storage System (HPSS). HPSS is a distributed, flexible, performance-oriented mass storage system. It is being developed by a consortium of commercial and government participants. Users are given HPSS accounts so they can economically save and access files that are too large, or used too infrequently, to be kept in personal home directories.

Because HPSS is fundamentally different from a "normal" Unix file system, it must be accessed and used differently. There are no file size limits, but the limited bandwidth that serves HPSS imposes practical constraints on its use.

The following document describes the HPSS system at Indiana University, its access methods and uses, and provides links to other documents with more information.

Accounts and Accounting

HPSS user accounts are available on request to all faculty, graduate students, and staff of Indiana University. Undergraduates may apply for an account with a faculty sponsor. Accounts can be created through the Distributed Account Generation System (DAGS). Because HPSS uses the Distributed Computing Environment (DCE) as its primary name space, users must request a DCE account at the following URL: http://storage.iu.edu/mdss_start.html

Backups

Unless the user specifies otherwise, files in HPSS are copied onto two tapes. Every file you store in HPSS therefore has an automatic backup copy.

Project Directories

A special project directory can be created in HPSS for groups of researchers who wish to share files easily. The files in this directory are readable by all members of the associated repository. Project directories for group file sharing are made available on request.

Passwords and Authentication

Current HPSS users can change their HPSS/DCE password with the dce_login and chpass commands. These are standard parts of DCE, and the commands and their man pages are available on the IU research systems that support DCE.

The procedure for changing your password is shown in the following example. "%" is the Unix prompt, and text in angle brackets, "<>", should be replaced by the appropriate values.


% dce_login -newpass
Enter Principal Name: <user_name>
Enter Password: <DCE_password>
% chpass -p
Changing registry password for user_name
New password: <new_passwd>  
Re-enter new password: <new_passwd>
Enter your previous password for verification: <DCE_password>
% 

Alternatively, you can change your password by connecting to the WebPass URL with your web browser.

DCE allows automatic user authentication, once initial authentication is established. This is highly desirable for batch scripts, since it eliminates prompting for login name and password.

Questions about passwords should go to store-admin@iu.edu.

The IU HPSS System

Indiana University's HPSS system is known simply as "hpss". This name can be used when accessing HPSS from an IU research system. When accessing HPSS from outside IU, or from systems other than IU research systems, you must use the name hpss.indiana.edu, and FTP is the only available access utility unless the pftp or HSI program has been installed on your local system.

User home directories are set up and accessed as follows. For the user name leighg, for example, the storage path to the home directory is /l/e/leighg; for the user name josehe, it is /j/o/josehe. In summary, the first and second characters of your username form the first two directory levels of your home directory path.
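Because the rule is mechanical, a script can derive the storage path from a username. The following is a minimal POSIX sh sketch, not an IU-provided tool; the function name hpss_storage_path is hypothetical, and it assumes usernames are at least two characters long:

```shell
#!/bin/sh
# Hypothetical helper: derive an HPSS storage path from a username,
# e.g. leighg -> /l/e/leighg, following the rule described above.
hpss_storage_path() {
    user=$1
    # Assumes the username has at least two characters.
    first=$(printf '%s' "$user" | cut -c1)
    second=$(printf '%s' "$user" | cut -c2)
    printf '/%s/%s/%s\n' "$first" "$second" "$user"
}
```

For example, hpss_storage_path josehe prints /j/o/josehe.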

The storage path together with the root path identifies an HPSS user's full home directory path. The root path selects which of the several types of storage offered by IU's HPSS system is used.

The possible root paths to user directories are:

/.../dce1.indiana.edu/fs/mirror/
This defines hpss directories which are mirrored filesets in DFS.
/.../dce1.indiana.edu/fs/archive/
This defines hpss directories which are archived filesets in DFS.
/.../dce1.indiana.edu/fs/hpssonly/
This defines hpss directories which are traditional HPSS filesets.
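Building the full path can also be scripted. This sketch assumes the full home directory path is simply the root path followed by the storage path (for user josehe with archived storage, /.../dce1.indiana.edu/fs/archive/j/o/josehe); the function name hpss_full_home_path and that concatenation rule are assumptions, not documented IU behavior:

```shell
#!/bin/sh
# Hypothetical helper: build a full HPSS home directory path from a
# storage type (mirror, archive, or hpssonly) and a username.
# Assumption: full path = root path + storage path (/x/y/username).
hpss_full_home_path() {
    kind=$1
    user=$2
    case $kind in
        mirror|archive|hpssonly) ;;
        *) echo "unknown storage type: $kind" >&2; return 1 ;;
    esac
    first=$(printf '%s' "$user" | cut -c1)
    second=$(printf '%s' "$user" | cut -c2)
    # Root path (written without its trailing slash), then /x/y/username.
    printf '%s/%s/%s/%s\n' "/.../dce1.indiana.edu/fs/$kind" \
        "$first" "$second" "$user"
}
```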

For more information on filesets as they are defined in HPSS, see the HPSS web page.

Accessing HPSS

The Application Programming Interfaces (APIs) currently configured are:

File Transfer Protocol (ftp) client.
Any generic ftp client can connect to the HPSS store to store and retrieve files.
Parallel File Transfer Protocol (pftp) client.
This client is distributed with the HPSS system and must be compiled on the client machine. The Parallel FTP client spawns a process for each parallel connection, receives the parallel data streams, and issues reads and writes to local storage. Compiled versions for many systems can be downloaded from the DSSG web site.
HPSS Interface Program (HSI) client.
HSI provides a shell-like environment for working with HPSS data and information about it. A complete guide is provided. The HSI program is installed on the following research systems:

  • AVIDD - +hsi Softenv keyword
  • BigRed - +hsi Softenv keyword

Distributed Computing Environment's Distributed File System (DCE/DFS) client.
DCE/DFS is a distributed application that manages information in the form of a file system. The systems supported are RS6000/AIX 4.3, HP/HPUX 10.2, SGI/IRIX 6.5.2 and above, Solaris 2.6 and Windows NT. The DCE/DFS client is installed on the Research SP, the SGI Origin system and the Steel cluster.

A Word about Permissions

ftp and pftp change file permissions when retrieving files from HPSS: files are always retrieved with UNIX permissions of 600 (owner read and write only). Even if a file was stored as executable, it will not be retrieved with the executable bit set, and so will not be executable until its permissions are changed. To add owner execute permission to the file "prog", for example, one could use "chmod 700 prog", or its symbolic variant, "chmod u+x prog". See the man page for chmod (i.e., "man chmod") on any Unix system for more information on this command.
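When several retrieved programs need their execute bit back, a small shell function can apply the symbolic chmod form to each of them. This is a minimal sketch; the function name restore_owner_exec is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: restore owner execute permission on files that
# ftp/pftp retrieved with mode 600.
restore_owner_exec() {
    for f in "$@"; do
        # For a file that arrived as 600, this yields 700 (-rwx------).
        chmod u+x "$f" || return 1
    done
}

# Usage, after retrieving "prog" with ftp or pftp:
#   restore_owner_exec prog
```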

Accessing HPSS from a Batch Job

Batch access to HPSS is currently available via ftp/pftp, and HSI. HSI and pftp offer automatic DCE authentication, but ftp requires explicit login; it is NOT a good idea to put passwords in batch scripts, so the use of ftp in batch jobs is strongly discouraged. pftp offers superior performance in handling multiple file transfers and large files, and HSI offers more user convenience features, so either is a better choice than ordinary ftp.

HSI should prove more convenient and easier to use in batch scripts than ftp/pftp, because of its flexibility and more powerful commands. For more information on using HSI, see the HSI reference.

To allow HSI to run without prompting for a password, a startup file named .hsirc must be in the user's home directory. At a minimum, the file should contain the following:


principal = "username"
pwfile = "home_directory_path_for_user"/.private/.pwfile

The pwfile, named .pwfile in this example, contains your DCE password in plain text, so its permissions should be set so that only the owner can read it (for example, with "chmod 600").
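The pwfile and .hsirc can be created with owner-only permissions from the start rather than fixed afterward. This is a minimal sketch under stated assumptions: the function name setup_hsi_files is hypothetical, the unquoted pwfile value and storing the password without a trailing newline are guesses to check against the HSI reference, and the password is read from standard input so that it never appears on a command line:

```shell
#!/bin/sh
# Hypothetical helper: write <home>/.hsirc and <home>/.private/.pwfile.
# Usage: setup_hsi_files <principal> <home_dir>   (password on stdin)
# Assumption: the .hsirc layout mirrors the example above; verify the
# exact quoting HSI expects in the HSI reference.
setup_hsi_files() {
    principal=$1
    home=$2
    # Read the DCE password from stdin (tolerates a missing final newline).
    IFS= read -r password || [ -n "$password" ] || return 1
    (
        umask 077   # newly created files and directories are owner-only
        mkdir -p "$home/.private" || exit 1
        chmod 700 "$home/.private"
        printf '%s' "$password" > "$home/.private/.pwfile" || exit 1
        chmod 600 "$home/.private/.pwfile"
        printf 'principal = "%s"\npwfile = %s/.private/.pwfile\n' \
            "$principal" "$home" > "$home/.hsirc" || exit 1
    )
}
```

For example, setup_hsi_files leighg "$HOME" waits for the password on standard input; the typed password is echoed unless terminal echo is turned off first (e.g., with "stty -echo").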

Once the startup file is configured correctly a noninteractive script such as the following can be employed:


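#!/bin/sh
# Date stamp in YYYYMMDD form
atime=$(date +%Y%m%d)
# Archive ./myhome, compress it, and pipe it to HSI, which stores it in HPSS
tar cf - ./myhome | gzip | /usr/local/bin/hsi save -: backup/myhome-$atime.tar.gz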

If ftp is to be used in other kinds of shell scripts, for instance to access HPSS from machines outside IU, user authentication can be achieved with a ".netrc" file. This file is placed in your home directory on the accessing computer, and most ftp clients can use it.

We recommend that you invoke both ftp and pftp with the "-v" (verbose) option, to force the display of all responses from the HPSS servers, and to provide data transfer statistics. This information can be useful in tracking any problems that might occur during your ftp/pftp session. Another useful option is "-i" (noprompt), which turns off prompting during multiple file transfers; this can also be achieved through the use of the "prompt" toggle command, within an ftp/pftp session. See the ftp/pftp man pages or the ftp/pftp "help" command output, for more information on these commands.

Both ftp and pftp are interactive utilities. To use either one from within a batch script, you must supply its commands non-interactively. This can be done by redirecting a text file containing ftp/pftp commands into the utility's standard input (stdin). This normally means producing a separate input file, but it can be done more easily with a technique known as a "here document": the command lines for ftp/pftp are enclosed in special quoting brackets and placed immediately after the ftp or pftp command in the script file. This technique is illustrated in the section About Here Documents, below.
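The separate-input-file approach can be sketched as follows: a helper writes the pftp commands to a file, which is then redirected into pftp's standard input. The function name write_fetch_commands and the file name fetch.cmds are hypothetical, and the pftp invocation is shown only as a comment because it requires HPSS access:

```shell
#!/bin/sh
# Hypothetical helper: print ftp/pftp commands that change to an HPSS
# directory, fetch every file matching a pattern, and quit.
write_fetch_commands() {
    dir=$1
    pattern=$2
    printf 'cd %s\n' "$dir"
    # Quoted, so the shell does not expand the pattern; pftp's mget does.
    printf 'mget %s\n' "$pattern"
    printf 'quit\n'
}

# Usage in a batch script (requires pftp and HPSS access):
#   write_fetch_commands HPSS_directory 'data*' > fetch.cmds
#   pftp -v -i hpss < fetch.cmds
```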

The .netrc File

If you want to automate the login process for ftp commands you need a .netrc file in your home directory. NOTE: This is not needed for pftp. The format of each entry in the .netrc file is as follows:

machine <login_hostname>
login <user_name>
password <password>

A single .netrc file can contain information for multiple remote systems.
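Entries can be appended by a script without putting the password on a command line. This is a minimal sketch; the function name add_netrc_entry is hypothetical, it reads the password from standard input, and it forces the file to mode 600 so that ftp will honor it:

```shell
#!/bin/sh
# Hypothetical helper: append one machine/login/password entry to a
# .netrc file and keep the file private (mode 600).
# Usage: add_netrc_entry <netrc_file> <login_hostname> <user_name>
#        (password on stdin)
add_netrc_entry() {
    netrc=$1
    host=$2
    user=$3
    IFS= read -r password || [ -n "$password" ] || return 1
    (
        umask 077   # a newly created file gets owner-only permissions
        printf 'machine %s\nlogin %s\npassword %s\n' \
            "$host" "$user" "$password" >> "$netrc"
    ) || return 1
    chmod 600 "$netrc"   # also tightens an existing, looser file
}
```

For access from outside IU, for example, add_netrc_entry "$HOME/.netrc" hpss.indiana.edu <user_name> waits for the password on standard input.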

Please be security conscious when setting up your storage and IU research system access passwords. For security reasons, your .netrc file must not have any "group" or "world" permission bits set; ftp and pftp will not honor the contents of a .netrc file which allows group or world access. When you execute the "ls -l" command, the .netrc entry should appear as:

 -rw------- 1 usrid   394 Jan 15 18:35 .netrc

The permissions here are specified by the leftmost field, which shows read and write access is allowed only for the owner of this .netrc file. If you see different permissions on your .netrc file, you should immediately type

chmod 600 .netrc

and then repeat the "ls -l" command, to make sure the permissions were successfully changed. For information on changing file permissions, see the man pages for the "chmod" command.
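A batch script can check this condition before relying on the file. This minimal sketch tests the same permission characters that "ls -l" displays; the function name netrc_is_private is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a file has no group or world
# permission bits set, the condition ftp and pftp require of .netrc.
netrc_is_private() {
    [ -f "$1" ] || return 1
    # Characters 5-10 of "ls -l" output are the group and world bits.
    mode=$(ls -l "$1" | cut -c5-10)
    [ "$mode" = "------" ]
}

# Usage:
#   netrc_is_private "$HOME/.netrc" || chmod 600 "$HOME/.netrc"
```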

About Here Documents

A "here document" or "here-doc" is a Unix idiom that allows utilities which normally get their commands interactively, to take them instead from the immediately following lines of a file. (Technically, this is a redirection of the utility's standard input channel, connecting it to the current file.) This is very convenient for using interactive utilities in shell scripts (e.g., batch scripts).

The presence of a here-doc is signaled by the redirection operator "<<", followed by an opening bracket string that ends the line invoking the utility and its options. The lines that constitute the "here document" immediately follow this line and are closed by the closing bracket string. The bracket strings most often used are single non-alphanumeric characters, such as "*" or "+", but informative alphabetic strings can also be used, as the example below shows. The positioning of the brackets is critical: the opening bracket must be the last thing on the invocation line, just before the first command line, and the closing bracket must stand alone, starting in the first column, on the line following the last command.

Consider the following simple pftp session which, if copied into a file and executed as a shell script, performs the actions explained below:


pftp -v -i hpss <<HEREDOC
cd HPSS_directory
mget data*
quit
HEREDOC

This example executes the pftp commands between the two "HEREDOC" strings. As stated above, the bracket strings could each be a single character, such as "+", but the opening and closing strings must be identical. The opening bracket must immediately follow the redirection characters, "<<", to open the "here document"; the closing bracket must appear alone on the line after the last command, starting in the first column, to close the "here document".

The above example first changes to the directory named "HPSS_directory", and then fetches all the files with names beginning with "data" from that directory. The destination directory is the working directory at the time the above script fragment executes.

The pftp command is given two options and a destination system to connect to ("hpss"). The -v option specifies verbose output, which shows all responses from the HPSS server and reports data transfer statistics. The -i option turns off interactive prompting. NOTE: Using both options is recommended in here documents! If prompting is not disabled in a batch file transfer, the mget command will request confirmation for each individual file; since batch sessions allow no interaction, the prompts go unanswered, stalling the file transfer and the rest of the batch job. (In interactive sessions, prompting can be toggled with the "prompt" command.)
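While developing a batch script, it can help to see exactly what a here document will feed to pftp. This minimal sketch substitutes cat for pftp so it runs anywhere; the function name preview_fetch and the remote_dir variable are hypothetical. Because the opening bracket is unquoted, the shell expands $remote_dir inside the here-doc before the lines reach the utility:

```shell
#!/bin/sh
# Hypothetical preview helper: prints the lines a here document would
# send to pftp's standard input. Replace "cat" with "pftp -v -i hpss"
# to perform the real transfer.
preview_fetch() {
    remote_dir=$1
    # Parameters are expanded inside the here-doc, but filename
    # patterns are not, so "data*" reaches the utility unchanged.
    cat <<HEREDOC
cd $remote_dir
mget data*
quit
HEREDOC
}
```

Quoting the opening bracket (for example, <<'HEREDOC') would instead pass $remote_dir to the utility literally, without expansion.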


More information regarding HPSS can be found at the DSSG web site.