
Hpcsim - Frequently Asked Questions


General Questions

Compiling System Questions

Running Jobs



General Questions

Q: How do I get an account on hpcsim?

The hpcsim cluster is reserved for research projects. An application must be supported by the Project Leader, who must be of Assistant Lecturer grade or above. To apply for an account, contact the hpcsim administrators.

Q: How do I connect to hpcsim system?

Only secure (SSH) connections to the hpcsim system are supported. All insecure methods of connection (rlogin, rsh, telnet) are disabled.
From any computer, issue the command

ssh [-l login_name] hpcsim.sci.hkbu.edu.hk

or

ssh login_name@hpcsim.sci.hkbu.edu.hk

For example, ssh user@hpcsim.sci.hkbu.edu.hk will prompt for the user's password and then connect to the hpcsim system. You will be logged on to the master node, in your home directory, which is also accessible from the compute nodes.
If your local computer does not have an SSH client, please install one (such as SSH or OpenSSH) before connecting to hpcsim.
For Microsoft Windows, a free client called PuTTY is available at: http://www.chiark.greenend.org.uk/~sgtatham/putty/

Q: How do I change my password?

Use the command "passwd"

% passwd

Every user should change the password at his/her first login to the system, and at regular intervals afterwards.
A good password mixes lower- and upper-case letters, numbers, and punctuation marks, and should be at least 6 characters long. One approach is to put punctuation marks or numbers in the middle of a word, e.g., Ha&pp6Y.

Q: How do I transfer files between hpcsim and another computer system?

You must use a utility that uses the SSH protocol, such as Secure Copy (scp) or the SSH File Transfer Protocol (sftp).
"scp" is very flexible for moving data from one computer system to another. For example, to copy files from the system where you issue the command to a remote system, use the command

% scp filename1 filename2 user@remote_host:/dest/dir/for/file/

You may also copy a directory recursively, using the option "-r", for example,

% scp -r directory user@remote_host:/dest/dir/

Of course, you can also use "scp" to copy files from a remote system to the computer you are logged in to. The command would be

% scp user@remote_host:/dir/remotefile /dest/dir/file

Another powerful tool for file transfer is "sftp". Not only is "sftp" secure, it is also convenient because it can transfer whole directories. On the machine where you are logged in (such as hpcsim), issuing the command

% sftp remote_host

will establish the connection. Then, if you use the sftp command

sftp> get dir_of_files

you will get all the files under the directory dir_of_files recursively. (With the OpenSSH version of sftp, recursive transfers need the -r option: get -r dir_of_files.) The sftp commands are very similar to those of the conventional "ftp". The one thing to pay particular attention to is that "sftp" must connect to a machine that is running the SSH2 service.

Q: How do I get files from the web to hpcsim?

You can use the command "wget":

% wget http://www.abc.com/file1.doc

Q: Where can I find news about hpcsim?

Read the login message of the day (motd); this is where important changes are announced. We also e-mail all users about important modifications to the system (such as software upgrades), and we announce planned system downtimes beforehand so that users can prepare.

Q: Where should a new hpcsim user get started?

New hpcsim users should read through the HPCSIM web page, including this FAQ, to familiarize themselves with the system.

Q: Is the data stored on hpcsim automatically backed up?

All data are backed up daily by the administrators to a backup server. Nevertheless, not all data are equally important. Most of the data on hpcsim are generated by programs and could be regenerated if necessary. Source code, on the other hand, is very precious to its developers and cannot easily be regenerated. Users are strongly encouraged to do everything possible to ensure that such critical material (programs, etc.) is never lost, for example by keeping multiple copies of important files on different computer systems, including their own local systems. Use "scp" or "sftp" to move your data off hpcsim.
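As one way to follow this advice, the bash function below packs a source directory into a dated, compressed tar archive that you can then copy off hpcsim with scp. It is only a sketch: the function name archive_sources and the archive naming scheme are hypothetical, not a tool installed on hpcsim.

```shell
# archive_sources SRC_DIR DEST_DIR
# Hypothetical helper: packs SRC_DIR into DEST_DIR/<name>-YYYYMMDD.tar.gz
# and prints the path of the archive it created.
archive_sources() {
    local src=$1 dest=$2 name archive
    if [ ! -d "$src" ] || [ ! -d "$dest" ]; then
        echo "usage: archive_sources SRC_DIR DEST_DIR (both must exist)" >&2
        return 1
    fi
    name=$(basename "$src")
    archive="$dest/$name-$(date +%Y%m%d).tar.gz"
    # -C moves into the parent directory so the archive stores relative paths
    tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1
    echo "$archive"
}
```

For example (my_pc is a placeholder for your own machine):

% archive=$(archive_sources ~/mycode /tmp)
% scp "$archive" user@my_pc:backups/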

Q: Where can I run X-windows applications from?

Normally, only from master, the hpcsim front-end, over an SSH connection with X11 forwarding enabled (for example, with ssh -X). This includes emacs, vim, etc. Please be aware that these applications use considerable network bandwidth, and having to share that bandwidth is one reason they do not always run smoothly.

Q: How can I remote login to hpcsim through GUI?

You can connect to hpcsim through a GUI using a VNC viewer. If your local computer runs Windows, you can download vncviewer from http://www.uk.research.att.com/vnc/. Then connect to the cluster by following the instructions listed in this file, using "hpcsim.sci.hkbu.edu.hk" as the host name of your research machine.

Q: Can I install software on hpcsim?

Please minimize the installation of downloaded software for personal use. Software that is reasonable to have on the system should be installed in /u1/local; please contact the administrators to request any software you need.

Q: My job did not clean up correctly. What should I do?

Please email the administrators and ask for your job to be cleaned up. Be sure to include your job ID, and do not start any new jobs until the administrators have cleaned up your jobs for you. If you have more than one job running and want only a subset of the compute nodes cleaned up, please specify which nodes.

Q: How do I monitor the cluster?

The front-end node of the hpcsim cluster serves a set of web pages for monitoring its activities and configuration (see the hpcsim web page: http://hpcsim.sci.hkbu.edu.hk/).
These web pages provide a graphical interface to live cluster information collected by Ganglia monitors running on each cluster node. The monitors gather values for metrics such as CPU load, free memory, disk usage, network I/O, and operating system version.

Q: How do I run a shell command on all the compute nodes?

Use the command: cluster-fork [COMMAND]
For example, to check the load on all compute nodes, you would type this:

% cluster-fork uptime

To run "ps" on all the nodes and list your own processes ($USER), you would type this:

% cluster-fork ps -U$USER

Q: How do I run a shell command to check the processes on all the compute nodes?

Use the command: cluster-ps [PATTERN]
For example, to check the processes of user Sam on all the compute nodes, you would type this:

% cluster-ps Sam

This is similar to "cluster-fork ps -USam".

To check which compute nodes are running matlab processes, you would type:

% cluster-ps matlab



Compiling System Questions

Q: What compilers are available on hpcsim?

Hpcsim supports the Intel compilers, the GNU compilers and the Portland Group (PGI) compilers. All of them support Fortran 77, C and C++; however, only the Intel and PGI Fortran compilers support Fortran 90.
Please see the Software section for more details.

Q: How do I use the sequential Fortran and C compilers?

You can run the sequential Fortran and C compilers (e.g., gcc, ifort, pgf77) as you would on any non-clustered system:

% gcc program.c -o program
% ifort program.f90 -o program
% pgf77 program.f -o program

Q: Where are the commands to compile my MPI program (mpicc, mpif90, etc.)?

Both MPICH and LAM/MPI are installed on hpcsim. The paths of the MPI compilers are listed in the Software section.

Q: How do I compile an MPI program to run on hpcsim?

First of all, make sure you are using the correct MPI compiler (mpich-icc, mpich-pgi, LAM/MPI, etc.). For example, if you would like to compile a Fortran 77 MPI program, you can check the path of mpif77 with:

% which mpif77
/u1/local/mpich-icc/bin/mpif77

If you would like to use the PGI instead of the Intel mpif77 compiler, you can change the path with:

% export PATH=/u1/local/mpich-pgi/bin:$PATH
% which mpif77

The path of mpif77 should now be "/u1/local/mpich-pgi/bin/mpif77".
Then you can compile the program with

% mpif77 program.f -o program

Other commands for compiling MPI programs:

% mpif90 program.f90 -o program
% mpicc program.c -o program
% mpic++ program.cpp -o program
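If you switch between MPI builds often, a small bash function can make the PATH change described above and also drop the previously selected build, so that "which mpif77" never finds the wrong one first. This is a sketch: the name use_mpi is hypothetical, and it assumes the MPICH builds all live under /u1/local/mpich-*/bin as in the example above. It does not handle LAM/MPI; take that path from the Software section.

```shell
# use_mpi FLAVOUR   e.g. use_mpi mpich-pgi
# Hypothetical helper: puts /u1/local/FLAVOUR/bin at the front of PATH and
# removes any other /u1/local/mpich-*/bin entry. Uses only bash builtins.
use_mpi() {
    if [ -z "$1" ]; then
        echo "usage: use_mpi FLAVOUR (e.g. mpich-icc, mpich-pgi)" >&2
        return 1
    fi
    local dir="/u1/local/$1/bin" entry new=""
    local IFS=:
    [ -d "$dir" ] || echo "warning: $dir does not exist" >&2
    for entry in $PATH; do
        case $entry in
            /u1/local/mpich-*/bin) ;;            # previously selected MPICH build
            *) new="${new:+$new:}$entry" ;;      # keep everything else, in order
        esac
    done
    export PATH="$dir${new:+:$new}"
    hash -r   # forget cached command locations so mpif77 is looked up again
}
```

For example:

% use_mpi mpich-pgi
% which mpif77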

Running Jobs

Q: What is the batch system?

The hpcsim cluster uses the Portable Batch System (PBS) with the Maui Scheduler for running jobs.

Q: Is there a sample batch script online?

Sample scripts are available on the cluster in the file /usr/local/doc/pbs/batch.sample.

Q: How do I submit a batch job?

You can submit a batch job using: qsub [SCRIPTNAME]
For example, if you have prepared a PBS script "Qgaussian.pbs", you would submit the job with:

% qsub Qgaussian.pbs
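The sample in /usr/local/doc/pbs/batch.sample is the authoritative template. Purely as an illustration, the bash function below prints a minimal MPI job script of the same general kind; the function name make_pbs_script is hypothetical, and the mpirun line assumes MPICH's mpirun, so compare the result with the sample before submitting it.

```shell
# make_pbs_script JOBNAME NODES PPN WALLTIME COMMAND...
# Hypothetical helper: prints a minimal PBS job script to standard output.
make_pbs_script() {
    if [ $# -lt 5 ]; then
        echo "usage: make_pbs_script JOBNAME NODES PPN WALLTIME COMMAND..." >&2
        return 1
    fi
    local name=$1 nodes=$2 ppn=$3 walltime=$4
    shift 4
    # Unescaped $vars are filled in now; \$vars are left for the job to expand.
    cat <<EOF
#!/bin/bash
#PBS -N $name
#PBS -l nodes=$nodes:ppn=$ppn
#PBS -l walltime=$walltime
#PBS -j oe
# Run from the directory where qsub was issued.
cd "\$PBS_O_WORKDIR"
# One MPI process per processor slot listed in the node file.
NP=\$(wc -l < "\$PBS_NODEFILE")
mpirun -np \$NP -machinefile "\$PBS_NODEFILE" $*
EOF
}
```

For example:

% make_pbs_script mytest 2 2 01:00:00 ./program > mytest.pbs
% qsub mytest.pbs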

Q: Why is it that jobs submitted after mine to the batch system have started running while my job is still waiting in the queue?

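The Maui scheduler must reserve nodes for your job before it can run. Jobs that request fewer nodes and/or shorter wall clock limits may start earlier than other jobs, because they can fit into gaps in the schedule without delaying the jobs that already hold reservations (this is known as backfill).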

Q: Is there a debug queue available on hpcsim?

There is no debug queue available on hpcsim. However, jobs that request a small number of nodes for short time periods are more likely to start sooner than jobs that request a large number of nodes for long time periods.

Q: When I look at running jobs, it appears that many nodes are sitting idle. Why?

The Maui scheduler may be reserving nodes for system maintenance, a dedicated run, or a large job. During this time, jobs that will not finish before the scheduled event cannot be started.

Q: Which environment variables are not passed to a batch job, even when using qsub -V?

The following environment variables are not passed into a batch job, even when using the qsub -V option:
  LD_LIBRARY_PATH 
  LM_LICENSE_FILE 
  PATH 
  MANPATH 
These variables are reset by /etc/bashrc when the job starts.
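As a result, a job that depends on a non-default PATH (for example, the PGI MPICH build mentioned in the Compiling section) should set it again inside the job script itself, after the #PBS lines. A minimal sketch follows; the lib directory next to the bin directory is an assumption, and the LM_LICENSE_FILE value must come from the administrators.

```shell
# Re-create settings that /etc/bashrc reset when the job started.
# /u1/local/mpich-pgi/bin is the path used in the Compiling section;
# the matching lib directory is an assumption.
export PATH="/u1/local/mpich-pgi/bin:$PATH"
export LD_LIBRARY_PATH="/u1/local/mpich-pgi/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
# export LM_LICENSE_FILE=<value supplied by the administrators>
```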

Q: What do the following things mean in a batch job output file?

        resources_used.cput=06:22:44
        resources_used.mem=15774344kb
        resources_used.vmem=534948kb
        resources_used.walltime=00:07:46
These are the cumulative resources used by all nodes of the job; cput and walltime are in hh:mm:ss format.
  resources_used.cput is the CPU time used.
  resources_used.mem is the physical memory used.
  resources_used.vmem is the virtual memory used, but this number is not valid and should be ignored.
  resources_used.walltime is the wall clock time of the job and is used to compute the service units charged.
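Because cput is summed over all processors of the job, dividing it by walltime gives a rough figure for how many processors were busy on average. The bash sketch below does this for the example output above (the function name hms_to_seconds is hypothetical): 06:22:44 is 22964 s and 00:07:46 is 466 s, so it prints "average busy CPUs: 49". A value well below the number of processors requested would suggest that some of them sat idle.

```shell
# hms_to_seconds HH:MM:SS  ->  prints the total number of seconds
hms_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10, so fields such as "08" are not read as octal
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Average number of busy processors = total CPU time / wall clock time
cput=$(hms_to_seconds 06:22:44)      # 22964 s
wall=$(hms_to_seconds 00:07:46)      # 466 s
echo "average busy CPUs: $(( cput / wall ))"
```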

Q: I am unable to kill my job. What does "qdel: Server could not connect to MOM ..." mean?

The PBS MOM (machine oriented miniserver) process, pbs_mom, starts the user's job script and ensures that it completes within its allotted time. The above error usually indicates a problem with the pbs_mom on the compute node. Please report this error message to the administrators.

Q: My MPI program apparently aborted, but my batch job is still running. Why?

This is caused by a bug in VMI. You will need to kill your job manually using the command:

% qdel job_id

where job_id is the ID of the batch job.

Q: When a job is submitted using qsub, I get an error saying "qsub: Job exceeds queue resource limits". What is wrong?

The #PBS -l option of your job requests a resource that is unavailable: for example, too large a walltime limit, too many nodes, an invalid ppn value (must be 1 or 2), or a wrong value for resource (normally set to "prod"). See the batch sample script (/usr/local/doc/pbs/batch.sample) for an example of a valid resource request.
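To catch two of these mistakes before submitting, you could check the values you intend to put on the #PBS -l line with a small bash function such as the one below. It is a hypothetical sketch: it only checks that nodes is a positive integer, that ppn is 1 or 2, and that walltime looks like hh:mm:ss. It does not know the maximum node count or walltime, which you should confirm with the administrators.

```shell
# check_pbs_resources NODES PPN WALLTIME
# Hypothetical pre-submission check; returns non-zero and explains why
# when a value is obviously invalid.
check_pbs_resources() {
    local nodes=$1 ppn=$2 walltime=$3
    local int_re='^[1-9][0-9]*$'
    local hms_re='^[0-9]+:[0-5][0-9]:[0-5][0-9]$'
    if ! [[ $nodes =~ $int_re ]]; then
        echo "nodes must be a positive integer (got '$nodes')" >&2
        return 1
    fi
    case $ppn in
        1|2) ;;
        *) echo "ppn must be 1 or 2 (got '$ppn')" >&2; return 1 ;;
    esac
    if ! [[ $walltime =~ $hms_re ]]; then
        echo "walltime must be hh:mm:ss (got '$walltime')" >&2
        return 1
    fi
}
```

For example:

% check_pbs_resources 2 2 01:00:00 && qsub Qgaussian.pbs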

Last update: Oct 19, 2009 by Lilian Chan