
sciblade - Frequently Asked Questions


General Questions

Compiling System Questions

Running Jobs



General Questions

Q: How do I get an account on sciblade?

All colleagues in the Science Faculty who find the cluster useful for their computation and research are welcome to apply for an account on the cluster. To apply for an account, you must either contact the sciblade administrators or fill in your information in the online application form.

Q: How do I connect to sciblade system?

Only secured (ssh) connections to the sciblade system are supported. All insecure methods of connection (rlogin, rsh, telnet) are disabled.
From any computer, issuing the command

ssh [-l login_name] sciblade.sci.hkbu.edu.hk

or

ssh login_name@sciblade.sci.hkbu.edu.hk

For example, ssh user@sciblade.sci.hkbu.edu.hk will prompt the user for his/her password and then connect to the sciblade system. You will log on to the master node and land in your home directory, which is also accessible from the compute nodes.
If your local computer system does not have an SSH client, please install one (e.g. OpenSSH) before connecting to sciblade.
For Microsoft Windows systems, a free client called PuTTY is available here: http://www.chiark.greenend.org.uk/~sgtatham/putty/

Q: How do I change my password?

Sciblade users can change their password by using the command "passwd"

% passwd

Q: How do I transfer files between sciblade and another computer system?

You must use a utility that uses the SSH protocol. Examples are Secure CoPy (scp) and SSH File Transfer Protocol (sftp).
Using "scp", you can be very flexible in moving data from one computer system to another. For example, to copy files from the system where you issue the command to a remote destination system, use the command

% scp filename1 filename2 user@remote_host:/dest/dir/for/file/

You may also copy a directory recursively, using the option "-r", for example,

% scp -r directory user@remote_host:/dest/dir/

Of course, using "scp" you can also copy files from a remote system to the computer you are logged in to. The command would be

% scp user@remote_host:/dir/remotefile /dest/dir/file

Another powerful tool for file transfer is "sftp". Not only is "sftp" secure, it is also convenient because it can transfer directories recursively. On the machine where you are logged in (such as sciblade), just issue the command

% sftp remote_host

and supply your username and password for remote_host to establish the connection. Then, by issuing

sftp> get dir_of_files

you will get all the files under the directory dir_of_files recursively (with recent OpenSSH clients, use "get -r" for recursive transfers). The sftp commands are very similar to those of the conventional "ftp". The only thing you should pay particular attention to is that "sftp" must connect to a machine that is running the ssh2 service.
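
For example, a short transfer session might look like the sketch below; the host name, directory and file names are placeholders. "put" uploads a single file to the remote directory, and "get -r" downloads a directory recursively.

% sftp user@remote_host
sftp> cd /dest/dir
sftp> put localfile
sftp> get -r dir_of_files
sftp> quit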

Q: How do I get files from the web to sciblade?

You can use the command "wget":

% wget http://www.abc.com/file1.doc

Q: Where can I find news about sciblade?

Read the login message of the day (motd); this is one forum where important changes are announced. We also e-mail all users about important modifications to the system (such as software upgrades), and we announce planned system downtimes in advance so that users can prepare.

Q: Where should a new sciblade user get started?

New sciblade users should read through the Technical web page, including this FAQ, to familiarize themselves with the system.

Q: Is the data stored on sciblade automatically backed up?

All data on sciblade are backed up daily by the administrators to a backup server. Nevertheless, different data have different levels of importance. Most of the data on sciblade are generated by programs and could be regenerated if necessary. Source code, on the other hand, is very precious to its developers and cannot easily be regenerated. Users are strongly advised to do everything possible to ensure that such critical material (programs, etc.) is never lost. For example, keep multiple copies of these important files on different computer systems, including your local system, and use "scp" or "sftp" to move your data off sciblade.

Q: Where can I run X-windows applications from?

Normally, only from the master node, the sciblade front-end. This includes applications such as emacs, vim, etc. Please note that these applications consume considerable bandwidth; having to share bandwidth is one of the reasons they do not always run smoothly.
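
For instance, to display such an application on your own screen you would typically enable X11 forwarding when connecting; this is only a sketch and assumes an X server is running on your local machine.

% ssh -X user@sciblade.sci.hkbu.edu.hk
% emacs &

The "-X" option forwards the X11 display, so the emacs window opens on your local screen.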

Q: How can I remote login to sciblade through GUI?

You can connect to sciblade through a GUI using a VNC viewer. If your local computer runs Windows, you can download vncviewer from http://www.uk.research.att.com/vnc/. Then you can connect to the cluster by following the instructions listed in this file.
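
The general workflow is to start a VNC server on sciblade and point the viewer at the resulting display. The sketch below is illustrative only (display number and geometry are examples); follow the instructions above for the exact procedure on sciblade.

% vncserver :1 -geometry 1024x768

Then, on your local machine, run vncviewer and connect to sciblade.sci.hkbu.edu.hk:1.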

Q: Can I install software on sciblade?

Please minimize the installation of downloaded software for personal use. If it is reasonable software to have, it should be installed in /u1/local. Please contact the administrator to request software that you need.

Q: My job did not clean up correctly. What do I do?

Please email the administrator and request that your job be cleaned up. Be sure to include your job ID, and do not start any new jobs until the administrator has cleaned up your jobs for you. Also, if you have more than one job running and wish for the administrator to clean up only a subset of the compute nodes, please specify which nodes you want cleaned.

Q: How do I monitor the cluster?

The "frontend" node of the sciblade cluster serves a set of web pages to monitor its activities and configuration. (See the sciblade webpage: http://sciblade.sci.hkbu.edu.hk/.)
The web pages available from this link provide a graphical interface to live cluster information provided by Ganglia monitors running on each cluster node. The monitors gather values for various metrics such as CPU load, free memory, disk usage, network I/O, operating system version, etc.

Q: How do I run a shell command on all the compute nodes?

Use the command: cluster-fork [COMMAND]
For example, to check the load on all compute nodes, you would type this:

% cluster-fork uptime

To execute "ps" on all the nodes and check the processes owned by $USER, you would type this:

% cluster-fork ps -U$USER

Q: How do I run a shell command to check the processes on all the compute nodes?

Use the command: cluster-fork ps [OPTIONS]
For example, to check processes of user Sam on all the compute nodes, you would type this:

% cluster-fork ps -USam

To check which compute nodes are running matlab processes, you would type:

% cluster-fork "ps -ef | grep matlab"



Compiling System Questions

Q: What compilers are available on sciblade?

sciblade supports the Intel compilers, the GNU compilers and the Portland Group (PGI) compilers. All of the above support Fortran 77, C and C++; however, only the Intel and PGI Fortran compilers support Fortran 90.
Please see the Software section for more details.

Q: How do I use the sequential Fortran and C compilers?

You can simply run the sequential Fortran and C compilers (gcc, f77, f90) as you would on any non-clustered system:
% gcc program.c -o program
% ifort program.f90 -o program
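
The PGI and Intel compilers are invoked in the same way. The command names below are the standard PGI/Intel ones and are given only as a sketch; check the Software section for what is actually installed on sciblade.

% pgf90 program.f90 -o program
% pgcc program.c -o program
% icc program.c -o program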

Q: Where are the commands to compile my MPI program (mpicc, mpif90, etc.)?

Both MPICH and LAM/MPI have been installed on the sciblade cluster. Paths of the MPI compilers are listed in the Software section.

Q: How do I compile a mpi program to run on sciblade?

First of all, make sure the version of the MPI compiler is correct (mpich-icc, mpich-pgi, LAM/MPI, etc.). For example, if you would like to compile a Fortran 77 MPI program, you can check the path of mpif77 by:
% which mpif77
/u1/local/mvapich1/bin/mpif77
If you would like to use the PGI instead of the Intel mpif77 compiler, change your PATH so that the bin directory of the PGI MPI installation (see the Software section for the exact path) comes first:
% export PATH=/path/to/pgi-mpi/bin:$PATH
% which mpif77
The output of "which mpif77" should now point to the PGI installation.
Then you can compile the program by
% mpif77 program.f -o program
Other commands for compiling MPI programs:
% mpif90 program.f90 -o program
% mpicc program.c -o program


Running Jobs

Q: What is the batch system?

The sciblade cluster uses the Portable Batch System (PBS) with the Maui Scheduler for running jobs.

Q: Is there a sample batch script online?

Sample scripts are available on the clusters in the directory /u1/local/share/example/pbs/.
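
A minimal script typically looks something like the sketch below. The job name, node counts, walltime and mpirun invocation are illustrative assumptions only; consult the samples in the directory above for the exact settings used on sciblade.

#!/bin/sh
#PBS -N myjob                              # job name (example)
#PBS -l nodes=2:ppn=2,walltime=00:30:00    # 2 nodes, 2 processors per node, 30 minutes
cd $PBS_O_WORKDIR                          # start in the directory where qsub was issued
mpirun -np 4 -machinefile $PBS_NODEFILE ./program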

Q: How do I submit a batch job?

You can submit a batch job using: qsub [SCRIPTNAME]
For example, if you have prepared a PBS script "Qgaussian.pbs", you would submit the job by:

% qsub Qgaussian.pbs

Q: Why is it that jobs submitted after mine to the batch system have started running while my job is still waiting in the queue?

The Maui scheduler must reserve the nodes for your job before it can be run. Jobs that request fewer nodes and/or shorter wall clock limits may start earlier than other jobs.

Q: Is there a debug queue available on the clusters?

There is no debug queue available on sciblade. However, jobs that request a small number of nodes for short time periods are more likely to start sooner than jobs that request a large number of nodes for long time periods.

Q: When I look at running jobs, it appears that many nodes are sitting idle. Why?

The Maui scheduler may be reserving nodes for system maintenance, a dedicated run, or a large job. During this time, jobs that will not finish before the scheduled event cannot be started.

Q: Which environment variables are not passed to a batch job, even when using qsub -V?

The following environment variables are not passed into a batch job, even when using the qsub -V option:
  LD_LIBRARY_PATH 
  LM_LICENSE_FILE 
  PATH 
  MANPATH 
These variables are reset by /etc/bashrc when the job starts.
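
If your job depends on any of these variables, set them explicitly inside the job script itself. The sketch below uses a placeholder library path; the MPI path shown is the one mentioned in the compiling section.

#!/bin/sh
#PBS -l nodes=1:ppn=1,walltime=00:10:00
export PATH=/u1/local/mvapich1/bin:$PATH                       # re-add the MPI tools to PATH
export LD_LIBRARY_PATH=/path/to/needed/libs:$LD_LIBRARY_PATH   # placeholder library path
cd $PBS_O_WORKDIR
./program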

Q: What do the following things mean in a batch job output file?

        resources_used.cput=06:22:44
        resources_used.mem=15774344kb
        resources_used.vmem=534948kb
        resources_used.walltime=00:07:46
These are the cumulative resources used by all nodes of the job. cput and walltime are in hh:mm:ss format. resources_used.cput is the CPU time used. resources_used.mem is physical memory used. resources_used.vmem is virtual memory used but this number is not valid and should be ignored. resources_used.walltime is the wall clock time of the job and is used to compute the service units charged.
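
While the job is still running, the same counters can usually be inspected with "qstat" (a sketch; replace 12345 with your actual job ID):

% qstat -f 12345 | grep resources_used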

Q: I am unable to kill my job. What does "qdel: Server could not connect to MOM ..." mean?

The PBS MOM (machine oriented miniserver) process (pbs_mom) is the process that starts the user's job script and ensures that it completes within its allotted time. The above error usually indicates a problem with the pbs_mom on the compute node. Please report this error message to the administrators.

Q: My MPI program apparently aborted, but my batch job is still running. Why?

This is caused by a bug in VMI. You will need to kill your job manually using the command:

qdel job_id

where job_id is the ID of the batch job.

Q: When a job is submitted using qsub, I get an error saying "qsub: Job exceeds queue resource limits". What is wrong?

In the #PBS -l option, your job is requesting a resource that is unavailable. This may be too large a walltime limit, too many nodes, an invalid ppn (must be 1 or 2), or a wrong value for a resource (normally set to "prod"). See the sample batch scripts (/u1/local/share/example/pbs/) for an example of a valid resource request.
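
For instance, a request of the following form stays within the documented limits (a sketch; adjust the node count and walltime to your job, keeping ppn at 1 or 2):

#PBS -l nodes=4:ppn=2,walltime=02:00:00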


Last update: Dec 1, 2009

