
Sciblade - Queuing System


To make efficient use of the cluster, two job management packages have been installed: the PBS/Torque resource manager and the Maui scheduler.

After logging in to the cluster, the user is on the master node, and any program that is started runs immediately on the master. This "interactive mode" is convenient for simple commands such as ls and vi, or for editing and compiling a program. Long computing jobs, however, should be submitted through the queuing system. A submitted job waits in a queue for its turn and is then sent to one or more compute nodes, to which it has dedicated access until it finishes. As a result, the job runs faster and the cluster is used more efficiently.

Basic Commands

Some basic commands that every cluster user should know before starting to run jobs on this system:

Command        Description
qsub           Submit a job to the queuing system
qdel           Delete a job that has been submitted to the queuing system
qstat / showq  List all information about queues and jobs

Sample PBS job scripts

  • PBS job script for Parallel MVAPICH1
  • PBS job script for Parallel MVAPICH2
  • PBS job script for Parallel OPENMPI
  • PBS job script for Parallel NAMD2
  • PBS job script for Serial job

    Submit Your Jobs

    Submit your batch job from the front-end (master) node with the command

    $ qsub [job_script]

    The job is assigned a job name and a job ID; qsub prints the job ID, which is what commands such as qstat and qdel expect.
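    As a rough illustration of what a job script contains, here is a minimal serial sketch. It assumes standard Torque #PBS directives; the job name serial_test, the one-hour walltime and the commented-out program line are hypothetical values to adapt, and the sample scripts listed above remain the reference for this cluster.

```shell
#!/bin/bash
# Minimal serial job script (sketch; name, limits and program are hypothetical).
#PBS -N serial_test
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -j oe

# Torque starts the script in your home directory. PBS_O_WORKDIR holds the
# directory qsub was run from; fall back to the current directory if the
# script is run by hand outside the queue.
WORKDIR="${PBS_O_WORKDIR:-$PWD}"
cd "$WORKDIR" || exit 1

echo "Job ${PBS_JOBID:-(not under PBS)} started on $(uname -n) at $(date)"

# Replace this with the real work, e.g.:
#   ./my_program < input.dat > output.dat
# (my_program, input.dat and output.dat are placeholders.)

echo "Job finished at $(date)"
```

    Saved as, say, serial.pbs, it is submitted with qsub serial.pbs. The -j oe directive merges standard error into the job's standard output file.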

    Monitor Your Jobs

    To see the progress of your jobs, use showq (Maui) or qstat (Torque). Both commands summarize the status of submitted jobs and queues, but they give slightly different types of information. qstat lists all running and waiting jobs in the queue, sorted by job identifier.

    [user_y@sciblade myjob]$ qstat
    Job id                    Name             User            Time Use S Queue
    ------------------------- ---------------- --------------- -------- - -----
    256.sciblade              J01_16           09432411        22:58:50 R default        
    258.sciblade              gau-16_4         user_x          00:23:11 R default
    272.sciblade              cpi_test         user_x                 0 Q default
    281.sciblade              q22p128          user_y          00:00:00 R default	
    	

    Here job 281 is running (state R), while job 272 of user_x is waiting in the queue (state Q). For more detailed information, use qstat -a, or qstat -f [job id] for a single job.
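    Because plain qstat output has a fixed column layout, it can be filtered with standard tools. The sketch below, a hypothetical helper named jobs_in_state, prints the IDs of jobs in a given state (R, Q, ...) from qstat output on standard input. It assumes the six-column layout shown above and job names without spaces.

```shell
# jobs_in_state STATE (hypothetical helper): read plain `qstat` output on
# stdin and print the job IDs whose state column (S, the 5th field) is STATE.
# The first two lines (column titles and dashes) are skipped.
jobs_in_state() {
    awk -v st="$1" 'NR > 2 && $5 == st { print $1 }'
}
```

    For example, qstat | jobs_in_state Q lists the waiting jobs; with the listing above it would print 272.sciblade.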

    showq sorts jobs into three categories: active (running), idle and blocked. Idle jobs start when processors become available. Blocked jobs become idle when the queuing policy allows it (e.g. when the user is no longer using the maximum allowed number of processors).

    [user_y@sciblade myjob]$ showq
    ACTIVE JOBS--------------------
    JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
    
    281                  user_y    Running   128  1:58:47:47  Thu Oct  8 15:01:11
    258                  user_x    Running    64  3:36:44:45  Wed Oct  7 15:50:16
    256                09432411    Running    16 11:01:01:26  Thu Sep 24 00:30:57
    
         5 Active Jobs    208  of  2048 Processors Active (10.15%)
                           13  of   256 Nodes Active      (5.08%)
    
    IDLE JOBS----------------------
    JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
    272                  user_x       Idle  480   7:15:00:00  Wed Sep 30 17:22:21
    
    
    1 Idle Jobs
    
    BLOCKED JOBS----------------
    JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
    
    
    Total Jobs: 4   Active Jobs: 3   Idle Jobs: 1   Blocked Jobs: 0
    	

    Note that it can take up to a minute for a newly submitted job to show up in showq.

    Another difference is that qstat shows the time used by running jobs, while showq shows the time remaining before the job reaches its wall-clock limit and is killed by the queuing system. Once a job has finished, it no longer appears in the qstat or showq output.
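    To compare the two outputs, the time strings can be converted to a common unit. The sketch below is a hypothetical helper, to_seconds, which assumes the colon-separated forms seen above (D:HH:MM:SS in showq, HH:MM:SS or a bare 0 in qstat) and converts them to seconds.

```shell
# to_seconds TIME (hypothetical helper): convert a colon-separated time such
# as D:HH:MM:SS, HH:MM:SS or a bare number of seconds into total seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F: '{
        total = 0; weight = 1
        # Walk the fields right to left: SS, MM, HH, D.
        for (i = NF; i >= 1; i--) {
            total += $i * weight
            # 60 s per minute, 60 min per hour, then 24 h per day.
            weight *= (NF - i < 2) ? 60 : 24
        }
        print total
    }'
}
```

    For example, to_seconds 7:15:00:00 (the WCLIMIT of job 272 above) gives 658800.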

    In addition, the web-based cluster monitor Ganglia (available at http://clustername/ganglia) is a very helpful tool for monitoring the load and status of the compute nodes.

    To delete a job, whether it is running or still waiting in the queue, use

    $ qdel [jobid]
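    To delete several jobs at once, the job IDs can be generated rather than typed. The sketch below is a hypothetical helper, qdel_mine, which assumes Torque's qselect command is available alongside qsub (qselect prints the IDs of jobs matching the given criteria, one per line; it is not described on this page). An optional argument restricts the deletion to one job state, such as Q for waiting jobs.

```shell
# qdel_mine [STATE] (hypothetical helper): delete all of your own jobs, or
# only those in STATE (e.g. Q for queued). Assumes Torque's qselect, where
# -u selects jobs by owner and -s by state.
qdel_mine() {
    if [ -n "${1:-}" ]; then
        qselect -u "$USER" -s "$1"
    else
        qselect -u "$USER"
    fi | while read -r jobid; do
        qdel "$jobid"
    done
}
```

    Running qdel_mine Q removes only your waiting jobs. Since deletion cannot be undone, check the list with qselect -u $USER first.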

    Frequently Used PBS Commands

    PBS provides a command-line interface for submitting, monitoring, modifying and deleting jobs. The following are some frequently used PBS user commands and their functions:

    Command   Description
    qsub      Submit a job
    qstat     List all information about queues and jobs
    qdel      Delete a job
    qhold     Hold a batch job to keep it from being scheduled for running
    qmove     Move a job to a different queue or server
    qmsg      Append a message to the output of an executing job
    qrerun    Terminate an executing job and return it to a queue
    qrls      Release a held job
    qsig      Send a signal to an executing job

    Frequently Used qsub Options

    Option           Action
    qsub -l list     Set the job's resource list
    qsub -N jobname  Set the job name to jobname
    qsub -q dest     Submit the job to queue dest

    Resources requested on the qsub command line take precedence over the #PBS directive lines in the script file. For example, if you submit a job with qsub -l nodes=2:ppn=4 [jobscript], it will run on 2 compute nodes with 4 processors each, instead of what is stated in the script file.
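    A job script can read back the resources it was actually granted instead of hard-coding them, so it keeps working when its #PBS line is overridden on the command line. The sketch below assumes Torque's PBS_NODEFILE, which lists one host name per allocated processor slot. The job name mpi_test, the walltime and the helper functions count_slots and count_nodes are hypothetical; the queue name default is taken from the qstat listing above.

```shell
#!/bin/bash
# Sketch of a parallel job script (name, limits and helpers are hypothetical).
#PBS -N mpi_test
#PBS -q default
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00

# $PBS_NODEFILE has one line per allocated processor slot, so its line count
# reflects any -l override given on the qsub command line.
count_slots() {
    awk 'END { print NR }' "$1"
}

# Distinct host names = number of compute nodes allocated.
count_nodes() {
    sort -u "$1" | awk 'END { print NR }'
}

if [ -n "${PBS_NODEFILE:-}" ]; then
    NP=$(count_slots "$PBS_NODEFILE")
    echo "Granted $NP slots on $(count_nodes "$PBS_NODEFILE") node(s)"
    # Start the MPI program with $NP processes here; the exact launcher and
    # options depend on the MPI stack (see the site's sample job scripts).
fi
```

    Submitted with qsub mpi.pbs, this script gets 8 slots on one node; submitted with qsub -l nodes=2:ppn=4 mpi.pbs, it gets 8 slots spread over 2 nodes and reports that without any edits.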

    Frequently Used qstat Options

    Option            Action
    qstat -a          List all jobs
    qstat -q          List all queues on the system
    qstat -n          List jobs together with the nodes allocated to them
    qstat -u userid   List all jobs owned by user userid
    qstat -r          List all running jobs
    qstat -f jobid    List all information known about the specified job
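    The output of qstat -f consists of one "attribute = value" line per job attribute, which makes it easy to pick out a single field in a script. The sketch below, a hypothetical helper named qstat_attr, prints the value of a named attribute such as job_state or exec_host. It reads only the first line of each value, so values that Torque wraps onto continuation lines are not joined.

```shell
# qstat_attr NAME (hypothetical helper): read `qstat -f` output on stdin and
# print the value of attribute NAME (first line only).
# Attribute lines look like "    job_state = R".
qstat_attr() {
    awk -v key="$1" '$1 == key && $2 == "=" { sub(/^[^=]*= /, ""); print; exit }'
}
```

    For example, qstat -f 281 | qstat_attr job_state would print R for the running job shown earlier.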


    ©2002-2024 Hong Kong Baptist University. All Rights Reserved.