How to use
Running your computation
The following step-by-step guide will help you set up your runs. You need basic familiarity with command-line interfaces and the Linux terminal.
1. Accessing the Master Node: To use the facility, you start by logging in to the master node using SSH. Please open a terminal and type
$ ssh -X <username>@10.0.51.200
This command initiates an SSH connection to the specified IP address (10.0.51.200) with X11 forwarding enabled, which allows graphical applications to be displayed on your local machine. Please use your IISERK login credentials (username and password) for this step.
2. Master Node:
After a successful login, you will land in your home directory on the master node. You can create, edit, and delete files and directories as you wish inside your home directory.
The master node is not intended for heavy computational work. It is primarily used for job management and monitoring. You can run small test codes on this node, but actual heavy computation should be submitted as jobs with proper instructions and resource specifications.
3. Job Submission:
The cluster uses the PBS (Portable Batch System) scheduler for job management and load balancing. This scheduler is used to submit, schedule, and manage jobs on the cluster. You would need to familiarize yourself with PBS commands and job submission procedures if you plan to run computational jobs on this cluster.
To perform computation, you need to write a PBS script (sometimes called a batch file). This script combines resource requirement specifications and shell commands typically used in the terminal. A sample PBS script is provided as follows:
#!/bin/bash
#PBS -N JobName
#PBS -l nodes=1:ppn=1
#PBS -l walltime=1:00:00
#PBS -q workq
#PBS -o stdout.out
#PBS -e stderr.out
# Change to the directory from which the job was submitted
cd $PBS_O_WORKDIR
# Make the module command available and load any required modules
source /etc/profile.d/modules.sh
module load python3/3.10.9
# Your shell commands go here
g++ my_code.cc
./a.out
python3 another_code.py
After creating your PBS script file (say myscript.pbs), you can submit the job using the qsub command, like so:
$ qsub myscript.pbs
Upon submission, you will receive a job ID as output.
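For reference, a submission and the returned job ID look roughly like this (a sketch; the job number and server name are placeholders, not actual output from this cluster):
$ qsub myscript.pbs
12345.<servername>
The job identifier (or its numeric part) is what you later pass to commands such as qstat -f <jobid> to inspect that job.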
4. Checking Job and Node Status:
You can check the status of your submitted jobs using various qstat commands. Here are some examples:
● qstat: Shows a summary of all jobs.
● qstat -a: Provides detailed information about all jobs.
● qstat -f <jobid>: Displays information for a specific job using its job ID.
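If you only want to see your own jobs, qstat can also filter by user (a standard qstat option; replace <username> with your own login):
$ qstat -u <username>
In the listing, the 'S' column gives the job state, e.g. Q (queued), R (running), E (exiting), or H (held).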
To check the running status of the nodes, you can use the pbsnodes command. Here are some examples:
● pbsnodes -a: Shows a detailed status of all compute nodes.
● pbsnodes -aSj: Provides a useful summary of node status.
Alternatively, you can visit the 'Running Status' page for job and node status.
For quick references on PBS commands, you can use the man command to access the manual pages. For example:
● man qsub: Manual page for submitting jobs.
● man qstat: Manual page for checking job status.
● man pbsnodes: Manual page for checking node status.
For more comprehensive references, please see the PBS Professional User Guide.
Software and modules
Some important software packages are installed on the system. To use them, you can load the desired software/module using the module load command. For example:
$ module load <package-name>
Replace <package-name> with the actual name of the software package you want to load.
The list of available modules can be obtained by typing
$ module avail
For example, use the following command to load python-3.10.9:
$ module load python3/3.10.9
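Two related commands of the environment-modules system may also be useful; the module name below is only an example:
$ module list                   # show the modules currently loaded in your session
$ module unload python3/3.10.9  # unload a module you no longer need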
Usage Policy
We do not have any strict usage policy for the users. Nevertheless, we recommend the following practices for the smooth running of jobs and to ensure a productive computing environment for all users.
Responsible and Efficient Usage of the KEPLER Cluster:
1. Pre-Execution Check: Before using the KEPLER cluster, ensure that your job or program runs as expected on your local desktop. Initial troubleshooting should be done locally to reduce unnecessary load on the cluster.
2. Benchmarking: Benchmark your program or package to determine the optimal number of processors for efficient execution. Do not assume that more processors always mean faster execution; excessive communication among processors can slow down the job and cause issues for the cluster. A minimal benchmarking sketch is given after this list.
3. Scaling Guidance: Refer to the documentation provided by the package provider to understand how to scale up your job execution effectively. This information is crucial for optimizing cluster usage.
4. Seek Experienced Help: Collaborate with experienced users within your research group or community for guidance before running a job on the KEPLER cluster. Their expertise can be invaluable in ensuring smooth execution.
5. Admin Assistance: In case of specific code-related issues or challenges, don't hesitate to reach out to the KEPLER cluster administrator. Although scheduling a meeting with the administrator may take time, it can lead to the best solution.
6. Documentation and Communication: If you encounter difficulties or problems while running jobs on the cluster, document the issues and communicate them to the administrator via email. Clear communication helps resolve problems efficiently.
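As referenced in point 2, the following is a minimal benchmarking sketch. It assumes a thread-parallel (OpenMP) executable named a.out; the executable name and thread counts are placeholders that you should adapt to your own program:
# run the same executable with increasing thread counts and compare the timings
for n in 1 2 4 8 16; do
    export OMP_NUM_THREADS=$n
    echo "threads = $n"
    time ./a.out
done
If the run time stops improving beyond a certain thread count, request roughly that many processors in your PBS script rather than more.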
Usage Recommendations for Smooth Cluster Operation:
1. Cluster Load Management: Avoid overloading the cluster by running only necessary jobs. Do not run jobs without a purpose or overload the system unnecessarily.
2. Resource Optimization: Minimize your resource requirements before starting a job to ensure efficient usage of cluster resources.
3. Efficiency Monitoring: Regularly check the efficiency parameter on the Running Status page to gauge the performance of your job. Adjust resource requirements for future runs accordingly.
4. Storage Limit: Do not exceed 500 GB of storage on the hard drive. Use the /storage00/<username> directory for important backups. If no directory under your username exists inside /storage00/, please create one and set appropriate permissions with the command `cd /storage00/; mkdir <username>; chmod 700 <username>` (see the example after this list for checking your usage).
5. Data Transfer Efficiency: Keep data transfers to a minimum, considering hardware limitations.
6. Node Usage: Avoid running long jobs on the master node. Instead, submit longer computations to the compute nodes for optimal resource utilization. Avoid running non-GPU jobs on the GPU-enabled node (newton-gpu01).
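As referenced in point 4, your current disk usage can be checked with the standard du command (the paths follow the conventions described above):
$ du -sh $HOME                  # usage of your home directory
$ du -sh /storage00/<username>  # usage of your backup directory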
Some examples
We kindly invite users to contribute actively to this section. If you have successfully configured and run a widely used package on the Kepler cluster, we encourage you to share basic examples of popular packages in your respective fields. These examples should include a well-documented primary code, a PBS (Portable Batch System) script, installation instructions where applicable, and any requisite supplementary files. Please share the files with us by email.