Super Computing and Distributed Computing Camp
Universidad de Costa Rica, Sede del Atlántico en Turrialba
July 10-16, 2011
Parallel and Distributed Computing Essentials
Time | Description
09:00 - 11:00 | Parallel Architectures and HPC Basic Concepts
11:00 - 11:30 | Part I: Cluster Services and Installation Procedures
11:30 - 12:00 | Part II: NETBOOT Environment and Troubleshooting
12:00 - 12:30 | Part III: Cluster Management Tools and Security
12:30 - 13:00 | Part IV: Hands-on Laboratory
13:00 - 14:00 | Lunch
14:00 - 16:30 | Laboratory Part I: Installing a cluster and configuring cluster services
16:30 - 19:00 | Laboratory Part II: Parallel environment and resource management system
19:00 - 20:00 | Dinner
20:00 - 22:00 | Laboratory Part III: Serial and parallel job submission with Torque/Maui + openMPI
Parallel Architectures and HPC Basic Concepts
09:00 - 11:00
Speaker: Gilberto Diaz
In this session, attendees will receive several lectures on the basics of HPC and distributed computing.
Part I: Cluster Services and Installation Procedures
11:00 - 11:30
Speaker: Moreno Baricevic
A brief introduction to HPC clusters, covering both the hardware and the software infrastructure, followed by an overview of some cluster installation procedures and the services involved.
Part II: NETBOOT Environment and Troubleshooting
11:30 - 12:00
Speaker: Moreno Baricevic
Configuration, setup and troubleshooting of the network booting environment for remote, distributed and unattended deployment of computers. Overview of PXE, DHCP, TFTP, NFS, package repositories, Kickstart/Anaconda, system logging.
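As an illustration of how the pieces listed above fit together, the following is a minimal sketch (not the lab's actual configuration) that renders a PXELINUX boot entry pointing the installer at a Kickstart file. The function name, file names and the NFS address are hypothetical; the `ks=` boot option is assumed to be the one understood by the Anaconda installer of that period.

```python
def render_pxe_config(kernel, initrd, kickstart_url, label="install"):
    """Return the text of a pxelinux.cfg/default file with a single boot entry.

    All argument values are supplied by the caller; nothing here is taken
    from the lab material.
    """
    lines = [
        f"default {label}",   # entry booted when no other choice is made
        "prompt 0",           # do not wait for keyboard input (unattended)
        f"label {label}",
        f"  kernel {kernel}",  # kernel image, fetched over TFTP
        # initrd is also fetched over TFTP; ks= tells Anaconda where
        # to find the Kickstart file
        f"  append initrd={initrd} ks={kickstart_url}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical server address and paths, for illustration only.
    print(render_pxe_config("vmlinuz", "initrd.img", "nfs:10.0.0.1:/ks/node.cfg"))
```

The generated text would typically be served by the TFTP server, with the DHCP server telling the PXE client where to find the boot loader.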
Part III: Cluster Management Tools and Security
12:00 - 12:30
Speaker: Moreno Baricevic
Overview of some management tools and protocols needed for the administration and the monitoring of the cluster status and the services involved; brief notes on security.
Part IV: Hands-on Laboratory
12:30 - 13:00
Speaker: Moreno Baricevic
A brief introduction to today's laboratory session on cluster installation and configuration.
Laboratory Part I: Installing a cluster and configuring cluster services
14:00 - 16:30
Speaker: Moreno Baricevic
The aim of this exercise is to install and configure the services needed to set up an unattended installation of a cluster. Each group of students is expected to configure the masternode, install missing packages and remove unwanted ones, configure the required services, and finally install one computing node. The same cluster will be used for the second part of the lab.
Exercises: http://edu.escience-lab.org/SCCamp2011/ClusterInstallationPart1
Laboratory Part II: Parallel environment and resource management system
16:30 - 19:00
Speaker: Moreno Baricevic
The aim of this exercise is to install and configure the batch system and the scheduler in order to set up a queue system for a parallel environment. The cluster set up during the first part of the lab will now be fully configured as a real-world HPC cluster. Each group of students is expected to download, compile and install the components of the queue system, and to set up these services. Each group is also asked to create a test user account and to set up NFS in order to export some filesystems. For administration tasks, the students are expected to set up a passwordless environment and install a parallel shell.
Exercises: http://edu.escience-lab.org/SCCamp2011/ClusterInstallationPart2
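To illustrate why a passwordless environment matters for a parallel shell, here is a minimal sketch of the idea in Python. It is a stand-in for illustration only, not the parallel shell installed in the lab; the function names and host names are hypothetical. It assumes key-based ssh access to every node, and uses `BatchMode=yes` so that ssh fails instead of prompting for a password.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_command(host, command):
    """Build the ssh invocation for one host."""
    # BatchMode=yes disables password prompts, so a node without
    # passwordless access reports an error instead of blocking.
    return ["ssh", "-o", "BatchMode=yes", host, command]


def run_on_host(host, command, timeout=30):
    """Run the command on one host; return (host, exit code, stdout)."""
    proc = subprocess.run(
        ssh_command(host, command),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return host, proc.returncode, proc.stdout.strip()


def parallel_shell(hosts, command, runner=run_on_host):
    """Run the same command on all hosts concurrently.

    Results come back in the same order as `hosts`. The `runner`
    argument exists only so the function can be tried without ssh.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        return list(pool.map(lambda h: runner(h, command), hosts))
```

A call such as `parallel_shell(["node01", "node02"], "uptime")` would return one tuple per node, where `node01` and `node02` are placeholder host names.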
Laboratory Part III: Serial and parallel job submission with Torque/Maui + openMPI
20:00 - 22:00
Speaker: Moreno Baricevic
The exercise is considered successfully completed when the "user" is able to submit a batch job and run an MPI parallel test on both the computing node and the masternode installed during the first part of the lab. Installing Open MPI is part of the exercise.
Exercises: http://edu.escience-lab.org/SCCamp2011/ClusterInstallationPart2
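As a rough idea of what a batch job for this kind of test can look like, the sketch below renders a Torque job script as text. It is an assumption-laden example, not the script from the exercises: the function name, job name, resource values and the `./mpi_test` executable are hypothetical. The `#PBS` directives, the `PBS_O_WORKDIR` variable and `mpirun -np` are standard Torque and Open MPI usage.

```python
def render_job_script(name, nodes, ppn, walltime, executable):
    """Return the text of a Torque job script that launches an MPI program.

    nodes: number of nodes requested; ppn: processes per node.
    """
    total = nodes * ppn  # one MPI process per requested slot
    lines = [
        "#!/bin/bash",
        f"#PBS -N {name}",                   # job name shown in the queue
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # requested resources
        f"#PBS -l walltime={walltime}",      # maximum run time, hh:mm:ss
        "cd $PBS_O_WORKDIR",                 # directory the job was submitted from
        f"mpirun -np {total} {executable}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two nodes (for example masternode plus one computing node),
    # one process each; all values are illustrative.
    print(render_job_script("mpi_test", 2, 1, "00:05:00", "./mpi_test"))
```

The resulting file would be submitted by the test user with `qsub`, and its state checked with `qstat`.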