Shinichi Yamagiwa (Kochi University of Technology)

Tutorial 1

Title: TSUBAME2.0: A Petascale GPU-accelerated Supercomputer

Speaker: Toshio Endo (Tokyo Institute of Technology, Japan)

Abstract: General-purpose graphics processing unit (GPGPU) computing technology is becoming attractive and popular because of its superior power-performance ratio. Using this technology, Tokyo Institute of Technology installed a new petascale supercomputer, called TSUBAME2.0, in November 2010. This system includes 4,224 NVIDIA Tesla M2050 GPUs distributed over 1,408 computing nodes and has a peak performance of 2.4 petaflops, which made it the No. 4 supercomputer in the world as of 2010. While users can use this system as a large Linux cluster by running CPU parallel programs, they can substantially accelerate their applications with GPGPU technology. Accelerated applications on TSUBAME2.0 include weather simulation, earthquake simulation, molecular dynamics, and DNA analysis. TSUBAME2.0 also adopts other technologies to achieve a “green” petascale supercomputer that accommodates real large-scale applications, including 7 PB of shared storage, solid state drives (SSDs) as local storage, and modular cooling system racks.
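As a quick check on the figures quoted in this abstract, the short Python sketch below divides the GPU count by the node count. It assumes the GPUs are spread evenly across the computing nodes, which the abstract does not state explicitly; the variable names are illustrative only.

```python
# Figures quoted in the abstract above.
total_gpus = 4224
compute_nodes = 1408

# divmod returns (quotient, remainder); a zero remainder means the GPUs
# divide evenly across the nodes (an even split is our assumption).
gpus_per_node, leftover = divmod(total_gpus, compute_nodes)

print(f"{gpus_per_node} GPUs per node, {leftover} left over")
```

Under that assumption, the quoted totals are consistent with three GPUs in each node.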

Tutorial 2

Title: Invitation to OpenCL

Speaker: Pablo Lamilla Alvarez (Kochi University of Technology, Japan)

Abstract: Recent advances in GPU technology have attracted researchers who need intensive computing to the field of GPU-based computing (GPGPU) because of its high performance at low cost. However, GPGPU programming platforms are traditionally vendor- or hardware-specific, which complicates access to the computing power of heterogeneous processors from a single host. The recently released OpenCL is expected to become a standard for massively parallel heterogeneous processors. This tutorial introduces OpenCL, explaining the characteristics of the environment and describing in detail the basic structure of an OpenCL program. The tutorial also presents and evaluates various techniques to improve the performance of OpenCL applications.

Tutorial 3

Title: All about RICC: RIKEN Integrated Cluster of Clusters

Speaker: Maho Nakata (RIKEN, Japan)

Abstract: This is an introduction to RIKEN's supercomputer RICC (RIKEN Integrated Cluster of Clusters), which has been in operation since August 2009. Its total performance is 93.3 TFlops, which ranked it 40th among the top 500 supercomputers and made it the fastest PC cluster system in August 2009. One of the motivations for RICC was to create an environment for programmers to develop software for next-generation supercomputer systems that may be massively parallelized using GPUs. RICC can run computational jobs on 8192 cores. This number can be increased with GPU accelerators. This massively parallel PC cluster consists of 1024 nodes, where each node contains 8 cores, 12 GB of memory, and 500 GB of storage. These cores are interconnected via InfiniBand.
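The core count quoted in this abstract follows directly from the per-node figures. The Python sketch below recomputes it, along with the aggregate memory and local storage implied by the same figures. The aggregate values are derived here rather than stated in the abstract, and the variable names are illustrative only.

```python
# Per-node figures quoted in the abstract above.
nodes = 1024
cores_per_node = 8
memory_per_node_gb = 12
storage_per_node_gb = 500

# Aggregates implied by the per-node figures (derived, not quoted).
total_cores = nodes * cores_per_node
total_memory_gb = nodes * memory_per_node_gb
total_storage_gb = nodes * storage_per_node_gb

print(f"cores:   {total_cores}")
print(f"memory:  {total_memory_gb} GB")
print(f"storage: {total_storage_gb} GB")
```

The computed core count matches the 8192 cores stated in the abstract.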

Tutorial 4

Title: Virtualized Development and Testing of Embedded Cluster Computing

Speaker: Ian Vince McLoughlin (Nanyang Technological University, Singapore)

Abstract: Embedded cluster computing is a growing and interesting field that may significantly increase in importance within the next decade. It is the intersection of distributed or parallel computing with embedded systems, usually characterised by small, inexpensive and low-power compute elements. It is one of the driving technologies behind terms such as ubiquitous and pervasive computing, ambient intelligence and computing everywhere.

In this tutorial, we consider the development of such systems, in particular embedded Beowulf-style clusters: collections of asynchronous embedded computers running the Linux operating system. We look at the difficulties of building these systems with reference to one of the earliest examples, which used ARM processors and was built for fault-tolerant low-power computing in space. We consider the issues faced by software and hardware teams during prototyping, development, testing, integration and validation of these types of systems. The QEMU dynamic code translation emulator will then be introduced as a useful tool in the development of embedded clusters, with real-world examples of its use and an analysis of its performance.