CANDAR Keynote A
- Chair: Michihiro Koibuchi (NII)
- Speaker: Takekazu Tabata (Fujitsu Ltd.)
- Title: A64FX: Designed for HPC and AI applications
- Abstract: Fujitsu developed the A64FX processor, which is optimized for HPC (High Performance Computing) and AI (Artificial Intelligence). A64FX is used in the Fugaku supercomputer, a massively parallel supercomputer and the successor to the K computer. A64FX is based on Fujitsu's microarchitecture, as used in our SPARC64 and mainframe processor development. It provides the world's top-class computing performance and memory bandwidth. To achieve high performance across a wide range of real applications, we worked collaboratively with RIKEN (co-design). For the ISA (Instruction Set Architecture), Fujitsu chose to adopt Armv8-A with SVE (Scalable Vector Extension) to best position Fugaku to draw on and contribute to a broader user base. Fujitsu collaborated with Arm as a lead partner and contributed to the development of SVE, the HPC and AI extension for Armv8-A. A64FX is the world's first implementation of SVE. It also provides Fujitsu's proprietary extensions adopted in the K computer, such as the hardware barrier, sector cache, and hardware prefetch. In this keynote, I will explain the processor's features from the standpoint of microarchitecture and ISA.
CANDAR Keynote B
- Chair: Koji Nakano (Hiroshima University)
- Speaker: Yuji Shinano (ZIB)
- Title: Building optimal solutions to prize-collecting Steiner tree problems on supercomputers
- Abstract: SCIP-Jack is a customized, branch-and-cut based solver for Steiner tree and related problems. ug[SCIP-Jack, MPI] extends SCIP-Jack to a massively parallel solver by using the Ubiquity Generator (UG) framework. ug[SCIP-Jack, MPI] was the only solver that could run in a distributed environment at the (latest) 11th DIMACS Challenge in 2014. Furthermore, it solved three well-known open instances and updated 14 best known solutions to well-known instances from the benchmark library SteinLib. After the DIMACS Challenge, SCIP-Jack was considerably improved. However, the improvements were not reflected in ug[SCIP-Jack, MPI]. An updated version of ug[SCIP-Jack, MPI] enabled us to use branching on constraints and a customized racing ramp-up, among other features. The new features gave us the capability to use up to 43,000 cores to solve two more open instances from SteinLib. SCIP-Jack solves not only the classic Steiner tree problem, but also a number of related problems. In this presentation, we show for the first time results of using ug[SCIP-Jack, MPI] on a problem class other than the classic Steiner tree problem, namely prize-collecting Steiner trees. The prize-collecting Steiner tree problem is a well-known generalization of the Steiner tree problem and has many real-world applications.
- Chair: Hiroaki Morino (Shibaura Institute of Technology)
- Speaker: Yasunori Owada (National Institute of Information and Communications Technology)
- Title: NerveNet: A Resilient Network and Edge Computing Platform for Smart Information Sharing
- Abstract: “NerveNet” is a distributed network and application platform developed by NICT. It aims to achieve local production and local consumption of information within a region by using a distributed network and edge computing nodes. It can construct a layer 2 mesh network using Ethernet, and it autonomously builds multiple VLAN paths over the mesh topology so that traffic switches automatically to a detour VLAN path when a link or node fails. In addition, it provides a distributed application platform that serves as a local cloud, using a distributed information synchronization function run by a distributed database system on the network nodes, within each node's limited computation, memory, and storage resources. The platform enables local communication and information sharing without depending on cloud systems on the Internet, and it makes both network connectivity and application service availability resilient. Users can therefore use the network and application services locally, independent of the Internet or a cloud. In other words, even if connectivity to a wide area network or the Internet is lost in a large-scale disaster, the network and application services remain available using only the network nodes that are still alive locally. Since all the functions of the latest NerveNet are packaged as software that runs on Linux PCs, including the Raspberry Pi, it is easy to extend the scale. This presentation explains the architecture and functional details of the NerveNet system and introduces some use cases in Japan and other countries.
- Chair: Keiichiro Fukazawa (Kyoto University)
- Speaker: Takahiro Katagiri (Nagoya University)
- Title: Towards Auto-tuning Technology in Exascale Era
- Abstract: Achieving high software performance on the computer architectures of the exascale era is difficult. Many architectural possibilities are predicted for the exascale era, such as accelerators based on Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), and quantum computers. From the CPU point of view, data-flow computing is a candidate in addition to many-core CPUs. At the memory level, high bandwidth has been achieved by using 3D-stacked memory. However, deep memory heterogeneity remains a critical performance problem at the memory level. Auto-tuning (AT) is said to be one of the promising technologies for solving these software performance issues on the way to the exascale era. The speaker has been studying AT technology for the past decade. In particular, the speaker has been developing an AT language named ppOpen-AT to ease the development of AT software, from legacy codes to advanced computer architectures. In this talk, the speaker will discuss novel AT functions for ppOpen-AT aimed at the exascale era. This work was supported by JSPS KAKENHI, Grant-in-Aid for Scientific Research (S), entitled “Innovative Method for Computational Science by Integrations of (Simulation + Data + Learning) in the Exascale Era.”
- Chair: Jacir L. Bordim (University of Brasilia)
- Speaker: Max Plauth (Hasso Plattner Institute, University of Potsdam)
- Title: Bridging the Gap: Towards Energy-efficient Execution of Workloads on Heterogeneous Hardware
- Abstract: With electricity grids increasingly transforming into smart grids, a number of challenges are introduced for today's large-scale computing systems. To operate reliably and efficiently, computing systems must adhere not only to technical limits such as thermal constraints, but they must also reduce operating costs, for example, by increasing their energy efficiency. Efforts to improve the energy efficiency, however, are often hampered by inflexible software components that hardly adapt to underlying hardware characteristics. This talk discusses approaches to bridge the gap between inflexible software and heterogeneous hardware. Using adaptive software components that dynamically adapt to heterogeneous processing resources such as different hardware architectures and accelerator classes during runtime, energy efficiency of computing systems may be improved.
- Chair: Sayaka Kamei (Hiroshima University)
- Speaker: Shlomi Dolev (Ben-Gurion University of the Negev)
- Title: Algorithms for Optical, Quantum, Cloud and Swarm Computing
- Abstract: Multi-core technology is a limited answer to the difficulty of continuing Moore's law using a single core. The change is dramatic, and there is a need to redesign systems to utilize parallel and distributed capabilities from the level of the microprocessor, instead of merely increasing the clock frequency and using the same algorithms, as has been done for decades. As a side note, recall that some tasks are sequential in nature and so cannot be sped up by using several cores. New computing technologies imply new opportunities for using parallel and distributed algorithms. The talk will give an overview of several technologies that can influence, and are influencing, computing capabilities. Examples of algorithms designed for optical computing devices, for quantum computing, and for cloud and swarm computing will be described.
- Chair: Nobuhiro Nakano (Keio University)
- Speaker: Yasuhiro Kawahara (The University of Tokyo)
- Title: Wireless Power Transmission for Sustainable Computing Systems
- Abstract: It's not ideal to have a wired power supply or a heavy battery to power mobile devices and sensors. To realize the sustainable operation of such IoT devices, wireless power supply technology and energy harvesting (environmental power generation) play important roles. Research on wireless power supply has a long history and many studies have been conducted, but the work has so far been limited to the research and development of fundamental technologies. In this research, we focus on the interaction between people and objects and discuss how to power small objects moving around in a large space and how to set up an energy supply infrastructure in that space.
- Chair: Yasuyuki Nogami (Okayama University)
- Speaker: Sumio Morioka (Interstellar Technologies Inc.)
- Title: Information Theoretically Secure Communication for Spacecrafts in the NewSpace Sector
- Abstract: The rapid increase in the number of spacecraft built by private commercial companies is one of the main trends in recent space development. This movement is called NewSpace, in contrast to traditional space development driven by national governments. It is expected that thousands of small satellites will be launched by small launch vehicles (space rockets) in the next 10 years to construct worldwide communication and remote sensing networks, in which small satellites communicate not only with ground stations but also with each other. Given this trend, low-cost yet highly secure wireless communication is needed to make space networks safe and reliable. In this talk, we will show that information-theoretic security can be achieved in critical spacecraft communication systems, whereas most communication systems on the ground or in the air rely on computational security. We have designed an information-theoretically secure communication protocol for small spacecraft and have been conducting flight tests on MOMO, a space launch vehicle developed by Interstellar Technologies Inc. in Japan.
- Chair: Shuichi Ichikawa (Toyohashi University of Technology)
- Speaker: Yuichiro Shibata (Nagasaki University)
- Title: Functional Safety Oriented Heterogeneous Redundant Design for FPGA Computing
- Abstract: The application domains of FPGAs have been expanding steadily, reflecting great interest in FPGAs as a promising acceleration approach for energy-efficient computing. FPGAs are now starting to play an important role even in mission-critical application fields such as self-driving vehicles and industrial infrastructure. Since these applications require a high degree of functional safety, simple homogeneous redundant logic design, where the same hardware modules are simply replicated, is not enough. To improve tolerance to common-cause faults, heterogeneous redundant design, in which several different implementation approaches are taken to realize the same logic functionality, is crucial. However, manual redundant design tends to place a burden on designers, reducing the productivity of FPGA application development. In this talk, some systematic heterogeneous redundant design approaches are introduced and discussed, focusing on design diversity in various phases of an FPGA design flow. Some experimental results are also shown.
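The idea of heterogeneous redundancy can be illustrated in software. The following is only a minimal sketch and not the speaker's design flow: three deliberately different implementations of the same function (a bit count, chosen here purely as an example) feed a majority voter, so a fault specific to one implementation style is outvoted. All function names are hypothetical, and inputs are assumed to be non-negative integers.

```python
from collections import Counter

def popcount_shift(x):
    """Variant 1: test the lowest bit, then shift right."""
    count = 0
    while x:
        count += x & 1
        x >>= 1
    return count

def popcount_clear_lowest(x):
    """Variant 2: repeatedly clear the lowest set bit."""
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count

# Precomputed bit counts for every 4-bit value.
_NIBBLE_BITS = [bin(i).count("1") for i in range(16)]

def popcount_table(x):
    """Variant 3: look up four bits at a time in a table."""
    count = 0
    while x:
        count += _NIBBLE_BITS[x & 0xF]
        x >>= 4
    return count

def majority_vote(results):
    """Return the value produced by a strict majority of the variants."""
    value, votes = Counter(results).most_common(1)[0]
    if votes * 2 <= len(results):
        raise RuntimeError("no majority among redundant results")
    return value

def redundant_popcount(
    x, variants=(popcount_shift, popcount_clear_lowest, popcount_table)
):
    """Run every variant on the same input and vote on the outputs."""
    return majority_vote([variant(x) for variant in variants])
```

With one variant replaced by a faulty stand-in, for example `lambda x: 0`, the voter still returns the value agreed on by the other two. A homogeneous design would instead replicate one variant three times, so a flaw in that variant would appear in all three copies and pass the vote.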
- Chair: Ikki Fujiwara (NII)
- Speaker: Kohta Nakashima (Fujitsu Laboratories)
- Title: Conservative Interconnect of Large-scale HPC systems
- Abstract: Interconnect technologies are very important for large-scale HPC systems to achieve the best performance. To realize the most efficient systems, academic researchers have proposed many novel network topologies. Industry and business users also require low-latency, high-bandwidth, and cost-effective interconnect systems. If the network diameter can be reduced, the maximum latency can be reduced. If the average number of hops can be reduced, the bandwidth can be improved. If someone finds a way to connect more servers with less network fabric, the system cost can be reduced. These are the reasons why many researchers search for novel network topologies and efficient network systems. However, very few systems adopt the network topologies proposed in the latest research, and almost all practical HPC systems, such as those ranked in the TOP500, adopt only a few types of traditional network topologies. This talk shows the properties that an interconnect for HPC systems should have, using actual large-scale HPC systems as examples.
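The two metrics named in the abstract, network diameter and average hop count, can be computed for small example topologies. The sketch below is illustrative only: the ring and 2D torus are chosen as familiar traditional topologies, not taken from the talk, and the function names are hypothetical. It uses breadth-first search over an adjacency list and assumes a connected network with uniform links.

```python
from collections import deque

def ring(n):
    """Adjacency list of an n-node ring."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def torus_2d(k):
    """Adjacency list of a k x k 2D torus (use k >= 3)."""
    adj = {}
    for x in range(k):
        for y in range(k):
            adj[(x, y)] = [
                ((x + 1) % k, y), ((x - 1) % k, y),
                (x, (y + 1) % k), (x, (y - 1) % k),
            ]
    return adj

def hop_counts(adj, src):
    """Breadth-first search: minimum hops from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def diameter_and_average_hops(adj):
    """Return (diameter, average hops over all ordered node pairs)."""
    longest = 0
    total = 0
    pairs = 0
    for src in adj:
        dist = hop_counts(adj, src)
        longest = max(longest, max(dist.values()))
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return longest, total / pairs

if __name__ == "__main__":
    for name, adj in (("ring, 16 nodes", ring(16)),
                      ("4x4 torus, 16 nodes", torus_2d(4))):
        diameter, average = diameter_and_average_hops(adj)
        print(f"{name}: diameter={diameter}, average hops={average:.2f}")
```

For the same 16 nodes, the torus halves both the diameter (4 versus 8) and the average hop count relative to the ring, at the cost of four links per node instead of two. This is the kind of latency, bandwidth, and cost trade-off the abstract describes.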