CANDAR Keynote A
- Speaker: Hideharu Amano (Keio University)
- Title: Flow-in-Cloud: a multi-FPGA system for AI application
- Abstract: Flow-in-Cloud (FiC) consists of hundreds of FPGA boards, each of which is connected to a powerful interconnection network. Since each board provides 32 full-duplex 9.9 Gbps serial links, the aggregate bandwidth is up to 316.8 Gbps. STDM (Static Time Division Multiplexing) switching is adopted to emulate a future optical switch. Xilinx's cost-efficient Kintex UltraScale is used, and programmers can describe their application modules attached to the switch as HLS modules. Some CNN (Convolutional Neural Network) applications are implemented on a system with 16 boards. HLS modules can be replaced by dynamic reconfiguration without stopping the switch.
CANDAR Keynote B
- Speaker: Takeaki Uno (NII)
- Title: New Approaches for Clustering Problems
- Abstract: Clustering is one of the most fundamental problems in data mining. It partitions the data into several groups, or finds groups of entities, so that the entities in a group are similar to each other or there is a sufficiently large gap between any two groups. Existing methods are not good at finding many small or medium-sized clusters, even though some of them take a very long computation time. In this talk, we show the difficulty of this kind of unsupervised problem and a new approach, so-called “data polishing”, that can find small clusters with high accuracy. We also propose a hierarchical clustering method based on meta-clustering of the usual k-means algorithms. Approaches of this kind would give a new type of HPC for data analysis.
- Speaker: Bruno Martin (Université Nice Sophia Antipolis)
- Title: Pseudo-random sequence generation with cellular automata
- Abstract: In Wolfram’s seminal work in 1986, it was proposed to use rule 30 of cellular automata as a pseudo-random sequence generator. It is now well known that rule 30 (like any other rule picked from the elementary cellular automata) is not suitable as a good pseudo-random generator. In this talk, we will describe two main approaches to using cellular automata for generating stronger pseudo-random sequences. The former uses non-uniform cellular automata that combine different rules with various techniques to generate randomness. The latter enlarges the rule’s radius and takes advantage of Boolean function theory to provide random sequences. Both approaches give pseudo-random generators that successfully pass tests of randomness and can be used as lightweight and high range pseudo-random generators. We will also present other approaches for generating stronger pseudo-random sequences with cellular automata and their applications in various fields.
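For readers unfamiliar with the rule 30 scheme mentioned in this abstract, the following is a minimal sketch of the idea, not the speaker's construction: a ring of cells starts with a single 1, every cell is updated as left XOR (center OR right), and the center cell is read out once per step. The function name `rule30_bits` and the ring width are illustrative assumptions. As the abstract notes, this plain generator is not considered a good pseudo-random generator.

```python
def rule30_bits(n_bits, width=257):
    """Return n_bits read from the center cell of a rule 30 automaton.

    Assumption for illustration: a circular array of `width` cells seeded
    with a single 1 in the middle. Keep n_bits below width // 2 so the
    wrap-around never reaches the center cell.
    """
    center = width // 2
    cells = [0] * width
    cells[center] = 1
    out = []
    for _ in range(n_bits):
        out.append(cells[center])
        # Rule 30: new cell = left XOR (center OR right), on a ring.
        cells = [
            cells[i - 1] ^ (cells[i] | cells[(i + 1) % width])
            for i in range(width)
        ]
    return out


if __name__ == "__main__":
    print(rule30_bits(16))
```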
- Speaker: Tutomu Murase (Nagoya University)
- Title: IoT Centric Network Architecture and Ad Hoc Network
- Abstract: Data from a variety of IoT devices can become attractive data, that is, Big Data, if they are appropriately collected and retrieved. Network architecture is important for transporting IoT data, yet neither the current Internet architecture nor the Information Centric Network architecture seems suitable in terms of the mobility and lifetime of IoT data. In particular, massively and continuously generated data, such as picture frames from a surveillance camera, web camera, or drive recorder, must be efficiently managed and retrieved. An architecture that fits IoT is addressed. Ad hoc networks play an important role in collecting such IoT data as well as in forming part of the 5G network. With regard to their autonomous and self-organizing nature, some frontier research among the many efforts to improve the performance of ad hoc networks is addressed.
- Speaker: Kohta Nakashima (Fujitsu Laboratories)
- Title: PC cluster technologies in 20 years and future prospects
- Abstract: PC cluster systems, now the mainstream supercomputer implementation, have been developed for 20 years. The major components of a PC cluster system are commodity-based hardware such as x86 CPUs, memory modules, GPUs, and so on. The “commodity-based” approach not only reduces system hardware cost but also improves system performance as commodity components advance in the market, so the number of PC cluster systems has increased and they occupy a major part of the TOP500 list. On the other hand, it is difficult to achieve good performance with parallel applications and to stabilize a system built from a combination of commodity-based components. To overcome this difficulty, system-wide performance analysis technologies and network management technologies have been developed. Such technologies let PC cluster systems achieve the same performance and stability as HPC-specific machines. In this talk, we show how PC cluster systems, and the technologies to build and configure them, have changed over 20 years from the viewpoints of hardware components and system workloads.
- Speaker: Fumihiko Ino (Osaka University)
- Title: PACC: A Directive-based Approach for Accelerating Out-of-Core Stencil Applications on the GPU
- Abstract: In this talk, we will present a directive-based programming framework for accelerating out-of-core stencil applications on the graphics processing unit (GPU). Our framework, named PACC, is an extension of OpenACC directives, which are useful for generating high-performance GPU code from sequential CPU code. Our extension further facilitates code development by automatically generating complicated code required for software pipelining, data decomposition, and temporal blocking. We will briefly show how this code transformation is realized with the ROSE compiler infrastructure. Performance results obtained with 100 GB data will be presented for discussion.
- Speaker: Mitsuhisa Sato (RIKEN)
- Title: Omni Compiler project: An infrastructure for source-to-source transformation of directive-based programming language
- Abstract: Omni Compiler is an infrastructure for source-to-source transformation, used to design source-to-source compilers such as the Omni XcalableMP compiler. XcalableMP is a directive-based language extension of Fortran95 and C for scientific programming on high-performance distributed memory parallel systems. We have been developing a compiler for this PGAS programming language, XcalableMP, for post-petascale computing. Omni Compiler includes C and Fortran95 front-ends that translate source code to an XML-based intermediate code called XcodeML, a Java-based code-transformation library on XcodeML, and de-compilers that translate XcodeML intermediate code back to transformed source code. Currently, the Omni compiler also supports code transformation for OpenMP and OpenACC. In this talk, we will present the internals of the Omni compiler by showing not only XcalableMP but also several projects using this infrastructure, and our future
- Speaker: Yuichi Sudo (Osaka University)
- Title: Loosely-stabilizing Leader Election in the Population Protocol Model
- Abstract: The population protocol model, introduced by Angluin et al. in 2004, is a theoretical model for wireless and mobile sensor networks. A network consists of a large number of finite-state automata, called agents. Agents often interact (communicate) in pairs, and through each interaction they update their states. They cannot distinguish between neighbors that are in the same state. This simple model has attracted a lot of attention and has been studied by many researchers. The population protocol model typically represents a network consisting of a large number of tiny sensing devices, so the network is prone to device faults. Hence, fault tolerance is important in this model. However, there are negative results about fault tolerance: some fundamental problems such as leader election cannot be solved in a self-stabilizing fashion. This talk presents how we can circumvent this impossibility and give fault-tolerant protocols by a technique called “loose-stabilization”.
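To make the model in this abstract concrete, here is a minimal simulation sketch of a well-known simple leader election protocol in the population protocol model. It is not the loosely-stabilizing protocol of the talk. Every agent starts as a leader, and whenever two leaders interact, one of them becomes a follower. The uniformly random scheduler, the function name `elect_leader`, and its parameters are assumptions made for illustration.

```python
import random


def elect_leader(n, seed=0):
    """Simulate the simple 'two leaders meet, one survives' protocol.

    Assumption for illustration: a uniformly random scheduler picks an
    ordered pair of distinct agents at every step.
    Returns the final leader flags and the number of interactions used.
    """
    rng = random.Random(seed)
    is_leader = [True] * n          # every agent starts as a leader
    leaders = n
    interactions = 0
    while leaders > 1:
        a, b = rng.sample(range(n), 2)   # one pairwise interaction
        interactions += 1
        if is_leader[a] and is_leader[b]:
            is_leader[b] = False         # the responder becomes a follower
            leaders -= 1
    return is_leader, interactions


if __name__ == "__main__":
    flags, steps = elect_leader(100, seed=42)
    print(sum(flags), "leader after", steps, "interactions")
```

This simple protocol is not self-stabilizing: if a fault puts every agent in the follower state, no transition ever creates a new leader, which is the kind of limitation the abstract refers to.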
- Speaker: Masahiko Nakano (Keio University)
- Title: A microsystem design using standard CMOS technology with on-chip solar cell
- Abstract: In this paper, we show the design of a microsystem that operates autonomously as a single chip fabricated in a standard CMOS process. It is intended to operate only with a chip created by a CMOS process, using an on-chip solar cell as the energy source. Since the proposed system operates only with the chip, bonding and packaging are unnecessary and cost reduction becomes possible. First, we describe a method for improving the efficiency of solar cells fabricated in a standard CMOS process. Next, we introduce a boost DC-DC converter that operates with the on-chip solar cell as its energy source. The output voltage is boosted to 1 V or more, at which general circuits operate. In addition, we show a method to boost the voltage to 6 V or more, aiming at writing to fuse-type memory on the chip, and present the results of writing to the memory. Finally, we summarize the research progress toward the realization of the microsystem.
- Speaker: Yukinori Sato (Toyohashi University of Technology)
- Title: Computer Systems Performance Engineering from Dynamic Binary Optimization to MISD-style Dataflow Hardware
- Abstract: As many deep learning programs struggle with performance-related issues even when implemented on state-of-the-art CPUs or GPUs, the lack of system-level performance is a primary concern for upcoming AI-based applications. Therefore, techniques that realize high-performance and high-efficiency computer systems are expected to be an enabler of emerging AI applications such as self-driving cars and autonomous intelligent robots. In this context, we highlight custom computing systems specialized for particular application domains. We focus on specialization for memory access locality and attempt to improve its inefficient parts by fully customizing access patterns towards the target hardware, although this incurs challenges in productivity and costs for the hardware and software development process. In this presentation, we show the tool chains for system performance engineering that we have developed so far, such as a transparent performance tuning system called ExanaDBT, an on-line cache conflict detector called C2Sim, and a polyhedral-compilation-based automatic tile size optimizer called PATT. We also briefly show our ongoing work that applies performance engineering techniques to FPGA-based MISD-style dataflow accelerators.
- Speaker: Satoshi Obana (Hosei University)
- Title: How to Guarantee Integrity in Secret Sharing
- Abstract: Secret sharing is a cryptographic primitive that enables secret information to be distributed in such a way that only qualified sets of users can reconstruct the secret, while other sets of users obtain no partial information about it. Secret sharing is now widely used to protect sensitive information not only in closed systems but also in open systems such as cloud storage. Although secret sharing guarantees confidentiality and availability of the secret, ordinary secret sharing does not guarantee the integrity of the reconstructed secret. In this talk, we discuss secret sharing schemes that guarantee integrity. Here, we consider cheaters who try to forge the secret by forging their shares in the secret reconstruction phase. We introduce general techniques for cheating detection and cheater identification against this type of cheating, and we also discuss some open problems in this area.
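The integrity gap described in this abstract can be illustrated with a minimal sketch of ordinary Shamir (k, n) threshold secret sharing, which is one standard scheme and is used here as an assumption, since the abstract does not name a specific scheme. The prime modulus, function names, and parameters are illustrative. The example shows that any k shares reconstruct the secret, and that a single forged share silently changes the reconstructed value.

```python
import random

P = 2**61 - 1  # a Mersenne prime used as the field modulus (illustrative choice)


def make_shares(secret, k, n, rng):
    """Split `secret` into n shares; any k of them reconstruct it."""
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(P) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):   # Horner evaluation mod P
            y = (y * x + c) % P
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the field of integers mod P."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        # Modular inverse via Fermat's little theorem (P is prime).
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret


if __name__ == "__main__":
    rng = random.Random(2024)
    shares = make_shares(123456789, k=3, n=5, rng=rng)
    print(reconstruct(shares[:3]))            # honest reconstruction
    x, y = shares[0]
    forged = [(x, (y + 1) % P)] + shares[1:3]
    print(reconstruct(forged))                # differs, with no error raised
```

Nothing in `reconstruct` can tell the forged run from the honest one, which is why cheating detection and cheater identification need additional mechanisms.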
Graph Golf Keynote
- Speaker: Takeru Inoue (NTT Laboratories)
- Title: Graph enumeration and its applications featuring compressed data structures
- Abstract: This talk presents subgraph enumeration techniques and their applications. Subgraph enumeration is a traditional but tough problem in computer science: finding all subgraphs in a graph under given constraints, e.g., paths, cycles, trees, degrees, size, and their combinations. By utilizing compressed data structures (i.e., decision diagrams), subgraph enumeration algorithms have recently become fast enough to solve several network problems, including network optimization, fault analysis, and reliability evaluation. This talk also introduces open-source software named Graphillion, which implements key enumeration algorithms such as frontier-based search.
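As a small illustration of why subgraph enumeration is tough, the sketch below counts all self-avoiding paths between opposite corners of a grid by plain depth-first search. This is a brute-force baseline written for illustration only. It is not the frontier-based search and does not use decision diagrams or the Graphillion API. The function name `count_simple_paths` is an assumption.

```python
def count_simple_paths(n):
    """Count self-avoiding paths between opposite corners of an
    n x n grid of vertices, by exhaustive depth-first search."""
    goal = (n - 1, n - 1)
    visited = {(0, 0)}

    def dfs(r, c):
        if (r, c) == goal:
            return 1
        total = 0
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and (nr, nc) not in visited:
                visited.add((nr, nc))
                total += dfs(nr, nc)
                visited.remove((nr, nc))   # backtrack
        return total

    return dfs(0, 0)


if __name__ == "__main__":
    for size in range(1, 6):
        print(size, count_simple_paths(size))
```

The count grows very quickly with the grid size, so exhaustive search of this kind soon becomes impractical, which is the motivation for compressed representations.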