Short Course 2 (Joint)


Enabling a Future of Even More Powerful Computing

Moderators: Kentaro Yoshioka (Toshiba) and Seung H. Kang (Qualcomm)

This short course addresses future directions of technology and circuits for high-performance computers, GPU-based AI accelerators, supercomputers for deep learning, and in-memory, neuromorphic, quantum, and quantum-inspired computers.

Live Q&A Session: June 14, 7:00AM-8:30AM (JST)

Acceleration of Tomorrow's Computational Challenges, Gabriel Loh, Advanced Micro Devices

The computing industry has already been facing a diverse set of challenges brought on by the slowing of Moore’s Law, the end of Dennard Scaling, the computational demands of our new age in artificial intelligence, and rapidly evolving and expanding application use cases. However, the demand for additional compute capabilities across mobile, edge, home, office, cloud, and supercomputer platforms will likely accelerate over the coming years. To address these computational demands and challenges through the end of the decade, we will discuss three key trends for future computer systems. The first is a complete pervasiveness of intelligence, although “intelligence” will be expanded beyond the current focus on deep machine learning. The second is viewing all workloads as opportunities for specialization and acceleration. The third is pushing modular design principles to the next level, encompassing both hardware and software, to enable the productive design, implementation, and utilization of these future compute platforms.

3D-Structured Monolithic and Heterogeneous Devices for Post-5G System Applications, Yoshihiro Hayashi, Keio University

In upcoming post-5G systems, AI-centric infrastructures will be connected via low-latency RF and photonic networks with local servers and IoT edge devices. These digital innovations will be realized by scaled-down, ultra-low-power semiconductor devices with new functional materials in 3D configurations, implemented as either nano-scaled monolithic or macroscopic heterogeneous integration. This short course reviews recent 3D monolithic and heterogeneous innovations and discusses their impact on the performance and architectural leaps required for the computation and communication infrastructures of post-5G systems.

Accelerated Computing: Latest Advances and Future Challenges, Ben Keller, NVIDIA

With the "free ride" of Moore's Law and Dennard scaling drawing to a close, today's silicon designers must aggressively pursue innovation from devices and circuits to software and systems. This presentation will highlight the full-stack innovations of the NVIDIA A100 datacenter GPU that enable a 20X leap in deep learning performance compared to its predecessor. I will then discuss ongoing efforts in NVIDIA Research to drive continuous innovation in chip design, including package-level integration, the optimization of deep learning inference accelerators, and fine-grained adaptive clocking for aggressive margin reduction.

Next-Generation Deep-Learning Accelerators: From Hardware to System, Yakun Sophia Shao, SK Hynix/University of California, Berkeley

Machine learning is poised to substantially change society in the next 100 years, just as electricity transformed the way industries functioned in the past century. In particular, deep learning has been adopted across a wide variety of industries, from computer vision and natural language processing to autonomous driving and robotic manipulation. Motivated by the high computational requirements of deep learning, a large number of novel deep-learning accelerators have been proposed in academia and industry to meet the performance and efficiency demands of deep-learning applications.
To this end, this short course will cover various aspects of deep-learning hardware from a system perspective, including deep-learning basics, hardware and software optimizations for deep learning, system integration, and compiler optimization. In particular, we will discuss challenges and opportunities for the next generation of deep-learning accelerators, with a special focus on the system-level implications of designing, integrating, and scheduling future deep-learning accelerators.

Hardware for Next Generation AI, Dmitri Nikonov and Amir Khosrowshahi, Intel Labs

The field of machine learning continues to evolve at a rapid pace. The past few years have seen remarkable advancements across many areas including computer vision, natural language processing, and reinforcement learning. Progress is driven by the availability of scalable compute, larger corpora of data, and novel algorithmic approaches. We survey various neural accelerator chips that enabled this progress (Google TPU, Cerebras, Nervana, etc.) and outline the envelope of their performance. This envelope is determined by the underlying hardware: digital CMOS multipliers. This limits their energy efficiency (TOPS/Watt) and, coupled with the exponentially growing demand for AI computing in the world, leads to an unsustainable consumption of energy. Promising directions for transformative change to address these challenges are (1) beyond-CMOS compute, memory, and interconnects and (2) analog neural network architectures. We then review recent research on the use of various beyond-CMOS devices (resistive RAM, phase change memory, floating gate, ferroelectric, spintronic, and photonic devices) for neural networks. We also touch upon examples of digital and analog arrays for compute-in-memory. We benchmark their experimentally achieved operating time and energy against theoretical projections, and we project the options likely to achieve significant improvement in energy efficiency.

Re-Engineering Computing with Neuro-Inspired Learning: Algorithms, Hardware Architecture, and Devices, Kaushik Roy, Purdue University

Advances in machine learning, notably deep learning, have led to computers matching or surpassing human performance in several cognitive tasks including vision, speech, and natural language processing. However, implementation of such neural algorithms in conventional "von-Neumann" architectures is several orders of magnitude more area and power expensive than the biological brain. Hence, we need fundamentally new approaches to sustain exponential growth in performance at high energy-efficiency beyond the end of the CMOS roadmap in the era of ‘data deluge’ and emergent data-centric applications. Exploring the new paradigm of computing necessitates a multi-disciplinary approach: exploration of new learning algorithms inspired by neuroscientific principles, development of network architectures best suited for such algorithms, new hardware techniques to achieve orders-of-magnitude improvement in energy consumption, and nanoscale devices that can closely mimic the neuronal and synaptic operations of the brain, leading to a better match between the hardware substrate and the model of computation. In this short course presentation, I will focus on recent work on neuromorphic computing with spike-based learning and the design of underlying hardware that can lead to quantum improvements in energy efficiency with good accuracy.

Quantum Computing with Superconducting Circuits, Markus Brink, IBM

Quantum computing has made tremendous progress in recent years, which has led to wider interest in the field. Superconducting quantum circuits have emerged as a prime contender for implementing quantum processors, with the goal of realizing universal quantum computing. Quantum processors have scaled significantly in size, as measured by the number of quantum bits (qubits) connected on a chip, with devices incorporating more than 50 qubits available today. Likewise, the quality of qubits and quantum processors has increased steadily, as measured, for example, by the quantum volume. Despite these advances, fault-tolerant quantum computing is still some time away, due to the significant hardware overhead and performance requirements for error-correction codes. But early quantum applications and demonstrations can already be implemented on near-term quantum systems.

Digital Annealing Technology for Solving Combinatorial Optimization Problems, Koichi Kanda, Fujitsu

Demand for continued computer performance growth even after the end of Moore’s Law has led to various domain‐specific hardware approaches. Fujitsu’s Digital Annealer Unit (DAU), whose concept was first published in 2016, is an ASIC for solving large‐scale combinatorial optimization problems, where the objective function to minimize is formulated as an Ising model. The DAU achieves full connectivity among 8k variables and performs Markov Chain Monte Carlo searches in a multidimensional binary space. In this presentation, the DA algorithm and architecture are explained with an emphasis on techniques to accelerate the process of finding the optimal solution inside the hardware, such as parallel tempering. Techniques for applying the digital annealing concept to variable spaces representing permutations and assignments will also be presented. Applications of the DAU and the effectiveness of those techniques will be shown along with benchmark results for various problems.
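
For readers unfamiliar with the terms in this abstract, the following is a minimal software sketch of textbook parallel tempering applied to a small, fully connected Ising model. It is not the DAU algorithm or architecture, which the abstract does not specify. The energy convention, the function names (`ising_energy`, `flip_delta`, `parallel_tempering`), the problem size, the temperature ladder, and the sweep count are all illustrative assumptions.

```python
import math
import random

def ising_energy(J, h, s):
    """E(s) = -sum_{i<j} J[i][j]*s_i*s_j - sum_i h[i]*s_i, with s_i in {-1, +1}."""
    n = len(s)
    e = 0.0
    for i in range(n):
        e -= h[i] * s[i]
        for j in range(i + 1, n):
            e -= J[i][j] * s[i] * s[j]
    return e

def flip_delta(J, h, s, i):
    """Energy change from flipping spin i (assumes J is symmetric, zero diagonal)."""
    local = h[i] + sum(J[i][j] * s[j] for j in range(len(s)) if j != i)
    return 2.0 * s[i] * local

def parallel_tempering(J, h, betas, sweeps, rng):
    """Run one Metropolis replica per inverse temperature and swap neighbours."""
    n = len(h)
    replicas = [[rng.choice((-1, 1)) for _ in range(n)] for _ in betas]
    energies = [ising_energy(J, h, s) for s in replicas]
    k = min(range(len(betas)), key=lambda r: energies[r])
    best_e, best_s = energies[k], list(replicas[k])
    for _ in range(sweeps):
        # Single-spin-flip Metropolis sweep in every replica.
        for r, beta in enumerate(betas):
            s = replicas[r]
            for i in range(n):
                d = flip_delta(J, h, s, i)
                if d <= 0 or rng.random() < math.exp(-beta * d):
                    s[i] = -s[i]
                    energies[r] += d
            if energies[r] < best_e:
                best_e, best_s = energies[r], list(s)
        # Replica exchange: accept with min(1, exp((b_r - b_r+1) * (E_r - E_r+1))).
        for r in range(len(betas) - 1):
            a = (betas[r] - betas[r + 1]) * (energies[r] - energies[r + 1])
            if a >= 0 or rng.random() < math.exp(a):
                replicas[r], replicas[r + 1] = replicas[r + 1], replicas[r]
                energies[r], energies[r + 1] = energies[r + 1], energies[r]
    return best_e, best_s

# Hypothetical toy instance: 8 fully connected spins, couplings in {-1, +1}, no field.
rng = random.Random(0)
n = 8
J = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        J[i][j] = J[j][i] = float(rng.choice((-1, 1)))
h = [0.0] * n

best_e, best_s = parallel_tempering(J, h, betas=[0.2, 0.5, 1.0, 2.0], sweeps=200, rng=rng)
print("lowest energy found:", best_e, "state:", best_s)
```

The hot replicas (small beta) accept uphill moves often and explore broadly, while the cold replicas refine good states; the exchange step lets a promising state migrate toward low temperature. The search is stochastic, so the returned state is the best one observed, not a proven optimum.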