Short Course 2

Circuits and Systems for AI and Computing

Organizers: Koji Nii, TSMC Design Technology Japan, Inc.; Yan Li, Western Digital Corp.

Chairpersons: Tomohiro Nezuka, MIRISE Technologies Corp.; Vanessa Chen, Carnegie Mellon Univ.

Date & Time: Monday, June 9, 8:25 A.M.-5:25 P.M.

8:25 Opening
8:30 Hardware Accelerator Design for Generative AI (Tentative), Leland Chang, IBM

Abstract:
The advent of large language models and generative AI has ushered in enormous demand for hardware accelerators to perform AI training, fine-tuning, and inference. The design of such accelerators depends on holistic optimization of technology, circuits, and systems, but also fundamentally upon the models and use cases that this hardware needs to serve. Achieving the proper balance of compute vs. communication to optimize latency and throughput in AI workloads will require tradeoffs across the hardware/software stack to reconcile the long development cycles needed to build chips and systems with the torrid pace of innovation in AI models and algorithms. This talk will provide an overview of the landscape for AI hardware accelerators and discuss research roadmaps to improve both compute efficiency and communication bandwidth, particularly as Generative AI evolves towards Agentic AI and smaller, fit-for-purpose models.

9:20 Architecture Trends for AI Hardware Platforms, Norman James, AMD

Abstract:
AI has received a large amount of press in recent years, and the underlying AI hardware is an important part of that popularity. New AI models are continually released and push the limits of the hardware's capability. Thousands of software developers work solely on extracting as much performance from the hardware as possible so that AI platforms can do more at lower cost. Bandwidth per dollar and picojoules per bit are key metrics. To optimize these metrics, GPUs are ideally connected with passive copper, which drives denser packaging at all levels so that the GPUs remain in close proximity. The high-performance computing (HPC) segment has faced these issues before, but there are differences from the AI market segment. The promise of lower-power, lower-cost optics could change the density paradigms in the future. This presentation covers these trends toward higher density, many of the resulting complications, and gives a glimpse into the future of AI platforms.

10:10 Break
10:25 Modular chiplet approaches for scalable and efficient machine learning, Zhengya Zhang, University of Michigan

Abstract:
Machine learning models are rapidly increasing in size and complexity, surpassing the pace of accelerator chip upgrades. The development of monolithic chips to match these evolving models is both expensive and challenging. Alternatively, modular chiplets can be designed and reused to create multi-chip packages (MCPs) capable of addressing diverse models and tasks. The future success of chiplet technology hinges on advancements in chiplets that offer high utilization and flexibility, efficient high-bandwidth die-to-die interfaces, and high-density packaging. In this presentation, I will introduce two MCPs resulting from our collaboration with Intel and the Institute of Microelectronics in Singapore. The first MCP, Arvon, utilizes Embedded Multi-die Interconnect Bridge (EMIB) to integrate an FPGA chiplet and two DSP chiplets. As a programmable MCP, Arvon can adapt to evolving workloads over time. The second MCP, NetFlex, integrates four neural network chiplets using high-density fan-out wafer level packaging (HD-FOWLP). NetFlex’s streamlined architecture enables scalability for larger configurations.

11:25 EDA for AI, AI enhanced EDA (Tentative), Igor Markov, Synopsys

Abstract:
TBA

12:05 Lunch
12:55 Connectivity Technologies to Accelerate AI, Tony Chan Carusone, University of Toronto / Alphawave Semi

Abstract:
The rapid scaling of AI is reshaping large-scale computing and communication hardware, driving new demands for wireline connectivity. Chiplet-based architectures are emerging as a key enabler, integrating logic, memory, and connectivity to reduce the cost and time-to-market of custom AI hardware optimized for specific workloads. These architectures depend on high-density die-to-die interfaces, which are evolving rapidly. At the same time, the increasing compute density within a package is accelerating demand for high-speed off-package connectivity. Scaling AI clusters at the datacentre and inter-datacentre level requires new organizational paradigms, with optical data transmission playing an expanding role in meeting these challenges.

13:45 3D Optical Engine Design Challenges and Opportunities, Frank Lee, TSMC

Abstract:
With the recent rapid advances in AI large language models, the demand for high-speed data communication links has increased dramatically. Conventional copper-based data links are reaching their limits. Optical interconnects, due to their high bandwidth and lower power consumption, have emerged as a promising solution for next-generation data links.
In this short course, we will introduce the operating principles of basic silicon photonic devices. We will then discuss design considerations for photonic transmitter modulators and receiver filters, along with their associated electronic driver and amplifier designs. High-speed electrical circuits, optical circuits and 3D packaging co-design will be illustrated as well.

14:35 Break
14:50 HBM for AI computing, Jinhyung Lee, SK Hynix

Abstract:
High bandwidth memory (HBM) has become a key enabler for AI computing, offering high bandwidth and low power consumption in a small form factor. Its 3D-stacked architecture with a wide memory interface significantly enhances performance for AI workloads. However, eliminating the risks inherent in the 3D stack structure requires many design techniques, such as DFT/BIST, redundancy, and power delivery for HBM.
In particular, HBM demands careful consideration during the design process because the environment in which it is tested differs from the environment in which it is actually used. These HBM design topics will be discussed.

15:40 Semiconductor Storage for Further Evolution of Generative AI, Jun Deguchi, KIOXIA Corp.

Abstract:
In accordance with Moore's Law, the advancement of semiconductors has been a driving force behind the evolution of AI, including the current boom in generative AI. The size of mainstream generative AI models has significantly increased to enhance their reasoning, interpretative capabilities, and memory capacity. This has subsequently led to a rising demand for semiconductor devices such as CPU, GPU, and DRAM to efficiently operate these massive AI models. However, if AI models continue to grow at this rate, the costs and power consumption required for training and inference will escalate dramatically. This indicates the need for a shift towards a different trajectory for the evolution of AI.
Under these circumstances, the primary role of semiconductor storage devices such as flash memory and SSDs has been to store the vast amounts of data necessary for AI model training. However, they have not directly contributed to resolving the aforementioned challenges. To address these issues, even before the generative AI boom, our company proposed the concept of "Memory-Centric AI," which separates and advances the memory and reasoning/interpretative functions of AI. By appropriately allocating the required semiconductors to each function, we aim to overcome these challenges.
In this talk, we will discuss the issues highlighted above, provide an overview of Memory-Centric AI, and introduce the new roles of semiconductor storage within this context. Additionally, we will outline our open-source software technology, "KIOXIA AiSAQ™ (All-in-Storage Approximate Nearest Neighbor Search with Product Quantization)," which we developed to promote the utilization of SSDs in Retrieval Augmented Generation (RAG), one of the implementation forms of Memory-Centric AI, which enhances generative AI's response accuracy and database scalability. Finally, we will also discuss the future prospects of Memory-Centric AI.

16:30 Advancements in Power Architectures for AI Computing, Ke-Horng Chen, National Yang Ming Chiao Tung University

Abstract:
The ever-increasing demand for efficient and high-quality AI computing power systems has accelerated the transition from centralized power supplies to distributed architectures. To address the challenges posed by high input voltages and large driving currents, a new Intermediate Bus design has gained significant traction. This architecture offers lower cost, superior power quality, and enhanced efficiency while leveraging the latest advancements in power components, particularly GaN HEMT devices. This talk provides a comprehensive overview of the historical evolution of high-reliability power systems, followed by an in-depth exploration of the benefits and design challenges of Intermediate Bus Converters (IBCs). A practical example will illustrate the control requirements for an IBC utilizing GaN devices. Additionally, the hybrid power system will be introduced as an effective strategy to reduce power distribution costs and enable point-of-load (POL) regulators to be placed directly adjacent to the corresponding load. This approach minimizes supply-plane parasitics and significantly improves high di/dt and dv/dt transient response. With high power density, cost efficiency, and robust driving capability, these solutions are designed to meet the stringent power demands of modern AI computing systems.