

# 2025 Symposium on VLSI Technology and Circuits

# Special Workshop

Sunday, June 8, 13:00-19:00

Centennial Anniversary of FET Invention: Past, Present, and Future (FET 100)

Organizers: B. Zhao, IEEE EDS; T. Kimoto, JSAP; K. Endo, Tohoku Univ.; Y. Hayashi, AIST; H. Iwai, NYCU/Sci. Tokyo

# **Regular Workshops**

Sunday, June 8, 13:00-15:00

Workshop 1: Innovating Semiconductor Manufacturing: Science Meets AI

Organizers: S. Myung, Samsung Electronics; J. Jeong, Samsung Electronics

Workshop 2: Advanced Heterogeneous System with 3D Chiplet Integration

Organizer: F. Inoue, Yokohama National Univ.

Sunday, June 8, 15:30-18:00

Workshop 3: Semiconductor Manufacturing with AI

Organizer: M. Tada, Keio Univ.

Workshop 4: Innovations and Challenges in the Advanced Packaging Era

Organizer: A. Raley, Tokyo Electron Ltd

Workshop 5: Architectural Benchmarking of Compute-in-Memory Systems

Organizer: P. Narayanan, IBM Research - Almaden

Workshop 6: Recent Advances in CMOS Cellular and Molecular Biosensors

Organizer: H. Wang, ETH Zurich

Workshop 7: What is Possible with Open Chip Design? The Journey so Far.

Organizer: M. Saligane, Brown Univ.

Sunday, June 8, 19:00-21:30

Workshop 8: Manufacturing Advanced VLSI Systems Using Virtualization and Machine Learning

Organizers: J. Lapidas, Lam Research; J. Ervin, Lam Research

Workshop 9: Hybrid Bonding - Breaking Boundaries in Advanced Packaging and Heterogeneous Integration in the Era of Chiplets and AI

Organizer: H. K. Niazi, Intel Corp.

Workshop 10: Advancing Neuromorphic Technology Research and Commercialization: From Sensors to Edge to Cloud

Organizer: X. Iturbe, IKERLAN Research Centre

Workshop 11: Revolutionizing Electronics with GaN: Opportunities and Challenges

Organizer: N. Jain, GlobalFoundries Inc.

Workshop 12: Chip Tapeout Classes: Methodologies, Technologies and Outcomes

Organizer: B. Nikolic, Univ. of California

# **Satellite Workshops**

2025 Silicon Nanoelectronics Workshop

Sunday, June 8 - Monday, June 9

2025 Spintronics Workshop on LSI

Monday, June 9



# **Short Course 1**

# Key VLSI Technologies in the AI Era

Monday, June 9, 8:25-16:35

| Chairpersons: | S. Fujii, KIOXIA Corp.<br>A. Veloso, imec |
|---|---|
| 8:25 | Opening |
| 8:30 | CMOS Scaling Exploration: Technology Trends and System-Level Perspectives, L. Yang, TSMC |
| 9:20 | 2D Materials and Their Application Space - Recent Insights Gained -, J. Appenzeller, Purdue Univ. |
| 10:10 | Break |
| 10:25 | Heterogeneous System Partitioning, the 2.5D and 3D Integration Landscape and Roadmap, E. Beyne, imec |
| 11:25 | Etch and Material Innovations for Advanced Logic and Memory Technologies, T. Lill, Lam Research |
| 12:05 | Lunch |
| 12:55 | DRAM History and Challenges, Y. Matsumoto, Micron |
| 13:45 | Current Landscape and Future Outlook of Emerging Memory Technologies, S. G. Kim, SK hynix |
| 14:35 | Break |
| 14:50 | DTCO/STCO for Multi-Objective Optimization from Device to System, HO. Kim, Samsung |
| 15:40 | Heterogeneous Integration Technologies for 3D Integrated CMOS Image Sensors, Y. Kagawa, Sony |
| 16:30 | Closing |

# **Short Course 2**

# Circuits and Systems for AI and Computing

Monday, June 9, 8:25-17:25

| Organizers: | K. Nii, TSMC Design Technology Japan, Inc.<br>Y. Li, Western Digital Corp. |
|---|---|
| Chairpersons: | T. Nezuka, MIRISE Technologies Corp.<br>V. Chen, Carnegie Mellon Univ. |
| 8:25 | Opening |
| 8:30 | Hardware Accelerator Design for AI: Enabling Generative Models, L. Chang, IBM |
| 9:20 | Architecture Trends for AI Hardware Platforms, N. James, AMD |
| 10:10 | Break |
| 10:25 | Modular Chiplet Approaches for Scalable and Efficient Machine Learning, Z. Zhang, Univ. of Michigan |
| 11:25 | AI for EDA: Challenges and Opportunities, I. Markov, Synopsys |
| 12:05 | Lunch |
| 12:55 | Connectivity Technologies to Accelerate AI, T. C. Carusone, Univ. of Toronto/Alphawave Semi |
| 13:45 | 3D Optical Engine Design Challenges and Opportunities, F. Lee, TSMC |
| 14:35 | Break |
| 14:50 | HBM for AI Computing, J. Lee, SK hynix |
| 15:40 | Semiconductor Storage for Further Evolution of Generative AI, J. Deguchi, KIOXIA Corp. |
| 16:30 | Advancements in Power Architectures for AI Computing, KH. Chen, National Yang Ming Chiao Tung Univ. |



# Opening and Plenary Session [Shunju I+II+III]

Tuesday, June 10, 8:00-10:00

08:00-

**Opening Remarks** 

08:40-

**Plenary Session 1** 

#### PL1-1 - 08:40 (Plenary)

Driving Innovation in DRAM Technology-Towards a Sustainable Future, S. Y. Cha, SK hynix

Since the introduction of the 6F² buried-gate scheme in the early 2010s, DRAM technology has evolved on platforms that could be continuously scaled down to 10nm technology. Beyond 10nm, however, DRAM technology has reached an inflection point: it is difficult to build scalable platforms with existing cell schemes while meeting the high-performance demands of the AI era. In response to this inflection point, this presentation will review how cell schemes will change to ensure a scalable platform and describe how DRAM technology can innovate to deliver new value in the era of AI.

#### PL1-2 - 09:20 (Plenary)

Innovate VLSI for AI Growth, J. Y. Chen, NVIDIA

Built on VLSI through the remarkable Moore's law, which has now ended, the AI era needs VLSI more than ever. So, what's next? Innovation: innovating from materials, devices, and modules to systems. This speech presents the progress of VLSI over the past decade and highlights the most complex VLSI chip today. Innovation is easier said than done. What are the criteria and barriers for success, and what leadership is needed to cultivate innovation? The speaker's career has spanned the relationship between VLSI and AI: their similarity, synergy, and mutual reinforcement, which accelerate their growth. With AI taking over routine and complicated tasks, he answers the question of what young people should be doing. As AI becomes such a powerful tool, leaders and engineers must help foster ethics and morality for mankind.

#### **Circuits Session 1**

#### **CIM and Quantum-inspired Computing**

Tuesday, June 10, 10:30-12:35

Chairpersons: M. Yamaoka, Hitachi, Ltd.

J. Kulkarni, University of Texas, Austin

#### C1-1 - 10:30

A 3nm 125 TOPS/W-29 TFLOPS/W, 90 TOPS/mm²-17 TFLOPS/mm² SRAM-Based INT8 and FP16 Digital-CIM Compiler with Multi-Weight Update/Cycle, H. Mori\*, J.-M. Hung\*, W.-C. Zhao\*, K. Khare\*, B. Crafton\*\*, H. Ishii\*\*\*, C.-E. Lee\*, X. Peng\*\*, X. Sun\*\*, Y. Fujino\*\*\*, C.-J. Tsen\*, V. Joshi\*\*\*\*, C.-K. Chuang\*, T. Hashizume\*\*\*, C.-F. Lee\*, T.-L. Chou\*, K. Akarvardar\*\*, S. Adham\*\*\*\*, Y. Wang\*, H. Fujiwara\*, Y.-D. Chih\*, Y.-H. Chen\*, H.-J. Liao\* and T.-Y. J. Chang\*, \*TSMC, Taiwan, \*\*TSMC, USA, \*\*\*TSMC, Japan and \*\*\*\*TSMC, Canada

This paper presents a digital compute-in-memory (DCIM) compiler supporting INT8 and FP16 formats, offering configuration flexibility, high accuracy, and high area/power efficiency. Our 3nm test chip is fully validated and exhibits 124.6 TOPS/W at 0.5V and 90.2 TOPS/mm² at 1.1V for INT8 DCIM, and 28.6 TFLOPS/W at 0.5V and 17.4 TFLOPS/mm² at 1.1V for FP16 DCIM.

#### C1-2 - 10:55

NeuC-CIM: A 1.3pJ/SOP Neuromorphic Charge-Domain Compute-In-Memory Macro For Spiking Neural Network, H. Fu\*, H. Zheng\*, Y. Zhou\*, X. Wen\*\*, Y. Chen\*\*, H. Ren\*, X. Lin\*, Z. Zong\*, L. Wu\*\* and B. Cheng\*, \*The Hong Kong Univ. of Science and Technology (Guangzhou) and \*\*Beihang Univ., China

This work presents NeuC-CIM, an energy-efficient neuromorphic compute-in-memory macro. The proposed multi-bit charge-domain synapse and event-triggered spiking neuron enable efficient, DC-current-free operation. Moreover, a capacity extension module and a neuron cyclic calibration scheme are proposed to reduce the nonlinearity- and mismatch-induced accuracy loss in the analog synapse and neuron circuits. The effectiveness of NeuC-CIM is verified in silicon on various neuromorphic tasks, showing a state-of-the-art energy efficiency of 1.3pJ/SOP with comparable accuracy.



#### C1-3 - 11:20

LLM-CIM: A 28nm 126.7TOPS/W Input-LUT-Based Digital CIM Macro with Reconfigurable Matrix Multiplication and Nonlinear Operation Modes for LLMs, Y. Wang\*, Z. He\*, Z. Wu\*, R. Guo\*, L. Yan\*\*, H. Han\*, Y. Wang\*, S. Wei\*, Y. Hu\*, F. Tu\*\* and S. Yin\*, \*Tsinghua Univ. and \*\*The Hong Kong Univ. of Science and Technology, China

This paper presents LLM-CIM, a digital computing-in-memory (DCIM) macro tailored for large language model (LLM) acceleration. It has three key features: 1) a Maximum Reuse-Oriented Input-LUT (MRIL) CIM reduces compute logic area and power by 40.7% and 37.9%, respectively; 2) an Outlier Aggregation-based Reordering Unit (OARU) reduces CIM computation time by 1.68-1.72x; and 3) a Normalized-Domain Computation Converter (NDCC) improves resource utilization to 95.6-98.7%. LLM-CIM achieves 126.7TOPS/W energy efficiency on LLaMA2-7B, with up to 5.36x energy saving over SOTA CIMs.
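The general input-LUT idea can be illustrated with a LUT-based bit-serial dot product: for each small group of inputs, the sums of every input subset are precomputed once, and each weight bit-plane then selects a single table entry instead of performing per-bit additions. The following is a minimal Python sketch under stated assumptions (unsigned integer weights, a group size of 4, and hypothetical function names `build_input_lut` and `lut_dot`); it is not the MRIL design described in the paper.

```python
def build_input_lut(inputs):
    """Precompute the sum of every subset of a small group of inputs.

    Entry `idx` holds the sum of inputs[j] for every j whose bit is set in idx.
    """
    k = len(inputs)
    lut = [0] * (1 << k)
    for idx in range(1, 1 << k):
        low = idx & -idx                 # lowest set bit of idx
        j = low.bit_length() - 1         # position of that bit
        lut[idx] = lut[idx ^ low] + inputs[j]  # reuse the smaller subset's sum
    return lut


def lut_dot(inputs, weights, group=4, wbits=8):
    """Dot product of signed integer inputs with unsigned `wbits`-bit weights.

    Each group's subset-sum table is built once and reused for all weight
    bit-planes (hypothetical illustration, not the paper's circuit).
    """
    total = 0
    for g in range(0, len(inputs), group):
        xs = inputs[g:g + group]
        ws = weights[g:g + group]
        lut = build_input_lut(xs)
        for b in range(wbits):
            # Gather bit b of every weight in the group into one LUT index.
            idx = 0
            for j, w in enumerate(ws):
                idx |= ((w >> b) & 1) << j
            total += lut[idx] << b       # shift-and-add across bit-planes
    return total
```

In this sketch, larger groups amortize each table over more lookups but grow the table as 2^k entries, which is the reuse-versus-area tradeoff such designs have to balance.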

#### C1-4 - 11:45

An 8K-Spin Ising Machine IC with Reconfigurable Many-Body Spin Interactions and Limitless Multichip Extension, J. Kim and J.-Y. Sim, POSTECH, Korea

This work presents an Ising machine chip featuring 8K spins and up to 31 reconfigurable two- and three-body interactions per spin. The proposed architecture supports building larger systems through limitless one-dimensional extension, in which multiple chips are connected in a chain by a serial interface while each chip processes in parallel. The proposed Ising machine, implemented in 28nm CMOS, is verified in a system of four chips. Tested on 3-SAT problems with 32,760 variables, the system achieves the largest problem capacity at both the chip level and the system level.
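For readers less familiar with many-body Ising formulations, the sketch below evaluates an energy with one-, two-, and three-body terms over ±1 spins, together with the energy change caused by flipping a single spin, which is the quantity an annealing-style solver typically checks. It is a minimal Python sketch; the sign convention, the dictionary encoding of interactions, and the names `ising_energy` and `flip_delta` are illustrative assumptions, not the chip's architecture.

```python
from typing import Dict, List, Tuple


def ising_energy(spins: List[int], h: List[int],
                 J2: Dict[Tuple[int, int], int],
                 J3: Dict[Tuple[int, int, int], int]) -> int:
    """H(s) = -sum h_i s_i - sum J_ij s_i s_j - sum K_ijk s_i s_j s_k, with s_i in {-1, +1}."""
    e = -sum(hi * s for hi, s in zip(h, spins))
    e -= sum(c * spins[i] * spins[j] for (i, j), c in J2.items())
    e -= sum(c * spins[i] * spins[j] * spins[k] for (i, j, k), c in J3.items())
    return e


def flip_delta(spins: List[int], n: int, h: List[int],
               J2: Dict[Tuple[int, int], int],
               J3: Dict[Tuple[int, int, int], int]) -> int:
    """Energy change if spin n flips: only terms containing n change sign.

    Assumes the indices inside each interaction term are distinct.
    """
    field = h[n]  # local field acting on spin n
    for (i, j), c in J2.items():
        if n in (i, j):
            other = j if i == n else i
            field += c * spins[other]
    for idx, c in J3.items():
        if n in idx:
            a, b = [m for m in idx if m != n]  # the two partner spins
            field += c * spins[a] * spins[b]
    # Every term containing s_n is negated, so H changes by 2 * s_n * field.
    return 2 * spins[n] * field
```

A solver built on this would accept or reject each proposed flip based on `flip_delta`, avoiding a full energy recomputation per step.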

#### C1-5 - 12:10

m-Zephyr: A Digital In-Memory Ising Chip with 240 Spins Featuring Enhanced Connectivity Based on a Modified 3D Zephyr Topology, Y. Wu\*, J. Bae\*, C. Shim\* and B. Kim\*\*, \*Univ. of California, Santa Barbara, USA and \*\*KAIST, Korea

We present a digital in-memory Ising computing chip with a modified 3D Zephyr topology, implemented in 65nm CMOS technology. The design features cross-dimensional connectivity with 24 interactions per spin for efficient mapping of complex combinatorial optimization problems to the Ising hardware. With this enhanced spin connectivity, the proposed chip outperforms prior works in reconfigurability, allowing it to solve more difficult problems with complicated topologies and higher dimensions while minimizing hardware mapping overhead. The 24 × 7-bit-precision coefficients are stored in memory and used to compute spin interactions, minimizing energy consumption and computing latency while ensuring accuracy through fully digital, high-precision computing. Measured results show a >100× speedup over the baseline algorithm while finding better solutions with lower mean Ising Hamiltonian values.

#### **Circuits Session 2**

#### **RF/mm-wave Tx and Rx**

Tuesday, June 10, 10:30-12:35

Chairpersons: W. Deng, Tsinghua University

A. Shabra, MediaTek

#### C2-1 - 10:30

A Ka-Band 8-Stream Phased-Array Receiver with Time-Hopping Blocker Rejection for 6G Applications, M. Tang, Y. Zhang, D. Xu, Y. Liu, Z. Li, Y. Chen, M. Fan, Z. Ren, J. Pang, D. Xu, C. Liu, Y. Zhang, H. Sakai, K. Kunihiro, A. Shirane and K. Okada, Institute of Science Tokyo, Japan

This work proposes an area-efficient Ka-band non-uniform time-hopping (TH) phased array receiver that supports up to eight streams using only eight RF elements. By reusing RF signal paths for different streams through fast stream switching, the design significantly reduces chip area. Additionally, the proposed non-uniform TH technique mitigates blocker issues caused by switching harmonics through dithering. A prototype fabricated in 65nm CMOS demonstrates 4x4 MIMO signal reception in OTA measurements for both horizontal and vertical polarization, achieving a total data rate of 38.4 Gbps with a 42.9 dB improvement in blocker tolerance.

#### C2-2 - 10:55

An IEEE802.15.4ab/a/z Compatible IR-UWB 2TRX with Dual-Antenna Full-Duplex 1x3 SIMO Radar Sensing and Aliasing Suppressing Semi-Synchronous TX, A. N. Bhat, E. Allebes, E. Bechthum, Z. Xu, J. van Den Heuvel, P. van Zeijl, P. Mateman, S. van der Ven, M. Eskiyerli, S. Traferro, M. El Soussi, A. Farsaei, E. Tiurin, M. Zhou, S. Gamage, A. Sheikh, P. Boer, I. Petrov, J. Govers, H. Visser, M. Hijdra, S. P. Shantharam, I. Marques, G. K. Ramachandra, Y. Baykal, K. Ding, R. Li, A. Boora, P. Vis, M. Konijnenburg, A. Breeschoten, Y.-H. Liu and P. Zhang, imec, Netherlands

This work presents an 802.15.4ab/a/z-compatible IR-UWB 2TRX featuring a full-duplex-based radar and a semi-synchronous TX. The matching and isolation techniques proposed for the electrical-balance-duplexer (EBD)-based full-duplex RF front-end (FE) enable state-of-the-art (SoTA) short-distance radar performance with a 1.7x smaller form factor. In the TX, an asynchronous first-order-hold (FoH) switched-capacitor power amplifier (SCPA) with delay-locked-loop (DLL)-based calibration is proposed, which lowers the aliasing sidelobes by 5dB compared to the SoTA.



#### C2-3 - 11:20

A Triple-Band Transceiver for Formation Flying Satellite Communication with Dual Circular Polarized Wireless Power and LO Transfer, K. Yuasa, S. Date, S. Kato, M. Ide, T. Tomura, K. Okada and A. Shirane, Institute of Science Tokyo, Japan

In this paper, we propose a triple-band transceiver that integrates wireless power transmission (WPT) and satellite communications for low-earth-orbit (LEO) satellite formation flights. It uses 24 GHz for LO signaling and dual-polarization WPT, eliminating the internal PLL and solar cells and thus significantly reducing power consumption and mass. 65nm CMOS ICs flip-chip mounted in a 4 × 8 phased-array evaluation module demonstrated 3.8mW of generated power with dual polarization, a 2x improvement over single polarization. In satellite communication relay operation, simultaneous 19/29 GHz communications were achieved with 16APSK modulation and seamless 5 GHz IF conversion. These innovations resolve the energy bottleneck and enable relay operation, laying the foundation for networks in next-generation satellite communications.

#### C2-4 - 11:45

A Digital Envelope Tracking RF Power Amplifier Achieving 400MHz Channel Bandwidth and 91.9% Efficiency for Upper-Mid Band Extreme Massive MIMO 6G Communications, J.-H. Bae\*, S.-J. Youn\*, J.-Y. Myung\*, M.-J. Kim\*, J. Kim\*, J. Lee\*, D. Kang\*, N. Sharma\*\*, S. Dong\*\*, W.-S. Choi\*\* and J.-S. Paek\*, \*Pusan National Univ., Korea and \*\*Samsung Research America, USA

This paper presents a digital envelope tracking (DET) supply modulator for upper-mid-band extreme massive MIMO 6G communications. The designed series-switch DET IC consists of buck, SCVD, and level-selection switches. Cell-sliced switches and a power mode selector are applied to improve power back-off efficiency. The DET IC achieves a peak power efficiency of 91.9%. Compared to a PA with a fixed 5V supply, the DET PA saves more than 100mW of DC power for a 200MHz RF signal, and it achieves 2.81% EVM and -36.8dBc ACLR at an output power of 17dBm with a 400MHz NR signal.

#### C2-5 - 12:10

Sub-µW Battery- and Crystal-Free Tag featuring 802.11ba/b-Compliant Wake-up Receiver, Backscattered Transmitter and 3D Localization, M. Privitera\*,\*\*, Y. Ruiyuan\*, K. A. Ali\*, A. Ballo\*,\*\*, A. D. Grasso\*\* and M. B. Alioto\*, \*National Univ. of Singapore and \*\*Univ. of Catania, Italy

A sub-µW tag is introduced for battery-less operation, reusing a 2-tone incident wave for 1) RF harvesting, 2) IM2 extraction of a WiFi-compliant clock, 3) an 802.11ba wake-up receiver, 4) a WiFi 802.11b backscattered transmitter, and 5) 3D localization via signal strength. A smart-label demonstration for fulfillment centers shows cm-range accuracy in a 180nm process.

#### **Circuits Session 3**

#### **Energy Harvesting**

Tuesday, June 10, 10:30-12:10

Chairpersons: S. Nagai, Panasonic Industry Co., Ltd.

I. Lee, University of Pittsburgh

#### C3-1 - 10:30

A Fully Integrated SC Converter Hybridizing Dickson and Continuously-Scalable-Conversion-Ratio Topologies with a Wide Bipolar VCR range for Energy Harvesting, A. Guo, W. Peng, Y. Yang, X. Hu, D. Muratore and S. Du, Delft Univ. of Technology, Netherlands

This paper proposes a dual-polarity 4-rail continuously scalable conversion-ratio (CSCR) converter for thermoelectric energy harvesting (TEH). This is the first reported switched-capacitor converter achieving dual-polarity operation with an extended voltage conversion ratio (VCR) range, thanks to the novel 4-rail CSCR topology and the reconfigurable hybrid Dickson-CSCR architecture. The TEH interface system is fabricated with a 180-nm BCD process, achieving above 75% efficiency for VCR ranging from -6.48 to -2.55 and from 2.55 to 6.67 with a 1.2 V regulated output.

#### C3-2 - 10:55

A Reconfigurable Multi-Level AC-DC/DC-DC Ocean Energy Harvester IC Achieving 77.7% End-to-End Power Efficiency for Triboelectric Nanogenerators, W. Jung\*,\*\*, H.-M. Lee\* and H.-P. Le\*\*, \*Korea Univ., Korea and \*\*Univ. of California, San Diego, USA

The proposed integrated ocean energy harvester IC combines a multi-level bias-flip (BF) topology for improved BF efficiency with a high-voltage multi-level full-bridge active rectifier, providing reconfigurability between AC-DC rectification and DC-DC conversion with a single inductor while reducing power-stage losses. The proposed reconfigurable multi-level (RML) AC-DC/DC-DC energy harvester IC achieves a peak end-to-end (E2E) power efficiency of 77.7%, a wide $P_{EXT}$ (~2mW), and an outstanding FoM of 6.63x while supporting an input frequency range of 1-50Hz.

#### C3-3 - 11:20

A Globally Optimized 3-D MPPT System for Dual-Band RF Energy Harvesting with Collaborative Source Reconfiguration, X. Li\*, S. Du\*\*, X. Zeng\* and Z. Chen\*, \*Fudan Univ., China and \*\*Delft Univ. of Technology, Netherlands

This paper presents a globally optimized RF energy harvesting system leveraging the novel concepts of 3-D maximum power point tracking (MPPT) and collaborative source reconfiguration, achieving high MPPT accuracy and a wide input power range. The 3-D MPPT is realized by coordinating the energy sources, optimizing the rectifier output, and regulating the rectifier stages. By integrating a multi-level regulating DC-DC converter with a reconfigurable rectifier, the system enables synchronized, collaborative energy harvesting from dual-band RF sources. The fabricated chip demonstrates a peak end-to-end efficiency of 71%, a sensitivity of -24.1 dBm, and wide high-PCE input power ranges of 15.5 dB and 20.5 dB at 433 MHz and 900 MHz, respectively.
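As background for the MPPT discussion, the simplest single-variable tracker is perturb-and-observe (P&O): nudge a control variable, keep moving while harvested power rises, and reverse when it falls. The Python sketch below is that generic textbook loop, swapped in purely for illustration; the paper's 3-D MPPT coordinates sources, rectifier output, and rectifier stages, none of which this sketch models. The `measure_power` callback and the parabolic test curve are hypothetical.

```python
from typing import Callable


def perturb_and_observe(measure_power: Callable[[float], float],
                        x_start: float, step: float, iterations: int) -> float:
    """Generic P&O hill climbing on one control variable x (e.g. a load setting)."""
    x = x_start
    p_prev = measure_power(x)
    direction = 1.0
    for _ in range(iterations):
        x_new = x + direction * step
        p_new = measure_power(x_new)
        if p_new < p_prev:
            direction = -direction  # power fell: reverse the next perturbation
        x, p_prev = x_new, p_new
    return x  # ends in a small oscillation around the maximum power point


# Hypothetical power curve with its maximum at x = 1.0.
x_final = perturb_and_observe(lambda x: x * (2.0 - x), x_start=0.2, step=0.05, iterations=100)
```

The step size trades tracking speed against the steady-state oscillation around the optimum, one reason more elaborate trackers coordinate several variables instead.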



#### C3-4 - 11:45

**2,771% Power Improvement Triple-source Ground-Symmetric Pile-Up Resonant Energy Harvester,** J.-m. Cho\*, H.-J. Choi\*, H.-J. Park\*, J. Han\*, K.-S. Yun\*, J.-P. Im\*\* and S.-W. Hong\*, \*Sogang Univ. and \*\*ETRI (Electronics and Telecommunications Research Institute), Korea

This paper presents a triple-source ground-symmetric pile-up resonant (TGPR) harvester that extracts energy from a piezoelectric transducer (PZT), a photovoltaic (PV) cell, and a thermoelectric generator (TEG). Although the circuit is limited to the transistor's maximum allowable voltage (Vmax), the ground-symmetric structure allows the voltage amplitude across the PZT to reach up to 2Vmax. As a result, the TGPR harvester generates 887 µW of power from the PZT, a 2,771% improvement in power extraction.

#### **Technology Session 1**

#### **Technology Highlights 1**

Tuesday, June 10, 10:30-12:35

Chairpersons: M. Kobayashi, The University of Tokyo

Y. Liang, NVIDIA

#### T1-1 - 10:30

Intel 18A Platform Technology Featuring RibbonFET (GAA) and PowerVia for Advanced High-Performance Computing, K. J. Fischer, B. Sell, R. Aggarwal, P. Amin, S. Anand, M. Asoro, C. Atay, C. Auth, M. Aykol, A. Badmaev, F. Abboud, B. Bains, K. Beckwitt, J. Bell, M. M. Bellah, D. Bhawe, J. Birdsall, W. Blanton, J. Bondi, A. Bowonder, J. Brooks, M. Buehler, S. Burgess, K. Byon, D. Casillas, S. Chakraborty, T.-t. Chan, S. Charue-bakker, A. Chatterjee, S. Chin, D. Collins, D. Crum, A. Danon, N. Das, R. Das, A. Dhar, R. Ehlert, P. Elfick, G. Freeman, K. Ganesan, T. Ghani, R. Ghostine, O. Golonzka, B. Greene, H. Greve, W. Grimm, J. Guerrero, B. Guha, Z. Guo, P. Gupta, W. Hafez, C. Han, D. Harris, M. Hattendorf, S. Havelia, J. Hicks, I.-c. Ho, M. Horsch, W. Hsu, L. Hu, S. Jaloviar, A. Joushaghani, V. Kamysbayev, E. Karl, S. Kelgeri, A. Kelleher, A. Kim, S. Kirby, S. Klopcic, J. Knudsen, S. Kobaku, M. Kobrinsky, H. Kothari, D. Lawrence, J. Leib, C.-H. Lin, C.-y. Lin, C. Lin, G. Malyavanatham, S. Mandal, S. Mandal, S. Mani, B. Marinkovic, A. Matsubayashi, E. Mckenna, R. Mehandru, J. Mehta, N. Mehta, S. Meister, V. Mishra and K. Mistry, Intel Corp., USA

An advanced Intel 18A technology featuring RibbonFET and PowerVia provides over 30% density scaling and a full node of performance improvement compared to Intel 3. Intel 18A achieves 18%/25% higher performance at iso-power (at 0.75V/1.1V) over Intel 3 through an industry-first combination and optimization of advanced interconnects, a gate-all-around (GAA) transistor architecture, backside power, and design co-optimization. Intel 18A offers high-performance (HP) and high-density (HD) libraries with full-featured technology design capabilities and enhanced design ease of use.

#### T1-2 - 10:55

A Back-illuminated 10 μm-pitch SPAD Depth Sensor with 42.5% PDE at 940 nm using an Optimized Doping Design, S. Shimada\*, Y. Otake\*, J. Suzuki\*, A. Magori\*, K. Kurata\*, T. Matsui\*, R. Tsuchida\*, M. Okazaki\*, K. Yokochi\*, T. Iwase\*, H. Takase\*, F. Koga\*, J. Ogi\*, H. Maeda\*\*, K. Moriyama\*\*, H. Honda\*\*, K. Fujisawa\*, T. Miura\*, H. Koketsu\* and T. Wakano\*, \*Sony Semiconductor Solutions Corp. and \*\*Sony Semiconductor Manufacturing Corp., Japan

We present a novel single-photon avalanche diode (SPAD) pixel array designed for accurate distance measurement with high photon detection efficiency (PDE). This study was conducted with a back-illuminated, 10 μm-pitch SPAD depth sensor on a 300 mm CMOS platform. To enhance PDE, we optimized the multiplication region design to increase the Geiger-mode triggering probability, achieving 92.9% at 3 V excess bias. Additionally, by introducing a new doping design that enables more efficient charge collection, we achieved a charge collection efficiency of 98.8%. Consequently, we achieve the world's highest PDE for Si SPAD technology: 42.5% at 940 nm.

#### T1-3 - 11:20

Demonstration of Tungsten-Doped Indium Oxide MOSFETs with 3 Angstrom EOT, Improved Stability and High On-Current, H. Park\*, S. G. Kirtania\*, E. Sarkar\*, D. Chakraborty\*, C. Zhang\*, H. J. Lee\*, J. Shin\*, S. Yu\*, A. Khan\*, H. Kim\*\*, C. Im\*\*, M. J. Hong\*\*, D. Ha\*\* and S. Datta\*, \*Georgia Institute of Technology, USA and \*\*Samsung Electronics Co., Ltd., Korea

This study demonstrates improved performance and reliability in back-end-of-line (BEOL) compatible amorphous oxide semiconductor (AOS) MOSFETs. Employing an HfO<sub>2</sub>-ZrO<sub>2</sub>-HfO<sub>2</sub> (HZH) laminated gate dielectric, we achieved a high on-state current ($I_{ON}$) of 244 μA/μm for devices with a 50 nm channel length, operating at $V_{OV}$ = 1.5V and $V_{DS}$ = 1V. Crucially, these devices exhibited exceptional bias stress stability, with a threshold voltage shift of only 10 mV under a high stress of $V_{stress}$ 1V. The HZH gate stack enhanced charge confinement, minimized the subgap density of states, and suppressed both positive and negative bias instabilities. This work establishes a new milestone for EOT scaling in BEOL-compatible AOS MOSFETs, paving the way for improved charge transport, enhanced stability, and the development of reliable 3D-integrated circuits.

#### T1-4 - 11:45

Performance Step-Up in PMOS with Monolayer WSe<sub>2</sub> Channel, A.-S. Chou\*, Y.-T. Lin\*,\*\*, M.-Y. Li\*, P.-S. Mao\*,\*\*\*, C.-H. Hsu\*,\*\*, S.-A. Chou\*, Y.-W. Hsu\*\*, S. Ku\*, G. Arutchelvan\*, W.-S. Yun\*, C.-F. Hsu\*, W.-C. Wu\*, T. Y. T. Hung\*, M.-Z. Li\*, D. Heh\*, P.-H. Ho\*, Y.-Z. Chiu\*\*, G. Vellianitis\*, M. van Dal\*, W.-H. Chang\*\*\*, C.-I. Wu\*\*, C.-C. Cheng\*, I. Radu\* and M. Cao\*, \*TSMC, \*\*National Taiwan Univ. and \*\*\*National Yang Ming Chiao Tung Univ., Taiwan

This study makes monolayer (1L) tungsten diselenide (WSe<sub>2</sub>) more competitive as a scaled p-channel candidate. We address performance constraints related to the channel, dielectric, and contact while reducing device-to-device variability. Back-gated PMOS with a 1L-WSe<sub>2</sub> channel and an effective oxide thickness (EOT) of 1.2 nm reaches an ON-current of 400 mA/mm at $V_{DS}$ = -1 V, with a subthreshold swing of 72 mV/decade and an ON/OFF ratio of 7 orders of magnitude; it is nearly hysteresis-free and operates in enhancement mode. Furthermore, passivation providing air stability is implemented without performance degradation. This is achieved without using materials that are incompatible with wet processing. We identify contact resistance, and specifically the need for gate-voltage-independent contact resistance, as the main next step for improving device performance.



#### T1-5 - 12:10

Highly Scalable and Reliable Cell Characteristics for 1Tb 9th Generation 3D-NAND Flash Memory, R. Lee, S. Park, Y. Seo, J. Kim, J. Jung, J. Yeo, Y. Ahn, H. Joon Kim, Y. Kim, J. Lee, D. Lee, J. Park, W. Kim, B. Lee, K. Noh, S. Hong and S. H. Hur, Samsung Electronics Co., Ltd., Korea

A pioneering 9th-generation 3D-NAND flash memory has been successfully developed, offering highly reliable cell characteristics and the smallest unit cell volume in the industry. To achieve higher bit density, the cell area has been aggressively scaled in both the vertical and lateral dimensions, resulting in a 50% improvement in bit density compared to the 8th generation. Cell characteristic degradation and reliability concerns due to the extreme scaling were effectively mitigated using state-of-the-art ONO material engineering and process innovations. This work establishes a new benchmark for scalable and reliable 3D-NAND flash memory solutions.

#### **Circuits Session 4**

#### **SRAM and Mask ROM**

Tuesday, June 10, 14:00-15:40

Chairpersons: C. Shiah, Etron Technology, Inc.

M. Sinangil, Nvidia

#### C4-1 - 14:00

A 3nm FinFET 563kbit 35.5Mbit/mm² Dual-Rail SRAM with 3.89pJ/access High Energy Efficient and 27.5µW/Mbit 1-cycle Latency Low-Leakage Mode, Y. Aoyagi\*, M. Yabuuchi\*, T. Tanaka\*, K. Mizutani\*, M. Hamada\*, Y. Ishii\*, K. Nii\*, M. Zhou\*\*, J. C. Liu\*\*, C.-Y. Huang\*\*, K.-H. Peng\*\*, Y.-H. Hsu\*\*, I. Wang\*\*, H.-C. Cheng\*\*, H.-J. Liao\*\* and T.-Y. J. Chang\*\*, \*TSMC Design Technology Japan Inc., Japan and \*\*TSMC, Taiwan

This paper presents a single-port high-density (HD) 6T SRAM for mobile applications. The macro uses an eXtended Dual Rail (XDR) architecture and two proposed techniques. The Delaying-Write-WL (DeWL) technique resolves contention between the cell and the write driver (WDRV). The 1-cycle latency low-leakage mode (1-CLM) turns off all BL pre-chargers during no-operation (NOP) periods. With these techniques, a 3-nm FinFET test chip achieves a 17% reduction in active energy and a 10% reduction in standby leakage.

#### C4-2 - 14:25

A 6+ GHz 128KB Multi-Port L1 Cache using Ground Rule Clean 10T Bitcells in 5nm Technology, R. Joshi\*, J. Davis\*, G. Fredeman\*, B. Yavoich\*, U. Srinivasan\*, R. Hayes\*, A. Pelella\*, Z. Chen\*, P. Bunce\*, D. Lee\*, I. Cervantes\*, G. Tverskoy\*, D. Leu\*, J. Pille\*\*, B. Huott\* and S. Kim\*, \*IBM Corp., USA and \*\*IBM Corp., Germany

A 6+ GHz multi-port 10T Ground Rule Clean (GRC) compact cache is implemented in the recently announced IBM Telum II processor [1]. It features a multi-port design (2 read and 1 write) with a fine-grained banked architecture that minimizes read and write collisions. The design is functional across various corner conditions without read or write assist circuits.

#### C4-3 - 14:50

A 13.8% Speed-Enhanced 1T Mask ROM by Algorithmically Signed Program Data on 3-nm Fin-FET Logic CMOS, R. Watanabe\*, H. Kojima\*, K. Uno\*, H. Otsubo\*, T. Chu\*\*, H.-C. Cheng\*\*, K. Nii\*, Y. Wang\*\* and T.-Y. J. Chang\*\*, \*TSMC Design Technology Japan, Japan and \*\*TSMC, Taiwan

We propose an algorithmically signed mask ROM, in which a sign column is attached to the original ROM bitcell array. By selecting appropriate sign bits for the given data, the ratio of on-bits programmed in each column can be constrained within a narrow range, which greatly mitigates the tradeoff between read access time and read margin. SPICE simulations and silicon measurements of a 1.8Mbit signed ROM on 3nm FinFET logic CMOS demonstrate read access time improvements of 12.7% and 13.8%, respectively.
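To make the sign-column idea concrete, the sketch below greedily chooses one invert ("sign") bit per row so that each data column's running count of programmed 1s stays close to half the rows written so far; decoding XORs each stored row with its sign bit. It is a hypothetical Python illustration of the general data-inversion principle, with made-up names (`choose_signs`, `decode`); it is not the algorithm used in the paper, and it does not balance the sign column itself.

```python
from typing import List, Tuple

Bits = List[int]


def _imbalance(ones: List[int], bits: Bits, target: float) -> float:
    """Total distance of the per-column 1-counts from `target` if `bits` were stored."""
    return sum(abs(o + b - target) for o, b in zip(ones, bits))


def choose_signs(rows: List[Bits]) -> Tuple[Bits, List[Bits]]:
    """Greedily pick a sign bit per row to push every column toward 50% ones.

    Returns (signs, stored), where stored[r] = rows[r] XOR signs[r].
    """
    ones = [0] * len(rows[0])          # running count of 1s per data column
    signs: Bits = []
    stored: List[Bits] = []
    for r, row in enumerate(rows, start=1):
        target = r / 2                 # ideal count of 1s after r rows
        inverted = [1 - b for b in row]
        invert = _imbalance(ones, inverted, target) < _imbalance(ones, row, target)
        bits = inverted if invert else row
        ones = [o + b for o, b in zip(ones, bits)]
        signs.append(1 if invert else 0)
        stored.append(bits)
    return signs, stored


def decode(signs: Bits, stored: List[Bits]) -> List[Bits]:
    """Recover the original data by XOR-ing each stored row with its sign bit."""
    return [[b ^ s for b in bits] for s, bits in zip(signs, stored)]
```

Because every inversion is recorded in the sign column, the original data remain recoverable, while a worst case such as an all-ones array ends up stored with alternating rows inverted.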

#### C4-4 - 15:15

A 37.8Mb/mm² SRAM in Intel 18A Technology Featuring a Resistive Supply-Line Write Scheme and Write-Assist with Parallel Boost Injection, H. Pilo\*, M. Arora\*, A. Garg\*, J. Jia\*, C.-A. Lai\*\*, Z. Lo\*\*, K. Chawla\*\*\* and J. Kumar\*\*\*, \*Synopsys, Inc., USA, \*\*Synopsys, Inc., Taiwan and \*\*\*Synopsys, Inc., India

A high-density, SRAM-based Register File (RF) has been demonstrated in Intel 18A Technology featuring RibbonFET GAA transistors and a Back Side Power Delivery Network. The RF is optimized for high density and array efficiency and achieves a density of 37.8Mb/mm², the highest density reported to date for an RF. It is implemented with a conventional Bitline, two-bank memory architecture and it can be used as the SRAM workhorse for most SoC applications with maximum bit-count of 262Kb.



## **Circuits Session 5**

#### **Application-Specific ADCs**

Tuesday, June 10, 14:00-15:40

Chairpersons: H. Ishii, Toshiba Electronic Devices & Storage Corporation

T. Caldwell, Alphawave Semi

#### C5-1 - 14:00

A Calibration-Free 175MHz Bandwidth 60dB SNDR 6<sup>th</sup>-order Bandpass Cascaded Time-Interleaved Noise-Shaping SAR ADC with Optimum Zero Placement, Z. Zhao\*, H.-W. Chen\*, S. Song\*, T. Kang\*\* and M. Flynn\*, \*Univ. of Michigan, USA and \*\*Sungkyunkwan Univ., Korea

A new Cascaded Time-Interleaved (CaTI) Noise-Shaping (NS) SAR ADC combines high-order cascaded NS and new TI bandwidth enhancement techniques to break the bandwidth limitation of NS SAR. The prototype provides 60dB SNDR and 77.5dB SFDR over a 175MHz bandwidth without any calibration, occupies only 0.02mm², and consumes 3.38mW at 1.4GS/s. With an FoM<sub>s</sub> of 167dB, CaTI NS SAR is a robust and more efficient alternative to conventional TI NS SAR. The prototype has the highest bandwidth of any NS SAR.
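The FoM<sub>s</sub> quoted here is consistent with the standard Schreier definition, FoM = SNDR + 10·log10(BW/P), with BW in Hz and P in W. A minimal check:

```python
import math


def schreier_fom_db(sndr_db: float, bw_hz: float, power_w: float) -> float:
    """Schreier FoM = SNDR (or DR) in dB + 10*log10(bandwidth / power)."""
    return sndr_db + 10 * math.log10(bw_hz / power_w)


# C5-1: 60 dB SNDR over 175 MHz at 3.38 mW
print(f"{schreier_fom_db(60.0, 175e6, 3.38e-3):.1f} dB")   # 167.1 dB
```

Assuming BW = f<sub>s</sub>/2 for the Nyquist-rate designs, the same function gives about 185.7 dB for C5-3 (using its 76.5 dB dynamic range, 50 MHz, 0.599 mW) and about 176.7 dB for C8-1 (72.14 dB SNDR, 280 MHz, 9.76 mW), matching their stated values.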

#### C5-2 - 14:25

A 4GHz 2b 5th Order Continuous-Time ΔΣ Modulator with -100.1dBc THD and 122dBFS SFDR in 100MHz BW, S. Javvaji\*, L. Breems\*,\*\*, M. Bolatkale\*,\*\*, S. Bajoria\*\*, R. Rutten\*\* and K. Makinwa\*, \*Delft Univ. of Technology and \*\*NXP Semiconductors N.V., Netherlands

This work presents a continuous-time delta-sigma modulator (CTDSM) with a novel 5th-order noise transfer function (NTF) that increases in-band SQNR by ~9.5dB compared to conventional NTFs. This allows the sampling frequency to be reduced, enabling the use of a 2b SAR quantizer, which dissipates less power than a conventional 2b flash quantizer and does not require power-hungry digital offset calibration. Furthermore, its 1st integrator is based on a single energy-efficient inverter, whose limited DC gain is absorbed by the rest of the loop filter. Sampling at 4GHz, the resulting CTDSM achieves -100.1dBc THD and 74.5dB SNDR over a 100MHz BW while consuming 49.5mW. This corresponds to 5.6dB higher energy efficiency than CTDSMs with similar BW and linearity.

#### C5-3 - 14:50

A 76.5-dB Dynamic-Range 8-bit 100-MS/s Variable-Range SAR ADC, R. Takenaka, S. Kano and T. Iizuka, The Univ. of Tokyo, Japan

This work presents an 8-bit 100 MS/s SAR ADC with a 35 dB variable input range. The range tunability is realized by fully passive switched-capacitor circuits integrated with the CDAC: programmable capacitive dividers that narrow the full-scale range of the CDAC, and capacitor-stacking circuits that amplify the input signal without an active amplifier, are used in combination to achieve a wide tunable input range. Fabricated in 65nm CMOS, the proposed ADC consumes up to 0.599 mW at 100 MS/s and achieves 48.7 dB SNDR and 65.6 dB SFDR with a Nyquist input. SNDR remains above 40 dB for input amplitudes from -35 to 0 dBFS, which yields a 76.5 dB dynamic range and a Schreier FoM of 185.7 dB.

#### C5-4 - 15:15

A 4-Element Baseband Charge-Domain Beamformer Integrated into 9-bit SAR ADC Achieving 32dB Spatial Notch with 52.1mW, Z. Lin and B. Nikolic, Univ. of California, Berkeley, USA

This work presents a 4x4 fully passive, configurable charge-domain beamformer integrated into 9-bit SAR ADCs in the Intel 16 process. It operates at up to 500MHz and achieves over 33dB spatial notch with 52.1mW per RX element. Energy-efficient switched-capacitor phase shifters with 1-degree resolution and on-chip calibration are implemented using existing circuits in the SAR ADC, realizing beamforming with only 15% area and 5% power overhead.

## **Circuits Session 6**

#### **Imagers**

Tuesday, June 10, 14:00-15:40

Chairpersons: T. Takahashi, Sony Semiconductor Solutions Corporation

B. Rae, STMicroelectronics

#### C6-1 - 14:00

A 1.22 e-rms Temporal Random Noise, 110 dB High Dynamic Range, 2.988 µm Pixel-Pitch 3-Stacked Digital Pixel Sensor with On-Chip HDR Merger, M.-W. Seo, Y.-S. Choi, S. Lee, M. Ito, D. Bae, Y. Shim, G. Cho, S.-G. Koo, S.-J. Byun, H. Kwon, S. Kim, K. Kim, Y. Lee, B. Kim, J. Jeong, H. Sugihara, S.-H. Han, S. Park, Y. Kim, H. Shim, S.-S. Kim, J.-k. Lee, J. Go and J. Song, Samsung Electronics Co., Ltd., Korea

This paper presents a 3-megapixel (Mp), 3-stacked digital pixel sensor (DPS) with the world's smallest pixel pitch of 2.988 µm, a low temporal random noise (RN) of 1.22 e-rms, and a high dynamic range (HDR) of 110 dB in global-shutter operation mode for various applications. The low RN is achieved with a pixel-parallel analog-to-digital converter (P-ADC) architecture with a narrow noise bandwidth (BW), and the HDR with a split photodiode (PD) structure using signal chopping for LED flicker mitigation (LFM). In addition, a 3-stacked structure is adopted to reduce the form factor of the DPS and enhance its on-chip image signal processing capability.



#### C6-2 - 14:25

A 0.8µm 32Mpixel Always-On CMOS Image Sensor with Windmill-Pattern Edge Extraction and On-Chip DNN, M. Sato, S. Akebono, K. Yasuoka, E. Kato, M. Tsuruta, C. Takano, K. Ota, K. Haraguchi, M. Watanabe, G. Fujii, K. Yamanaka, K. Yasuda, S. Minami, K. Hanzawa, K. Matsuda, A. Kato and Y. Ueno, Sony Semiconductor Solutions Corp., Japan

This paper presents a CMOS image sensor (CIS) that integrates two operation modes: a high-resolution viewing mode with 0.8 µm 32 Mpixels and a low-power always-on object recognition mode consuming 2.67 mW at 10 fps. The CIS features a unique windmill-pattern analog edge extraction circuit that is resilient to illumination variations. An on-chip DNN processor was implemented alongside a compact algorithm requiring only 12 KB for coefficients and 48 KB of working memory. The design incorporates separate circuit areas for the high-speed viewing and low-power sensing modes, optimizing performance and energy efficiency in each mode.

#### C6-3 - 14:50

A Multispectral Vision Sensor with Embedded Convolutional Neural Network Using Programmable Fractional Weights and nMOS-Only PWM Pixels, H. Kim, J. Kim, E. Chen and V. Chen, Carnegie Mellon Univ., USA

This paper presents a multispectral vision sensor integrating a convolutional neural network (CNN) for color-based vision tasks. The sensor uses fractional convolution (Conv) weights with a resolution of 0.1, set precisely through weighted exposure control. It features 5b fully connected (FC) weights enabled by an analog multiply-and-accumulate (MAC) unit, optimizing both area and energy efficiency. The proposed nMOS-only PWM pixel improves the fill factor by 30.6% compared to prior works. A 258 × 129 sensor array in 180 nm CMOS consumes 137.5 µW at 20 fps per channel with an iFoM of 206.6 pJ/(pixel·frame), achieving 90.9% and 89.6% accuracy on diverse detection tasks.
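The abstract does not define iFoM. Assuming it is power divided by pixel count times frame rate, using the full 258 × 129 array at 20 fps (our reading), the quoted 206.6 pJ/(pixel·frame) is reproduced:

```python
def ifom_pj(power_w: float, rows: int, cols: int, fps: float) -> float:
    """Energy per pixel per frame in pJ: P / (pixels * frame rate)."""
    return power_w / (rows * cols * fps) * 1e12


print(f"{ifom_pj(137.5e-6, 258, 129, 20):.1f} pJ/(pixel*frame)")  # 206.6
```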

#### C6-4 - 15:15

A 12-Mpixel Automotive Image Sensor with 137-dB Single-Exposure Dynamic Range and 0.55-electron Read Noise by Oversampling-Based Noise Reduction, M.-S. Keel, K.-M. Kim, H. Seo, S.-Y. Park, M. Lee, D. Kim, J. Jo, J. Woo, Y. Jang, J. Heo, J. Lim, D. Yoo, S. Park, Y. Oh, Y. Jeong, B. Kim, K. Lee, J. Ko, H. Lee and J. Lee, Samsung Electronics Co., Ltd., Korea

A 12-Mpixel CMOS image sensor for automotive applications is presented, with a single-exposure dynamic range of 137 dB and a read noise of 0.55 e- at 85 °C junction temperature. A sub-pixel structure with a high-density in-pixel capacitor is adopted for dynamic range expansion. Optimal oversampling and high analog gain, combined with the proposed readout scheme, are employed to reduce read noise. The ADC bit resolutions of the readout modes are reconfigured to support noise reduction within a given readout time. The chip consumes only 676 mW at 30-fps full readout.

## **Technology Session 2**

#### **Oxide Semiconductors 1: Novel Applications and Structures**

Tuesday, June 10, 14:00-15:40

Chairpersons: R. Tsuchiya, Hitachi, Ltd.

C. Hinkle, Notre Dame

#### T2-1 - 14:00

Integration of 0.75V V<sub>DD</sub> Oxide-Semiconductor 1T1C Memory with Advanced Logic for An Ultra-Low-Power Low-Latency Cache Solution, K. H. Chiang, Y.-C. Ho, M.-Y. Chuang, C.-Y. Chang, Y.-F. Kao, H.-Y. Yang, C.-H. Huang, Y.-J. Chien, Y.-H. Wu, Y.-C. Liu, H. Noguchi, C.-J. Wu, W.-K. Lee, C.-H. Chou, P. J. Liao, Y.-C. Chiu, Y.-Y. Cheng, Y.-M. Lin, Y. Wang, J. Chang, C.-T. Lin and M. Cao, TSMC, Taiwan

We successfully demonstrated the monolithic integration of a BEOL memory with advanced logic. The memory array is completely embedded in the BEOL, featuring an oxide-semiconductor channel selector and a low-temperature-process capacitor. Yield and reliability were verified on a memory test chip operating at a V<sub>DD</sub> of 0.75 V, with active energy below 0.5 pJ/b and a random cycle time of less than 8ns, achieving retention exceeding 128ms and endurance surpassing 1e14 cycles at 85°C. This advanced-logic-compatible BEOL memory technology offers a customizable, ultra-low-power, low-latency cache solution with higher density than SRAM.

#### T2-2 - 14:25

A 2-Transistor-1-Modulator (2T1M) Electronic-Photonic Hybrid Memory Architecture for Deep Neural Network CIM and Very Large-Scale Transformers, S. Zhao\*,\*\*, Z. Xu\*,\*\*, J. Yang\*,\*\*, Y. Li\*,\*\*, E. Zamburg\*,\*\*, R. Nagarajan\*,\*\*\* and A. V.-Y. Thean\*,\*\*, \*National Univ. of Singapore, \*\*Singapore Next-Generation Hybrid μ-Electronics Center (SHINE) and \*\*\*Marvell Semiconductor, Inc., Singapore

Compute-in-memory (CIM) architectures promise to improve data-movement efficiency in data-abundant computing, especially for deep neural networks. However, their array scalability is inevitably limited by IR losses, with errors accumulating as wire resistance grows with array size. In this work, we propose a two-transistor-one-modulator electro-optic memory array with an optical bitline (BL) that circumvents the BL IR loss and capacitive loading issues. In each cell, dot products, performed by FeFET memories operated in the subthreshold region, are summed through energy-efficient phase modulation of an optical signal using an ultra-low-loss, compact lithium-niobate-on-insulator photonic modulator. The photonic waveguide BL read-out is achieved through pairs of shared Mach-Zehnder interferometers to maximize column layout efficiency. By eliminating IR loss on the BL, we enable array sizes of up to 3750kb and achieve up to 45% inference accuracy improvement on a large-scale ALBERT transformer model.



#### T2-3 - 14:50

BEOL-Compatible ITO FET with Ultra-Short Channel Length of 5 nm, Y. Wang\*, J. Xie\*, Z. Zhang\*, Z. Zhou\*, D. Zhang\*, C. Sun\*, K. Han\*,\*\*, Z. Zheng\*, W. Shi\*, Y. Kang\*, X. Chen\*, G. Zheng\*, R. Shao\* and X. Gong\*,\*\*\*, \*National Univ. of Singapore, Singapore, \*\*Huazhong Univ. of Science and Technology, China, \*\*\*Institute of Microelectronics, A\*STAR, Singapore

For the first time, a novel drain in-situ shadowing mask (DISM) technique is proposed for ultra-short channel (USC) transistor fabrication. This technique enables an ITO FET with a channel length ($L_{ch}$) of 5 nm, the shortest among all oxide semiconductor (OS) FETs, achieving a record-high maximum transconductance ($G_{m,max}$) of 1104 µS/µm among all ITO FETs and a high maximum drain current ($I_{D,max}$) of 2061 µA/µm. The absence of off-state tunneling in such a USC ITO FET is confirmed by both experimental and simulation results. Furthermore, the 5 nm $L_{ch}$ device exhibits an SS as low as 34.5 mV/decade, offering a glimpse of the full potential of cryogenic USC ITO FETs.

#### T2-4 - 15:15

A Generalizable Tri-Layer Design Framework for Enhancing OSFET Reliability, G. Liu\*, Q. Kong\*, J. Lu\*, X. Li\*, C. Sun\*, D. Zhang\*, X. Wang\*, G. Liang\*,\*\* and X. Gong\*, \*National Univ. of Singapore, Singapore and \*\*National Yang Ming Chiao Tung Univ., Taiwan

In this work, we propose a Tri-Layer structure design methodology to systematically enhance the reliability of oxide semiconductor (OS) FETs, applicable across all OSFET technologies. This approach addresses the longstanding reliability challenges of positive and negative bias temperature instability (P/NBTI) comprehensively and simultaneously. The Tri-Layer incorporates a top layer and a passivation layer, each with distinct roles: the top layer functions as a hydrogen (H) reservoir, mitigating H-related degradation and alleviating quantum confinement effects (QCE) through bandgap relaxation, while the passivation layer acts as an H barrier, preventing H diffusion from the ambient environment. The independent tunability of these layers allows material properties to be optimized, improving reliability without sacrificing electrical performance. To validate this methodology, we fabricated In-Ga-Zn-Sn-O (IGZTO) Tri-Layer FETs, achieving a record-low threshold voltage change per oxide electric field of 0.009 V·cm/MV at high temperature with a stress time of 700 s under NBTI conditions.

## **Technology / Circuits Joint Focus Session 1**

#### **JFS1: 3D Integration and Photonics**

Tuesday, June 10, 14:00-15:40

Chairpersons: H. Yamaguchi, Fujitsu Limited

R. McMullan, AMD

#### JFS1-1 - 14:00 (Invited)

Key Technologies and Performance Aspects for Electrical and Optical Interconnects, W. Kocon, Y. Bian, K. Giewont and T. Letavic, GlobalFoundries Inc., USA

High-Performance Computing (HPC) modules and their ever-increasing data bandwidth require the conversion of electrical signals to photons. We present the next generation of the electrical-only Occamy HPC system, extended to include optical interconnects that address bandwidth, manufacturing scalability, and power efficiency challenges in future data center systems. As datacenter HPC modules rapidly transition to electrical/optical interconnects, GlobalFoundries has developed a compatible electrical-to-optical interconnect advanced packaging solution with its partner ecosystem. In this paper, we provide an overview of advances in the integration and characterization of electro-optical modules and discuss trends in current and future combined electrical and optical interconnects. Key performance aspects and features include high-power, low-loss edge couplers, best-in-class high-speed micro-ring modulators, Mach-Zehnder modulators, and photodetectors. GlobalFoundries' next-generation Fotonix™ (45SPCLO) technology was created as a high-performance electro-optical interconnect platform, monolithically integrating photonics and RF-class CMOS devices on the same die together with integrated V-grooves for fiber attachment and Through Silicon Vias (TSVs) for high-speed electrical connections in advanced packaging.

#### JFS1-2 - 14:25 (Invited)

3D Interconnect Technology for Superconducting Quantum Devices, K. Kikuchi, Y. Araga, H. Nakagawa and N. Watanabe, AIST, Japan

Superconducting quantum computers are one of the most promising quantum computing approaches. In superconducting quantum computers, Josephson junctions (JJs) are key components of quantum bits (qubits). To achieve higher performance than a conventional computer (quantum advantage), a large number of qubits must be integrated. However, qubits are implemented on single-metal-layer chips rather than multi-layer metal chips because quantum circuits require extremely low signal loss. For these reasons, 3D integration is essential for large-scale quantum computers, combining low-loss single-layer qubit chips with large-scale integration. At the same time, superconducting quantum computers require special techniques that avoid destructive plasma and thermal damage to JJs and superconducting materials. We therefore introduce two 3D interconnect technologies for the large-scale integration of superconducting quantum devices: bump connections using superconducting materials, and direct bonding using superconducting contact pads.

#### JFS1-3 - 14:50

1.536TB/s/mm² Bandwidth Scalable Attention Accelerator with 22.5GOPS Throughput High Speed SoftMax for Quantized Transformers in Intel 3, P. Budhkar, Intel Corp., USA

This work presents a novel hardware accelerator, built with <3µm-pitch 3D Cu-Cu hybrid bonding interconnect (HBI) technology, that is specifically designed to efficiently execute the Multi-Head Attention (MHA) of encoder transformer models, with a focus on integer BERT. Transformers have become the dominant architecture for natural language processing (NLP) tasks. The accelerator addresses the accuracy loss of low-precision models by incorporating specialized hardware optimizations in the quantizer and SoftMax engine. Extremely wide and variable SRAM/logic bandwidth for GEMM parallelism, including dynamic transpose logic, together with a high-speed 2-pass SoftMax, delivers 22.5GOPS. The Intel 3 prototype, with a 3D footprint of 1.2mm², achieves 25668 Attention/s with no accuracy loss while running IBERT.
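The abstract does not detail the SoftMax engine. As a generic software illustration only (not Intel's hardware design), a two-pass softmax can fold max-finding and the exponent sum into one pass using a running rescale, then normalize in a second pass:

```python
import math
from typing import List


def softmax_two_pass(x: List[float]) -> List[float]:
    """Numerically stable softmax using two passes over the input."""
    # Pass 1: running max m and running sum d of exp(x_i - m); when a new
    # maximum appears, the partial sum is rescaled by exp(old_m - new_m).
    m, d = -math.inf, 0.0
    for v in x:
        m_new = max(m, v)
        d = d * math.exp(m - m_new) + math.exp(v - m_new)
        m = m_new
    # Pass 2: normalize every element with the final max and sum.
    return [math.exp(v - m) / d for v in x]


print(softmax_two_pass([1000.0, 1001.0]))  # no overflow despite large logits
```

Subtracting the running maximum keeps every exponent at or below zero, which is what makes such a scheme attractive for fixed-point or reduced-precision datapaths.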



#### JFS1-4 - 15:15

A 3D-Integrated 56 Gb/s Silicon Photonic Transceiver with 5nm CMOS Electronics for Optical Compute Interconnects, P. Mishra\*, G. Balamurugan\*, A. Aggarwal\*, D. Lazovsky\*, H. Yasotharan\*\*, H. Eachempatti\*, K. M. Park\*\*, M. Hossain\*\*\*\*, M. Staffaroni\*, P. Thakur\*, P. Winterbottom\*, P. Virk\*, P. K. Goyal\*\*, R. Nagulapalli\*\*\*, R. Yazdani\*, S. Forey\*\*\*, S. Sahni\*, S. Ramachandra\*, S. Vats\*, S. Yu\*, S. Jeon\*, S. Hariwan\* and W. Younis\*, \*Celestial AI, USA, \*\*Celestial AI, Canada, \*\*\*Celestial AI, UK, \*\*\*\*Carleton Univ., Canada

This work presents a hybrid-integrated silicon photonic (SiPh) transceiver suitable for realizing chiplet-based optical I/O in future AI/ML ASIC packages. The transceiver die stack consists of two ICs: a SiPh IC (PIC) with compact, thermally stable electro-absorption modulators (EAMs), and a 5 nm CMOS electronic IC (EIC) with 56 Gb/s NRZ SerDes and optical interface circuits. A stacked CMOS EAM driver delivers a 1.8 Vpp modulation voltage, achieving a 3.5 dB extinction ratio. Transmit equalization and inductive peaking help accommodate a sub-Nyquist RX analog front-end bandwidth, thereby reducing additive thermal noise from the RX electronics. The complete 56 Gb/s NRZ optical link, consisting of the fiber-attached 3D SiPh transceiver and an external packaged light engine, achieves BER<1e-9 with <-10.7 dBm RX OMA while dissipating 2.8 pJ/bit. The transceiver electronics occupy only 0.09 mm², delivering more than 600 Gb/s/mm² areal bandwidth density.

## **Technology Session 3**

#### **Modeling and Reliability**

Tuesday, June 10, 14:00-15:40

Chairpersons: M. Takenaka, The University of Tokyo

R. Jha, Univ. of Cincinnati

#### T3-1 - 14:00

Towards Understanding Cryogenic Reliability in FinFETs Under Hot Carrier Stress: New Findings on Ge Migration, and Impacts of Tail States Evolution, Z. Dong\*,\*\*, Z. Wang\*, H. Wang\*, X. Li\*\*, C. Luo\*\*, J. Huang\*\*, L. Li\*\*, Z. Huang\*\*, Z. Sun\*, Y.-Y. Liu\*\*\*, X. Wu\*\* and R. Wang\*, \*Peking Univ., \*\*East China Normal Univ. and \*\*\*Chinese Academy of Sciences, China

In this paper, we present a systematic study of cryogenic reliability in FinFETs under hot carrier stress down to 10K. Our findings reveal that the traditional view of hot carrier degradation (HCD)-induced $V_{th}$ shift is no longer sufficient to explain device behavior at such low temperatures. For the first time, an additional $\Delta V_{th}$ under HCD stress has been identified in PMOS, attributed to Ge migration from the SiGe S/D into the Si channel. This migration reduces band tail states, as uncovered through advanced physical characterization (TEM/EDS/EELS) and *ab initio* calculations. These results provide strong physical evidence linking Ge migration to $V_{th}$ shift at cryogenic temperatures, highlighting the need for cryogenic reliability models to incorporate atomic-scale Ge migration effects.

#### T3-2 - 14:25

High Resolution Well-Plasma Detection Device in 16nm CMOS FinFET Process, C.-H. Lin\*, W. Chang\*, W.-H. Lin\*, C.-Y. Chang\*\*, W.-J. Lin\*\*, J.-W. Lee\*\*, K.-J. Chen\*\*, M.-H. Song\*\*, Y.-D. Chih\*\*, E. Wang\*\*, J. Chang\*\*, Y.-C. King\* and C. J. Lin\*, \*National Tsing Hua Univ. and \*\*TSMC, Taiwan

This paper presents the first FinFET well-plasma charge detection device, developed in a 16nm FinFET CMOS logic process. The device precisely records the plasma-induced FinFET well charges from BEOL plasma processes and also detects the charge polarity and quantity in each plasma step. A novel differential pair structure is further implemented to detect charging damage from well-to-well interference in dynamic plasma processes. The detection devices help in understanding the transient well-plasma charging mechanism and in quantifying well-plasma damage to the FinFET gate dielectric from BEOL process steps. They provide an accurate solution for monitoring and optimizing well-plasma-induced charges across the whole 12-inch wafer in 16nm FinFET technologies and beyond.

#### T3-3 - 14:50

Unified Physics-Based CFET Thermal SPICE Considering BEOL, Substrate, and BSPDN Using Adiabatic Cones, T. Chou, H.-W. Shen, Y.-S. Lai, C.-W. Tseng, F.-Y. Chang, H.-C. Lin, C.-W. Yao and C. W. Liu, National Taiwan Univ., Taiwan

For the first time, we present a physics-based thermal SPICE model of the split-gate complementary FET (CFET). Heat flow forms cone- and column-shaped adiabatic surfaces from the transistor level to the BEOL and to the backside, which can be either a power delivery network or the substrate. The CFET temperature is 2.2 times higher than that of the non-stacked nanosheet. An n-on-p CFET lowers temperature by 2°C compared to a p-on-n CFET because SiGe:B S/D has a higher thermal resistance than Si:P S/D. The same physical model is used for each level and is represented by Cauer ladders.
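As a hypothetical illustration of what a thermal Cauer ladder computes (the R and C values below are arbitrary, not from the paper, and a SPICE solver would normally perform this integration), a forward-Euler transient of a 3-stage ladder driven by a constant heat input settles to the device temperature rise P × ΣR:

```python
from typing import List


def cauer_step_response(p_w: float, r: List[float], c: List[float],
                        dt: float, steps: int) -> List[float]:
    """Forward-Euler transient of an n-stage thermal Cauer ladder.

    Heat p_w (W) enters node 0. r[i] (K/W) connects node i to node i+1, and
    the last resistor connects the last node to ambient. c[i] (J/K) is the
    heat capacity from node i to ambient. Returns node 0's temperature rise
    above ambient (K) after each time step.
    """
    n = len(r)
    t = [0.0] * n
    history = []
    for _ in range(steps):
        # Heat flow through each resistor toward ambient (uses old temperatures).
        q = [(t[i] - (t[i + 1] if i + 1 < n else 0.0)) / r[i] for i in range(n)]
        for i in range(n):
            q_in = p_w if i == 0 else q[i - 1]
            t[i] += dt * (q_in - q[i]) / c[i]
        history.append(t[0])
    return history


# Hypothetical values: 10 mW into a ladder with 170 K/W total resistance.
rise = cauer_step_response(0.01, [100.0, 50.0, 20.0], [1e-6, 1e-5, 1e-4],
                           dt=5e-6, steps=20000)
print(f"steady-state rise: {rise[-1]:.3f} K")  # approaches 0.01 * 170 = 1.7 K
```

The time step is chosen well below the smallest RC product (100 µs here) so that the explicit integration stays stable.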

#### T3-4 - 15:15

Uncovering True DIBL in Oxide-Semiconductor FETs: Impact of Negative Bias Stress and its Significance in 2T0C DRAM Retention, Y. Kang\*, K. Han\*,\*\*, X. Chen\*, Z. Zheng\*, Q. Kong\*, G. Liu\*, L. Jiao\*, Z. Zhang\*, Z. Zhou\*, Y. Chen\*, J. Xie\*, Y. Xu\*, Y. Wang\*, D. Zhang\*, Y. Feng\* and X. Gong\*, \*National Univ. of Singapore, Singapore and \*\*Huazhong Univ. of Science and Technology, China

In this study, we demonstrate that the majority of drain-induced barrier lowering (DIBL) effects observed in BEOL-compatible oxide semiconductor FETs (OSFETs) may not represent the true DIBL phenomenon. For the first time, we reveal that these effects are instead caused by the negative threshold voltage shift induced by negative bias stress (NBS) during gate voltage sweeping. This insight helps explain the discrepancy between experimental and theoretical retention in OSFET-based DRAMs. Our findings are further substantiated by a significant improvement in retention for OSFET-based 2T0C DRAM at 77 K, where both DIBL and the NBS-induced $V_{\rm TH}$ shift are suppressed. Additionally, for a 2T0C DRAM using OSFETs with a channel length $L_{\rm CH}$ of 80 nm, negligible data loss was observed after 1000 seconds of data holding.



## **Circuits Session 7**

#### **High-Density Short-Reach Links**

Tuesday, June 10, 16:00-18:05

Chairpersons: W. Lo, ITRI

G. Gangasani, Marvell

#### C7-1 - 16:00

A 4x112-Gb/s, 3.51-pJ/bit Monolithically Integrated Silicon-Photonic Transceiver for High-Density Co-Packaged Optics, H. Liu\*,\*\*, N. Qi\*,\*\*, A. Li\*,\*\*, M. Zhu\*,\*\*, Y. Xiong\*, R. Wu\*,\*\*, Y. Xie\*,\*\*, Y. Qu\*, C. Zhong\*,\*\*, G. Li\*,\*\*, M. Li\*,\*\* and L. Liu\*,\*\*, \*Institute of Semiconductors, Chinese Academy of Sciences and \*\*Univ. of Chinese Academy of Sciences, China

A 4x112-Gb/s Si-photonic (SiPh) transceiver (O-TRX) for co-packaged optics (CPO) is presented in 45-nm SOI CMOS. By monolithically integrating electronic circuits and photonic devices on one chip, high bandwidth density and high power efficiency are achieved. A high-swing open-drain driver is co-designed with the traveling-wave Mach-Zehnder Modulator (MZM) to reach high speed at low power. A compact inverter-based trans-impedance amplifier (TIA) is co-optimized with the photodetector (PD). Multi-node inductive peaking is employed to extend bandwidth while maintaining a linear phase delay. O-TRX loopback is demonstrated at 112-Gb/s/lane under PAM-4 modulation. Measurement results show that the OTX achieves 224-Gb/s/mm bandwidth density with 2.36-pJ/bit power efficiency, while the ORX attains 299-Gb/s/mm density with 1.15-pJ/bit efficiency.
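As a quick consistency check, the 3.51-pJ/bit title figure equals the sum of the OTX and ORX efficiencies. Converting it to total power assumes, as our own reading, that all four lanes run at 112 Gb/s concurrently:

```python
def power_mw(energy_pj_per_bit: float, rate_gbps: float) -> float:
    """pJ/bit times Gb/s gives mW (1e-12 J * 1e9 /s = 1e-3 W)."""
    return energy_pj_per_bit * rate_gbps


otx_pj, orx_pj = 2.36, 1.15          # per-direction efficiencies from the abstract
trx_pj = otx_pj + orx_pj             # 3.51 pJ/bit, the title figure
aggregate_gbps = 4 * 112             # four lanes at 112 Gb/s
print(f"{trx_pj:.2f} pJ/bit -> {power_mw(trx_pj, aggregate_gbps):.0f} mW")  # 3.51 pJ/bit -> 1572 mW
```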

#### C7-2 - 16:25

A 0.52pJ/bit 0.448Tbps/mm UCIe Standard Package Die-to-Die Transceiver with Low-Latency TX Clock Alignment in 3nm FinFET, J. Vandersand\*, D. Turker Melek\*, K. Geary\*\*, P. BS\*\*\*, S. Jain\*\*\*, B. Bothra\*\*\*, P. Sarkar\*\*\*, P. Sarkar\*\*\*, P. Sarkar\*\*\*, R. NavinKumar\*\*\* and K. Chang\*\*\*\*, \*Cadence Design Systems, Inc., USA, \*\*Cadence Design Systems, Inc., Ireland, \*\*\*Cadence Design Systems, Inc., India and \*\*\*\*Marvell, USA

This work presents a UCIe SP-compliant die-to-die wireline transceiver developed in a 3nm CMOS process node. Each 16-lane module operates at 16Gbps per pin, yielding 0.448Tbps per mm of beachfront bandwidth while consuming only 0.52pJ per bit. A TX clock alignment scheme implements clock domain crossing to reduce the FDI-to-FDI latency to 4.0ns. With a 25mm organic substrate channel, an aggregate eye opening of 0.372UI at a BER of 1e-15 and 0.108UI at a BER of 1e-27 is measured at 16Gbps per pin. A background maintenance mode improves the aggregate eye width by up to 0.156UI across ±5% voltage and -40°C to 125°C temperature drift.

#### C7-3 - 16:50

A 77 fJ/bit 8 Gbps Low-Latency Self-Timed Die-to-Die Link for 2.5D and 3D Interconnect in 3nm, B. Zimmer, S. G. Tell and C. T. Gray, NVIDIA Corp., USA

This work presents a self-timed die-to-die link that serializes 4 data bits per pin for 2.5D or 3D interconnects using a standard adaptive digital clock and voltage supply. The link achieves 8 Gbps of per-pin bandwidth with a latency of 1 cycle, energy efficiency of 77 fJ/b, and bandwidth density of 44 Tbps/mm² at 0.7V. The link was implemented on a testchip as side-by-side transmitter and receiver macros connected with on-chip wires in a 3nm process.

#### C7-4 - 17:15

A 3.2pJ/b 0.068pJ/b/dB 25Gb/s NRZ Wireline Transceiver with 3-tap FFE and Random Forest Classification for Compensating 47dB Loss in 16nm FinFET, R. Javadi, X. Lin and T. Anand, Oregon State Univ., USA

An energy-efficient (0.068pJ/b/dB) NRZ wireline transceiver is presented that leverages a 3-tap FFE in the transmitter and feature extraction with classification in the receiver to compensate 30dB to 47dB channel loss with BER<10<sup>-11</sup>. The proposed random forest classifier learns long-reach channel characteristics, enabling the transceiver to achieve 3.2pJ/b at 25Gb/s while compensating 47dB loss.
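The loss-normalized efficiency is simply energy per bit divided by the compensated loss (3.2 pJ/b ÷ 47 dB ≈ 0.068 pJ/b/dB). The transmitter's 3-tap FFE is, in general terms, a short FIR filter over the symbol stream; the sketch below shows that generic structure with hypothetical tap weights (the paper's coefficients are not given):

```python
from typing import List, Sequence


def ffe_3tap(symbols: Sequence[float], pre: float, main: float,
             post: float) -> List[float]:
    """Generic 3-tap FFE: y[n] = pre*x[n+1] + main*x[n] + post*x[n-1].

    Symbols outside the sequence are treated as 0.
    """
    n = len(symbols)
    out = []
    for i in range(n):
        nxt = symbols[i + 1] if i + 1 < n else 0.0
        prev = symbols[i - 1] if i > 0 else 0.0
        out.append(pre * nxt + main * symbols[i] + post * prev)
    return out


# Hypothetical taps; |pre| + |main| + |post| = 1 keeps the peak swing bounded.
# An isolated 1 reveals the tap weights at the cursor positions.
print(ffe_3tap([0, 0, 1, 0, 0], pre=-0.1, main=0.7, post=-0.2))
```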

#### C7-5 - 17:40

A 12-Gb/s Single-Ended Transmitter with Echo-Canceling FFE for Multi-drop Bus in 28nm CMOS, K. Chung\*, Y. Kim\*\*, D. Shin\* and S. Cho\*, \*KAIST and \*\*Samsung Electronics Co., Ltd., Korea

This paper presents a 12-Gb/s single-ended transmitter with an echo-canceling feed-forward equalizer to eliminate reflections in a multi-drop bus. The transmitter achieves a bit-error rate below 1e-12 and maintains its performance across various multi-drop channels with stub lengths ranging from 8 cm to 12 cm. It achieves an energy efficiency of 1.52 pJ/b and occupies 0.01 mm² of active area in 28nm CMOS.



## **Circuits Session 8**

#### **High-Speed ADCs**

Tuesday, June 10, 16:00-17:40

Chairpersons: T. Nezuka, MIRISE Technologies Corporation

C. M. Lopez, imec

V. Chen, Carnegie Mellon University

#### C8-1 - 16:00

An 11.9-ENOB 560-MS/s Subranging ADC Employing Amplifier-Switching Architecture with Multi-Threshold Comparators, R. Takenaka and T. Iizuka, The Univ. of Tokyo, Japan

This paper proposes a 14-bit 560-MS/s ADC that employs an amplifier-switching subranging architecture. The proposed architecture achieves both high speed and high resolution without requiring accurate amplifiers. A multi-threshold comparator with time-latch stages is proposed to make 16-level decisions with a single input pair, reducing its input-referred noise at low power consumption. Fabricated in 28nm CMOS, this work achieves 72.14 dB SNDR at Nyquist input and consumes 9.76 mW at 560 MS/s, leading to a 176.7 dB Schreier FoM.

#### C8-2 - 16:25

A 46GS/s 7-bit Time-Interleaved Time-Domain ADC with Synthesizable Unit ADCs in 16nm FinFET, M. M. Ghahramani\*, S. K. Kaile\*, G. Park\*, Y. Zhu\*, I.-M. Yi\*\*, S. Hoyos\* and S. Palermo\*, \*Texas A&M Univ., USA and \*\*Gwangju Institute of Science and Technology, Korea

A 7-bit 16-way time-interleaved time-domain analog-to-digital converter (TDADC) is presented with unit ADCs synthesized using Verilog code and fully-automated standard digital place-and-route techniques. The proposed design introduces capacitor-based distortion compensation in the voltage-to-time converter and employs a novel 3-input time comparator with implicit phase interpolation in the fine time-to-digital converter to reduce power and area. Fabricated in 16nm FinFET, the proposed ADC operates at 46GS/s, achieves 78fJ/conv.-step at Nyquist, has 20.1GHz 3dB input bandwidth, and occupies 0.085mm² area.

#### C8-3 - 16:50

A Single-Channel 14b 3GS/s Pipelined ADC in 28nm Technology, B. Rui\*, X. Pan\*, Y. Lu\*, Y. Zhu\*, R. P. Martins\*,\*\* and C.-H. Chan\*, \*Univ. of Macau, China and \*\*Universidade de Lisboa, Portugal

This article presents a negative-delayed-input (NDI)-assisted pipelined ADC (Pipe), which considerably shrinks the critical path by enabling concurrent quantization in all stages. The single-channel (1-CH) 14b ADC runs at 3GS/s and achieves 62.3dB SNDR and 81.1dB SFDR at Nyquist input after INL calibration using an unevenly spaced (UE) piecewise-linear (PWL) function.

#### C8-4 - 17:15

A PVT-Robust 16GS/s 4×TI Time-Domain ADC with Vernier-based Multipath Flash TDC achieving 25.7fJ/c-s FoM in 28nm CMOS, H. Li\*,\*\*, K. Zhang\*,\*\*, L. Qi\*\*\*, S.-W. Sin\*, R. P. Martins\*,\*\*\*\* and M. Guo\*, \*Univ. of Macau, \*\*UM Hetao IC Research Institute, \*\*\*Shanghai Jiao Tong Univ., China and \*\*\*\*Univ. of Lisboa, Portugal

A 4-way 16GS/s two-step time-domain ADC with Vernier-based multipath Flash TDC architecture is reported. Benefiting from the proposed Vernier-based Multipath TLSB Generator (VMTG) and the inherently tracking delay cells, this design achieves PVT robustness for time step ratios and gain between the two stages. Fabricated in 28nm CMOS, this ADC achieves 35.3dB SNDR and 51.2dB SFDR with a Nyquist input at 16GS/s and it consumes 19.6mW power, leading to a 25.7fJ/conv.-step Walden FoM.
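The Walden FoM can be checked with the standard definition FoM<sub>W</sub> = P / (2<sup>ENOB</sup> · f<sub>s</sub>), where ENOB = (SNDR - 1.76)/6.02. Using the reported 19.6 mW, 35.3 dB SNDR, and 16 GS/s gives about 25.8 fJ/conv.-step, consistent with the quoted 25.7 fJ/conv.-step within rounding of the reported values:

```python
def enob(sndr_db: float) -> float:
    """Effective number of bits from SNDR: (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_fj(power_w: float, sndr_db: float, fs_hz: float) -> float:
    """Walden FoM in fJ/conv.-step: P / (2**ENOB * fs)."""
    return power_w / (2 ** enob(sndr_db) * fs_hz) * 1e15


print(f"{walden_fom_fj(19.6e-3, 35.3, 16e9):.1f} fJ/conv.-step")  # 25.8 fJ/conv.-step
```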

## **Circuits Session 9**

#### **Advanced Bio-Sensing Techniques**

Tuesday, June 10, 16:00-17:40

Chairpersons: J. Yoo, Seoul National University

A. Manickam, Cepheid

#### C9-1 - 16:00

Frequency-Division Multiplexed Magnetic Induction Based Wireless Wearable Sensor Network for Real-Time Motion Tracking, M. Rustom, A. Farhadian, T. D. Nguyen, M. Moghaddam and C. Sideris, Univ. of Southern California, USA

A network of wearable, wireless sensor nodes is proposed for 3D real-time motion tracking using frequency-division-multiplexed magnetic induction (MI) to extract spatial information based on measured node-to-node coupling. Unlike camera-based and accelerometer-based motion tracking approaches, the proposed MI-based system does not require line-of-sight or suffer from accumulating errors due to sensor drifts. A custom MI transceiver chip is designed and fabricated in 180nm CMOS, which drives a wearable coil, measures tones transmitted by other nodes concurrently in full-duplex, and incorporates a fully-custom DSP for data analysis, supporting a configurable capture rate of up to 977 frames per second. Practical experiments are demonstrated on a 3D-printed double-joint leg model and a human subject, achieving 0.35 degree mean absolute error at 122 frames per second and 4mW total power consumption, demonstrating fully wireless MI-based motion tracking for the first time.



#### C9-2 - 16:25

A 10.42µW/Ch. PPG Sensor with a Zoomed Sampling Based on Velocity of Blood Flow, K.-J. Choi, B. Kim and J.-Y. Sim, POSTECH, Korea

This work presents a 10.42µW/Ch. PPG sensor with a zoomed sampling based on blood flow velocity. The proposed sensor achieves a 53% reduction in overall power consumption while maintaining heart rate (HR) and blood oxygen saturation (SpO<sub>2</sub>) accuracy, with errors of 0.73bpm and 0.53%, respectively, showing robustness even under changes of 30bpm in HR and 6% in SpO<sub>2</sub>.

#### C9-3 - 16:50

DERMIS: A Flexible Fully-Integrated 600µm-Resolution Per-Taxel Slip-to-Spikes Tactile Sensor Readout on a-IGZO TFT for Large-Area High-Density Electronic Skins, M. D. Alea, M. A. Rosa, Y. Nowicki, M. Dandekar, K. Myny and G. Gielen, KU Leuven, Belgium

This work is the first demonstration of a flexible, fully integrated tactile sensor readout ASIC 1) with per-taxel event-based readout on a large-area unipolar amorphous indium gallium zinc oxide (a-IGZO) thin-film transistor (TFT) technology; 2) enabling the integration of differential capacitive sensors with a state-of-the-art 600µm pitch for high-spatial-resolution shear and normal force sensing; 3) allowing the direct acquisition of crucial grasp-state-dependent parameters (e.g., slip); while 4) achieving state-of-the-art performance compared to both prior TFT-based readouts (FOM<sub>W</sub>=4nJ/c-s; 0.6fF<sub>RMS</sub> resolution) and tactile sensors.

#### C9-4 - 17:15

A 28nm Online Spike Sorting Processor Based on Multi-Channel Template Matching, Z. Wen, R. Cong, H. Zhu, J. Zhang, C. Xie and K. Yang, Rice Univ., USA

This paper presents a multi-channel template-matching-based online spike sorting (OSS) processor for brain-computer interfaces (BCIs). Through software-hardware co-design that exploits principles from neuroscience, it sorts low-SNR units while maintaining low area, high efficiency and constant latency. It achieves 61.5% sorting accuracy with a min. unit SNR of 4, as well as stable sorting and visual decoding performance over 15 days in our visual response processing (VRP) experiment on a mouse.

# **Technology Session 4**

## **RRAM and MRAM**

Tuesday, June 10, 16:00-18:05

Chairpersons: D. Kil, SK hynix

J. Muller, GlobalFoundries

#### T4-1 - 16:00

High Density 7nm FinFET Dielectric RRAM in Embedded Memory Applications, P.-H. Shih\*, W.-H. Lin\*, Y.-D. Chih\*\*, E. Wang\*\*, J. Chang\*\*, Y.-C. King\* and C. J. Lin\*, \*National Tsing Hua Univ. and \*\*TSMC, Taiwan

A high-density **Fin**FET **D**ielectric RRAM (**FIND RRAM**) is developed for the first time in a fully compatible 7nm FinFET CMOS logic process without additional masks or process steps. The embedded 7nm FIND RRAM cell has a unit cell area of 0.063µm². It consists of a 7nm nFinFET (WL) in series with a tiny high-K gate-dielectric resistive switching node. The 7nm embedded HfO₂-based FIND RRAM shows excellent metrics in high-speed operation, a stable readout window, and reliability in endurance and retention. In addition, the NOR-type array arrangement provides a superior disturb window and overwriting immunity for various embedded IP configurations. Moreover, the highly scalable embedded FIND RRAM cell can be further adapted to 5nm and beyond to improve memory density, performance and operating voltage. This makes 7nm FinFET RRAM technology a promising FinFET memory solution for advanced embedded applications.

#### T4-2 - 16:25

First Implementation of Monolithic Integrated CIM with 1Mb Ultra-High-Density 8-Layer 3D VRRAM, Achieving High Computing Density (204.8GOPS/mm²) and FoM (2.13×10⁶ GOPS²/W/mm²) for Efficient Scientific Computing, C. Ma\*\*\*, W. Sun\*, L. Nie\*,\*\*, Y. Li\*,\*\*, Y. Wu\*\*\*, Z. Fan\*\*\*, W. Tang\*,\*\*, K. Zhang\*, Y. Liu\*, S. Zhao\*, P. Zhang\*, S. Zhang\*,\*\*, X. Xu\*, F. Zhang\* and M. Liu\*, \*Chinese Academy of Sciences, \*\*Univ. of Chinese Academy of Sciences and \*\*\*Beijing Institute of Technology, China

This work presents the first monolithic integration of an ultra-high-density 1Mb 8-layer 3D vertical RRAM (VRRAM)-based computing-in-memory (CIM) chip for efficient scientific computing. The 3D VRRAM performs in-situ matrix-vector multiplication and transmits data to the underlying computational circuits through the BEOL metal interconnects. It is co-integrated with the on-chip processor, enabling a fully monolithic SoC. Benefiting from reduced latency and energy consumption, the MI-3DCIM exhibits ultra-high computational density (204.8 GOPS/mm²), high energy efficiency (10.4 TOPS/W) and a superior figure of merit. This novel 3DCIM chip greatly enhances the capability for complex tasks.
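The figure of merit can be related to the two headline numbers. This sketch assumes the FoM is the product of computing density and energy efficiency, with efficiency expressed in GOPS/W. The abstract does not state this definition; it is inferred from the GOPS²/W/mm² unit.

```python
# Figures from the abstract.
density_gops_per_mm2 = 204.8    # computing density
efficiency_tops_per_w = 10.4    # energy efficiency

# Assumption: FoM = density x efficiency, with TOPS/W converted to GOPS/W.
fom = density_gops_per_mm2 * (efficiency_tops_per_w * 1e3)
print(f"FoM = {fom:.3g} GOPS^2/W/mm^2")
```

The product is about 2.13×10⁶ GOPS²/W/mm².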



#### T4-3 - 16:50

High Density, High Speed STT-MRAM N7 Macros: Material and DTCO Exploration, D. Narducci, F. G.-Redondo, L. Verschueren, R. Carpenter, X. Piao, J. Chatterjee, M. G. Monteiro, A. Palomino, N. Jossart, K. Wostyn, M. G. Bardon, G. Hellings, G.S. Kar and S. Rao, imec, Belgium

As SRAM density scaling stalls and process costs rise, spin-transfer torque (STT) MRAM offers a compelling embedded alternative. However, it is hampered by slow writes and limited bitcell gain owing to high write currents. To address this challenge, we present a novel synthetic antiferromagnet (SAF)-based magnetic tunnel junction (MTJ) capable of 1 ppm error-free operation at a 5 ns, 150 µA write on 45 nm eCD devices. We further demonstrate reliable switching at 30 nm CD with an I<sub>write</sub> of approximately 30 µA. DTCO of the N7 process and bitcell design enables 5 ns, 0.8 pJ/bit macro writes. With a 0.013 µm² bitcell, on par with SRAM area at the A14/A10 nodes, this approach bridges the density and performance gap in advanced technology nodes.

#### T4-4 - 17:15

Demonstration of Embedded MRAM with Sub-50 nm MTJ for RAM-Like and MCU Applications, Y.-J. Lee, C.-T. Cheng, J.-C. Huang, K.-F. Huang, B.-H. Lin, B. Lin, K. C. Tseng, B. Jinnai, M.-H. Tsai, M.-R. Jiang, Y. C. Ong, Y.-S. Chen, C.-Y. Tsai, K. C. Huang, C. C. Liu, M.-L. Lee, K.-F. Lin, Y.-C. Shih, P.-F. Yuh, T.-W. Chiang, W.-C. You, E. Chien, R. Wang, A. Y. J. Wang, Y. Wang and H. Chuang, TSMC, Taiwan

We demonstrate scaling of the MTJ critical dimension (CD) to sub-50 nm on test vehicles at the 16 nm technology node. Using fast-write MTJ films, we demonstrate a high endurance of 1E14 cycles with a high-speed 20 ns write pulse width and a sub-ppm bit error rate without ECC for RAM-like applications. With high-data-retention MTJ films, we achieve 20-year data retention above 150 degrees Celsius and 1E6-cycle endurance for MCU applications. This MTJ CD scaling is designed to be compatible with a small bitcell, making it suitable for the 5 nm technology node.

#### T4-5 - 17:40

96-Kb Voltage-Controlled-Magnetic-Anisotropy MRAM for In-situ Reservoir Computing with High Endurance (≥10¹²), Sub-ns Operation (0.3 ns) and Ultralow Power Consumption (40 fJ), J. Yu\*, Z. Zhu\*, A. Lee\*, X. Huang\*, Y. Zhang\*, Y. Li\*, K. Shen\*, Y. Gao\*, Y. Wei\*, H. Jiang\*, X. Zhang\*, M. Wang\*, D. Wu\*\*, X. Chen\*, Q. Liu\* and M. Liu\*, \*Fudan Univ. and \*\*Suzhou Inston Technology Co., Ltd, China

Voltage-controlled magnetic anisotropy (VCMA) magnetic random access memory (MRAM) offers ultrafast operation with high endurance, making it an ideal platform for edge computing applications such as reservoir computing (RC). We propose a 96Kb VCMA-MRAM-based RC system for temporal data processing, leveraging sub-nanosecond switching (0.3 ns), ultrahigh endurance (>10¹² cycles) and in-situ training. The RC system demonstrates high recognition accuracy (97.8%) on tasks such as voice recognition while achieving high energy efficiency (40 fJ per input). These results establish a Si-platform-friendly architecture for scalable and efficient RC implementations.

# **Technology / Circuits Joint Focus Session 2**

## **JFS2 DTCO and Design Enablement**

Tuesday, June 10, 16:00-18:05

Chairpersons: J. Cai, TSMC

Y. Li, Sandisk

#### JFS2-1 - 16:00 (Invited)

Analog Cells DTCO and Their Impact on Advanced Node CMOS Analog/Mixed-Signal Circuits, V. Chou, Y.-H. Chuang, Y.-T. Yang, C.-Y. Huang, S. Tai, Y.-C. Peng and K. Hsieh, TSMC, Taiwan

Analog Cells are selected size transistors with optimized cell heights, easy abuttable placement, and pre-drawn structured layouts. Analog Cell Design & Des

#### JFS2-2 - 16:25 (Invited)

SoC DTCO Covering both Design and Technology, J. Deng, D. Sharma, J. Yuan, H. Wang, X.-Y. Wang, B. Lim, C. Jung, J. Cheng, Y. Gao, H. Rasouli, A. Kauffman, Y. Chiang, R. Denduluri, G. Nallapati and C. Chidambaram, Qualcomm Technologies, Inc., USA

This paper explores how designs (standard cells, SRAM, analog) in advanced semiconductor technologies continue to deliver Power-Performance-Area (PPA) improvements through Design-Technology Co-Optimization (DTCO). As technology scaling nears its physical limits, PPA improvements have slowed due to constraints such as Contacted Poly Pitch (CPP) and slower SRAM scaling. Process innovations such as Self-Aligned LELE patterning and Bridge-Via methods mitigate these issues, enhancing frequency and reducing logic area. Process-aware design flows, including Clustered Cells and specific Poly routing rules, further improve PPA. SRAM scaling benefits from design innovations such as Pseudo Two Port (PTP) 6T cells. Analog designs leverage stacked short-channel devices for better process control and use MOSFETs in subthreshold mode as alternative devices for temperature sensor design. We also explore product-level KPI optimization through co-optimization of design parameters such as operating voltage and frequency plans. Through the DTCO-driven optimizations described, Snapdragon® products achieve significant area reduction, performance improvement and battery life increase.



#### JFS2-3 - 16:50 (Invited)

Backside Routing Enablement Considerations for Advanced Node GAA Devices, K. Subramani\*, R. Sengupta\*, M. Loden\*, A. Rahman\*, D. Dechene\*\* and A. Chu\*\*, \*Cadence Design Systems, USA and \*\*IBM Corporation, USA

Backside routing is an innovative feature offered by several foundries at the 3nm node and beyond. In this paper, we discuss considerations for digital design implementation that make the most efficient use of the backside layers to maximize PPA. These considerations involve choosing the right strategies for technology definition and for the implementation flow for power, clock and specific signals that can benefit from backside routing.

#### JFS2-4 - 17:15

Realistic and Scalable TCAD for Yield-Aware Full-Chip DTCO, M. C. Park\*, U. Chae\*, S. Kim\*, Y. Kim\*, S. Kwon\*, S. Lee\*, J.-H. Lee\*, B. Shin\*, H. Jang\*, N. Kim\*, S. Kim\*, H.-H. Park\*\*, J. Kim\*, J.-H. Kang\*, Y.-S. Kim\*, Y.-G. Kim\*, C. Jeong\*\*\*, J.-W. Jeon\* and D. Kim\*, \*Samsung Electronics Co., Ltd., Korea, \*\*Samsung Semiconductor, Inc., USA and \*\*\*Ulsan National Institute of Science and Technology, Korea

Recent advances in AI-based virtual metrology and physics-based modeling have accelerated the realization of digital twins, enabling comprehensive analysis. In this paper, we propose a methodology to evaluate chip-level defect probabilities and thermal bottlenecks, followed by a physical layout optimization strategy. Our framework integrates multi-environment virtual metrology with a neural-network-based 3D real-time TCAD model. This enables the analysis of billions of geometries and thermal hotspots for simultaneous yield and performance optimization. Validated on a modest HPC cluster, our full-chip simulation completes in under an hour. This study extends the role of TCAD from device-level analysis to chip-level design, linking performance and yield interactions and establishing a new milestone in the DTCO paradigm.

#### JFS2-5 - 17:40

Design-Aware Full-Chip Warpage Modeling for STCO: Bridging Reliability and Design for a New Era of Advanced Systems, H. Jang, B. Ma, S. Kim, J.-H. Lee, S. Myung, Y.-J. Lee, I. Huh, S. Kim, M. C. Park, N. Jeong, S. J. Kim, Y.-G. Kim and D. S. Kim, Samsung Electronics Co., Ltd., Korea

A novel design-aware warpage modeling methodology overcomes formidable computational barriers in full-chip layout simulation. By integrating representative volume element (RVE) analysis with AI-driven pattern clustering, this method enables efficient finite element method (FEM) simulations while capturing intricate BEOL design impacts. Validated by strong agreement with measured chip warpage across diverse temperature conditions, the model reveals how mechanical property distributions drive warpage behavior. Demonstrated in system-technology co-optimization (STCO) for high bandwidth memory (HBM), it supports micro-bump and power delivery network (PDN) designs, achieving up to 13% warpage reduction without sacrificing performance. This scalable solution provides critical insights into balancing mechanical reliability and performance, paving the way for advanced semiconductor systems.

# **Technology Session 5**

## **Imagers and Sensors**

Tuesday, June 10, 16:00-18:05

Chairpersons: Y. Kikuchi, Sony Semiconductor Solutions Corporation

F. Arnaud, STMicroelectronics

#### T5-1 - 16:00

A Monolithic Dual-Layer Pixel Design with BEOL IGZO Transistors featuring High Dual Conversion Gain Ratio and Scaled Pixel Size for Future Image Sensors, S. Zhan\*, K. Kaneko\*\*, H. Wang\*, L. Kang\*, Y. Li\*, W. Cui\*, S. Lu\*, W. Zhao\*, Y. Wang\*, Y. Yin\*, Y. Shao\*, Z. Lin\*, X. Cui\*, Y. Wu\* and J. Xu\*, \*Huawei Technologies, China and \*\*Huawei Technologies, Japan

A novel monolithic dual-layer pixel design based on BEOL InGaZnO (IGZO) transistors (Trs) is proposed. By moving the pixel Trs to the BEOL, the proposed design enables 2× pixel-size scaling down to 0.5 µm. It also provides a large dual conversion gain (DCG) ratio of ~10:1, owing to reduced parasitic capacitance and large Tr area. The device reliability and noise performance of IGZO Trs for pixel applications were studied comprehensively. Excellent positive bias temperature instability (PBTI) immunity is achieved for IGZO Trs with L<sub>g</sub> = 65 nm: the V<sub>TH</sub> shift stays within 30 mV after 1 ks of stress under gate fields of 2-6 MV/cm and temperatures of 25-95°C. Low 1/f noise, 10 times lower than previously reported short-channel IGZO Trs and comparable to Si Trs at the 45nm node, is also demonstrated for the scaled IGZO Trs. Our results open opportunities for future image sensors based on BEOL IGZO technology.

#### T5-2 - 16:25

Adaptive Metasurface Microlens Array for Ultra-Wide-Angle CMOS Image Sensors, J. Hong\*, S. Lee\*, Y. Yun\*, S. Kwon\*, I. Park\*, S. Park\*, J. Jo\*, S.-E. Mun\*\*, H. Park\*\*, S. Roh\*\*, S. Ahn\*\*, S. Yun\*\*, B. Lee\*, I.-S. Joe\*, S.-I. Kim\*, J. Go\* and J. Song\*, \*Samsung Electronics Co., Ltd. and \*\*Samsung Advanced Institute of Technology, Korea

Demand for higher-resolution CMOS image sensors for mobile cameras has accelerated pixel scaling into the sub-micron regime. The microlens (ML) array plays a crucial role in collecting photons and in phase-detection auto-focus. However, its performance degrades as the ML size becomes comparable to the visible wavelength. This worsens with ML aberration and with the increasing light incident angle in ultra-wide-angle image sensors. We propose a metasurface microlens (MML) that replaces the conventional spherical ML. It employs an adaptive design tailored to the chief ray angle (CRA), which varies across the entire image sensor. We implemented this MML in a 0.5 µm pixel prototype and demonstrated a 35% enhancement in auto-focus contrast ratio and a 49% improvement in sub-color-channel difference without any quantum efficiency degradation.



#### T5-3 - 16:50

Back-Illuminated U-Shape p-i-n SPAD With High PDE and Broad Spectral Response Fabricated in 110nm CIS Foundry Technology, J.-H. Kim, D. Eom, E. Park, D. Son, W.-Y. Choi and M.-J. Lee, Yonsei Univ., Korea

We present a novel U-shape p-i-n SPAD (U-SPAD) and demonstrate the device performance using a back-illuminated (BI) 110 nm CIS foundry technology. The proposed SPAD is designed for outstanding broad spectral response, achieving photon detection efficiency (PDE) of 23.4% at 940 nm, 73.8% at 700 nm, and 50% at 475 nm, dark count rate (DCR) of about 21.6 cps/µm², about 210 ps timing jitter, and 0.3% afterpulsing probability (APP) at 21.5 V breakdown voltage and 1.6 V excess voltage. While high-performance SPADs generally require an optimized custom process, the proposed U-SPAD achieves high performance in a standard foundry process without any process modification.

#### T5-4 - 17:15

Optimization of a 3.5 µm Pitch 3D-Stacked Back-Illuminated SPAD in 40 nm CIS Technology: Achieving 37% PDP at 940 nm, E. Park\*, H.-S. Park\*, H.-S. Choi\*, J.-H. Kim\*, D. Eom\*, E.-J. Kim\*, S. Yook\*, D.-H. Son\*, H. Lee\*\*, J. Jang\*\*, K.-D. Kim\*\*, J. Kim\*\*, W.-Y. Choi\* and M.-J. Lee\*, \*Yonsei Univ. and \*\*SK hynix Inc., Korea

We report on a 3.5 micrometer pitch 3D-stacked back-illuminated single-photon avalanche diode (SPAD) based on 40 nm CIS technology. The SPAD is optimized to achieve superior photon detection probability (PDP) through doping optimization, enabling it to reach a PDP of 37% at 940 nm.

#### T5-5 - 17:40

A 1.2Mp 2.8 um 4-tap Indirect Time-of-Flight Sensor with 42% Quantum Efficiency at 940 nm Wavelength with Enhanced Angular Response, S. Lee, M. Son, S. Seo, S. Cho, Y.-G. Jin, Y. Oh, S. Song, S.-H. Lee, D. Han, D. Kim, M. Kim, I.-P. Hwang, Y. Jeong, D. Kim, M.-S. Keel, J. Ko, H. Park and K. Lee, Samsung Electronics Co., Ltd., Korea

A 1.2Mp, 2.8µm 4-tap indirect time-of-flight sensor with 42% quantum efficiency (QE) at 940nm wavelength is presented. A SiO grid gives 6%p higher 940nm QE than a tungsten grid. We also use 3×3 multiple small lenses per unit pixel to enhance angular response. Dark noise can degrade depth accuracy, especially under dim light received from far objects. We therefore optimized the n-type photodiode implantation under a higher annealing temperature, which reduced dark noise, and achieved a demodulation contrast of 92% at 100MHz demodulation frequency with a 940nm VCSEL. Overall depth resolution is under 10mm over a 0.3-1.2m range without wiggling calibration.
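For readers less familiar with 4-tap indirect ToF, the depth estimate comes from the phase of the returned modulation. This is a minimal sketch of the textbook four-phase demodulation under an idealized sinusoidal correlation model. The tap model, amplitude and offset values are assumptions, and the sensor's actual processing pipeline is not described in the abstract.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def tap_samples(depth_m: float, f_mod_hz: float, amp=1.0, offset=2.0):
    """Idealized 4-tap correlation samples Q(theta) = offset + amp*cos(phi - theta)
    for theta = 0, 90, 180, 270 degrees (a simplified model, not measured data)."""
    phi = 4 * math.pi * f_mod_hz * depth_m / C
    return [offset + amp * math.cos(phi - k * math.pi / 2) for k in range(4)]


def depth_from_taps(q0, q90, q180, q270, f_mod_hz: float) -> float:
    """Differential pairs cancel the common offset; the phase then gives
    depth = c * phi / (4 * pi * f_mod), within the unambiguous range."""
    phi = math.atan2(q90 - q270, q0 - q180) % (2 * math.pi)
    return C * phi / (4 * math.pi * f_mod_hz)


f_mod = 100e6  # 100MHz demodulation frequency, as in the abstract
print(f"unambiguous range = {C / (2 * f_mod):.3f} m")
for d in (0.3, 0.75, 1.2):
    print(d, round(depth_from_taps(*tap_samples(d, f_mod), f_mod), 6))
```

At 100MHz the unambiguous range c/(2f) is about 1.5m, which covers the 0.3-1.2m span quoted above.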



# **Evening Panel Discussion 1**

## **What Can Semiconductor Industry Do for Greener Society?**

Tuesday, June 10, 20:00-21:30

Moderator: B. Haran, AMAT

This panel discussion explores the crucial role of the semiconductor industry in building a sustainable future. As the world grapples with climate change and environmental challenges, semiconductors are at the forefront of technological advancements driving positive change. Key discussion topics include energy-efficient technologies enabled by advanced VLSI and sustainable manufacturing. The panel, featuring six experts in this field, will be moderated by Bala Haran from AMAT.

Panelists: T. Kawauchi, TEL

K. Fujiwara, JSR

J. Wu, TSMC

C. Chidambaram, Qualcomm

B. En, AMD

R. Eda, SEMI

# **Evening Panel Discussion 2**

## **Practical Circuits & Technology Training: Academia vs. Industry – Where Do We Learn the Most?**

Tuesday, June 10, 20:00-21:30

Moderators: L. Kostas, Qualcomm

P. Yue, HKUST

Traditional education may not fully prepare IC designers for real-world challenges. However, a sound theoretical basis and a strong understanding of the basic principles of electrical engineering and systems design are acquired at an academic institution, which also imparts critical thinking skills. So, is learning best done on the job, or is there something that on-the-job training just doesn't teach you? Audience participation is encouraged. A scorecard will track academia vs. industry, with final scores revealed at the end of the event.

Panelists: K. Makinwa, TU Delft

R. T. Yazicigil, Boston University

M. Nagata, Kobe Univ.

C. M. Lopez, imec

D. Friedman, IBM

C.-M. Hung, MediaTek



# Award and Plenary Session 2 [Shunju I+II+III]

Wednesday, June 11, 8:00-10:00

08:00-Award Ceremony

08:40-Plenary Session 2

#### PL2-1 - 08:40 (Plenary)

Enabling Generative AI: Innovations and Challenges in Semiconductor Design Technologies, K.-H. L. Loh, MediaTek

In recent years, generative AI has driven a profound wave of transformation across all fields, from daily life to advanced scientific exploration. This transformation has triggered an unparalleled increase in demand for computing, connectivity/communication and memory/data storage across data centers, infrastructure and edge devices. The uptick has catalyzed booming industrial investment across a spectrum of "hard tech" based on advanced materials, packaging and semiconductor process technologies. Examples include hardware accelerators, wired and wireless connectivity/communication, and heterogeneous integration from chip to discrete levels, all supported by substantial research and development investment to embrace the AI era.

In this presentation, we will explore the frontier of cutting-edge technologies and tackle the challenges of developing high-performance computing and high-speed connectivity solutions while meeting demanding energy-efficiency targets. Additionally, we will address the mounting demands posed by power distribution and other engineering complexities. Our focus will highlight the pivotal role of innovation and investment in ensuring long-term sustainability in the decades ahead.

#### PL2-2 - 09:20 (Plenary)

The Evolution of Edge AI: Contextual Awareness and Generative Intelligence, A. Cremonesi, STMicroelectronics

We are witnessing a rapid transition from traditional AI to generative AI in the cloud. This is driving increased demands in the high-performance computing domain. However, to support this shift sustainably, edge AI technologies are advancing, including hardware accelerators (NPU) in microcontrollers and disruptive technologies like in-memory and neuromorphic computing. These developments, along with optimized large language models, enable more efficient AI and generative AI solutions for edge products. This keynote will explore the transformative potential of contextual awareness in AI for edge devices. Advanced sensing technologies and generative AI will revolutionize interactions with the world, allowing AI to adapt based on localized experiences and migrate seamlessly across devices. These innovations will drive the future of technology, making it more cognitive, generative, and interactive, ultimately leading to smarter, more connected and more sustainable solutions.

# **Circuits Session 10**

## **AI Accelerators 1**

Wednesday, June 11, 10:30-12:35

Chairpersons: Y. Sasagawa, Socionext Inc.

P. Raina, Stanford University

#### C10-1 - 10:30

CATTUS: A 4K-UHD 30fps Deep Image Processor for Channel Attention Equipped U-Net Acceleration in 16nm FinFET, Y.-T. Chen\*, Y.-T. Chiu\*, Y.-C. Lee\*, P.-Y. Lu\*, M.-H. Lee\*, J.-S. Chen\*, K.-H. Ho\*, H.-C. Liao\*\*, C.-C. Chen\*\* and C.-T. Huang\*, \*National Tsing Hua Univ. and \*\*Novatek Microelectronics Corp., Taiwan

This paper presents CATTUS, the first deep image processor to integrate U-Net acceleration with channel attention, enabling high-quality image deblurring on mobile devices. It achieves a throughput of 4K-UHD 30fps with 4.4 TOPS/W energy efficiency. The processor realizes image reconstruction on challenging tasks such as deblurring with three key features: 1) pyramid layer fusion (PLF) for efficient U-Net acceleration, 2) skip tunneling, which minimizes external memory access (EMA) by optimizing skip connections, and 3) lookahead channel attention (LCA), which reduces EMA for aggregating global information. Fabricated in 16nm FinFET technology, CATTUS delivers a 1.53x throughput improvement and 82% EMA reduction compared to a baseline implementation, demonstrating real-time imaging across multiple applications.



#### C10-2 - 10:55

NuVPU: A 4.8~9.6 mJ/frame Progressive NTT-based Unified Video Processor for Stable Video Streaming and Processing with Neural Video Codec, S. Kim, H. Kwon, J. Lee, Y. Moon, H. Lee, J. Ryu, Z. Kalzhan, S. Kim, W. Jo and H.-J. Yoo, KAIST, Korea

This paper presents NuVPU, a unified neural video processor that supports both streaming and post-processing with 4.8-to-9.6mJ/frame of energy efficiency. The Selective Convolution-mode Neural Engine (SCNE) adaptively selects either space or NTT convolution domain to increase throughput by 1.69-to-3.35 times. A Progressive NTT Unit (PNTU) lowers computation by 44.8% and memory overheads by 80% during domain changes. A Frequency-aware Compressor (FAC) and Adaptive Tile Scheduler (ATS) reduce the external memory access (EMA) of warping-based frame reuse by 81.3%. NuVPU in 28nm CMOS process achieves 36.9 TOPS/W on UVG benchmark which is 2.3-to-9.2 times higher than previous video processors.
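The NTT convolution domain rests on the convolution theorem over a finite field: transform both operands, multiply pointwise, and transform back, all in exact integer arithmetic. The sketch below is a generic radix-2 NTT. The modulus 998244353 (with primitive root 3), the power-of-two length and the 1-D layout are illustrative assumptions. They are not details of NuVPU's SCNE or PNTU, and handling of signed activations, 2-D kernels and progressive domain conversion is not modeled.

```python
MOD = 998_244_353  # 119 * 2**23 + 1, a common NTT-friendly prime (assumption)
G = 3              # primitive root modulo MOD


def ntt(a, invert=False):
    """Recursive radix-2 number-theoretic transform; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    root = pow(G, (MOD - 1) // n, MOD)      # primitive n-th root of unity
    if invert:
        root = pow(root, MOD - 2, MOD)      # modular inverse for the inverse transform
    even, odd = ntt(a[0::2], invert), ntt(a[1::2], invert)
    out = [0] * n
    w = 1
    for k in range(n // 2):                 # Cooley-Tukey butterflies
        t = w * odd[k] % MOD
        out[k] = (even[k] + t) % MOD
        out[k + n // 2] = (even[k] - t) % MOD
        w = w * root % MOD
    return out


def cyclic_convolve(x, h):
    """Cyclic convolution via NTT: transform, pointwise multiply, inverse transform."""
    n = len(x)
    assert len(h) == n and (n & (n - 1)) == 0, "equal power-of-two lengths required"
    y = ntt([a * b % MOD for a, b in zip(ntt(x), ntt(h))], invert=True)
    n_inv = pow(n, MOD - 2, MOD)            # scale by 1/n for the inverse transform
    return [v * n_inv % MOD for v in y]


def direct_cyclic(x, h):
    """Reference O(n^2) cyclic convolution in the original (spatial) domain."""
    n = len(x)
    return [sum(x[j] * h[(i - j) % n] for j in range(n)) % MOD for i in range(n)]


# Zero-padding to length 8 makes the cyclic result equal the linear convolution.
x = [1, 2, 3, 4, 0, 0, 0, 0]
h = [1, 1, 0, 0, 0, 0, 0, 0]
print(cyclic_convolve(x, h))  # [1, 3, 5, 7, 4, 0, 0, 0]
```

Results are exact as long as true convolution values stay below the modulus. This exactness is what makes a transform-domain mode attractive for fixed-point hardware.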

#### C10-3 - 11:20

Clo-HDnn: A 4.66 TFLOPS/W and 3.78 TOPS/W Continual On-Device Learning Accelerator with Energy-Efficient Hyperdimensional Computing via Progressive Search, C. E. Song\*, W. Xu\*, K. Fan\*, S. Jan\*, G. Hota\*, H. Yang\*, L. Liu\*\*, K. Akarvardar\*\*, M.-F. Chang\*\*, C. H. Diaz\*\*, G. Cauwenberghs\*, T. Rosing\* and M. Kang\*, \*Univ. of California, San Diego, USA and \*\*TSMC, Taiwan

Clo-HDnn is an on-device learning (ODL) accelerator designed for emerging continual learning (CL) tasks. Clo-HDnn integrates hyperdimensional computing (HDC) along with low-cost Kronecker HD Encoder and weight clustering feature extraction (WCFE) to optimize accuracy and efficiency. Clo-HDnn adopts gradient-free CL to efficiently update and store the learned knowledge in the form of class hypervectors. Its dual-mode operation enables bypassing costly feature extraction for simpler datasets, while progressive search reduces complexity by up to 61% by encoding and comparing only partial query hypervectors. Achieving 4.66 TFLOPS/W (FE) and 3.78 TOPS/W (classifier), Clo-HDnn delivers 7.77x and 4.85x higher energy efficiency compared to SOTA ODL accelerators.
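The progressive-search idea can be illustrated with a conservative early-exit rule. Partial query hypervectors are scored chunk by chunk, and scoring stops once the leading class can no longer be overtaken. This is a hypothetical sketch: the bipolar (+1/-1) encoding, dot-product similarity, chunk size and worst-case stopping bound are all assumptions. The accelerator's own stopping criterion, which reaches up to 61% complexity reduction, is not specified in the abstract.

```python
import random


def progressive_classify(query, class_hvs, chunk=256):
    """Nearest class by dot product over bipolar (+1/-1) hypervectors,
    scanning `chunk` dimensions at a time and stopping early once the
    leading class can no longer be overtaken. Returns (label, dims_used)."""
    dim = len(query)
    scores = {label: 0 for label in class_hvs}
    used = 0
    while used < dim:
        end = min(used + chunk, dim)
        q = query[used:end]
        for label, hv in class_hvs.items():
            scores[label] += sum(a * b for a, b in zip(q, hv[used:end]))
        used = end
        top = sorted(scores.values(), reverse=True)
        # Each remaining +/-1 dimension can move the gap between two classes
        # by at most 2, so stopping here never changes the final answer.
        if len(top) < 2 or top[0] - top[1] > 2 * (dim - used):
            break
    return max(scores, key=scores.get), used


rng = random.Random(42)
DIM = 4096
classes = {c: [rng.choice((-1, 1)) for _ in range(DIM)] for c in range(4)}
query = classes[2][:]
for i in rng.sample(range(DIM), DIM // 10):  # flip 10% of the elements as noise
    query[i] = -query[i]
label, used = progressive_classify(query, classes)
print(label, used)
```

Because this bound is worst-case, it saves less work than a heuristic threshold would. In this toy setup it still stops well before all 4096 dimensions are compared.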

#### C10-4 - 11:45

EVA: A 16mm<sup>2</sup> 1.54TFLOPS Tile-Based Accelerator for Evolvable Edge Computing, J. Zhu, M. Kim, C.-H. Lu, W. Tang, T. Wei and Z. Zhang, Univ. of Michigan, USA

EVA is an accelerator designed for evolvable edge computing applications. It integrates programmable processing array tiles (A-Tiles) for performance and utilization, and RISC-V CPU tiles (C-Tiles) for control and quick reconfiguration at a 100ns timescale.

EVA features a versatile interconnect fabric, a distributed memory system, and flexible mapping. A 16mm² Intel16 EVA test chip achieves a peak throughput of 1.54TFLOPS (FP16) at 0.9V and a peak energy efficiency of 708GFLOPS/W at 0.49V, demonstrating high compute density and efficiency for diverse workloads.

#### C10-5 - 12:10

MAVERIC: A 16nm 72 FPS, 10 mJ/frame Heterogeneous Robotics SoC with 4 Cores and 13 INT8/FP32 Accelerators, S. Kim, J. Zhao, R. Hsiao, Y. Chi, V. Iyer, V. Jain, B. Nikolic and Y. S. Shao, Univ. of California, Berkeley, USA

MAVERIC is a 16mm² SoC in Intel 16 with 4 cores and 13 INT8 or FP32 accelerator units (AUs) for ML and robotics applications. The 3D reconstruction (3DRecon) robotics application combines depth estimation (DE) and simultaneous localization and mapping (SLAM) for perception tasks, posing challenges in compute demand, accelerator integration and scheduling. MAVERIC operates at up to 1 GHz and achieves 8 TOPS/W peak energy efficiency (Eeff). It supports loop closure (LC) and delivers 10 mJ/frame at 72 FPS for end-to-end DE and SLAM.

# **Circuits Session 11**

## **High-Performance Oscillators**

Wednesday, June 11, 10:30-12:35

Chairpersons: T. Iizuka, The University of Tokyo

A. Loke, Intel

#### C11-1 - 10:30

A 0.06mm<sup>2</sup> 27.5-to-30GHz Series Resonance VCO with Magnetic Mutual Resistance Achieving 207.2dBc/Hz FoM<sub>A</sub> at 10MHz Offset, L. Chen and D. Ye, Huazhong Univ. of Science and Technology, China

This paper proposes a series resonance voltage-controlled oscillator (SR-VCO) with magnetic mutual resistance (MMR) for the mm-wave band. The SR-VCO uses a single transformer to implement the LC tank. By manipulating the phase shift of the current flow in the primary and secondary coils of the transformer, two resistive impedances, referred to as MMR, are generated. MMR enhances the tank impedance, resulting in reduced power consumption. Since MMR arises from magnetic coupling and does not dissipate energy, it preserves the quality factor (Q) of the LC tank. It also suppresses MOSFET-induced noise, which is a dominant contributor to phase noise at mm-wave frequencies. The proposed SR-VCO is fabricated in a 65nm CMOS process. The measured phase noise (PN) is -120.1dBc/Hz at 1MHz offset, with a power consumption of 46mW. The prototype occupies 0.06mm² and achieves a FoM<sub>A</sub> of 207.2dBc/Hz at 10MHz offset.



#### C11-2 - 10:55

A 0.06mm² 14.7-to-20.2GHz Quad-Core VCO Enabled by the Folded Circular Transformer Achieving 201.1dBc/Hz FoM<sub>T</sub> and 203.4dBc/Hz FoM<sub>A</sub>, T. C. Ou\*,\*\*, K. Hu\*, H. Xu\*, J. Gu\*, Y. Wang\*, J. Bi\*, W. He\*, Z. Xu\*\*\*, M.-K. Law\*\* and N. Yan\*, \*Fudan Univ., \*\*Univ. of Macau and \*\*\*Zhejiang Univ., China

This paper proposes a quad-core oscillator based on a folded circular transformer. The transformer exploits both magnetic and electric coupling for core synchronization, greatly reducing the area and phase noise penalty of quad-core VCOs. Moreover, it simultaneously enhances the common-mode and differential-mode quality factors. Fabricated in 28nm CMOS, the proposed VCO covers 14.7-to-20.2GHz (31.5%) within an area of 0.06mm² while consuming 5.3mW. At 15.8GHz, it achieves a FoM<sub>A</sub> of 203.4dBc/Hz and a FoM<sub>T</sub> of 201.1dBc/Hz at 10MHz offset.
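The reported tuning range, FoM<sub>A</sub> and FoM<sub>T</sub> can be checked for mutual consistency. This is a minimal sketch assuming the commonly used definitions FoM<sub>A</sub> = FoM - 10log10(A/1mm²) and FoM<sub>T</sub> = FoM + 20log10(TR%/10). The abstract does not restate which definitions were used.

```python
import math

f_min_hz, f_max_hz = 14.7e9, 20.2e9   # tuning range from the abstract
area_mm2 = 0.06
fom_a = 203.4                          # dBc/Hz at 10MHz offset

# Tuning range in percent, relative to the center frequency.
tr_pct = 100 * 2 * (f_max_hz - f_min_hz) / (f_max_hz + f_min_hz)
# Assumed definitions: FoM_A = FoM - 10log10(A / 1mm^2),
#                      FoM_T = FoM + 20log10(TR% / 10).
fom = fom_a + 10 * math.log10(area_mm2)   # undo the area normalization
fom_t = fom + 20 * math.log10(tr_pct / 10)
print(f"TR = {tr_pct:.1f}%, FoM = {fom:.2f}, FoM_T = {fom_t:.2f} dBc/Hz")
```

This reproduces the 31.5% tuning range and gives FoM<sub>T</sub> of about 201.15dBc/Hz. That matches the reported 201.1dBc/Hz within the rounding of the published FoM<sub>A</sub>.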

#### C11-3 - 11:20

A 1 ppb MEMS Oscillator Achieving -90 dBc/Hz at 10 Hz Offset Enabled by 5 MΩ TIA and 360° Phase Shifter, R. Islam\*, J. Shi\*, A. Abdelrahman\*, J. Kim\*, K. Elmeligy\*, Y. Li\*, M. O. Selim\*, J. Yan\*, M. B. Younis\*, R. Xia\*, M. S. Aly\*, M. A. Khalil\*, A. E. Hegazi\*, H.-K. Kwon\*\*, G. Vukasin\*\*, S. Saxena\*,\*\*\*, T. W. Kenny\*\*, G. Bahl\* and P. K. Hanumolu\*, \*Univ. of Illinois at Urbana-Champaign, \*\*Stanford Univ., USA and \*\*\*Indian Institute of Technology Madras, India

This paper presents a fully integrated MEMS oscillator incorporating a low-noise, high-gain TIA, a tunable phase shifter and a drive control circuit. It is fabricated in 65 nm CMOS and features a 115-134 dBΩ, 81 fA/√Hz TIA, a 360° phase shifter, and a 10 mV-200 mV drive control circuit. Leveraging a high-Q electrostatic MEMS resonator, the oscillator achieves -90 dBc/Hz phase noise at a 10 Hz offset.
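The TIA gain range is quoted in dBΩ, while the title quotes ohms. Because transimpedance is an amplitude ratio, dBΩ = 20log10(Z/1Ω). A short conversion links the two forms. The helper name is illustrative.

```python
def dbohm_to_ohm(gain_dbohm: float) -> float:
    """Convert a transimpedance gain in dBΩ (20*log10 of ohms) to ohms."""
    return 10 ** (gain_dbohm / 20)


for g in (115, 134):
    print(f"{g} dBOhm = {dbohm_to_ohm(g) / 1e6:.2f} MOhm")
```

The range spans roughly 0.56MΩ to 5.01MΩ. The upper end corresponds to the 5MΩ TIA named in the title.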

#### C11-4 - 11:45

A 425pW, 0.3V, 32kHz Crystal Oscillator in 22nm FDSOI with Adaptive Pulse Injection and Amplitude Regulation Across -40°C to 125°C, X. Wang\*, Y. Zhu\*, X. Qian\*, J. Zhao\*\*, Y. Li\*, G. Wang\*\*, J. Li\* and L. Lin\*, \*Southern Univ. of Science and Technology and \*\*Shanghai Jiao Tong Univ., China

This paper presents a sub-nW, single-supply crystal oscillator (XO) based on pulse injection at the fundamental frequency. The integrated ZVD-based adaptive slicer maintains accurate pulse injection timing across PVT variations at 162pW. The discrete PWA-based amplitude regulation controls the oscillation amplitude, reducing driver power to 93pW. The 22nm test chip shows a 3.7ppb Allan deviation floor at 425pW and occupies 0.016mm², with an operating temperature range from -40°C to 125°C.

#### C11-5 - 12:10

A 24-MHz Crystal Oscillator with 6.9-μs Startup Time and 2% Injection-ΔF Tolerance Using Phase-Interpolator-Assisted Synchronized Injection, X. Wang\*, Z. Wang\*, K.-M. Lei\*\*, S. Wang\*, J. Yao\*, Z. Cai\*, P.-I. Mak\*\* and Y. Guo\*, \*Nanjing Univ. of Posts and Telecommunications and \*\*Univ. of Macau, China

This article presents a 24-MHz fast-startup crystal oscillator (XO) with a phase-interpolator-assisted synchronized injection technique. This technique ensures phase consistency between the injection source and the crystal resonance even with a 2% injection ΔF, making synchronized injection more flexible and efficient. Additionally, a differential peak detection technique is proposed to detect the phase error incurred by ΔF, shortening the auxiliary non-injection period to merely 4 cycles. Fabricated in a 40-nm CMOS process, the proposed XO achieves a 6.9-µs startup time and a 4.8-nJ startup energy with 2% injection inaccuracy when tested with a 24-MHz off-chip crystal. Compared to traditional injection with constant frequency and phase, the startup time is reduced by 191.3 times. With a power consumption of 63 µW, the measured phase noise is -137.8 dBc/Hz at 1-kHz offset, corresponding to a FoM of 237 dBc/Hz.
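The quoted phase-noise FoM and the startup figures can be reproduced with simple arithmetic. This sketch assumes the standard oscillator FoM, -PN + 20log10(f0/Δf) - 10log10(P/1mW). The mean startup power is a derived quantity (startup energy divided by startup time), not a figure reported by the authors.

```python
import math


def oscillator_fom(pn_dbc_hz, f0_hz, offset_hz, power_w):
    """Standard oscillator FoM: -PN + 20log10(f0/df) - 10log10(P/1mW)."""
    return (-pn_dbc_hz + 20 * math.log10(f0_hz / offset_hz)
            - 10 * math.log10(power_w / 1e-3))


# Figures from the abstract: -137.8dBc/Hz at 1kHz offset, 24MHz, 63uW.
fom = oscillator_fom(-137.8, 24e6, 1e3, 63e-6)
avg_startup_power_w = 4.8e-9 / 6.9e-6   # startup energy / startup time (derived)
print(f"FoM = {fom:.1f} dBc/Hz, mean startup power = {avg_startup_power_w * 1e3:.2f} mW")
```

The FoM comes out at about 237.4dBc/Hz, consistent with the reported 237dBc/Hz. The implied average power during startup is about 0.70mW.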

# **Circuits Session 12**

#### **Ultra High-speed Wireline**

Wednesday, June 11, 10:30-12:35

Chairpersons: C. Patrick Yue, Hong Kong University of Science and Technology & Stanford University

N. Kocaman, Broadcom

#### C12-1 - 10:30

A 7-bit 150-GSa/s DAC in 5nm FinFET CMOS, B. Moeneclaey, J. Lambrecht, A. Parisi, J. Van Kerrebrouck, G. Coudyzer, A. Kankuppe, E. Martens, J. Craninckx and P. Ossieur, imec, Belgium

We present a 7-bit wireline source-series-terminated (SST) digital-to-analog converter (DAC) in 5nm FinFET CMOS. The number of unit cells is reduced by employing SST cells with relative weights of 1 and 4. From an eight-phase clock, each unit cell generates clock pulses that in turn control the three-stage 8:1 multiplexer. At 150GSa/s, the measured ENOB is 4.1 bits for a 72.8GHz sine wave. The DAC consumes 621mW from 0.9V and 0.96V supplies. Eye diagrams are measured for 150Gb/s non-return-to-zero (NRZ) and 300Gb/s PAM-4 signals, pre-equalized using a 10-tap feedforward equalizer.

#### C12-2 - 10:55

A 128Gb/s 0.67pJ/b PAM-4 Transmitter in 18A with RibbonFET and PowerVia, S. Kundu, J. Kim, K. M. Nguyen, H.-S. Kim, D. S. Shin, S. Kim, K. Yu, M. Clark, P. Wali, A. Teles, R. Saba, F. O'Mahony and A. Balankutty, Intel Corp., USA

A fully integrated 128Gb/s DAC-based transmitter designed for long-reach wireline applications in the 18A CMOS process, with RibbonFET and PowerVia backside power delivery, is presented. Design optimizations that exploit the lower-leakage, faster RibbonFETs for improved energy efficiency, and layout techniques that use the backside power layer for inductor and clock distribution, are demonstrated. The TX achieves the best energy efficiency (0.67pJ/bit, or 0.75pJ/bit including the PLL) and the smallest area ever reported, while meeting key linearity and jitter compliance specifications of PAM-4 standards, highlighting the continued benefits of CMOS scaling for wireline applications.



#### C12-3 - 11:20

A Monolithic 400Gbps Electro-Optical Retimer with Integrated TIA and Class-AB Silicon-Photonics/VCSEL Driver in 5nm FinFET, N. Codega\*, C. Asero\*, D. L. Herbas Burgos\*, A. Di Pasquo\*, C. Nani\*, D. Albano\*, E. Monaco\*, F. Giunco\*, F. Martinelli\*, G. Gira\*, I. Fabiano\*, M. Garampazzi\*, M. Sosio\*, M. Davoodi\*\*, N. Ghittori\*, N. Nadisha Miral\*, S. Dadash\*\*\*, S. Cyrusian\*\*, V. Karam\*\*\*, F. De Bernardinis\*, J. Riani\*\*, K. Parker\*\*\*, P. Pascale\*, P. Rossi\*, E. Temporiti\*, S. Scouten\*\*\*, T. Mukherjee\*\*, S. Jantzi\*\*, L. Tse\*\* and K. Chang\*\*, \*Marvell Semiconductor, Inc., Italy, \*\*Marvell Semiconductor, Inc., Canada

In this paper, we present a 4x100Gb/s PAM4 electro-optical retimer with integrated TIA and driver, realized in 5nm FinFET technology. The paper focuses on the fully integrated line-side electro-optical interfaces, targeting both multi-mode fiber (MMF) and single-mode fiber (SMF) applications. In the line RX (LRX), with integrated TIA, a BER floor of 1E-11 and an average sensitivity of -9dBm at BER = 2.4E-4 are achieved at 53.125GBd. In the line TX (LTX), the integrated silicon photonics (SiPho) driver delivers 4Vpp to external Mach-Zehnder modulators (MZMs), while the integrated vertical-cavity surface-emitting laser (VCSEL) driver delivers 9mApp to external VCSELs.

#### C12-4 - 11:45

A 212.5Gb/s PAM-4 Receiver with Mutual Inductive Coupled Gm-TIA in 4nm FinFET, K. Kwon, H. Roh, K. Kim, D. Noh, W. Lim, J. Park, D. Lim, D. Jang, P. Lim, J. Do, S. Yim, D.-H. Lim, B.-J. Yoo, M. Choi, H.-G. Rhew and J. Shin, Samsung Electronics Co., Ltd., Korea

This paper presents a 212.5Gb/s PAM-4 receiver (RX) in 4nm FinFET. The receiver features a hybrid continuous-time linear equalizer (CTLE) with stability-enhanced common-mode feedback (CMFB), a Gm-TIA amplifier with mutual inductive coupling, and a clock distribution network using a standing-wave driver. The receiver occupies 0.56 mm² and achieves a BER of 5.81e-6 at 212.5Gb/s over a channel with a total insertion loss of 40dB at 53.125GHz.
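The 53.125GHz loss frequency is the Nyquist frequency of a 212.5Gb/s PAM-4 signal: PAM-4 carries log2(4) = 2 bits per symbol, so the symbol rate is 106.25GBd, and the Nyquist frequency is half the symbol rate. A short sketch of that arithmetic (the helper names are illustrative):

```python
import math


def symbol_rate_gbd(bit_rate_gbps: float, pam_levels: int) -> float:
    """Symbol (baud) rate for PAM-N signaling: bits per symbol = log2(N)."""
    return bit_rate_gbps / math.log2(pam_levels)


def nyquist_ghz(baud_gbd: float) -> float:
    """Nyquist frequency is half the symbol rate."""
    return baud_gbd / 2


baud = symbol_rate_gbd(212.5, 4)
print(baud, nyquist_ghz(baud))  # 106.25 53.125

# Inverse direction: a 53.125 GBd PAM4 lane carries 106.25 Gb/s on the line
print(53.125 * math.log2(4))  # 106.25
```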

#### C12-5 - 12:10

A 128Gb/s ADC/DAC Based PAM-4 Transceiver with >45dB Reach in 3nm FinFET, P. Liu\*, A. Hassan\*, A. Kaushik\*\*, A. Mostafa\*, C.-R. Yang\*, D. Prabakaran\*\*, D. Zhou\*, D. Visani\*, E. Hsiao\*, F. Lu\*, F. Chu\*, G. Cui\*, H. Yan\*, J. Gu\*, J. Zang\*, K. Mohammad\*\*, L. Jiang\*, M. Gambhir\*, M. Singh\*, P. P. Kulkarni\*\*, P. Ramakrishna\*\*, R. Ho\*, S. Xu\*, S. Sivakumar\*, X. Han\*, X. Yang\*, Z. Adal\*, Z. Guo\*, Z. Li\*, Z. Yan\*, Z. Yu\*, H. Wang\* and K. Chang\*, \*Marvell Semiconductor, Inc., USA, \*\*Marvell Semiconductor, Inc., India

This work presents a low-power, high-performance long-reach transceiver suitable for AI data centers and cloud computing. Implemented in 3nm FinFET, this ADC/DAC-based transceiver equalizes >45dB channel loss at 128Gb/s with BER < 2e-8, compatible with the PCIe Gen 7 data rate. The analog power efficiency of the transceiver is as low as 1.89pJ/b, and the total power efficiency is 4.1pJ/b.
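Energy-per-bit figures convert to power by multiplying by the data rate (1pJ/b at 1Gb/s is 1mW). Assuming the quoted efficiencies apply at the full 128Gb/s rate, this gives roughly 242mW of analog power and 525mW in total for this transceiver. The same helper gives about 86mW for the 0.67pJ/b, 128Gb/s transmitter of C12-2 (excluding its PLL):

```python
def power_mw(energy_pj_per_bit: float, rate_gbps: float) -> float:
    """Power in mW: (pJ/bit) * (Gb/s) = 1e-12 J * 1e9 /s = 1e-3 W."""
    return energy_pj_per_bit * rate_gbps


print(round(power_mw(1.89, 128), 2))  # analog: 241.92
print(round(power_mw(4.1, 128), 2))   # total: 524.8
print(round(power_mw(0.67, 128), 2))  # C12-2 TX without PLL: 85.76
```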

# **Technology Session 6**

#### **Technology Highlights 2**

Wednesday, June 11, 10:30-12:35

Chairpersons: T. Irisawa, AIST

A. Veloso, Imec

#### T6-1 - 10:30

High-Density Wafer Level Connectivity Using Frontside Hybrid Bonding at 250nm Pitch and Backside Through-Dielectric Vias at 120nm Pitch After Extreme Wafer Thinning, L. Witters\*, S. Van Huylenbroeck\*, S. Kang\*, P. Zhao\*, S.-A. Chew\*, K. D'havé\*, S. Iacovo\*, M. Stucchi\*, B. Zhang\*, S. Dewilde\*, D. Montero\*, R. Chukka\*, K. Vandersmissen\*, N. Heylen\*, N. Jourdan\*, J. W. Maes\*\*, H. V. Bana\*\*, C. Zhu\*\*\*, J. De Vos\*, G. Beyer\* and E. Beyne\*, \*imec, \*\*ASM International, Belgium and \*\*\*ASM International, Finland

This study presents the latest advancements in high-density wafer-level connectivity. We achieve face-to-face hybrid bonding at 250nm pitch. Access from the wafer backside is demonstrated through extreme wafer thinning beyond the shallow trench isolation (STI) floor and through-dielectric via connections at 120nm pitch. Additionally, we examine the impact of guided wafer-to-wafer bonding, necessary for tight-pitch hybrid bonding, on subsequent backside lithography overlay corrections, paving the way for combining high-density front- and backside wafer connectivity.

#### T6-2 - 10:55

Voltage Reduction (1.4V) and Array Scaling (41nm) of Ferroelectric NVDRAM for Low-Power and High-Density Applications, A. Calderoni, A. Chavan, D. P. Ettisserry, A. Liao, M. Balakrishnan, M. Hollander, M. Mariani, K. Karda, M. Jerry, M. Fischer, D. Mills, B. Cook, S. Chhajed, J. Zahurak and N. Ramaswamy, Micron Technology, Inc., USA

We present a second generation of scaled ferroelectric NVDRAM [1] with reduced x- and y-direction pitch (41nm), a thinner ferroelectric stack (5nm), and a lower array operating voltage (read/write at 1.4V). Full-chip array data show a >250mV window at -4σ after 1E10 cycles. This is the densest 1T1C ferroelectric technology with such high performance. Multiple material and electrical challenges were addressed to maintain performance at reduced dimensions.



#### T6-3 - 11:20

A Gate-All-Around Nanosheet Oxide Semiconductor Transistor by Selective Crystallization of InGaOx for Performance and Reliability Enhancement, A. Chen\*, K.-w. PARK\*, K. Sakai\*, S. Hwang\*\*, X. Huang\*, T. Saraya\*, T. Hiramoto\*, T. Takahashi\*\*\*, M. Uenuma\*\*\*, Y. Uraoka\*\*\* and M. Kobayashi\*, \*The Univ. of Tokyo, \*\*AIST and \*\*\*Nara Institute of Science and Technology, Japan

We demonstrate, for the first time, (1) performance enhancement of polycrystalline InOx FETs by Ga doping during ALD and annealing, reaching a mobility of 81cm²/Vs; (2) a novel selective crystallization technique for InGaOx in an oxide semiconductor (OS) stack; and (3) process integration and device operation of a gate-all-around nanosheet (GAA NS) InGaOx (IGO) FET. Evidence of quantum confinement in the NS OS is obtained experimentally. Bias-stress reliability is significantly improved by the GAA NS FET. This work provides a direction for high-performance OS FETs toward LSI applications.

#### T6-4 - 11:45

First Demonstration of 1T FDSOI-based >1000fps Image Sensor with In-Pixel Computing, N. Tang\*,\*\*, G. Yu\*,\*\*, A. Shi\*,\*\*, K. Li\*,\*\*, J. Li\*,\*\*, H. Yang\*,\*\*, Y. Xiao\*,\*\*, L. Wu\*,\*\*, Z. Zhou\*,\*\*, L. Liu\*,\*\*, X. Liu\*,\*\*, J. Kang\*,\*\* and P. Huang\*,\*\*, \*Peking Univ. and \*\*Beijing Advanced Innovation Center for Integrated Circuits, China

In this work, we demonstrate for the first time a 128×128 image sensor based on a 1T 22nm FDSOI pixel, which leverages the deep depletion region under the buried oxide (BOX) for optical sensing. Key features include: (1) an extremely high photosensitivity of 5E5 A/W, achieved thanks to the amplifying effect of the FET; (2) photosensitivity tunable by the gate/drain voltage, enabling in-pixel computing in the 1T structure; and (3) >1000 fps imaging and feature extraction, enabled by the in-pixel processing capability and the proposed pipeline design in which exposure, sampling, and readout are decoupled.

#### T6-5 - 12:10

1T1C 3D HZO FeRAM with High Retention (>125°C) and High Endurance (>1E13) for Embedded Nonvolatile Memory Application, Y. Sun, H. Li, F. Yu, J. Zhao, Y. Li, Y. Zhang, M. Zhang, X. Jia, Y. Huan, J. Wu, T. Zhang, Z. Wang, M. Shao, D. Su, K. Du, J. Zhu, J. Song, H. Zhang, H. Lyu and J. Xu, Huawei Technologies Co., Ltd., China

We present a high-performance 3D FeRAM test chip using hafnium zirconium oxide (HZO) for advanced embedded nonvolatile memory (eNVM) applications. This FeRAM achieves array-level endurance exceeding 1E13 cycles, robust data retention over 10 years at 125°C, and stable high-temperature operation. A novel stack design with defect-shielding layers on both sides of the HZO film effectively suppresses fatigue, imprint, and pinch phenomena, which are common challenges in ferroelectric memories. A 32Mb 1T1C FeRAM test chip was developed by integrating the high-performance ferroelectric capacitor (FeCAP), with a 7-nm HZO film, into a 3D trench structure on a 40nm CMOS platform. The memory array demonstrates a memory window of approximately 340 mV at -5.2σ (0.1ppm), remaining above 200 mV after 1E11 writes/1E13 reads and after 125°C baking. These results highlight the significant potential of this eFeRAM to replace eFlash in industrial-grade eNVM applications, offering enhanced performance and reliability.
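Sigma levels such as -5.2σ map to tail fractions if the per-cell memory window is assumed to be Gaussian; that is an assumption, since the abstract does not state the distribution. Under it, the one-sided tail Q(k) = 0.5·erfc(k/√2) at k = 5.2 is about 1E-7, i.e. the 0.1ppm quoted above, and at k = 4 it is about 32ppm:

```python
import math


def lower_tail_ppm(k_sigma: float) -> float:
    """One-sided Gaussian tail P(X < mu - k*sigma), in parts per million.

    Assumes a normally distributed population (not stated in the abstract).
    """
    return 0.5 * math.erfc(k_sigma / math.sqrt(2)) * 1e6


print(f"{lower_tail_ppm(5.2):.3f} ppm")  # ~0.100 ppm
print(f"{lower_tail_ppm(4.0):.1f} ppm")  # ~31.7 ppm
```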

# **Circuits Session 13**

#### **AI Accelerators 2**

Wednesday, June 11, 14:00-15:40

Chairpersons: W. Shin, Rebellions Inc.

B. Keller, NVIDIA

#### C13-1 - 14:00

Adelia: A 4nm LLM Accelerator with Streamlined Dataflow and Dual-Mode Parallelization for Efficient Generative AI Inference, J.-H. Kim\*, S. Lim\*\*, J. Cha\*\*, S. Moon\*\*, D. Seo\*\*, H. Lee\*\*, J. Kim\*\*, J. Lee\*\* and J.-Y. Kim\*\*, \*KAIST and \*\*HyperAccel, Korea

This paper presents Adelia, an efficient inference chip for large language models (LLMs) featuring a streamlined dataflow and dual-mode parallelization. The streamlined dataflow directly connects external memory to Adelia's LLM-optimized compute engine with matched bandwidth, achieving an effective memory bandwidth utilization of up to 91%. A systolic path between multiple engines facilitates data reuse, increasing compute throughput without compromising efficiency. Based on runtime status, Adelia dynamically switches between context mode, which distributes the long context of a single request to optimize latency, and batch mode, which processes inputs from different requests to prioritize throughput. Adelia is fabricated in 4nm technology and occupies a 5.28mm² die area. Compared to the H100 GPU, it achieves 1.59x and 2.51x higher memory bandwidth efficiency and throughput efficiency, respectively, across various models.

#### C13-2 - 14:25

A 22nm 25.08TOPS/W Multi-Task Transformer Accelerator with Mixed Precision Structured Sparsity and Two-Stage Task-Adaptive Power Management, Z. Fan, Q. Zhang, P. Abillama, S. Shoori, J. Lee, C.-W. Tseng, W. Meng, H.-S. Kim, D. Blaauw and D. Sylvester, Univ. of Michigan, USA

This paper presents the first silicon implementation of a transformer accelerator that executes multiple tasks simultaneously while improving per-task efficiency. It combines structured sparsity, multi-precision computation, and a two-stage DVFS mechanism, yielding the highest reported per-task efficiency.



#### C13-3 - 14:50

ASAP: A 28nm Transformer Training Accelerator with Alternating Sparsity and Asymmetrical Microscaling Floating-Point Precision, H. Mun, J. Meng, X. Hu, Y. Liao and J.-s. Seo, School of ECE, Cornell Tech, USA

This paper presents a novel Transformer training accelerator, ASAP, which supports alternating sparsity and asymmetrical microscaling floating-point precision, preserving accuracy while efficiently handling both dense and sparse weights. Additionally, a reconfigurable pipeline SIMD (RP-SIMD) architecture is proposed to efficiently support special functions. ASAP is implemented in 28nm CMOS and achieves 26.6 TOPS/W energy efficiency and 2.9 TOPS performance, surpassing state-of-the-art training processors by 2.02-to-5.07X in training energy-efficiency.
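Microscaling (MX) formats, as defined in the OCP Microscaling specification, store a block of low-precision elements together with one shared power-of-two scale. The sketch below quantizes a block to FP4 (E2M1) elements with a shared exponent chosen from the block maximum. It is a generic illustration under assumed parameters (a 4-element block instead of the specification's 32, FP4 E2M1 elements, nearest rounding with ties toward the smaller magnitude) and does not model ASAP's asymmetrical precision assignment, which the abstract does not detail:

```python
import math
from typing import List, Tuple

# FP4 E2M1 magnitudes; the largest normal value is 6.0 = 1.5 * 2**2
FP4_E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_EMAX = 2  # exponent of the largest FP4 E2M1 value


def quantize_block_mx(block: List[float]) -> Tuple[int, List[float]]:
    """Quantize one block to a shared power-of-two scale plus FP4 elements."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Shared exponent aligns the block maximum with the element format's top exponent
    shared_exp = math.floor(math.log2(amax)) - FP4_EMAX
    scale = 2.0 ** shared_exp
    elems = []
    for x in block:
        mag = abs(x) / scale
        # Nearest representable magnitude; values above 6.0 saturate to 6.0,
        # and min() keeps the first (smaller) candidate on a tie
        q = min(FP4_E2M1_VALUES, key=lambda v: abs(v - mag))
        elems.append(math.copysign(q, x))
    return shared_exp, elems


def dequantize_block_mx(shared_exp: int, elems: List[float]) -> List[float]:
    scale = 2.0 ** shared_exp
    return [e * scale for e in elems]


exp, elems = quantize_block_mx([0.1, -0.75, 3.0, 12.0])
print(exp, dequantize_block_mx(exp, elems))  # 1 [0.0, -1.0, 3.0, 12.0]
```

Small values relative to the block maximum (here 0.1) collapse to zero, which is the usual accuracy trade-off of sharing one scale per block.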

#### C13-4 - 15:15

A 94Hz Inference and 7.4mJ/epoch Fine-Tune Edge SoC for Diffusion-based Robot Manipulation with Speculation and Disturbance Enhancement, S. Zhang, X. Chen, C. Peng, H. Yang, Y. Liu and H. Jia, Tsinghua Univ., China

We present an edge SoC for diffusion-transformer-based action generation (DiTAG) in robot manipulation, featuring both low-latency inference and high-fidelity on-device fine-tuning. The substantial challenges of accelerating both at the edge are overcome by a speculative parallel inference and disturbance-enhanced low-bit fine-tuning architecture. The 28nm prototype, integrating a quad-core accelerator with a CPU, shows 10.6ms inference latency, 36.8x better than an edge GPU, with 7.88TOPS/W system energy efficiency, and 7.4mJ/epoch on-device fine-tuning with minimal accuracy loss at nominal voltage.

# **Circuits Session 14**

#### **Analog Techniques**

Wednesday, June 11, 14:00-15:40

Chairpersons: Y. Shu, MediaTek Inc.

A. Burg, EPFL

#### C14-1 - 14:00

A Pipelined-SAR-TDC with Time-Domain Noise-Shaping Self-Calibration, Y. Mormul, V. Nguyen and R. B. Staszewski, Univ. College Dublin, Ireland

We introduce a new pipelined SAR time-to-digital converter (TDC) architecture operating at 800 MS/s. To overcome high nonlinearity and performance variability, a time-domain noise-shaping (NS) self-calibration technique is proposed. It boosts the SFDR by 20 dB to 55 dB, yielding a best-in-class bit utilization (ENOB/NOB) of 96% and a FoM<sub>w</sub> of 40 fJ/conv-step. Measurements across supply variations and multiple ICs validate the efficacy of the proposed approach.

#### C14-2 - 14:25

A 0.5-0.8V 10-85 MS/s 12-bit SAR ADC in 22nm FDSOI Utilizing an Inverter-Based Comparator Architecture, R. Cents, H. S. Bindra, H. de Vree and B. Nauta, Univ. of Twente, Netherlands

A low-noise, high-speed inverter-based comparator architecture integrated within a SAR ADC is proposed, enabling the ADC to operate from a supply as low as 0.5V at a state-of-the-art Nyquist rate of 10 MS/s while still achieving 9.5 ENOB and a Walden FoM of 2.0 fJ/conv-step. At its typical 0.8V supply, the ADC achieves 9.9 ENOB at a Nyquist rate of 85 MS/s, with a Walden FoM of 3.6 fJ/conv-step.

#### C14-3 - 14:50

A $G_m$-Boosted 3-Stage Amplifier with Gain-Enhancing Feedforward Path for $C_L$ of 40-160nF, H.-W. Jeong, C.-H. Lee, S.-Y. Nam and S.-W. Hong, Sogang Univ., Korea

This paper proposes a 3-stage amplifier designed to drive a load capacitor ($C_L$) exceeding 100 nF. To ensure stability with a large $C_L$ while achieving high DC gain and sufficient gain-bandwidth product (GBW), the amplifier uses a transconductance ($G_m$) boosting cell and a gain-enhancing feedforward path (GE-FFP). As a result, it achieves a DC gain > 130 dB, a GBW of 1.47 MHz, and a phase margin (PM) of 51 degrees when driving a $C_L$ of 160 nF. The chip was fabricated in a 28 nm CMOS process.

#### C14-4 - 15:15

A Boosted 3.5W, -81.6dB THD+N, 92.6% Total Efficiency, Battery-Powered Class-D Audio Amplifier with True-Zero-Switching Achieving Quiescent Current of 0.9mA, J.-H. Cho\*,\*\* and H.-S. Kim\*, \*KAIST and \*\*Samsung Electronics Co., Ltd., Korea

This paper presents a battery-powered Class-D audio amplifier (CDA). Its single-stage topology boosts the output to up to 2x the battery voltage using a flying capacitor mounted directly on the die. True-zero-switching (TZS) minimizes the quiescent current to 0.9mA. Fabricated in a 180nm process, the chip achieves a peak efficiency of 92.6% and a maximum output power of 3.56W. A peak THD+N of -81.6dB at 1kHz was measured.



# **Circuits Session 15**

#### **Biomedical Readout and Stimulation**

Wednesday, June 11, 14:00-15:40

Chairpersons: T. Tokuda, Institute of Science Tokyo

S. Zali Asl, Ferric

#### C15-1 - 14:00

A Fully Balanced Biphasic Neurostimulator with Body-Coupled Powering and Full-Duplex Communication via Baseband Load Shift Keying, T. Jeon\*, K. Kang\*, S. Yun\*\*, S. Yoon\*, B. Kim\*\*, J. Bae\*\*, \*\*\* and Y. Chae\*, \*\*\*, \*Yonsei Univ., \*\*Kangwon National Univ. and \*\*\*XO Semiconductor Inc., Korea

This paper presents a wireless biphasic neurostimulator that exploits body-coupled powering and full-duplex communication, achieving precise stimulation with real-time feedback. Simultaneous wireless power and data transmission is achieved with baseband load shift keying. Fully balanced biphasic stimulation is achieved by applying the stimulation current through flying capacitors and current regulators, leaving an extremely small residual charge after each biphasic pulse. Implemented in 65nm CMOS, the prototype IC achieves a downlink BER of 1.8x10<sup>-4</sup>, an uplink BER of < 3x10<sup>-5</sup>, and a residual charge of 0.22nC at a depth of 40mm in porcine tissue.

#### C15-2 - 14:25

A Flexible HV Stimulator ASIC with Stimulus-Synchronized Charge Balancing and Embedded CM Regulation for Implantable Peripheral Nerve Stimulation, M. Zhou\*, H. Xin\*, R. van Wegberg\*, G. Langereis\*, C. M. Lopez\*\*, M. Konijnenburg\* and N. Van Helleputte\*\*, \*imec, Netherlands and \*\*imec, Belgium

We present a flexible HV stimulator ASIC for implantable peripheral nerve stimulation applications. Innovative charge-balancing techniques are introduced to support diverse stimulation current waveforms, achieving state-of-the-art performance. Fabricated in 130-nm BCD technology, the ASIC has been fully validated through saline characterization.

#### C15-3 - 14:50

An N-type-only a-IGZO Thin-Film-Transistor Based Nyquist-rate 8-bit CDAC+SAR ADC Consuming 1.7mW at 32ksps and Achieving 44dB SNDR, M. Dandekar and K. Myny, KU Leuven, Belgium

This paper presents an 8-bit SAR ADC implemented using n-type-only a-IGZO devices in a TFT-on-foil technology. The ADC operates at a sampling rate of 32ksps, consumes 1.7mW, and achieves an SNDR of 44dB, resulting in a Walden FoM (FoM<sub>W</sub>) of 410 pJ/conv-step. This performance surpasses state-of-the-art thin-film designs by approximately 50x, narrowing the gap between the thin-film and silicon design spaces and making all-TFT flexible bioelectronic sensors feasible. The high sampling rate and accuracy allow signals to be digitized directly on a flexible biomedical sensor, providing a digital output resistant to noise and interference.
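The 410 pJ figure follows from the Walden FoM, P / (2^ENOB · fs), with ENOB derived from SNDR by the usual sine-wave relation ENOB = (SNDR - 1.76)/6.02. A minimal sketch that recomputes it from the abstract's numbers (assuming those standard definitions):

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits from SNDR (ideal full-scale sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_j(power_w: float, fs_hz: float, enob: float) -> float:
    """Walden FoM = P / (2**ENOB * fs), in joules per conversion step."""
    return power_w / (2 ** enob * fs_hz)


enob = enob_from_sndr(44.0)             # ~7.02 bits
fom = walden_fom_j(1.7e-3, 32e3, enob)  # 1.7 mW at 32 ksps
print(f"ENOB = {enob:.2f}, FoM_W = {fom * 1e12:.0f} pJ/conv-step")
# ENOB = 7.02, FoM_W = 410 pJ/conv-step
```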

#### C15-4 - 15:15

A 16-QAM-Based Multi-Node BCC System with Bias-Electrode-Free Multi-Channel ExG Readout ICs, D. Lee\*, S.-I. Cheon\*, D.-H. Choi\*, Y. Cho\*, S. Park\*, C. Santos\*, H. Cho\*, S. Ha\*\* and M. Je\*, \*KAIST, Korea and \*\*New York Univ. Abu Dhabi, United Arab Emirates

We present a system for multi-node body channel communication (BCC) integrated with multi-channel ExG readout ICs. By utilizing time-multiplexing for BCC communication and ExG measurement, the system minimizes interference between the BCC signal and the ExG readout IC, enabling the measurement of EEG, which requires ultra-low noise, along with EOG, EMG, and ECG signals. Moreover, eliminating the bias electrode increases BCC signal amplitude by 60%, allowing 16-QAM communication at 8Mbps even under time-multiplexing operation. To guarantee the robustness of communication, the system adaptively switches to OOK modulation in cases of significant body channel loss. Additionally, the system employs a multi-channel least mean square (LMS) filter and a DC servo loop (DSL) to effectively address channel mismatches and differential artifacts.
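16-QAM carries 4 bits per symbol, so 8Mb/s corresponds to 2 Msymbol/s; OOK carries 1 bit per symbol, so at an unchanged symbol rate its throughput would be four times lower (the abstract does not give the OOK rate). The sketch below shows a 16-QAM mapper with Gray coding on each axis, a common choice that is assumed here; the paper does not specify its constellation mapping:

```python
from typing import List

# Gray-coded 2-bit -> amplitude map for one axis: neighbouring levels differ in one bit
GRAY_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def map_16qam(bits: List[int]) -> List[complex]:
    """Map groups of 4 bits to 16-QAM symbols (first 2 bits -> I, last 2 -> Q)."""
    if len(bits) % 4:
        raise ValueError("16-QAM needs a multiple of 4 bits")
    return [complex(GRAY_LEVELS[(bits[k], bits[k + 1])],
                    GRAY_LEVELS[(bits[k + 2], bits[k + 3])])
            for k in range(0, len(bits), 4)]


def symbol_rate(bit_rate: float, bits_per_symbol: int) -> float:
    return bit_rate / bits_per_symbol


print(map_16qam([0, 0, 1, 0, 1, 1, 0, 1]))  # [(-3+3j), (1-1j)]
print(symbol_rate(8e6, 4))                  # 2000000.0 symbols/s at 8 Mb/s
```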

# **Circuits Session 16**

#### **Hardware Security**

Wednesday, June 11, 14:00-15:40

Chairpersons: M. Takamiya, The University of Tokyo

K. Kornegay, Morgan State University

#### C16-1 - 14:00

A High-Order Masking with Load-Delay-Equalized WDDL for Provable Side-Channel Security, N. Miura\*, K. Monta\*\*, T. Wadatsumi\*\*, J. Shiomi\*, Y. Hiraga\*\*\*, T. Sugawara\*\*\* and M. Nagata\*\*, \*Osaka Univ., \*\*Kobe Univ. and \*\*\*The Univ. of Electro-Communications, Japan

A custom wave dynamic differential logic (WDDL) is proposed for provably secure high-order masking. The XOR/AND gates used in the WDDL are designed with symmetric pass-transistor logic (PTL) to suppress data dependency in power and delay. A self-retimer is introduced to prevent accumulation of inter-stage power/delay variations caused by unbalanced layout and routing. Together, these custom circuit techniques reduce both data-dependent information leakage and non-ideal leakage caused by unwanted coupling through supply and ground, achieving provable high-order side-channel security.
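For readers unfamiliar with high-order masking, the sketch below shows generic d-th order Boolean masking: a secret is split into d+1 random shares whose XOR equals the secret, so any d shares alone reveal nothing. It illustrates only the masking principle at the algorithmic level; it does not model the paper's WDDL/PTL circuits, and masked AND gates (which need fresh randomness, e.g. in ISW-style gadgets) are omitted:

```python
import secrets
from functools import reduce
from operator import xor
from typing import List


def mask(secret: int, order: int, width: int = 8) -> List[int]:
    """Split `secret` into order+1 Boolean shares whose XOR equals the secret."""
    shares = [secrets.randbits(width) for _ in range(order)]
    # The last share absorbs the secret so that XOR of all shares recovers it
    shares.append(reduce(xor, shares, secret))
    return shares


def unmask(shares: List[int]) -> int:
    return reduce(xor, shares, 0)


def masked_xor(a: List[int], b: List[int]) -> List[int]:
    """XOR is linear, so it can be computed share by share without unmasking."""
    return [x ^ y for x, y in zip(a, b)]


a = mask(0xA5, order=2)
b = mask(0x3C, order=2)
print(len(a), hex(unmask(masked_xor(a, b))))  # 3 0x99
```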



#### C16-2 - 14:25

A 1.7 pJ/bit 10 MHz Calibration-Free PVT Variation and Mismatch Tolerant Latch-Based True Random Number Generator in 4 nm FinFET, J. Lee\*, \*\*, J. Lee\*\*, Y. Youn\*\*, Y. Lim\*\*, D. Hong\*\*, B. Kang\*\*, S. Yoo\*\* and J.-Y. Sim\*, \*POSTECH and \*\*Samsung Electronics Co., Ltd., Korea

This paper presents a calibration-free, latch-based true random number generator (TRNG). The entropy source, a ring structure formed by two inverters and two capacitors connected in series, systematically cancels the offset caused by device mismatch and parasitic capacitance while amplifying device noise. By combining four entropy sources with XOR operations, the proposed TRNG achieves qualified randomness without added noise sources or calibration. Fabricated in a 4nm FinFET process with a core area of 1680µm², the TRNG demonstrates robust operation across 80 samples, covering standard process corners (SS, TT, FF) and wide supply (0.75V-0.95V) and temperature (-40°C to 85°C) ranges. Validation with NIST SP 800-22 and NIST SP 800-90B while up to 200mV of supply noise is injected highlights its robustness against external perturbations. With an energy efficiency of 1.7 pJ/bit at 0.75 V, the TRNG is well suited to low-power applications.
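Why XOR-combining several sources helps can be quantified with the piling-up lemma: if independent bits each have bias εᵢ = P(bit = 0) - 1/2, their XOR has bias 2^(n-1)·∏εᵢ, so residual biases shrink multiplicatively. The sketch below applies this to four sources with a hypothetical bias of 0.05 each (not a value from the paper) and assumes the sources are statistically independent:

```python
from typing import Sequence


def xor_bias(biases: Sequence[float]) -> float:
    """Piling-up lemma for independent bits.

    Each bias is P(bit = 0) - 0.5; the XOR's bias is 2**(n-1) * prod(biases).
    """
    result = 0.5
    for e in biases:
        result *= 2 * e  # 0.5 * prod(2*e_i) == 2**(n-1) * prod(e_i)
    return result


print(f"{xor_bias([0.05] * 4):.1e}")  # 5.0e-05 for four hypothetical 5%-biased sources
print(xor_bias([0.05, 0.0]))          # 0.0: one unbiased source makes the XOR unbiased
```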

#### C16-3 - 14:50

GUARD: A Fully-Digital TDC-Based Clock and Voltage Glitch Detector with On-Demand Protection in a 28nm CMOS, M. Li\*, P. X. Huang\*, J. Park\*, S. K. Mathew\*\*, V. De\*\* and M. Seok\*, \*Columbia Univ. and \*\*Intel Corp., USA

We present GUARD, a clock and voltage glitch detector with on-demand protection via clock gating, supporting clock frequencies up to 1.2GHz. It achieves 100% detection and protection accuracy on a victim MCU under glitch attacks of different profiles as short as 100ps, robustly across -40°C to 125°C and 0.6-1.1V. GUARD can also differentiate normal IR drops from real glitch attacks to avoid unnecessary protection overhead. It occupies 2958µm² and consumes 0.874mW/100MHz.

#### C16-4 - 15:15

Online Learning-Based Countermeasure Against Power Analysis Attacks, T. Wang, H. Qu, Q. Fang and M. Alioto, National Univ. of Singapore, Singapore

An online learning-based countermeasure against power analysis attacks is presented. The proposed countermeasure digitizes power and learns on chip, training a power-compensation machine learning model. For the first time, this enhances security in a die-specific manner (e.g., adapting to mismatch) and over time (e.g., adapting to aging). On-chip learning avoids severe (18,000x worse) aging-induced security deterioration, achieving a billion-scale measurements-to-disclosure (MTD).

# **Technology Session 7**

#### **3D Power Delivery Network**

Wednesday, June 11, 14:00-15:40

Chairpersons: R. Baek, POSTECH

M. Badaroglu, Qualcomm

#### T7-1 - 14:00

Backside Power Delivery for Power Switched Designs in 2nm CMOS: IR Drop and Block-level Power-Performance-Area Benefits, Y. Zhou\*, P. Venugopal\*, L. Verschueren\*, J. Cousins\*, M. Brunion\*, J.-Y. Lin\*, H. Kukner\*, M. Naeim\*\*, M. Stassar\*,\*\*, A. Farokhnejad\*, O. Zografos\*, M. G. Bardon\*, J. Myers\*, J. Ryckaert\* and G. Hellings\*, \*imec, Belgium and \*\*Cadence Design Systems, Inc., USA

We evaluate the block-level power-performance-area benefits of backside power delivery networks (BSPDNs) in power-switched designs, using through-silicon vias in the middle of line (TSVM) in an N2 (2nm) nanosheet technology. Using an industrial processor design representative of mobile computing, we compare BSPDN implementations to traditional frontside power delivery networks (FSPDNs) in both always-on and power-switched implementations. BSPDNs benefit both domains, reducing IR drop and enabling smaller block-level area at iso-performance. FSPDNs consume significant routing resources to ensure an acceptable IR drop, especially in power-switched designs that employ local PDN assists to reach the target IR drop. Moving the PDN to the backside frees up these resources, resulting in smaller area for both power-switched (-14%) and always-on (-17%) domains. BSPDN also allows the number of power switches to be halved, achieving an additional 9% area reduction.

#### T7-2 - 14:25

A Novel Backside Signal Inter/Intra-Cell Routing Method Beyond Backside Power for Angstrom Nodes, J. Lee\*,\*\*\*, S. Lee\*, S. Lee\*, Y. Ahn\*, M. Kim\*, G. Cho\*, S. C. Song\*\*, U. Roh\*\*, M. Cai\*\*, D. Greenlaw\*\*, S. Molloy\*\*, S.-K. Lim\*\*\* and R.-H. Baek\*, \*POSTECH, Korea, \*\*Google LLC and \*\*\*Georgia Institute of Technology, USA

For the first time, we investigate the power-performance-area (PPA) benefits of novel backside (BS) signal (BSS) routing using BS gate and source/drain contacts, targeting Angstrom nodes. An INVx1 with a BS pin has a smaller Miller capacitance ( $C_{\text{miller}}$ ) and shows a 3.0-3.3% higher ring oscillator (RO) frequency at iso-power with BSS inter-cell routing. Even without BS pins, standard cells can relieve frontside (FS) routing congestion and improve the energy-delay product (EDP) through BSS intra-cell routing. A chip based on BSS intra-cell routing has a larger IR drop, but this is mitigated when the micro-bump pitch is small. The BSS intra-cell routing-based chip shows an 8.61-9.46% lower power-delay product (PDP).



#### T7-3 - 14:50

Heterogeneous 3D Integration of Low-Voltage E/D-mode GaN HEMTs on CMOS Chip for Efficient On-Chip Voltage Regulation in Active Power Delivery Networks, J. Jeong, C. J. Lee, N. Rheem, Y.-J. Suh, B. H. Kim, J. P. Kim and S. Kim, KAIST, Korea

In this work, we present and experimentally demonstrate the heterogeneous 3D integration of low-voltage (LV) E/D-mode GaN power devices on a CMOS chip for a vertically integrated active power delivery network (PDN). Our LV E/D-mode GaN power devices show record-low on-resistance ( $R_{on}$ ) of 2.78 and 1.85 Ω·mm and record-high frequency capability ( $f_{T}$ ) of 84 and 68 GHz, respectively, along with a breakdown voltage ( $V_{BD}$ ) > 12 V, among power devices for active PDNs. Furthermore, we introduce a scaled AlGaN buffer thickness for the direct heat-spreading layer, enabling more efficient heat dissipation. This approach reduces thermal resistance ( $R_{TH}$ ) by 13.5% compared with using heat-spreading-layer bonding techniques alone.

#### T7-4 - 15:15

A 10W 3.8-5V Input IVR Chiplet with 93%-Peak-Efficiency and 3.2A/mm² Density Featuring Wide Load Range and Adaptive Ganging for 2.5D/3D Vertical Power Delivery, X. Sun\*, X. Wang\*, R. Chen\*, X. Men\*, J. Jiang\*\*, Y. Wang\* and X. Liu\*, \*Tsinghua Univ. and \*\*Southern Univ. of Science and Technology, China

A monolithic flying-capacitor integrated voltage regulator (FCIVR) chiplet is proposed that supplies a wide load range and incorporates adaptive ganging for 2.5D/3D ICs. It offers four ganging modes for scalable PDNs. Triple-loop regulation enables two-dimensional V-F droop reduction during drastic di/dt events and provides autonomous handshake driving for efficient conversion. By leveraging ML-driven circuit and RDL physical optimization, combined with ultra-thin interposer integration to minimize form factor and parasitics, the proposed vertical power delivery (VPD) technology achieves 10A maximum current, 3.2A/mm² density, 93% peak efficiency, and a 32ns FoM.

# **Technology Focus Session 1**

#### **TFS1 Memories for AI Applications**

Wednesday, June 11, 14:00-15:40

Chairpersons: S. Kim, KAIST

A. Calderoni, Micron

#### TFS1-1 - 14:00 (Invited)

Memory for AI: How It's Currently Used, Challenges, and Future Direction, R. Sreeramaneni, L. Yang and R. Chang, Micron Technology, Inc., USA

High-bandwidth memory (HBM) technology is powering the accelerated computing and GPUs behind the AI revolution. This paper discusses the important role that HBM performance, power, capacity, and RAS (reliability, availability, and serviceability) play in AI hardware. We compare the HBM memory architecture with more traditional compute memory such as DDR5, focusing on Micron Technology's industry-leading HBM3E. We end with a look at the technology innovations needed to continue enabling future HBM roadmaps.

#### TFS1-2 - 14:25 (Invited)

Emerging Embedded Non-Volatile Memories Beyond 28nm in AI Era, S. Han\*, J. Oh\*, T.-Y. Lee\*, J. Park\*, K. Yamane\*, Y. Chung\*, G. Yang\*, K. Lee\*, S. Hwang\*, G. Kang\*, H.-J. Shin\*, K.-T. Nam\*, J. Son\*, B. Seo\*, K. Suh\*, Y. Kim\*\*, J. Jeong\*\*, S. Ko\*\*, K. Yeom\*, J. Park\*, J.-H. Park\*\*, K. Lee\* and J.-H. Ku\*, \*Foundry Business, Samsung Electronics Co., Korea and \*\*2R&D center, Samsung Electronics Co., Korea

Embedded non-volatile memory (eNVM) is a crucial technology in various applications such as microcontroller units (MCUs), data centers, and edge AI. Due to the scaling limitations of eFlash and the demand for high performance and energy efficiency, emerging eNVM technologies such as MRAM, RRAM, and PCM are being developed below 28nm. These emerging eNVMs have their own strengths and weaknesses, and they are expected to complement each other and coexist in the market. In this paper, we review the characteristics and market positioning of these emerging eNVMs. We also discuss our progress on eNVM solutions.

#### TFS1-3 - 14:50

A Hybrid Monolithic 3D Integration of 2T0C DRAM and RRAM Chip for High-Precision In-Memory Point Cloud Acceleration with Ultra-Fine-Grained Dataflow, Y. Gao\*, Z. Wang\*,\*\*, L. Bao\*, H. Zhang\*, J. Sun\*, R. Xie\*, L. Liang\*,\*\* and Y. Cai\*,\*\*, \*Peking Univ. and \*\*Beijing Advanced Innovation Center for Integrated Circuits, China

In this work, we propose the hybrid monolithic 3D (H-M3D) computing-in-memory (CIM) accelerator to address the challenges of frequent data granularity in existing point cloud acceleration on edge devices. By integrating and orchestrating high-precision 2T0C DRAM CIM and RRAM CIM for Euclidean distance (L2D) and matrix-vector multiplication (MVM) operations, we achieve efficient feature computation and refine the dataflow granularity to individual points, enabling high-speed data transmission. The proposed M3D-CIM point cloud system demonstrates 1.51x speedup and 2.56x enhancement in energy efficiency, providing a novel solution for point cloud acceleration on edge devices.
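One common way to make Euclidean distance fit an MVM-style array is the expansion ‖p - q‖² = ‖p‖² + ‖q‖² - 2·p·q, where the cross terms for many points p form a single matrix-vector product and the ‖p‖² terms can be precomputed. The abstract does not say whether the 2T0C DRAM CIM uses this decomposition, so treat the sketch below as an illustration of the general idea only:

```python
from typing import List


def sq_dist_direct(points: List[List[int]], q: List[int]) -> List[int]:
    """Reference: squared Euclidean distance from q to every point."""
    return [sum((a - b) ** 2 for a, b in zip(p, q)) for p in points]


def sq_dist_via_mvm(points: List[List[int]], q: List[int]) -> List[int]:
    """Same result via ||p||^2 + ||q||^2 - 2 p.q; the p.q terms form one MVM."""
    q_norm = sum(b * b for b in q)
    p_norms = [sum(a * a for a in p) for p in points]          # precomputable per point
    dots = [sum(a * b for a, b in zip(p, q)) for p in points]  # matrix-vector product
    return [pn + q_norm - 2 * d for pn, d in zip(p_norms, dots)]


points = [[0, 0, 0], [1, 2, 2], [3, 0, 4]]
query = [1, 0, 0]
print(sq_dist_direct(points, query))   # [1, 8, 20]
print(sq_dist_via_mvm(points, query))  # [1, 8, 20]
```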



#### TFS1-4 - 15:15

224 TOPS/W-level Analog Computation in Memory Cell Using Hybrid Ferroelectric Tunnel Junction Having Enhanced On-State Conductance, W.-T. Koo, S. Lee, H. D. Lee, K. Kim, J.-G. Lee, M. Y. Lee, C. Kim, J. Lee, I. Jang, Y. Kim, S. Oh, W. Lee, S. Chae, H. Cho, H. Choi, S. G. Kim, S. Lee, J. Yi, Y. Cho and S. Cha, SK hynix Inc., Korea

Here, we demonstrate hybrid ferroelectric tunnel junctions (H-FTJs) that combine ferroelectric and resistive switching for analog computation in memory. The key to H-FTJs is modulating the effective tunneling thickness by controlling unconnected oxygen-vacancy-based filaments in HfZrO<sub>2</sub>. The H-FTJs exhibit a higher on-state conductance (1600 S/cm²) and on/off ratio (32,000) than recently reported HfZrO<sub>2</sub>-based FTJs. We also fully integrated one-transistor-one-H-FTJ crossbar arrays and confirmed their analog multiply-accumulate operation (91.9% accuracy) with high energy efficiency (224.4 TOPS/W) for inference tasks.
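For readers less familiar with crossbar CIM, here is a minimal idealized sketch of the analog multiply-accumulate such an array performs. By Ohm's and Kirchhoff's laws, each column current is the sum of cell conductance times row voltage. The sketch ignores wire resistance, device nonlinearity, access-transistor effects, and ADC quantization. A helper also converts the quoted 224.4 TOPS/W into energy per operation. The function names are hypothetical.

```python
def crossbar_mac(conductances, voltages):
    """Ideal column currents I_j = sum_i G[i][j] * V[i].

    conductances[i][j] is the cell at row i, column j; voltages[i] drives row i.
    """
    n_cols = len(conductances[0])
    return [sum(conductances[i][j] * voltages[i] for i in range(len(voltages)))
            for j in range(n_cols)]


def energy_per_op_fj(tops_per_watt):
    """Convert TOPS/W to femtojoules per operation: 1 / (TOPS/W * 1e12) J."""
    return 1e15 / (tops_per_watt * 1e12)
```

At 224.4 TOPS/W, `energy_per_op_fj(224.4)` is about 4.46 fJ per operation.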

#### **Technology Session 8**

#### **Integrated Optical Devices and Photodetector**

Wednesday, June 11, 14:00-15:40

Chairpersons: Y. Shiratori, NTT Corporation

E. Vianello, CEA-LETI

#### T8-1 - 14:00

Monolithic Optical Clock Distribution in Bulk CMOS Using I-Beam Waveguides, D. Sarkar, D. Baum and A. Hajimiri, California Institute of Technology, USA

Optical clock distribution in an unmodified bulk CMOS chip is presented for the first time. An H-tree distributes the signal using I-beam photonic waveguides operating at 780 nm, and avalanche photodiodes and transimpedance amplifiers convert the signal back to the electronic domain. All components are monolithically integrated on a 180nm bulk CMOS chip. The I-beam waveguides are also demonstrated in 22nm FDSOI CMOS technology without process modifications or design rule violations, making 22nm FDSOI the most advanced commercial process in which photonic waveguiding has been monolithically demonstrated.

#### T8-2 - 14:25

Digital-to-Optical Converters (DOCs) with Improved Nonlinearity for Energy-Efficient Optical Data Transmission, J. Davis\*, G. Kyriazidis\*, Y. Hu\*\*\*, H. Warner\*, X. Zhu\*, L. Magalhaes\*, M. Modisette\*, N. Lippok\*\*\*, K. Yang\*, B. Vakoc\*\*\*, M. Loncar\* and G. Hills\*, \*Harvard Univ., USA, \*\*Peking Univ., China and \*\*\*Wellman Center for Photomedicine, USA

Digital-to-Optical Converters (DOCs) convert digital electrical signals directly into analog optical signals, eliminating the need for Digital-to-Analog Converters (DACs) for Electro-Optic (EO) modulators in energy-efficient data communication. However, nonlinearity in DOCs degrades Symbol Error Rate (SER), power, and bandwidth. We present Engineered-Segment Length (ESL) DOC circuits that achieve strictly better trade-offs in Integral Non-Linearity (INL), Differential Non-Linearity (DNL), SER, and optical power than state-of-the-art DOC circuits. We experimentally demonstrate DOCs in heterogeneously integrated Si CMOS and Thin-Film Lithium Niobate (TFLN). Our measured ESL DOCs improve Root-Mean-Square (RMS) INL from 1.04 to 0.14 Least-Significant Bits (LSBs) and RMS DNL from 0.42 to 0.10 LSBs.
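RMS INL and DNL figures like the ones quoted above can be computed from a measured transfer curve, with one output level per digital code, as sketched below. The sketch assumes an end-point fit for the LSB. A best-fit line is a common alternative and gives somewhat different numbers. The input levels in the usage are hypothetical.

```python
import math


def dnl_inl_lsb(levels):
    """DNL and INL (in LSB) of a transfer curve, using the end-point LSB.

    levels[k] is the measured output for digital code k.
    """
    n = len(levels)
    lsb = (levels[-1] - levels[0]) / (n - 1)  # end-point fitted LSB
    dnl = [(levels[k + 1] - levels[k]) / lsb - 1 for k in range(n - 1)]
    inl = [(levels[k] - (levels[0] + k * lsb)) / lsb for k in range(n)]
    return dnl, inl


def rms(xs):
    """Root-mean-square of a sequence."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))
```

For example, the levels `[0, 1.5, 2, 3]` give DNL `[0.5, -0.5, 0.0]` and INL `[0.0, 0.5, 0.0, 0.0]`, so the RMS DNL is about 0.41 LSB.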

#### T8-3 - 14:50

High-Performance Monolithic 3D Integrated Red μLEDoS Display for AR/VR, J. Park\*, H. Kim\*\*, Y. Kim\*\*, W. Baek\*, S. H. Lee\*, H. Kim\*, D. Geum\*\*\*, J. P. Kim\*, J. Jeong\* and S. Kim\*, \*KAIST, \*\*RAONTECH Co., Ltd. and \*\*\*Inha Univ., Korea

Monolithic 3D (M3D) integration of Si CMOS integrated circuits (ICs) and micro-LEDs is essential for high-resolution AR/VR displays. In this work, we demonstrate a red micro-LEDoS display with 1700 PPI and 640x480 resolution based on the M3D process. The phosphide red micro-LED devices exhibit excellent color performance as unit pixels. Notably, with high pixel yield achieved through a low-temperature process below 300°C and a metal mesh structure that mitigates IR drop, we implemented high-quality imaging driven by real-time HDMI input with 8-bit grayscale. This study advances M3D micro-LEDoS technology for next-generation displays.

#### T8-4 - 15:15

A Heterogeneous CMOS Chip Monolithically integrating Monolayer MoS<sub>2</sub> for Enhanced NUV Detection: A Scalable Platform Leveraging 2D Materials to Complement and Surpass Silicon, X. Wang, H. Ning, J. Lin, Y. Xie, Y. He, Y. Ma, S. C. Bodepudi, B. Yu and Y. Xu, Zhejiang Univ., China

A gap persists between the remarkable properties of 2D materials and their realization in chips, demanding integration strategies that unlock 2D-CMOS complementarity to surpass silicon. We present a 2D-CMOS chip that demonstrates a paradigm combining 2D material properties with the established advantages of CMOS as a large-scale chip foundation. Our chip exploits the intrinsic properties of monolayer MoS<sub>2</sub>, particularly its high gain and low noise, to achieve near-ultraviolet performance superior to mainstream silicon-based chips, complementing and transcending the limitations of silicon. The chip achieves 257% quantum efficiency at 375 nm and a detectivity exceeding 10<sup>13</sup> Jones under weak illumination, maintaining an effective SNR of 3dB at 0.4pW ultra-weak light. Compared with state-of-the-art 2D-material integration, our design reduces fabrication complexity, enhances scalability, and achieves a high degree of CMOS integration, yielding substantial performance improvements. These results establish a scalable platform that bridges the gap between emerging materials and practical chip implementations.



#### CIM-based AI Accelerators

Wednesday, June 11, 16:00-17:40

Chairpersons: M. Chang, NTHU & TSMC

Y. Chen, EnCharge AI

#### C17-1 - 16:00

DIAL: An Energy-Efficient DRAM In-Memory Computing Accelerator with Compact Partial Product LUT and Twisted Differential ADC, S. Um\*, S. Ha\*, S. Whang\*, M. Kim\*, B. Kim\*, S. Kim\*, J. Ryu\*, C. Jeong\*, K. Sohn\*\* and H.-J. Yoo\*, \*KAIST and \*\*Samsung Electronics Co., Ltd., Korea

This paper presents DIAL, an energy-efficient DRAM in-memory computing accelerator with a partial-product LUT-based architecture for advanced tasks. The compact LUT reduces LUT area by 53% and power by 45% by eliminating extended zeros. The twisted differential ADC reduces ADC area by 37% and power by 45% through comparator sharing, and it improves CSNR by using a 2.3fF computation capacitor and increasing ADC resolution by 3x. The dual-mode sense amplifier reduces LUT power by 33% by adjusting its operation mode based on data patterns. Fabricated in 28nm CMOS technology, DIAL occupies a 20.25 mm² die area and achieves 55.4 TFLOPS/W energy efficiency on the GPT-2 benchmark, 4.1x higher than prior IMC designs.

#### C17-2 - 16:25

A 22nm 41.8TFLOPS/W AI-edge Transformer/CNN Nonvolatile-Processor Using QKV-Softmax-Layer-Fused Hybrid ReRAM-CIM and Concurrent-Transpose/Non-Transpose SRAM-CIM, H.-H. Hsu\*, W.-S. Khwa\*\*, T.-H. Wen\*, W.-T. Hsu\*, Y.-C. Chang\*, H.-Y. Lu\*, H.-J. Wen\*, C.-Y. Chen\*, Y.-C. Huang\*, C.-L. Wu\*, P.-S. Wu\*\*, M.-S. Ho\*\*\*, C.-C. Lo\*, R.-S. Liu\*, C.-C. Hsieh\*, K.-T. Tang\*, S.-H. Teng\*\*, Y.-D. Chih\*\*, T.-Y. J. Chang\*\* and M.-F. Chang\*,\*\*, \*National Tsing Hua Univ., \*\*TSMC and \*\*\*National Chung Hsing Univ., Taiwan

Tiny AI-edge devices use nvCIM for power-off weight storage and active-mode computation, enabling high energy efficiency (EF) and low power-on latency. While tiny Transformer models offer higher accuracy than CNNs, they pose significant challenges for homogeneous nvCIM architectures because attention-based operations require dynamic matrix multiplications (MM), in which both inputs and weights are generated on the fly. This paper presents a hybrid ReRAM-CIM and SRAM-CIM nvProcessor optimized for Transformer and CNN models, featuring a CIM-friendly layer-fused flow. Fabricated in a 22nm process, the proposed nvProcessor achieves 41.8 TFLOPS/W (system), 55.2 TFLOPS/W (macro), and 74.48% ImageNet accuracy, enabling high EF and accuracy with tiny (Mb-level) models.

#### C17-3 - 16:50

CELLA: A 28nm Compute-Memory Co-Optimized Real-Time Digital CIM-Based Edge LLM Accelerator with 1.78ms-Response in Prefill and 31.32 Token/s in Decoding, Z. Wu\*, Y. Wang\*, Z. He\*, B. Yang\*\*, Y. Wang\*, S. Wei\*, Y. Hu\*, F. Tu\*\* and S. Yin\*, \*Tsinghua Univ. and \*\*Hong Kong Univ. of Science and Technology, China

This work presents CELLA, the first digital compute-in-memory (CIM)-based edge LLM accelerator, with compute-memory co-optimization for the *Prefill* and *Decoding* phases. 1) *Prefill* is compute-bound, and CIM works in MAC mode. A puzzle-shaped input scheduling unit (PISU) reassembles inputs into an interlocking pattern, reducing time-to-first-token (TTFT) by 2.82x. 2) *Decoding* is memory-bound, and CIM works in CAM mode. (a) For *static* parameters, non-uniform quantization multiply-accumulation (NUQ-MAC) is processed in CIM with a majority-first index searcher (MFIS), reducing time-per-output-token (TPOT) by 4.89x. (b) For the *dynamic* KV cache, only important KV entries are fetched by a hash-based cache filter (HCF), reducing TPOT by 4.13x. CELLA reaches a 1.78ms response and 31.32 token/s, a 2.53-9.74x speedup over state-of-the-art accelerators and edge GPUs.
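The sketch below is a minimal illustration of why non-uniform quantization pairs well with an index-matching (CAM-like) mode. If each weight is stored as an index into a small codebook, the activations that share an index can be summed first. Each codebook value is then multiplied only once. The function and codebook are hypothetical, and CELLA's MFIS and its mapping onto CIM are not reproduced here.

```python
def nuq_mac(activations, weight_indices, codebook):
    """Dot product with non-uniformly quantized weights.

    Each weight is stored as an index into `codebook`. Activations sharing an
    index are accumulated first (additions only), so the number of
    multiplications equals len(codebook) rather than the number of weights.
    """
    partial = [0] * len(codebook)
    for x, idx in zip(activations, weight_indices):
        partial[idx] += x                     # group activations by weight index
    return sum(p * c for p, c in zip(partial, codebook))
```

For example, `nuq_mac([1, 2, 3, 4], [0, 1, 0, 2], [0.5, -1.0, 2.0])` returns `8.0`. This matches the direct dot product with the dequantized weights `[0.5, -1.0, 0.5, 2.0]`.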

#### C17-4 - 17:15

A Fully Integrated Mixed-Signal Compute-In-Memory Accelerator for Solving Arbitrary Order Boolean Satisfiability Problems, T. Bhattacharya\*, D. Kwon\*, G. H. Hutchinson\*, X. Zhang\*\*, I. Rozada\*\* and D. Strukov\*, \*Univ. of California, Santa Barbara, USA and \*\*1QB Information Technologies (1QBit), Canada

This paper presents a mixed-signal In-Memory Computing (IMC) accelerator with a 256x128 10T bidirectional SRAM array in a 55 nm CMOS process for solving arbitrary-order K-Boolean Satisfiability (K-SAT) problems. It achieves nearly an order of magnitude faster solution times than other single-variable-update ASIC solvers on uniform random 3-SAT problems, while outperforming all other solvers by 10-200x on the studied higher-order K-SAT problems, which are relevant to cryptography applications.
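As background for the "single-variable update" baseline, the sketch below is a classic WalkSAT-style local search in software. It flips one variable per step and handles clauses of any order K in the same way. The DIMACS-style clause encoding, the noise probability, and the function names are our assumptions. This is a conventional software heuristic, not the mixed-signal IMC algorithm of the paper.

```python
import random


def unsat_clauses(clauses, assign):
    """Return the clauses not satisfied by `assign` (dict var -> bool).

    Clauses are lists of nonzero ints, DIMACS style: v means x_v, -v means
    NOT x_v. Clause length is arbitrary, so any-order K-SAT is handled alike.
    """
    return [c for c in clauses
            if not any((lit > 0) == assign[abs(lit)] for lit in c)]


def walksat(clauses, n_vars, max_flips=10_000, p_noise=0.5, seed=0):
    """WalkSAT-style search that flips a single variable per step."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}

    def cost_after_flip(v):
        # Count unsatisfied clauses if v were flipped, then undo the flip.
        assign[v] = not assign[v]
        n = len(unsat_clauses(clauses, assign))
        assign[v] = not assign[v]
        return n

    for _ in range(max_flips):
        unsat = unsat_clauses(clauses, assign)
        if not unsat:
            return assign                   # every clause is satisfied
        clause = rng.choice(unsat)          # focus on one violated clause
        if rng.random() < p_noise:
            var = abs(rng.choice(clause))   # random-walk move
        else:
            var = min((abs(lit) for lit in clause), key=cost_after_flip)  # greedy move
        assign[var] = not assign[var]
    return None                             # gave up after max_flips steps
```

For instance, `walksat([[1, 2, -3], [-1, 3, 4], [2, -4, 1, 3], [-2, -3, 4]], n_vars=4)` mixes 3- and 4-literal clauses in one instance.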



#### Power Management

Wednesday, June 11, 16:00-18:05

Chairpersons: W. Jung, KAIST

Q. Fan, TU Delft

#### C18-1 - 16:00

A 0.087 fs FOM Current-Mirror-Based Analog-Assisted Digital LDO with V<sub>O</sub> Ripple Optimization, S.-Y. Nam, H. Park, W.-G. Kim and S.-W. Hong, Sogang Univ., Korea

This paper proposes a current-mirror-based analog-assisted (CBAA) digital low-dropout regulator (DLDO) that achieves a fast transient response and optimized output voltage ($V_O$) ripple. It features a $V_O$ ripple below 1 mV at 200 mA load current ($I_L$) and a 106 mV undershoot during load transitions. Fabricated in a 28 nm CMOS process, the CBAA DLDO achieves a superior figure of merit (FOM) of 0.087 fs among low-input-voltage ($V_{IN}$) DLDOs.
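For reference, LDO papers often quote a response-time-based transient figure of merit. The abstract does not state its definition, so the following assumes the fs-valued FOM above follows that common form. Here $C_L$ is the output capacitance, $\Delta V_O$ is the output voltage droop for a load step $\Delta I_L$, and $I_Q$ is the quiescent current:

$$
T_R = \frac{C_L\,\Delta V_O}{\Delta I_L}, \qquad
\mathrm{FOM} = T_R\,\frac{I_Q}{\Delta I_L} = \frac{C_L\,\Delta V_O\,I_Q}{\Delta I_L^{2}}
$$

Because F·V/A reduces to seconds, this FOM has units of time, which is why it is reported in fs. A smaller value is better.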

#### C18-2 - 16:25

Distributed Power Management for 22nm AI Processor with Event-Driven Exponential Dual-Loop LDOs and Online Sparsity-Aware Droop Mitigation, Y. Dong, X. Liu, Y. Jing, R. Huang, L. Ye and T. Jia, Peking Univ., China

A distributed power management solution is presented for a 22nm digital compute-in-memory (DCIM)-based AI processor. It features distributed dual-loop digital LDOs with event-driven exponential control for fast response, online learning-based sparsity-aware droop mitigation, and a workload-driven current-balancing scheme among the LDOs. This work demonstrates the promise of distributed power management for robust and efficient AI computing. The above techniques yield >71.6% droop reduction across diverse AI workloads and up to 33.3% energy saving or 22.5% performance improvement.

#### C18-3 - 16:50

A Fully Integrated Buck Voltage Regulator in 16nm with in-Package Air Core Inductor Featuring Digital Computational Control for Fast Transient Responses, Z. Ahmed, S. Kim, H. T. Do, H. K. Krishnamurthy, K. Ravichandran, J. W. Tschanz and V. De, Intel Corp., USA

This paper presents a fully integrated buck regulator in 16nm CMOS with in-package air-core inductors. The proposed digital computational nonlinear controller achieves a 20ns response time with a 25mV droop for a 1A/0.4ns load step.

#### C18-4 - 17:15

An 85.6%-Efficiency Supply Modulator with Auxiliary Bidirectional Power for 200MHz 5G NR Applications, C.-Y. Chen\*, C.-Y. Lee\*, K.-H. Chen\*, K.-L. Zheng\*,\*\*, Y.-H. Lin\*\*\*, S.-R. Lin\*\*\* and T.-Y. Tsai\*\*\*, \*National Yang Ming Chiao Tung Univ., \*\*Chip-GaN Power Semiconductor Corp. and \*\*\*Realtek Semiconductor Corp., Taiwan

To meet high-bandwidth and wide-output-range requirements while achieving high efficiency in 5G envelope tracking (ET), the proposed Auxiliary Bidirectional Power (ABP) supply modulator recycles excess energy to an auxiliary capacitor (C<sub>AUX</sub>). It also optimizes the linear amplifier supply voltage via a droop mechanism combined with an adaptive voltage supply, achieving a peak efficiency of 85.6%. The measured PA output spectrum shows ACLRs of -38.97dBc and -38.34dBc for NR 200MHz at 26dBm.

#### C18-5 - 17:40

A 90-260 V<sub>AC</sub> Isolated Offline Single-Stage Single-Transformer-Winding Multiple-Output (STWMO) RGBW LED Driver with <0.7% Current Variation and Dimmable Current-Regulated Error-Based Control, M.-J. Cho\*, C. Shin\*\*, S.-J. Lee\*, J.-H. Kim\*, Y.-W. Jeong\*, M.-S. Kim\*, M.-H. Kim\*, H. Jeon\*\*\* and S.-U. Shin\*, \*POSTECH, \*\*Samsung Electronics Co., Ltd. and \*\*\*Chungbuk National Univ., Korea

This paper proposes an isolated offline single-stage single-transformer-winding multiple-output (STWMO) RGBW LED driver optimized for mood lighting systems. The proposed design reduces external components and system volume by utilizing a single transformer winding for multiple outputs. A cross-side error-sum feedback circuit (CS-ESFC) ensures precise and independent current regulation. Additionally, a switched-capacitor amplitude modulation (SC-AM) circuit addresses cross-regulation issues, maintaining stability during color and brightness adjustments. The driver achieves a peak efficiency of 90.45% with a current variation under 0.7% across a wide input voltage range of 90 to 260 V<sub>AC</sub>, offering a compact and cost-effective solution for RGBW LED drivers.



#### **Frequency Generation**

Wednesday, June 11, 16:00-18:05

Chairpersons: K. Sohn, Samsung Electronics Co., Ltd.

S. Palermo, Texas A&M

#### C19-1 - 16:25

A 24.5-to-45.2-GHz Dual-Injection Clock Multiplier with Folded-Inductor-Based Magnetic-Flux Cancellation Achieving 32.83-fs<sub>rms</sub> Jitter and 0.037-mm<sup>2</sup> Core Area, F. Hong, J. Chen, P. Guan, S. Kumar, R. B. Staszewski and T. Siriburanon, Univ. College Dublin, Ireland

We present an injection-locked clock multiplier (ILCM) with an ultra-wide frequency tuning range (TR) and low jitter, employing a compact folded-inductor-based magnetic-flux cancellation technique. An LC-series dual-mode quadrature ring oscillator (QRO) is co-designed with an edge-combining frequency doubler in mm-wave bands to simultaneously extend the TR and lower the phase noise. Leveraging the differential time-alignment technique, the proposed design achieves an exceptionally large loop bandwidth ($f_{\text{BW}}$). Fabricated in 28-nm HPC+ CMOS, the QRO provides a 59.46% TR (12.25-22.6 GHz), while the doubler output covers 24.5 to 45.23 GHz. Occupying a core area of only 0.037 mm², the proposed ILCM achieves a measured rms jitter of 32.83 fs at 39.5 GHz.
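Assuming the usual definition of fractional tuning range relative to the center frequency, $\mathrm{TR} = 2(f_{max} - f_{min})/(f_{max} + f_{min})$, the quoted 59.46% can be reproduced as below. Taking the QRO upper edge as half of the doubler's 45.23 GHz (22.615 GHz) gives 59.46%. The rounded 22.6 GHz gives about 59.4%.

```python
def tuning_range_pct(f_min, f_max):
    """Fractional tuning range 2 (f_max - f_min) / (f_max + f_min), in percent."""
    return 200.0 * (f_max - f_min) / (f_max + f_min)
```

Because a frequency doubler scales both band edges by 2, `tuning_range_pct(24.5, 45.23)` and `tuning_range_pct(12.25, 22.615)` give the same result.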

#### C19-2 - 16:50

A Calibration-free ADPLL with < -80 dBc Fractional Spur Based on Pseudo-random Phase Modulation, N. Zhang\*, S. Zhang\*, C. Wu\*, Y. Wang\*\*, J. Liu\* and H. Liao\*, \*Peking Univ., China and \*\*Univ. of British Columbia, Canada

This work presents a fractional-N digital PLL that leverages a pseudo-random phase modulation and demodulation technique to mitigate fractional spurs. By decoupling the periodicity of the control words from the nonlinearity of the digital phase interpolator (DPI), the technique effectively suppresses the fractional spurs induced by DPI nonlinearity. Fabricated in 40nm CMOS, the digital PLL demonstrates fractional spurs below the noise floor at near-integer channels.
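The toy behavioral model below illustrates why decoupling the control-word periodicity from the DPI nonlinearity suppresses spurs. Everything in it is a hypothetical assumption:

- a 32-code DPI whose error is a single sinusoid (`dpi_error`, `INL_AMP`);
- a code that advances by one LSB per reference cycle, standing in for a near-integer channel;
- a uniformly random code offset that is added before the DPI and removed ideally afterwards.

It is not the paper's circuit or modulation sequence.

```python
import cmath
import math
import random

M = 32          # hypothetical DPI resolution (codes per reference period)
INL_AMP = 0.2   # hypothetical DPI INL amplitude, in DPI LSBs


def dpi_error(code):
    """Hypothetical DPI nonlinearity: phase error (in LSB) at a given code."""
    return INL_AMP * math.sin(2 * math.pi * code / M)


def residual_errors(n, fcw_step, prpm, seed=0):
    """Phase error left after the DPI over n reference cycles.

    fcw_step: DPI code increment per cycle (fractional FCW expressed in LSBs).
    prpm: if True, a random code offset is added before the DPI and assumed to
          be removed exactly afterwards, so only the DPI error at the
          scrambled code remains.
    """
    rng = random.Random(seed)
    errs = []
    for k in range(n):
        code = (k * fcw_step) % M
        if prpm:
            code = (code + rng.randrange(M)) % M  # modulation (demodulation ideal)
        errs.append(dpi_error(code))
    return errs


def peak_spur_db(x):
    """Largest DFT magnitude of x, excluding DC, in dB (20*log10)."""
    n = len(x)
    peak = max(abs(sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n)))
               for b in range(1, n // 2))
    return 20 * math.log10(peak)
```

Without modulation, the error repeats every 32 cycles and concentrates into a single tone. With the random offset, the same error power is spread across all bins, so the spur is traded for a slightly higher noise floor.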

#### C19-3 - 17:15

A 31.5-36 GHz Low-Spur Gain-Boosting Charge-Sharing Locking PLL with 54fs Jitter, C. Khongprasongsiri, S. Kumar, P. Sawakewang, R. B. Staszewski and T. Siriburanon, Univ. College Dublin, Ireland

We propose a gain-boosted charge-sharing locking (GB-CSL) PLL that achieves both ultra-low jitter and low reference spur. The gain-boosted mechanism amplifies the charge residue, read by an ADC, in the digital domain, and pre-charges it back to a small charge-sharing capacitor ( $C_{Share}$ ) via a DAC for the CSL operation. Despite the use of a small  $C_{Share}$ , the proposed technique significantly enhances the PLL load modulation to the LC tank to achieve low spurs. Implemented in 28 nm CMOS, the 33 GHz PLL achieves 54 fs rms jitter and a record-low normalized reference spur of -76 dBc for mm-wave PLLs, while consuming 14 mW.

#### C19-4 - 17:40

A 2.3-15.8-GHz 8-Phase Injection-Ripple-Filtered Multi-Ring-Coupled DCO Enabling a Wideband Digital PLL, Z. Xu\*, E. Allebes\*, P. Mateman\*, J. van den Heuvel\*, S. van der Ven\*, S. Traferro\*, A. Kumar\*, R. Li\*, S. Nagata\*\*, K. Bunsen\*\*, T. Matsumoto\*\* and M. Konijnenburg\*, \*imec, Netherlands and \*\*Sony Semiconductor Solutions Corp., Japan

A high-frequency, wide-range 8-phase DCO with inherent injection-ripple filtering is proposed and incorporated into a wideband digital PLL. The DCO in the prototype PLL achieves a 2.3-15.8GHz frequency range. With a multiplication factor of 260, the PLL achieves a 3-15.5GHz wideband output, a -59dBc far-off injection spur at 10GHz output, and a -260dB FoM<sub>N</sub>.
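Jitter-based PLL figures of merit are usually defined as follows. This assumes the abstract's FoM<sub>N</sub> is the common variant normalized by the multiplication factor, which the abstract does not state. Here $\sigma_t$ is the rms jitter, $P$ is the power, and $N$ is the multiplication factor:

$$
\mathrm{FoM} = 10\log_{10}\!\left[\left(\frac{\sigma_t}{1\,\mathrm{s}}\right)^{2}\frac{P}{1\,\mathrm{mW}}\right], \qquad
\mathrm{FoM}_N = \mathrm{FoM} - 10\log_{10} N
$$

Under this definition, $N = 260$ contributes $10\log_{10}260 \approx 24.1$ dB. FoM<sub>N</sub> = -260 dB therefore corresponds to an unnormalized FoM of about -236 dB.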

#### **Circuits Session 20**

#### **Acoustic Sensors**

Wednesday, June 11, 16:00-18:05

Chairpersons: K. Yoshioka, Keio University

E. Quevy, ProbiusDX


#### C20-1 - 16:00

A -87.2 dB THD+N 89.1 dB DR Fully-Integrated Shunt-Resistor-Based In-Line Current Sensor with up to 2 MHz 14.4 V PWM Rejection, H. Ma\*, H. Zhang\*, M. Berkhout\*\* and Q. Fan\*, \*Delft Univ. of Technology and \*\*Monolithic Power Systems, Netherlands

This paper presents a cost-effective, fully-integrated shunt-resistor-based in-line current sensor offering high linearity and effective PWM rejection for audio applications. To mitigate the on-chip shunt resistor self-heating, which could severely compromise the linearity of the current sensor, a thermal compensation method is proposed and improves the THD+N by up to 23.6 dB. A floating  $G_m$  stage is proposed to reject high-frequency high-voltage (HV) PWM signals at the audio amplifier switching node. Implemented in a 180 nm BCD process, the prototype achieves a peak THD+N of -87.2 dB and a DR of 89.1 dB with up to 2 MHz, 14.4 V PWM rejection, while supporting a  $\pm 6$  A current range.



#### C20-2 - 16:25

A 131-dBSPL AOP 66.3-dB SNR 105.7-µA-Standby All-Dynamic Digital Microphone with Self-Clocked Interference-Resilient Acoustic Activity Detection, Y. Chen, J. Dong, Q. Zhang, Y. Wang, B. Zhao and Y. Luo, Zhejiang Univ., China

This paper presents a 103.3dB-dynamic-range, all-dynamic digital CMOS-MEMS microphone with internal acoustic activity detection (AAD). A voltage-boosting dynamic analog front end (VB-DAFE) is proposed to let the supply current scale linearly with operating frequency and to extend the dynamic range (DR) under a 1.8V system supply. In addition, an event-driven (ED) zoom analog-to-digital converter (zoom-ADC) is proposed to handle the large output swing of the VB-DAFE. The VB-DAFE and zoom-ADC allow the system to operate in an all-dynamic manner, so power scales with operating frequency, which suits ED operation. Under a 1.8V supply, this work achieves an acoustic overload point (AOP) of 131 dBSPL, a signal-to-noise ratio (SNR) of 66.3 dB, and a standby current of 105.7 µA with internal self-clocked, interference-resilient AAD.

#### C20-3 - 16:50

A 5.2µW, 2-to-8-Channel Scalable, Speaker-Tracking Microphone Array Featuring a CNN-Defined AFE, S. Pan\*, W.-H. Yu\*, F. Tan\*\*, J. Li\*, K.-F. Un\*, R. P. Martins\* and P.-I. Mak\*, \*Univ. of Macau, Macau and \*\*Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland

This paper describes a smart microphone array with speaker-tracking capability that enhances the output voice signal-to-noise ratio (SNR). The speaker verification (SV) module identifies the target speaker in both the near and far fields and adjusts the beamforming direction-of-arrival (DoA) and the analog front-end (AFE) gain for a better output voice SNR. The system chip, fabricated in a 28nm process, is bonded with a capacitive micro-electro-mechanical system (MEMS) microphone. It integrates in-sensor-computing channels whose number can be scaled arbitrarily from 2 to 8, realizing a small-form-factor DoA. With our energy-efficient delay-and-sum (DAS) and SV circuit techniques, the system chip consumes only 5.2µW.

#### C20-4 - 17:15

A 33aF<sub>rms</sub>, 3.4pF Base Capacitance, 192fF Input Range, 500kHz Sampling Frequency, Capacitance-to-Voltage Converter Using a Resonant LC Bridge, H. O. Ghiasi, S. Arjmandpour and T. Jang, ETH Zurich, Switzerland

This paper presents a resonant LC bridge technique to enhance the performance of the capacitance-to-voltage converter (CVC) by leveraging its passive gain to improve resolution and power consumption. The proposed CVC achieves 1.56mV/fF gain, 33.1aF<sub>rms</sub> resolution at 500kHz sampling rate while consuming 2.85µW with 3.4pF base capacitance, equivalent to 3.24fJ/conversion-step Walden FoM and 179.12dB Schreier FoM.
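The Walden FoM quoted above is conventionally defined as $P/(f_s \cdot 2^{\mathrm{ENOB}})$. The sketch assumes that definition and takes $f_s$ = 500 kHz and $P$ = 2.85 µW from the abstract. It inverts the FoM to find the ENOB implied by the 3.24 fJ/conversion-step figure, about 10.8 bits. The paper's own ENOB is not stated in the abstract.

```python
import math


def walden_fom(power_w, fs_hz, enob):
    """Walden FoM in J/conversion-step: P / (fs * 2**ENOB)."""
    return power_w / (fs_hz * 2 ** enob)


def implied_enob(power_w, fs_hz, fom_j):
    """Invert the Walden FoM to obtain the ENOB it implies."""
    return math.log2(power_w / (fs_hz * fom_j))
```

For example, `implied_enob(2.85e-6, 500e3, 3.24e-15)` evaluates to roughly 10.78 bits.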

#### C20-5 - 17:40

A Temperature-Insensitive Period-Modulation CDC with DLL-Based Comparator Delay Compensation Achieving 53.5ppm/°C without Calibration, W. Youn\*, D. Youn\*, H. Seong\*, S. Ha\*\* and M. Je\*, \*KAIST, Korea and \*\*New York Univ. Abu Dhabi, United Arab Emirates

This paper presents a temperature-insensitive period-modulation (PM) capacitance-to-digital converter (CDC) with delay-locked-loop (DLL)-based compensation that alleviates temperature-dependent comparator delays. Moreover, a half-period alternate-sampling technique is proposed to track the comparator offset and mitigate its variation in a swing-boosted front end. Our 180nm CMOS prototype achieves an output variation of 53.5ppm/°C over -40 to 85°C without any calibration or external clock. The converter achieves the best FoM<sub>w</sub> of 6.0pJ/conversion-step among temperature-insensitive PM CDCs.

#### **Technology Session 9**

#### **NAND and NOR**

Wednesday, June 11, 16:00-17:40

Chairpersons: S. Fujii, Kioxia Corporation

J. Yu, Sandisk

#### T9-1 - 16:00

A Schottky Junction as a Hole Injector for Enhancing Erase Operation of 3D Flash Memory in CMOS Directly Bonded to Array (CBA) Era with Over 1,000 Word Lines, S. Hashimoto\*, K. Nakatsuka\*, Y. Hizume\*, H. Takeda\*, T. Kurusu\*, K. Sakata\*, S. Yoshida\*, S. Arai\*, T. Okina\*\* and K. Sekine\*, \*KIOXIA Corp. and \*\*Western Digital Corp., Japan

This paper presents an innovative 3D flash memory structure that integrates a Schottky junction into the source and channel (CNL) contacts in the CMOS Directly Bonded to Array (CBA) architecture. For erasing cells, this structure supplies over 1,000 times more holes than the conventional gate-induced drain leakage (GIDL) method. This breakthrough overcomes the limit on word line (WL) stacking imposed by insufficient hole current in future 3D flash memory.

#### T9-2 - 16:25

Low-Temperature NiSi Formation via Microwave Annealing for Stable Metal-Induced Lateral Crystallization in 3D NAND Flash Memory, O. Kwon\*, S. Lee\*, J. Seo\*, G. Han\*, D. Kim\*, S. Park\*\*, J. Lee\*\*, K. Park\*\*, S. Yang\*\*, H. Choi\*\*, W. Lee\*\* and H. Hwang\*, \*POSTECH and \*\*Samsung Electronics Co., Ltd., Korea

This study introduces a novel Metal-Induced Lateral Crystallization (MILC) method for 3D NAND flash memory, enhanced by a low-temperature pre-Microwave Annealing (MWA) step. Conventional MILC suffers from uneven growth due to simultaneous nucleation and grain growth. This is worsened in confined Si channels, where crystalline Si (c-Si) whiskers with Ni silicide nodule fronts block each other during directional growth, limiting the Ni available for crystallization. We therefore apply a pre-MWA step that promotes NiSi formation while suppressing the MILC progression caused by the transformation into NiSi<sub>2</sub>, resulting in synchronized MILC growth during the subsequent annealing step. This method ensures stable MILC in 3D NAND flash memory, offering a reliable approach for advanced memory technologies.



#### T9-3 - 16:50

Wide Memory Window and Steep ISPP Slope (13.2 V and 2.7) of an Aggressively Scaled 3D Ferroelectric NAND (FeNAND) Cell for <30nm Tier Pitch Scaling, P. Sharma, K. Florent, D. Resnati, A. Mauri, J. Yue, J. Ahn, A. Fayrushin, E. Camerlenghi, P. Fantini, A. Goda, T. Kim and R. Hill, Micron Technology, Inc., USA

This study demonstrates a maximum memory window (MW) of 13.2 V and an average ISPP slope of 2.7, achieving the best-reported performance obtained in vertical Gate-All-Around (GAA) 3D FeNAND architecture. The device features a 20 nm gate length (LG) and a 42 nm layer pitch. These achievements were made possible through significant innovations in gate stack engineering. The roles of channel interlayer (CIL) and ferroelectric (FE) switching, identified as primary limiters for optimal MW and reliability, are examined through precise tunneling current and trap emission probability calculations. Retention and endurance characteristics were measured with a large MW of >8V. The trade-offs between the initial MW and reliability are discussed. The higher slope of 2.7 enables ~30% tier pitch scaling, paving the way for sub-30 nm tier pitch.

#### T9-4 - 17:15

3D NOR-type FeFETs with Record Endurance of 10<sup>11</sup>, Fast Erase of 50 ns, and Immediate Read-After-Write for In-Memory Learning, Y. Zhou\*, R. Zhu\*, W. Luo\*, X. Xu\*, S. Qi\*, Z. Ning\*, L. Chen\*, H. Shao\*, K. Tang\*,\*\* and R. Huang\*,\*\*, \*Peking Univ. and \*\*Beijing Advanced Innovation Center for Integrated Circuits, China

In this work, we experimentally demonstrate a 3D NOR-type FeFET (FeNOR) array based on an ALD IGO channel. With the channel area scaled down to $0.007~\mu m^2$, the 3D FeNOR achieves a record endurance of $10^{11}$ cycles, 50 ns erase speed, immediate read-after-write, and minimal layer-to-layer variation. Comprehensive analysis reveals the mechanism of endurance degradation and strategies to optimize it. Multi-domain TCAD simulations fit the measured erase-speed improvement well and attribute it mainly to the blocking of percolation paths. The high density, reliability, and speed make the 3D FeNOR promising for next-generation AI applications, especially in-memory learning.

#### **Technology Session 10**

#### **Advanced CMOS Platform**

Wednesday, June 11, 16:00-17:40

Chairpersons: A. Voon-Yew Thean, National University of Singapore

T. Viet Dinh, NXP

#### T10-1 - 16:00

Monolithic CFET Flow Improvements Integrating Cover Spacer and Dual-WF RMG, C. Cavalcante, S. Demuynck, C. Sheng, D. Batuk, M. Hosseini, A. Peng, K. Stiers, A. Vandooren, H. Mertens, T. Chiarella, H. Arimura, J. Mitard, V. Georgieva, H. Puliyalil, A. S. Marquez, M. Chistiakova, A. Peter, Y. Kimura, F. Sebaai, P. Puttarame Gowda, J. G. Lai, A. Arslanova, N. Reddy, A. Mingardi, S. Kumar Sarkar, R. Kumar Saroj, S. Choudhury, D. Alvarez, T. Sarkar, I. Gyo Koo, E. Altamirano Sanchez, S. Subramanian, L. P. B. Lima, N. Horiguchi and S. Biesemans, imec, Belgium

We report several modifications to our monolithic-CFET process flow that significantly improve drive current and CMOS survival rate (SR) (from 5% to 30%). We share our implementation of dual Work Function Metal (WFM) vertical patterning, integrating TiN and TiAl metals, using a carbon-based soft mask inside the gate trench. The quality of the Source-Drain (SD) junctions is improved by a sacrificial Cover Spacer (CSP) that uses a vertically patterned sidewall ALD hardmask (HM) to cover the top-device Si channel. Contact pattern placement accuracy at tight gate pitch is improved by using a low-stress film. pFET via resistance benefits from a dedicated tungsten oxide (WOx) wet etch step. Finally, we discuss ideas for further pFET SR improvement through re-engineering the bottom junction (BJ) and BackSide Contact (BSC) formation, including the impact of backside-formed bottom dielectric isolation (BS-BDI).

#### T10-2 - 16:25

NanoStack Transistor Architecture for CMOS 7Å Node and Beyond, S. Reboh, C. Zhang, T. Yamashita, L. Wai Kin, R. Xie, T. Ando, U. Bajpai, J. Mazza, N. Lanzillo, E. Cho, H. Zhou, J. Strane, E. Miller, J. Satterlee, S. Fan, Y. Sulehria, M. Sankarapandian, S. Mochizuki, N. Shanker, R. Johnson, A. Chu, S. Khan, M. Malley, W. T. Tseng, R. Pujari, E. Stuckert, L. Tierney, J. Li, M. Belyansky, M. Nasseri, N. Putnam, A. Hubbard, D. Durrant, J. Fulham, D. Canaperi, S. Skordas, S.-C. Seo, O. Gluschenkov, M. Sherony, J. Wang, Y. Zhu, J. Arnold, J. Wynne, L. Meli, B. Peethala, J. Zhang, J. Tolbert, D. Dechene, G. Shahidi, D. Edelstein, R. Ramachandra, G. Dechao, V. Narayanan, N. Felix, T. Standaert, H. Jagannathan, D.-K. Sohn, H. Bu and M. Khare, IBM Research, USA

NanoStack is a sequential stacking CMOS transistor architecture featuring flexible placement of top and bottom nanosheet channels, thermally stable bottom FET gate stack, thin dielectric bonding and more. We project NanoStack with 4-track base cells to deliver ~50% area scaling, ~50% iso-power performance improvement or ~70% iso-performance power reduction with respect to the 2nm node, fulfilling fundamental requirements for a competitive multi-node CMOS architecture beyond nanosheet. We demonstrate here for the first time a manufacturable sequential integration of multi-channel nanosheet-on-nanosheet NanoStack CMOS featuring ultra-scaled vertical inter-FET isolation.



#### T10-3 - 16:50

First Experimental Demonstration of Dual-sided N/P FETs in Flip FET (FFET) on 300 mm Wafers for Stacked Transistor Technology in Sub-1nm Nodes, H. Wu, W. Bu, Y. Ge, Y. Chu, J. Sun, J. Jin, Y. Wu, Y. Ren, F. Zhou, L. Zhang, J. Wu, M. Li, J. Kang, R. Wang, X. Zhang and R. Huang, Peking Univ., China

For the first time, dual-sided devices in FFET were successfully demonstrated on 300 mm wafers, realizing FFET's unique back-to-back stacking of a frontside (FS) NFET and a backside (BS) PFET. Key process modules were successfully developed, including wafer bonding, substrate thinning, channel profile optimization, lithography overlay correction, and BS performance tuning. The FS NFET behaved well after flipping. A decent BS PFET (LG of 30 nm, SS of 73.1 mV/dec, DIBL of 24 mV, and on-off ratio of 10<sup>7</sup>), comparable to the FS NFET, was also achieved. Because each side of the wafer is processed separately, further advantages of FFET were validated: a natural split gate, good multi-Vt tunability (~500 mV), and a better process margin without the challenging vertical patterning required in CFET. Moreover, dual-sided CMOS was also realized, proving FFET's excellent extendibility. With great benefits in process-friendliness, design flexibility, and scalability, FFET is a crucial candidate beyond 1nm.

#### T10-4 - 17:15

First Demonstration of Monolithic 3-Tier Nanosheet Transistor Stacking with Split Gate Featuring Tri-State Inverter/Half SRAM Functionalities, B.-W. Huang, Y.-Q. Liu, C.-W. Yao, W.-J. Chen, M.-K. Lin, X.-Y. Lin, C.-Y. Cheng, Y. Huang, D.-W. Lin, C.-H. Lu, T.-H. Tsai and C. W. Liu, National Taiwan Univ., Taiwan

Monolithic three-tier transistor stacks with GeSi nanosheets, split gate, and multiple pn-layer isolation are demonstrated for the first time. The split-gate process enables one common-gate CFET and one nFET underneath to achieve a 6T SRAM design in a 3T footprint. The tri-state inverter and half-SRAM (1PD/1PU/1PG) functionalities of the monolithic three-tier transistor stack are successfully demonstrated.

#### **Technology Session 11**

### **Ferroelectric Materials for Memory Applications**

Wednesday, June 11, 16:00-18:05

Chairpersons: S. Chang, PSMC

M. Kobayashi, The University of Tokyo

#### T11-1 - 16:00

Revealing Wake-Up Mechanism In Ultra-Thin Ferroelectric HZO: Domain De-Pinning Triggered by Oxygen Vacancy Annihilation Exhibiting Optimal Wake-Up Frequency, K. Ito, M. Takenaka, S. Takagi and K. Toprasertpong, The Univ. of Tokyo, Japan

We present a comprehensive new model of the wake-up mechanism in ferroelectric $Hf_{0.5}Zr_{0.5}O_2$ (HZO), a key challenge in reducing the operating voltage of sub-6 nm films. Strong frequency and temperature dependences suggest that thermally activated oxygen migration driven by electric-field cycling annihilates oxygen vacancies, resulting in domain de-pinning. Oxygen inside the HZO films is found to exhibit dispersive transport rather than conventional drift-diffusion transport. This insight shows that fast wake-up can be achieved at an optimum frequency that matches the oxygen migration oscillation frequency.

#### T11-2 - 16:25

Record-high *Pr* (2*Pr* > 40 μC/cm²) in 3 nm (Physical) Ferroelectric HZO Annealed at 450°C: High-T (85°C) Electrical Cycling and Oxygen Vacancy Engineering, Y. Feng\*.\*\*, X. Wang\* and X. Gong\*, \*National Univ. of Singapore, Singapore and \*\*Shandong Univ., China

A remarkable improvement in remnant polarization (2Pr), from below 5 to over 40 μC/cm², has been achieved in an ultra-thin 3 nm (physical) ferroelectric (FE) HZO layer, facilitated by increased oxygen vacancies and high-temperature electrical cycling (HTEC). Notably, this enhancement persists after returning to room temperature (RT). Systematic characterization attributes the record-high 2Pr of the physically 3-nm film to an oxygen-vacancy-assisted phase transition from the initially dominant tetragonal (t) phase to the FE orthorhombic (o) phase. First-principles calculations reveal that the increase in V<sub>O</sub><sup>2+</sup> during the HTEC process plays a key role in driving the t-to-o phase transition by shifting the free energies of the t and o phases, eventually making the o-phase more stable than the t-phase.

#### T11-3 - 16:50

Innovative Nb Electrode Engineering for Ultra-Low-Voltage ( $V_{op}$  = 0.8 V) Ferroelectric Memory with Record-High Energy Efficiency: Applications in Selector-Free FeRAM and Neuromorphic Computing, C.-H. Wu\*, C.-H. Chang\*\*, Y.-M. Tseng\*, T.-Y. Lin\*\*\*, K.-H. Kao\*\*, C.-J. Su\*\*\* and V. P.-H. Hu\*, \*National Taiwan Univ., \*\*National Cheng Kung Univ. and \*\*\*National Yang Ming Chiao Tung Univ., Taiwan

BEOL-compatible 5-nm $Hf_{0.5}Zr_{0.5}O_2$ (HZO) ferroelectric (FE) devices achieve a record energy efficiency, expressed as the 2Pr-to-operating-voltage ratio (2Pr/$V_{op}$), of 43, delivering a 2Pr of 34.4 μC/cm² at an ultra-low $V_{op}$ of ±0.8 V through Nb electrode engineering. The Nb approach demonstrates remarkable reliability, with only a 4.5% 2Pr loss after 10 years at 85 °C and an estimated 10.8% degradation over 1E15 operation cycles. Furthermore, the Nb-integrated selector-free FeRAM exhibits nearly disturbance-free operation (1%) and outstanding potential for neuromorphic computing, characterized by superior linearity and a high symmetry factor.
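
As a quick consistency check on the T11-3 metric, dividing the quoted 2Pr (34.4 μC/cm²) by the quoted $V_{op}$ (0.8 V) reproduces the ratio of 43. The helper name below is illustrative and does not come from the paper.

```python
def pr_to_vop_ratio(two_pr_uc_per_cm2: float, v_op_volts: float) -> float:
    """Return 2Pr / V_op in uC/(cm^2 * V); illustrative helper, not from the paper."""
    if v_op_volts <= 0:
        raise ValueError("operating voltage must be positive")
    return two_pr_uc_per_cm2 / v_op_volts


ratio = pr_to_vop_ratio(34.4, 0.8)  # both inputs quoted in the abstract
print(f"2Pr/V_op = {ratio:.1f} uC/(cm^2*V)")  # prints 2Pr/V_op = 43.0 uC/(cm^2*V)
```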



#### T11-4 - 17:15

Stacked AFE-Like/FE HZO (4.5nm) to Achieve 0.75V Operating Voltage and Record Endurance Exceeding 7E12 Using Water Quenching and TiN Top Electrodes, J.-Y. Liang, Y.-R. Chen, G.-H. Chen, Y.-W. Chen, Y.-A. Chen, B.-H. Yu, Y.-J. Chen and C. W. Liu, National Taiwan Univ., Taiwan

Ultra-low operating voltage ($V_{op}$) and highly reliable metal-ferroelectric-metal (MFM) capacitors utilizing stacked AFE-like (A-L)/FE layers with water quenching are demonstrated. The stacked A-L/FE shows a 22% lower 2Vc (1.4 V) in major loops than our previously reported beta-W/FE/beta-W [1] with the same 4.5 nm thickness and a similar $2P_r$ (57 vs 58 $\mu C/cm^2$). Water quenching further reduces 2Vc by 14% (to 1.2 V) owing to a smaller grain size than with slow cooling. The TiN top electrode is used to preserve the A-L layer in the stacked structure after wake-up. With a 0.75 V $V_{op}$ in minor loops, the TiN/A-L/FE/beta-W capacitor with water quenching reaches a record fatigue-free endurance exceeding 7E12 cycles with a $2P_r$ of 30 $\mu C/cm^2$.
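
The 2Vc figures in T11-4 can be cross-checked with simple percentage arithmetic. This sketch assumes both percentages are relative reductions. The prior device's 2Vc (~1.79 V) is back-calculated here and is not quoted in the abstract.

```python
STACKED_2VC = 1.4          # V, stacked A-L/FE, major loop (quoted)
REDUCTION_VS_PRIOR = 0.22  # 22% lower than beta-W/FE/beta-W (quoted)
WATER_QUENCH_CUT = 0.14    # further 14% reduction from water quenching (quoted)

# Back-calculated, not quoted: 2Vc of the earlier beta-W/FE/beta-W device.
prior_2vc = STACKED_2VC / (1 - REDUCTION_VS_PRIOR)
# Forward-calculated: should land near the quoted 1.2 V.
water_quenched_2vc = STACKED_2VC * (1 - WATER_QUENCH_CUT)

print(f"implied prior 2Vc ~ {prior_2vc:.2f} V")             # ~1.79 V
print(f"water-quenched 2Vc ~ {water_quenched_2vc:.2f} V")  # ~1.20 V
```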

#### T11-5 - 17:40

First Demonstration of Annealing-free RT-prepared AlScN Film with Large Polarization ( $2P_r > 300 \mu C/cm^2$ ) and Ultra-Sharp  $E_c$  Distribution for 0T1C FeRAM, X. Zhao\*, J. Yu\*, Y. Li\*, Y. Wang\*\*, F. Cao\*, Y. Cheng\*\*, Y. Wei\*, H. Jiang\*, Q. Liu\* and M. Liu\*, \*Fudan Univ. and \*\*East China Normal Univ., China

Selector-free 0T1C FeRAM requires ferroelectric (FE) films with high polarization ($2P_r$) and a sharp coercive field ($E_c$) distribution. In this work, we present the first demonstration of a 1kb 0T1C FeRAM based on an AlScN FE film optimized through electrode engineering and interface optimization. An annealing-free, room-temperature (RT)-prepared AlScN epitaxial film with high-quality (001) orientation is achieved, resulting in a large $2P_r > 300\ \mu C/cm^2$ and an ultra-sharp $E_c$ distribution (record alpha of 0.045). The sharp $E_c$ provides superior disturb immunity under the $V_{dd}/2$ read scheme for array operation, even at an elevated 300°C. Furthermore, our circuit model suggests that the optimized AlScN FE film can support a much larger array size than other ferroelectrics, thanks to its enhanced $2P_r$ and sharper $E_c$ distribution. This work underscores the great potential of AlScN-based FE films for high-capacity emerging memory technologies.
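
The sketch below illustrates why a sharp $E_c$ distribution matters for a $V_{dd}/2$ read scheme. It computes per-cell voltages in a small array under conventional V/2 biasing: the selected word line is at $V_{dd}$, the selected bit line is at 0, and all other lines are at $V_{dd}/2$. This biasing convention, the helper names and the coercive-voltage values are assumptions for illustration, not details taken from the paper. The key point is that half-selected cells see $V_{dd}/2$, so every cell's coercive voltage must stay above that level.

```python
def half_v_bias(rows: int, cols: int, sel_row: int, sel_col: int, vdd: float):
    """Voltage across each capacitor (WL minus BL) under a conventional V/2 scheme.

    Assumed biasing (textbook scheme, not taken from the paper): selected WL at
    vdd, selected BL at 0, all unselected lines at vdd/2.
    """
    wl = [vdd if r == sel_row else vdd / 2 for r in range(rows)]
    bl = [0.0 if c == sel_col else vdd / 2 for c in range(cols)]
    return [[wl[r] - bl[c] for c in range(cols)] for r in range(rows)]


def disturb_free(coercive_voltages, vdd: float) -> bool:
    """True if every cell's coercive voltage exceeds the half-select stress vdd/2."""
    return min(coercive_voltages) > vdd / 2


bias = half_v_bias(3, 3, sel_row=1, sel_col=1, vdd=2.0)
for row in bias:
    print(row)
# [0.0, 1.0, 0.0]
# [1.0, 2.0, 1.0]
# [0.0, 1.0, 0.0]

# Hypothetical coercive voltages (V) for two small cell populations.
print(disturb_free([1.2, 1.3, 1.25], vdd=2.0))  # True: all above 1.0 V
print(disturb_free([0.9, 1.3, 1.25], vdd=2.0))  # False: one cell below 1.0 V
```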



## **Circuits Session 21**

### **Innovations in Brain State Classification**

Thursday, June 12, 8:30-10:10

Chairpersons: J. Yoo, Seoul National University

S. Brink, Texas Instruments

#### C21-1 - 08:30

PANDA: A 3.178 TOPS/W Reconfigurable Seizure Prediction And Detection Neural Network Accelerator for Epilepsy Monitoring, S. Qiu\*, X. Song\*, X. Song\*, X. Song\*, X. Song\*, J. Yang\*\*, W. Wang\*\*\*, Y. Yang\*.\*\*\* and H. Jiao\*, \*Peking Univ., \*\*Nanfang Hospital of Southern Medical Univ., \*\*\*Southern Univ. of Science and Technology and \*\*\*\*Peking Univ., China

PANDA, a reconfigurable seizure prediction and detection neural network accelerator, is presented. A lightweight two-stage seizure monitoring framework with temporal neural network splitting is proposed to be deployed on PANDA. Channel first-output stationary dataflow with zero activation skipping and weight cache with statistical information are employed for higher energy efficiency. A flexible instruction set is defined to make PANDA highly configurable. For seizure monitoring, PANDA achieves up to 99% sensitivity, 0.43/h false alarm rate (FAR), and 3.178 TOPS/W energy efficiency.

#### C21-2 - 08:55

A Closed-Loop Neuromodulation Chipset with 0.0009mm²-0.36µW/Ch Recording Frontend and 0.075mm²-6.76µW Seizure Classification Backend, X. Huang, H. Yassin, B. Lafuente Alcazar, A. Akhoundi and D. G. Muratore, Delft Univ. of Technology, Netherlands

This work presents a closed-loop neuromodulation chipset integrating 64 analog frontends (AFE) featuring a novel IDAC-embedded OTA, a lightweight feature extraction unit, a Sparse Projection Oblique Randomer Forest (SPORF) classifier, and 4 high-voltage (HV) compliant current stimulators. Each AFE occupies $0.0009 \text{mm}^2$ and consumes $0.36 \mu\text{W}$, the smallest area and lowest power reported to date for ECoG recording (≤1kHz BW). Verified with the CHB-MIT and ETHZ databases, the digital backend (DBE) occupies $0.075 \text{mm}^2$, consumes $6.76 \mu\text{W}$ and achieves event-based sensitivity and specificity of 94.93%/98.58% and 99.55%/99.97%, respectively, in patient-specific scenarios.

#### C21-3 - 09:20

A No-Patient-Data Seizure Classifier SoC for Real-Time Classification of Seven Seizure Types Using Feature Fusion and Near-Memory Computing, V. Lukito\*, E. Choi\*, S. Lee\*, J. Koo\*, I.-J. Chang\*\*, S. Ha\*\*\* and M. Je\*, \*KAIST, \*\*Kyunghee Univ., Korea and \*\*\*New York Univ. Abu Dhabi, United Arab Emirates

The proposed multi-seizure-type classifier (MSTC) SoC is the first to perform on-chip classification of multiple seizure types, integrating frequency-based feature fusion (FF) and digital ternary near-memory computing (NMC) to achieve a classification accuracy of 91.63% with 24.72 µs computational latency. The MSTC SoC is also the first true zero-shot seizure detection system, requiring no patient data for retraining and achieving 93.07% sensitivity and 94.23% specificity.

#### C21-4 - 09:45

A 32-Channel 196-µW Logarithmic SoC for Brain Network Connectivity Extraction and Adaptive Psychiatric Symptom Classification, D. Alex\*, A. Yadav\*, J. Joo\*, U. Shin\*.\*\*, A. Afzal\*, J. Liu\*, G. Diehl\*\*\*, A. S. Widge\*\*\* and M. Shoaran\*, \*Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland, \*\*Apple, Inc. and \*\*\*Univ. of Minnesota, USA

This paper presents a 32-channel brain-computer interface SoC for closed-loop deep-brain stimulation in psychiatric disorders. The SoC integrates a 32-channel analog front-end, an efficient logarithmic feature extraction engine, a neural additive model classifier with online update, and a multimode stimulation controller. Fabricated in 65nm CMOS, the 5.46 mm² SoC consumes 6.14 µW/channel, enabling accurate, low-power, and adaptive detection of psychiatric states.
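
The per-channel figure is consistent with the 196 µW in the title: 32 channels at 6.14 µW each give about 196 µW in total. This is simple arithmetic on the quoted values.

```python
CHANNELS = 32                 # quoted channel count
POWER_PER_CHANNEL_UW = 6.14   # quoted per-channel power, in microwatts

total_uw = CHANNELS * POWER_PER_CHANNEL_UW
print(f"total ~ {total_uw:.2f} uW")  # total ~ 196.48 uW, matching the 196-uW title
```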

## **Circuits Session 22**

### **OTP and Nonvolatile Memory**

Thursday, June 12, 8:30-10:10

Chairpersons: S. Takase, KIOXIA Corporation

C. O'Connell, TSMC NA

#### C22-1 - 08:30

A 2nm Gate-All-Around 128Kb Anti-Fuse One-Time Programmable Memory Featuring Dynamic Bit-Line and Sense-Amplifier Offset Cancellation, S. Shin, J. Sin, J. Lee, G. Kwon, S. Lee, K. Lee, S. Ki, H. An, M. Kim, H. Bang, J. Jung and S. Baek, Samsung Electronics Co., Ltd., Korea

Decreasing read sensing margin in non-volatile memory (NVM) has become a major challenge at advanced nodes due to device down-scaling, lower VDD, and higher gate leakage. We propose a dynamic bit-line (BL) and sense-amplifier (SA) offset cancellation read scheme for anti-fuse one-time programmable (OTP) memory that cancels both offsets simultaneously to increase the read sensing margin. We fabricated, for the first time, a 128Kb anti-fuse OTP in a CMOS logic-compatible 2nm gate-all-around (GAA) process, and it is silicon-proven with 100% yield.



#### C22-2 - 08:55

GAA Backside-Power eFuse with 0.72µm² Bitcell, 1.59V Field Program, On-Demand Read and 1.8V Standby, Z. Chen, A. Sanne, R. Ma, Y.-f. Chang, S. A. Hutchins, M. M. Hasan, S. H. Kulkarni and E. A. Karl, Intel Corp., USA

A 4Kb eFuse IP featuring the smallest 0.72µm² bit cell is presented in the first gate-all-around (GAA) and backside power delivery 18A-class process. A low-leakage integrated power switch is developed to support an always-on 1.8V rail, with power-sequence-independent nominal and 1.8V rails. The IP enables 1.59V charge-pump-free in-field programming and 0.58V on-demand read over 1e9 cycles from -40°C to 125°C with a raw bit fail rate <3 DPM (defects per million), achieving 100% post-repair die yield with up to one billion bits per die.

#### C22-3 - 09:20

A 4.2 Gb/s 5<sup>th</sup> generation F-chip of Toggle 5.1 Specification with All-Path Speed Boosting Scheme and SCA Protocol for High Density NAND Flash Applications, K. Park, J. Jeong, S.-J. Park, D. Lee, S. Lee, S. Kang, Y. Kang, S. Ko, Y. Kim, J.-H. Boo, M. Lee, E. Lee, G. Cha, Y. Kim, J. Lee, S. Kang, I. Cho, M. Lee, Y. Kim, S.-J. Go, M. K. Jung, K. Kim, E. S. Lee, Y. Jo, J.-J. Park, C. Yoon, M. Kwak, H. Oh and S. Won, Samsung Electronics Co., Ltd., Korea

This paper introduces a 4.2 Gb/s 5<sup>th</sup> generation F-chip operating at a 1.1V supply voltage. To achieve high-speed performance even in high-capacity memory packages, a speed boosting scheme is proposed to ensure optimal write and read performance. This scheme includes techniques for correcting the duty cycle, offset and skew of the DQ and DQS signals. To mitigate device leakage, a whole-chip power gating technique is also introduced. In addition, an SCA converter design is presented to overcome command speed bottlenecks and improve I/O efficiency. The proposed work achieves a 37% reduction in energy/bit with a 1.4x speed improvement, and a 92% reduction in standby current through power gating.

#### C22-4 - 09:45

A Prototype 16Mbit RRAM on 55nm BCD with 56% Compact-Area Wordline Driver and Constant Write-Current Scheme for Automotive 150°C Operation, K. Santo\*, M. Nakayama\*, R. Mochida\*, H. Suwa\*, M. Tamura\*, A. Mangyo\*, K. Nii\*, Y.-C. Lin\*\*, C. F. Wang\*\*, W.-C. Tsai\*\*, Y.-D. Chih\*\* and T.-Y. J. Chang\*\*, \*TSMC Design Technology, Japan and \*\*TSMC, Taiwan

A prototype 16Mbit embedded RRAM macro is implemented in 55nm BCD technology targeting automotive applications. The macro implements a compact wordline driver (WLDRV) circuit using thin-oxide 1.2V core MOS and thick-oxide 5V MOS devices, reducing area without a speed penalty. To support automotive 150°C operation, it also uses master and local current limiters and local write drivers with wide-range temperature compensation. This realizes constant cell current anywhere on a die and stable operation at 150°C. The macro density is 2.56 Mbit/mm², and 150°C operation with 1K endurance has been achieved.
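
The calculation below assumes that the 2.56 Mbit/mm² density is computed over the full macro area. Under that assumption, the implied area of the 16Mbit prototype follows directly. This is an inference, not a value quoted in the abstract.

```python
MACRO_CAPACITY_MBIT = 16.0     # quoted macro capacity
DENSITY_MBIT_PER_MM2 = 2.56    # quoted macro density

# Assumption: density = capacity / total macro area (including periphery).
area_mm2 = MACRO_CAPACITY_MBIT / DENSITY_MBIT_PER_MM2
print(f"implied macro area ~ {area_mm2:.2f} mm^2")  # implied macro area ~ 6.25 mm^2
```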

## **Circuits Session 23**

### **Innovative Computing Systems**

Thursday, June 12, 8:30-10:10

Chairpersons: Y. Lee, KAIST

F. Sheikh, Altera, An Intel Company

#### C23-1 - 08:30

A 0.71nJ, 1.53GS/s Throughput 256-FFT Using Floating Point Analog Computation, J. Chang, J. Lee, Y. Shen, S. Yun, Q. Zhang, D. Sylvester, H.-S. Kim and D. Blaauw, Univ. of Michigan, USA

We propose a 256-point FFT engine using the first analog floating-point implementation. The proposed method encodes the mantissa in both voltage and pulse width, together with a digital 4-bit exponent. It features automatic calibration to cancel process variation and attains precision and dynamic range equivalent to 12.3-bit digital fixed point. Implemented in 22nm CMOS, the chip demonstrates the first reported sub-nJ energy/FFT (0.71 nJ) at a high throughput of 1.53 GS/s.

#### C23-2 - 08:55

An OFDMA Baseband Processor Enabling 165µW Long-Range IoT Localization, A. Bejarano-Carbo, C.-W. Tseng, D. Komma, P. Abillama, Z. Fan, D. Sylvester, H.-S. Kim and D. Blaauw, Univ. of Michigan, USA

We present an OFDMA baseband localization processor fabricated in 22nm, co-designed with a low-power, crystal-less narrowband RF receiver chip, that efficiently estimates the channel frequency response in real time for localization. The localization processor consumes 1.52/2.04mW for 2D/3D localization processing and duty-cycles the RF receiver, achieving 247nW standby power. The combined system achieves a localization distance of 430m, a 4.3x improvement over the state of the art, while maintaining comparable accuracy. We deploy this system in a $1 \times 2.8$ cm IoT localization tag with 165µW average power at a 0.1 Hz location acquisition rate.

#### C23-3 - 09:20

A 28nm 84.9KOPS 1.82μJ/op RISC-V Crypto-SoC with Primitive-based Deep-coupling Unified Post-Quantum Engine, J. Lu, D. Liu, J. Zhang, T. Huang, K. Li, C. Wu, L. Chen, A. Hu, Z. Luo and X. Zou, Huazhong Univ. of Science and Technology, China

This paper presents the first RISC-V cryptographic System-on-Chip (SoC) that supports the latest NIST-released Federal Information Processing Standards (FIPS) for post-quantum cryptography (PQC). It has three key features: 1) an RV-dedicated deep-coupling post-quantum engine with dual-rail parallel scheduling; 2) a vectorized instruction-driven computational workflow; and 3) a primitive-based fine-grained operator design for multi-scheme reconstruction. The proposed crypto-SoC achieves a throughput of 84.9KOPS and an energy efficiency of 1.82e-6J/op in 28nm, demonstrating 1.22-5.84x and 2.41-4.68x improvements in throughput and energy efficiency, respectively, compared to state-of-the-art designs.



#### C23-4 - 09:45

Enabling Privacy-Preserving Collective Intelligence: A Twin In-Memory Encryption/ Processing Macro Featuring Group Differential Privacy and Spatial-Temporal Ensemble, B.-C. Chiou\*.\*\*, C.-S. Lin\*.\*\*, Y.-T. Ho\*, Y.-H. Lin\*, I.-T. Wu\*, K.-H. Tseng\*, C.-M. Lai\*, P.-I. Mei\*, S.-H. Li\*, S.-M. Chang\*, S.-S. Sheu\*, W.-C. Lo\*, S.-C. Chang\* and T.-H. Hou\*\*, \*Industrial Technology Research Institute and \*\*National Yang Ming Chiao Tung Univ., Taiwan

Privacy protection is a critical concern in AI, particularly in collective AI systems such as AIoT networks and federated learning. To address correlation attacks involving multiple data sources, this study presents a 22 nm twin in-memory encryption/processing (TIME) macro that integrates group differential privacy (GDP) and spatial/temporal ensemble (STE) techniques. This design enables efficient privacy protection while maintaining high accuracy, achieving 82% prediction accuracy for federated vital signs monitoring at a privacy protection strength of 2.3. With a 1Mb capacity, the system delivers a 10.49 TOPS throughput and achieves an energy efficiency of 5827 TOPS/W, highlighting its strong potential for privacy-preserving collective AI applications.

## **Technology Session 12**

### **Oxide Semiconductors 2: IWTO and In<sub>2</sub>O<sub>3</sub>**

Thursday, June 12, 8:30-10:10

Chairpersons: K. Tateiwa, TPSCo

J. Cai, tsmc

#### T12-1 - 08:30

First Demonstration of BEOL-compatible ALD-deposited 2 nm-thick Indium-Tungsten-Tin-Oxide (IWTO) TFTs with Superior Short-channel Electrical Characteristics: Achieving Enhancement-mode V<sub>TH</sub>, I<sub>ON/OFF</sub>> 10<sup>10</sup>, SS ~ 63.3 mV/dec., T.-C. Chiang\*, Y.-C. Chang\*, C.-R. Huang\*\*, C.-H. Hsu\* and P.-T. Liu\*, \*National Yang Ming Chiao Tung Univ. and \*\*National Tsing Hua Univ., Taiwan

This work successfully demonstrates ALD-derived ultra-thin amorphous indium-tungsten-tin-oxide (a-IWTO) TFTs with a channel thickness of 2 nm. Two distinct a-IWTO compositions were systematically investigated, resulting in devices with exceptional electrical performance including a high on/off current ratio exceeding 10<sup>10</sup>, a low subthreshold swing of 63.3 mV/dec, a small DIBL of 37.8 mV/V, enhancement-mode operation, and negligible Vth roll-off following oxygen annealing at a short channel length of 70 nm. These results demonstrate the competitive performance of IWTO TFTs compared to state-of-the-art amorphous oxide semiconductor devices.

#### T12-2 - 08:55

ALD Polycrystalline Ga-Doped In<sub>2</sub>O<sub>3</sub> (Poly-IGO) Nanosheet Exceeding Intrinsic Mobility of 120 cm<sup>2</sup>/Vs for Process-Friendly BEOL-Compatible FET Application, T. Takahashi\*, T. Hoshii\*\*, Y. Tsuruma\*\*\*, M. Sunagawa\*\*\*, S. Tomai\*\*\*, J. Park\*\*, H. Tamamoto\*\*, K. Kakushima\*\* and Y. Uraoka\*, \*Nara Institute of Science and Technology, \*\*Institute of Science Tokyo and \*\*\*Idemitsu Kosan Co., Ltd., Japan

We have evaluated ALD polycrystalline Ga-doped $In_2O_3$ (poly-IGO) nanosheets and investigated their thickness scaling and electron transport properties through FET operation. Ga doping of $In_2O_3$ enhances FET performance as well as patterning properties. A 5-nm-thick poly-IGO showed a high mobility of 120 cm²/(Vs) with a low contact resistivity of $9.2 \times 10^{-7}$ Ω·cm². The poly-IGO exhibited excellent thickness scaling down to ~3 nm while retaining an intrinsic mobility of 100 cm²/(Vs). The mobility of poly-IGO is predominantly degraded by tensile lattice strain. The strain-free lattice obtained in thin-channel (<10 nm) ALD poly-IGO nanosheets is advantageous for 3D-integrated devices.

#### T12-3 - 09:20

Critical Role of Quantum Confinement on Transfer Length in Achieving High-Performance In<sub>2</sub>O<sub>3</sub> Transistors with Ultra-Scaled Contacted Gate Pitch, J.-Y. Lin, C. Niu, Z. Lin, C. Liu, J. Lu, H. Wang and P. D. Ye, Purdue Univ., USA

In this work, for the first time, we study the contact length ($L_{\text{C}}$) and contacted gate pitch (CGP) scaling in ultrathin $In_2O_3$ field-effect transistors (FETs). A large 53% decrease in transfer length ($L_{\text{T}}$), from 76 to 36 nm, is observed when the $In_2O_3$ channel thickness ($T_{\text{ch}}$) is increased from 1.2 nm to 2.0 nm. This can be explained by the positive-to-negative Schottky barrier height transition modulated by the quantum confinement (QC) effect in the $In_2O_3$ channel. Leveraging the record-low $L_{\text{T}}$ of 36 nm enabled by QC optimization, 2.0 nm $In_2O_3$ FETs demonstrate a record-low contact resistance ($R_{\text{C}}$) of 140 Ω·µm and a record-high maximum drain current of 1.57 mA/µm at an ultra-scaled CGP of 80 nm among all reported oxide semiconductor FETs with CGP scaling.
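
Two small calculations frame T12-3. The first checks the quoted 53% $L_T$ reduction. The second uses the standard transmission-line model (TLM), in which contact resistance scales with $\coth(L_C/L_T)$. It shows why a shorter $L_T$ matters when the contact length shrinks with CGP. The 20 nm contact length is a hypothetical value, not one reported in the paper.

```python
import math


def percent_decrease(old: float, new: float) -> float:
    """Relative decrease from old to new, in percent."""
    return 100.0 * (old - new) / old


def contact_penalty(l_c_nm: float, l_t_nm: float) -> float:
    """TLM ratio R_C(L_C) / R_C(L_C -> infinity) = coth(L_C / L_T)."""
    return 1.0 / math.tanh(l_c_nm / l_t_nm)


print(f"L_T decrease: {percent_decrease(76, 36):.1f}%")  # L_T decrease: 52.6%

L_C_NM = 20.0  # hypothetical contact length, not from the paper
for l_t in (76.0, 36.0):
    print(f"L_T = {l_t:.0f} nm -> R_C penalty {contact_penalty(L_C_NM, l_t):.2f}x")
```

A smaller $L_T$ keeps the coth factor closer to 1. In other words, it reduces the contact-resistance penalty at a given short contact length.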

#### T12-4 - 09:45

First Demonstration of 2-floor GAA In<sub>2</sub>O<sub>3</sub> Nanosheet FET Enabled by TiN Sacrificial Layers and Fluorine Passivation, Y.-S. Wu, Y.-M. Liu, H.-M. Sung, R.-W. Ma, J. Gracia, H. Fujiwara, K.-W. Lu and C. W. Liu, National Taiwan Univ., Taiwan

A true two-floor stacked PEALD (plasma-enhanced atomic layer deposition) $In_2O_3$ gate-all-around nanosheet FET is demonstrated for the first time, enabled by novel metallic sacrificial layers (MSLs, TiN). The MSLs connect the stacked S/D and form $n^+$ regions to improve contact resistance. Channel release by SF<sub>6</sub> can shift $V_T$ positively by passivating oxygen vacancy defects. The gate stack is deposited all at once by ALD to realize the GAA structure. All process temperatures are below 300°C, ensuring back-end-of-line (BEOL) compatibility. The two-floor GAA NSFET achieves a high drive current ($I_{on}$) of $146\mu A/\mu m$ at $V_{OV}=V_{DS}=1V$, $I_{on}/I_{off}>10^7$, and high positive-bias-stress stability, with a $V_T$ shift as small as 60mV at $V_{OV}=3.6V$ for 1000 s. Furthermore, adding IGZO into the $In_2O_3$ channels achieves a positive $V_T$ of 1.6V with an $I_{on}$ of $77\mu A/\mu m$ at $V_{OV}=V_{DS}=1V$. The stacked NS architecture enables oxide semiconductor FETs to overcome $I_{on}$ limitations due to low mobility.



## **Technology Focus Session 2**

### **Advanced Transistor Evolution in the Next Decade**

Thursday, June 12, 8:30-10:10

Chairpersons: T. Hou, National Yang Ming Chiao Tung University

S. Tsai, IBM

#### TFS2-1 - 08:30 (Invited)

Assessment on Nanosheet Transistor Variants Beyond 2nm Node, C.-H. Chang, P. C. Shen, V. S. Chang, K. T. Lai, C. S. Liang, B. F. Wu, T. C. Lin, J. A. Ng, C. Y. Chen, C. J. Lin, J. H. Lu, F. H. Su, C. P. Tsao, C. T. Li, Y. H. Chen, C. H. Hsieh, Y. H. Tseng, P. N. Chen, T. L. Lee, R. Chen, M. C. Chang, K. B. Huang, C. O. Chui, K. S. Chen, C. C. Chen, Y. Ku, S. M. Jang and S.-Y. Wu, TSMC, Taiwan

Nanosheet (NSH) transistor benefits from its gate-all-around nature and the freedom to reduce channel thickness without pattern collapse as in FinFET to achieve the required SCE for Lg scaling. The possibility of stacking extra channel layers to increase effective channel width and the improved immunity to gate extension effect provide NSH and its variant forksheet (FSH) with a distinct advantage in aggressive cell height (CH) scaling. NSH is thus the next architecture adopted before complementary field-effect transistor (CFET) is ready. This paper explores the issues encountered in CH scaling and potential integration strategies to advance NSH and FSH beyond 2nm node.

#### TFS2-2 - 08:55 (Invited)

Beyond RibbonFET: Energy Efficiency Innovations to Drive Technology and Design for the Next Decade, C.-H. Lin, B. Greene, H. Li, R. Kim, J. Kavalieros and T. Ghani, Intel Corp., USA

As the semiconductor industry adopts the major technology advances of RibbonFET and Backside Power, innovation continues unabated to build upon these breakthroughs and drive ever more performance, energy efficiency, and area scaling well into the future. Transistor geometry, materials, and connectivity improvements dominate near-term silicon roadmaps, but incorporating new transistor physics to enable sub-0.3V operating voltages will be needed once the classical Gate-All-Around (GAA) era reaches full maturity. This paper explores the potential process and architecture elements that can enable energy-efficient future technology nodes.

#### TFS2-3 - 09:20

3D Stacked FET (3DSFET) Logic and SRAM Technology Featuring Single Diffusion Break (SDB) and Back Side Interconnect (BSI) at 48 nm CPP for Advanced Mobile and High Performance Computing (HPC) Applications, D. Ha, J. Park, J. Park, S. Han, D. Yeon, M. Kim, S. Park, J. Yun, K. Hwang, J. Park, J.-W. Yang, S. Lee, J. W. Jeong, C. Yun, J. Bae, D. Huh, H. Choi, S. Baik, S. Ji, H. Park, J. Seo, J. Koo, Y. Choi, H. Yoon, A. Kim, S. Son, S. Han, S. Lee, G. Kim, S. Han, S. H. Lee, S. W. Park, S. Hyun, S. J. Ahn and J. Song, Samsung Electronics Co., Ltd., Korea

For the first time, we report an experimental demonstration of highly scalable 3D stacked FET (3DSFET) logic with single diffusion break (SDB) and an SRAM bit-cell at 48 nm CPP. Key features, compared to our 3nm node, include: an nFET gate-all-around multi-bridge channel (GAA MBCFET™) vertically stacked on a pFET GAA MBCFET™, reducing logic standard cell height by > 50%; SDB, reducing logic block area by > 10% without pitch scaling; and self-aligned direct backside contact (SA-DBC) and back side interconnects (BSI) for power and signal lines with relaxed metal pitch. Moreover, it can provide a > 55% smaller SRAM bit-cell with a read static noise margin (SNMR) and write margin (WRM) of 180 mV and 250 mV at VDD = 0.7 V, and 60 mV and 50 mV at VDD = 0.3 V, respectively. At the same time, SRAM bit-cell leakage decreases by 15.4% at 125°C, owing to unique structural benefits that fully suppress leakage current sources such as sub-nanosheet, n-p isolation, GIDL and junction leakage.

#### TFS2-4 - 09:45

Compressive Diffusion Break Stressor for Gate-All-Around Nanosheet pFET Transistor Performance Improvement, S. Hung\*, S. Mochizuki\*\*, A. Pal\*, X. He\*\*, C. Zhao\*, E. Bazizi\*, H. Zhou\*\*, J. Li\*\*, A. Tariq\*\*, A. Gasasira\*\*, V. Chen\*, H. Chen\*, A. Londono Calderon\*, B. Peethala\*\*, P. Anekal\*, N. Loubet\*\*, B. Colombeau\* and B. Haran\*, \*Applied Materials, Inc. and \*\*IBM Albany NanoTech, USA

In this work, we have developed a compressive SiN (c-SiN) diffusion break (DB) dielectric stressor for gate-all-around nanosheet (GAA-NS) transistors to improve pFET devices and reduce the intrinsic nFET/pFET performance offset in this technology. A significant amount of stress is induced in short-channel (SC) Si pFET devices through DB gate replacement using c-SiN with a customized treatment. We report an additional stress of ~700MPa, induced by c-SiN DB stressors, in SC NS pFET devices after channel release. This stress leads to a corresponding Ieff-Ioff performance benefit of 25% on pFET logic devices at scaled CPP, with no degradation of short-channel effects or reliability. As expected, the improvement is greater for devices closer to the c-SiN DB stressor, with a strong dependence on active length.



## **Technology Session 13**

### **Power Devices**

Thursday, June 12, 8:30-10:10

Chairpersons: M. Kanda, Toshiba Electronic Devices & Storage Corporation

C. Huang, Intel

#### T13-1 - 08:30

3-kV GaN Smart Power Integration Platform for High-Power-Density Conversion Systems Using Charge-Balanced Superjunction Technology, J. Yang\*, S. Liu\*, J. Yu\*, J. Cui\*, H. Chang\*, T. Li\*, Y. Lao\*, X. Yang\*, X. Liu\*\*, M. Wang\*, B. Shen\* and J. Wei\*, \*Peking Univ. and \*\*Tsinghua Univ., China

This work reports a 3-kV smart GaN power integration platform on a sapphire substrate based on charge-balanced superjunction (SJ) technology. The platform provides a variety of low-voltage (LV) components for control and drive circuitry, including resistors, capacitors, LV enhancement-mode HEMTs (LV E-HEMT), LV depletion-mode HEMTs (LV D-HEMT) and LV rectifiers. For the high-voltage (HV) power loop circuitry, the platform provides a 3-kV E-mode superjunction HEMT (HV SJ-HEMT) and a superjunction rectifier (HV SJ-rectifier). The proposed superjunction structure yields a 1.6-fold enhancement in breakdown voltage (BV) and a lower specific ON-resistance ($R_{SP}$). The 3 kV HV devices present a BV > 6.5 kV, leaving a large operating margin. The figure of merit (FOM = $BV^2/R_{SP}$) of the HV devices is among the best in the literature.
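
Because FOM = $BV^2/R_{SP}$, the quoted 1.6-fold BV enhancement alone implies at least a 2.56-fold FOM gain when $R_{SP}$ is held constant. The abstract states that $R_{SP}$ is also lower, so the actual gain should be larger. A minimal sketch of that lower bound, using normalized (unitless) values:

```python
def power_fom(bv: float, r_sp: float) -> float:
    """FOM = BV^2 / R_SP; units cancel when taking the ratio below."""
    return bv ** 2 / r_sp


BV_GAIN = 1.6  # quoted superjunction BV enhancement
# Lower bound: hold R_SP fixed (normalized to 1); a lower R_SP only raises the gain.
lower_bound_gain = power_fom(BV_GAIN, 1.0) / power_fom(1.0, 1.0)
print(f"FOM gain >= {lower_bound_gain:.2f}x")  # FOM gain >= 2.56x
```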

#### T13-2 - 08:55

Improving Irradiation Reliability of 4H-SiC 1200V LDMOS and 20V CMOS Logic Circuits with Leakage Current Blocking Technology, J. Ma\*, Y. Gu\*, M. Zhao\*, T. Nie\*, J. Wei\*, S. Li\*, R. Huang\*\*, S. Bai\*\*, S. Liu\*, W. Sun\* and L. Zhang\*, \*Southeast Univ. and \*\*Nanjing Electronic Devices, China

Replacing silicon integrated circuits (ICs) with SiC ICs can significantly enhance the radiation hardness of power electronic systems. High-voltage lateral DMOSFETs (LDMOS) and logic circuits are two key elements of large-scale power ICs. Under Total Ionizing Dose (TID) irradiation, positive trapped charge generated by the irradiation induces unexpected leakage paths, resulting in premature breakdown of the LDMOS and degraded output voltage swing of the logic circuits. Under single-event (SE) irradiation, increased leakage current leads to latch-up and subsequent device/circuit failure. An economical reinforcement technology with floating N+/P+ rings and P+ strips is proposed. With this technology, the breakdown voltage (*BV*) of the SiC LDMOS increases from 692V to 1250V at TID = 300kGy. For the reinforced logic circuit with full voltage swing, 3× and 1.44× improvements in TID and SE irradiation endurance are obtained, respectively, enabling robust operation in aerospace and nuclear applications.

#### T13-3 - 09:20

High Power/PAE (27.8dBm/66%) Emode GaN-on-Si MOSHEMTs for 5V FR3 UE Applications, A. Alian\*, S. Yadav\*, R. ElKashlan\*, A. Sibaja-Hernandez\*, H. Yu\*, S. Banerjee\*, B. O'Sullivan\*, B. Kazemi Esfeh\*, U. Peralagu\*, B. Parvais\*.\*\* and N. Collaert\*.\*\*, \*imec and \*\*Vrije Universiteit Brussel, Belgium

Enhancement-mode (Emode) GaN MOSHEMTs with high-polarization-charge-density barriers (AlN, InAlN) and regrown source/drain are investigated for low-voltage 6G FR3 power amplifiers. We demonstrate a high output power of ~27.8dBm (~1W/mm) and a PAE of 66% at 13GHz for 115 nm gate-length Emode MOSHEMTs using an InAlN top barrier and an AlGaN/cGaN composite back barrier (BB). Furthermore, a record-low total contact resistance of 0.024 Ω·mm is demonstrated by incorporating highly n-doped, selectively regrown (In)GaN layers on the GaN channel. According to TCAD simulations, this can further boost the output power to 2W/mm.
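
For reference, 27.8 dBm converts to roughly 0.60 W. If the ~1 W/mm figure refers to the same device, that implies a total gate width of roughly 0.6 mm. This is an inference, not a value stated in the abstract.

```python
def dbm_to_watts(p_dbm: float) -> float:
    """Convert dBm to watts: P[W] = 10**(P[dBm]/10) / 1000."""
    return 10 ** (p_dbm / 10) / 1000.0


p_out_w = dbm_to_watts(27.8)  # quoted output power
NORMALIZED_W_PER_MM = 1.0     # quoted as "~1 W/mm" (approximate)

# Inference only: total gate width implied if both figures describe one device.
implied_width_mm = p_out_w / NORMALIZED_W_PER_MM
print(f"P_out ~ {p_out_w:.2f} W, implied gate width ~ {implied_width_mm:.1f} mm")
# P_out ~ 0.60 W, implied gate width ~ 0.6 mm
```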

## **Circuits Session 24**

### **Circuit Techniques for Biomedical Applications**

Thursday, June 12, 10:30-12:10

Chairpersons: T. Tokuda, Institute of Science Tokyo

P. Nadeau, Analog Devices

#### C24-1 - 10:30

An Active Silicon Perforated MEA for Seamless 3D Organoid Interfacing with Low-Noise, Scalable Multimodal Electrophysiology, A. Rivero-Cortazar\*.\*\*, J. Aymerich\*, S. Faizan Shaikh\*, A. Lodi\*, B. Raducanu\*, G. Gielen\*.\*\* and C. Mora Lopez\*, \*imec and \*\*KU Leuven, Belgium

We present the first active silicon perforated MEA for 3D organoid interfacing, integrating CMOS electronics for low-noise, high-resolution recording, stimulation, and impedance spectroscopy (EIS) across 7 modalities. The MEA, fabricated in 130nm CMOS technology, features a scalable 256-island mesh with multiplexed operation, achieving low input-referred noise (9.1±1.5μV<sub>rms</sub>, 300Hz-10kHz) and low power (11.3μW per island). *In vitro* tests with cardiomyocytes demonstrate accurate recordings, network propagation mapping, and intracellular recordings via voltage stimulation. This perforated MEA offers unparalleled functionality and scalability for advancing organ-on-chip research.



#### C24-2 - 10:55

A 10kHz-BW, 86.7dB-SNDR, 176.8dB-FoM, LNA-Embedded CT ΔΣ ADC for Closed-Loop Neural Recording, S. Lee\*, J. Yoon\*, T. Jeon\*, D. Ahn\*, S. Yun\*\*, H. Y. Kim\*, J. Bae\*\*,\*\*\* and Y. Chae\*,\*\*\*, \*Yonsei Univ., \*\*Kangwon National Univ. and \*\*\*XO Semiconductor Inc., Korea

This paper presents an LNA-embedded CT ΔΣ ADC for closed-loop neural recording. The front-end LNA is embedded in the loop filter of the ΔΣ ADC to achieve low input-referred noise (IRN) and high tolerance to stimulation artifacts. A 25-level feedback RDAC is realized with a 12-tap tri-level FIR-DAC and helps to linearize the LNA, resulting in high linearity over a wide input range. The FIR-DAC's delay is compensated by a novel feedforward compensation scheme to maintain loop stability. The DC-coupled LNA is chopped at a low frequency (~100kHz) and provides high input impedance, low offset, and low 1/f noise without suffering from chopper artifacts. Implemented in a 65nm CMOS process, the ADC achieves an IRN of 62.5nV/√Hz, 86.7dB SNDR, 87.5dB DR, and 97.9dB SFDR while consuming only 11.8μW in a 10kHz bandwidth. This corresponds to a state-of-the-art FoM of 176.8dB.
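The quoted 176.8dB is consistent with the dynamic-range form of the Schreier figure of merit, FoM = DR + 10·log10(BW/P). A quick check using only the abstract's numbers, assuming that definition; the SNDR-based variant of the same formula gives about 176.0dB instead.

```python
import math

def schreier_fom(dr_db: float, bw_hz: float, power_w: float) -> float:
    """Schreier FoM in dB: dynamic range (or SNDR) plus 10*log10(BW / P)."""
    return dr_db + 10 * math.log10(bw_hz / power_w)

fom_dr = schreier_fom(87.5, 10e3, 11.8e-6)    # DR-based, about 176.8 dB
fom_sndr = schreier_fom(86.7, 10e3, 11.8e-6)  # SNDR-based, about 176.0 dB
print(f"FoM_DR = {fom_dr:.1f} dB, FoM_SNDR = {fom_sndr:.1f} dB")
```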

#### C24-3 - 11:20

38kbps Multi-Access Magnetoelectric Backscatter Communication with Non-Interrupted WPT for a Network of Miniature Wireless Bio-Implants, W. Wang, Y. Su, Y. Zou, H.-C. Liao, Y. Chang and K. Yang, Rice Univ., USA

This paper presents a time-division multiple-access (TDMA) magnetoelectric (ME) backscatter communication scheme exploiting the 2<sup>nd</sup> resonant frequency of the ME transducer, achieving 1) simultaneous ME uplink and wireless power transfer (WPT) with a 38kbps data rate and 0.57nJ/bit, doubling the data rate compared to existing ME uplink schemes; and 2) a WPT self-interference canceller (SIC) which, together with an "8"-shaped RX coil, reduces interference between TX and RX by 70dB, enabling backscatter uplink with a BER of 7×10<sup>-6</sup>.
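Energy per bit times data rate gives the average power spent on the uplink, and BER times data rate gives the average number of bit errors per second. A minimal sketch with the abstract's figures, assuming continuous transmission at the full 38kbps and independent errors at the quoted BER.

```python
rate_bps = 38e3            # uplink data rate
energy_per_bit = 0.57e-9   # J/bit
ber = 7e-6                 # measured bit error rate

uplink_power_w = rate_bps * energy_per_bit  # about 21.7 uW
errors_per_second = rate_bps * ber          # about 0.27 bit errors per second on average
print(f"{uplink_power_w * 1e6:.2f} uW, {errors_per_second:.3f} errors/s")
```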

#### C24-4 - 11:45

128-Channel Multi-Chip Acoustic Hologram Generator, J. Kustin\*, T. Kang\*\*, S. Song\*, Y. Naveed\* and M. P. Flynn\*, \*Univ. of Michigan, USA and \*\*Sungkyunkwan Univ., Korea

An ASIC combines a numerically controlled oscillator with reconfigurable amplitude and phase control of 16 channels to synthesize acoustic holograms with a phased array of ultrasonic transmitters. Multiple chiplets operate synchronously to drive arbitrarily many channels. A scalable digital delay line technique, leveraging on-chip SRAM and single-bit noise-shaped bitstreams, realizes area-efficient, high-density, and high-resolution true-time-delay phase shifts. The prototype system employs 8 chiplets to drive a 128-element array for hologram generation.
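The simplest acoustic hologram is a single focal point: each transmitter is delayed so that all wavefronts arrive at the focus together, which is the kind of true-time delay the on-chip delay lines realize. A minimal sketch, assuming a hypothetical 16-element linear array (one chiplet's worth of channels) with 10mm pitch, propagation in air at 343m/s, and a hypothetical 1MHz delay-line clock; none of these values come from the abstract, and a full hologram would use arbitrary per-channel delays rather than one focus.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumption: transducers radiate into air

def focus_delays(xs, focus, c=SPEED_OF_SOUND):
    """Delay (s) for each element at x-position xs[i] so that all wavefronts
    reach the focal point (fx, fz) at the same time."""
    fx, fz = focus
    dists = [math.hypot(fx - x, fz) for x in xs]
    d_max = max(dists)
    # The farthest element fires first (zero delay); nearer ones wait longer.
    return [(d_max - d) / c for d in dists]

def to_taps(delays, f_clk):
    """Quantize each delay to a whole number of delay-line clock periods."""
    return [round(tau * f_clk) for tau in delays]

# Hypothetical 16-element array, 10 mm pitch, centered on x = 0, focus 10 cm above.
xs = [(i - 7.5) * 0.01 for i in range(16)]
delays = focus_delays(xs, focus=(0.0, 0.10))
taps = to_taps(delays, f_clk=1e6)  # hypothetical 1 MHz delay-line clock
print(taps)
```

Because the focus sits on the array axis, the resulting delay profile is symmetric, with the center elements waiting longest.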

## **Circuits Session 25**

### **Advanced PLLs**

Thursday, June 12, 10:30-12:10

Chairpersons: K. Okada, Institute of Science Tokyo; M. Brox, Micron

#### C25-1 - 10:30

A Fractional-N Digital-PLL Based on a Power-Gated Ring-Oscillator and a Frequency-Stabilizing Loop Achieving 74fs Jitter Under 3mV<sub>pp</sub> Supply Ripple, M. Rossoni, R. Moleri, D. Lodi Rizzini, P. Salvi, S. Gallucci, G. Castoro, F. Tesolin, A. L. Lacaita, M. Dartizio and S. Levantino, Politecnico di Milano, Italy

A 10GHz fractional-N digital PLL is presented, with a power-gated ring oscillator (PGRO) that reduces the digital-to-time converter (DTC) range and a PGRO-frequency-stabilizing loop that mitigates supply sensitivity. The PLL achieves 74fs and 81fs rms jitter when a 3mV<sub>pp</sub> sinusoid and a 2.5mV<sub>rms</sub> white noise, respectively, are superimposed on the PGRO supply.
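Why a DTC is needed, and why its range matters, can be seen from a first-order fractional-N accumulator: with a division ratio N + α, the divider edge drifts from the reference edge by the accumulator residue times one VCO period, and the DTC must cancel that offset every cycle. A minimal sketch, assuming the DTC delays the divider edge (real designs may instead delay the reference); the fractional word α = 1/4 is hypothetical, while T_vco = 100ps follows from the 10GHz output.

```python
from fractions import Fraction

def dtc_delays(alpha: Fraction, t_vco: float, n_edges: int):
    """For a fractional-N ratio N + alpha, return the delay a DTC must add to
    each divider edge so that it lines up with the reference edge.

    A first-order accumulator tracks the fractional phase: when it wraps, the
    divider divides by N + 1 instead of N, and the leftover residue (in VCO
    periods) is the timing offset the DTC has to cover.
    """
    acc = Fraction(0)
    delays = []
    for _ in range(n_edges):
        acc += alpha
        if acc >= 1:
            acc -= 1  # overflow: this cycle divides by N + 1
        delays.append(float(acc) * t_vco)
    return delays

# 10 GHz output -> T_vco = 100 ps; alpha = 1/4 is a hypothetical fractional word.
print([round(d * 1e12, 1) for d in dtc_delays(Fraction(1, 4), 100e-12, 8)])
# -> [25.0, 50.0, 75.0, 0.0, 25.0, 50.0, 75.0, 0.0]
```

The required delay spans almost a full VCO period, which is the DTC range the PGRO is said to reduce.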

#### C25-2 - 10:55

A 58.9fs-Jitter Fractional-N Digital PLL Using a Double-Edge Variable-Slope DTC, D. Fagotti, S. M. Dartizio, F. Tesolin, R. Moleri, G. R. Trotta, M. Rossoni, S. Gallucci, P. Salvi, G. Castoro, D. Lodi Rizzini, A. L. Lacaita and S. Levantino, Politecnico di Milano, Italy

This work presents a fractional-N digital PLL using a power-efficient double-edge variable-slope digital-to-time converter (VS-DTC). Compared to a conventional VS-DTC, the double-edge VS-DTC consumes 1.65x less power while retaining the same linearity and noise. The implemented PLL achieves 58.9fs jitter, a -61.8dBc in-band fractional spur, and -110.5dBc/Hz phase noise at 10kHz offset, which compare favorably with the literature, while using a low 125MHz input reference frequency.

#### C25-3 - 11:20

A 6.4GHz Fractional-N PLL with 96.6fs<sub>rms</sub> Jitter and -257.4dB FoM, Z. Huang, S. Kong and F. Chen, The Hong Kong Univ. of Science and Technology (Guangzhou), China

This paper presents a 6.4GHz DTC-based fractional-N sampling PLL with 96.6fs<sub>rms</sub> jitter, -65dBc fractional spurs, and a -257.4dB FoM. A gated-LMS-based technique is used for background calibration of both DTC gain and nonlinearity, eliminating the need for an auxiliary DTC, which reduces power consumption and minimizes calibration errors. Additionally, an SPD is designed to double the sign-comparison accuracy, further reducing DTC calibration errors.
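The idea behind LMS-based DTC gain calibration can be illustrated with a plain sign-sign LMS loop: if the gain applied to the residue is wrong, the remaining timing error is correlated with the residue, so the product of their signs points toward the correct gain. This is a generic sketch, not the paper's gated-LMS scheme, and it does not model the SPD; the normalized true gain, step size, and jitter level are hypothetical.

```python
import random

def sign_sign_lms(true_gain=1.0, g0=0.5, step=1e-3, n_iter=20_000, noise=0.01, seed=0):
    """Background DTC-gain calibration with a plain sign-sign LMS loop.

    Each reference cycle, the DTC is driven with residue r scaled by the current
    gain estimate g. A gain error leaves a timing error (true_gain - g) * r at the
    phase detector, of which only the sign is observed (bang-bang detection).
    Correlating that sign with the sign of r tells the loop which way to move g.
    """
    rng = random.Random(seed)
    g = g0
    for _ in range(n_iter):
        r = rng.random() - 0.5                             # zero-mean residue
        err = (true_gain - g) * r + rng.gauss(0.0, noise)  # residual error plus jitter
        sign_err = (err > 0) - (err < 0)
        sign_r = (r > 0) - (r < 0)
        g += step * sign_err * sign_r
    return g

print(f"estimated gain: {sign_sign_lms():.3f}")
```

Only signs are used, so the hardware needs no multipliers; the price is a fixed step size that sets the residual dither around the correct gain.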



#### C25-4 - 11:45

A 55.8-to-64.2GHz, 58.3fs<sub>rms</sub>-Jitter, -250.2dB-FoM<sub>J</sub> Fractional-N Cascaded PLL in 28nm CMOS, J. Jung, E. Lee, D. Han, J. Wang, A. P. Chandrakasan and R. Han, MIT, USA

We present a V-band fractional-N cascaded PLL achieving an integrated jitter of 58.3fs<sub>rms</sub> and a FoM<sub>J</sub> of -250.2dB. A fully differential voltage-domain quantization (Q)-noise cancellation in the 1st-stage reference-sampling PLL suppresses the Q-noise of the delta-sigma modulator with <0.92LSB INL and <0.08mV resolution while consuming only 0.62mW. The 2nd-stage sub-sampling PLL features a 0.82mW FLL based on a switched-capacitor frequency-to-voltage converter, which addresses the harmonic-lock issue without a power-hungry mmWave frequency divider. The frequency re-acquisition time is within 420ns.
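Jitter and FoM are linked through power. If C25-3 and C25-4 use the common definition FoM<sub>J</sub> = 10·log10[(σ<sub>t</sub>/1s)²·(P/1mW)], the quoted pairs imply PLL powers of roughly 2mW and 28mW. These powers are derived here under that assumed definition; they are not stated in the abstracts.

```python
import math

def fom_jitter(jitter_s: float, power_w: float) -> float:
    """Jitter FoM in dB: 10*log10((jitter / 1 s)**2 * (power / 1 mW))."""
    return 10 * math.log10(jitter_s ** 2 * (power_w / 1e-3))

def implied_power_mw(jitter_s: float, fom_db: float) -> float:
    """Invert fom_jitter() to get the power (mW) implied by a quoted FoM."""
    return 10 ** ((fom_db - 20 * math.log10(jitter_s)) / 10)

# Jitter and FoM values quoted for C25-3 and C25-4; the powers are derived.
print(f"C25-3: {implied_power_mw(96.6e-15, -257.4):.2f} mW")
print(f"C25-4: {implied_power_mw(58.3e-15, -250.2):.1f} mW")
```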

## **Circuits Session 26**

### **Switching Regulators**

Thursday, June 12, 10:30-12:10

Chairpersons: S. Hong, Sogang University; A. Thomsen, Cirrus Logic

#### C26-1 - 10:30

A 9.1mW All-5V-CMOS Series-Capacitor AC-DC Converter with C<sub>F</sub> Reallocation Operations for 85-230V<sub>RMS</sub> Mains Achieving 85.6% Efficiency at 858mW/cm³ Density, J. Shi\*,\*\*, X. Mu\*, Q. Ma\*,\*\*, Y. Jiang\*, R. Martins\*,\*\*\* and P.-I. Mak\*, \*Univ. of Macau, \*\*UM Hetao IC Research Institute, China and \*\*\*Univ. of Lisboa, Portugal

This work presents an efficient switched-capacitor (SC) AC-DC converter for mains-powered IoT systems under the 120V<sub>RMS</sub>/230V<sub>RMS</sub> standards. To avoid high-voltage (HV) silicon devices, preserving conversion efficiency and reducing cost, we propose a series-capacitor single-stage AC-DC structure with a fine-grained SC pre-regulation network and an active source-detaching function, which reduces rectifier conduction loss, extends the output conduction duty, and reduces output ripple. At the same time, the pre-regulation flying capacitors ( $C_F$ ) can be dynamically reallocated in parallel with  $C_{OUT}$  according to load conditions, further reducing the output ripple and enhancing total capacitance utilization. Fabricated in a 180nm process using only 5V CMOS transistors, the converter supplies an output ( $V_{OUT}$ ) of 2.6-5V from an  $85\sim230V_{RMS}$  input ( $V_{IN}$ ). It delivers a maximum output power of 9.11mW with 85.6% efficiency and an overall system power density of 858mW/cm³.
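The efficiency and power-density figures translate directly into input power, dissipated power, and system volume. A short worked check, assuming (as the title pairs them) that the 85.6% efficiency and 858mW/cm³ density both refer to the 9.11mW maximum-output operating point.

```python
p_out_mw = 9.11        # maximum output power
efficiency = 0.856     # assumed to apply at that output power
density_mw_cm3 = 858   # overall system power density

p_in_mw = p_out_mw / efficiency                 # about 10.64 mW drawn from the mains
p_loss_mw = p_in_mw - p_out_mw                  # about 1.53 mW dissipated in the converter
volume_mm3 = p_out_mw / density_mw_cm3 * 1e3    # cm^3 -> mm^3, about 10.6 mm^3
print(f"Pin = {p_in_mw:.2f} mW, loss = {p_loss_mw:.2f} mW, volume = {volume_mm3:.1f} mm^3")
```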

#### C26-2 - 10:55

A 16-24V to 1-1.8V 1.187W/mm³-Power-Density Hybrid DC-DC Converter Featuring Inductor Current in Σ-Fibonacci Region for Unmanned Aerial Vehicle Applications, Y. Ji and L. Cheng, Univ. of Science and Technology of China, China

This paper presents a 16-24V-input 1-1.8V-output 3/2-phase hybrid DC-DC converter for powering point-of-load systems in unmanned aerial vehicles. The converter minimizes inductor DC resistance and AC resistance losses by reducing inductor current into the  $\Sigma$ -Fibonacci region, enabling the use of a compact inductor. This region corresponds to the minimum achievable inductor current with flying capacitors fully utilized. Fabricated in a 180nm BCD process, the converter achieves a 93.9% peak efficiency and a 1.187W/mm³ power density.

#### C26-3 - 11:20

A 94.1%-Efficiency Flying-Capacitor-Shared 2-Inductor 3-Level Boost Converter with Simultaneous  $V_{CF}$  and  $I_L$  Balance Achieving <0.92%- $V_{CF}$  and <0.22%- $I_L$  Error, S.-J. Lee, Y.-W. Jeong, J.-H. Kim, M.-J. Cho, M.-S. Kim, M.-H. Kim and S.-U. Shin, POSTECH, Korea

In this paper, a flying-capacitor-shared 2-inductor 3-level boost converter is presented. A unified imbalance calibration technique is proposed to achieve  $V_{CF}$  and  $I_L$  balance simultaneously. The converter features a wide operating range that is not constrained by duty-cycle or conversion-ratio limitations, while maintaining the voltage stress on all switches at  $0.5V_O$ . The test chip, fabricated in a 180nm CMOS process, achieves simultaneous  $V_{CF}$  and  $I_L$  balancing with errors of less than 0.92% and 0.22%, respectively, under a 0.2Ω resistance mismatch. The measured peak efficiency is 94.1%.

#### C26-4 - 11:45

Matryoshka CSCR: A Reconfigurable Matryoshka-Stacked Continuous-Scalable-Conversion-Ratio Switched-Capacitor DC-DC Converter with 0.1-to-1.7V Input, Y. Yang\*, W. Peng\*, M. Huang\*\* and S. Du\*, \*Delft Univ. of Technology, Netherlands and \*\*Univ. of Macau, China

This paper presents a reconfigurable continuous-scalable-conversion-ratio (CSCR) switched-capacitor DC-DC boost converter, named the Matryoshka CSCR converter, for energy-harvesting (EH) applications. With 6 split-input CSCR cores adaptively stacked like Matryoshka dolls, the structure significantly extends the voltage conversion ratio (VCR) range with small M and N numbers. Fabricated in a 180nm process, the chip covers a 0.1-1.7V input range for a 1.8V output and achieves 88.6% peak efficiency and 23.8mW peak power.
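The fixed 1.8V output over a 0.1-1.7V input means the required VCR spans about 18 down to about 1.06. In the standard switched-capacitor model, a converter with ideal ratio r behaves like that ratio followed by a lossy drop, so its efficiency is bounded by V<sub>OUT</sub>/(r·V<sub>IN</sub>); a finely scalable ratio keeps r close to the required VCR. The ratios 4 and 3.75 below are hypothetical illustrations, not the ratios the Matryoshka CSCR actually uses.

```python
V_OUT = 1.8  # regulated output voltage from the abstract

def required_vcr(v_in: float, v_out: float = V_OUT) -> float:
    """Voltage conversion ratio needed to reach v_out from v_in."""
    return v_out / v_in

def sc_efficiency_bound(v_in: float, ideal_ratio: float, v_out: float = V_OUT) -> float:
    """Upper bound on SC-converter efficiency set by charge-sharing loss:
    the output can only be reached by dropping the excess of ideal_ratio * v_in."""
    if ideal_ratio * v_in < v_out:
        raise ValueError("ideal ratio too small to reach v_out")
    return v_out / (ideal_ratio * v_in)

print(required_vcr(0.1), required_vcr(1.7))  # VCR span: about 18 down to about 1.06
print(sc_efficiency_bound(0.5, 4.0))         # coarse ratio step: 0.90
print(sc_efficiency_bound(0.5, 3.75))        # finer ratio step:  0.96
```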



## **Technology Session 14**

### **RRAM and Selector Only Memory**

Thursday, June 12, 10:30-12:35

Chairpersons: H. Lue, Macronix International Co., Ltd.; G. Bronner, Rambus

#### T14-1 - 10:30

A CMOS-Compatible 12nm 8Mb MLC RRAM Enabling Producible 2-Bit Per Cell for High Energy Efficiency Compute-In-Memory in Edge AI Applications, C.-Y. Tsai, S.-F. Liu, J.-H. Hsuen, B. Lin, C.-R. Hsieh, C.-W. Chan, C.-Y. Chang, Y.-C. Huang, J. Yang, J.-J. Wu, Y.-W. Chen, M.-F. Chang, Y.-D. Chih, W.-T. Chu, K.-C. Huang and H. Chuang, TSMC, Taiwan

A 12nm 2-bit-per-cell RRAM, featuring co-optimization across circuit design, write algorithms, and memristor engineering, demonstrates remarkable 10K-cycle endurance and 85°C/10yr retention on physical 8-megabit arrays, exhibiting superior inference accuracy in compute-in-memory applications. Notably, the intrinsic RRAM cells present 1B-cycle endurance and 125°C/10yr retention capability. Achieving this requires precise tuning of the oxygen and vacancy density in the memristor through circuit design and write algorithms, and accounts for the vacancy formation-recombination energy by adjusting memristor properties in each storage state. This innovation paves the way for edge artificial intelligence (AI) processors to utilize a simple RRAM solution for firmware storage and high-energy-efficiency AI computing in application-specific integrated circuits (ASICs).

#### T14-2 - 10:55

Achieving Outstanding Endurance (>10<sup>7</sup>) in Large-Array Two-Deck 16 nm SOM through Process, Structure, and Design Strategies for Emerging SCM Applications, S. Ban, M. Kim, N. Park, J. Park, J. Yeon, D. Yun, T. Park, S. Oh, B. Lee, W. Lee, G. Do, Y. Bae, J. Kim, U. Park, I. Yeo, J. Bae, M. Kang, S. Hwang, D. Ahn, G. Jung, J. Lim, Y. Sung, J. Lee, S. Yim, M. Lee, M. Park, T. Kim, A. Choi, G. Park, S. Chung, Y. Tak, J. Ko, J. Han, S. Chae, S. G. Kim, J. Yi, Y. Cho and S. Cha, SK hynix Inc., Korea

We successfully demonstrated a two-deck structure selector-only memory (SOM) based on a 16 nm half-pitch, achieving endurance exceeding 10<sup>7</sup> cycles at raw bit error rate (RBER) 200 ppm through process, structure, and design strategies. Our findings reveal that SOM endurance is not only influenced by process factors such as dual functional material (DFM) and encapsulation but also strongly depends on the spike charge. This underscores the importance of both spike charge suppression and process optimizations to enhance SOM performance.

#### T14-3 - 11:20

Scalable Fabrication and Demonstration of the First Fully Integrated 14nm 2-Stack SOM (Selector Only Memory) Device, J. H. Park, C. H. Lee, W. H. Park, K. D. Park, S. J. Song, S. H. Eun, Y. J. Park, K. W. Lee, I. M. Park, Z. Wu, S. C. Oh, H. C. Yoon, S. K. Kim, K. M. Park, S. H. Lee, S. W. Park, S. J. Hyun, Y. J. Song, S. J. Ahn and J. H. Song, Samsung Electronics Co., Ltd., Korea

This study successfully demonstrates the fabrication and operation of the first fully integrated 14nm 2-stack SOM (Selector Only Memory) devices with a 1-Gb main cell array. A read window of over 750mV is achieved across all decks, and all cell characteristics are maintained down to the 14nm node with advanced processes, demonstrating the scalability of the SOM device. Furthermore, comprehensive reliability evaluations are conducted, confirming the feasibility of mass production.

#### T14-4 - 11:45

Multi-Stack InTe Selector-Only Memory (SOM) Achieving Ultra-Low Power Operation (10 μA) and Excellent Endurance (~10¹⁰ cycles), Y. Seo\*, D. Kim\*, L. Jung\*, J. Lee\*, O. Kwon\*, C. Baek\*\*, Y. B. Park\*\*, T. H. Lee\*\* and H. Hwang\*, \*POSTECH and \*\*Kyungpook National Univ., Korea

We present an innovative In-Te-based selector-only memory (SOM) featuring a multi-stack architecture that significantly boosts power efficiency and reliability. Through XPS, Raman spectroscopy, and DFT simulation, we clarify how optimized In-Te bonding and trap engineering enable successful SOM operation while reducing the write current ( $I_{write}$ ) to an ultra-low 10μA. This approach also yields a sufficient memory window (MW ~1V), minimized variability, and excellent endurance (~10<sup>10</sup> cycles).

#### T14-5 - 12:10

Differences in Operational Mechanisms of As- and Sb-Based Selector Only Memory for Emerging 3DXP Architecture, M. Choi\*, H.-J. Sung\*, J. Park\*, B. Koo\*, J. Moon\*, W. Yang\*, Y. Park\*, Y. Ham\*, C. S. Lee\*, H. Chae\*\*, Z. Wu\*\*, J. Lee\*\*, C. Kim\*\*, K. Yang\* and Y. Kang\*, \*Samsung Advanced Institute of Technology and \*\*Samsung Electronics Co., Ltd., Korea

Selector Only Memory (SOM) exhibits unique polarity-dependent V<sub>th</sub> shift characteristics, but their origins are still unclear. In this study, we investigated the operation mechanisms associated with group 15 elements such as As and Sb, which contribute to amorphous-phase stability. For this purpose, GeAsSeln and GeSbSeln SOM devices were fabricated on 12-inch wafers. I-V analysis confirmed distinct conduction mechanisms in the two materials, and Sb-related trap states result in a larger memory window (2.1V) and lower  $V_{th}$  drift (4mV/dec) than in As-based SOM devices. TEM analysis revealed the local atomic distribution, showing high RDE properties (>60G) for both As- and Sb-based SOM devices despite Sb migration during the initial cycle. In addition, photonic I-V-based trap-profile measurements and DFT calculations were performed to identify the origin of the trap states. These results demonstrate that SOM exhibits diverse operational mechanisms, emphasizing the need for fundamental analysis of each material.
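A drift rate quoted in mV/dec implies a threshold shift that grows linearly with log10(time). A rough sketch of what 4mV/dec means over a 10-year horizon, assuming that log-linear model holds and using a hypothetical 1s reference time; the result is compared with the 2.1V memory window from the abstract.

```python
import math

def vth_drift_mv(rate_mv_per_dec: float, t_s: float, t0_s: float = 1.0) -> float:
    """Threshold shift assuming drift linear in log10(time) after reference time t0."""
    return rate_mv_per_dec * math.log10(t_s / t0_s)

ten_years_s = 10 * 365.25 * 24 * 3600   # about 3.16e8 s
drift = vth_drift_mv(4.0, ten_years_s)  # about 34 mV after about 8.5 decades
print(f"{drift:.0f} mV drift = {drift / 2100:.1%} of a 2.1 V window")
```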



## **Technology / Circuits Joint Focus Session 3**

### **AI and ML Hardware**

Thursday, June 12, 10:30-12:35

Chairpersons: V. Honkote, Intel Corporation; S. Yu, Georgia Tech

#### JFS3-1 - 10:30 (Invited)

Wafer-Scale Integration for AI - The Holy Grail?, B. Kleveland, P. Ferolito, M. Morrison and A. Scherer, Cerebras Systems Inc., USA

Over the past 50 years, there have been numerous wafer-scale integration attempts. This paper presents some of the engineering breakthroughs behind productizing a wafer-scale integration (WSI) die. Unlike traditional CPUs or GPUs, the Cerebras WSI die integrates 900,000 processing cores and 44 GB of on-chip memory across a full wafer, significantly enhancing performance for large workloads. Key innovations include high-bandwidth across-reticle interconnects, vertically integrated power delivery, and silicon holes that enable structural clamping for uniform connection and cooling. An in-system burn-in and High Temperature Operating Life (HTOL) approach was adopted, stressing the wafer and system components by heating the coolant. The wafer-scale solution delivers substantially higher core density and bandwidth than traditional solutions. This work demonstrates that with proper architectural, mechanical, and testing strategies, full-wafer chips can be manufactured and packaged, opening a path for ultra-large-scale systems that were previously infeasible with conventional packaging.

#### JFS3-2 - 10:55 (Invited)

Design Considerations for LLM Inference in Data Centers: Chip and Interconnect, Z. Shi, Y. Wang, M. Diao, W. Wu, Z. Su, X. Li, R. Niu, H. Zang, K. Song, Y. Yu, H. Lin, J. He, H. Liu, J. Xia and H. Liao, Huawei, China

This paper presents a cost-efficient chip prototype optimized for large language model (LLM) inference. We identify four key specifications, namely computational FLOPs (flops), memory bandwidth (bw), memory capacity (cap), and interconnect latency, and introduce two ratios (bw/cap and flops/bw) to guide chip design. Through an analysis of LLM inference workflows, we show that memory bandwidth is critical for LLM decoding, while memory capacity and computational FLOPs are less important than in training. Based on these insights, we systematically propose a design with reduced memory capacity, increased bandwidth, and an interconnect topology design principle optimized for both interconnect latency and bandwidth.
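The two ratios have a simple roofline reading. At batch size 1, decoding one token streams every weight from memory once, so bw divided by model size bounds tokens per second (and bw/cap is that bound for a model filling the whole memory). Decoding performs about 2 FLOPs per parameter per token, so it stays memory-bound until the batch is large enough for its arithmetic intensity to reach the machine balance flops/bw. A minimal sketch with hypothetical chip and model numbers that ignores KV-cache and activation traffic.

```python
def decode_tokens_per_s_bound(bw_bytes_s: float, model_bytes: float) -> float:
    """Batch-1 decode upper bound: every weight byte is read once per token."""
    return bw_bytes_s / model_bytes

def memory_bound_batch(flops_s: float, bw_bytes_s: float, bytes_per_param: float) -> float:
    """Batch size at which decode turns from memory-bound to compute-bound.

    At batch B, decode does about 2*B FLOPs per parameter read, i.e. an
    arithmetic intensity of 2*B / bytes_per_param FLOP/byte; it becomes
    compute-bound once that reaches the machine balance flops/bw.
    """
    return (flops_s / bw_bytes_s) * bytes_per_param / 2

# Hypothetical chip and model: 4 TB/s, 1 PFLOP/s, 70B parameters in 8-bit weights.
bw, flops, params, bytes_per_param = 4e12, 1e15, 70e9, 1.0
print(decode_tokens_per_s_bound(bw, params * bytes_per_param))  # about 57 tokens/s
print(memory_bound_batch(flops, bw, bytes_per_param))            # 125.0
```

With numbers like these, decode remains bandwidth-limited across typical batch sizes, which is the regime where trading capacity for bandwidth pays off.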

#### JFS3-3 - 11:20 (Invited)

Marco: Configurable Graph-Based Task Solving and Multi-AI Agents Framework for Hardware Design, C.-T. Ho, J. Gong, Y. Bai, C. Deng, H. Ren and B. Khailany, NVIDIA Corp., USA

Hardware design presents numerous challenges stemming from its complexity and advancing technologies. These challenges result in longer turn-around time (TAT) for optimizing performance, power, area, and cost (PPAC) during synthesis, verification, physical design, and reliability loops. Large Language Models (LLMs) have shown remarkable capacity to comprehend and generate natural language at a massive scale, leading to many potential applications and benefits across various domains. Successful LLM-based agents for hardware design can drastically reduce TAT, leading to faster product cycles, lower costs, improved design reliability, and reduced risk of costly errors. In this work, we propose a unified framework, Marco, that integrates configurable graph-based task solving with multi-modality and multi-AI agents for chip design by leveraging the natural-language and reasoning abilities of LLMs with collaborative toolkits. Finally, we demonstrate promising performance, productivity, and efficiency of LLM agents using the Marco framework on layout optimization, Verilog/design rule checker (DRC) coding, and timing analysis tasks.
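Graph-based task solving generally means expressing a flow as a dependency graph and running each step only after its inputs are ready. The sketch below shows that pattern with Python's standard `graphlib`; the task names and the string-returning stand-in "agents" are hypothetical and do not reflect Marco's actual API or task set.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key lists the tasks it depends on.
tasks = {
    "spec": set(),
    "verilog_coding": {"spec"},
    "layout_optimization": {"verilog_coding"},
    "drc_check": {"layout_optimization"},
    "timing_analysis": {"layout_optimization"},
    "signoff_summary": {"drc_check", "timing_analysis"},
}

def run_graph(graph, handlers):
    """Execute each task only after all of its dependencies have produced results."""
    results = {}
    for name in TopologicalSorter(graph).static_order():
        deps = {d: results[d] for d in graph.get(name, ())}
        results[name] = handlers[name](deps)
    return results

# Stand-in "agents": each just records which upstream results it received.
handlers = {name: (lambda deps, n=name: f"{n}({','.join(sorted(deps))})") for name in tasks}
print(run_graph(tasks, handlers)["signoff_summary"])
```

In a real agent framework each handler would call an LLM or an EDA tool; keeping the graph separate from the handlers is what makes such a flow configurable.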

#### JFS3-4 - 11:45

A 157TOPS/W Transformer Learning Processor Supporting Forward Pass Only with Zeroth-Order Optimization, C.-Y. Li, Y.-F. Shyu and C.-H. Yang, National Taiwan Univ., Taiwan

This work presents the *first* Transformer learning processor that supports forward-pass-only training with zeroth-order optimization. By applying uniform perturbation, sign-based weight gradient generation, and exponential ternary speculation, the training complexity is reduced by up to 96%. By employing parallel processing, tile-based bitmask generation, and workload scheduling, the latency is reduced by up to 96%. Fabricated in 40nm CMOS, the chip achieves a maximum energy efficiency of 157TOPS/W, outperforming prior backpropagation-based Transformer learning processors by 1.6-to-7.6x.
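Forward-pass-only training replaces backpropagation with loss evaluations: perturb the weights along a random direction u, evaluate the loss at w + εu and w - εu, and use the difference as an estimate of the gradient projected on u. The sketch below is a generic zeroth-order sign-SGD on a toy two-parameter regression, using uniform perturbations and sign-based updates as the abstract describes in outline; it is not the chip's algorithm and omits exponential ternary speculation. Data, step size, and ε are hypothetical.

```python
import random

def mse(w, data):
    a, b = w
    return sum((a * x + b - y) ** 2 for x, y in data) / len(data)

def zo_sign_train(data, steps=2000, eps=1e-3, lr=1e-2, seed=0):
    """Forward-pass-only training: estimate the directional derivative along a
    uniform random perturbation u from two loss evaluations, then move each
    weight by the sign of its gradient estimate (no backpropagation)."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(steps):
        u = [rng.uniform(-1.0, 1.0) for _ in w]
        l_plus = mse([wi + eps * ui for wi, ui in zip(w, u)], data)
        l_minus = mse([wi - eps * ui for wi, ui in zip(w, u)], data)
        d = (l_plus - l_minus) / (2 * eps)  # approximately grad . u
        # Per-weight gradient estimate is d * u_i; only its sign is used.
        w = [wi - lr * ((d * ui > 0) - (d * ui < 0)) for wi, ui in zip(w, u)]
    return w

data = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w = zo_sign_train(data)
print(w, mse(w, data))
```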

#### JFS3-5 - 12:10

On-Chip Customized Learning on Resistive Memory Technology for Secure Edge AI, M. Pallo\*,\*\*, S. D'Agostino\*\*, M. Piccoli\*\*,\*\*\*, D. E. Bonnet\*\*, N. Castellani\*\*, G. Piccolboni\*, M. A. Iftakher\*\*\*, J.-F. Nodin\*\*, F. Andrieu\*\*, D. Querlioz\*\*\*, G. Molas\*, L. Hutin\*\* and E. Vianello\*\*, \*Weebit Nano FR, \*\*CEA-LETI and \*\*\*CNRS (Centre national de la recherche scientifique), France

This paper presents the first experimental demonstration of few-shot on-chip training on an in-memory computing resistive memory (ReRAM) platform. We use the Model-Agnostic Meta-Learning (MAML) algorithm to reduce training iterations and the associated ReRAM conductance updates by orders of magnitude. Through co-optimization of device programming conditions and the algorithm, we achieve >97% accuracy on the Omniglot dataset after just five training iterations (*i.e.*, ReRAM programming operations) while improving device retention at 150°C.



## **Technology Session 15**

### **2D and BEOL Transistors**

Thursday, June 12, 10:30-12:35

Chairpersons: Y. Yamamoto, Renesas Electronics Corporation; Z. Chen, Purdue

#### T15-1 - 10:30

Record PMOS WSe<sub>2</sub> GAA Performance Using Contact Planarization, and Systematic Exploration of Manufacturable, High-yield Contacts, M. Jaikissoon, P. Buragohain, W. Mortelmans, K. Oguz, C. Rogan, J. Lux, A. Kitamura, C. Engel, R. F. Vreeland, H. Barnett, Z. Brooks, S. Harlson, S. S. K. Pinnepalli, E. Gillispie, K. Toku, T. Wilson, A. Oni, A. V. Penumatcha, C. J. Dorow, M. Kavrik, A. Kozhakhmetov, K. Maxey, N. Arefin, J. Kevek, T. Tronic, M. Metz, S. B. Clendenning, K. P. O'Brien and U. Avci, Intel Corp., USA

Two-dimensional (2D) transition metal dichalcogenides (TMDs) hold significant promise for both front-end-of-line (FEOL) and back-end-of-line (BEOL) applications. Here, we report new approaches to contact formation in global-back-gate 2D PMOS transistors using a manufacturable physical vapor deposition (PVD) sputtering process. We present a systematic study of novel sputtered contacts (Sb/Pt, Sb<sub>2</sub>Te<sub>3</sub>, Bi<sub>2</sub>Te<sub>3</sub>) to multilayer WSe<sub>2</sub>, and through statistical analysis we find that performance increases with Pt thickness and post-metal anneal. Optimized conditions give R<sub>c</sub> down to 1.3kΩ·µm at N<sub>inv</sub> = 10<sup>13</sup>cm<sup>-2</sup> and I<sub>dmax</sub> = 212µA/µm at V<sub>d</sub> = -1V, the highest performance to date using sputtered contacts. Further, we pioneer a new approach to contact-first gate-all-around (GAA) schemes through contact planarization using chemical mechanical polishing (CMP). This first-of-its-kind CMP flow demonstrates significant performance gains (4x) over non-CMP contacts. CMP Ru contacts to monolayer WSe<sub>2</sub> GAA devices achieve record drive currents >600µA/µm and SS = 132mV/dec.

#### T15-2 - 10:55

First Demonstration of BEOL-Compatible Co-Sputter Deposited Te<sub>1-x</sub>Se<sub>x</sub> p-FETs Enabling 3D Stackable Oxide Semiconductor CFET, DRAM, and First CFET-Structured SRAM, Y. Xu\*, Y. Sun\*, Z. Zhou\*, Z. Zhou\*, S. Luo\*, C. Sun\*, Y. Kang\*, K. Ni\*\*, G. Liang\*\*\* and X. Gong\*, \*National Univ. of Singapore, Singapore, \*\*Univ. of Notre Dame, USA and \*\*\*National Yang Ming Chiao Tung Univ., Taiwan

We report the world's first BEOL-compatible p-type tellurium-selenium (TeSe) FETs with an aggressively scaled channel length  $L_{\rm ch}$  of 50nm, achieved through TeSe co-sputtering for bandgap tuning and performance improvement. Our TeSe FETs exhibit outstanding performance. These devices enable 3D-stackable IGZO/TeSe DRAM with excellent retention over 1000s and a short-channel IGZO/TeSe CFET achieving a gain of 38 at  $V_{\rm DD}$  = 2.5V. Based on this CFET, we further experimentally demonstrate the world's first stacked oxide-semiconductor CMOS SRAM with a 4T footprint.

#### T15-3 - 11:20

Homo-Channel WSe₂ n/pFETs with High Performance and On/Off Ratio Using Tunable Doping, K.-H. Chiu\*, W.-C. Wu\*, H.-Y. Huang\*, J.-H. Chih\*, Y.-C. Chang\*, S.-T. Wang\*, H.-Y. Chen\*, C.-Y. Lin\*, D.-H. Lien\*, C. Hu\*\* and C.-H. Chien\*, \*National Yang Ming Chiao Tung Univ., Taiwan and \*\*Univ. of California, Berkeley, USA

Achieving symmetric n/pFETs with high current densities and matched threshold voltages in a single 2D channel material is crucial for homo-channel CMOS integration and the realization of 3D monolithic applications. This work introduces novel n-doping using aluminum nitride (AlN) and p-doping using molybdenum oxide (MoO<sub>x</sub>) on CVD-grown monolayer tungsten diselenide (1L-WSe<sub>2</sub>), achieving a record-high electron current (~230μA/μm at V<sub>DS</sub> = 1V) with the lowest contact resistance (~1kΩ·μm) and one of the best reported hole currents (>400μA/μm at V<sub>DS</sub> = -1V), both in enhancement mode. AlN/MoO<sub>x</sub> doping enables tunable doping strength with a high on/off ratio (>10<sup>6</sup>) and well-matched V<sub>TH</sub> for n- and pFETs. This work demonstrates high gain (>25V/V at V<sub>DD</sub> = 1.5V) and the potential of WSe<sub>2</sub> for mono-channel CMOS technology.

#### T15-4 - 11:45

1000x Lower Leakage in High-Performance Carbon Nanotube Nanosheet FETs, N. S. Safron\*, H.-Y. Chiu\*\*,\*\*\*, T.-A. Chao\*\*, L. Liu\*\*, M. Passlack\*, C. Gilardi\*, A. Azizi\*, D. Zhong\*, J.-J. Wu\*\*, S. Natani\*\*\*, A. Kummel\*\*\*\*, S. Li\*\*\*\*\*, S. Mitra\*\*\*\*\*, C.-H. Chien\*\*\*, H.-S. P. Wong\*\*,\*\*\*\*, M. M.-F. Chang\*\*, G. Pitner\*, I. Radu\*\* and M. Cao\*\*, \*TSMC, USA, \*\*TSMC, Taiwan, \*\*\*National Yang Ming Chiao Tung University, Taiwan, \*\*\*\*University of California, San Diego and \*\*\*\*\*Stanford University, USA

This work reports a Carbon Nanotube (CNT) Nanosheet PFET with record performance ( $I_{\text{max}}$  = 0.9 mA/µm), leakage ( $I_{\text{min}}$  = 20 pA/µm), and sub- $V_{\text{t}}$  slope of 93 mV/dec at -0.5 V  $V_{\text{ds}}$ . This result is attributed to a larger electronic bandgap which simultaneously suppresses leakage by 1000x and improves sub- $V_{\text{t}}$  slope from 150 to 93 mV/dec at  $V_{\text{ds}}$  = -0.5 V. Using planar single-CNT and network CNT FETs, we demonstrate feasibility of performance-matched NFET and PFET with larger bandgap. The future practical application of CNT electronics will be enabled by gate stack optimization and suppressed variation, for which key trends and targets are investigated by SRAM yield analysis.

#### T15-5 - 12:10

Wafer-Scale Monolithic 3D Integration of CMOS Logic Gates Based on 2D Materials, L. Hu, J. Shim, J. Kim, H.-S. Jang, B. Park and S. W. Kim, Samsung Advanced Institute of Technology, Korea

Monolithic three-dimensional (M3D) integration offers a promising path to enhance transistor density and energy efficiency in next-generation electronics. In this work, we explore the 8-inch wafer-scale integration of complementary metal-oxide-semiconductor (CMOS) logic gates, including inverters and NAND gates, using two-dimensional materials (2DMs), specifically molybdenum disulfide (MoS<sub>2</sub>) and tungsten diselenide (WSe<sub>2</sub>). By vertically stacking MoS<sub>2</sub>-based N-channel MOS (NMOS) and WSe<sub>2</sub>-based P-channel MOS (PMOS) field-effect transistors (FETs), we demonstrate a high-performance CMOS inverter with a voltage gain of 60 at V<sub>DD</sub> = 2V.



## **Circuits Session 27**

### **Sensing and Ranging Technologies**

Thursday, June 12, 14:00-15:40

Chairpersons: S. Okura, Ritsumeikan University; M. Dielacher, Infineon

#### C27-1 - 14:00

2/3-inch 2.1Megapixel SPAD Image Sensor with 156dB Single-Shot Dynamic Range and LED Flicker Mitigation Based on Weighted Photon Counting Technique, Y. Ota, H. Sekine, K. Morimoto, W. Endo, T. Sasago, A. Abdelghafar, S. Mikajiri, N. Isoda, D. Kobayashi, M. Shinohara, K. Mikami, K. Inoue, H. Yasui, K. Tojima, M. Niwa, S. Omodani, K. Chida, K. Uehira, T. Itano, F. Inui, J. Iwata, M. Ohmura, Y. Matsuno, K. Sakurai and T. Ichikawa, Canon Inc., Japan

We present a 2/3-inch 2.1Mpixel 3D backside-illuminated SPAD image sensor for automotive applications. A newly proposed pixel architecture based on a weighted photon counting technique achieves 156dB single-shot dynamic range with LED flicker mitigation. Read-noise-free operation with a seamless global shutter ensures robust image capture of distant targets under 0.1lux lighting conditions. Intra-frame non-destructive readout enables acquisition of short-exposure frames with reduced motion blur, in parallel with long-exposure frame acquisition.

#### C27-2 - 14:25

A 25M points/s Back-Illuminated Stacked SPAD Direct Time-of-Flight Depth Sensor with Equivalent Time Sampling for Automotive LiDAR, T. Yui\*, K. Hanzawa\*, M. Hosoya\*, Y. Liu\*, T. Yasufuku\*, Y. Tanaka\*, Y. Tashiro\*, A. Tumewu\*, M. Yamane\*\*, M. Shibata\*\*, T. Sakada\*\*, K. Akatsuka\*, Y. Matsushita\*, K. Yamada\*\*, K. Mori\*\*, T. Toyoshima\*, Y. Sakano\*, O. Kumagai\*, K. Tsunoji\* and M. Takahashi\*, \*Sony Semiconductor Solutions Corp. and \*\*Sony Semiconductor Manufacturing Corp., Japan

This paper presents technologies for single-photon avalanche diode (SPAD) direct time-of-flight (ToF) depth sensors for Light Detection and Ranging (LiDAR). A readout method using an under-pixel differential comparator with threshold control can prioritize either distance accuracy or acquisition of reflectance information. Equivalent time sampling (ETS) at 3GHz improves depth accuracy. Furthermore, subsequent 250MHz re-sampling and a depth-information extraction pipeline relax the operating frequency while maintaining depth accuracy and reducing data size by approximately 98%. The sensor delivers distance measurement for 520 effective macro pixels with 0.05° vertical angular resolution over a 120° horizontal field of view (FoV) at 20fps, resulting in 25M points/s (pts/s) of ranging data and detection of a 25cm object at 250m.
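Direct ToF maps time to distance through d = c·t/2, so the sampling rate of the timing path sets the depth bin. A short worked sketch: a 250m target returns after about 1.67µs, a 3GHz equivalent sampling rate corresponds to roughly 5cm bins, and 250MHz alone would give roughly 60cm bins. How the sensor combines the two rates internally is not described here; the sketch only shows the raw bin sizes.

```python
C = 299_792_458.0  # speed of light, m/s

def round_trip_s(distance_m: float) -> float:
    """Time for a laser pulse to reach a target and return."""
    return 2 * distance_m / C

def range_bin_m(f_sample_hz: float) -> float:
    """Depth spanned by one sample period of the timing path."""
    return C / (2 * f_sample_hz)

print(f"250 m target:  {round_trip_s(250) * 1e6:.3f} us round trip")
print(f"3 GHz ETS bin: {range_bin_m(3e9) * 100:.1f} cm")
print(f"250 MHz bin:   {range_bin_m(250e6) * 100:.1f} cm")
```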

#### C27-3 - 14:50

A Radiation-Hardened Neuromorphic Imager with Self-Healing Spiking Pixels and Unified Spiking Neural Network for Space Robotics, Q. Cheng\*,\*\*, Q. Li\*, Z. Yang\*, Z. Kong\*, G. Niu\*, Y. Liang\*, J. Li\*, J.-H. Park\*\*\*, W. Liao\*\*\*\*, H. Awano\*\*, T. Sato\*\*, L. Lin\* and M. Hashimoto\*\*, \*Southern Univ. of Science and Technology, China, \*\*Kyoto Univ., Japan, \*\*\*Kyunghee Univ., Korea and \*\*\*\*Kochi Univ. of Technology, Japan

A radiation-hardened neuromorphic imager prototype is developed for space exploration, featuring a fully spike-based neuromorphic vision system architecture, in-pixel self-healing against radiation-induced damage, and integrated unified spiking neural network (USNN) with adaptive neurons and synapses and contrast enhancement at low-contrast conditions. Self-healing reduces dark current by 6.25x at 14kGy cumulative dose, recovering recognition accuracy by 27.8%. USNN consumes 0.0529 pJ/SOP at 5,000 events/s.

#### C27-4 - 15:15

A 320µm² Minimum Guard-band Metal Resistor-based Temperature Sensor with +/-1.4°C Inaccuracy in 18A RibbonFET CMOS with PowerVia, D. Duarte, Y. Li and J. Ayers, Intel Corp., USA

A temperature sensor with a 320µm² metal-resistor-based sensing element is implemented in an advanced 18A RibbonFET CMOS technology with backside power delivery (PowerVia). The design supports remote sensing at any location along the metal stack, which minimizes guard bands by allowing the sensing element to be placed less than 10µm from temperature hot spots. The 20.5µs conversion time supports sensing at multiple locations within a 1ms window. The design achieves +/-1.4°C inaccuracy, the lowest among resistor-based sensors designed in sub-22nm process nodes. In addition, using a standard routing metal layer for temperature sensing enables detailed thermal-gradient characterization not previously possible in technologies with front-side or backside power delivery networks.
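A metal-resistor thermometer rests on the resistor's temperature coefficient (TCR): with the linear model R = R0·(1 + α·(T - T0)), temperature follows from a resistance measurement. The sketch below inverts that model, converts the +/-1.4°C error budget into a fractional resistance error, and counts back-to-back 20.5µs conversions in a 1ms window. R0 = 1kΩ at 25°C and α = 0.3%/K are hypothetical values for illustration, not the sensor's parameters, and conversions are assumed to run sequentially.

```python
def temperature_from_r(r_ohm, r0_ohm, alpha_per_k, t0_c=25.0):
    """Invert a linear TCR model R = R0 * (1 + alpha * (T - T0))."""
    return t0_c + (r_ohm / r0_ohm - 1.0) / alpha_per_k

# Hypothetical metal line: R0 = 1 kOhm at 25 C, TCR = 0.3 %/K.
R0, ALPHA = 1000.0, 3e-3
print(f"{temperature_from_r(1180.0, R0, ALPHA):.1f} C")  # 85.0 C

# A +/-1.4 C budget corresponds to this fractional resistance error:
print(f"{ALPHA * 1.4:.2%}")  # 0.42%

# Back-to-back 20.5 us conversions that fit in a 1 ms window:
print(int(1e-3 // 20.5e-6))  # 48
```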



# **Circuits Session 28**

# **Sub-THz TRXs**

Thursday, June 12, 14:00-15:40

Chairpersons: W. Deng, Tsinghua University

J. Lagos Benites, imec

## C28-1 - 14:00

A 150 GHz High-Power-Density Phased-Array Transceiver in 65nm CMOS for 6G UE Module, Y. Yamazaki\*, S. Park\*, T. Uchino\*, C. Liu\*, J. Sakamaki\*, Y. Morishita\*\*, A. Egami\*\*, R. Hasaba\*\*, K. Takahashi\*\*\*, T. Abe\*\*, T. Murata\*\*, Y. Nakagawa\*\*, T. Tomura\*, H. Taneda\*\*\*\*, K. Murayama\*\*\*\*, M. Tsukahara\*\*\*\*, H. Ota\*\*\*\*, Y. Nakabayashi\*\*\*\*, C. Wang\*, H. Herdian\*, Y. Zhang\*, Z. Li\*, W. Wang\*, H. Huang\*, D. Xu\*, S. Kato\*, M. Ide\*, Y. Zhang\*, H. Sakai\*, K. Kunihiro\*, A. Shirane\*, K. Takinami\*\* and K. Okada\*, \*Institute of Science Tokyo, \*\*Panasonic Industry Co. Ltd., \*\*\*Panasonic System Networks R&D Lab. and \*\*\*\*Shinko Electric Industries Co. Ltd., Japan

High-speed wireless communication using D-band frequencies is envisioned for 6G. This paper proposes a high-power-density 4-element 150 GHz phased-array transceiver for 6G user equipment (UE) modules, implemented in 65-nm CMOS technology. An 8-element antenna-in-package (AiP) module using two transceiver ICs operates over 142-164 GHz, achieving an EIRP of 26 dBm and a maximum data rate of 56 Gbps. The power consumption per element is 150 mW in TX mode and 93 mW in RX mode.
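The per-element figures above can be scaled to the full 8-element module with simple arithmetic. This sketch assumes, for illustration only, that all 8 elements are active simultaneously, that module power is just 8 times the per-element power (no other overhead), and that TX runs at the 56 Gbps peak rate when computing energy per bit; none of these are stated in the abstract.

```python
# Aggregate power of the 8-element AiP module built from two 4-element transceiver ICs.
# ASSUMPTIONS: all 8 elements are active at once, module power is the per-element
# power times 8 (no other overhead), and TX runs at the 56 Gbps peak rate.
ELEMENTS = 8
P_TX_PER_ELEMENT_W = 0.150
P_RX_PER_ELEMENT_W = 0.093
MAX_RATE_BPS = 56e9

p_tx_w = ELEMENTS * P_TX_PER_ELEMENT_W        # total TX power
p_rx_w = ELEMENTS * P_RX_PER_ELEMENT_W        # total RX power
tx_pj_per_bit = p_tx_w / MAX_RATE_BPS * 1e12  # TX energy per bit at peak rate

print(f"TX {p_tx_w:.2f} W, RX {p_rx_w:.3f} W, TX {tx_pj_per_bit:.1f} pJ/b")
```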

## C28-2 - 14:25

A 52Gb/s 8.9dBm EIRP 300GHz-Band Amplifier-Last Outphasing Transmitter with Path Mismatch Calibration in 65nm CMOS, C. Wang, O. A. Yong, H. Herdian, W. Wang, A. Shehata, C. da Gomez, C. Liu, Y. Yamazaki, K. Kunihiro, H. Sakai, Y. Zhang, A. Shirane and K. Okada, Institute of Science Tokyo, Japan

This paper presents a 300GHz-band amplifier-last outphasing transmitter with path mismatch calibration in 65nm CMOS. The transmitter integrates two independent sub-THz LO generation chains to calibrate the mismatch between the two paths of the outphasing topology. After calibration, the transmitter achieves an EIRP of at least 5.0dBm from 237GHz to 250GHz, with a maximum of 8.9dBm at 246GHz. It supports a 16QAM data rate of 52Gb/s over a communication distance of 9cm.

## C28-3 - 14:50

A 0.184 mm² W-band Single-RTWO-Based Subharmonic RX Achieving 3.72 dB-NF and I/Q Mismatch < 0.8° in 22nm CMOS, J. Deng\*, A. Li\*, J. Li\*, W. Tao\*, S. Yang\*, Z. Zhang\*, Y. Yang\*, X. Cheng\*\*, F. Lin\*, R. B. Staszewski\*\*\*, L. Lou\* and Y. Hu\*, \*Univ. of Science and Technology of China, \*\*Hefei Science of China Microelectronics Innovation Center, China and \*\*\*Univ. College Dublin, Ireland

We present a W-band sub-harmonic IQ receiver (RX) featuring a single rotary traveling-wave oscillator (RTWO) integrated with an ADPLL. It addresses key challenges of conventional mm-wave direct-conversion RXs: LO generation and distribution with poor phase noise (PN), IQ mismatch, high power consumption, large area, and high dc offset. RTWO multimode oscillation is suppressed via asymmetric capacitance at the crossover of the Möbius ring. Fabricated in 22 nm CMOS, the RX demonstrates a 3.72 dB noise figure (NF) and IQ mismatch < 0.8° using a bundle-based calibration. It occupies only 0.184 mm² and consumes 56.7 mW in total, including the ADPLL.

## C28-4 - 15:15

A CMOS Antenna-to-Bits 230-mW 120-Gbps F-band Receiver with Analog-Domain 64QAM Detection and Extraction, Y. O. Hassan\*, M. Oveisi\*, Z. Wang\*\* and P. Heydari\*, \*Univ. of California, Irvine and \*\*Marvell Semiconductor, Inc., USA

The design and implementation of a fully integrated CMOS antenna-to-bits 100-140 GHz receiver (RX) realizing 64QAM demodulation in the analog domain are presented. A sequential asynchronous QAM demodulation method is proposed that enables symbol detection, recovery, and extraction at a 120-Gbps data rate with remarkably low power consumption. Fabricated in a 22-nm FDSOI CMOS process, the RX exhibits a measured maximum conversion gain of 32 dB and a minimum noise figure (NF) of 9.5 dB. A data rate of 120 Gbps is measured wirelessly at a 15-cm distance, with the received 64QAM signal demodulated directly on-chip at a bit-error rate (BER) of 10<sup>-2</sup>. The measured RX sensitivity at this BER is -32 dBm. The prototype occupies 2.5 x 3 mm<sup>2</sup> of die area, including pads and test circuits, and consumes a total dc power of 230 mW.
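The headline numbers imply a symbol rate and an energy per bit that the abstract does not state explicitly. This sketch derives them, assuming that 120 Gbps is the raw line rate with no coding overhead subtracted and that the full 230 mW dc power (which includes test circuits) is charged to every bit.

```python
import math

# Derived figures for the 120-Gbps 64QAM receiver.
# ASSUMPTIONS: 120 Gbps is the raw line rate (no coding overhead subtracted),
# and the 230 mW total dc power (including test circuits) is charged to every bit.
DATA_RATE_BPS = 120e9
QAM_ORDER = 64
P_DC_W = 0.230

bits_per_symbol = int(math.log2(QAM_ORDER))         # 64QAM carries 6 bits per symbol
symbol_rate_baud = DATA_RATE_BPS / bits_per_symbol  # required symbol rate
energy_pj_per_bit = P_DC_W / DATA_RATE_BPS * 1e12   # receiver energy per bit

print(f"{bits_per_symbol} b/symbol, {symbol_rate_baud / 1e9:.0f} GBd, "
      f"{energy_pj_per_bit:.2f} pJ/b")
```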



# **Circuits Session 29**

# **Communication and Processors**

Thursday, June 12, 14:00-15:40

Chairpersons: C. Yang, National Taiwan University

S. Shao, University of California, Berkeley

## C29-1 - 14:00

A 142mW 6.4Gbps Massive MU-MIMO RSMA Detector for Next-Generation Communication Systems, P.-J. Chen, R.-H. Chiou and C.-H. Yang, National Taiwan Univ., Taiwan

This work presents the *first* rate-splitting multiple access (RSMA) detector for massive multiuser multiple-input multiple-output (MU-MIMO) systems. The detector supports a 256x32 MIMO configuration with up to 256-QAM modulation. A polar decoder with a code length of 128 bits is included for error correction. The iterative detection architecture reduces latency by 98%. The chip achieves a maximum throughput of 6.4Gbps at 142mW and 200MHz. Compared to state-of-the-art space-division multiple access (SDMA) detectors, this work achieves 3.3-to-3.6x higher throughput and 2.0-to-4.5x higher energy efficiency, along with better error performance in correlated channels.

## C29-2 - 14:25

A 6.2mm² 56.6Gbps 18.2pJ/b oFEC Decoder for Optical Communications, C.-H. Lu, W. Tang and Z. Zhang, Univ. of Michigan, USA

A complete 5-core iteration-unfolded open forward error correction (oFEC) decoder is presented. The 6.2mm² decoder adopts a bit-level parallel architecture with transpose reordering to achieve 56.6Gbps at 18.2pJ/b, surpassing the state of the art by 2.8x in throughput, 4.8x in energy efficiency, and 6.5x in area efficiency. Efficient list-64 decoding enhances error correction, supporting a pre-FEC BER of up to 2%.
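Two quantities follow directly from the quoted throughput, energy efficiency, and area. The sketch below estimates power as energy-per-bit times throughput at the quoted operating point; this is an assumption, since the abstract does not give the power figure itself.

```python
# Figures implied by the oFEC decoder's headline numbers.
# ASSUMPTION: power is estimated as energy-per-bit x throughput at the quoted
# operating point; the abstract does not state the power directly.
THROUGHPUT_BPS = 56.6e9
ENERGY_J_PER_BIT = 18.2e-12
AREA_MM2 = 6.2

power_w = ENERGY_J_PER_BIT * THROUGHPUT_BPS              # ~1.03 W
area_eff_gbps_per_mm2 = THROUGHPUT_BPS / 1e9 / AREA_MM2  # ~9.13 Gb/s/mm^2

print(f"power ~ {power_w:.2f} W, area efficiency ~ {area_eff_gbps_per_mm2:.2f} Gb/s/mm^2")
```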

## C29-3 - 14:50

Cygnus: A 1 GHz Heterogeneous Octa-Core RISC-V Vector Processor for DSP, V. Jain, D. Grubb, J. Zhao, K. Anderson, K. T. Ho, Y. Chi, E. Schwarz, K. Asanovic, Y. S. Shao and B. Nikolic, Univ. of California, Berkeley, USA

We introduce Cygnus, an energy-efficient octa-core RISC-V vector processor compliant with the RVV 1.0 specification and targeting digital signal processing (DSP) applications. The chiplet features dynamic instruction scheduling with short vector lengths at the vector-core level and a big/little architecture with tightly-coupled memory at the SoC level, enabling efficient streaming data flow across cores. Cygnus demonstrates a high average utilization of 90% across GEMM and CONV kernels, achieving leading energy efficiencies of 414 GOPS/W for INT8 GEMM and 109 GFLOPS/W for FP32 GEMM. On a representative computer vision kernel implementing non-local-means denoising, the chiplet delivers a latency of 3.7 ms and an energy consumption of 518 μJ/frame.
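The denoising result can be converted into an average power and a frame rate. This is a sketch under the assumption that the 518 μJ is consumed over the 3.7 ms latency and that frames are processed back-to-back with no idle time; the abstract reports neither quantity directly.

```python
# Average power and frame rate implied by the non-local-means denoising result.
# ASSUMPTION: the 518 uJ/frame is consumed over the 3.7 ms frame latency and
# frames are processed back-to-back (no idle time between frames).
ENERGY_PER_FRAME_J = 518e-6
LATENCY_S = 3.7e-3

avg_power_w = ENERGY_PER_FRAME_J / LATENCY_S  # ~0.14 W during a frame
max_fps = 1.0 / LATENCY_S                     # ~270 frames/s if run back-to-back

print(f"avg power ~ {avg_power_w * 1e3:.0f} mW, up to ~{max_fps:.0f} frames/s")
```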

## C29-4 - 15:15

A 16nm 550 - 1320 BTOPS/W NPU Exploiting Training-free Structured Bit-level Sparsity and Dynamic Dataflow Processing, M. Shi\*, C. Fang\*, W. Jiang\*, V. Jain\*, A. Joseph\*\*, W. Dehaene\* and M. Verhelst\*, \*KU Leuven and \*\*NXP Semiconductors N.V., Belgium

To jointly achieve aggressive weight compression and high compute utilization for neural network and transformer workloads, this chip implements a sparse NN processor that exploits structured bit-level sparsity and dynamic dataflows. Without costly network retraining, the chip achieves an energy efficiency of 550 - 1320 BTOPS/W, realizing up to a 6.82x efficiency improvement over the SotA.

# **Circuits Session 30**

# **Cryo-CMOS Circuit**

Thursday, June 12, 14:00-15:40

Chairpersons: M. Miyamura, NanoBridge Semiconductor, Inc.

E. Olieman, NXP Semiconductors

## C30-1 - 14:00

A 5.6-100K, 128ppm/K Cryo-CMOS Current Reference, S. Ray, D. Frank, J. Bulzacchelli, B. Sadhu, K. Tien, M. Yeck, S. Culbashoyl, Timmerwilke, R. Robertazzi, D. Underwood, B. Gaucher and D. Friedman, IBM T. J. Watson Research Center, USA

The first CMOS current reference with a measured temperature coefficient across cryogenic temperatures is reported. Implemented in a 14nm FinFET technology, occupying 0.14 mm², and drawing 38µA from a 1.4V supply, the reference uses mutual compensation between a MOSFET gate-source voltage and a thin-film resistance (a circuit technique that improves as cryogenic temperatures are approached) to achieve a temperature coefficient of 128ppm/K over 5.6-100K, averaged over 5 dice from 3 wafers. Its cryogenic supply sensitivity of 0.06%/V is 6x lower than the lowest previously reported among cryo-CMOS references, current or voltage. Finally, cryogenic low-frequency noise is measured for the first time among cryo-CMOS references, current or voltage.



## C30-2 - 14:25

A Cryo-CMOS RF-DAC Based Super-heterodyne Transmitter for Superconducting Qubit Control, F. Yuan\*, H. Su\*, Y. Zou\*\*, Y. Peng\*, J. Yin\*, F. Yan\*\*\*, E. Charbon\*\*, R. P. Martins\*,\*\*\*\* and P.-I. Mak\*, \*Univ. of Macau, Macau, \*\*Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland, \*\*\*Beijing Academy of Quantum Information Sciences, China and \*\*\*\*Univ. of Lisboa, Portugal

Practical quantum computing requires the manipulation of multiple qubits, which demands a broad frequency range from the qubit control chip. We present a 4 K cryo-CMOS quantum control chip featuring a Mixing (MIX)/Non-return-to-zero (NRZ) mode radio-frequency digital-to-analog converter (RF-DAC) and an up-conversion mixer. By using the Mixing-mode RF-DAC and the mixer for two-stage frequency up-conversion instead of direct conversion or other methods, the controller achieves a wider modulated frequency range and lower power consumption.

## C30-3 - 14:50

A Cryogenic 1.08mW/Qubit Fully-Integrated 4-Channel Frequency-Division-Multiplexing Transmon Qubit State Readout ASIC in 28nm Bulk CMOS, W. Huang\*, Y. Guo\*\*, T. Tian\*, Q. Liu\*\*\*, G. Wei\*, X. Liu\*, T. Li\*, W. Jia\*, Y. Zheng\*\*, Z. Wang\* and H. Jiang\*, \*Tsinghua Univ., China, \*\*Nanyang Technological Univ., Singapore and \*\*\*Beijing Academy of Quantum Information Sciences, China

In this paper, we propose a cryogenic fully-integrated 4-channel Frequency-Division-Multiplexing (FDM) transmon qubit state readout ASIC with a power consumption of 4.32mW (1.08mW/qubit). To the authors' knowledge, this is the lowest power consumption among reported qubit state readout receivers. By adopting an ultra-narrowband zero-IF topology aided by an 18-tap analog finite impulse response low-pass filter (AFIR-LPF) with 40dB stop-band rejection, the chip detects four qubits simultaneously in our test system and achieves at least 92% readout fidelity in four-qubit FDM readout measurements.

## C30-4 - 15:15

0.25 mW/qubit, 5.7-7.5 GHz Cryogenic CMOS Microwave Signal Selector Using Dual-Stage Injection-Locked Oscillator for Frequency-Multiplexed Qubit Control, H. Fuketa, I. Akita, T. Ishikawa, H. Koike and T. Mori, AIST, Japan

Cryogenic CMOS (cryo-CMOS) quantum bit (qubit) control circuits are expected to overcome the interconnect complexity problem in large-scale quantum computers. However, since each qubit has a unique control frequency, many power-hungry oscillators are needed to generate these frequencies at the cryogenic stage of the refrigerator. In this paper, to remove the oscillators from the refrigerator, we propose a cryo-CMOS signal selector consisting of two injection-locked oscillators that extract the specified qubit-control tone from a multi-tone microwave signal. This enables an efficient frequency-multiplexed qubit control system with 1/30 the power consumption of conventional systems.

# **Technology Session 16**

# **FeRAM Array and Module**

Thursday, June 12, 14:00-15:40

Chairpersons: T. Ohtou, Tokyo Electron Limited

K. Hofmann, Infineon

## T16-1 - 14:00

Designing Robust Interfaces of HZO Module (>10<sup>12</sup> at 85 °C) with High Sensing Margin (>300 mV) for ≤1.1 V 1T-1F with Common Plate Line and 1T-nF FeRAM, I. Jeon\*, H. Lim\*, S. Y. Kim\*, Y. Sur\*, J. S. Kim\*, Y. Lee\*, J. Choi\*, Y. Hong\*, S. Ahn\*, C. H. Jung\*, E. Chung\*, Y. Goh\*, K. Lee\*, W. Kim\*, H. Kwon\*\*, S.-W. Park\*\*, D. Kwon\*\*, J. Jeon\*\*\*, S. H. Lee\*, D. Ha\*, S. J. Ahn\*, S. Hyun\* and J. Song\*, \*Samsung Electronics Co., Ltd., \*\*Hanyang Univ. and \*\*\*Sungkyunkwan Univ., Korea

Energy-efficient AI will be realized in conjunction with ferroelectric nT-mF in the near future. First, we present a ferroelectric module solution equipped with Pr-boosted HZO (2Pr 54 μC cm<sup>-2</sup>) and universally improved ILs for FE, MPB and AFE films. The films show high write/read endurance (>10<sup>12</sup> cycles at 85 °C). Second, we implement the solution in 1T-1F with a common plate line and in 1T-nF FeRAM, confirming sufficiently high sensing margins (≥300 mV).

## T16-2 - 14:25

Vertical 2T-nC FeRAM Demonstration: BEOL Read Transistor for 4F<sup>2</sup> Memory Strings and Two-Terminal Selector Design for Polarization Disturb Mitigation, S. Deng\*, J. Howe\*, S. Ma\*, S. Y. Tauki\*\*, Z. Zhao\*, J. Duan\*, Y. Lee\*, Y. Qin\*, R. Joshi\*\*\*, T. Kampfe\*\*\*\*, X. Gong\*\*\*\*\*, V. Narayanan\*\* and K. Ni\*, \*Univ. of Notre Dame, \*\*The Pennsylvania State Univ., \*\*\*IBM T. J. Watson Research Center, USA, \*\*\*\*Fraunhofer IPMS, Germany and \*\*\*\*\*National Univ. of Singapore, Singapore

In this work, we demonstrate a vertical 2T-nC FeRAM with a back-end-of-line (BEOL) read transistor ( $T_R$ ) for 4F<sup>2</sup> strings and propose a selector design to mitigate polarization disturb in passive capacitor crossbar arrays. Key contributions include: 1) successful integration and operation of a stack composed of a Si MOSFET write transistor ( $T_W$ ), 3-layer cylindrical ferroelectric capacitors, and a Si-doped  $In_2O_3$  BEOL  $T_R$ , demonstrating the feasibility of a 4F² 2T-nC string; 2) introducing nonlinearity into the capacitor stack to suppress the ferroelectric voltage drop under inhibition biases while maintaining sufficient write voltage, reducing disturb; 3) modeling and experimental validation of inserting a metal-semiconductor (a-Si)-metal (MSM) selector into the capacitor to mitigate disturb, achieving a 9x reduction of disturb after 10<sup>6</sup> cycles in the  $V_W$ /2 scheme.



## T16-3 - 14:50

FeRAM Capacitor with Novel Low-Power, Non-Destructive and High Endurance Read Operation for High-Density Embedded Memory, S. A. Siddiqui, S.-C. Chang, C. Neumann, B. Granados Alpizar, B. Bangalore Rajeeva, C.-C. Lin, G. Choe, K. Brian, A. Augustus, M. Metz, J. Kavalieros and U. Avci, Intel Corp., USA

For the first time, advanced-CMOS-compatible HZO-based FeRAM trench capacitors are demonstrated with non-destructive read (R) endurance >1e4 cycles at elevated temperature (85°C) after a single write (W). This ensures continuous R between refresh cycles and eliminates the rewrite after every R, further reducing both the latency and energy of R operations compared to FeRAM with destructive R. Compared to published non-destructive read operation (NDRO), a higher R bias of 0.8V is used to obtain a >28x memory window boost [1,2], which originates from a high charge response from domain walls due to partial polarization switching. Furthermore, a 5-year lifetime at 85°C is extrapolated under R/W-intensive NDRO, while low-frequency W access is identified as the worst-case scenario. Finally, a tight distribution of functional bits with NDRO in a FeRAM capacitor array across a 300mm wafer is demonstrated, showing that integrating low-voltage FE capacitors with the proposed NDRO has great potential for next-generation low-power, high-speed, dense embedded memory.

## T16-4 - 15:15

Exploring FeFET Degradation Mechanisms: Mid-Interlayer as a Viable Solution for Stable Retention, Disturb Immunity, and Low V<sub>th</sub> Variation, G. Kim\*, S. Lee\*, H. Choi\*, H. Kim\*, S. Shin\*, S. Park\*\*, K. Seo\*\*, K. Kim\*\*, W. Kim\*\*, D. Ha\*\*, J. Ahn\*\*\* and S. Jeon\*, \*KAIST, \*\*Samsung Electronics Co., Ltd. and \*\*\*Hanyang Univ., Korea

This study experimentally investigates the unexplored dynamics of retention loss, disturb, and V<sub>th</sub> instability in MIFIS FeFETs featuring a metal-gate interlayer (G.IL)-ferroelectric (FE)-channel interlayer (Ch.IL)-Si stack. To prevent undesired decoupling of polarization (P) and trapped charges during performance degradation, a mid-interlayer (mid-IL) within the FE layer is proposed. The optimized mid-IL (i) stabilizes the FE phase in the HfZrO<sub>x</sub> (HZO) film, (ii) acts as an energy barrier in the gate stack to suppress trapped-charge loss toward the channel, and (iii) mitigates channel percolation issues caused by the polymorphic and polycrystalline nature of FE films. These improvements enable an advanced gate stack design with reduced operation voltage ( $V_{PGM}/V_{ERS}$ : 17/-14 V), a wide memory window (MW: 10.5 V), stable TLC retention (ΔMW ≈ -6% after 10 years at 85 °C), disturb immunity ( $\Delta V_{th}$ = 0 after 10<sup>4</sup> cycles at 12 V), and low  $V_{th}$  variation (54% reduction in  $V_{th}$  variation).

# **Technology Session 17**

# **Oxide Semiconductors 3: Device Physics and Reliability**

Thursday, June 12, 14:00-15:40

Chairpersons: K. Tomida, Rapidus Corporation

A. Veloso, Imec

## T17-1 - 14:00

Key to Low Supply Voltage: Transition Region of Oxide Semiconductor Transistors, K. Jana\*, J. Kang\*, S. Liu\*, F. F. Athena\*, C.-H. Huang\*, Y. Tang\*\*, H. J.-Y. Chen\*, B. Saini\*, J. Hartanto\*, R. K. A. Bennett\*, A. E. O. Persson\*, S. Li\*, K. Neilson\*, Y.-M. Lee\*, K. Leitherer\*, K. Nomura\*\*, P. C. McIntyre\*, E. Pop\*,\*\* and H.-S. P. Wong\*, \*Stanford Univ. and \*\*Univ. of California, San Diego, USA

We study the voltage transition region (TR) from sub- to above-threshold in field-effect transistors (FETs) and characterize its width ( $V_{TR}$ ), which indicates how much the supply voltage ( $V_{DD}$ ) can be reduced. The TR is significant in amorphous oxide semiconductor field-effect transistors (OSFETs) because shallow traps in amorphous OS channels lead to a large  $V_{TR}$ . We introduce a  $V_{TR}$  extraction scheme and identify four main sources of shallow traps (ST) in amorphous OSFETs. We design experiments to evaluate their impact on  $V_{TR}$  individually and devise four OSFET process/design knobs to minimize  $V_{TR}$ . Our analysis is then extended to other prominent FETs in the literature: crystalline-channel FETs show  $V_{TR}$  < 80 mV, whereas amorphous OSFETs show  $V_{TR}$  ranging from 160 mV to as high as 1.1 V, highlighting that  $V_{TR}$  in amorphous OSFETs is a critical challenge that must be addressed.

## T17-2 - 14:25

First Demonstration of Fluorine-Treated IGZO FETs with Record-Low Positive Bias Temperature Instability ( $|\Delta V_{TH}| < 44$  mV) at an Elevated Temperature (395 K), B. Tang\*, X. Chen\*, R. Wan\*, Z. G. Yu\*\*, S. Hooda\*, W. Wang\*, Z. Fang\*, Q. Wan\*, M. Sivan\*, E. Zamburg\*, Y.-W. Zhang\*\*, X. Gong\* and A. V.-Y. Thean\*, \*National Univ. of Singapore and \*\*Institute of High Performance Computing (IHPC), A\*STAR, Singapore

We present the reliability enhancement of BEOL-compatible, ITO-enhanced, fluorine-treated IGZO transistors. The IGZO FETs exhibit a record-low threshold voltage shift of 43.7 mV at an elevated temperature of 395 K under a high oxide electric field of 4 MV/cm for 1 ks, while maintaining excellent electrical performance. For the first time, we propose an atomic-level mechanism explaining how fluorine doping modulates the migration barrier and path of hydrogen, which contributes to enhanced reliability in oxide FETs.



## T17-3 - 14:50

First Demonstration of Atomic-Interlayer-Tuning Driven by First Principles Calculations and Atomic Layer Deposition towards High Thermal Stable BEOL IGZO-FETs with SS=62mV/dec, PBTI < 7mV @ 3MV/cm and 353K, X. Li\*,\*\*, C. Gu\*,\*\*, T. Yu\*, Y. Zhao\*,\*\*, C. Chen\*,\*\*, C. Zhang\*,\*\*, Z. Bai\*, X. Duan\*, W. Li\*, T. Liao\*,\*\*, S. Hu\*,\*\*, N. You\*, J. Wang\*, R. Chen\*, Z. Wu\*,\*\*, N. Lu\*,\*\*, J. Wang\*, G. Yang\*,\*\*, L. Wang\*,\*\*, D. Geng\*,\*\*, L. Li\*,\*\* and M. Liu\*,\*\*, \*State Key Laboratory of Fabrication Technologies for Integrated Circuits, IMECAS, China and \*\*University of Chinese Academy of Sciences, China

In this work, first-principles calculations combining ab initio molecular dynamics (AIMD) and static calculations are incorporated into the ALD co-design of IGZO-FETs for the first time, providing an in-depth study of defect kinetics. By emulating ALD, optimized IGZO-FETs with outstanding stability up to 500 are achieved: (1) a sub-0.5nm ALD InO interlayer strengthens PBTI stability by 51 times @3.5MV/cm, from ΔVth = 174.1mV (5.7V<sup>-1</sup>) to 3.4mV (294.1V<sup>-1</sup>); (2) PBTI = 6.9mV; (3) 12.1mV @ 3MV/cm and 393K; (4) endurance > 10<sup>11</sup> cycles with an AC  $V_{g,bias}$  of 2.8V. This first-principles-assisted experimental design is practically powerful for oxide FETs with multiple elements and complicated processing methods, accelerating path-finding for Back-End-of-Line (BEOL) thermal stability enhancement.
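The figures in V<sup>-1</sup> alongside the PBTI shifts appear to be reciprocals of the shift, with the "51 times" being their ratio. This sketch checks that reading; the interpretation is an assumption inferred from the numbers, since the abstract does not define the V<sup>-1</sup> metric.

```python
# Relating the PBTI numbers quoted for the sub-0.5 nm InO interlayer.
# ASSUMPTION: the V^-1 figures are read here as reciprocals of the
# threshold-voltage shift (1/|dVth|); the abstract does not define them.
DVTH_REF_V = 174.1e-3  # PBTI shift without the InO interlayer, volts
DVTH_INO_V = 3.4e-3    # PBTI shift with the InO interlayer, volts

stability_ref = 1.0 / DVTH_REF_V             # ~5.7 V^-1
stability_ino = 1.0 / DVTH_INO_V             # ~294.1 V^-1
improvement = stability_ino / stability_ref  # equals DVTH_REF_V / DVTH_INO_V, ~51x

print(f"{stability_ref:.1f} V^-1 -> {stability_ino:.1f} V^-1, {improvement:.0f}x")
```

All three quoted values are reproduced by this reading, which supports interpreting the V<sup>-1</sup> numbers as inverse shifts.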

## T17-4 - 15:15

First Direct Observation of Two Different Hydrogen-Related Processes Corresponding to the Negative VTH Shift Under PBTI Stress in IGZO Transistors by Pd Hydrogen Spillover, Z. Lin\*, L. Kang\*\*, Y. Zhang\*, S. Lu\*\*, T. Cui\*, L. Zheng\*, J. Zhao\*, X. Li\*, X. Zhao\*, Y. Wu\*\*, J. Xu\*\* and M. Si\*, \*Shanghai Jiao Tong Univ. and \*\*Huawei Technologies Co., LTD, China

In this work, we propose a new method to characterize the impact of hydrogen (H) on the PBTI degradation of IGZO transistors using H spillover through Pd electrodes. In this method, the incorporation of H into the IGZO channel is controlled by the H concentration in the test chamber and the location of the Pd electrodes. As a result, for the first time, we identify two different negative threshold voltage ( $V_{TH}$ ) shift mechanisms on the same IGZO transistor, both of which are clarified to originate from H-related defects. Different atomic bonding configurations of H may account for the different negative  $V_{TH}$  shift components. With an optimized process that suppresses vacancy defects and H, the IGZO transistors exhibit a PBTI  $|\Delta V_{TH}|$  of 18.4 mV at 1 ks under a stress electric field ( $E_{OX}$ ) of 6 MV/cm and a high temperature of 150°C.

# **Technology Session 18**

# **Interconnects**

Thursday, June 12, 14:00-15:40

Chairpersons: M. Tada, Keio University

V. Paruchuri, ASM

## T18-1 - 14:00

Novel Advanced Low-k Dielectric for 2 nm and Beyond Cu and Post Cu Dual Damascene BEOL Interconnect Technologies, S. Nguyen, H. Huang, A. Jog, M. Shoudy, N. Lanzillo, K. Luedders, T. Cabrera, Y. Yao, D. Metzler, C. Meagher, Y. Mignot, T. Nogami, M. Silvestre, A. Simon, L. Wangoh, K. Motoyama, C. Penny, D. Edelstein, K. Choi, S. Ghosh, V. Narayanan, A. Dutta and S. Choi, IBM Albany NanoTech, USA

A novel class of mechanically and electrically robust advanced low-k (ALK) dielectrics has been developed. These films have far lower plasma-induced damage (PID), excellent built-in Cu oxidation and diffusion barrier performance, and fundamentally more reliable TDDB. One ALK recipe has been fully evaluated as the next-generation low-k interlevel dielectric (ILD) for 2 nm and beyond Cu and post-Cu dual damascene BEOL¹. The dense ALK (k = 3.2) and lightly porous ALK (k = 2.8-3.0) films have a high modulus (E ~ 15-33 GPa) and from ~1/2 to ~1/10 the PID of typical dense SiCOH (k ~ 2.7-3.2) or pSiCOH (k ~ 2.4-2.55). The ALK Cu and O₂ diffusion barrier properties enable further scaling of metal barriers, increasing Cu line volume and reducing R and RC while actually improving TDDB and EM. This is confirmed for 2 nm node Cu dual damascene¹ and for a future subtractive Ru/airgap scheme.

## T18-2 - 14:25

BEOL Interconnects for 2nm Technology Node and Beyond, G. Thareja\*, A. Mema\*\*, G. Qu\*, N. Giulani\*, D. Cornigli\*, T. Takahisa\*\*\*, E. Piccinini\*, X. Wang\*, A. Palmieri\*, S. Barkam\*, M. Jamieson\*, B. Xie\*, S. You\*, S. Sharma\*, Y. Wu\*, M. Gage\*, Z. Wu\*, R. Shaviv\*, F. Nardi\*, M. Haverty\*, M. Spuller\*, B. Ng\*, N. Tam\*, A. Jansen\*, A. Lo\*, Z. Chen\*, F. K. Mungai\*, S. Deshpande\*, H. Ren\*, J. Tang\*, M. Naik\*, S. Kesapragada\*, C.-I. Lang\*, S. Muthukrishnan\*, B. Brown\*, M. Tada\*\*\*, J. J. Lee\*, L. Q. Xia\*, M. Berkens\*, L. Larcher\*, K. Kashefizadeh\*, H. Amrouch\*\* and X. Tang\*, \*Applied Materials, Inc., USA, \*\*Technical Univ. of Munich, Germany and \*\*\*Keio Univ., Japan

We present novel back-end-of-line (BEOL) copper interconnect integration for advanced technology nodes using low-k dielectric deposition, a binary liner metal gapfill process, radical-assisted annealing, chemical mechanical planarization (CMP) and a selective metal cap. Unit process and metrology data, electrical tests, Time Dependent Dielectric Breakdown (TDDB) reliability, Electromigration (EM) reliability and circuit simulations confirm significant power-performance-area (PPA) gains for the 2nm technology node and beyond.



## T18-3 - 14:50

Selective Deposition and Ruthenium Superfill Exploration Beyond A10 Node Interconnects, M. H. van der Veen\*, G. Pattanaik\*\*, T. Hakamata\*\*, Y. Otsuki\*\*, A. Romo Negreira\*\*, J. Mayersky\*\*, K. Yu\*\*, R. Yonezawa\*\*, H. Suzuki\*\*, R. Clark\*\*, A. Kumar Mandal\*, A. Farokhnejad\*, N. Jourdan\*, P. Marien\*, G. Murdoch\*, H. Struyf\*, A. Sepulveda Marquez\*, S. Park\*, J. Swerts\* and Z. Tokei\*, \*imec, Belgium and \*\*Tokyo Electron Ltd., Japan

We demonstrate that selective Ru CVD can enable multilayer semi-damascene (SD) interconnects at the critical metal levels of the A10 node and below. Ring oscillator simulations show that Ru SD flows can benefit from barrierless Ru vias. The Ru CVD has excellent selectivity towards low-k 3.0 in MP21 vias. When Ru vias are combined with Cu wires, the MP24 chain resistance is lower than with barrierless dual damascene (DD) Ru. Furthermore, we applied Ru superfill CVD in a low-k DD test vehicle at MP21-MP26. Owing to its bottom-up growth, the Ru superfill does not induce line bending in  $SiO_2$  or in MP21 low-k lines. Resistance scaling of the Ru superfill down to 70 nm² is confirmed against the physical area. Selective and superfill Ru CVD are thus enablers of multilayer SD Ru interconnects for future nodes.

## T18-4 - 15:15

Effects of Adjacent Floating Metal Interconnect Through Plasma-Induced Coupling, S. Lee, S. H. Lee, H. Choi, B. Woo, D.-K. Kim, J. Lee, Y. Heo, E. Kim, S. Park, S. Hong and S. Hur, Samsung Electronics Co., Ltd., Korea

3D NAND memories have been developed by increasing the cell stack height, which requires strong plasma energy to achieve high aspect ratios. Plasma-induced current causes high bias voltage stress during fabrication, eventually leading to gate oxide breakdown or wafer arcing. Although plasma damage has been studied for a long time, the effects of floating metal interconnects during plasma charging processes have received less attention. This study aims to complete layout design guidelines for metal interconnects by presenting various cases of electromigration (EM) failure and proposing a new plasma-induced damage (PID) index, which can prevent unexpected changes or failures in electrical properties under plasma charging environments.

# **Technology Session 19**

# **Gate Stack and BEOL Transistor Processes**

Thursday, June 12, 14:00-15:40

Chairpersons: T. Irisawa, AIST

H. Simka, Samsung

## T19-1 - 14:00

Orthogonal V<sub>T</sub> Tuning for Oxide Semiconductor 2T Gain Cell Enabled by Interface Dipole Engineering, F. F. Athena\*, J. Kang\*, M. Passlack\*\*, N. Safron\*\*, D. Dede\*, K. Jana\*, B. Saini\*, X. Wang\*, S. Liu\*, J. Hartanto\*, E. Boneh\*, H. J.-Y. Chen\*, C.-H. Huang\*, Q. Lin\*\*, D. Zhong\*\*, K. Leitherer\*, P. C. McIntyre\*, G. Pitner\*\*, I. P. Radu\*\* and H.-S. Wong\*, \*Stanford Univ., USA and \*\*TSMC, Taiwan

We demonstrate interface engineering of the gate dielectric as an independent knob to tune the threshold voltage ( $V_T$ ) of Oxide Semiconductor (OS) FETs for two-transistor (2T) gain-cell (GC) memories. By leveraging Interface Dipole (ID) engineering for Indium-Tungsten-Oxide (IWO) FETs, we achieve a ~450-500 mV  $V_T$  increase compared to standard HfO<sub>2</sub> (STD), and this increase is maintained from 85 °C down to cryogenic temperatures. ID-engineered GCs exhibit good reliability, showing a ~60 mV shift under worst-case DC positive-bias stress (PBS) at 85 °C. The approach is also demonstrated for other OS materials such as Indium-Oxide, Indium-Tin-Oxide, and Indium-Gallium-Zinc-Oxide, and at short channel lengths (sub-100 nm). Finally, simulations indicate that ID  $V_T$  tuning reduces GC refresh energy by ~50,000x compared to STD, enabling energy-efficient GCs.

## T19-2 - 14:25

First Demonstration of 9N+9P Complete Dipole Multi-V<sub>T</sub>s CMOS Integration with Atomic Interfacial Dipole Buffer Layer Technique in GAA NSFETs, Y. Wei\*,\*\*, J. Yao\*, Y. Wang\*,\*\*, Q. Zhang\*, S. Yang\*, L. Cao\*, Q. Li\*,\*\*, X. Zhang\*,\*\*, Y. Zhang\*,\*\*, H. Yang\*, J. Li\*, H. Yin\*,\*\*, X. Wang\*,\*\* and J. Luo\*,\*\*, \*State Key Laboratory of Fabrication Technologies for Integrated Circuits, Institute of Microelectronics of the Chinese Academy of Sciences and \*\*School of Integrated Circuits, Univ. of Chinese Academy of Sciences, China

For the first time, we report a complete dipole CMOS integration using an atomic interfacial dipole buffer layer (AIDBL) technique for work-function-metal-free (WFM-free) multi- $V_T$  modulation. With a novel dual dipole-first integration process, various dipole tuning options, and the linear  $V_T$  modulation capability of the AIDBL, the experimental CMOS integration on GAA NSFETs was successfully established, and the devices achieved symmetric 9N+9P multi- $V_T$  levels within ~800 mV. This provides a promising way to relieve the challenging  $T_{sus}$  space required for WFM filling and thereby enhance transistor drive performance for future transistor technology.

## T19-3 - 14:50

Shifter materials and Stack Explorations for V<sub>t</sub> Fine-Tunable Dual Dipole Multi-V<sub>t</sub> Gate Stacks Compatible with Low Thermal Budget CFET, H. Arimura, L. Lukose, J. Ganguly, J. Franco, H. Mertens, J. Stiers, J.-G. Lai, A. Mehta, M. Bejide, M.-S. Kim and N. Horiguchi, imec, Belgium

We report ALD processes, shifter stacks, and materials for dual dipole multi- $V_t$  gate stacks with  $V_t$  fine tunability and compatibility with a low-thermal-budget RMG. The EOT penalty of the dipole-first stack is mitigated by the choice of ALD oxidant, q-time control, and the use of a less hygroscopic shifter material. Based on an understanding of the impact of La silicate formation in LaOx dipole-first gate stacks, fine-tunable n/p-type dipole gate stacks are demonstrated via (a) buffer HfO<sub>x</sub> and (b) HfO<sub>2</sub> and ZrO<sub>2</sub> high-k shifters. These interface-engineering approaches pave the way towards low-thermal-budget-compatible dipole multi-V<sub>t</sub> gate stacks for CFET.



## T19-4 - 15:15

High-Performance Monolithic 3D CMOS Enabled by Orientation-Aligned Seedless Laser Crystallization and Ultra-Shallow Laser Activation, J. Park\*, H. Jeong\*\*, E. Park\*, G. Park\*\*, C.-h. Ahn\*, S. Lee\*, J. Won\*\*, H.-J. Kwon\*\* and H.-Y. Yu\*, \*Korea Univ. and \*\*Daegu Gyeongbuk Institute of Science and Technology, Korea

In this study, we demonstrate PSLC Si-based CMOS devices on the M3D top layer using a seedless crystallization process. Laser crystallization forms single-orientation Si channels (25 μm grain size), enhancing carrier mobility. Laser S/D activation achieves low contact resistivity (~10<sup>-8</sup> Ω·cm²) below 400 °C, meeting M3D thermal constraints. PSLC-Si CMOS devices exhibit  $I_{ON}$/$I_{OFF}$  > 10<sup>8</sup> with high electron field-effect mobility (521 cm²/Vs) and hole field-effect mobility (163 cm²/Vs). CMOS inverters show clear switching transitions, confirming feasibility for M3D logic applications. These results validate the potential of a fully laser-based process for M3D-integrated logic devices.

# **Circuits Session 31**

## **MEMS and Display**

Thursday, June 12, 16:00-17:40

Chairpersons: N. Miura, Osaka University; A. Manickam, Cepheid

## C31-1 - 16:00

A Fully Integrated Bipolar 1.8Vpp-to-41Vpp 450kHz Switched-Capacitor MEMS-Driver with a Power Reduction Factor of 16.3, M. Gruber\*\*, \*\*\*, B. Eversmann\*, L. Liao\* and R. Brederlow\*\*, \*Infineon Technologies AG and \*\*Technical Univ. of Munich, Germany

This work presents a fully integrated bipolar switched-capacitor driver for ultrasonic micro-electro-mechanical systems (MEMS) transducers, capable of generating boosted amplitude-modulated signals from a low supply voltage. The proposed architecture consists of a bipolar series-parallel charge pump (SP-CP) that generates positive and negative single-ended output voltages. This almost doubles the peak-to-peak output voltage and the power reduction factor compared to conventional unipolar SP-CPs. The bipolar SP-CP also achieves better stage utilization, saving up to 50% in the number of stages.

## C31-2 - 16:25

A 22µg/√Hz Noise Floor, 1.6mg/g² VRC, High Efficiency MEMS Capacitive Accelerometer using High-Voltage Orthogonal Excitation Technique, W. Cao, L. Zhong, W. Jian, L. Wang and Z. Zhu, Xidian Univ., China

This work presents a high-voltage orthogonal excitation (HVOE) based readout circuit for MEMS capacitive accelerometers. The proposed interface IC, fabricated in a 0.18µm BCD process, achieves a low noise floor of 22µg/√Hz, a low vibration rectification error (VRE), quantified by a vibration rectification coefficient (VRC) of 1.6mg/g², and high efficiency, quantified by a FoM of 4.58pJ. Compared to state-of-the-art low-noise MEMS accelerometers, this work improves the VRC and the FoM by 40x and 11.6x, respectively.
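The abstract reports VRE through a VRC in mg/g². Assuming the common convention that the rectified DC offset scales with the square of the rms vibration level (VRE ≈ VRC × g<sub>rms</sub>²), a minimal Python sketch of what 1.6mg/g² implies. The 2 g rms vibration level is a hypothetical example, not a reported test condition, and the function name is illustrative.

```python
def vibration_rectification_error_mg(vrc_mg_per_g2: float, vib_g_rms: float) -> float:
    """Estimate the DC offset shift (mg) caused by vibration rectification.

    Assumes the quadratic model VRE = VRC * (rms vibration in g)^2.
    """
    return vrc_mg_per_g2 * vib_g_rms ** 2


# VRC of 1.6 mg/g^2 from the abstract; 2 g rms is a hypothetical vibration level
print(vibration_rectification_error_mg(1.6, 2.0))  # -> 6.4
```

Under this model, halving the VRC halves the offset shift at any vibration level, which is why VRC is a convenient single-number comparison across accelerometers.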

## C31-3 - 16:50

A 560μW 6fA/√Hz 146dB-DR Ultrasensitive Current Readout Circuit for PWM-Dimming-Tolerant Under-Display Ambient Light Sensor, L. Liu\*, T. Qu\*, Q. Pan\*, D. Li\*\*, G. Guo\*\*, Z. Hong\* and J. Xu\*, \*Fudan Univ. and \*\*OPNOUS Smart Sensing & AI Technology, China

This paper presents an ultra-low-noise, power-efficient, and PWM-dimming-tolerant photocurrent readout circuit for under-display ambient light sensors (ALS). To achieve pA-level resolution, a transimpedance amplifier (TIA) with a feedback diode provides GΩ-level resistance and an input current noise of $6fA/\sqrt{Hz}$. To resolve instability and noise folding at low power, the TIA uses a signal-dependent auto-tracking zero for frequency compensation and a low-pass filter to suppress high-frequency current noise. To accurately extract ambient light from PWM-dimming interference, the readout can operate in an under-sampling mode that removes the interference algorithmically. Fabricated in a 180nm CMOS technology, this work achieves a best-in-class resolution of 0.36pA<sub>pp</sub> and 146dB DR in a readout time of 0.84ms, reaching a FoM-DR of 206.3dB. ALS optical measurements with PWM-dimming interference have been successfully demonstrated.
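The abstract does not define FoM-DR. One convention that reproduces the quoted 206.3dB is a Schreier-style FoM with an effective bandwidth set by the readout time, FoM-DR = DR + 10·log10(BW/P) with BW = 1/(2·T<sub>readout</sub>). The sketch below assumes that definition; it is an interpretation, not the authors' stated formula.

```python
import math


def fom_dr_db(dr_db: float, power_w: float, t_readout_s: float) -> float:
    """Schreier-style FoM using DR, with BW = 1 / (2 * readout time) (assumed)."""
    bw_hz = 1.0 / (2.0 * t_readout_s)
    return dr_db + 10.0 * math.log10(bw_hz / power_w)


# Values from the C31-3 abstract: 146dB DR, 560uW, 0.84ms readout time
print(round(fom_dr_db(146.0, 560e-6, 0.84e-3), 1))  # -> 206.3
```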

## C31-4 - 17:15

A Hybrid Touch Sensing AFE with Common-CVQ (Currents, Voltages, and Charges) Subtraction to Improve Display Noise Immunity for Large Sensing Load Up to 820pF, S. H. Choi\*, J. Y. An\*, \*\*, J. Ahn\*, J.-Y. Lee\*\*, S.-W. Kim\*\*, H.-M. Lee\* and Y.-K. Choi\*, \*Korea Univ. and \*\*Samsung Electronics Co., Ltd., Korea

Flexible on-cell touch OLED displays face significant display-noise challenges due to large parasitic capacitance (C<sub>P</sub>). To address the limitations of conventional methods, this paper proposes an improved common-current subtraction (CCS) scheme that incorporates common-voltage subtraction (CVS) and common-charge subtraction (CQS) techniques. CVS enhances SNR by up to 6.72 dB in self-capacitance (SC) sensing and 6.33 dB in mutual-capacitance (MC) sensing, while CQS increases the supported sensing load from 390 pF to 820 pF. Fabricated in 65 nm CMOS, the proposed AFE achieves a twofold improvement in input dynamic range (DR) and consistent performance under display-noise (D-noise) injection.



# **Circuits Session 32**

## **High-Resolution ADCs**

Thursday, June 12, 16:00-18:05

Chairpersons: S. Sin, University of Macau; Z. Toprak-Deniz, IBM Research

## C32-1 - 16:00

An 88.8dB-SNDR 6-MS/s Pipelined SAR ADC with A Closed-Loop Dynamic Amplifier Featuring Highly-Linear Full-Scale Output Swing, Z. Wang\*, H. Luo\*, R. Bao\*, L. Jie\*\* and X. Tang\*, \*Peking Univ. and \*\*Tsinghua Univ., China

This paper presents a pipelined SAR ADC with a two-stage closed-loop residue amplifier (RA). The proposed RA uses a floating gm-ratio-based amplifier (FGA) as the first stage and a charge-pump-assisted cascoded floating inverter amplifier (CP-FIA) as the output stage. It enables fast-slewing, energy-efficient, robust, and highly linear amplification, with a THD below -100 dB at full-scale output swing. A system-level chopping-assisted auto-zero (AZ) scheme eliminates the RA's flicker noise and offset without an additional thermal-noise penalty. The prototype achieves 88.8dB SNDR at 6MS/s and an FoM<sub>S</sub> of 183dB, the best among ADCs with conversion rates above 5MS/s.

## C32-2 - 16:25

A 91.2dB-SNDR 250kHz-BW CT Zoom ADC Achieving a 6-bit Linear Zoom-in with Interstage LPF and 1.5-bit DAC, J. Yoon\* and Y. Chae\*,\*\*, \*Yonsei Univ. and \*\*XO Semiconductor, Korea

This paper presents a continuous-time (CT) zoom ADC that achieves a 6-bit linear zoom-in with an interstage LPF and an intrinsically linear 1.5-bit DAC. The interstage LPF properly filters the shaped residue signal of the coarse delta-sigma modulator, which enables the 6-bit linear zoom-in and boosts the SQNR by >36dB. Fabricated in a 28nm CMOS process, the ADC consumes only 0.392mW while achieving 101.3dB SFDR, 91.2dB SNDR, and 92.7dB DR in a signal bandwidth of 250kHz. This corresponds to a state-of-the-art FoM of 180.7dB.
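Two quick consistency checks, as a Python sketch under stated assumptions. An ideal quantizer gains about 6.02 dB per bit, so a 6-bit zoom-in corresponds to roughly 36 dB of SQNR, in line with the >36dB quoted. Assuming the quoted FoM is the Schreier FoM computed from DR (FoM = DR + 10·log10(BW/P)), the reported 92.7dB DR, 250kHz bandwidth, and 0.392mW reproduce 180.7dB; using the 91.2dB SNDR instead would give about 179.2dB.

```python
import math


def schreier_fom_db(level_db: float, bw_hz: float, power_w: float) -> float:
    """Schreier FoM: level (SNDR or DR, in dB) + 10*log10(BW / P)."""
    return level_db + 10.0 * math.log10(bw_hz / power_w)


# Ideal SQNR gain per extra bit of resolution is ~6.02 dB
zoom_bits = 6
print(round(6.02 * zoom_bits, 1))  # -> 36.1

# Values from the C32-2 abstract (assumption: the quoted FoM is DR-based)
print(round(schreier_fom_db(92.7, 250e3, 0.392e-3), 1))  # -> 180.7
print(round(schreier_fom_db(91.2, 250e3, 0.392e-3), 1))  # -> 179.2
```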

## C32-3 - 16:50

A 0.0035mm² 86dB-SNR 1.25MHz-BW Noise-Shaping SAR ADC Enabling kT/C Noise Shaping, X. Wang, Z. Zhang, Y. Zhong, N. Sun and L. Jie, Tsinghua Univ., China

This paper proposes a 13-bit 2nd-order noise-shaping SAR (NS-SAR) ADC with a novel kT/C noise-shaping technique. It achieves 85.8dB SNDR at a 40MHz sampling rate with an oversampling ratio (OSR) of 16. Its input sampling capacitance is only 0.3 pF, and the ADC occupies 0.0035 mm² in a 28nm CMOS technology, the smallest area among all published ADCs exceeding 70dB SNDR.
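For an oversampled converter the signal bandwidth is BW = f<sub>s</sub>/(2·OSR), so the 40MHz sampling rate and OSR of 16 give the 1.25MHz bandwidth in the title. The sketch below also estimates the raw kT/C noise of the 0.3 pF sampling capacitor, assuming T = 300 K. This is the total sampled noise before oversampling or any noise shaping, which illustrates why such a small capacitor would normally limit SNR and why shaping this noise matters. Function names are illustrative.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def signal_bw_hz(fs_hz: float, osr: float) -> float:
    """Signal bandwidth of an oversampled ADC: BW = fs / (2 * OSR)."""
    return fs_hz / (2.0 * osr)


def ktc_noise_vrms(c_farad: float, temp_k: float = 300.0) -> float:
    """Total sampled kT/C noise in V rms (before oversampling or shaping)."""
    return math.sqrt(K_B * temp_k / c_farad)


print(signal_bw_hz(40e6, 16))                   # -> 1250000.0
print(round(ktc_noise_vrms(0.3e-12) * 1e6, 1))  # -> 117.5 (uV rms, assuming 300 K)
```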

## C32-4 - 17:15

A 94.4dB-SNDR 500kHz-BW Multi-Rate MASH 0-1-0 ADC with Easy-to-Drive Capacitive Input and Deadband-Embedded Gm-C Loop Filter, C. Xing, Y. Zhong, N. Sun and L. Jie, Tsinghua Univ., China

This work proposes a multi-rate MASH 0-1-0 ADC, consisting of an 8b oversampled SAR ADC, a 1b incremental CTDSM, and a 12b Nyquist SAR ADC, to achieve high SQNR with high efficiency. A coupling capacitive input network is introduced to ease ADC driving. A highly linear Gm-C integrator with low-noise deadband operation provides high efficiency and high SNR. The prototype ADC achieves 94.4dB SNDR over a 500kHz bandwidth while consuming only 793µW, resulting in an FoM<sub>S</sub> of 182.3dB.

## C32-5 - 17:40

An NS-SAR Quantizer-Based Pipeline Incremental Delta-Sigma ADC Using a Current-Regulated Floating Ring Amplifier and Two-Phase Miller Negative-C, S. Song\*, T. Kang\*\*, A. Knowlton\*, S. Lee\* and M. P. Flynn\*, \*Univ. of Michigan, USA and \*\*Sungkyunkwan Univ., Korea

This pipeline incremental ADC (IADC) uses a 6-bit noise-shaping SAR quantizer to achieve high SQNR at low OSR. The pipeline IADC structure enables continuous sampling and concurrent stage processing, improving bandwidth. Delay-free integrators and residue amplification enhance signal accumulation, boosting SQNR. A current-regulated floating ring amplifier (CURE FLORA) halves power consumption, while a two-phase Miller negative-C technique mitigates integrator gain errors. Implemented in 28-nm CMOS, the prototype achieves an SNDR of 80.1 dB and an SFDR of 97.2 dB at 80 MS/s over a 5 MHz bandwidth. With a power consumption of 1.85 mW, it achieves a Walden FoM of 22.4 fJ/conversion-step and a Schreier FoM of 174.4 dB.
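The quoted figures of merit can be reproduced from the reported SNDR, bandwidth, and power using the standard definitions: Walden FoM = P/(2·BW·2<sup>ENOB</sup>) with ENOB = (SNDR - 1.76)/6.02, and Schreier FoM = SNDR + 10·log10(BW/P). The sketch also computes the implied OSR, assuming OSR = f<sub>s</sub>/(2·BW); the value of 8 is derived here, not stated in the abstract.

```python
import math


def enob_bits(sndr_db: float) -> float:
    """Effective number of bits from SNDR."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_j(power_w: float, bw_hz: float, sndr_db: float) -> float:
    """Walden FoM in J/conversion-step, using the Nyquist-equivalent rate 2*BW."""
    return power_w / (2.0 * bw_hz * 2.0 ** enob_bits(sndr_db))


def schreier_fom_db(sndr_db: float, bw_hz: float, power_w: float) -> float:
    """Schreier FoM in dB: SNDR + 10*log10(BW / P)."""
    return sndr_db + 10.0 * math.log10(bw_hz / power_w)


# Values from the C32-5 abstract
P_W, BW_HZ, SNDR_DB, FS_HZ = 1.85e-3, 5e6, 80.1, 80e6
print(FS_HZ / (2.0 * BW_HZ))                               # implied OSR -> 8.0
print(round(walden_fom_j(P_W, BW_HZ, SNDR_DB) * 1e15, 1))  # -> 22.4 (fJ/conv-step)
print(round(schreier_fom_db(SNDR_DB, BW_HZ, P_W), 1))      # -> 174.4 (dB)
```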



# **Circuits Session 33**

## **Wireless Power and Gate Drivers**

Thursday, June 12, 16:00-17:40

Chairpersons: P. Chen, National Yang Ming Chiao Tung University; G. Pillonnet, CEA-Leti

## C33-1 - 16:00

A 6.78 MHz Multiple-Transmitter Wireless Power Transfer System with Integrated Coupling Coefficient Sensor, Y. Yao\*, F. Tian\*, J. He\*, W.-H. Ki\*, C.-Y. Tsui\*, K.-T. Cheng\* and Y. Liu\*\*, \*The Hong Kong Univ. of Science and Technology, Hong Kong and \*\*Xidian Univ., China

A 6.78 MHz multiple-transmitter wireless power transfer (WPT) system with an integrated magnetic coupling-coefficient and polarity sensor is proposed to achieve misalignment-free WPT. Compared to conventional systems, the WPT efficiency at perpendicular orientation increases 212x, from 0.3% to 63.7%, and the load power increases from 0.3mW to 2W.
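As a quick arithmetic check of the reported gains, a short Python sketch; the helper name is illustrative.

```python
def improvement_factor(before: float, after: float) -> float:
    """Ratio of a metric after vs. before (same units for both)."""
    return after / before


# Values from the C33-1 abstract
efficiency_gain = improvement_factor(0.3, 63.7)    # percent -> percent
load_power_gain = improvement_factor(0.3e-3, 2.0)  # watts -> watts
print(round(efficiency_gain))  # -> 212
print(round(load_power_gain))  # -> 6667
```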

## C33-2 - 16:25

A Multi-Rectenna, Single-Output, Power Combine-and-Regulate Boost Converter for 5.8GHz Wireless Power Receiver Achieving 3.1W over 50m-Distance, H. Lee\*, J. Yoo\*, J. Lee\*, K. Lee\*\*, W. Lim\*\*, S.-H. Yi\*\* and H.-S. Kim\*, \*KAIST and \*\*Korea Electrotechnology Research Institute (KERI), Korea

This paper presents a multi-input single-output power combine-and-regulate (MISO-PCR) converter chip for a 5.8GHz WPT receiver (Rx). The MISO-PCR simultaneously collects, combines, and converts the power from 30 GaAs rectennas into a single regulated 28V output. Fabricated in a 180nm BCD process, the chip enables the Rx system, paired with a beamforming Tx, to deliver 3.1W at a distance of 50m.

## C33-3 - 16:50

A Wireless Power and Synchronized Full-Duplex Data Transceiver IC with 400 kbps Bidirectional Data Rate Using a Single Inductive Link for Low-power Systems, J. Lee, Y. Kim, D. Kim, D. Kang, M. Song and B. Lee, Hanyang Univ., Korea

This paper presents a wireless power and synchronized full-duplex data transceiver (WPSFDT) system using a single inductive link. This system employs TRX synchronization (TRXS) for synchronized data recovery (SynDR) of both off-duration shift keying (ODSK) downlink and short-pulse load shift keying (SP-LSK) uplink. The envelope hold scheme ensures robust full-duplex (FD) data telemetry by mitigating self-interference (SI) caused by uplink data transmission. Measured results demonstrate that the proposed system achieves, for the first time, 400 kbps synchronized FD data transmission using a single inductive link through TRXS without any additional components for frequency separation.

## C33-4 - 17:15

A Dual-Mode Direct-Drive D-GaN Driver with Reused Inductor and Power Switches for Negative Voltage Generation and Gate Energy Recycling, P. Chu\*, S. Hua\*, Y. Gao\*, Q. Hu\*, R. Sokolovskij\*, Q. Wang\*, H. Yu\*, \*\* and Y. Gao\*, \*Southern Univ. of Science and Technology and \*\*Shenzhen Polytechnic Univ., China

This paper presents a dual-mode direct-drive driver for depletion-mode GaN (D-GaN) power transistors, integrating negative voltage generation and gate energy recycling with reused inductors and switches. D-GaN devices offer superior reliability and gate voltage swing but require complex drive schemes and incur significant gate driving losses in high-power, high-frequency applications. The proposed driver combines an inverting buck-boost converter with a resonant gate driver, operating in resonant direct-drive (RDD) mode for efficient gate energy recycling at high frequencies and conventional direct-drive (CDD) mode for adjustable slew rate control. Fabricated in a 0.18µm HV BCD process, the driver achieves a gate charging efficiency of 93% and reduces power consumption by up to 40% in RDD mode.

# **Technology Session 20**

## **DRAM**

Thursday, June 12, 16:00-18:05

Chairpersons: T. Kudo, Micron Memory Japan, Inc.; N. Yoshida, Applied Materials, Inc.

## T20-1 - 16:00

Process Insights into 3D-DRAM with Vertical Bit Line and Scalable GAA Transistor, N. Rassoul\*, L. A. Labbate\*, G. Eneman\*, A. Fantini\*, R. Ritzenthaler\*, J. L. Prado\*, R. Loo\*\*, W. Devulder\*, E. Dupuy\*, T. Peissker\*, A. Pacco\*, H. K. Raut\*, P. Eyben\*, E. Canga\*, E. Rosseel\*, A. Sharma\*, H. Oh\*, Y. Rawal\*, M. Beggiato\*, J. Kim\*, Y. Jiang\*, A. Milenin\*, S. Subhali\*, J. Mitard\*, D. Crotti\*, I. Lee\*, A. Belmonte\* and G. S. Kar\*, \*imec and \*\*Ghent Univ., Belgium

A novel 3D-DRAM integration flow with a vertical bit line and a gate-all-around (GAA) transistor is successfully demonstrated. The process modules are described in detail, along with their challenges and optimization. The morphological integrity and electrical functionality of GAA select transistors with a monocrystalline Si channel are demonstrated. TCAD modeling predicts scaling possibilities for this architecture, corroborating its suitability for future DRAM implementation.



## T20-2 - 16:25

High Performance and Reliable 4F<sup>2</sup> IGZO Vertical Channel Transistor (VCT) with Extremely Low Contact Resistance and 10 Year BTI lifetime for Sub-10nm DRAM, D. Ha, Y. Lee, K. J. Moon, S. Lee, K. Yoo, W. Lee, S. Yoo, M. H. Cho, S. N. Kim, M. Terai, M. Kim, J. H. Bae, S. Park, S. M. Lee, M. Hong, K. Sim, C. Im, S. Hong, C. Sung, H. Kim, K. Kim, H. Cho, S. Byeon, I. Shin, J. Chae, Y. S. Tak, H. Yoon, S. Kim, S. Jeong, K. Park, S. H. Lee, S. W. Park, P. Yun, S. Hyun, S. J. Ahn and J. Song, Samsung Electronics Co., Ltd., Korea

For the first time, we present an experimental demonstration of a high-performance and highly reliable IGZO vertical channel transistor (VCT). It exhibits excellent leakage current (<0.1 aA/µm) and switching characteristics (98 mV/dec) with a V<sub>T</sub> of -0.13V at 85°C. Furthermore, it provides a symmetric I<sub>ON</sub> of 49.5 µA/µm at V<sub>DD</sub> = 1.5V, which is sufficient for DRAM read/write operations of data '0' and '1', primarily due to enhanced mobility and an extremely low storage contact (BC) resistivity (3.4x10<sup>-7</sup> ohm-cm²). To the best of our knowledge, this is the best resistivity reported for an IGZO transistor with physical dimensions below 10 nm. In addition, its wafer-level reliability (WLR) assessment, including static NBTI, dynamic PBTI, and TDDB, indicates that stable DRAM operation can be guaranteed for longer than 10 years. The measured negligible floating-body effect (FBE) and self-heating effect (SHE) suggest that the 4F² IGZO VCT can be an excellent candidate to sustain the DRAM scaling trajectory.

## T20-3 - 16:50

First Thorough Assessment of Time-Dependent Dielectric Breakdown in Sub-25 nm Gate-All-Around Vertical InGaZnO Transistor for 4F<sup>2</sup> DRAM Application, A. Kamiyama, S. Kabuyanagi, M. Toda, T. Okada, S. Hasegawa, T. Hamai, I. Watanabe, S. Sato, K. Matsuo, Y. Kasahara, K. Ariyoshi, K. Haga, K. Ikeda, A. Kajita, T. Tsukamoto and S. Fujii, KIOXIA Corp., Japan

In this paper, we investigated and quantified, for the first time, the degradation components of TDDB for InGaZnO gate-all-around (GAA) vertical-channel transistors (VCTs) with 24-40 nm hole diameters. We found that extrinsic degradation components, namely hole roughness and process damage at the bottom opening, significantly degrade the TDDB lifetime. Through process optimization, we achieved a TDDB lifetime of >10 years for the sub-25 nm GAA VCT. Using the most conservative lifetime extrapolation and area normalization for the cylindrical geometry makes the >10-year estimate more reliable. Furthermore, the lifetime was comparable to the intrinsic lifetime of the gate insulator, indicating that the extrinsic degradation components were effectively suppressed. These findings pave the way for reliable 4F² DRAM applications.

## T20-4 - 17:15

A Recall-Free 3D Stackable nvDRAM Built Upon Gate-Controlled Thyristor, W.-C. Chen, H.-T. Lue, Y.-T. Lin, K.-C. Wang and C.-Y. Lu, Macronix International Co., Ltd., Taiwan

We introduce a novel and compact gate-controlled thyristor (GCT) based 3D stackable non-volatile DRAM (nvDRAM) that features recall-free operation. Unlike typical nvDRAM, which relies on separate SONOS cells for NVM operation, our design incorporates an ONO charge trapping layer within the gate dielectric of the central main gate in a three-word line (3-WL) GCT architecture. This innovation enables the "store" in nvDRAM to be carried out through self-boosting. Thanks to the unique properties of GCT, which integrates NVM and DRAM components within the same device, the "store" effectively serves as the "recall", eliminating the need for an additional "recall". This results in faster boot times and reduced power consumption during startup. This novel nvDRAM holds significant potential to achieve rapid data backup and boot-up, positioning it as a promising alternative to persistent memory.

## T20-5 - 17:40 (Late News)

4F<sup>2</sup> DRAM Integration with Vertical Gate (VG) Cell Transistor and Peri-Under-Cell (PUC) Architecture, J. Park, J. Cha, J. S. Cho, S. W. Chu, C. Hyun, D. Kim, I. Kim, J. Kim, S. Kwon, J. Lee, S. C. Lee, J. H. Myung, J. Song, M. Sung, J. Kang, S. Chung, K. Kim, S. Kim, S. S. Kim, S. Kim, S. Kim, S. Lee, Y. Cho and S. Cha, SK hynix, Korea

Process integration of a 4F² DRAM array with a peri-under-cell (PUC) architecture has been successfully demonstrated for the first time, employing a vertical gate (VG) cell transistor and a wafer bonding process. Fusion wafer bonding and inter-wafer contact techniques enable wafer-to-wafer integration with robust electrical connections between the cell and peripheral circuit wafers. Compared to conventional 2D DRAM, the VG cell transistor achieves superior threshold voltage control through back-gate bias. Precise junction engineering by thermal annealing optimizes cell transistor performance.

# **Technology Session 21**

## **Advanced Packaging and 3D Integration**

Thursday, June 12, 16:00-17:40

Chairpersons: J. Jeong, Samsung Electronics Co., Ltd.; Y. Liang, NVIDIA Corp.

## T21-1 - 16:00

Power Delivery for Scaled-Out Chiplet-Based Wafer-Scale Systems with 8 μm Cu-Cu Bond Pitch on Active Si-Interconnect Fabric Substrate, H. Ren, Z. Guo, B. Yang, K. Sahoo, N. Meenakshi Sundaram, Z. Chen, S. Singh, T. S. Fisher, C.-K. K. Yang, B. Razavi, B. Vaisband and S. S. Iyer, UCLA Center for Heterogeneous Integration and Performance Scaling (UCLA CHIPS), USA

Wafer-scale chiplet-based systems allow an entire data center rack to be condensed onto a single advanced substrate. However, power delivery and thermal management remain challenging. For large chiplet-based systems, we demonstrate a heterogeneously integrated voltage converter partitioned into a local utility chiplet and an active Silicon-Interconnect Fabric (Si-IF) substrate, with a power density of 562 W/cm³, together with a two-phase cooling thermal solution. These advances can be applied to a variety of power and thermal architectures for wafer-scale systems, paving the way for high-performance computing (HPC) and large language model (LLM) applications that overcome the limitations of interposer and 3D packaging technologies.



## T21-2 - 16:25

Investigation of Post-Bonding Die Stretching in Die-to-Wafer Hybrid Bonding, Y. Lu\*, B. Pressl\*\*, K. Zheng\*\*\*\*, L. Jiang\*, Y. Wang\*\*\*\*, H. Kostner\*\*, C. Scanlan\*\*\*, R. Hung\*\*\*\* and E. Bazizi\*, \*Applied Materials, Inc., USA, \*\*Back End Semiconductor Industries N.V, Austria, \*\*\*Back End Semiconductor Industries N.V., Steinhausen, Switzerland and \*\*\*\*Applied Materials, Inc., Singapore

Low-temperature, high-accuracy die-to-wafer (D2W) hybrid bonding is increasingly employed to enable high bandwidth, high performance, and low power consumption packaged devices [1]. It is crucial to minimize overlay errors to achieve high electrical yield and package performance. A physics-based model is developed to replicate the dynamics of bonding propagation and post-bond die stretching. The simulation results are compared to experiment. The initial die warpage, bonding energy and pedestal design are optimized to achieve target die stretching specifications.

## T21-3 - 16:50

Novel Ultra-thin Transistor Layer Transfer (TLT) Technology for Demonstrating Wafer-Level nm-Scale 3-Layer Stacking to Enable Multi-Tier Transistors and Backside PDN of a 3D Vertical FET Architecture, C.-L. Lu\*, H. C.-H. Chang\*\*, Y.-C. Lin\*, P.-R. Ni\*, C.-H. Chuang\*, H. Henck\*\*, C. Y. Chuang\*, S.-C. Lin\*, H.-S. Huang\*, L.-H. Chiu\*, Y.-C. Sun\*, K.-Y. Shih\*, I. Radu\*\*, B.-Y. Nguyen\*\* and S.-Z. Chang\*, \*Powerchip Semiconductor Manufacturing Corporation, Taiwan and \*\*Soitec, France

A new wafer-level transistor layer transfer (TLT) technology based on Smart Cut™ and IR laser release processing is reported, targeting nm-scale ultra-thin multi-tier stacking. For the first time, 3-layer stacking is demonstrated with a minimum Si thickness below 300nm and a layer-to-layer isolation dielectric thinner than 40nm under a low thermal budget (<350°C). The TLT stack shows good wafer warpage (<60µm) and small total thickness variation (<2nm) across the wafer. Moreover, it was further hybrid bonded with a Si wafer (a 4-layer wafer stack in total) to show the possibility of functional SoC integration. An integration flow with the TLT technology as its backbone is proposed to enable a 3D vertical FET structure comprising multi-tier transistor stacking, backside PDN integration, and cascaded contact plug formation.

## T21-4 - 17:15

First Demonstration of 3D Monolithic-Integrated BEOL OSFETs on GaN HEMTs: CEO-GaN, G. Zheng\*, H. Li\*, L. Huang\*\*, J. Xie\*, Z. Zheng\*, H. Xie\*\*\*, Y. Wang\*, X. Chen\*, W. Gu\*\*, G. I. Ng\*\*\* and X. Gong\*,\*\*\*, \*National Univ. of Singapore, Singapore, \*\*Nanjing Univ. of Science and Technology, China and \*\*\*Institute of Microelectronics, A\*STAR, Singapore

For the first time, we propose and demonstrate an innovative 3D monolithic <u>Cascode Enhancement-mode</u> (E-mode) <u>O</u>xide semiconductor-integrated GaN (CEO-GaN) technology, featuring BEOL-compatible top-gate indium-tin-oxide (TG-ITO) FETs on GaN HEMTs and MISHEMTs on Si. Our device exhibits outstanding electrical performance, achieving E-mode operation, a high drain current ($I_D$) of 1090 mA/mm, a high breakdown voltage (BV) of 603 V, and stable performance up to 350 K.

# **Technology Session 22**

## **DTCO and Design Enablement**

Thursday, June 12, 16:00-17:40

Chairpersons: H. Morioka, Socionext Inc.; S. Song, Google

## T22-1 - 16:00

Extending the Gate-All-Around (GAA) era to the A10 node: Outer Wall Forksheet Enabling Full Channel Strain and Superior Gate Control, L. Verschueren\*, G. Eneman\*, S. Yang\*, J. Boemmels\*, P. Matagne\*, K. B. Cahuenas\*,\*\*, A. Sharma\*, D. Abdi\*, H. Mertens\*, G. Mirabelli\*\*\* and G. Hellings\*, \*imec, Belgium, \*\*Universidad San Francisco de Quito USFQ, Ecuador and \*\*\*Synopsys, Inc., Belgium

The outer wall forksheet device architecture enables further scaling of gate-all-around (GAA) technologies while maintaining full channel control through an omega gate and full S/D stress efficacy. The presented wall-last approach allows a continuous crystal template during epi growth, enabling full channel stress. Furthermore, the wider wall reduces process complexity and enables an omega gate through wall etch-back, effectively creating a GAA channel, increasing electrostatic control, and boosting on-current by 27% compared to tri-gate forksheet architectures.

## T22-2 - 16:25

SRAM Scaling Opportunities Below 0.01 µm² Using Double-Row CFET Architecture with Wordline-Folded Bitcell Design for Performance Optimization, D. Abdi, G. Hellings, J. Boemmels, F. G.-Redondo, L. Verschueren, A. Sharma, H. Kukner, M. G.-Bardon and J. Ryckaert, imec, Belgium

This work presents a DTCO exploration that enables SRAM scaling below 0.01  $\mu m^2$  in CFET nodes, achieving over 50% scaling compared to the A14 nanosheet node. It focuses on simplifying SRAM bitcell internal routing in CFET architecture, identifying process integration modules required for a compact CFET bitcell, and proposing design approaches to optimize performance. The DTCO study demonstrates that designing two bitcells together in a wordline-folded configuration, compared to bitcells without WL-folding, improves performance by 2x, enhances write margin by 162 mV, and enables a standard cell-like layout.



## T22-3 - 16:50

PPA Scaling of Flip FET Technology Down to A2 Node Enabled by Architecture Innovations: Self-aligned Gate, 2T Design with Embedded Power Rail and Ultra-stacked 4-Tier Transistors, W. Peng, H. Lu, J. Jiang, R. Guo, J. Sun, J. Jin, Y. Cheng, S. Zhou, Z. Xu, C. Lan, Y. Chu, X. Jiang, F. Teng, M. Li, Y. Lin, X. Wang, R. Wang, H. Wu and R. Huang, Peking Univ., China

We carried out a critical examination of Flip FET (FFET) technology, from process innovation to circuit design, from A14 to A2. The Fully-aligned FFET (F3ET) featuring crucial <u>self-aligned FS-BS gates</u> was experimentally demonstrated, with common and split gates validated on fins and multi-stack nanosheets. The <u>Forksheet-based F3ET (F4ET)</u> with an embedded power rail was proposed to <u>reduce the cell height to 2T</u>. With <u>ultra-stacked 4-tier transistors</u>, the CFET-based FFET shows further area scaling potential. A comprehensive PPA evaluation was conducted at the circuit and block levels. DTCO knobs were studied on the A7 F3ETNS and A5 F4ET, improving RO frequency (HP) by 11.3% and 6.2%, respectively. The A3 HP F4ET outperforms the A14 HP fin-based FFET by 38.9%. P&R results on RISC-V cores show 44.9% (HP)/49.8% (HD) area scaling and 20.0% (HP)/27.9% (HD) frequency improvement from A14 to A5. SRAM scaling down to A2 was also studied in a 256x256 array.

## T22-4 - 17:15

First Demonstration of Symmetric Dual-Sided Vertical FET (DSVFET) for Energy Efficient Computing (EEC): From Processes and Devices to Circuits, Y. Liu\*, Y. Chu\*, Y. Wang\*, Z. Xu\*, J. Zhang\*\*, F. Sun\*,\*\*\*, H. Lu\*, Z. Wang\*, L. Li\*\*\*\*, L. Zhang\*, J. Wu\*, Y. Wu\*, S. Liu\*, X. He\*, T. Liu\*\*\*\*\*, M. Xu\*\*\*\*\*, P. Ren\*\*\*, Z. Ji\*\*\*, X. Wu\*\*\*\*, L. Zhang\*, W. Bu\*, J. Kang\*, J. Zhang\*\*, M. Li\*, R. Wang\*, H. Wu\* and R. Huang\*, \*Peking Univ., \*\*SongShan Lake Materials Laboratory, \*\*\*Shanghai Jiao Tong Univ., \*\*\*\*East China Normal Univ. and \*\*\*\*\*Fudan Univ., China

For the first time, we propose and experimentally demonstrate the symmetric dual-sided vertical FET (DSVFET), with an aggressive contacted gate pitch of 50 nm and an Lg of 19 nm. The DSVFET not only inherits the low leakage, low drive voltage, and low parasitics of the standard VFET but also features unique dual-sided S/Ds with a smaller footprint and better symmetry, making it well suited to energy-efficient computing. New processes and DTCO methods were developed to study the structure, performance, and circuit design. The DSVFET was also validated with nearly symmetric behavior ($C_{\text{eff}}$ and delay difference < 3%) and 38%/60% lower power than FinFET for 2-/1-nanosheet designs, respectively, at iso-frequency based on RO simulation. A RISC-V core was used for block-level evaluation, and 17.4% lower energy, 15% lower energy-delay product (EDP), and 39.8% smaller area were identified for the 1NS DSVFET with PowerVia. The DSVFET has great potential for future EEC.

# **Technology Session 23**

## **Wireless and RF Devices**

Thursday, June 12, 16:00-18:05

Chairpersons: R. Kuroda, Tohoku University; N. Mahalingam, Texas Instruments

## T23-1 - 16:00

Compact, Low-Loss, Cost-Effective, CMOS Embedded RF Switch Solution Achieving DC-100GHz True-Time Delay Phase Shifter by Phase Change Material, H. J. Li\*, K. P. Chang\*, C. E. Chen\*, Y. T. Lin\*\*, C. C. Huang\*, Y. W. Wang\*, W. F. Chen\*, H. Y. Chen\*, C. R. Hsieh\*, Z. H. Ya\*, H. H. Wang\*, Y. W. Ting\*, K. C. Tseng\*, Z. M. Tsai\*\*, K. C. Huang\* and H. Chuang\*, \*TSMC and \*\*National Yang Ming Chiao Tung Univ., Taiwan

This study presents a phase change material (PCM) RF switch (RFS) in a single-pole double-throw (SPDT) configuration, demonstrating an insertion loss (IL) > -1.6dB and isolation (ISO) < -34dB across a wide band from 10GHz to 100GHz. The low-loss, series-only PCM SPDT is further embedded with delay lines to demonstrate a 3-bit true-time delay (TTD) phase shifter with RMS amplitude and group delay errors below 1.03dB and 1.28ps, respectively. As a BEOL technology, it can be embedded in CMOS, offering advantages over silicon-on-insulator (SOI) CMOS RF switches in performance (R<sub>on</sub>·C<sub>off</sub> < 12fs), cost (bulk Si substrate), and area (series-only SPDT).
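R<sub>on</sub>·C<sub>off</sub> is the usual RF-switch figure of merit, often expressed as the switch cutoff frequency f<sub>c</sub> = 1/(2π·R<sub>on</sub>·C<sub>off</sub>). Assuming that definition, a Python sketch showing that R<sub>on</sub>·C<sub>off</sub> < 12fs corresponds to f<sub>c</sub> above roughly 13.3THz; the function name is illustrative.

```python
import math


def switch_cutoff_hz(ron_coff_s: float) -> float:
    """RF-switch cutoff frequency f_c = 1 / (2*pi*Ron*Coff)."""
    return 1.0 / (2.0 * math.pi * ron_coff_s)


# Ron*Coff upper bound of 12 fs from the T23-1 abstract
fc_thz = switch_cutoff_hz(12e-15) / 1e12
print(round(fc_thz, 1))  # -> 13.3 (a lower bound on f_c, since Ron*Coff < 12 fs)
```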

## T23-2 - 16:25

First Demonstration of Top T-gate BEOL-Compatible Indium-Oxide RF Transistors with Record Maximum Oscillation Frequency of 70 GHz, H. Zhou\*, K. Zhang\*, P. Hong\*\*, M. Zhou\*, G. Gao\*, M. Xiang\*, Z. Liu\*, X. Li\*\*, M. Si\*\*\*, Y. Hao\* and J. Zhang\*, \*Xidian Univ., \*\*Huazhong Univ. of Science and Technology and \*\*\*Shanghai Jiao Tong Univ., China

This work reports the first demonstration of back-end-of-line (BEOL) compatible $In_2O_3$ RF FETs on a SiC substrate with a top T-gate architecture. The high thermal conductivity ($k_T$) of SiC alleviates the self-heating effect (SHE), allowing a high drain bias ($V_{DS}$), and the T-gate reduces gate resistance ($R_g$); both increase the maximum oscillation frequency ($f_{max}$). Benefiting from a high top-gate field-effect mobility ($\mu_{FE}$) of 120 cm²/V·s, a record $f_{max}$ of 70 GHz and a cut-off frequency ($f_T$) × gate length ($L_G$) product of 5.1 GHz·µm are simultaneously achieved, where the $f_{max}$ is more than 4 times the prior best reported value for BEOL-compatible oxide FETs. Combined with a high top-gate $I_{Dmax}$ of 2.28 A/mm and $g_{mmax}$ of 1.1 S/mm, as well as a low SS of 80 mV/dec and a DIBL of 60 mV/V, atomic-layer-deposited (ALD) $In_2O_3$ transistors show great promise for BEOL-compatible RF applications.



## T23-3 - 16:50

Scaled-Footprint Ultra-Low Power Cryogenic InGaAs/InP HEMTs with Record-High Combination of Low-Noise and High-Frequency Performance, A. Ferraris\*,\*\*, E. Cha\*, A. Olziersky\*, M. Sousa\*, H.-C. Han\*\*, E. Charbon\*\*, K. Moselund\*\*,\*\*\* and C. Zota\*, \*IBM Research - Zurich, \*\*Swiss Federal Institute of Technology in Lausanne (EPFL) and \*\*\*Paul Scherrer Institute, Switzerland

In this work, we demonstrate cryogenic InGaAs/InP HEMTs with highly scaled gate footprints, down to $380 \times 40 \text{ nm}^2$ for a single gate finger, and investigate the impact of footprint scaling on device performance. The 80% In channel devices show $f_T$ = 622 GHz and $f_{\text{MAX}}$ = 733 GHz together with a noise indication factor $\sqrt{I_d}/g_m$ = 0.17 $\sqrt{\text{V}\cdot\text{mm/S}}$ at 4 K, a record-high combination of high-frequency and low-noise performance. The performance is enabled by heterostructure engineering, resulting in an ultra-low $R_{\text{ON}}$ of 250 Ω·µm together with a minimum subthreshold swing SS < 10 mV/decade. These results show that cryogenic III-V HEMT technology can provide excellent performance at scaled footprints for readout in future high-density quantum systems.

## T23-4 - 17:15

Cryogenic In<sub>0.8</sub>Ga<sub>0.2</sub>As Quantum-Well High-Electron Mobility Transistors from Low-Power Quantum Computing to Tera-Hz Applications, S.-W. Son\*, M.-S. Yu\*, S.-P. Son\*, I.-G. Lee\*, W.-S. Park\*, J.-H. Yoo\*, S.-K. Kim\*\*, J. Yun\*\*, T. Kim\*\*, J. Grahn\*\*\*, A. Pourkabirian\*\*\*\*, S. Peter\*\*\*\*, H.-M. Kwon\*\*\*\*\*, T.-W. Kim\*\*\*\*\*\*, J.-H. Lee\*, K. Yang\*\*\*\*\*\*\* and D.-H. Kim\*, \*Kyungpook National Univ., \*\*QSI, Korea, \*\*\*Chalmers Univ. of Technology, \*\*\*\*Low Noise Factory, Sweden, \*\*\*\*\*Hankyong National Univ., Korea, \*\*\*\*\*\*Texas Tech Univ., USA and \*\*\*\*\*\*\*KAIST, Korea

We present cryogenic $In_{0.8}Ga_{0.2}As$ QW HEMTs with a gate length of 35 nm, achieving a record combination of low-power and high-frequency performance. Careful modeling of the source resistance, incorporating ballistic channel resistance, provides key insights for advancing low-power quantum computing and terahertz (THz) applications. At 4 K, the fabricated device exhibits a minimum subthreshold swing ($S_{min}$) of 4.41 mV/dec., a $g_{m,max}$ of 2.49 mS/µm, and a record-high $f_T$ of 813 GHz, with an average gain-bandwidth product ($f_{avg}$) of 810 GHz. These results stem from a tightly controlled gate-recess process that keeps the side length below 20 nm. Delay-time analysis indicates that further THz performance can be achieved by scaling $L_g$ below 20 nm and reducing the fringing gate capacitance by 20%. This work demonstrates the potential of cryogenic $In_{0.8}Ga_{0.2}As$ QW HEMTs for quantum computing and THz electronics.
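Delay-time analysis conventionally starts from the total delay τ = 1/(2π·f<sub>T</sub>), which is then split into intrinsic and extrinsic (parasitic) components. A minimal Python sketch of that first step, using the reported 813 GHz f<sub>T</sub>; the 1 THz target is a hypothetical reference point, not a value from the abstract.

```python
import math


def total_delay_s(f_t_hz: float) -> float:
    """Total delay implied by the current-gain cutoff: tau = 1 / (2*pi*fT)."""
    return 1.0 / (2.0 * math.pi * f_t_hz)


print(round(total_delay_s(813e9) * 1e15))    # reported fT -> 196 (fs)
print(round(total_delay_s(1e12) * 1e15, 1))  # hypothetical 1 THz target -> 159.2 (fs)
```

Under this view, reaching 1 THz would require trimming roughly 37 fs from the total delay, which is the kind of reduction the abstract attributes to scaling L<sub>g</sub> and cutting fringing capacitance.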

## T23-5 - 17:40

Fully Heterogeneous and Monolithic 3D Integrated RF Platform with III-V HEMTs on Si CMOS for Next-Generation Wireless Communication Systems, J. Jeong\*, J.-T. Lim\*\*, Y.-J. Suh\*, N. Rheem\*, C. J. Lee\*, B. H. Kim\*, J. P. Kim\*\*, J. Kim\*\*\*, J. Lee\*\*, C.-Y. Kim\*\* and S. Kim\*, \*KAIST, \*\*Chungnam National Univ. and \*\*\*Korea Advanced NanoFab Center (KANC), Korea

In this work, we demonstrate a fully heterogeneous and monolithic 3D (HM3D) integrated RF platform, in which RF circuits are stacked on digital/analog circuits by co-integrating III-V HEMT technology and Si CMOS on the same chip. For an approach that is not only high-performance but also cost-effective and industrially feasible, RF active components use InGaAs HEMTs, digital/analog active components are based on Si FEOL, and Si BEOL-based passive devices are shared by both. The InGaAs HEMTs are monolithically integrated on industry-standard Si CMOS, achieving record-high RF performance among M3D RF transistors. The top-tier LNA, with 12.2 dB gain and a 3.6 dB noise figure at 26.5 GHz, demonstrates superior RF performance. The bottom-tier mixer and op-amp provide valid analog/digital signal processing without any degradation from the HM3D integration process.