AI SoC Design


Made by Alanlin / Retro Tech

About the project

This article introduces several use cases for die-to-die connectivity and examines high-speed PHY IP for die-to-die links that can also use traditional low-cost packaging based on organic substrates.

Project info

Difficulty: Easy

Platforms: AllThingsTalk, thingSoC

Estimated time: 1 hour

License: GNU General Public License, version 3 or later (GPL3+)

Story

SoC development and challenges

Since the advent of big data, designers of system-on-chips (SoCs) for hyperscale data centers, artificial intelligence (AI), and networking applications have faced ever-evolving design challenges. Driven by workload requirements and the need to move data faster, these advanced SoCs have grown more complex and have reached the maximum reticle (mask) size. Designers therefore split the SoC into smaller chips packaged together in a multi-chip module (MCM). The resulting chips require ultra-short-reach and extra-short-reach links to connect to one another at high data rates. Beyond bandwidth, die-to-die connections must also provide a reliable link with extremely low latency and extremely low power consumption.


Die-to-Die connectivity use cases

New use cases for die-to-die connections in MCM are constantly emerging, including:

1. High-performance computing and server SoCs approaching the maximum reticle size

2. Ethernet switch and networking SoCs exceeding the maximum reticle size

3. Artificial intelligence (AI) SoCs with distributed SRAM that scale complex algorithms

High-performance computing and server SoCs keep growing, reaching 550 square millimeters (mm2) to 800 mm2, which reduces SoC yield and raises per-chip cost. A better way to optimize SoC yield is to split the SoC into two or more equivalent homogeneous chips (as shown in Figure 1) and connect them with die-to-die PHY IP. In this use case, the main requirements are extremely low latency and a near-zero error rate, because the smaller chips together must behave exactly like a single chip.
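The yield argument can be made concrete with a standard Poisson defect-density model. This is a generic sketch, not a calculation from this article: the model choice and the defect density of 0.1 defects/cm2 are illustrative assumptions.

```python
import math

def die_yield(area_mm2: float, defect_density_per_cm2: float) -> float:
    """Poisson yield model: Y = exp(-A * D0), with A in cm2."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defect_density_per_cm2)

# Hypothetical comparison: one 800 mm2 monolithic die vs. two 400 mm2
# halves, at an assumed defect density of 0.1 defects/cm2.
monolithic = die_yield(800, 0.1)
per_half = die_yield(400, 0.1)
print(f"monolithic: {monolithic:.1%}, per-half: {per_half:.1%}")
```

Each smaller die yields markedly better than the monolithic die, which is exactly what motivates splitting the SoC, at the cost of the die-to-die links.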

The Ethernet switch SoC is the heart of the data center, where data must move at rates of 12 Tbps to 25 Tbps. That calls for a 256-lane 100G SerDes interface, so the SoC cannot fit within an 800 mm2 reticle area. To overcome this challenge, designers split the SoC and reconfigure it so that a core chip is surrounded by I/O chips, as shown in Figure 2, with die-to-die transceivers connecting the core chip to the I/O chips.

In this use case, splitting the chip pays off only when the bandwidth density of the die-to-die transceiver is much higher than that of the long-reach SerDes in the I/O chips. The key parameter is therefore the bandwidth density per millimeter of chip edge.
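This key parameter is simply lanes × rate ÷ edge length. A minimal sketch, where the lane count and edge length are hypothetical (only the 112 Gbps per-lane rate comes from this article):

```python
def edge_bandwidth_density(lanes: int, gbps_per_lane: float, edge_mm: float) -> float:
    """Aggregate die-edge (beachfront) bandwidth density in Gbps per mm."""
    return lanes * gbps_per_lane / edge_mm

# Hypothetical: 40 die-to-die lanes at 112 Gbps along a 10 mm die edge.
print(edge_bandwidth_density(40, 112, 10), "Gbps/mm")
```

Comparing this figure for the die-to-die PHY against the long-reach SerDes on the same edge shows whether the split is worthwhile.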

In the AI SoC, each chip contains an intelligent processing unit (IPU) and distributed SRAM located near each IPU. In this use case, the IPU in one chip may need to rely on a short-distance die-to-die link with extremely low latency to access data in the SRAM of another chip.

In all these use cases, choosing an ideal high-speed PHY for the die-to-die connection can simplify the MCM packaging requirements. With per-lane throughput of up to 112 Gbps, a very high aggregate throughput can be achieved with a relatively small number of lanes. The package trace pitch and stack-up can then be conservative (line/space is typically 10 µm/10 µm), so traditional low-cost packaging based on organic substrates can be used in these use cases.
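A quick back-of-the-envelope check shows why 112 Gbps lanes keep the lane count manageable; the target totals below reuse the 12-25 Tbps switch range quoted earlier, and the per-direction framing is an assumption:

```python
import math

def lanes_needed(target_tbps: float, gbps_per_lane: float = 112.0) -> int:
    """Lanes required (per direction) to hit a target aggregate throughput."""
    return math.ceil(target_tbps * 1000 / gbps_per_lane)

for target in (12, 25):
    print(f"{target} Tbps -> {lanes_needed(target)} lanes at 112 Gbps")
```

A lane count on the order of one to two hundred is what makes conservative 10 µm/10 µm organic-substrate routing feasible.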

High-speed PHY IP requirements for Die-to-Die connection

The Optical Internetworking Forum (OIF) is defining electrical I/O standards for transmitting data at rates up to 112 Gbps over ultra-short-reach (USR) and extra-short-reach (XSR) links. These specifications cover die-to-die links (that is, within the package) as well as links from the chip to an optical module in the same package as the SoC, significantly reducing power consumption and complexity while enabling very high throughput density.

When evaluating high-speed PHY IP solutions for die-to-die connectivity in an MCM, SoC designers must weigh several basic characteristics: data throughput or bandwidth, measured in gigabits or terabits per second (Gbps or Tbps); energy efficiency, measured in picojoules per bit (pJ/bit); latency, measured in nanoseconds (ns); maximum link reach, measured in millimeters (mm); and bit error rate (unitless).
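These metrics can be gathered in a simple container for screening candidate PHY IPs. The thresholds encode the headline targets named later in this article (112 Gbps per lane, better than 1 pJ/bit, 50 mm reach, BER better than 10^-15); the candidate's values are assumed for illustration, and no numeric latency bound is checked because the article gives none.

```python
from dataclasses import dataclass

@dataclass
class DieToDiePhySpec:
    """Key die-to-die PHY metrics, in the units named in the text."""
    throughput_gbps_per_lane: float
    efficiency_pj_per_bit: float
    latency_ns: float
    max_reach_mm: float
    ber: float

def meets_targets(spec: DieToDiePhySpec) -> bool:
    """Screen against this article's headline targets (latency excluded:
    the article asks for 'ultra-low' latency but gives no number)."""
    return (spec.throughput_gbps_per_lane >= 112
            and spec.efficiency_pj_per_bit < 1.0
            and spec.max_reach_mm >= 50
            and spec.ber <= 1e-15)

# Assumed candidate values for illustration.
candidate = DieToDiePhySpec(112, 0.8, 5, 50, 1e-15)
print(meets_targets(candidate))  # True for these assumed values
```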

Data throughput or bandwidth

To interoperate with other transceivers, die-to-die PHY IP must comply with the relevant OIF electrical specifications for USR and XSR links. Support for both pulse amplitude modulation (PAM-4) and non-return-to-zero (NRZ) signaling is essential to meet the requirements of both link types and reach a maximum bandwidth of 112 Gbps per lane. Such signaling yields very high bandwidth efficiency, a key requirement because the volume of data moving between chips in an MCM is enormous. Data movement rates are typically in the terabits-per-second range, which limits the amount of chip edge that can be allocated to the USR and XSR links. Supporting multiple data rates is equally important: the die-to-die link is generally expected to run either at the same data rate as the internal fabric or at every data rate the die-to-die protocol requires. For example, even when running at speeds such as 32 Gbps, PCI Express must support data rates as low as 2.5 Gbps for protocol initialization.
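The PAM-4/NRZ relationship above is just symbol arithmetic: NRZ carries 1 bit per symbol and PAM-4 carries 2, so a 112 Gbps PAM-4 lane runs at half the baud rate of an NRZ lane at the same data rate. A minimal sketch:

```python
def baud_rate_gbd(data_rate_gbps: float, bits_per_symbol: int) -> float:
    """Symbol rate implied by a data rate: NRZ = 1 bit/symbol,
    PAM-4 = 2 bits/symbol."""
    return data_rate_gbps / bits_per_symbol

print(baud_rate_gbd(112, 2))   # 112 Gbps PAM-4 lane: 56 GBd
print(baud_rate_gbd(2.5, 1))   # 2.5 Gbps NRZ (PCIe init rate): 2.5 GBd
```

Halving the symbol rate relaxes the channel bandwidth the package traces must support, at the cost of tighter voltage margins per PAM-4 level.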

Link distance

In die-to-die implementations, large amounts of data must flow through short data paths that bridge the gap between chips. To allow maximum flexibility when placing chips on the package substrate, the PHY IP must support a distance of up to 50 mm between TX and RX.

Energy efficiency

Energy efficiency becomes an important factor, especially in use cases where SoC functions are divided across multiple homogeneous chips. In such cases, designers seek ways to push large amounts of data between chips without blowing the SoC's total power budget. The energy efficiency of an ideal die-to-die PHY IP is better than 1 picojoule per bit (1 pJ/bit), or equivalently 1 mW/Gbps.
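The pJ/bit to mW/Gbps equivalence follows directly from the units (1 pJ/bit × 1 Gbit/s = 1 mW), so link power scales linearly with throughput. A minimal sketch, with the lane figures taken from the rates quoted in this article:

```python
def link_power_mw(throughput_gbps: float, pj_per_bit: float) -> float:
    """Link power in mW: pJ/bit * Gbit/s = (1e-12 J * 1e9 /s) = mW."""
    return throughput_gbps * pj_per_bit

print(link_power_mw(112, 1.0))   # one 112 Gbps lane at 1 pJ/bit, in mW
```

At this efficiency, a full 12 Tbps of die-to-die traffic costs on the order of 12 W, which is why sub-1 pJ/bit matters for the overall power budget.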

Latency and BER

To make the connection between chips "transparent", latency must be kept extremely low and the bit error rate (BER) must be optimized. Thanks to its simplified architecture, the die-to-die PHY IP itself achieves ultra-low latency with a BER better than 10^-15. Depending on the link distance, a forward error correction (FEC) mechanism may be needed to protect the interconnect and reach such a low BER; FEC latency then adds to the overall latency of the solution.
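To get a feel for what a 10^-15 BER means at these line rates, the expected time between bit errors is just 1 / (BER × bits per second). A small sketch using the article's 112 Gbps lane rate:

```python
def mean_seconds_between_errors(ber: float, data_rate_gbps: float) -> float:
    """Expected time between bit errors at a given BER and line rate."""
    bits_per_second = data_rate_gbps * 1e9
    return 1.0 / (ber * bits_per_second)

# At BER 1e-15 on one 112 Gbps lane: roughly one error every 2.5 hours.
print(mean_seconds_between_errors(1e-15, 112) / 3600, "hours")
```

Multiplied across hundreds of lanes, errors become frequent enough that FEC is often needed, which is why its latency contribution matters.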

Macro block layout

Beyond these performance parameters, the PHY IP must also support macro placement on all four sides of the die to enable an efficient die and MCM floorplan. An optimized macro layout enables low-coupling, efficient inter-chip routing, optimizes die and MCM area, and ultimately improves power efficiency.

There are many other considerations when choosing a die-to-die PHY IP, including testability features that enable production testing of each chip before packaging, but the factors above are the most important.

Conclusion

Data rates keep climbing and functions keep growing more complex, steadily driving up the SoC size for hyperscale data center, AI, and networking applications. As SoC size approaches the full reticle size, designers are forced to split the SoC into smaller chips, packaged in multi-chip modules (MCMs), to achieve high yield and reduce overall cost.

The multiple smaller chips in an MCM are then linked by die-to-die interconnects, which combine extremely low power consumption with high bandwidth at each chip's edge. In high-performance computing and AI applications, large SoCs are split into two or more homogeneous chips, while in networking applications the I/O and network core are split into separate chips. The die-to-die interconnect must not compromise overall system performance, demanding low latency, low power consumption, and high throughput. These requirements drive the demand for high-throughput die-to-die PHYs.
