2025, Issue 24, pp. 1-9
Design of a Low-Latency AXI Adapter for Chiplet Interconnection

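Foundation: National Natural Science Foundation of China, Young Scientists Fund: Research on a high-resolution miniature electric field sensor based on mode localization (62101054)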
DOI: 10.16652/j.issn.1004-373x.2025.24.001
Abstract:

Die-to-die (D2D) interconnect technology is key to the performance of chiplet-based systems. A chiplet interconnect should offer low latency, high bandwidth, low power consumption, and high reliability as far as possible. Existing standards, however, provide insufficient support for mainstream on-chip buses, so D2D adaptation schemes compatible with buses such as AXI are needed. To meet this need, a D2D adapter supporting the AXI bus is designed. Building on a layered transmission protocol architecture for chiplet interconnect interfaces, the hardware of the protocol layer and the data link layer is designed. In the protocol layer, a strategy that merges (concatenates) data from active traffic channels is proposed to hide packing delay as far as possible. In the data link layer, given the low-latency and small-area requirements of inter-chiplet links and their low bit error rate, a Go-Back-N automatic repeat request (ARQ) retransmission mechanism is adopted. The register-transfer-level design of the adapter was written in SystemVerilog and synthesized with a TSMC 28 nm process library. Experimental results show a system latency of 11.02 ns and a power consumption of 15.58 mW; compared with a UCIe interface controller, latency is 16% lower and power consumption is 22.5% lower, meeting on-chip bus adaptation requirements for lower latency and power.
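
The Go-Back-N ARQ scheme named in the abstract can be illustrated with a small behavioral model. This is only a sketch of the textbook technique under simplifying assumptions, not the authors' RTL: the function name `go_back_n_send`, the round-based timing (the sender transmits one full window, then receives a cumulative acknowledgement), and the `corrupt` error-injection set are all hypothetical and do not come from the paper. It shows why the receiver needs no reorder buffer, which is consistent with the small-area motivation stated above.

```python
def go_back_n_send(frames, window, corrupt):
    """Behavioral model of Go-Back-N ARQ over an error-prone link.

    frames  : payloads to deliver, in order
    window  : maximum number of unacknowledged frames in flight
    corrupt : set of (seq, attempt) pairs whose transmission arrives
              corrupted (hypothetical error-injection hook; must be finite)
    Returns (delivered_payloads, total_transmissions).
    """
    base = 0                        # oldest unacknowledged sequence number
    attempts = [0] * len(frames)    # per-frame transmission counter
    delivered = []                  # receiver output, always in order
    transmissions = 0

    while base < len(frames):
        end = min(base + window, len(frames))
        expected = base             # next sequence number the receiver accepts
        for seq in range(base, end):
            attempts[seq] += 1
            transmissions += 1
            intact = (seq, attempts[seq]) not in corrupt
            if intact and seq == expected:
                delivered.append(frames[seq])
                expected += 1
            # Otherwise the receiver discards the frame: it is either
            # corrupted or out of order, and nothing is buffered.
        # Cumulative acknowledgement: the sender resumes from the first
        # frame the receiver did not accept ("goes back" to it).
        base = expected

    return delivered, transmissions
```

For example, sending five frames with a window of three and corrupting only the first transmission of frame 1 causes frames 1 and 2 to be resent, so seven transmissions are needed instead of five. When errors are rare, as the abstract notes for inter-chiplet links, this retransmission overhead is seldom paid.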


Citation:

CHEN Jiamin, LI Xiangyu, YIN Shujuan. Design of a low-latency AXI adapter for chiplet interconnection [J], 2025, 48(24): 1-9. DOI: 10.16652/j.issn.1004-373x.2025.24.001.
