# A 2.1 pJ/bit, 8 Gb/s Ultra-Low Power In-Package Serial Link Featuring a Time-based Front-end and a Digital Equalizer

Po-Wei Chiu, Muqing Liu, Qianying Tang and Chris H. Kim

Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA

Email: Chiux148@umn.edu

Abstract— An 8 Gb/s time-to-digital converter (TDC) based receiver with a time-based front-end, designed specifically for in-package serial link applications, is demonstrated in 65nm CMOS. The proposed receiver converts the channel signal to a corresponding time delay, which is amplified by a novel delay line based time amplifier. Next, a time-to-digital converter generates a 4 bit code which is used for digital equalization. The proposed design is digital intensive and hence highly resilient to voltage headroom and/or PVT issues. A bathtub curve and a time-domain eye diagram were measured by an in-situ bit-error-rate (BER) monitor circuit. An energy efficiency of 2.1 pJ/b was achieved at 8 Gb/s for a 7 mm link. The receiver area is $240 \times 120 \mu m^2$.

Keywords— Time-based, digital equalization, time-to-digital converter (TDC), system-in-package (SiP), digital intensive, inverter-based.

## I. INTRODUCTION

Emerging packaging technologies such as system-in-package (SiP), 2.5D integration, through-silicon-via based 3D ICs, and silicon interposers enable ultra-small form factors while allowing heterogeneous technologies to be integrated into the same chip package [1]-[2]. Serial links for such in-package applications must be more compact, energy-efficient, and digital friendly than their chip-to-chip counterparts. As shown in Fig. 1, analog front-ends (AFE) usually contain analog-intensive circuits such as a continuous-time linear equalizer (CTLE), a variable gain amplifier (VGA), and a current-mode-logic (CML) based decision feedback equalizer (DFE). These circuits are not suitable for in-package links as they suffer from voltage headroom and PVT variation issues, and are typically powered by a separate high power supply. Recently, ADC-based receivers have been drawing attention for off-chip links where RX equalization is performed entirely in the digital domain by a DSP unit [3]-[6]. While ADC-based receivers can operate at a lower voltage than their analog counterparts, and hence take full advantage of technology scaling, they still rely on analog-intensive circuits for signal conditioning.

This paper presents a time-to-digital converter (TDC) based receiver with a digital-intensive time-based front-end (TBFE) for in-package link applications. The proposed TBFE consists of a voltage-to-time converter (VTC), a delay line based time amplifier (TA), and a 4 bit TDC. The VTC converts the channel signal to a time delay, which is then amplified by the TA. A Vernier-line based TDC converts the amplified delay difference to a 4 bit digital code, which is fed to the digital equalization block. The proposed TBFE obviates the need for a sample and hold (S/H) circuit, as the VTC converts the instantaneous voltage seen by the passing signal edge.

| Receiver Type | Features | Domain |
|---|---|---|
| Analog Frontend | Fully analog | Voltage based |
| Analog Frontend: CTLE→VGA→S/H + ADC→DSP | Analog FE, digital equalizer | Voltage based |
| Proposed Time-based Frontend: VTC→TA→TDC→DSP | Digital FE, digital equalizer | Time based |

Fig. 1. Comparison of the proposed time-based front-end design with conventional RX designs.

## II. TIME-BASED TRANSCEIVER IMPLEMENTATION

The block diagram of the proposed transceiver system, including a voltage-mode TX driver, a 1/4 rate time-based receiver, and an in-situ bit error rate (BER) monitoring circuit, is shown in Fig. 2. The transmitter is based on a 3-tap half-rate feedforward equalizer (FFE) with an inverter-based driver. The receiver contains four lanes of TBFE+TDC circuits, followed by a DSP for digital equalization. Each lane operates at 2 Gb/s. A 2<sup>15</sup>-1 pseudo-random bit sequence (PRBS) generator and an in-situ BER monitor were implemented to characterize the circuit performance.

The circuit implementation and operation of the VTC are shown in Fig. 3 [7]. A 2 GHz clock enters two identical inverter-based VTCs, generating the reference and RX clock signals. The time delays of the two VTC circuits are determined by the reference voltage V<sub>REF</sub> and the channel voltage V<sub>RX</sub>, respectively. A low channel voltage (=data '0') induces a larger delay difference, and vice versa. The reference delay path is fixed to the longest delay to ensure it is always slower than the RX path delay. Inter-symbol interference (ISI) corrupts the delay of the RX clock path; its effect is filtered out later by the digital DFE.
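As a rough illustration of the 1/4 rate arrangement, the sketch below distributes an 8 Gb/s bit stream over four lanes so that each lane handles every fourth bit, i.e. 2 Gb/s per lane. The round-robin lane assignment and the function names are assumptions made for illustration; the text does not specify how unit intervals map to lanes.

```python
NUM_LANES = 4          # four TBFE+TDC lanes (from the text)
LINE_RATE_GBPS = 8.0   # aggregate data rate (from the text)


def split_into_lanes(bits, num_lanes=NUM_LANES):
    """Assumed round-robin mapping: bit k is handled by lane k % num_lanes."""
    return [bits[lane::num_lanes] for lane in range(num_lanes)]


def merge_lanes(lanes):
    """Re-interleave lane outputs into one stream.

    Assumes every lane holds the same number of bits.
    """
    merged = []
    for group in zip(*lanes):
        merged.extend(group)
    return merged


# Each lane sees one quarter of the line rate.
lane_rate_gbps = LINE_RATE_GBPS / NUM_LANES
```

With this mapping, the per-lane circuits only need to resolve one symbol every four unit intervals, which is consistent with the 2 GHz clock driving the VTCs.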



Fig. 2. Block diagram of the proposed digital-intensive time-based transceiver.

## III. PROPOSED INVERTER-BASED TIME AMPLIFIER

The delay difference generated by the VTC block is amplified by the fully-digital delay line based TA shown in Fig. 4. The TA circuit consists of two identical tri-state inverter based delay lines. Each stage is driven by two parallel tri-state inverters with 1X and NX sizing, respectively. Initially (i.e. STARTi=0, STOPi=0), the enable signal EN is high, which activates all the NX tri-state inverters. Since a total of (N+1)X tri-state inverters are driving the output, the rising edge of STARTi experiences a short propagation delay. Once the STARTi signal arrives, EN is set to low after a fixed delay, which disables the parallel NX tri-state inverters. The delay line is now driven only by the 1X tri-state inverters, resulting in a longer propagation delay seen by the STOPi signal. Since the STARTi edge travels faster than the STOPi edge, a delay difference (N+1) times larger than the input appears at the end of the delay line. The timing diagram is shown in Fig. 5.

A ring-oscillator based TA using a similar concept was reported in [8]. However, in that design, the performance was limited by the ring oscillator frequency. Furthermore, a NAND gate based implementation was required to ensure circuit oscillation. To increase the TA operating speed, in this work we propose an open-loop delay line configuration and utilize tri-state inverters. Post-layout simulation results of the proposed TA with N=1 and N=4 in Fig. 6 confirm high linearity between the input delay and the output delay. The proposed delay line based VTC and TA implementation can reduce the circuit complexity while canceling out voltage and temperature induced delay shifts in the delay lines.
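The (N+1) gain can be illustrated with a first-order model in which stage delay scales inversely with the total drive strength, so the ratio of slow (1X only) to fast ((N+1)X) stage delay is N+1. This is a minimal sketch under that assumption, with hypothetical function names; it ignores parasitics, the EN switching delay, and any nonlinearity, so it is not a substitute for the post-layout results in Fig. 6.

```python
def ta_gain(n):
    """Ideal TA gain for NX sizing of the switchable tri-state inverters.

    Assumes stage delay is inversely proportional to drive strength, so the
    slow-to-fast delay ratio equals the fast-to-slow drive ratio.
    """
    fast_drive = n + 1   # 1X and NX inverters enabled in parallel
    slow_drive = 1       # only the 1X inverter enabled
    return fast_drive / slow_drive


def amplified_delay_ps(delta_in_ps, n):
    """Ideal output delay difference for a given input delay difference."""
    return ta_gain(n) * delta_in_ps


# Sizing options simulated in the paper: N=1 and N=4.
gains = {n: ta_gain(n) for n in (1, 4)}
```

For N=1 and N=4 this model gives gains of 2 and 5, matching the values quoted for the simulated TA.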







Fig. 3. (a) Voltage to time converter (VTC) circuit implementation and (b) timing diagram.



Fig. 4. Schematic of delay line based time amplifier (TA) with open-loop configuration.



Fig. 5. Timing diagram of the proposed TA.



Fig. 6. TA gain simulation results for N=1 and 4 (Gain=2 and 5).

## IV. TDC, DIGITAL EQUALIZER, AND IN-SITU BER MONITOR

Fig. 7 shows the implementation of the TDC. The Vernier-line based TDC consists of four cascaded delay units, with each unit having four delay buffers and four arbiters. A 16 bit thermometer code generated by the Vernier line is converted to a 4 bit binary code using a thermometer-to-binary (T2B) decoder. The 4 bit output from the TDC is fed to the DSP for digital equalization (Fig. 8). A bank of 4 bit digital comparators in the DSP compares the new TDC output with predetermined weights w0000, w0001, etc. The correct comparator result is selected based on the previous decision results D1-D4. A 16:1 digital MUX outputs the final RX data.
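The decoding and equalization steps can be sketched behaviorally as follows. Several details are assumptions not given in the text: the T2B decoder is modeled as a ones counter saturated to 4 bits, a larger TDC code is taken to mean data '0' (following the VTC description of a larger delay difference for a low channel voltage), and the weight table, its indexing by (D1, D2, D3, D4), and all names are hypothetical.

```python
from itertools import product


def thermometer_to_binary(thermo_bits):
    """Convert 16 arbiter outputs to a 4 bit code (assumed: count ones, saturate)."""
    if len(thermo_bits) != 16:
        raise ValueError("expected 16 arbiter outputs")
    return min(sum(thermo_bits), 15)


def equalize(codes, weights, initial_history=(0, 0, 0, 0)):
    """Select one of 16 comparator results using the previous four decisions.

    weights maps each history tuple (D1, D2, D3, D4) to a 4 bit threshold.
    Assumed polarity: a code below the threshold is decided as data '1'.
    """
    history = tuple(initial_history)
    decisions = []
    for code in codes:
        # Comparator bank: evaluate the code against all 16 weights.
        candidates = {h: int(code < w) for h, w in weights.items()}
        # 16:1 MUX: pick the result that matches the decision history.
        bit = candidates[history]
        decisions.append(bit)
        history = (bit,) + history[:3]
    return decisions


# Hypothetical weight table with the same mid-scale threshold for every history.
flat_weights = {h: 8 for h in product((0, 1), repeat=4)}
```

Because all 16 comparisons are computed before the history is known, only the MUX select depends on the previous decisions, which is the usual motivation for this comparator-bank structure.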

To verify the performance, an in-situ monitor circuit is adopted, as shown in Fig. 9. The PRBS generator in the RX chip, identical to the one in the TX chip, is clocked using a delayed clock to generate the ground-truth data needed for BER measurements. An 11 bit BER counter increments whenever an error is detected. The error count is serially read out using a scan chain. To measure the BER eye diagram using the in-situ circuit, we swept the two programmable delays, labeled phase delay (red box) and time offset (blue box) in Fig. 9. The two programmable delays correspond to the x and y axes of the BER eye diagram.
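A behavioral sketch of the error-counting step is given below. The paper specifies only the sequence length (2<sup>15</sup>-1) and the counter width (11 bit); the generator polynomial x^15 + x^14 + 1, the all-ones seed, and the saturating (rather than wrapping) counter are assumptions, and the function names are hypothetical.

```python
def prbs15_bits(count, seed=0x7FFF):
    """Generate `count` bits from a 15 bit LFSR.

    Assumed polynomial: x^15 + x^14 + 1 (a common PRBS15 choice).
    """
    state = seed & 0x7FFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(count):
        new_bit = ((state >> 14) ^ (state >> 13)) & 1
        state = ((state << 1) | new_bit) & 0x7FFF
        bits.append(new_bit)
    return bits


def count_errors(rx_bits, ref_bits, counter_width=11):
    """Compare received bits with the local reference and count mismatches.

    The counter is assumed to saturate at 2**counter_width - 1.
    """
    max_count = (1 << counter_width) - 1
    errors = 0
    for rx, ref in zip(rx_bits, ref_bits):
        if rx != ref and errors < max_count:
            errors += 1
    return errors
```

In this model the reference stream plays the role of the RX-side PRBS generator, and the returned count corresponds to the value read out through the scan chain.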



Fig. 7. Implementation of 4-bit Vernier-line based TDC.

## V. 65 NM TEST CHIP

A chip photo and the SiP prototype containing both the TX and RX chips are shown in Fig. 10. Two dies were integrated into a single package to mimic the link behavior of an SiP system. A single-ended data signal and a differential clock were transferred from the TX chip to the RX chip. To test the proposed circuits under different channel conditions, we built packages with link distances varying from 1mm to 7mm. The bathtub data in Fig. 11 shows an eye width of 0.12 UI at a BER of $10^{-12}$ for the 7mm link. The time-domain eye diagram measured using the in-situ circuits is shown in Fig. 12. To save measurement time, BER down to $10^{-11}$ is reported. Lower BER values such as $10^{-12}$ or $10^{-13}$ can be measured using the same setup. Fig. 13 shows the comparison with previous link designs. When operating at a data rate of 8 Gb/s, the proposed system achieves an energy efficiency of 2.1 pJ/b (including TX, RX, and DSP power) at 1V. The circuit areas of the time-based front-end (including TDC) and DSP are 0.0192 mm<sup>2</sup> and 0.0096 mm<sup>2</sup>, respectively.
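The reported figures can be cross-checked with simple arithmetic. The sketch below derives the total power implied by 2.1 pJ/b at 8 Gb/s, compares the sum of the front-end and DSP areas with the $240 \times 120 \mu m^2$ receiver area stated in the abstract, and estimates the minimum time needed to observe a given number of bits per sweep point. The variable names are illustrative, and the timing estimate assumes errors are checked at the full line rate.

```python
DATA_RATE_BPS = 8e9          # 8 Gb/s
ENERGY_PER_BIT_J = 2.1e-12   # 2.1 pJ/b, TX + RX + DSP

# Power implied by the energy efficiency: about 16.8 mW.
total_power_w = ENERGY_PER_BIT_J * DATA_RATE_BPS

# Area check: front-end (with TDC) plus DSP versus 240 um x 120 um.
frontend_area_mm2 = 0.0192
dsp_area_mm2 = 0.0096
receiver_area_mm2 = frontend_area_mm2 + dsp_area_mm2
abstract_area_mm2 = 0.240 * 0.120


def seconds_for_bits(num_bits, data_rate_bps=DATA_RATE_BPS):
    """Time to transfer num_bits at the line rate."""
    return num_bits / data_rate_bps


# Bits needed to expect about one error at a given BER is roughly 1 / BER.
time_for_1e11_bits = seconds_for_bits(1e11)
time_for_1e12_bits = seconds_for_bits(1e12)
```

Observing $10^{11}$ bits takes about 12.5 s per sweep point at 8 Gb/s, versus about 125 s for $10^{12}$ bits, which illustrates why the eye diagram is reported down to $10^{-11}$ to save measurement time.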



Fig. 8. DSP for digital equalization.



Fig. 9. In-situ bathtub and BER eye diagram measurement circuits.



Fig. 10. 65nm chip microphotograph and test package with TX and RX chips for in-package link demo.





Fig. 11. Measured BER bathtub.

Fig. 12. Measured BER eye diagram.

|                               | JSSC'12 [3]                  | JSSC'13 [4]            | JSSC'15 [5]                     | JSSC'16 [6]                   | This work                             |
|-------------------------------|------------------------------|------------------------|---------------------------------|-------------------------------|---------------------------------------|
| Application                   | Off Chip                     | Off Chip               | Off Chip                        | Off Chip                      | SiP                                   |
| RX<br>Architecture            | 4x Flash ADC                 | 4x Flash ADC           | 4x Flash ADC                    | 32x SAR ADC                   | 4x TDC                                |
| Front-end<br>Type             | Voltage-Based<br>(CTLE +VGA) | Voltage-Based<br>(VGA) | Voltage-Based<br>(VGA)          | Voltage-Based<br>(Analog FFE) | Time-Based<br>(VTC+TA)                |
| Data Rate                     | 10 Gb/s                      | 10.3125 Gb/s           | 8.5-11.5 Gb/s                   | 10 Gb/s                       | 8 Gb/s                                |
| Technology                    | 65nm                         | 40nm                   | 40nm                            | 65nm                          | 65nm                                  |
| Voltage                       | 1.1V                         | 0.9V                   | 1V                              | 1V                            | 1V                                    |
| Resolution                    | 4 bit                        | 6 bit                  | 6 bit                           | 6 bit                         | 4 bit                                 |
| BER                           | <1E-9                        | <1E-12                 | <1E-12                          | <1E-10                        | <1E-12                                |
| RX Area<br>(w/o DSP)          | 0.288 mm <sup>2</sup>        | 0.27 mm <sup>2</sup>   | 0.82 mm <sup>2</sup>            | 0.38 mm <sup>2</sup>          | 0.0192 mm <sup>2</sup>                |
| Energy<br>Efficiency<br>(pJ/b) | 8.1<br>(RX only)             | 15.1<br>(RX only)      | 18.9<br>(RX, includes<br>Clock) | 7.9<br>(RX only)              | 2.1<br>(TX+RX, includes<br>DSP power) |

Fig. 13. Comparison with state-of-the-art link designs.

## VI. CONCLUSION

In this paper, a TDC-based receiver with a TBFE is demonstrated on an in-package serial link in a 65nm GP process. To the best of our knowledge, this is the first TBFE receiver. A highly linear TA is proposed to amplify the small time difference generated by the VTC. A BER of less than $10^{-12}$ is verified using the in-situ measurement circuits. The proposed TBFE is highly digital, operates at a low supply voltage, and is compatible with the digital circuits that follow it. Its compact size and high energy efficiency show that the proposed time-based receiver is promising for SiP applications.

## REFERENCES

- [1] C. Y. Ho, H. H. Cheng, P. C. Pan, C. C. Wang and C. P. Hung, "Dielectric Characterization of Ultra-Thin Low-Loss Build-Up Substrate for System-in-Package (SiP) Modules," in *IEEE Trans. Microw. Theory Techn.*, vol. 63, no. 9, pp. 2923-2930, Sept. 2015.
- [2] D. Greenhill et al., "A 14nm 1GHz FPGA with 2.5D transceiver integration," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2017, pp. 54-55.
- [3] E. H. Chen, R. Yousry and C. K. K. Yang, "Power Optimized ADC-Based Serial Link Receiver," in *IEEE J. Solid-State Circuits*, vol. 47, no. 4, pp. 938-951, Apr. 2012.
- [4] A. Varzaghani et al., "A 10.3-GS/s, 6-Bit Flash ADC for 10G Ethernet Applications," in *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3038-3048, Dec. 2013.
- [5] B. Zhang et al., "A 40 nm CMOS 195 mW/55 mW Dual-Path Receiver AFE for Multi-Standard 8.5–11.5 Gb/s Serial Links," in *IEEE J. Solid-State Circuits*, vol. 50, no. 2, pp. 426-439, Feb. 2015.
- [6] A. Shafik, E. Z. Tabasy, S. Cai, K. Lee, S. Hoyos and S. Palermo, "A 10Gb/s hybrid ADC-based receiver with embedded 3-tap analog FFE and dynamically-enabled digital equalization in 65nm CMOS," in *IEEE J. Solid-State Circuits*, vol. 51, no. 3, pp. 671-685, Mar. 2016.
- [7] P. W. Chiu, S. Kundu, Q. Tang and C. H. Kim, "A 65-nm 10-Gb/s 10-mm On-Chip Serial Link Featuring a Digital-Intensive Time-Based Decision Feedback Equalizer," in *IEEE J. Solid-State Circuits*, vol. 53, no. 4, pp. 1203-1213, Apr. 2018.
- [8] B. Kim, H. Kim and C. H. Kim, "An 8bit, 2.6ps two-step TDC in 65nm CMOS employing a switched ring-oscillator based time amplifier," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Sep. 2015, pp. 1-4.