## 15.5 A Programmable Adaptive Phase-Shifting PLL for Clock Data Compensation Under Resonant Supply Noise

Dong Jiao, Chris H. Kim

University of Minnesota, Minneapolis, MN

Power supply noise has become one of the main performance-limiting factors in sub-1V technologies. Resonant supply noise caused by the package/bonding inductance and on-die capacitance has been reported as the dominant supply noise component in high performance microprocessors [1,2]. Resonant noise frequency typically resides in the 40MHz to 300MHz frequency band but can be made as low as 7MHz with a dedicated metal-insulator-metal capacitor technology [3].

Recently, adaptive clocking schemes have been proposed to mitigate the impact of resonant noise on circuit performance. Here, the clock period is intentionally modulated by the resonant noise such that the increased clock period partially compensates for the increased datapath delay, which is also modulated by the resonant noise. Figure 15.5.1 (top) illustrates the concept of the adaptive clocking scheme. In a "constant-period clock" scenario, sampling failures can occur due to the increased datapath delay under resonant noise. In contrast, the adaptive clocking scheme stretches the clock period to compensate for the increased datapath delay such that sampling failures are avoided. The clock period can be modulated either in the clock generation block, for example using a phase-locked loop (PLL) [4], or in the clock tree while the clock edge is propagating [5,6]. A brief analysis of the adaptive clocking scheme is shown in Fig. 15.5.1 (bottom left). The four waveforms represent the supply voltage with resonant noise and the clock period modulation seen by the PLL, the clock distribution, and the local registers, respectively. The minimum supply voltage occurs at point "A", which is also the point at which the datapath delay is worst. Suppose the adaptive PLL produces the longest clock period at "B" [4] and the clock distribution stretches the clock cycle to its maximum at "C", when the supply voltage has the sharpest negative slope. Since the clock cycle is modulated by both the PLL and the clock path, the net effect results in the maximum clock cycle occurring somewhere between "B" and "C", denoted as "D". Once we account for the clock path delay, the local registers see the maximum clock cycle at time "E". To achieve optimal timing compensation between the clock cycle and the datapath delay, "E" needs to be aligned with the maximum datapath delay ("A") with the same phase and amplitude. Therefore, a certain amount of phase shift and proper adjustment of the clock period's sensitivity to supply noise are required for the best possible timing compensation, as indicated by "$B_{opt}$". Previous designs, however, do not consider both effects and cannot adapt to different design parameters. Motivated by these observations, we propose an adaptive phase-shifting PLL in which both the phase shift and the supply noise sensitivity of the clock can be digitally programmed and adjusted. A comparison between our work and previous designs is given in Fig. 15.5.1 (bottom right).
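The alignment argument above can be illustrated with a toy numerical model. This is only a sketch under stated assumptions, not the analysis used in this work: both the clock period seen by the registers and the datapath delay are assumed to vary sinusoidally with the resonant noise, and all names and numbers (`worst_case_slack`, the 5% delay sensitivity, the normalized period and delay) are hypothetical.

```python
import math

def worst_case_slack(phase_error_deg, clk_sens, data_sens,
                     t_clk=1.0, t_data=0.9, steps=720):
    """Minimum of (clock period - datapath delay) over one noise cycle.

    Toy model (an assumption): both quantities are modulated sinusoidally
    by the noise; phase_error_deg is the lag of the clock period
    modulation relative to the datapath delay modulation.
    """
    phi = math.radians(phase_error_deg)
    slack = []
    for k in range(steps):
        theta = 2.0 * math.pi * k / steps
        period = t_clk * (1.0 + clk_sens * math.sin(theta - phi))
        delay = t_data * (1.0 + data_sens * math.sin(theta))
        slack.append(period - delay)
    return min(slack)

# Hypothetical numbers: nominal delay is 90% of the period, 5% delay swing.
DATA_SENS = 0.05
# Clock sensitivity giving the same absolute swing as the datapath delay.
MATCHED_CLK_SENS = 0.9 * DATA_SENS / 1.0

constant_clock = worst_case_slack(0.0, 0.0, DATA_SENS)
aligned = worst_case_slack(0.0, MATCHED_CLK_SENS, DATA_SENS)
misaligned = worst_case_slack(90.0, MATCHED_CLK_SENS, DATA_SENS)

print("constant-period clock:", constant_clock)
print("adaptive, aligned    :", aligned)
print("adaptive, 90 deg off :", misaligned)
```

In this toy model the aligned, amplitude-matched case recovers the full nominal slack, while a 90-degree phase error makes the adaptive clock worse than a constant-period clock, which is why both phase and sensitivity need to be tunable.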

Figure 15.5.2 shows the schematic of the phase-shifting PLL. The phase shifting and noise sensitivity adjustment are implemented with a supply-tracking modulator that consists of three binary-weighted capacitor banks and a bias-generation circuit. The capacitor arrays and transistors M1 and M2 form a high-pass filter to provide the desired programmable phase shift. The equivalent capacitance and the clock period's sensitivity to supply noise can be expressed as $C_{eq}=C_f \parallel (C_u+C_d)$ and $S_V=C_u/C_d$, respectively, both of which are digitally programmable. By choosing proper configurations of the three capacitor banks, the resonant noise can be coupled to the bias voltage of the voltage-controlled oscillator (VCO) to generate the desired adaptive clock signal.
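As a rough illustration of how the two expressions depend on the bank settings, the sketch below evaluates $C_{eq}$ and $S_V$ from three digital codes. Several things here are assumptions rather than details from this work: the bank width (`bits`), the unit capacitance (`unit_cap`), the function names, and the reading of the parallel-bars operator as the product-over-sum combination.

```python
def bank_capacitance(code, unit_cap, bits):
    """Capacitance of a binary-weighted bank for a given digital code."""
    if not 0 <= code < 2 ** bits:
        raise ValueError("code out of range for the bank width")
    return code * unit_cap

def modulator_settings(code_f, code_u, code_d, unit_cap=1.0, bits=3):
    """Return (C_eq, S_V) for the three bank codes.

    Assumptions: all banks share one unit capacitor and bit width, and
    the parallel-bars operator means product over sum.
    """
    c_f = bank_capacitance(code_f, unit_cap, bits)
    c_u = bank_capacitance(code_u, unit_cap, bits)
    c_d = bank_capacitance(code_d, unit_cap, bits)
    if c_d == 0 or (c_f + c_u + c_d) == 0:
        raise ValueError("C_d and the total capacitance must be non-zero")
    c_eq = c_f * (c_u + c_d) / (c_f + c_u + c_d)
    s_v = c_u / c_d
    return c_eq, s_v

# Hypothetical codes, in units of the unit capacitor.
c_eq, s_v = modulator_settings(code_f=4, code_u=2, code_d=2)
print(c_eq, s_v)
```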

A 1.2V, 65nm test chip is designed to verify the effectiveness of the phase-shifting PLL (Fig. 15.5.3). The adaptive clock signal is generated by the PLL and then propagates through the clock distribution networks. We implement eight clock trees with different buffer types (i.e., inverter, differential, and RC-filtered inverter [5]) and different interconnect lengths. A separate 40pF decoupling capacitor (decap) can be enabled to reduce the supply noise seen by the clock trees. The datapath under test consists of two D-flip-flops and both logic-dominated and interconnect-dominated circuit paths. An XOR gate compares the sampled results from the datapath with the reference data, and any sampling error generates a pulse at the XOR output, which increments a 10-bit ripple counter. As a result, a transition in the *i*<sup>th</sup> bit of the counter output (i.e., BER<9:0>) indicates that $2^i$ sampling errors have occurred. By measuring the average period of the counter output and the clock frequency, the bit-error rate (BER) can be conveniently calculated. The noise injection block has individual devices clocked by an on-chip VCO and a clock pattern synthesis circuit. The clock pattern can be selected from 1, 2, 8 or 32 pulses for every 32 clock cycles to emulate a first-droop or a sinusoidal noise waveform. The test chip also includes an array of linear feedback shift registers for injecting random supply noise and a local supply noise monitor [1].
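The BER calculation from the counter output can be sketched as follows. This is a minimal illustration, not the measurement script used for the test chip; it assumes one datapath sample per clock cycle and that the measured "period" of a counter bit spans two transitions. The function name and the example numbers are hypothetical.

```python
def ber_from_counter_bit(bit_index, bit_period_s, clock_hz):
    """Estimate BER from the average period of one counter output bit.

    Each transition of bit i marks 2**i sampling errors, so one full
    period (two transitions) corresponds to 2**(i + 1) errors.
    Assumes one sampled bit per clock cycle.
    """
    errors_per_period = 2 ** (bit_index + 1)
    samples_per_period = bit_period_s * clock_hz
    return errors_per_period / samples_per_period

# Hypothetical reading: BER<9> toggles with a 1.024 ms period at 1 GHz.
example_ber = ber_from_counter_bit(bit_index=9, bit_period_s=1.024e-3,
                                   clock_hz=1e9)
print(example_ber)
```

Observing a higher-order bit averages over more errors, which smooths the estimate at the cost of a longer measurement.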

Figure 15.5.4 (left) shows an example of the BER data measured at different clock frequencies. Without loss of generality, we define the maximum operating frequency as the point when the BER is  $10^{\circ}$ , and denote it as  $F_{max}$  in this paper. The noise waveforms measured from the supply noise monitor when injecting a first-droop noise and a sinusoidal supply noise are shown in Fig. 15.5.4 (right).
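One way to read $F_{max}$ off a measured BER-versus-frequency sweep is to interpolate on a logarithmic BER axis. The sketch below is an illustrative assumption, not the procedure used here: the function name, the log-linear interpolation, and the data points are all hypothetical, and the BER target is left as a required argument.

```python
import math

def estimate_fmax(freqs_hz, bers, ber_target):
    """Frequency at which the BER curve crosses ber_target.

    Expects frequencies in ascending order with positive BER values that
    increase with frequency. Interpolates linearly in log10(BER).
    Returns None if the target is not bracketed by the data.
    """
    points = list(zip(freqs_hz, bers))
    for (f_lo, b_lo), (f_hi, b_hi) in zip(points, points[1:]):
        if 0 < b_lo <= ber_target <= b_hi and b_hi > b_lo:
            span = math.log10(b_hi) - math.log10(b_lo)
            frac = (math.log10(ber_target) - math.log10(b_lo)) / span
            return f_lo + frac * (f_hi - f_lo)
    return None

# Synthetic sweep for illustration only (not measured data).
sweep_freqs = [1.00e9, 1.10e9]
sweep_bers = [1e-8, 1e-4]
fmax_est = estimate_fmax(sweep_freqs, sweep_bers, ber_target=1e-6)
print(fmax_est)
```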

Figure 15.5.5 shows the measured $F_{max}$ while sweeping the phase shift and supply noise sensitivity settings. The chip is tested at supply voltages of 1.2V and 1.0V using a sinusoidal noise waveform. As can be seen from the figure, $F_{max}$ improves by more than 5% in both cases when an optimal configuration is chosen. We also see a large discrepancy in the optimal configurations between the two cases (i.e., 1.2V and 1.0V). This is because the timing compensation is affected by various design parameters such as clock frequency, clock path delay, and noise frequency. The PLL is flexible and can adapt to different operating conditions and clock network designs by configuring the phase shift and supply noise sensitivity.

The PLL is also tested under different supply noise frequencies. For this test, an inverter-based clock tree is chosen and the noise pattern is configured to emulate first-droop noise. Measurement results in Fig. 15.5.6 (left) show a 4% $F_{max}$ improvement for noise frequencies between 40MHz and 300MHz. As the noise frequency increases, the performance improvement becomes smaller. This is because the clock distribution delay makes it difficult, or even impossible, for the adaptive clock to compensate for the datapath delay variation if the noise period is too short. Different clock trees are also tested and the results are shown in Fig. 15.5.6 (right). Here, clock tree names with "\_C" have a 40pF decap enabled in the clock tree supply, and "short" or "long" refers to the interconnect length between the clock buffers. For a 74MHz sinusoidal noise, $F_{max}$ is consistently improved by 3.4% to 7.3%, verifying the flexibility of the design. The chip micrograph is shown in Fig. 15.5.7.
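The frequency dependence described above can be made concrete by expressing a fixed clock distribution delay as a phase lag of the noise waveform. The sketch below is illustrative only; the 1ns clock path delay is a hypothetical value, not a measured one, and the function name is an assumption.

```python
def phase_lag_deg(noise_hz, clock_path_delay_s):
    """Phase lag, in degrees of the noise period, added by a fixed delay."""
    return (360.0 * noise_hz * clock_path_delay_s) % 360.0

# Hypothetical 1 ns clock distribution delay.
CLOCK_PATH_DELAY_S = 1e-9
lags = {f_mhz: phase_lag_deg(f_mhz * 1e6, CLOCK_PATH_DELAY_S)
        for f_mhz in (40, 74, 300)}
for f_mhz, lag in lags.items():
    print(f"{f_mhz} MHz noise: {lag:.1f} degrees of lag")
```

Under this assumed delay the same clock path consumes a few percent of a 40MHz noise period but nearly a third of a 300MHz period, leaving less room for the programmable phase shift to realign the clock with the datapath delay.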

## Acknowledgements:

This work was supported by the Semiconductor Research Corporation under award 2008-HJ-1804.

## References:

[1] J. Gu, H. Eom, and C.H. Kim, "On-chip Supply Noise Regulation Using a Low Power Digital Switched Decoupling Capacitor Circuit," *J. Solid-State Circuits*, vol. 44, no. 6, pp. 1765-1775, Jun. 2009.

[2] J. Xu, P. Hazucha, M. Huang, P. Aseron, et al., "On-Die Supply-Resonance Suppression Using Band-Limited Active Damping," *ISSCC Dig. Tech. Papers*, pp. 286-603, Feb. 2007.

[3] D. Wendel, R. Kalla, R. Cargnoni, et al., "The Implementation of POWER7™: A Highly Parallel and Scalable Multi-Core High-End Server Processor," *ISSCC Dig. Tech. Papers*, pp. 102-103, Feb. 2010.

[4] N. Kurd, P. Mosalikanti, M. Neidengard, J. Douglas and R. Kumar, "Next generation Intel® core™ micro-architecture (Nehalem) clocking," *J. Solid-State Circuits*, vol. 44, no. 4, pp. 1121-1129, Apr. 2009.

[5] K. L. Wong, T. Rahal-Arabi, M. Ma and G. Taylor, "Enhancing microprocessor immunity to power supply noise with clock-data compensation," *J. Solid-State Circuits*, vol. 41, no. 4, pp. 749-758, Apr. 2006.

[6] D. Jiao, J. Gu, and C. H. Kim, "Circuit Design and Modeling Techniques for Enhancing the Clock-Data Compensation Effect under Resonant Supply Noise," *J. Solid-State Circuits*, vol. 45, no. 10, pp. 2130-2141, Oct. 2010.



## **ISSCC 2011 PAPER CONTINUATIONS**

[Chip micrograph; labeled blocks: local noise monitor, random noise (LFSRs), datapath & BER monitor, clock distribution (8 clock trees, folded), phase-shifting PLL]

| Parameter             | Value          |
|-----------------------|----------------|
| Technology            | 65nm LP CMOS   |
| Supply voltage        | 1.2V           |
| Total area            | 350 × 250 µm²  |
| PLL area              | 120 × 100 µm²  |
| Regulation frequency  | 40MHz-300MHz   |
| $F_{max}$ improvement | 3.4%-7.3%      |

Figure 15.5.7: Chip micrograph and performance summary of the test chip.