# A Supply-Noise Sensitivity Tracking PLL in 32 nm SOI Featuring a Deep Trench Capacitor Based Loop Filter

Bongjin Kim, Member, IEEE, Weichao Xu, and Chris H. Kim, Senior Member, IEEE


Abstract—An adaptive PLL that maximizes the timing compensation between clock and data, commonly referred to as the clock-data compensation (CDC) effect, is demonstrated in 32 nm SOI. Several previous adaptive PLL designs have shown that processor operating speed can be improved by using the resonant supply noise to modulate the clock path delay or the PLL output clock period. In this work, we take the adaptive PLL concept one step further by achieving optimal clock-data compensation across a wide range of PVT and operating conditions. This is accomplished by an automatic supply-noise sensitivity tracking loop that continuously monitors timing errors in a critical path replica circuit. Compared to a conventional PLL, the proposed design achieves up to a 15.6% improvement in processor Fmax, or a 9.8% reduction in dynamic power consumption at iso-frequency, under realistic supply noise. Additionally, a 92.1% reduction in PLL area is achieved by employing ultra-high-density deep trench capacitors in the loop filter.

*Index Terms*—Resonant supply noise, clock data compensation, adaptive PLL, deep trench capacitor.

#### I. INTRODUCTION

**P**OWER supply noise is considered one of the major performance-limiting factors in modern low-voltage, high-performance microprocessors [1]. Traditionally, off-chip and on-chip decoupling capacitors have been used to suppress supply noise across a wide range of frequencies [2]. Supply grid optimization and noise-tolerant clock network designs [3]–[8] have also been widely used to minimize the impact of power supply noise on processor performance. More recently, circuit-based supply noise cancellation techniques such as active decoupling capacitors [9], active damping resistors [10], and switched capacitor circuits [11] have been proposed to minimize supply noise and thereby reduce processor power consumption.

Supply noise caused by the resonance between the package/bonding inductance and on-die decoupling capacitance, also referred to as first-droop noise [2], is generally

Manuscript received August 26, 2013; revised November 27, 2013; accepted December 02, 2013. Date of publication January 02, 2014; date of current version March 24, 2014. This paper was approved by Guest Editor Jeffrey Gealow.

B. Kim, W. Xu, and C. H. Kim are with the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: kimx2447@umn.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2013.2294323

[Fig. 1 plots: (a) supply waveform vs. time (µs) with the 1st, 2nd, and 3rd droops labeled; (b) relative impedance magnitude vs. frequency (MHz) with the 1st- and 2nd-droop regions labeled.]

Fig. 1. (a) Supply noise waveform of a typical high-performance processor. It contains multiple droop components owing to the different resonance frequencies of the power supply network [2]. (b) Power supply network impedance response of Intel's Nehalem™ processor [13].

believed to be the dominant supply noise source in modern high-performance systems [8], [11], and it has the largest voltage droop magnitude, as shown in Fig. 1(a). The resonant noise amplitude can reach 10–15% of the nominal supply voltage, while its fundamental frequency typically lies between 40 MHz and 300 MHz [12], as illustrated by the supply network impedance of Intel's Nehalem microprocessor in Fig. 1(b). Researchers have recently identified an intrinsic timing compensation phenomenon between the clock and data signals, commonly referred to as clock-data compensation (CDC), which can alleviate the impact of resonant supply noise on processor speed [2], [13]. Adaptive clocking schemes have been proposed to maximize the CDC effect, including a



Fig. 2. Impact of resonant supply noise on processor  $F_{max}$  (left) and power consumption (right) for different first droop noise amplitudes.

new clock tree design in [2], in which the amplitude and phase of the resonant noise seen by the clock buffers are modified using RC-filtered clock buffers. Another promising approach to enhancing the CDC effect is to systematically couple the supply noise into the PLL output clock using programmable resistor banks [13] or capacitor banks [14]. However, previous adaptive PLLs rely on an exhaustive search to find the optimal CDC parameters, which requires a cumbersome and time-consuming calibration process. Moreover, once programmed, these parameters cannot be readjusted, leaving the design susceptible to changes in operating conditions, PVT variations, and aging. In addition, the extra passive devices and analog circuitry in the CDC modulator increase the PLL area and degrade loop stability.

In this paper, we propose an Automatic Supply-noise Sensitivity Tracking (ASST) PLL that addresses all of the above issues. The proposed PLL aligns the local clock edge with the datapath signal using an on-the-fly supply-noise sensitivity tracking loop built around a tunable critical path monitor circuit that detects timing errors. In addition to the performance and power benefits of the automatic tracking loop, the PLL area is significantly reduced by using, for the first time, an ultra-dense deep trench capacitor in the loop filter.

The remainder of this paper is organized as follows. Section II describes the impact of resonant supply noise on processor performance and briefly introduces the CDC effect. Section III presents a mathematical derivation of the optimal CDC condition based on the proposed supply-noise sensitivity tracking scheme. Section IV describes the implementation of the 32 nm adaptive PLL test chip, including the supply-noise sensitivity tracking loop and the low-area PLL loop filter based on dense deep trench capacitors. Test chip measurement results are given in Section V, and a summary is provided in Section VI.

#### II. RESONANT SUPPLY NOISE AND CLOCK DATA COMPENSATION

Resonant supply noise has significant implications for chip performance and power consumption. Reducing the resonant supply noise allows processors to operate at a higher frequency for a given supply voltage, as shown in Fig. 2 (left). Similarly, processor power consumption can be reduced because the same performance can be met at a lower supply voltage (Fig. 2, right). To minimize resonant supply noise, the power supply network of a modern processor must have an extremely low impedance (e.g., a few milliohms), which requires large amounts of on-chip decoupling capacitance and incurs a significant area and leakage overhead. Furthermore, adding more on-chip decoupling capacitance provides diminishing returns in processor performance, as experimentally shown in [15].

The timing compensation between the clock period and the datapath delay can be exploited to overcome the resonant noise issue at a much lower overhead. Fig. 3 illustrates this for conventional and adaptive PLLs [13], [14]. In the conventional view depicted in Fig. 3 (left), the clock period is constant irrespective of the supply noise. In practice, processor performance may not suffer as much, owing to the beneficial timing compensation between the datapath delay and the clock period shown in Fig. 3 (center). The clock delivered to the local datapath is affected by the resonant supply noise as it travels through the clock distribution network, and therefore the clock period is modulated. This modulation arises because two consecutive clock edges experience different clockpath delays under resonant supply noise. For example, during a supply noise upswing, the second clock edge travels faster than the first, compressing the clock period. Similarly, the clock period is stretched during supply noise downswings. The net effect is that the modulated clock partially compensates for the datapath delay variation, alleviating the impact of resonant supply noise (Fig. 3, center). However, the clockpath and datapath delays depend differently on supply noise, so the intrinsic CDC effect offers only limited timing relief.

In this paper, we propose a closed-loop system that tracks the optimal supply sensitivity parameter using bit error information from a critical path replica circuit. The closed-loop tracking technique allows the processor to operate at its peak energy-efficiency point by aligning the local clock period with the datapath delay, as shown in Fig. 3 (right). By modulating the PLL output clock period while carefully accounting for the clock period modulation in the clockpath, timing failures in the datapath



Fig. 3. Intrinsic and enhanced CDC effects and their impact on processor performance.



Fig. 4. Intrinsic CDC effect when using a conventional PLL (left). Enhancing CDC effect by employing an adaptive PLL (right).

can be avoided, as shown in Fig. 4. Ultimately, this leads to lower power consumption at iso-frequency or a higher operating frequency at iso-power.

#### III. PLL SUPPLY-NOISE SENSITIVITY TRACKING FOR OPTIMIZING CDC EFFECT AND ITS MATHEMATICAL DERIVATION

In previous adaptive PLL designs, CDC parameters such as phase shift and sensitivity were determined through an exhaustive search, which can be cumbersome and time-consuming [14]. Another shortcoming of previous designs is that the CDC parameters cannot be updated after the one-time calibration has been performed, so the CDC efficiency varies with the chip operating mode and PVT parameters. In this work, CDC optimization based on a single parameter (i.e., the PLL supply-noise sensitivity) is proposed to address these issues.

#### A. Mathematical Derivation of Optimal CDC Effect

Adaptive PLL designs in [14] and [16] enhance the CDC effect by adjusting the supply sensitivity (i.e., the change in PLL output frequency or delay with respect to the resonant noise amplitude) and the phase shift (i.e., the phase difference between the PLL output and the resonant noise). This was achieved by systematically coupling the resonant supply noise to the PLL output clock.

Before covering the circuit design details, we first present the mathematical derivation of the optimal CDC parameters. The exact mathematical proof of optimal timing compensation is complicated and provides limited insight, so a simplified derivation based on inference is given here to help the reader understand the basic operation of the proposed adaptive PLL.

For the mathematical modeling of the CDC effect, let us first consider a reference delay line with a nominal delay of $T_{NOM}$ and a supply sensitivity of $S_{REF}$. Here, supply sensitivity is defined as the change in the reference delay under AC supply noise, normalized to the amplitude of the supply noise. For example, if the supply voltage is $V_{NOM} - 0.1\,\text{V} \cdot \sin(2\pi ft)$ and the resulting delay is $T_{NOM} + 1\,\text{ns} \cdot \sin(2\pi ft + \theta)$, the supply sensitivity is 1 ns/0.1 V = 10 ns/V. Using the concept of a reference delay line, and assuming that its delay is short enough that the phase difference between the supply noise and the delay is negligible, we can model the delay of an arbitrary signal traveling through a clockpath or datapath. To simplify the modeling, the resonant supply noise in our analysis is assumed to be a single-tone sinusoid, $V_{SUPPLY} = V_{NOM} - V_{NOISE} \cdot \sin(2\pi ft)$, where $V_{NOM}$ is the nominal supply voltage and $V_{NOISE}$ is the amplitude of the resonant noise at frequency $f$. The time-varying delay of the reference delay line under resonant supply noise can then be modeled as

$$T_{REF} = T_{NOM} + S_{REF} \cdot V_{NOISE} \cdot \sin(2\pi ft), \tag{1}$$

where  $T_{\rm NOM}$  is the nominal delay of a reference delay line at  $V_{\rm NOM}$  and  $S_{\rm REF}$  is the supply sensitivity of the reference delay line.
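As a small numerical illustration of the sensitivity definition and of (1), the Python sketch below reproduces the 10 ns/V worked example and evaluates the modeled delay at the bottom of a droop. The function names, the 5 ns nominal delay, and the 100 MHz noise frequency are illustrative assumptions, not values from the test chip.

```python
import math

def supply_sensitivity(delay_swing_s: float, noise_amp_v: float) -> float:
    """Supply sensitivity per the definition above: delay change / noise amplitude (s/V)."""
    return delay_swing_s / noise_amp_v

def reference_delay(t: float, t_nom: float, s_ref: float, v_noise: float, f: float) -> float:
    """Eq. (1): T_REF = T_NOM + S_REF * V_NOISE * sin(2*pi*f*t).
    The supply is V_NOM - V_NOISE*sin(2*pi*f*t), so the delay peaks when the supply dips."""
    return t_nom + s_ref * v_noise * math.sin(2 * math.pi * f * t)

# Worked example from the text: a 1 ns delay swing for a 0.1 V noise amplitude.
s_ref = supply_sensitivity(1e-9, 0.1)
print(f"S_REF = {s_ref * 1e9:.1f} ns/V")  # S_REF = 10.0 ns/V

# Hypothetical 5 ns delay line and 100 MHz noise tone, evaluated a quarter
# period in, where sin(2*pi*f*t) = 1 (deepest supply droop).
f_noise = 100e6
print(reference_delay(1 / (4 * f_noise), 5e-9, s_ref, 0.1, f_noise))  # about 6e-9 s
```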

To calculate the period of the local clock under resonant noise, we need the period of the clock generated by the adaptive PLL as well as the change in the clock period caused by the resonant noise in the clockpath. The latter can be calculated by taking the difference between the clockpath delays of two consecutive clock edges, i.e., the preceding 1st edge and the subsequent 2nd edge. The local clock period can therefore be expressed as

$$T_{LOCAL} = T_{PLL} + (T_{CP-2nd} - T_{CP-1st}). \tag{2}$$

Now, let us define the time instants pertaining to the clockpath and datapath delays that are needed for the rest of the derivation:

| Time instant | Event |
|---|---|
| $t$ | 2nd clock edge arrives at the datapath (equivalently, the datapath signal arrives at the sampling flip-flop) |
| $t - T_{DP}$ | datapath signal is launched and enters the datapath |
| $t - T_{LOCAL}$ | 1st clock edge arrives at the datapath |
| $t - T_{CP-2nd}$ | 2nd clock edge enters the clockpath |
| $t - T_{CP-2nd} - T_{PLL}$ | 1st clock edge enters the clockpath |

Using the general delay expression given in (1) and the timing points defined above, we derive the period of an adaptive PLL's output clock (i.e.,  $T_{PLL}$ ) under resonant supply noise by integrating (1) from  $t - T_{CP-2nd} - T_{PLL}$  to  $t - T_{CP-2nd}$ :

$$
\begin{aligned}
T_{PLL} = T_{PLL-NOM} + \frac{S_{PLL} \cdot V_{NOISE}}{2\pi f} \cdot \big[ &\cos\{2\pi f(t - T_{CP-2nd} - T_{PLL})\} \\
&- \cos\{2\pi f(t - T_{CP-2nd})\} \big].
\end{aligned}
\tag{3}
$$

Here, $T_{\rm PLL-NOM}$ is the PLL clock period at the nominal supply voltage $V_{\rm NOM}$, and $S_{\rm PLL}$ is the supply-noise sensitivity of the adaptive PLL. To derive the local clock period $T_{\rm LOCAL}$ in (2), we now need the clockpath delays of the 1st and 2nd clock edges, namely $T_{\rm CP-1st}$ and $T_{\rm CP-2nd}$. The delay of the 1st clock edge travelling through the clockpath is obtained by integrating (1) from $t-T_{\rm CP-2nd}-T_{\rm PLL}$ to $t-T_{\rm LOCAL}$:

$$
\begin{aligned}
T_{CP-1st} = T_{CP-NOM} + \frac{S_{CP} \cdot V_{NOISE}}{2\pi f} \cdot \big[ &\cos\{2\pi f(t - T_{CP-2nd} - T_{PLL})\} \\
&- \cos\{2\pi f(t - T_{LOCAL})\} \big].
\end{aligned}
\tag{4}
$$

Here,  $S_{CP}$  is the supply sensitivity of clockpath and  $T_{CP-NOM}$  is the nominal clockpath delay at a supply voltage of  $V_{NOM}$ .

Likewise, the clockpath delay of the 2nd clock edge is obtained by integrating (1) from $t - T_{CP-2nd}$ to $t$:

$$T_{CP-2nd} = T_{CP-NOM} + \frac{S_{CP} \cdot V_{NOISE}}{2\pi f} \cdot \big[\cos\{2\pi f(t - T_{CP-2nd})\} - \cos(2\pi ft)\big]. \tag{5}$$
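Equations (3)–(5) all apply the same closed-form integral of the sinusoidal term in (1). The sketch below checks that closed form against a midpoint-rule quadrature. One assumption is worth stating: for the integral to come out in units of time, the code treats the sensitivity `s` as a normalized rate (in 1/V) rather than the s/V quantity of the worked example; this reading, the function names, and all numerical values are illustrative.

```python
import math

def closed_form(a: float, b: float, s: float, v_noise: float, f: float) -> float:
    """Integral of s*v_noise*sin(2*pi*f*tau) from a to b, in the form used by Eqs. (3)-(5):
    (s*v_noise / (2*pi*f)) * [cos(2*pi*f*a) - cos(2*pi*f*b)]."""
    w = 2 * math.pi * f
    return (s * v_noise / w) * (math.cos(w * a) - math.cos(w * b))

def midpoint(a: float, b: float, s: float, v_noise: float, f: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the same integral, for cross-checking."""
    w = 2 * math.pi * f
    h = (b - a) / n
    return h * sum(s * v_noise * math.sin(w * (a + (k + 0.5) * h)) for k in range(n))

# Assumed values: 100 MHz noise, 90 mV amplitude, s = 0.5 /V (normalized), and a 0.7 ns
# window starting 1.3 ns into the noise cycle (e.g., one PLL period in Eq. (3)).
args = (1.3e-9, 2.0e-9, 0.5, 0.09, 100e6)
exact, approx = closed_form(*args), midpoint(*args)
print(f"closed form: {exact:.6e} s, midpoint: {approx:.6e} s")
```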

Since deriving an analytical expression for the optimal CDC condition is quite involved, we infer the optimal solution by first assuming that the supply sensitivities of the PLL and the clockpath are identical (i.e., $S_{PLL} = S_{CP}$). Using (2)–(5), the local clock period can then be expressed as

$$T_{LOCAL} = T_{PLL-NOM} + \frac{S_{CP} \cdot V_{NOISE}}{2\pi f} \cdot \big[\cos\{2\pi f(t - T_{LOCAL})\} - \cos(2\pi ft)\big] \tag{6}$$

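where, because $S_{PLL} = S_{CP}$, the cosine terms evaluated at the clockpath entry times cancel, as does the nominal clockpath delay $T_{CP-NOM}$. Equation (6) shows that the local clock period under the optimal CDC condition is simply the time-varying delay of the 2nd clock edge travelling from $t-T_{\rm LOCAL}$ to $t$.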
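The cancellation behind (6) can be checked numerically. The sketch below evaluates the right-hand sides of (3)–(5), combines them according to (2), and compares the result with (6) for random time arguments. With $S_{PLL} = S_{CP}$ the two agree to floating-point precision; with mismatched sensitivities they do not. As in the derivation, the time arguments are treated as given numbers rather than solved self-consistently, and the sensitivities are read as normalized values in 1/V; all numbers are arbitrary assumptions.

```python
import math
import random

def rhs3_t_pll(t, t_cp2, t_pll, t_pll_nom, s_pll, v, f):
    """Right-hand side of Eq. (3)."""
    w = 2 * math.pi * f
    return t_pll_nom + s_pll * v / w * (math.cos(w * (t - t_cp2 - t_pll)) - math.cos(w * (t - t_cp2)))

def rhs4_t_cp1(t, t_cp2, t_pll, t_local, t_cp_nom, s_cp, v, f):
    """Right-hand side of Eq. (4)."""
    w = 2 * math.pi * f
    return t_cp_nom + s_cp * v / w * (math.cos(w * (t - t_cp2 - t_pll)) - math.cos(w * (t - t_local)))

def rhs5_t_cp2(t, t_cp2, t_cp_nom, s_cp, v, f):
    """Right-hand side of Eq. (5)."""
    w = 2 * math.pi * f
    return t_cp_nom + s_cp * v / w * (math.cos(w * (t - t_cp2)) - math.cos(w * t))

def rhs6_t_local(t, t_local, t_pll_nom, s_cp, v, f):
    """Right-hand side of Eq. (6)."""
    w = 2 * math.pi * f
    return t_pll_nom + s_cp * v / w * (math.cos(w * (t - t_local)) - math.cos(w * t))

random.seed(1)
f, v, s = 100e6, 0.09, 0.5            # assumed noise tone, amplitude, and S_PLL = S_CP
t_pll_nom, t_cp_nom = 0.7e-9, 1.0e-9  # assumed nominal PLL period and clockpath delay
max_err = 0.0
for _ in range(1000):
    t, t_cp2, t_pll, t_local = (random.uniform(0.0, 5e-9) for _ in range(4))
    via_eq2 = (rhs3_t_pll(t, t_cp2, t_pll, t_pll_nom, s, v, f)
               + rhs5_t_cp2(t, t_cp2, t_cp_nom, s, v, f)
               - rhs4_t_cp1(t, t_cp2, t_pll, t_local, t_cp_nom, s, v, f))
    max_err = max(max_err, abs(via_eq2 - rhs6_t_local(t, t_local, t_pll_nom, s, v, f)))
print(f"max |Eq.(2) - Eq.(6)| = {max_err:.2e} s")
```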

Similar to the derivation of the clock period, we can calculate the time-varying delay of the local datapath by integrating (1) in time from  $t - T_{DP}$  to t:

$$T_{DP} = T_{DP-NOM} + \frac{S_{DP} \cdot V_{NOISE}}{2\pi f} \cdot \big[\cos\{2\pi f(t - T_{DP})\} - \cos(2\pi ft)\big]. \tag{7}$$

The similarity between (6) and (7) suggests that the local clock period and the datapath delay react in similar ways to resonant supply noise. The main discrepancy is that the supply sensitivity parameters in the two equations differ: $S_{CP}$ appears in (6), while $S_{DP}$ appears in (7). Both numerical and circuit simulations show that the impact of this discrepancy on the timing compensation effect is small for practical clockpath and datapath configurations (see Section IV-D). Hence, the two time-varying expressions can be closely aligned simply by making the supply sensitivity of the PLL equal to that of the clockpath.
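Because (6) and (7) share the same implicit form, $T = T_{nom} + (S \cdot V_{NOISE}/2\pi f)[\cos\{2\pi f(t - T)\} - \cos(2\pi ft)]$, both can be solved with one fixed-point routine. The sketch below reports the worst-case slack $T_{LOCAL} - T_{DP}$ over one noise period under illustrative assumptions: $T_{PLL-NOM} = T_{DP-NOM}$ (zero nominal margin), a 100 MHz, 90 mV noise tone, and normalized sensitivities in 1/V. The slack is exactly zero when $S_{CP} = S_{DP}$ and turns negative (a timing failure) when the clockpath is less supply-sensitive than the datapath.

```python
import math

def solve_implicit_delay(t, t_nom, s, v, f, iters=50):
    """Fixed-point solution of T = t_nom + (s*v/(2*pi*f)) * [cos(2*pi*f*(t - T)) - cos(2*pi*f*t)].
    With t_nom = T_PLL-NOM and s = S_CP this is Eq. (6); with T_DP-NOM and S_DP it is Eq. (7).
    The map's slope is at most s*v, so the iteration converges when s*v < 1."""
    w = 2 * math.pi * f
    T = t_nom
    for _ in range(iters):
        T = t_nom + s * v / w * (math.cos(w * (t - T)) - math.cos(w * t))
    return T

def worst_slack(s_cp, s_dp, t_nom=0.7e-9, v=0.09, f=100e6, n=200):
    """Minimum of T_LOCAL - T_DP over one noise period (negative = timing failure)."""
    slacks = []
    for i in range(n):
        t = i / (n * f)
        t_local = solve_implicit_delay(t, t_nom, s_cp, v, f)
        t_dp = solve_implicit_delay(t, t_nom, s_dp, v, f)
        slacks.append(t_local - t_dp)
    return min(slacks)

print(worst_slack(0.5, 0.5))   # matched sensitivities: 0.0
print(worst_slack(0.4, 0.5))   # clockpath less sensitive than datapath: negative
```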

#### B. Proposed PLL Supply-Noise Sensitivity Tracking Loop

Based on the mathematical derivation in the previous subsection, we propose a scheme in which the optimal CDC effect is achieved by controlling a single parameter (i.e., the PLL supply-noise sensitivity). The key benefit of single-parameter control over multi-parameter control [14] is that it enables a simpler closed-loop system for tracking the optimal configuration (Fig. 5). By employing tracking-based calibration and periodically updating the supply-sensitivity parameter, we can achieve the optimal performance improvement irrespective of the processor operating condition. Simulation results in Fig. 6 show up to a 7.8% higher processor $F_{max}$ across a wide range of PVT parameters for the proposed ASST PLL compared to the previous adaptive PLL design with a fixed supply sensitivity. The detailed circuits and an analysis of the proposed closed-loop system are described in Section IV.

#### IV. CIRCUIT IMPLEMENTATION IN 32 NM SOI

#### A. Automatic Supply-Noise Sensitivity Tracking Loop

A test chip was fabricated in a 0.9 V, 32 nm SOI process to verify the ASST PLL operation; its details are given



| Design    | No. of control parameters | Parameter setting    | Passive area         |
|-----------|---------------------------|----------------------|----------------------|
| Conv.     | None                      | None                 | Large                |
| [13]      | One (sensitivity)         | 1D sweep, one time   | Small                |
| [14]      | Two (phase, sensitivity)  | 2D sweep, one time   | Large                |
| This work | One (sensitivity)\*       | Closed-loop tracking | Small ($C_{trench}$) |

\*Key requirement for stable closed-loop tracking

Fig. 5. Comparison with prior art. The proposed adaptive PLL employs an error tracking loop that adjusts the amount of supply noise coupled to the PLL output clock period according to the timing error information. Additionally, the PLL area is reduced by utilizing deep trench capacitors in the loop filter.



[Fig. 6 legend: Conv. PLL; Fixed Sensitivity; Optimal Sensitivity.]

Fig. 6. Effectiveness of the proposed PVT tracking loop. The proposed scheme (black bars) can achieve up to a 7.8% higher processor  $F_{max}$  compared to a previous one-time calibration scheme (gray bars) under extreme PVT conditions.

in Fig. 7. The PLL consists of the building blocks of a typical charge-pump PLL, namely a phase frequency detector (PFD), a charge pump (CP), a loop filter, and a VCO, along with dedicated circuits that form the supply-noise tracking loop. To AC-couple the resonant supply noise to the PLL control voltage in constant sensitivity steps, a CDC modulator consisting of two capacitor banks ($C_u$, $C_d$), each with 63 unit capacitors, was implemented. Prior to the tracking operation, the delay of the critical path replica was set to its target value using tunable delay stages (Fig. 9(b)).

Operation begins by enabling the sensitivity tracking loop (Fig. 7) once the PLL has locked. To adaptively control the PLL supply-noise sensitivity, an on-chip error monitor is needed. For this purpose, a replica critical path monitor circuit with a tunable delay (Fig. 7) was designed [17]. A bit error monitor asserts the error output ERR whenever a timing violation occurs in the critical path replica circuit. This is achieved by comparing the output of the replica path with the correct value (i.e., the input of the replica path) using an XOR gate. The supply-noise sensitivity can be updated either on every single error event (fast, but potentially unstable tracking) or only after a certain number of errors has accumulated (slow, but smooth tracking). An up/down counter followed by a binary-to-thermometer decoder converts the bit errors into a sensitivity code. Once the tracking loop is locked, a digital filter determines the up/down counting direction according to the current bit error information.
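The digital portion of the loop (error accumulation, up/down counting, and binary-to-thermometer decoding into the 63-bit sensitivity code EN[62:0]) can be modeled behaviorally as below. This is a hypothetical sketch: the class and parameter names are ours, and the counting rule (count up after `error_threshold` errors, count down after an error-free window of `window` cycles) is an assumption, since the text does not specify how the digital filter chooses the direction. Counting up on errors matches the monotonically increasing code during initial tracking described in Section V; `error_threshold = 1` corresponds to the fast single-error option, and larger values to the slower accumulating option.

```python
from dataclasses import dataclass

N_UNITS = 63  # unit capacitors per CDC modulator bank, i.e., EN[62:0]

def binary_to_thermometer(code: int, width: int = N_UNITS) -> list[int]:
    """Thermometer code as a list indexed by bit position: bits 0..code-1 are 1."""
    if not 0 <= code <= width:
        raise ValueError(f"code must be in [0, {width}]")
    return [1 if i < code else 0 for i in range(width)]

@dataclass
class SensitivityTracker:
    """Hypothetical behavioral model of the error-driven up/down counter."""
    error_threshold: int = 1  # errors needed before counting up (1 = fast option)
    window: int = 64          # decision window in cycles (assumed)
    code: int = 0             # current sensitivity code, 0..N_UNITS
    errors: int = 0
    cycles: int = 0

    def step(self, err: bool) -> None:
        """Consume one cycle of the bit error monitor output ERR."""
        self.errors += int(err)
        self.cycles += 1
        if self.errors >= self.error_threshold:
            self.code = min(self.code + 1, N_UNITS)  # enable one more unit capacitor
            self.errors = self.cycles = 0
        elif self.cycles >= self.window:
            if self.errors == 0:
                self.code = max(self.code - 1, 0)    # assumed back-off after an error-free window
            self.errors = self.cycles = 0

    @property
    def en(self) -> list[int]:
        return binary_to_thermometer(self.code)

# Initial tracking: every cycle reports an error, so the code rises monotonically.
tracker = SensitivityTracker(error_threshold=1)
for _ in range(5):
    tracker.step(True)
print(tracker.code, sum(tracker.en))  # 5 5
```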

To analyze the stability of the tracking loop, we compare the loop latency (or response time) with the sensitivity tracking time (i.e., the time it takes the loop to reach a steady state). The loop latency consists of three delay components. The first is the delay from a thermometer code update to the resulting change in the PLL supply-noise sensitivity; the corresponding signal path includes a high-pass filter and a single-stage current mirror in the differential VCO. The second is the clockpath delay, which is around 1 ns. The final delay component comes from the bit



Fig. 7. Overall diagram of ASST PLL test chip.

error monitor. The sum of the three delay components is approximately 1 ns and does not exceed a few nanoseconds even in the worst case, when the digital filter is set to accumulate bit errors. Although the short loop latency opens the possibility of continuous tracking, we believe the proposed circuit is better suited to periodically calibrating the CDC parameters in real designs. For example, whenever a processor undergoes a change in supply voltage (e.g., DVFS) or operating mode, the tracking loop can first be activated to update the supply sensitivity, after which the processor switches back to normal mode and operates at its peak $F_{max}$ point. The sensitivity tracking time spans many resonant noise periods, as shown by the typical response in Fig. 15. Because the loop latency (a few nanoseconds) is negligible compared to the sensitivity tracking time (hundreds of nanoseconds), the loop is considered stable.

#### B. Built-in Test Circuitry

A dedicated on-chip resonant noise generation circuit, shown in Fig. 8, was implemented to test the PLL performance. First, a VCO with an external voltage bias generates the main clock. Various clock patterns, such as a continuous, pulsed, or random clock, can be created using a frequency divider and a clock synthesis block. The noise amplitude can also be conveniently controlled using numerous noise injection NMOS devices that can be individually activated by a 5-bit binary code. Each NMOS device induces a fixed current spike, and by activating a number of them, a realistic resonant noise amplitude can be achieved. This flexibility allows the PLL to be tested over a wide range of resonant noise patterns and amplitudes. Traditionally, PLL performance is characterized by connecting the output signal directly to a high-speed sampling oscilloscope or to BER measurement equipment. In this work, a simple BER measurement circuit was included in the test chip



Fig. 8. On-chip resonant noise generation circuit.

that allows the BER of the critical path replica block to be monitored with a simpler setup. It consists of a 10-bit counter and a 10-to-1 digital multiplexer (Fig. 9(a)). We measure the average period of the error output and compare it against the PLL clock frequency to calculate the BER without an extensive test setup [14].
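As a rough illustration of how a BER can be computed from the averaged error-output period, the sketch below assumes that the 10-to-1 multiplexer selects bit k of the 10-bit error counter, so that the observed output completes one period every 2^(k+1) error events. This counter/multiplexer interpretation, the function name, and the example numbers are our assumptions; the text states only that the average period of the error output is compared against the PLL clock frequency.

```python
def ber_from_error_output(t_out_avg_s: float, f_clk_hz: float, mux_sel: int) -> float:
    """Errors per clock cycle, assuming the selected counter bit k (0..9) of a binary
    error counter toggles every 2**k errors, i.e., has a period of 2**(k+1) errors."""
    if not 0 <= mux_sel <= 9:
        raise ValueError("mux_sel must select one of the 10 counter bits")
    errors_per_period = 2 ** (mux_sel + 1)
    clock_cycles_per_period = t_out_avg_s * f_clk_hz
    return errors_per_period / clock_cycles_per_period

# Hypothetical reading: bit 9 selected (1024 errors per output period) with a 1.4 GHz
# clock and a 0.731 s average output period gives a BER of roughly 1e-6.
print(f"{ber_from_error_output(0.731, 1.4e9, 9):.2e}")
```

Selecting a higher counter bit slows the observed output, which is what makes a low BER measurable with simple equipment under this interpretation.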

#### C. Deep Trench Capacitor Based Loop Filter

The proposed ASST PLL takes advantage of deep trench capacitor technology, originally developed for embedded DRAM cells [18]. Its capacitance density is approximately two orders of magnitude higher than that of a thick-oxide MOS capacitor (the default option for most traditional PLLs), while its tunneling leakage is negligible owing to the thick dielectric layer. Note that only the area-dominating integrating capacitor ($C_i$ in Fig. 10) was implemented as a deep trench capacitor, because the relatively high series resistance of trench capacitors limits their ripple rejection capability when used as the third-pole capacitor ($C_p$ in Fig. 10). This was confirmed through AC and transient simulations: as shown in Fig. 11, the deep trench capacitor has 23 dB lower high-frequency noise rejection, and the transient PLL locking simulation shows a



Fig. 9. (a) On-chip BER measurement circuit. (b) Critical path circuit with tunable delay.



Fig. 10. Loop filter capacitor options and trade-offs.

$15\times$ larger ripple voltage when it is used as the ripple rejection capacitor of the PLL. Measured results in Fig. 12 show no noticeable difference in PLL performance between loop filters with a deep trench $C_{\rm i}$ and a thick-oxide $C_{\rm i}$, while the former provides a significant reduction in PLL area. Fig. 13 compares the area of thick-oxide and deep trench implementations, showing a $56\times$ reduction in integrating capacitor area. This translates into a $12.5\times$ reduction in overall PLL area, as shown in Fig. 19.
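The two area ratios quoted above are consistent with $C_i$ dominating the PLL area. Under the simplifying assumption (ours) that only the integrating capacitor shrinks while all other PLL area stays fixed, the sketch below back-calculates the fraction of the original PLL area occupied by $C_i$.

```python
def implied_ci_fraction(ci_shrink: float, pll_shrink: float) -> float:
    """Solve (1 - x) + x / ci_shrink = 1 / pll_shrink for x, the original area fraction of C_i.
    Assumes only C_i changes between the thick-oxide and deep trench versions."""
    return (1 - 1 / pll_shrink) / (1 - 1 / ci_shrink)

x = implied_ci_fraction(ci_shrink=56, pll_shrink=12.5)
print(f"Implied C_i share of the thick-oxide PLL area: {x:.1%}")  # about 93.7%
print(f"Overall PLL area reduction: {1 - 1 / 12.5:.1%}")          # 92.0%
```

The 92.0% obtained from the rounded 12.5× ratio agrees with the 92.1% area reduction quoted elsewhere in the paper to within rounding.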

#### D. Practical Design Considerations

A first-order high-pass filter with a finite capacitance was used in our design to couple the supply noise to the PLL clock [12]. Although the phase shift induced by this circuit cannot be completely eliminated, it can be made negligibly small relative to the resonant noise period (for noise frequencies of 40–300 MHz) by choosing proper R and C values. Another potential concern is the discrepancy between the supply sensitivities of the clockpath and the datapath, which may degrade the CDC effect. It is well known that an interconnect-dominated signal path has a lower supply sensitivity than a logic-dominated one. To quantify this issue, we simulated the CDC effect for datapaths with different interconnect lengths. Minimum-sized inverters were used for this test, while the interconnect length was varied from 10 $\mu$m to 160 $\mu$m. The results in Fig. 14 show an $F_{max}$ improvement of 15.6% to 16.7% with the optimal CDC configuration for interconnect lengths shorter than 40 $\mu$m. The $F_{max}$ improvement drops for longer wires owing to the lower supply sensitivity of the datapath delay. Nevertheless, the simulation results indicate that for practical driver and interconnect configurations, optimizing the CDC effect can provide a significant improvement in processor



Fig. 11. Simulated PLL third-pole AC response and transient ripple noise for deep trench Cp and thick oxide Cp.



Fig. 12. Measured PLL performance for deep trench Ci and thick oxide Ci.



Fig. 13. Integrating capacitor area comparison between deep trench and thick oxide capacitor.

performance. Finally, a mismatch between the supply noise seen by the actual critical path and that seen by the replica circuit will affect the efficacy of any CDC enhancement technique, including ours. Although local noise does exist, resonant noise has been shown to be more dominant and to affect the entire chip globally. These properties make circuit techniques that enhance clock-data compensation (e.g., the adaptive PLLs in [12], [13], [15]) highly effective in modern processor designs.
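To make the high-pass filter phase-shift argument at the start of this subsection concrete: a first-order RC high-pass filter $H(j\omega) = j\omega RC/(1 + j\omega RC)$ has a phase lead of $90^\circ - \arctan(f/f_c)$, with $f_c = 1/(2\pi RC)$, which becomes small once the 40–300 MHz resonant band lies well above $f_c$. The 1 MHz corner below is a hypothetical choice; the actual R and C values are not given in the text.

```python
import math

def hpf_phase_deg(f_hz: float, f_corner_hz: float) -> float:
    """Phase lead (degrees) of H(jw) = jwRC / (1 + jwRC) at frequency f_hz."""
    return 90.0 - math.degrees(math.atan(f_hz / f_corner_hz))

F_CORNER = 1e6  # hypothetical corner frequency 1/(2*pi*R*C)
for f in (40e6, 100e6, 300e6):
    phase = hpf_phase_deg(f, F_CORNER)
    # Express the phase lead as a fraction of one resonant-noise period.
    print(f"{f / 1e6:>4.0f} MHz: {phase:.2f} deg ({phase / 360:.2%} of a noise period)")
```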



Fig. 14.  $F_{max}$  simulation results for intrinsic CDC and optimal CDC. For this test, we used minimum sized inverters driving wire interconnects with different lengths.



Fig. 15. Simulated waveforms (above) and measured VDD and PLL control voltage VBN (below) of the proposed ASST PLL.

#### V. TEST CHIP MEASUREMENT RESULTS

For better testability, the PLL reference clock frequency can be varied from 50 MHz to 200 MHz, and the frequency divider supports several programmable ratios (8, 16, 32, and 64). For the same reason, the VCO was designed to cover a wide output frequency range of 1 GHz to 3 GHz. The nominal PLL loop bandwidth was chosen to be 5 MHz based on the following design parameters: a charge pump current of 50 $\mu$A, a $K_{VCO}$ of 10 GHz/V, an integrating capacitance of 320 pF, a third-pole capacitance and resistance of 20 pF and 1 kΩ, and a division ratio of 16.
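The 5 MHz figure can be sanity-checked from the listed parameters. The sketch below uses the standard approximation for a charge-pump PLL with a series-RC loop filter and a shunt third-pole capacitor: above the zero, the filter impedance is roughly R, so the unity-gain frequency is about $I_{CP} R K_{VCO}/(2\pi N)$ with $K_{VCO}$ in Hz/V. This approximation, and reading the 1 kΩ as the series resistor that also sets the third pole, are our assumptions; the text does not say how the bandwidth was derived.

```python
import math

# Loop parameters quoted in the text.
I_CP = 50e-6     # charge-pump current (A)
K_VCO = 10e9     # VCO gain (Hz/V)
C_I = 320e-12    # integrating capacitor (F)
C_P = 20e-12     # third-pole capacitor (F)
R = 1e3          # loop-filter resistor (ohm), assumed to be in series with C_I
N = 16           # feedback division ratio

# Approximate unity-gain (crossover) frequency, taking Z(jw) ~ R above the zero:
# |G| = I_CP/(2*pi) * R * (2*pi*K_VCO)/w / N = 1  ->  w_c = I_CP * R * K_VCO / N.
w_c = I_CP * R * K_VCO / N
f_c = w_c / (2 * math.pi)

w_z = 1 / (R * C_I)                          # stabilizing zero
w_p = 1 / (R * C_I * C_P / (C_I + C_P))      # third pole set by C_P
# First-order phase-margin estimate at w_c (ignores the shift of w_c caused by C_P).
pm = math.degrees(math.atan(w_c / w_z) - math.atan(w_c / w_p))

print(f"f_c ~ {f_c / 1e6:.2f} MHz, zero ~ {w_z / (2 * math.pi) / 1e6:.2f} MHz, "
      f"pole ~ {w_p / (2 * math.pi) / 1e6:.2f} MHz, phase margin ~ {pm:.0f} deg")
```

Under this approximation, the estimated crossover comes out at about 5 MHz, in line with the stated nominal loop bandwidth.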

Fig. 15 shows several ASST waveforms, along with the measured PLL control voltage, for a typical tracking operation. The ASST PLL starts with initial timing error tracking, during which the counter output increases monotonically while bit errors are being generated. The counter output is decoded into a thermometer code, and the CDC modulator then starts to AC-couple the resonant noise to the PLL control voltage. After the initial lock, the tracking loop responds to



Fig. 16. Measured BER versus operating frequency.



Fig. 17. Measured  $F_{max}$  vs. noise amplitude and frequency.

any changes in the clockpath sensitivity due to voltage/temperature shifts by modifying the sensitivity code (i.e., EN[62:0] in Fig. 7).

To compare the performance of the conventional and ASST PLLs, $F_{max}$ was extracted from the measured BER versus frequency data [14]. Without loss of generality, we define the maximum operating frequency $F_{max}$ as the frequency at which the BER reaches $10^{-6}$ [14]. Both PLL phase noise and the actual datapath delay fluctuation under supply noise are accounted for in the BER measurements. As shown in Fig. 16, a 15% $F_{max}$ improvement was measured with the proposed ASST PLL compared to the conventional PLL when the resonant noise has a frequency of 100 MHz and an amplitude of 90 mV (i.e., 10% of the supply voltage). Note that a stronger CDC effect shifts the BER curve to the right, while lower jitter manifests as a steeper slope. The measured BER curve of the ASST PLL shows a considerable shift compared to the conventional design, at the cost of a marginal decrease in



Fig. 18. Measured $F_{max}$ vs. processor supply voltage.



Fig. 19. Die photo and feature summary table.

the slope (i.e., from $7.8\times10^{-8}$/MHz to $6.4\times10^{-8}$/MHz), confirming the effectiveness of the proposed circuits.
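The $F_{max}$ extraction described above (the frequency at which the BER reaches $10^{-6}$) can be sketched as an interpolation on a BER-versus-frequency sweep. Linear interpolation of $\log_{10}(\text{BER})$ between bracketing points is our assumption, since the text does not state the interpolation method, and the sweep below is made-up data, not the measured curve of Fig. 16.

```python
import math

def fmax_at_ber(freqs_hz, bers, target=1e-6):
    """Frequency where the BER first crosses `target`, interpolating log10(BER) linearly
    between adjacent points. Assumes the sweep is sorted by increasing frequency."""
    log_target = math.log10(target)
    points = [(f, math.log10(b)) for f, b in zip(freqs_hz, bers)]
    for (f0, y0), (f1, y1) in zip(points, points[1:]):
        if y0 <= log_target <= y1:
            if y1 == y0:
                return f0
            return f0 + (log_target - y0) * (f1 - f0) / (y1 - y0)
    raise ValueError("target BER is not bracketed by the sweep")

# Hypothetical sweep (illustrative only).
freqs = [1.40e9, 1.45e9, 1.50e9]
bers = [1e-8, 1e-4, 1e-2]
print(f"F_max ~ {fmax_at_ber(freqs, bers) / 1e9:.3f} GHz")  # F_max ~ 1.425 GHz
```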

The proposed ASST PLL achieves a 14.5% to 15.6% higher processor $F_{max}$ than a conventional PLL with a constant output clock period under a 90 mV supply noise amplitude (Fig. 17). PLL performance was measured at different noise amplitudes, noise frequencies, and clockpath designs to verify its effectiveness across a wide range of usage scenarios. The measured results show that the ASST PLL improves processor performance in proportion to the noise amplitude throughout the resonant noise frequency band, regardless of the interconnect type.

Fig. 18 shows the measured $F_{\rm max}$ for different supply voltages and PLL types. The proposed PLL achieves an $F_{\rm max}$ of 1.417 GHz at a lower supply voltage (0.855 V) than the conventional PLL (0.9 V). This translates into a $CV^2f$ power reduction of 9.8% under iso-performance conditions. Finally, the chip microphotograph and feature summary table are given in Fig. 19.
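The 9.8% figure follows from the $CV^2f$ model: at fixed capacitance and frequency, dynamic power scales with $V^2$, so lowering the supply from 0.9 V to 0.855 V gives $1 - (0.855/0.9)^2 = 1 - 0.95^2 = 0.0975$. A minimal check, which like the text's $CV^2f$ framing ignores leakage:

```python
def dynamic_power_reduction(v_new: float, v_old: float) -> float:
    """Fractional CV^2f power reduction at fixed C and f: 1 - (v_new / v_old)**2."""
    return 1 - (v_new / v_old) ** 2

reduction = dynamic_power_reduction(0.855, 0.9)
print(f"{reduction:.2%}")  # 9.75%, i.e., the ~9.8% quoted above
```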

#### VI. CONCLUSION

An adaptive PLL featuring an automatic supply-noise sensitivity tracking loop that mitigates the impact of resonant supply noise on processor performance across a wide range of PVT conditions has been demonstrated in 32 nm SOI. The proposed design is based on a single-parameter (i.e., supply-noise sensitivity) tracking loop that maintains the optimal CDC configuration by monitoring timing errors from a critical path replica. A test chip was designed in 32 nm SOI CMOS to evaluate the proposed circuits. A 14.5% to 15.6% processor $F_{max}$ improvement was achieved for a resonant supply noise amplitude of 10% of VDD. The improved $F_{max}$ can be translated into a 9.8% reduction in power consumption at iso-performance, as the proposed PLL allows the system to operate at a lower voltage while meeting the same $F_{max}$ requirement. In addition, the use of dense deep trench integrating capacitors enabled a 92.1% reduction in PLL area compared to a conventional PLL with a thick-oxide capacitor implementation.

#### REFERENCES

- [1] M. Saint-Laurent and M. Swaminathan, "Impact of power-supply noise on timing in high-frequency microprocessors," *IEEE Trans. Adv. Packag.*, vol. 27, no. 1, pp. 135–144, Feb. 2004.
- [2] K. L. Wong *et al.*, "Enhancing microprocessor immunity to power supply noise with clock-data compensation," *IEEE J. Solid-State Circuits*, vol. 41, no. 4, pp. 749–758, Apr. 2006.
- [3] X. Hu *et al.*, "Enabling power distribution network analysis flows for 3D ICs," in *Proc. IEEE Int. 3D Systems Integration Conf.*, 2010, pp. 1–4.
- [4] V. Gutnik and A. Chandrakasan, "Active GHz clock network using distributed PLLs," *IEEE J. Solid-State Circuits*, vol. 35, no. 11, pp. 1553–1560, Nov. 2000.
- [5] T. Fischer *et al.*, "A 90-nm variable frequency clock system for a power-managed Itanium architecture processor," *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 218–228, Jan. 2006.
- [6] M. Mansuri and C. K. Yang, "A low-power adaptive bandwidth PLL and clock buffer with supply-noise compensation," *IEEE J. Solid-State Circuits*, vol. 38, no. 11, pp. 1804–1812, Nov. 2003.
- [7] S. Yasuda and S. Fujita, "Compact fault recovering flip-flop with adjusting clock timing triggered by error detection," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, 2007, pp. 721–724.
- [8] E. Hailu *et al.*, "A circuit for reducing large transient current effects on processor power grids," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2006, pp. 2238–2245.
- [9] J. Gu, R. Harjani, and C. Kim, "Distributed active decoupling capacitors for on-chip supply noise cancellation in digital VLSI circuits," in *Symp. VLSI Circuits Dig.*, 2006, pp. 216–217.
- [10] J. Xu et al., "On-die supply-resonance suppression using band-limited active damping," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2007, pp. 286–603.
- [11] J. Gu, H. Eom, and C. H. Kim, "On-chip supply noise regulation using a low power digital switched decoupling capacitor circuit," *IEEE J. Solid-State Circuits*, vol. 44, no. 6, pp. 1765–1775, Jun. 2009.
- [12] D. Jiao, B. Kim, and C. H. Kim, "Design, modeling, and test of a programmable adaptive phase-shifting PLL for enhancing clock data compensation," *IEEE J. Solid-State Circuits*, vol. 47, no. 10, pp. 2505–2516, Oct. 2012.
- [13] N. Kurd *et al.*, "Next generation Intel® Core™ micro-architecture (Nehalem) clocking," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1121–1129, Apr. 2009.
- [14] D. Jiao and C. H. Kim, "A programmable phase-shifting PLL for clock data compensation under resonant supply noise," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2011, pp. 272–274.
- [15] T. Rahal-Arabi *et al.*, "Design and validation of the Pentium III and Pentium 4 processors power delivery," in *Symp. VLSI Circuits Dig.*, 2002, pp. 220–223.

- [16] B. Kim, W. Xu, and C. H. Kim, "A 32 nm, 0.9 V supply-noise sensitivity tracking PLL for improved clock data compensation featuring a deep trench capacitor based loop filter," in *Symp. VLSI Circuits Dig.*, 2013, pp. C162–C163.
- [17] J. Tschanz, K. Bowman, S. Walstra, M. Agostinelli, T. Karnik, and V. De, "Tunable replica circuits and adaptive voltage-frequency techniques for dynamic voltage, temperature, and aging variation tolerance," in *Symp. VLSI Circuits Dig.*, 2009, pp. 112–113.
- [18] N. Butt *et al.*, "A 0.039 $\mu\mathrm{m}^2$ high performance eDRAM cell based on 32 nm high-K/metal SOI technology," in *IEDM Tech. Dig.*, 2010, pp. 27.5.1–27.5.4.
- [19] J. Craninckx and M. Steyaert, "A fully integrated CMOS DCS-1800 frequency synthesizer," *IEEE J. Solid-State Circuits*, vol. 33, no. 12, pp. 2054–2065, Dec. 1998.



**Bongjin Kim** (S'03–M'10) received the B.S. and M.S. degrees in electrical engineering from Pohang University of Science and Technology (POSTECH), Pohang, Korea, in 2004 and 2006, respectively. He is currently pursuing the Ph.D. degree in the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA.

He spent four years with System LSI, Samsung Electronics, Giheung, Korea, from 2006 to 2010, where he performed research on clock generator circuits for high-speed serial interface PHY transceivers. From May 2012 to August 2012, he worked as an intern in the wireless business at Texas Instruments, Dallas, TX, USA, where he designed low-power bulk-acoustic-wave oscillator circuits. From June 2013 to August 2013, he worked as a research summer intern in the mixed-signal communication IC design group at IBM T. J. Watson Research Center, Yorktown Heights, NY, USA. His research interests include VLSI and mixed-signal CMOS integrated circuits, including PLLs, ADCs, TDCs, and high-speed I/O.



**Weichao Xu** received the B.Eng. degree in electronic engineering from the Chinese University of Hong Kong. He has been working toward the Ph.D. degree at the University of Minnesota, Minneapolis, MN, USA, since joining Prof. Chris Kim's research group in 2011. He has been involved in the design and testing of both silicon and non-silicon chips. His current work focuses primarily on printable organic thin-film transistor modeling and the design of integrated circuits based on these devices.



**Chris H. Kim** (M'04–SM'10) received the B.S. and M.S. degrees from Seoul National University, Seoul, Korea, and the Ph.D. degree from Purdue University, West Lafayette, IN, USA.

He spent a year at Intel Corporation, where he performed research on variation-tolerant circuits, on-die leakage sensor design, and crosstalk noise analysis. He joined the Electrical and Computer Engineering Faculty at the University of Minnesota, Minneapolis, MN, USA, in 2004, where he is currently an Associate Professor.

Prof. Kim is the recipient of an NSF CAREER Award, a McKnight Foundation Land-Grant Professorship, a 3M Non-Tenured Faculty Award, DAC/ISSCC Student Design Contest Awards, IBM Faculty Partnership Awards, an IEEE Circuits and Systems Society Outstanding Young Author Award, ISLPED Low Power Design Contest Awards, and an Intel Ph.D. Fellowship. He has authored or coauthored more than 100 journal and conference papers and served as the technical program committee chair for the 2010 International Symposium on Low Power Electronics and Design (ISLPED). His research interests include digital, mixed-signal, and memory circuit design in silicon and non-silicon (organic TFT and spin) technologies.