# Adaptive Techniques for Overcoming Performance Degradation due to Aging in Digital Circuits<sup>\*</sup>

Sanjay V. Kumar, Chris H. Kim, and Sachin S. Sapatnekar University of Minnesota, Minneapolis MN 55455

Abstract— Negative Bias Temperature Instability (NBTI) in PMOS transistors has become a major reliability concern in present-day digital circuit design. Further, with the recent usage of Hf-based high-k dielectrics for gate leakage reduction, Positive Bias Temperature Instability (PBTI), the dual effect in NMOS transistors has also reached significant levels. Consequently, designers are required to build in substantial guard-bands into their designs, leading to large area and power overheads, in order to guarantee reliable operation over the lifetime of a chip. We propose a guard-banding technique based on adaptive body bias (ABB) and adaptive supply voltage (ASV), to recover the performance of an aged circuit, and compare its merits over previous approaches.

## I. INTRODUCTION

NBTI (Negative Bias Temperature Instability) in PMOS devices has become a key reliability issue in sub-130nm technologies. NBTI manifests itself as an increase in the PMOS transistor threshold voltage $V_{th}$ with time, thereby causing circuit delays to degrade temporally and exceed their specifications. A corresponding and dual effect, known as Positive Bias Temperature Instability (PBTI) [1]–[3], is seen for NMOS devices when a positive bias stress is applied across the gate oxide of the NMOS device. Although the impact of PBTI is lower than that of NBTI [2], PBTI is becoming increasingly important in its own right, particularly with the use of Hf-based dielectrics in the gate oxide for leakage reduction [1], [3].

Previous approaches to guard-banding a circuit and ensuring optimal performance over its lifetime, such as sizing [4], [5], and synthesis [6] can be classified as "one-time" solutions that add appropriate guard-bands at design time. These methods are generally formulated as optimization problems of the type:

$$\begin{array}{l} \text{Minimize } \alpha \text{ Area } + \beta \text{ Power} \\ \text{s.t. } D(0 \leq t \leq t_{\text{life}}) \leq D_{\text{spec}} \end{array} \tag{1}$$

where  $\alpha$  and  $\beta$  are weights associated with the area and the power objectives, respectively, while  $D_{\text{spec}}$  is the specified target delay, that must be met at all times, up to the lifetime of the circuit,  $t_{\text{life}}$ .

Under the framework of (1), both the synthesis and sizing optimizations lead to an increase in area and power, as compared with a nominally designed circuit that is guaranteed to meet the specification at design time. The work in [6] argues that synthesis can lead to area and power savings, as compared with sizing optimizations, thereby resulting in a lower amount of guard-band. However, guard-banding (through sizing or synthesis) is performed during design time, and is a one-time fixed amount of padding added into the circuit in the form of gates with a higher drive strength. Inevitably, this approach results in large positive slacks during the initial stages of operation of the circuit, and therefore a larger than necessary area and power overhead, in comparison with a circuit designed to exactly meet the specifications, at every time point, during its operation.

We also note that while both NBTI and PBTI cause the transistor threshold voltages to increase, resulting in larger delays, a higher $V_{th}$ also implies lower subthreshold leakage ($I_{sub} \propto e^{\frac{-V_{th}}{mkT}}$). Therefore, both NBTI and PBTI cause the leakage of the circuit to decrease with time, thereby providing the opportunity to trade off this slack in leakage to restore the lost performance. Adaptive Body Bias (ABB) [7] provides an attractive solution to explore leakage-performance trade-offs. Forward Body Bias (FBB) can be used to speed up a circuit [8], by reducing the $V_{th}$, thereby using up the available slack in the leakage budget. Further, the amount of FBB can be determined adaptively based on the exact temporal degradation of the circuit, and hence requisite amounts of body bias can be applied, to exactly meet the target specifications under all conditions.

\*This research was supported in part by SRC under contract 1572.002.

The main advantage of this scheme is that the performance can be recovered with a minimal increase in the area overhead as compared with "one-time" approaches such as sizing and synthesis. While [9] demonstrated that ABB could be used to recover the circuit from voltage and temperature variations, as well as aging, our work is the first comprehensive CAD solution to take advantage of the reduction in leakage due to bias temperature instability (BTI). We demonstrate how ABB can be used to maintain the optimal performance of the circuit over its lifetime  $t_{life}$ , by determining the PMOS and NMOS body bias values (and supply voltage) required at all time points.

Both the sizing [5] and synthesis [6] approaches have shown reductions in the area and power overhead in using a signal probability profile to determine the delay of the circuit, inside the optimization engine, as opposed to computing the worst-case delay degradation of the circuit, based on [4]. However, it is not possible to compute the asymptotic signal probabilities, over the entire lifetime operation of the circuit, in an accurate manner. Further, any such signal probability based framework must guarantee a reliable operation under all workload conditions. Hence, we consider a worst-case delay degradation based model to determine the temporal delays of the circuit.

Adaptive control systems can be implemented using a critical path replica based approach, or a look-up table based technique, as described in detail in [10]. Comparing these two methods, we note that the critical path replica based approach has several limitations. Hence, we propose to use a look-up table consisting of the optimal body bias and supply voltages, indexed by the time of stress of the circuit. Accordingly, we propose an algorithm to compute the entries of the look-up table, such that the delay specification of the circuit is met at all time points, and the power overhead is minimized. Our results indicate that this method can be used as a viable alternative to "one-time" solutions such as synthesis or sizing, while incurring only a very small area overhead.

## **II. PROBLEM FORMULATION**

We begin this section by determining the impact of NBTI on the delay and leakage of digital circuits. We then explore the potential of FBB to recover performance, subject to power constraints, and formulate an optimization problem, accordingly.

#### A. Impact of BTI on Delay and Leakage

The Reaction-Diffusion framework [11], [12] has been widely used to determine the long-term impact of NBTI on the threshold voltage degradation of a PMOS device. Accordingly, the $V_{th}$ degradation for a PMOS transistor under DC stress increases asymptotically as $\Delta V_{th}(t) \propto t^{\frac{1}{6}}$, based on [13], [14]. We also use a PBTI model where the degradation mechanism is similar to NBTI, but the magnitude of $V_{th}$ degradation is lower. The $\Delta V_{th}$ for a PMOS device after $10^8$ seconds ($\approx$ three years) of DC stress is around 50mV (nominal value is -411.8mV), while that for an NMOS device is around 30mV (nominal value of 466mV), based on PTM 45nm model files [15]. It must be noted that NBTI affects the $V_{th}$ of PMOS devices, and hence the rising delay of a gate, while PBTI affects that of NMOS transistors, and therefore the falling delay only. Worst-case degradation [4] of all gates in the circuit is assumed, for reasons that will become apparent in Section III. SPICE simulations are run at the degraded $V_{th}$ values accordingly, to obtain the new delay and leakage numbers, at $T = 105^{\circ}$C, at different time-stamps. Since BTI is enhanced with temperature, the library gates are characterized at the maximum operating temperature of the chip, $T = 105^{\circ}$C.
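As a rough illustration of the $t^{1/6}$ trend, the sketch below scales the quoted end-of-life shifts (50mV for PMOS, 30mV for NMOS) back to earlier time points. It assumes that the asymptotic power law holds over the whole interval and that the proportionality constant is fixed by the value at $10^8$ s; the function name `delta_vth` is hypothetical and not part of the original flow.

```python
T_LIFE = 1e8  # seconds, roughly three years of DC stress

# End-of-life threshold voltage shifts quoted in the text (volts).
DVTH_LIFE = {"pmos": 0.050, "nmos": 0.030}

def delta_vth(t, device):
    """Threshold shift after t seconds of DC stress, assuming dVth ~ t**(1/6)."""
    if t < 0:
        raise ValueError("stress time must be non-negative")
    return DVTH_LIFE[device] * (t / T_LIFE) ** (1.0 / 6.0)

for t in (1e4, 1e6, 1e8):
    print(f"t = {t:.0e} s: PMOS {1e3 * delta_vth(t, 'pmos'):.1f} mV, "
          f"NMOS {1e3 * delta_vth(t, 'nmos'):.1f} mV")
```

Under this assumption, a little over a fifth of the end-of-life shift has already accumulated after $10^4$ s, since $(10^{-4})^{1/6} \approx 0.215$.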

The SPICE numbers are curve-fitted to obtain a model for delay and leakage as a function of the transistor threshold voltages. The delay of a gate is modeled as:

$$D(t) = D_0 + \sum_{i=1}^n \frac{\partial D}{\partial V_{th_i}} \Delta V_{th_i}(t) \tag{2}$$

where the $\frac{\partial D}{\partial V_{th_i}}$ (sensitivity) terms for each of the *n* transistors in the gate, along the input-output path, are determined through a linear least-squares curve fit. This first-order sensitivity-based model is fairly accurate, with an average error of 1% in comparison with the simulation results, within the ranges of $V_{th}$ degradation caused by BTI. Similarly, a model for leakage can be developed as:

$$L(t) = L_0 e^{\left(\sum_{i=1}^n \frac{\partial L}{\partial V_{th_i}} \Delta V_{th_i}(t)\right)} \tag{3}$$

Note that the  $\Delta V_{th}(t)$ ,  $D_0$ , and  $L_0$  values are functions of the supply voltage,  $V_{dd}$ . The leakage numbers are experimentally verified to have an average accuracy of around 95% with respect to the SPICE simulated values.
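The two gate-level models can be evaluated as in the minimal sketch below. The nominal delay, nominal leakage, and sensitivity values are hypothetical placeholders rather than fitted library data; only the 50mV and 30mV shifts come from the text. The leakage sensitivities are taken to be negative, since leakage falls as $V_{th}$ rises.

```python
import math

def gate_delay(d0, sens, dvth):
    """First-order delay model (2): d0 plus the sum of sensitivity * dVth."""
    return d0 + sum(s * dv for s, dv in zip(sens, dvth))

def gate_leakage(l0, sens, dvth):
    """Exponential leakage model (3): l0 * exp(sum of sensitivity * dVth)."""
    return l0 * math.exp(sum(s * dv for s, dv in zip(sens, dvth)))

# Hypothetical gate with one PMOS and one NMOS device on the path.
dvth = [0.050, 0.030]        # end-of-life shifts from the text (V)
delay_sens = [40.0, 35.0]    # ps per volt, assumed
leak_sens = [-9.0, -8.0]     # per volt, assumed (leakage decreases with Vth)

print(f"delay:   {gate_delay(10.0, delay_sens, dvth):.2f} ps (nominal 10.00 ps)")
print(f"leakage: {gate_leakage(1.0, leak_sens, dvth):.3f} x nominal")
```

In a characterization flow, the sensitivity lists would come from the least-squares fit against SPICE data for each library gate, and the same two functions would be evaluated at every time stamp.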



Fig. 1. Impact of BTI on the (a) temporal delay and (b) temporal leakage of LGSYNTH93 benchmark "des", at different time-stamps.

At the circuit level, Fig. 1 shows the impact of BTI on the temporal delay and leakage of an LGSYNTH93 benchmark "des". The nominal delay and leakage numbers are shown in dotted lines. The results indicate that the delay degrades by around 14%, whereas the leakage reduces by around 50%, after three years of operation.

#### B. Recovery in Performance using FBB

Fig. 2 explains the impact of BTI and the potential of FBB to maintain optimal performance. The drain current  $I_{ds}$  for a PMOS transistor is plotted in Fig. 2(a) for two distinct cases, i.e.,  $I_{on}$  when  $V_{gs} = -V_{dd}$ , and  $I_{off}$  when  $V_{gs} = 0$ , using different scales. Fig. 2(a) plots the currents as a function of PMOS  $V_{th}$ , showing the reduction in the currents due to aging. Fig. 2(b) shows the increase in the on and off currents with the amount of forward body bias  $(V_{bb})$  applied, computed when the transistor is maximally aged. When a FBB of 0.32V is applied, this effectively sets  $V_{th}$  to  $V_{th_0}$ , and hence  $I_{on}=I_{on_0}$ , and  $I_{off}=I_{off_0}$ , where  $I_{on_0}$  and  $I_{off_0}$  are the nominal values. The change in junction capacitance and the subthreshold slope is assumed to be negligible within the ranges of the FBB voltages considered in this framework, based on the results in [8], and [16], respectively.

At first glance, one may imagine that FBB could be used to completely recover any degradation in the PMOS transistor threshold voltage, and bring the $I_{on}$ and $I_{off}$ values to their original levels, thereby completely restoring its performance and leakage characteristics. However, on closer examination, it is apparent that this is not the case. The results of applying FBB on a temporally degraded inverter (after three years of continuous stress on all the transistors) are shown in Fig. 3. Fig. 3(a) shows the average delay of the inverter (measured as $\frac{1}{2}(\tau_R + \tau_F)$)<sup>\*</sup>, where $\tau_R$ and $\tau_F$ are the output rise and fall times, respectively, while the x-axis denotes the amount of NMOS and PMOS body bias voltage $V_{bb}$, applied simultaneously. The results indicate that after three years of temporal degradation, $V_{bb}$ of around 0.30V must be applied to restore the delay of the inverter to its original value (shown in dotted lines).

Fig. 2. $I_{on}$ and $I_{off}$ characteristics (shown using different scales) for a PMOS device with NBTI and FBB.

Fig. 3(b) plots the total leakage power assuming maximal $V_{th}$ degradation of both the NMOS and the PMOS transistors, obtained by combining the subthreshold leakages with the substrate junction leakages, with respect to the amount of FBB (denoted as $V_{bb}$) simultaneously applied to both the NMOS and the PMOS devices. The nominal leakage, computed at t = 0s, i.e., with $\Delta V_{th} = 0$, is shown in dotted lines, and is chosen as the leakage budget. The figure shows that the leakage rises rapidly with $V_{bb}$, and exceeds the leakage budget at around 0.2V. This is due to the exponential increase in substrate junction leakages with forward body bias, as shown in Fig. 3(c), which plots the individual components of leakage power, namely the subthreshold and junction leakages for the NMOS and PMOS devices, denoted as $I_{sub}$ and $I_{junc}$, respectively. We ignore the contribution of gate leakage current to the leakage power overhead, since neither BTI nor FBB has any impact on gate leakage. Further, with the use of high-k dielectrics, gate leakage has been reduced by several orders of magnitude, making it negligible in comparison with the subthreshold and junction leakages.

From Fig. 3(a) and Fig. 3(b), it can be inferred that a complete recovery in the delay degradation of the circuit results in the leakage current exceeding the budget. Simulation results indicate that our benchmark circuits require FBB of the order of (0.3-0.4V), which leads to an exponential increase in the power consumption.

#### C. Problem Formulation

As explained in the previous subsection, using ABB alone to fully restore the performance of the circuit results in a large power overhead, particularly at times closer to the lifetime of the circuit, since a large amount of FBB may be necessary. Hence, we propose to apply adaptive supply voltage (ASV) in conjunction with adaptive body bias (ABB), to minimize the power overhead. The optimal choice of the values of the NMOS body bias voltage (denoted as $v_{bn}$), PMOS body bias voltage (denoted as $v_{bp}$), and supply voltage $V_{dd}$ to meet the performance constraint is such that the total power consumption at all times is minimized. Hence, an optimization problem may be formulated as follows:

$$\begin{array}{l} \text{Minimize } \gamma P_{\text{act}} + \delta P_{\text{lkg}} \\ \text{s.t. } D(t) \leq D_{\text{spec}}, \quad 0 \leq t \leq t_{\text{life}} \end{array} \tag{4}$$

where $\gamma P_{\text{act}}$ and $\delta P_{\text{lkg}}$ are the weighted active and leakage (subthreshold + junction leakage) powers respectively, while $D_{\text{spec}}$ is the timing specification that must be met at all times.

<sup>\*</sup>The average of rise and fall delays is considered, since alternate stages of logic in a path (consisting of all inverting gates) undergo rising and falling transitions, respectively.

Fig. 3. Impact of applying FBB to a degraded inverter at $t = t_{\text{life}}$. Fig. (a) plots the delay as a function of FBB, while Fig. (b) shows the increase in the total leakage power consumed. Fig. (c) shows the individual components of leakage, i.e., the subthreshold and junction leakages for each transistor.

#### D. Models for Delay and Leakage Considering ABB/ASV

We now briefly review the models used for computing the delay and the power of a circuit in (4), considering the impact of FBB and ASV. Active power is modeled using the expression  $P_{\rm act} = \alpha f C V_{dd}^2$ , where the terms have their usual meanings. The delay of a gate, following the application of ABB/ASV can be modeled as:

$$D = D_0 + \frac{\partial D}{\partial v_{bn}} v_{bn} + \frac{\partial D}{\partial v_{bp}} v_{bp} + \frac{\partial D}{\partial V_{dd}} \Delta V_{dd} \tag{5}$$

where the sensitivity terms are negative, while the increase in leakage (either the subthreshold or the junction leakage) is modeled by an expression of the form:

$$L = L_0 e^{\left(\frac{\partial L}{\partial v_{bn}} v_{bn} + \frac{\partial L}{\partial v_{bp}} v_{bp} + \frac{\partial L}{\partial V_{dd}} \Delta V_{dd}\right)} \tag{6}$$

The derivatives in (6) are positive in sign. The models for delay are accurate to within 1% of the SPICE computed values, while the leakage numbers have an average accuracy of 95%, for FBB of up to 0.4V, and  $\Delta V_{dd}$  of up to 0.2V.
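As a quick consistency check on the active power expression, the sketch below scales the nominal active power of "des" from Table I (641 µW at 1.00V) by $(V_{dd}/V_{dd_0})^2$. It assumes that the switching activity, frequency, and capacitance are unchanged by ABB/ASV; the function name is hypothetical.

```python
P_ACT_NOMINAL_UW = 641.0   # nominal row of Table I
VDD_NOMINAL = 1.00         # volts

def scaled_active_power(vdd):
    """Active power at supply vdd, assuming P_act = alpha * f * C * Vdd**2."""
    return P_ACT_NOMINAL_UW * (vdd / VDD_NOMINAL) ** 2

for vdd in (1.03, 1.06, 1.09, 1.12):
    print(f"Vdd = {vdd:.2f} V -> P_act ~ {scaled_active_power(vdd):.1f} uW")
```

The scaled values agree with the $P_{act}$ column of Table I to within about 1 µW, which suggests that the quadratic dependence on $V_{dd}$ accounts for essentially all of the active power overhead.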

#### E. $\Delta V_{th}$ Dependence on Supply Voltage

While our optimal solution to adaptively compensating the circuit against aging may result in a higher than nominal $V_{dd}$ assignment, this may also enhance the impact of BTI. Previous works [17], [18] have shown that for thinner gate oxides, the amount of degradation due to NBTI increases with an increase in $V_{dd}$. This is caused by increased electric fields that result in higher rates of breakdown of the weak Si-H bonds, according to the equations [4], [13]:

$$\begin{aligned}
\Delta V_{th}(t) &\propto N_{IT}(t) \\
N_{IT}(t) &\propto \left(\frac{k_f N_0}{k_r}\right)^{\frac{2}{3}} (Dt)^{\frac{1}{6}} \\
k_f &\propto B\sigma_0 p T_p \\
p &= C_{ox}(V_{dd} - V_{th}) \propto E_{ox} \\
T_p &\approx e^{\left(\frac{V_{dd} - V_{th}}{t_{ox} E_0}\right)}
\end{aligned} \tag{7}$$

As can be seen from the plots in [17], within the ranges of  $V_{dd}$  values used in ASV, this dependence is fairly linear. Hence, a linearized model is used to capture this second order dependence:

$$\Delta V_{th}(t_{\text{stress}}, V_{dd}) = \Delta V_{th_0}(t_{\text{stress}}, V_{dd_0}) + m\Delta V_{dd} \tag{8}$$

where  $\Delta V_{th_0}(t_{\text{stress}}, V_{dd_0})$  is the increase in  $V_{th}$  at nominal supply voltage  $V_{dd_0}$  after  $t_{\text{stress}}$  seconds of DC stress, and the slope m is determined based on [17].
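The linearized model in (8) amounts to a one-line correction, sketched below. The default slope value is purely a placeholder chosen for illustration, since the actual value of m is derived from the data in [17]; the function and parameter names are likewise hypothetical.

```python
def delta_vth_at_vdd(dvth_nominal, vdd, vdd_nominal=1.0, slope=0.1):
    """Eq. (8): shift at supply vdd = shift at nominal supply + m * (vdd - vdd_nominal).

    slope is m in volts of Vth shift per volt of supply change (placeholder value).
    """
    return dvth_nominal + slope * (vdd - vdd_nominal)

# Example: a 50 mV shift at the nominal supply, re-evaluated at a raised supply.
shift = delta_vth_at_vdd(0.050, vdd=1.06)
print(f"dVth at 1.06 V: {1e3 * shift:.1f} mV (placeholder slope)")
```

Because the correction is linear in $\Delta V_{dd}$, it can be recomputed cheaply each time a candidate supply voltage is tried.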

## III. CONTROL SYSTEM FOR ADAPTIVE COMPENSATION

In this section, we investigate how an ABB/ASV based control system can be implemented to guard-band circuits against aging. A look-up table based approach that precomputes and stores the optimal ABB/ASV/frequency values, to compensate for droop and temperature variations, is presented in [9], while [7], [8] use a critical path replica based approach to counter the effects of within-die and die-to-die variations. Further, the works in [19], [20] show how on-chip sensors designed for aging can be implemented to obtain a high sensing resolution. However, the critical path replica based approach has the following drawbacks:

- 1) With increasing amounts of intra-die variations, critical path replica based test circuits require a large number of critical paths to provide an  $f_{\text{max}}$  distribution that is identical to the original circuit, leading to an area overhead.
- 2) An additional issue in the case of aging is that the delay of the critical path is dependent on the amount of temporal degradation, which depends on the signal probabilities along the nodes. The actual signal activity at every node in turn depends on the logic values applied at the primary inputs over the entire lifetime of the circuit, and it is impossible to determine this *a priori*. A critical path replica thus requires the inputs of this test circuit to be connected to the actual inputs of the original circuit, to mimic the exact signal activity, which may entail routing and signal integrity issues.
- 3) Further, the critical paths in a circuit can dynamically change based on the signal activity. Adding every potentially critical path from the original circuit into the critical path replica may cause the test circuit to become extremely large. This not only results in a large area overhead, but the test circuit may have variations of its own, which may be different from the actual implemented design.

Owing to these drawbacks, we propose the use of a look-up table based implementation to determine the actual  $v_{bn}$ ,  $v_{bp}$ , and  $V_{dd}$ values that must be applied to the circuit to compensate it for aging. The entries in the look-up table are indexed by the total time for which the circuit has been in operation. This time can be tracked by a software routine, with t = 0 representing the beginning of the lifetime of the circuit (say after burn-in<sup>†</sup>, testing, and binning). This software control enables the system to determine the total time for which the circuit has been operational.

The look-up table method requires the critical paths and the temporal delays of the circuit to be known beforehand, to determine the entries of the table. As stated earlier, it is impossible to determine a priori, the exact temporal degradation of a circuit. Hence, we propose to compute the worst-case degradation of the circuit, at different time stamps, i.e., determine D(t) in (4) based on the method in [4], and use this to find the amount of compensation that must be adaptively applied at those times, to meet the target delay. While [4] considers the impact of NBTI only, and derives a worst-case scenario for the delay degradation of the circuit, by assuming maximal DC stress on every PMOS transistor, the idea can be extended to include maximal impact of PBTI on the NMOS transistors, as well. The worst-case method is computationally efficient, is input-vector independent, requires a single timing analysis run, performed using a pair of degraded  $V_{th}$  values (for each time stamp) for all the NMOS and PMOS transistors, respectively, and serves as an upper bound on the actual signal activity dependent delay of the circuit. Further, the set of  $v_{bn}$ ,  $v_{bp}$ , and  $V_{dd}$  values in (4), determined using this number as a measure of D(t) in (4), is guaranteed to ensure that the circuit meets the delay specification  $D_{\text{spec}}$ , under all operating conditions.

## IV. OPTIMAL ABB/ASV COMPUTATION

In this section, we describe an enumeration based algorithm to determine the optimal ABB/ASV values at different time-stamps. The idea is explained in Fig. 4. Fig. 4(a) shows the temporally degraded delay of a nominally designed circuit, without ABB/ASV, denoted as  $D(t_i)$ , at different values  $t_i$ . Fig. 4(b) shows how ABB/ASV may be applied at every time stamp  $t_i$ , to ensure that the delay degradation during the interval  $[t_i, t_{i+1}]$  does not cause the circuit delay to exceed the specifications. The delay of the circuit immediately after applying ABB/ASV based on the look-up table values at  $t_i$ , is denoted by  $D_{\text{after}}(t_i)$ , and is always less than  $D_{\text{spec}}$ . Similarly,  $D_{\text{before}}(t_i)$  is the net delay of the circuit before applying ABB/ASV at  $t_i$ . Considering the impact of ABB/ASV at  $t = t_{i-1}$  and the temporal degradation due to BTI over  $[t_{i-1}, t_i]$ , we have

$$\begin{array}{l} D_{\text{after}}(t_{i-1}) < D_{\text{before}}(t_{i-1}) \le D_{\text{spec}} \\ D_{\text{after}}(t_{i-1}) < D_{\text{before}}(t_i) \le D_{\text{spec}} \end{array} \tag{9}$$

This implies that at every time point, the amount of compensation required is dependent on the delay degradation up to the next time stamp, and follows from the shape of the figure in Fig. 4(b). The pseudo-code for computing the optimal ABB/ASV values is shown in Algorithm 1.

The algorithm begins by determining the amount of ABB/ASV that must be applied at the design time, denoted by $t_0 = 0$, to compensate for aging until the first time stamp $t_1$. This can be computed by determining the amount of $\Delta V_{th}$ until $t_1$, and performing an STA run, to determine $D(t_1)$, as shown in line 3 of the algorithm. The target delay after applying ABB/ASV is then computed, as shown in line 5, and, as expected, $D_{\text{after}}(t_0) < D_{\text{spec}}$, as can be seen from Fig. 4(b). An enumeration routine, based on the method described in [10], is run in line 7, to determine the optimal ABB/ASV that must be applied at time $t_0$. Since the $V_{dd}$ in the solution depends on the delay degradation over the time period $[t_0, t_1]$, which in turn depends on the $V_{dd}$ applied to the circuit at $t_0$ (based on (8)), an iterative approach is used. However, this dependence is fairly small, and ignoring the second order dependence of $V_{th}$ degradation on $V_{dd}$ does not significantly affect the results. Line 11 checks to ensure that the net delay of the circuit at time $t_1$, i.e., $D_{\text{before}}(t_1)$, is less than $D_{\text{spec}}$, as required in Fig. 4(b). The method is repeated for successive values of $t_i$, and the look-up table entries are computed.

Fig. 4. Impact of BTI and ABB/ASV in each time interval.

Algorithm 1 Enumeration($t = t_0, t_1, \ldots, t_n, D_{\text{spec}}$)

- 1: Determine nominal ($t_0 = 0$) delay and power (active and leakage), and ensure that $D(t_0) \leq D_{\text{spec}}$.
- 2: Compute $\Delta V_{th}(t_1)$ due to BTI using nominal $V_{dd}$.
- 3: Determine the delay due to BTI at time $t_1$, $D(t_1)$, through STA.
- 4: {Use this to determine the delay after ABB/ASV at time $t_0$.}
- 5: $D_{\text{after}}(t_0) = D_{\text{spec}} \frac{D(t_0)}{D(t_1)} < D_{\text{spec}}$.
- 6: {Determine ABB/ASV values to be applied at time $t_0$ to meet $D_{\text{after}}(t_0)$.}
- 7: Run enumeration routine to determine $v_{bn}$, $v_{bp}$, and $V_{dd}$, s.t., the delay $D_{\text{after}}(t_0)$ is met, and power is minimized.
- 8: Recompute $\Delta V_{th}(t_1)$ if $V_{dd}$ is different from what was used in line 2 to compute $\Delta V_{th}$.
- 9: Iteratively perform STA and recompute $D(t_1)$ and repeat lines 3-8 if $V_{dd}$ changes again.
- 10: {$D_{\text{before}}(t_1) = D_{\text{after}}(t_0) + \text{temporal degradation over } [t_0, t_1]$}
- 11: Compute $D_{\text{before}}(t_1)$ and check $D_{\text{before}}(t_1) \leq D_{\text{spec}}$.
- 12: Repeat the above procedure iteratively for all $t_1, \ldots, t_n$, to determine the optimal values at $t_i$ based on the degradation over $[t_i, t_{i+1}]$, as shown in (9) and Fig. 4.
- 13: Check $D_{\text{before}}(t = t_n = t_{\text{life}}) \leq D_{\text{spec}}$.
- 14: Return $v_{bn}$, $v_{bp}$, and $V_{dd}$ values at all times $t_1, \ldots, t_{n-1}$.
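Lines 5 and 7 of Algorithm 1 can be sketched as follows, under several simplifying assumptions. The delay and leakage sensitivities are hypothetical circuit-level placeholders, the subthreshold and junction leakages are lumped into a single exponential of the form (6), the grid limits follow the validated ranges of the models (FBB up to 0.4V, $\Delta V_{dd}$ up to 0.2V), and the re-iteration on $V_{dd}$ in lines 8-9 is omitted. This is not the enumeration routine of [10], only an exhaustive search with the same objective.

```python
import itertools
import math

# Nominal "des" values from Table I; every sensitivity below is an assumed placeholder.
P_ACT0 = 641.0      # uW
VDD0 = 1.0          # V
DELAY_SENS = {"vbn": -30.0, "vbp": -30.0, "vdd": -200.0}   # ps per volt (assumed)
LEAK_SENS = {"vbn": 2.0, "vbp": 2.0, "vdd": 3.0}           # per volt (assumed)

def target_delay(d_spec, d_now, d_next):
    """Line 5 of Algorithm 1: D_after(t_i) = D_spec * D(t_i) / D(t_{i+1})."""
    return d_spec * d_now / d_next

def enumerate_bias(d_base, lkg_base, d_target, gamma=1.0, delta=1.0):
    """Line 7: exhaustive search for the lowest-power setting meeting d_target.

    Returns (cost, v_bn, v_bp, V_dd, delay), or None if nothing is feasible.
    """
    vb_grid = [mv / 1000.0 for mv in range(0, 401, 50)]     # 50 mV body-bias steps
    dvdd_grid = [mv / 1000.0 for mv in range(0, 181, 30)]   # 30 mV supply steps
    best = None
    for vbn, vbp, dvdd in itertools.product(vb_grid, vb_grid, dvdd_grid):
        d = (d_base + DELAY_SENS["vbn"] * vbn + DELAY_SENS["vbp"] * vbp
             + DELAY_SENS["vdd"] * dvdd)                    # linear model (5)
        if d > d_target:
            continue
        p_act = P_ACT0 * ((VDD0 + dvdd) / VDD0) ** 2        # P_act scales as Vdd^2
        p_lkg = lkg_base * math.exp(LEAK_SENS["vbn"] * vbn + LEAK_SENS["vbp"] * vbp
                                    + LEAK_SENS["vdd"] * dvdd)   # form of (6)
        cost = gamma * p_act + delta * p_lkg
        if best is None or cost < best[0]:
            best = (cost, vbn, vbp, VDD0 + dvdd, d)
    return best

# Example: nominal delay 355 ps, assumed worst-case aged delay of 366 ps at t_1.
d_after_t0 = target_delay(355.0, 355.0, 366.0)
solution = enumerate_bias(d_base=355.0, lkg_base=327.0, d_target=d_after_t0)
print(f"target = {d_after_t0:.1f} ps, chosen setting = {solution}")
```

With 9 body bias levels per device and 7 supply levels, the grid has only 567 points, so an exhaustive search per time stamp is inexpensive once the sensitivities are known.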

## V. RESULTS

In this section, we present results on ISCAS85, LGSYNTH93, and ITC99 benchmark circuits, synthesized using SiS on a 45nm [15] based library. A step size of 50mV is used for the body bias voltages, while a step size of 30mV is used for the supply voltage. The circuit is compensated at different time-stamps, as shown in the first column of Table I, up to its $t_{life}$ of $10^8$ s. These numbers are chosen such that the increase in delay over every time interval is fairly constant. However, an initial value of $10^4$ s (the entry $0.0001 \times 10^8$ s in the first column of the table) is chosen assuming that this is the minimal resolution that can be generated using the software set-up that tracks the total cumulative time of stress applied on the circuit, as described in Section III. The number of time stamps chosen (size of the look-up table) and their specific values depend on the ability of the software to track these times, and the resolution in the body-bias and supply voltages. This does not impact the final ABB/ASV values after three years, although it affects the average power consumed by the circuit. An extreme case involves applying the maximum body bias and supply voltage at t = 0s to compensate for aging over the entire lifetime of the circuit. While this case is similar to synthesis in terms of the temporal performance of the circuit, an *adaptive* approach with discrete time steps takes into consideration the exact amount of delay degradation at each time step, and applies only the requisite amount of compensation.

<sup>†</sup>It is assumed that the degradation in delays due to accelerated stresses at high temperature during burn-in is accounted for in determining D<sub>spec</sub>, by adding an additional timing guard-band.

## TABLE I

LOOK-UP TABLE ENTRIES FOR LGSYNTH93 BENCHMARK "DES"

| Time ($\times 10^8$ s) | $v_{bn}$ (mV) | $v_{bp}$ (mV) | $V_{dd}$ (V) | Delay (ps) | $P_{act}$ (µW) | $P_{lkg}$ (µW) | % Increase |
|-----------------|----------|----------|----------|-------|------|-------------------|---------|
| Nominal         | 0        | 0        | 1.00     | 355   | 641  | 327               |         |
| 0.0000          | 0        | 50       | 1.03     | 341   | 680  | 416               | 16%     |
| 0.0001          | 0        | 50       | 1.03     | 341   | 680  | 346               | 6%      |
| 0.0004          | 0        | 100      | 1.03     | 351   | 680  | 362               | 8%      |
| 0.0016          | 50       | 100      | 1.03     | 351   | 680  | 369               | 9%      |
| 0.0035          | 0        | 50       | 1.06     | 352   | 721  | 344               | 9%      |
| 0.0080          | 50       | 50       | 1.06     | 351   | 721  | 357               | 11%     |
| 0.0180          | 50       | 100      | 1.06     | 351   | 721  | 368               | 12%     |
| 0.0400          | 100      | 100      | 1.06     | 352   | 721  | 377               | 14%     |
| 0.0600          | 0        | 100      | 1.09     | 351   | 762  | 353               | 13%     |
| 0.1100          | 50       | 100      | 1.09     | 351   | 762  | 360               | 14%     |
| 0.1700          | 100      | 200      | 1.06     | 352   | 720  | 398               | 17%     |
| 0.2500          | 50       | 150      | 1.09     | 352   | 762  | 362               | 15%     |
| 0.3600          | 50       | 200      | 1.09     | 351   | 762  | 388               | 19%     |
| 0.5500          | 100      | 200      | 1.09     | 351   | 762  | 396               | 20%     |
| 0.7500          | 50       | 150      | 1.12     | 352   | 804  | 359               | 17%     |
| 1.0000          |          |          |          | 355   | 804  | 350               | 16%     |

Table I shows the delays after applying ABB/ASV, the active and leakage powers, the power overhead, and the  $v_{bn}$ ,  $v_{bp}$ , and  $V_{dd}$  values at different time stamps, for the benchmark circuit "des", whose delay and leakage variations with BTI (without ABB/ASV) were shown in Fig. 1. The first four columns of the table, (shown in bold with gray background), denote the actual entries that would be encoded into the look-up table. The column "Delay" denotes the delay of the circuit ( $D_{after}$  in Algorithm 1), at the given time stamp immediately after applying ABB/ASV values from the table. The nominal delay at t = 0s is chosen as  $D_{spec}$ . The results indicate that the target delay is met at all time points, up to  $t_{life}=10^8$  s.
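A minimal sketch of how the run-time software might consult the table is given below, using the first four columns of Table I with times converted to seconds. The policy of selecting the most recent time stamp not exceeding the elapsed operating time, and the name `bias_at`, are assumptions made for illustration.

```python
from bisect import bisect_right

# (time stamp in s, v_bn in mV, v_bp in mV, V_dd in V), from Table I.
LOOKUP = [
    (0, 0, 50, 1.03), (10_000, 0, 50, 1.03), (40_000, 0, 100, 1.03),
    (160_000, 50, 100, 1.03), (350_000, 0, 50, 1.06), (800_000, 50, 50, 1.06),
    (1_800_000, 50, 100, 1.06), (4_000_000, 100, 100, 1.06),
    (6_000_000, 0, 100, 1.09), (11_000_000, 50, 100, 1.09),
    (17_000_000, 100, 200, 1.06), (25_000_000, 50, 150, 1.09),
    (36_000_000, 50, 200, 1.09), (55_000_000, 100, 200, 1.09),
    (75_000_000, 50, 150, 1.12),
]
TIME_STAMPS = [row[0] for row in LOOKUP]

def bias_at(t_seconds):
    """Return (v_bn, v_bp, V_dd) from the latest entry with time stamp <= t_seconds."""
    if t_seconds < 0:
        raise ValueError("operating time must be non-negative")
    index = bisect_right(TIME_STAMPS, t_seconds) - 1
    return LOOKUP[index][1:]

print(bias_at(0))            # setting applied at the start of life
print(bias_at(20_000_000))   # setting in force after 2e7 s of operation
```

Since each entry is computed to cover the degradation up to the next time stamp, holding a setting until the next time stamp is reached keeps the delay within the specification under the worst-case model.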

As explained in Fig. 4(b), the circuit is compensated for aging over the time period $[0,t_1]$, by applying ABB/ASV at time t = 0s. Hence, the delay of the circuit at t = 0s, after applying ABB/ASV, under the "Delay" column is less than $D_{\text{spec}}$. The leakage power decreases temporally due to the increase in $V_{th}$ caused by BTI, but increases with ABB/ASV, and hence exceeds the nominal leakage. The percentage increase in power, computed as the average of the relative increases in the active and leakage powers with respect to their nominal values, is tabulated in the last column of the table. Our approach leads to an average<sup>‡</sup> increase of 17% in the total power consumption, over three years of operation.
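The 17% figure can be reproduced approximately from the power columns of Table I, as in the sketch below. It assumes that the powers listed at each time stamp are held constant until the next time stamp, which is how the time-weighted average in the footnote is read here; the function names are hypothetical.

```python
# (time stamp in s, P_act in uW, P_lkg in uW) after ABB/ASV, from Table I.
ROWS = [
    (0, 680, 416), (10_000, 680, 346), (40_000, 680, 362), (160_000, 680, 369),
    (350_000, 721, 344), (800_000, 721, 357), (1_800_000, 721, 368),
    (4_000_000, 721, 377), (6_000_000, 762, 353), (11_000_000, 762, 360),
    (17_000_000, 720, 398), (25_000_000, 762, 362), (36_000_000, 762, 388),
    (55_000_000, 762, 396), (75_000_000, 804, 359),
]
T_LIFE = 100_000_000               # s
P_ACT0, P_LKG0 = 641.0, 327.0      # nominal powers in uW

def overhead(p_act, p_lkg):
    """Mean of the relative increases in active and leakage power."""
    return 0.5 * ((p_act - P_ACT0) / P_ACT0 + (p_lkg - P_LKG0) / P_LKG0)

def time_weighted_average(rows, t_end):
    """Weight each entry's overhead by the interval until the next time stamp."""
    times = [r[0] for r in rows] + [t_end]
    total = 0.0
    for i, (_, p_act, p_lkg) in enumerate(rows):
        total += (times[i + 1] - times[i]) * overhead(p_act, p_lkg)
    return total / (times[-1] - times[0])

average = time_weighted_average(ROWS, T_LIFE)
print(f"time-weighted average overhead: {average:.1%}")
```

The result is close to 17%, and it is dominated by the last few intervals, since they account for most of the operating lifetime.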

#### A. Comparison of Transient Power and Delay Numbers

The temporal delay of "des" is shown in Fig. 5, where the delays at different time stamps are plotted. The transient performance of the circuit is compared with that of the worst-case synthesis based approach from [6], in which worst-case BTI-based library gate delays were used during technology mapping to synthesize the circuit to meet the same $D_{\text{spec}}$. The figure indicates that the approach described in this paper reduces the range of the circuit delays.

Fig. 5. Temporal delay of benchmark "des" using our approach and worst-case synthesis.

<sup>‡</sup>Computed as a time-weighted average using

$$\frac{\sum_{i=0}^{n-1} \left[ (t_{i+1} - t_i) \times 0.5 \left( \frac{\Delta P_{act_i}}{P_{act_0}} + \frac{\Delta P_{lkg_i}}{P_{lkg_0}} \right) \right]}{\sum_{i=0}^{n-1} (t_{i+1}-t_i)},$$

where  $P_{lkg_0}$  and  $P_{act_0}$  are the nominal leakage and active powers, respectively, while  $\Delta P_{lkg_i}$  and  $\Delta P_{act_i}$  are the increases of the leakage and active powers  $P_{lkg_i}$  and  $P_{act_i}$  at time  $t_i$  over these nominal values, with  $t_0 = 0$  and  $t_n = 10^8$ s.

The active and leakage powers using our approach are compared with worst-case synthesis for "des" in Fig. 6, where the active and leakage powers of the nominally designed circuit at t = 0s are marked as "Nominal". Since the synthesis approach uses gates of higher drive strengths (as compared with the nominal design) to guard-band the circuit, its active power is higher than the nominal value and remains constant over the entire lifetime of the circuit. In contrast, the supply voltage in our approach increases gradually with time, as shown in Table I. Hence, the active power increases gradually, as shown in Fig. 6(a). The results indicate that the maximum active power dissipated using our approach is less than that of the worst-case synthesis based design. The active power consumed as a function of time using our method depends on the optimal  $V_{dd}$  value in the lookup table, as computed by Algorithm 1, and hence may be nonmonotonic, as seen from Fig. 6(a). For example, Table I shows that at  $t = 0.17 \times 10^8$ s, the optimal solution leads to a decrease in  $V_{dd}$  accompanied by a larger increase in  $(v_{bn}, v_{bp})$  with respect to the solution at the previous time stamp, hence causing the active power to decrease temporally.
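The link between  $V_{dd}$  and active power described above can be made explicit with the standard first-order switching-power model. This is a sketch under stated assumptions, not a formula taken from the paper:  $\alpha$  (switching activity),  $C_{sw}$  (switched capacitance), and  $f$  (clock frequency) are symbols introduced here for illustration and are assumed unchanged between consecutive time stamps, while short-circuit power and any effect of body bias on capacitance are neglected.

$$P_{act} \approx \alpha\, C_{sw}\, V_{dd}^{2}\, f \quad\Longrightarrow\quad \frac{P_{act}(t_{i+1})}{P_{act}(t_i)} \approx \left( \frac{V_{dd}(t_{i+1})}{V_{dd}(t_i)} \right)^{2}$$

Under this model, any time stamp at which the lookup table lowers  $V_{dd}$  gives a ratio below one, which is consistent with the temporary decrease in active power described for  $t = 0.17 \times 10^8$ s.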



Fig. 6. Temporal active and leakage power of "des" with time using our approach, compared with worst-case synthesis.

Similarly, Fig. 6(b) compares the transient leakage of the two approaches with respect to the nominal value. The leakage of the worst-case synthesis based circuit is highest at t = 0s (when there is no BTI), but decreases monotonically with time due to BTI, and its final value is lower than the nominal leakage. In contrast, our approach adaptively recovers performance by using the slack available in the leakage budget, as explained in Section II. Since complete recovery in performance requires ASV in addition to ABB, the leakage exceeds the nominal value at all times. The leakage power at t = 0s is also greater than the nominal value, since some amount of ABB/ASV is applied to the circuit to guard-band against temporal degradation during  $[0,t_1]$ , as shown in Fig. 4. However, the maximum leakage power consumed using our approach, which occurs at t = 0s, is almost identical to that of the worst-case synthesis method, as seen from Fig. 6(b).
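The opposing leakage trends above (BTI lowering leakage, forward body bias raising it) can be illustrated with the textbook subthreshold-leakage model, in which leakage scales as $\exp(-V_{th} / (n \cdot kT/q))$. The sketch below is an assumption-laden illustration and not the leakage model used in the paper: the slope factor, the temperature, and the 30 mV threshold shifts are hypothetical values, and gate and junction leakage are ignored.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELEMENTARY_CHARGE_C = 1.602176634e-19


def leakage_ratio(delta_vth_volts, temperature_k=300.0, slope_factor=1.5):
    """Subthreshold leakage after a threshold shift, relative to before the shift.

    A positive delta_vth_volts models a BTI-induced increase in |V_th|;
    a negative value models the V_th reduction obtained from forward body bias.
    """
    thermal_voltage = BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C
    return math.exp(-delta_vth_volts / (slope_factor * thermal_voltage))


# Hypothetical 30 mV shifts, chosen only for illustration.
aged = leakage_ratio(+0.030)    # aging raises V_th, so leakage drops
biased = leakage_ratio(-0.030)  # forward body bias lowers V_th, so leakage rises
print(f"after aging: {aged:.2f}x, after forward body bias: {biased:.2f}x")
```

Because the dependence is exponential, a body bias that only cancels the BTI-induced shift restores the original leakage, whereas any additional bias or supply increase needed for full delay recovery pushes the leakage above nominal.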

#### B. Overhead in Maximal Power Consumed

Table II compares the area savings and the maximal power overhead of our approach with the worst-case synthesis method. For a given  $D_{\text{spec}}$ , the synthesis approach uses the worst-case BTI impacted delays of the gates during technology mapping, thereby resulting in a circuit that is guaranteed to work over the entire lifetime. In contrast, our approach applies ABB/ASV to a nominally designed circuit (a circuit designed using the nominal delays, and hence guaranteed to meet the timing at t = 0s) to compensate for the increase in delay due to BTI. The column  $D_{\text{spec}}$  is the delay of the nominal circuit at t = 0s. The area of the nominal circuit, and its active (denoted as  $P_{act_0}$ ) and leakage (tabulated as  $P_{lkg_0}$ ) powers, are shown in the table. The column  $\frac{\Delta D(t_{\text{life}})}{D_{\text{spec}}}$  denotes the percentage increase in delay due to maximal BTI after  $t_{\text{life}} = 10^8$ s of stress.

TABLE II. AREA AND POWER OVERHEAD COMPARISON

|  | Nominal design |  |  |  |  | Our approach |  |  |  | Worst-case synthesis |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Benchmark | $D_{\text{spec}}$ (ps) | Area (µm) | $P_{lkg_0}$ at t = 0 (µW) | $P_{act_0}$ at t = 0 (µW) | $\frac{\Delta D(t_{\text{life}})}{D_{\text{spec}}}$ (%) | Max $P_{lkg}$ (µW) | $\frac{\Delta P_{lkg}}{P_{lkg_0}}$ (%) | Max $P_{act}$ (µW) | $\frac{\Delta P_{act}}{P_{act_0}}$ (%) | Area (µm) | Area overhead (%) | Max $P_{lkg}$ (µW) | $\frac{\Delta P_{lkg}}{P_{lkg_0}}$ (%) | Max $P_{act}$ (µW) | $\frac{\Delta P_{act}}{P_{act_0}}$ (%) |
| C2670 | 510 | 13111 | 52 | 94 | 15% | 67 | 29% | 118 | 26% | 17298 | 32% | 60 | 15% | 112 | 19% |
| C3540 | 769 | 18692 | 74 | 136 | 14% | 96 | 30% | 170 | 25% | 24765 | 32% | 102 | 38% | 186 | 37% |
| C5315 | 729 | 29951 | 114 | 208 | 15% | 147 | 29% | 247 | 19% | 34137 | 14% | 143 | 25% | 246 | 18% |
| C7552 | 616 | 42261 | 190 | 337 | 15% | 246 | 29% | 400 | 19% | 49824 | 18% | 219 | 15% | 400 | 19% |
| des | 355 | 81777 | 327 | 641 | 15% | 416 | 27% | 804 | 25% | 110807 | 35% | 418 | 28% | 885 | 38% |
| i8 | 840 | 55128 | 157 | 305 | 17% | 198 | 26% | 382 | 25% | 65141 | 18% | 226 | 44% | 521 | 71% |
| i10 | 830 | 41063 | 152 | 307 | 14% | 200 | 32% | 384 | 25% | 49880 | 21% | 195 | 28% | 386 | 26% |
| t481 | 368 | 68458 | 201 | 572 | 14% | 233 | 16% | 718 | 26% | 94717 | 38% | 261 | 30% | 796 | 39% |
| b14 | 1078 | 95626 | 426 | 775 | 14% | 537 | 26% | 920 | 19% | 110945 | 16% | 496 | 16% | 909 | 17% |
| b15 | 902 | 179096 | 781 | 1384 | 13% | 987 | 26% | 1644 | 19% | 206975 | 16% | 900 | 15% | 1634 | 18% |
| Average |  |  |  |  | 15% |  | 27% |  | 23% |  | 24% |  | 26% |  | 30% |

The **maximum** leakage and active powers dissipated at any time instant in  $[0, t_{\text{life}}]$ , along with the overheads ( $\frac{\Delta P_{lkg}}{P_{lkg_0}}$  and  $\frac{\Delta P_{act}}{P_{act_0}}$ , respectively) over the nominal design, are shown in the table for both the worst-case synthesis method and our approach. Active power dissipation for our approach is maximum at  $t = t_{\text{life}}$ , while the leakage power consumed is maximum at t = 0s, as can be seen from Table I and Fig. 6. Because these two maxima occur at opposite ends of the lifetime, the average increase in power (weighted sum of active and leakage powers) at different times, as seen from the last column of Table I, remains relatively uniform, and is lower than either the maximum active or leakage power overhead in Table I. For the worst-case synthesis approach, both the active and leakage powers are maximum at t = 0s, as shown in Fig. 6. The last row indicates that the average overhead in leakage using our approach is almost identical to that of worst-case synthesis (27% versus 26%), while that for active power is lower (23% versus 30%).
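As a worked check of how the overhead columns relate to the power columns, the sketch below recomputes the percentages for the "des" row of Table II. The helper names are hypothetical, and the code assumes that each overhead is the increase of the maximum power over the nominal power, divided by the nominal power.

```python
def overhead_percent(max_power_uw, nominal_power_uw):
    """Percentage by which the maximum power exceeds the nominal power."""
    if nominal_power_uw <= 0:
        raise ValueError("nominal power must be positive")
    return 100.0 * (max_power_uw - nominal_power_uw) / nominal_power_uw


def overheads(max_powers, nominal_powers):
    """Rounded leakage and active overheads for one design style."""
    return {
        kind: round(overhead_percent(max_powers[kind], nominal_powers[kind]))
        for kind in nominal_powers
    }


# Power values in microwatts, copied from the "des" row of Table II.
DES_NOMINAL = {"leakage": 327, "active": 641}
DES_OUR_APPROACH = {"leakage": 416, "active": 804}
DES_WORST_CASE = {"leakage": 418, "active": 885}

ours = overheads(DES_OUR_APPROACH, DES_NOMINAL)
worst_case = overheads(DES_WORST_CASE, DES_NOMINAL)
print("our approach:", ours)
print("worst-case synthesis:", worst_case)
```

For this row the recomputed values agree with the tabulated overheads: 27% leakage and 25% active for our approach, against 28% and 38% for worst-case synthesis.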

The area of the circuit designed using worst-case synthesis is also shown in the table. The column "Overhead" denotes the area overhead of synthesis as compared with the nominal design. Table II indicates that the worst-case synthesis approach has an average area overhead of around 24%. In contrast, the area overhead of our approach is restricted to the look-up tables, the voltage generators for the additional supply voltages, and the body-bias voltages, and is therefore significantly smaller. The work in [7] has shown that this overhead is within 2-3% of the area of the nominal design. Thus, our work provides significant area savings as compared with the worst-case synthesis approach. Hence, adaptive guard-banding of circuits can be used as a viable alternative to BTI-aware sizing and synthesis techniques to ensure reliable performance over their lifetime.

#### VI. CONCLUSION

BTI has become an important reliability concern in circuit design. Previous "design-time" solutions aimed at guaranteeing reliable performance of the circuit lead to large area and associated power overheads. An adaptive approach that determines the transient degradation of the circuit, and compensates for it through adaptive body biasing (ABB) and adaptive supply voltage (ASV), is proposed. The results indicate that the circuit can be efficiently guard-banded for three years with a minimal overhead in area, and a small increase in power, as compared with a circuit designed only to meet the nominal specifications. Further, techniques such as [10] may be used to apply ABB/ASV to simultaneously counter the impact of aging, as well as process and temperature variations.

#### References

- [1] S. Zafar et al., "A Comparative Study of NBTI and PBTI Charge Trapping in SiO<sub>2</sub>-HfO<sub>2</sub> Stacks with FUSI, TiN, Re Gates," in Proc. Symposium on VLSI Technology, pp. 23–25, 2006.
- [2] M. F. Li et al., "Dynamic Bias-Temperature Instability in Ultrathin SiO<sub>2</sub> and HfO<sub>2</sub> Metal-Oxide Semiconductor Field Effect Transistors and Its Impact on Device Lifetime," Japanese Journal of Applied Physics, vol. 43, pp. 7807–7814, November 2004.
- [3] F. Crupi et al., "Positive Bias Temperature Instability in nMOSFETs with Ultra-Thin Hf-silicate Gate Dielectrics," Journal of Microelectronic Engineering, vol. 80, pp. 130–133, June 2005.
- [4] B. C. Paul et al., "Temporal Performance Degradation under NBTI: Estimation and Design for Improved Reliability of Nanoscale Circuits," in Proc. DATE, pp. 1–6, 2006.
- [5] K. Kang et al., "NBTI Induced Performance Degradation in Logic and Memory Circuits: How Effectively can we Approach a Reliability Solution?" in Proc. ASPDAC, pp. 726–731, 2008.
- [6] S. V. Kumar et al., "NBTI Aware Synthesis of Digital Circuits," in Proc. DAC, pp. 370–375, 2007.
- [7] J. W. Tschanz et al., "Adaptive Body Bias for Reducing Impacts of Die-to-Die and Within-Die Parameter Variations on Microprocessor Frequency and Leakage," IEEE Journal of Solid-State Circuits, vol. 37, pp. 1396–1402, November 2002.
- [8] S. Narendra et al., "Forward Body Bias for Microprocessors in 130-nm Technology Generation and Beyond," IEEE Journal of Solid-State Circuits, vol. 38, pp. 696–701, May 2003.
- [9] J. W. Tschanz et al., "Adaptive Frequency and Biasing Techniques for Tolerance to Dynamic Temperature-Voltage Variations and Aging," in Proc. ISSCC, pp. 292–294, 2007.
- [10] S. V. Kumar et al., "Body Bias Voltage Computations for Process and Temperature Compensation," IEEE Transactions on VLSI, vol. 16, pp. 249–262, March 2008.
- [11] K. O. Jeppson and C. M. Svensson, "Negative Bias Stress of MOS Devices at High Electric Fields and Degradation of MNOS Devices," Journal of Applied Physics, vol. 48, pp. 2004–2014, May 1977.
- [12] M. A. Alam, "A Critical Examination of the Mechanics of Dynamic NBTI for pMOSFETs," in Proc. IEDM, pp. 14.4.1–14.4.4, 2003.
- [13] S. V. Kumar et al., "An Analytical Model for Negative Bias Temperature Instability (NBTI)," in Proc. ICCAD, pp. 493–496, 2006.
- [14] S. Bhardwaj et al., "Predictive Modeling of the NBTI Effect for Reliable Design," in Proc. CICC, pp. 189–192, 2006.
- [15] Predictive Technology Model. http://www.eas.asu.edu/~ptm.
- [16] M. Terauchi, "Impact of Forward Substrate Bias on Threshold Voltage Fluctuation in Metal-Oxide-Semiconductor Field-Effect Transistors," Japanese Journal of Applied Physics, vol. 46, pp. 4105–4107, July 2007.
- [17] R. Vattikonda et al., "Modeling and Minimization of PMOS NBTI Effect for Robust Nanometer Design," in Proc. DAC, pp. 1047–1052, 2006.
- [18] K. Imai et al., "Device Technology for Body Biasing Scheme," in Proc. ISCAS, pp. 13–16, 2005.
- [19] J. Keane et al., "An On-Chip NBTI Sensor for Measuring PMOS Threshold Voltage Degradation," in Proc. ISLPED, pp. 189–194, 2007.
- [20] T.-H. Kim et al., "Silicon Odometer: An On-Chip Reliability Monitor for Measuring Frequency Degradation of Digital Circuits," IEEE Journal of Solid-State Circuits, vol. 43, pp. 874–880, April 2008.