# A High-Speed Variation-Tolerant Interconnect Technique for Sub-Threshold Circuits Using Capacitive Boosting

Jonggab Kil, Jie Gu, Member, IEEE, and Chris H. Kim, Member, IEEE

Abstract—This paper describes an interconnect technique for subthreshold circuits that improves global wire delay and reduces the delay variation due to process-voltage-temperature (PVT) fluctuations. By internally boosting the gate voltage of the driver transistors, the operating region is shifted from the subthreshold region to the super-threshold region, enhancing performance and improving tolerance to PVT variations. Simulations of a clock distribution network using the proposed driver show a 66%–76% reduction in $3\sigma$ clock skew value and an 84%–88% reduction in clock tree delay compared to using conventional drivers. A 0.4-V test chip has been fabricated in a 0.18-$\mu$m 6-metal CMOS process to demonstrate the effectiveness of the proposed scheme. Measurement results show 2.6× faster switching speed and 2.4× less delay sensitivity under temperature variations.

*Index Terms*—Capacitive boosting, clock distribution network, global wire delay, subthreshold circuits.

## I. INTRODUCTION

With aggressive CMOS scaling, on-chip global interconnects have become the bottleneck for high-speed circuit operation due to the increase in resistance–capacitance (RC) per length of minimum wires and near-constant die size. To mitigate the global interconnect delay problem, metal wires have been scaled in a selective fashion. The upper-layer metals remain thick and wide to reduce the wire resistance. The wire pitch is likewise not scaled as aggressively in these layers, in order to maintain a low inter-wire capacitance. This ensures a low RC value for global signals and power networks. The lower-layer metals, on the other hand, are scaled at approximately the same rate as the devices for the local interconnects. Low-k inter-dielectric materials and copper wires have been deployed for a one-time improvement in RC delay. The wire delay can be made proportional (instead of quadratic) to wire length using tapered wires and efficient buffer insertion techniques [1]–[3]. Despite the various process and circuit techniques for wire RC reduction, global wire delay will continue to be the performance limiter as the delay of logic and short interconnects continues to scale faster than that of global interconnects. Fig. 1 illustrates this trend, where the proportion of global interconnect delay with respect to total system delay rapidly increases with technology scaling for a constant die size.

Fig. 1. Scaling trend of logic delay and interconnect delay (source: International Technology Roadmap for Semiconductors [4]).

Digital Object Identifier 10.1109/TVLSI.2007.915455

## II. GLOBAL INTERCONNECT PROBLEM IN SUB-THRESHOLD CIRCUITS

Interconnect delay becomes even more problematic in VLSI systems operating at low supply voltages. The extreme case of this would be in subthreshold circuits where the supply voltage is even lower than the threshold voltage ($V_t$). Subthreshold operation can achieve orders of magnitude lower power consumption compared to conventional super-threshold operation and can be used in applications such as medical devices, portable electronics, and sensor networks where performance is of secondary importance [5]–[9]. By simply scaling down the supply voltage, undesirable characteristics of scaled CMOS, such as drain induced barrier lowering (DIBL), quantum mechanical gate tunneling, and punch through can also be alleviated.

Manuscript received November 26, 2006; revised May 24, 2007.

J. Kil is with Intel Corporation, Folsom, CA 95630 USA (e-mail: jonggab.kil@intel.com).

J. Gu and C. H. Kim are with the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455-0154 USA (e-mail: jiegu@umn.edu; chriskim@umn.edu).

Fig. 2. Simulation results on logic delay and global interconnect delay at different supply voltages (0.18-$\mu$m CMOS process). The logic delay is normalized to the global interconnect delay at each supply voltage.

To motivate our work, we next illustrate the global interconnect problem in subthreshold circuits. Fig. 2 shows the logic delay normalized to the global interconnect delay as the supply voltage is reduced from 1 to 0.2 V. Evidently, the logic delay relative to the global interconnect delay decreases at subthreshold voltages, so the interconnect delay dictates the overall system performance in subthreshold circuits. This trend is opposite to that observed in the super-threshold region, where the *RC* delay component, which is independent of supply voltage, makes the wire delay less sensitive to the supply voltage. The reason why the global wire delay increases more than the logic delay in subthreshold can be explained as follows. Assume a repeater with an equivalent output resistance of $R_{\text{driver}}$ is driving a wire with a resistance of $R_{\text{wire}}$ and a capacitance of $C_{\text{wire}}$. In the subthreshold region, the wire resistance $R_{\text{wire}}$ becomes negligible compared to the driver resistance $R_{\text{driver}}$ due to the small drive current. Hence, the $R_{\text{wire}}C_{\text{wire}}$ delay becomes negligible compared to the $R_{\text{driver}}C_{\text{wire}}$ delay, which dominates the interconnect delay. Next, we compare the delay of a repeater driving a long wire $(R_{\text{driver}}C_{\text{wire}})$ with that of a repeater driving another gate $(R_{\text{driver}}C_{\text{gate}})$. Both $R_{\text{driver}}C_{\text{wire}}$ and $R_{\text{driver}}C_{\text{gate}}$ contain the same $R_{\text{driver}}$ term, which makes both delays increase exponentially in the subthreshold region. However, $C_{\text{wire}}$ remains constant as the supply voltage is reduced down to the subthreshold region while $C_{\text{gate}}$ decreases significantly; the channel depletion capacitance appears in series with the oxide capacitance, resulting in a reduced MOS gate capacitance $C_{\text{gate}}$ [10]. Unlike the MOS gate capacitance $C_{\text{gate}}$, the wire capacitance $C_{\text{wire}}$ is independent of the supply voltage. As a result, the $R_{\text{driver}}C_{\text{wire}}$ delay of a wire increases more steeply than the $R_{\text{driver}}C_{\text{gate}}$ delay of a logic gate in subthreshold circuits, exacerbating the interconnect delay problem of global wires.
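The dominance of the $R_{\text{driver}}C_{\text{wire}}$ term can be illustrated with a first-order delay estimate. The sketch below assumes the common textbook approximation of 0.69 for a lumped driver resistance and 0.38 for a distributed wire; these coefficients and all component values (`R_WIRE`, `C_WIRE`, `R_DRIVER_SUPER`, `R_DRIVER_SUB`) are illustrative assumptions and are not taken from the simulations in this paper.

```python
def wire_delay_terms(r_driver, r_wire, c_wire):
    """Return (driver term, wire term) of a first-order 50% delay estimate, in seconds."""
    driver_term = 0.69 * r_driver * c_wire  # lumped driver resistance charging the wire
    wire_term = 0.38 * r_wire * c_wire      # distributed RC of the wire itself
    return driver_term, wire_term


# Hypothetical values, chosen only to show the trend.
R_WIRE = 500.0          # ohms
C_WIRE = 1e-12          # farads (1 pF)
R_DRIVER_SUPER = 1e3    # ohms, assumed super-threshold driver resistance
R_DRIVER_SUB = 1e6      # ohms, assumed subthreshold driver resistance


def wire_share(r_driver):
    """Fraction of the total delay contributed by the R_wire * C_wire term."""
    driver_term, wire_term = wire_delay_terms(r_driver, R_WIRE, C_WIRE)
    return wire_term / (driver_term + wire_term)


if __name__ == "__main__":
    print(f"super-threshold wire share: {wire_share(R_DRIVER_SUPER):.3f}")
    print(f"subthreshold wire share:    {wire_share(R_DRIVER_SUB):.6f}")
```

Under these assumed values the wire term is a visible fraction of the delay in super-threshold operation but becomes negligible once the driver resistance rises by orders of magnitude, which is the regime the text describes.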

With the global wire delay dictating the overall system performance, its variation also has a larger impact on system performance in subthreshold circuits. Due to the exponential relationship between weak-inversion current and the PVT parameters, performance and power dissipation of subthreshold circuits are exceedingly sensitive to PVT fluctuations. Note that random dopant fluctuation (RDF) becomes the main source of parameter fluctuation in subthreshold since DIBL is much reduced [11]. To build efficient subthreshold circuits with higher operating frequencies, it is crucial to minimize the global interconnect delay and its variation.

In this paper, we propose a high-speed variation-tolerant interconnect technique for subthreshold circuits based on capacitive boosting. The central idea is to shift the operating region of the final interconnect drivers from the subthreshold to the super-threshold region via bootstrapping techniques. Boosting techniques such as the one proposed in this paper are extremely effective for improving interconnect performance in subthreshold circuits since current is an exponential function of gate voltage. For example, a 100-mV boost in gate voltage can offer a 10× increase in drive current. Advanced interconnect techniques such as pulsed signaling, dynamic drivers, high-speed links, current mode circuits, and differential signaling have been previously studied by researchers [12]–[15]. However, none of these techniques can provide significant benefit in the subthreshold region due to the weak drive current and small $I_{\rm on}$-to-$I_{\rm off}$ ratio. Previous bootstrapping techniques have only been applied to improve the speed of super-threshold circuits and have limited benefits when applied to subthreshold circuits. Based on our investigation, a bootstrapped CMOS driver proposed by Lou et al. for highly capacitive loads [16] has worse performance than conventional repeaters in the subthreshold region because of the large charge sharing in the gates of the driver. Two diode-connected transistors were used in [17] for a bootstrapped inverter, but this simple structure has an intolerably long setup time and is sensitive to noise. The boosting circuit in [18] requires a long startup time in the subthreshold region and also suffers from performance loss due to the reverse leakage current from the boosted nodes.
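The quoted figure of a 10× current increase per 100 mV of gate boost corresponds to an exponential current model with an effective subthreshold swing of about 100 mV per decade. The sketch below encodes only that relationship; the default swing value is inferred from the sentence above and is an assumption, and the model applies only while the device remains in weak inversion.

```python
def subthreshold_current_gain(boost_v, swing_v_per_decade=0.1):
    """Drive-current multiplier for a gate-voltage boost in weak inversion.

    boost_v: increase in gate-to-source voltage, in volts.
    swing_v_per_decade: assumed subthreshold swing (0.1 V/decade matches
        the "100 mV gives 10x" figure quoted in the text).
    """
    return 10.0 ** (boost_v / swing_v_per_decade)


if __name__ == "__main__":
    for boost_mv in (50, 100, 200):
        gain = subthreshold_current_gain(boost_mv / 1000.0)
        print(f"{boost_mv} mV boost -> {gain:.1f}x current")
```

Once the boosted gate voltage exceeds the threshold voltage, the current no longer grows exponentially, so this expression overestimates the gain for large boosts.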

In this paper, we will introduce circuit techniques to achieve high boosting efficiency, minimum static power consumption, and a half-cycle startup time. The proposed scheme was applied to a clock distribution network to verify its effectiveness in reducing the clock skew and clock tree delay. A 0.4-V, 0.18-$\mu$m test chip was successfully fabricated and tested. Measurement results show 2.6× higher switching speed and 2.4× reduced delay sensitivity under temperature variations.

The remainder of this paper is organized as follows. Section III describes the proposed interconnect technique using capacitive boosting. A clock distribution network using the proposed driver is presented in Section IV. Section V shows the test chip implementation and measurement data that demonstrate the effectiveness of the proposed technique. Finally, conclusions are drawn in Section VI.

## III. PROPOSED SUBTHRESHOLD INTERCONNECT TECHNIQUE

### A. Conceptual Idea

Fig. 3 shows the principle of the proposed subthreshold interconnect technique, which uses capacitive boosting. Note that the supply voltage is lower than $V_t$. The input signal is boosted to $2V_{\text{DD}}$ for the nMOS driver and to $-V_{\text{DD}}$ for the pMOS driver using internal gate capacitors. Owing to the exponential behavior of current in the subthreshold region, $100\times$ higher operating current can be expected with the $V_{\text{DD}}$ boost in driver voltage, as shown in Fig. 3. Note that for the same amount of voltage boost, the increase in drive current will be significantly less in super-threshold circuits. Hence, voltage boosting offers greater speed benefits in the subthreshold region, where current is an exponential function of the gate-to-source voltage. The improvement in drive current comes at the expense of leakage current since the gate input to N5 (or P5) must be preset to $V_{\text{DD}}$ (or 0 V) before the boosting occurs. Despite the static power overhead, which may look unreasonably high ($\sim$4 orders of magnitude increase), test chip measurements in Section V show that the proposed interconnect driver can offer lower power consumption than the conventional buffers in applications with high activity factors. The improvement in power consumption using the proposed technique is due to: 1) the lower operating voltage and 2) the reduction in short circuit current of the load gate owing to the improved signal edge rate (see explanation for Fig. 7). The static power overhead can be further reduced by using minimum sized transistors P5 and N5 in Fig. 3 since the operating current is already increased significantly via the boosting technique.

Fig. 3. Concept of proposed subthreshold interconnect driver using boosting technique (top). Effectiveness of boosting technique in subthreshold region (bottom).

### B. Circuit Design

Circuit implementation and operating waveforms of the proposed interconnect driver operating at 0.4 V are shown in Fig. 4. $V_{\text{BOOST}-N}$ and $V_{\text{BOOST}-P}$ are boosted by capacitors $C_1$ and $C_2$ to increase the operating currents of N5 and P5, which drive the long *RC* wire. Circuit operation for boosting the gate voltage of N5 is as follows (boosting operation of P5 is similar to that of N5). In order for $V_{\text{IN}-\text{BAR}}$ to offset the voltage level of $V_{\text{BOOST}-N}$ using $C_1$, $V_{\text{BOOST}-N}$ is preset to 0.4 V before $V_{\text{IN}-\text{BAR}}$ makes the low-to-high transition. This is realized by P4, which connects $V_{\text{BOOST}-N}$ to 0.4 V while $V_{\text{IN}-\text{BAR}} = 0$ V. After the low-to-high transition of $V_{\text{IN}-\text{BAR}}$, P4 is cut off so that the boosted voltage $V_{\text{BOOST}-N}$ stays at 0.7 V while N5 is driving the *RC* interconnect. Due to the parasitic capacitance on the node $V_{\text{BOOST}-N}$, the boosting voltage does not reach the ideal 0.8 V value. The preset signal generator (PSG) circuit generates a $V_{\text{PRESET}-N}$ of -0.25 V using $C_4$ during the preset of $V_{\text{BOOST}-N}$. This enables a fast preset by overdriving P4, minimizing the startup time despite the low drive current. On the other hand, a $V_{\text{PRESET}-N}$ of 0.7 V is generated by the PSG circuit during the boosting operation by connecting $V_{\text{BOOST}-N}$ to $V_{\text{PRESET}-N}$ via P3. This eliminates the reverse current through P4, which can adversely discharge the boosted voltage.

Fig. 4. Circuit implementation and operation waveforms of the proposed interconnect driver for $V_{\rm DD}$ = 0.4 V.

Fig. 5. Simulation waveforms of the proposed interconnect driver.

Fig. 6. Delay and power consumption comparisons between the conventional and proposed drivers. The static power component includes the leakage current and the static current of the boosting circuit.

Fig. 5 shows the simulated waveforms of the proposed and conventional schemes driving a 1-pF load. The conventional scheme uses two progressively sized inverters to drive the load. The switching speed of the proposed driver is $2.6\times$ faster than that of the conventional driver at 0.4 V. Fig. 6 compares the delay of the conventional and proposed drivers for various capacitive loads. For fair comparison, the input capacitances of the two drivers were set to be the same. The switching speed improved by at least $2.5\times$ across a wide range of output loads. This indicates that the delay overhead associated with the boosting operation is small and that most of the delay comes from driving the long RC interconnect. Also shown in Fig. 6 is the comparison of total power and static power consumption at 0.4 V. Here, the static power component includes the leakage current and the static current of the boosting circuit. The conventional circuit was not able to function properly above 4 MHz. At 1 MHz, the proposed circuit consumes 44% more power due to the static current, which takes up 68% of the total power consumption. However, at 4 MHz, the proposed driver has 5% lower power consumption than the conventional driver. This is due to the fact that the proposed driver has a reduced short circuit power in the load inverter owing to the better edge rate for the same operating frequency. Note that the edge rate of the proposed driver is sharper than that of the conventional driver because part of the proposed driver delay comes from the internal control stages for the boosting operation. Hence, the proposed scheme consumes less power than the conventional repeater at higher clock frequencies owing to the greater savings in short circuit current compared to the static current penalty. The delay and energy-per-switching of the conventional and proposed drivers are compared in Fig. 7 for a supply voltage range from 0.2 to 0.5 V. Simulation results show that the technique is effective for speed improvement down to 0.2 V. The proposed circuit shows higher improvement in energy-per-switching at lower supplies since the large short circuit current component in this regime is significantly reduced due to the better edge rate of the proposed circuit. The impact of temperature and activity factor on the dynamic and static energy-per-switching is displayed in Fig. 8. Here, the circuits driving a 1-pF wire were simulated at their maximum operating frequencies at each supply voltage. The static energy-per-switching of the conventional driver is reduced at higher supply voltages since the switching cycle is reduced. On the other hand, the static energy component increases rapidly in the proposed circuit at higher temperatures and higher supply voltages, which negatively impacts its energy efficiency. At low supply voltages, however (e.g., sub-0.4 V), the static energy-per-switching is manageable even at relatively high temperatures (e.g., 50 °C) and low activity factors (e.g., 0.1), which makes our proposed circuit still effective. It is important to note that the temperature of subthreshold systems is significantly lower than the worst case temperature we see in current-day microprocessors.

Fig. 7. Delay and energy-per-switching of the conventional and proposed drivers at different supply voltages.

Fig. 8. Dynamic and static power dissipation of the conventional and proposed drivers for different temperatures. Results are shown for activity factors of 1 and 0.1. The drivers were clocked at their respective maximum operating frequencies for each supply voltage.

Fig. 9. Standby mode operation of the proposed circuit.

During standby mode where there is no switching activity in the input of the proposed driver, the internal boosted voltages cannot be sustained because of the parasitic leakage currents. This makes any type of boosting circuit require a startup time penalty. Fig. 9 indicates the discharge or charge leakage path of the boosted voltages during standby mode through devices P4 or N4. Fortunately, leakage current in the subthreshold region is significantly reduced due to the reverse-DIBL effect, extending the retention time of the boosted voltages. Simulation results show that it takes more than 200 inactive cycles for the boosted nodes to discharge for $V_{\rm DD} = 0.4$ V, 4 MHz, and 20 °C. For long inactive periods, our circuit requires a half cycle startup time to set up the internal voltages prior to the boosting operation. Keepers are used to restore the output voltage levels during the long inactive periods.

Fig. 10. Definition of boosting efficiency.

One additional issue with the proposed driver is the low  $I_{\rm on}$ -to- $I_{\rm off}$  ratio as the boosted voltage makes the transistor operate in the strong inversion region. For a supply voltage of 0.4 V, simulations show that the worst case  $I_{\rm on}$ -to- $I_{\rm off}$  ratio of the driver stage is  $34 \times$  under 50 mV  $V_t$  variation. At lower supply voltages, this ratio is improved despite the smaller voltage swing as the transistor with the boosted gate voltage falls back into the subthreshold region (e.g.,  $52 \times$  at 0.3 V). In this case, however, the delay variability of the driver increases as the boosted voltage is not high enough to keep the driver transistor in strong inversion mode. This tradeoff between  $I_{\rm on}$ -to- $I_{\rm off}$  ratio and driver variability should be analyzed during the circuit design phase.

### C. Boosting Efficiency in Subthreshold Region

The boosting efficiency is defined in Fig. 10 as the ratio between the boosting capacitance $(C_{\text{boost}})$ and the total capacitance $(C_{\text{boost}} + C_{\text{node}})$, which consists of the boosting capacitance and the node capacitance. The boosting capacitance is implemented using a MOS capacitor and the node capacitance consists of the gate capacitances of P5, N5, P4, and N4, as well as the junction capacitances of all the other devices attached to the boosted node. To obtain a high drive current via efficient boosting, the boosting capacitance implemented using a MOS capacitor must be significantly larger than the node capacitance. Unfortunately, the boosting MOS capacitance reduces in the subthreshold region since the depletion capacitance appears in series with the oxide capacitance in a weak-inversion device. Note that in the super-threshold region, the inversion layer has a shielding effect which makes the gate capacitance equivalent to the oxide capacitance. Fig. 11 (left) shows the reduction in boost capacitance in the subthreshold region. There is also a minor reduction in node capacitance since part of it consists of MOS gate capacitance of the drivers (P5, N5, P4, N4). To achieve a high boosting efficiency in our design, 40% of the total driver area is dedicated to boosting capacitors. N5 and P5, which can make up 50% of the total node capacitance, are also minimized. Fig. 11 (right) verifies the boosting efficiency of the proposed circuit at different voltages. The boosting efficiency is maximized at 0.6 V and reduces to 59% at 0.3 V, mainly due to the reduction in boosting capacitance in the weak-inversion region. The gate capacitances (P5, N5, P4, N4) that make up part of the node capacitance also decrease at lower supply voltages. However, the increase in junction capacitances offsets this change, making the node capacitance relatively constant with respect to supply voltage. At 0.4-V operation, a boosting efficiency of 70% was achieved, which is sufficient for a significant drive current boost.

Fig. 11. Boosting capacitance, node capacitance, and boosting efficiency at different voltages across MOS capacitors.

Fig. 12. Rise and fall delay with respect to voltage and temperature variation. Delay variation is defined as the maximum delay to minimum delay ratio.
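The boosting-efficiency definition in Section III-C can be written as a short calculation. The function `boosted_gate_voltage` below is a first-order capacitive-divider assumption (the node is preset to $V_{\text{DD}}$ and then coupled upward by the efficiency times $V_{\text{DD}}$); it is not an expression given in the paper, and the capacitance values in the usage lines are hypothetical.

```python
def boosting_efficiency(c_boost, c_node):
    """Ratio of boosting capacitance to total capacitance on the boosted node."""
    return c_boost / (c_boost + c_node)


def boosted_gate_voltage(v_dd, efficiency):
    """Assumed first-order estimate: preset level V_DD plus the coupled swing."""
    return v_dd + efficiency * v_dd


if __name__ == "__main__":
    # Hypothetical capacitances (arbitrary units) giving a 70% efficiency.
    eta = boosting_efficiency(c_boost=7.0, c_node=3.0)
    print(f"efficiency = {eta:.2f}")
    print(f"boosted gate voltage at 0.4 V = {boosted_gate_voltage(0.4, eta):.2f} V")
```

With the 70% efficiency reported at 0.4 V, this estimate gives about 0.68 V, close to the 0.7 V boosted level described in Section III-B and below the ideal 0.8 V.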

### D. Sensitivity to PVT Variation

In addition to the speed benefit, the proposed interconnect technique reduces the impact of PVT variation on global interconnect performance. Fig. 12 shows the rise delay and fall delay under supply (0.4 V $\pm$ 5%) and temperature (20 °C–80 °C) variations. The conventional driver in the subthreshold region is highly sensitive to voltage and temperature variation since the drive current is an exponential function of the PVT parameters. This results in $5.4\times$ ($4.8\times$) variation in rise (fall) delay. The switching speed of the proposed interconnect varies significantly less even for the worst case corner conditions because the driver transistors are no longer in the subthreshold region. Delay variation (i.e., maximum delay to minimum delay ratio) is reduced from $5.4\times$ to $2.6\times$ for the rise delay, and from $4.8\times$ to $2.8\times$ for the fall delay. The impact of temperature and supply voltage variation on interconnect delay is shown separately in Fig. 13. It verifies the reduced impact of environmental variation on circuit performance for the proposed technique. Similarly, the proposed circuit has less delay variation with respect to $V_t$ fluctuation, which primarily comes from RDF in the subthreshold region [11].

Fig. 13. Delay variation (i.e., maximum delay to minimum delay ratio) under temperature and supply voltage variation.
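The delay-variation metric used in Section III-D is simply the maximum-to-minimum delay ratio over the simulated corners. The sketch below states that definition and the resulting improvement factor; the helper names are illustrative, and only the ratios 5.4, 2.6, 4.8, and 2.8 come from the text.

```python
def delay_variation(delays):
    """Maximum-to-minimum delay ratio over a set of corner delays."""
    return max(delays) / min(delays)


def variation_reduction(conventional_ratio, proposed_ratio):
    """Factor by which the delay-variation ratio shrinks."""
    return conventional_ratio / proposed_ratio


if __name__ == "__main__":
    print(f"rise: {variation_reduction(5.4, 2.6):.2f}x smaller variation")
    print(f"fall: {variation_reduction(4.8, 2.8):.2f}x smaller variation")
```

For the simulated ratios quoted above, this works out to roughly a 2.1× reduction in rise-delay variation and a 1.7× reduction in fall-delay variation.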

## IV. PROPOSED INTERCONNECT TECHNIQUE

The proposed interconnect technique is applied to a realistic clock distribution network [19] to validate the effectiveness in reducing the clock skew caused by PVT variations. Clock buffers are ideal applications for the proposed interconnect technique as the dynamic power dominates the total clock power due to the high activity factor. Fig. 14 shows the simplified H-tree clock network topology where the clock signal paths are symmetrically routed across the chip for identical delays from clock source to the final load. The hierarchical topology combines four clock buffer stages with each buffer driving four clock buffers as well as the long interconnects. Clock skew is the signal arrival time difference between two different locations in a die. An ideal H-tree should be perfectly matched and will have zero clock skew without any within-die PVT variations. The worst case clock skew occurs when two clock paths are experiencing the extreme opposite PVT conditions. However, this would lead to unrealistically pessimistic estimates on clock skew [20], [21]. Hence, in this paper, we follow the clock skew distribution analysis based on Monte Carlo simulations assuming that the local supply voltage and $V_t$'s are Gaussian random variables [19]. The supply voltage was varied for each single gate and the $V_t$ was varied for each single transistor. To simplify the analysis, we did not consider the systematic variation component.

Fig. 14. Subthreshold clock network and clock skew distribution comparison.

By applying the proposed driver, we can achieve an $8.3\times$ reduction in the $3\sigma$ clock skew value, as shown in Fig. 14. The $3\sigma$ values of the supply voltage $(V_{DD})$ and $V_t$ for the simulations were assumed to be 5% of their nominal values. Although a 5% variation in $V_{DD}$ and $V_t$ may seem too optimistic, these assumptions are acceptable for subthreshold circuits for the following two reasons. First, the IR and L di/dt supply noise is much reduced. Second, the $V_t$ variation component due to DIBL is reduced in the subthreshold region, leaving just the RDF component [11]. The RDF component is further reduced in the clock buffers as large transistors are used to drive the interconnect load. The average clock tree delay was reduced from 251 to 51 ns and the standard deviation was reduced from 30.7 to 5.1 ns. Fig. 15 summarizes the properties of the proposed and conventional subthreshold clock networks at various temperatures. Energy-per-switching decreases by 9% for the proposed driver at 80 °C due to the reduced short-circuit current and improved performance. The clock tree delay and clock skew were reduced by more than 76% and 88%, respectively. The delay improvement was greater than that shown in Fig. 6 because the boosting efficiency was improved via optimal device sizing.
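The Monte Carlo procedure described above can be sketched with a toy delay model. This is not the circuit-level simulation used in the paper and will not reproduce its numbers: the exponential subthreshold current, the alpha-power law for the boosted driver, and the constants `N_VT` and `ALPHA` are all assumptions made for illustration. Only the nominal supply (0.4 V), the threshold voltage (0.51 V), the 5% $3\sigma$ spread, the 70% boosting efficiency, and the four buffer stages are taken from the text.

```python
import math
import random
import statistics

V_DD, V_T = 0.4, 0.51      # nominal supply and threshold voltage from the paper (V)
SIGMA_FRAC = 0.05 / 3      # 3-sigma equals 5% of nominal, as stated in the text
N_VT = 1.5 * 0.026         # ASSUMED slope factor times thermal voltage (V)
ETA = 0.7                  # boosting efficiency reported at 0.4 V
ALPHA = 1.3                # ASSUMED alpha-power-law exponent
STAGES = 4                 # clock buffer stages in the H-tree


def stage_delay(boosted, rng):
    """Toy delay of one buffer stage with Gaussian V_DD and V_t (arbitrary units)."""
    v_dd = rng.gauss(V_DD, SIGMA_FRAC * V_DD)
    v_t = rng.gauss(V_T, SIGMA_FRAC * V_T)
    if boosted:
        v_gs = (1 + ETA) * v_dd                      # gate boosted above V_DD
        current = max(v_gs - v_t, 1e-3) ** ALPHA     # assumed super-threshold model
    else:
        current = math.exp((v_dd - v_t) / N_VT)      # assumed weak-inversion model
    return v_dd / current                            # delay ~ C * V_DD / I with C = 1


def path_delay(boosted, rng):
    """Delay from clock source to one leaf: sum of independent stage delays."""
    return sum(stage_delay(boosted, rng) for _ in range(STAGES))


def relative_3sigma_skew(boosted, trials=2000, seed=1):
    """3-sigma skew between two leaf paths, normalized to the mean path delay."""
    rng = random.Random(seed)
    skews, delays = [], []
    for _ in range(trials):
        a, b = path_delay(boosted, rng), path_delay(boosted, rng)
        skews.append(a - b)
        delays.extend((a, b))
    return 3 * statistics.pstdev(skews) / statistics.mean(delays)


if __name__ == "__main__":
    print("conventional:", relative_3sigma_skew(boosted=False))
    print("boosted:     ", relative_3sigma_skew(boosted=True))
```

Normalizing the skew to the mean path delay keeps the two cases comparable even though the toy model uses arbitrary current units. The qualitative outcome, a smaller relative skew once the driver leaves the exponential regime, follows from the assumed models rather than from any fitted device data.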

## V. TEST CHIP MEASUREMENTS

A test chip was fabricated in a 0.18-$\mu$m 6-metal CMOS process to demonstrate the effectiveness of the proposed subthreshold interconnect scheme. The $V_t$'s of the nMOS and pMOS were 0.51 and -0.51 V, respectively. The die photo of the test chip and the layout comparisons are shown in Fig. 16. Although the device count for the proposed driver has gone up to 18 (the device count for the conventional repeater is 4), the increase in layout area was only 32% since the transistors can be minimized thanks to the higher operating current using the proposed boosting technique. The four MOS capacitors for the boosting operation occupy 40% of the total driver layout area.

Fig. 15. Clock power dissipation, clock tree delay, and clock skew comparison between conventional and proposed driver at 0.4 V.

Fig. 16. Chip microphotograph and driver layout comparison. The test chip was fabricated in a 0.18-$\mu$m 6-metal CMOS process.



Fig. 17. Organization of the test chip for differential delay measurements. Level converters are designed for the interface between subthreshold and super-threshold circuits.



Fig. 18. Measured waveforms (input trigger signal and output waveforms from the three interconnect paths) from the test chip.

Eight stages of conventional drivers and proposed drivers were implemented with each stage driving a 10-mm-long on-chip wire (see Fig. 17). To improve the accuracy of the delay measurement, a differential method was used to obtain the delay difference between each of the two driver paths and a bypass path. The peripheral and input/output (I/O) circuit delay is cancelled out by subtracting the delay of the bypass path $(t_{bypass})$ from the delay of the other two paths $(t_{conv} + t_{bypass}, t_{proposed} + t_{bypass})$. The core, peripheral, and I/O circuits operate at 0.4, 1.0, and 1.8 V, respectively (i.e., $V_{DD}$\_CORE = 0.4 V, $V_{DD}$\_PER = 1.0 V). Level converters were employed for the interface between subthreshold and super-threshold circuits. The subthreshold level-down converter contains a pull-up nMOS N6 to speed up the low-to-high transition in the internal node, A1. A dual-rail level-up converter was designed with an extra pMOS switch P7 (or P8) to reduce the contention current between P9 (or P10) and N7 (or N8).

Fig. 18 shows the measured waveforms from the three different paths together with the input trigger signal. The core signal operating at 0.4 V is level up-converted to a 1.8-V I/O signal. Delay of the proposed driver was 0.18 $\mu$s, which is 2.6× shorter than that of a conventional driver. Delay of the bypass path was 0.13 $\mu$s, which is comparable to the core interconnect delay. The delay improvement is less than the ideal amount of boost in operating current shown in Fig. 3 due to the following reasons: 1) minimum size driver transistors for reducing leakage power; 2) extra number of logic stages required in the boosting circuit; and 3) reduced boosting effect because of the node capacitance. Fig. 19 shows measured delay improvements of the proposed interconnect technique at different core voltages. The delay improvement is 1.7–1.8× at 0.5 V and 2.6–2.9× at 0.4 V. This confirms that the proposed boosting technique becomes more efficient at low supply voltages due to the exponential current behavior in the subthreshold regime.

Fig. 19. Rise and fall delay measurement data for different core voltages indicating $2.6-2.9\times$ improvement in interconnect speed.
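The differential measurement method amounts to subtracting the bypass-path delay from each measured path delay. In the sketch below, the bypass delay (0.13 µs) and the proposed-driver delay (0.18 µs) are the values quoted in the text; the raw path delays `T_PROPOSED_PATH_US` and `T_CONV_PATH_US` are reconstructed from those values and the stated 2.6× ratio, so they are assumptions and not separately reported measurements.

```python
def core_delay(path_delay_us, bypass_delay_us):
    """Remove the peripheral and I/O delay common to both paths (microseconds)."""
    return path_delay_us - bypass_delay_us


T_BYPASS_US = 0.13                    # quoted bypass-path delay
T_PROPOSED_PATH_US = 0.18 + 0.13      # reconstructed: driver delay plus bypass delay
T_CONV_PATH_US = 2.6 * 0.18 + 0.13    # reconstructed from the stated 2.6x ratio

t_proposed = core_delay(T_PROPOSED_PATH_US, T_BYPASS_US)
t_conv = core_delay(T_CONV_PATH_US, T_BYPASS_US)
speedup = t_conv / t_proposed

if __name__ == "__main__":
    print(f"proposed driver delay:     {t_proposed:.2f} us")
    print(f"conventional driver delay: {t_conv:.3f} us")
    print(f"speedup:                   {speedup:.1f}x")
```

Because the bypass delay is comparable to the core delay, comparing the raw path delays directly would understate the improvement, which is why the subtraction is applied before taking the ratio.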

Fig. 20 shows the measured delay variation and power dissipation of the proposed and conventional interconnect drivers. The delay variations (i.e., maximum delay to minimum delay ratio) of the conventional and proposed drivers were $2.6\times$ and $1.7\times$, respectively, for a temperature range of 20 °C–80 °C. Delay of the proposed driver is less sensitive to PVT variations since the driver transistors are no longer operating in the subthreshold region due to the boosted gate voltages. Measured energy-per-switching is shown in Fig. 20 (bottom) for the conventional and proposed drivers. Due to the leakage power during the preset of the gate voltages, energy-per-switching of the proposed driver increases by 12% when operated at 0.4 V. The leakage current of the proposed boosting technique reduces exponentially in the deep subthreshold region, as shown in Fig. 20. Power dissipation comparison between the conventional and proposed buffers is shown in Fig. 21 for different operating frequencies. Measurement results show good agreement with the simulation data. The proposed driver consumes 41% less power (or 49%+ higher performance) compared to the conventional driver for 4 MHz (or 5.1 $\mu$W) operation since the boosting technique allows the proposed driver to run at a lower supply voltage for the same frequency of operation.

Fig. 20. Measurement data from test chip. Delay variation with respect to temperature (top). Energy-per-switching comparison between conventional and proposed interconnect driver (bottom).

Fig. 21. Power consumption versus frequency. (Measurement data is shown in dots and simulation data is shown in lines).

## VI. CONCLUSION

Digital subthreshold logic is becoming increasingly popular for ultra-low power applications where performance is of secondary importance. Global interconnect drivers used for on-chip buses and clock distribution networks suffer significantly from performance degradation in this regime. This is because, unlike gate capacitance, wire capacitance does not scale as the supply voltage is lowered. Another issue with subthreshold interconnects is the large variability in performance under PVT variations; drive current in the subthreshold region is an exponential function of $V_t$, supply voltage, and temperature. In this paper, we proposed a capacitive boosting technique that can mitigate the performance and variability issues in subthreshold interconnects. Owing to the exponential relationship between current and gate-to-source voltage in the subthreshold region, a 100-mV boost in gate voltage can offer a 10× improvement in drive current. This makes boosting techniques extremely effective for global interconnect drivers. Monte Carlo simulations for a realistic clock distribution network using the proposed drivers show a 66%–76% reduction in worst case clock skew and an 84%–88% reduction in clock tree delay at 0.4 V. A test chip was fabricated in a 0.18-$\mu$m 6-metal CMOS process to demonstrate the proposed ideas. Measurement results show 41% less power consumption for the same performance (or 49%+ higher performance for the same power consumption) with $2.4\times$ reduced delay sensitivity under temperature variations.

#### REFERENCES

- [1] K. Yamashita and S. Odanaka, "Interconnect scaling scenario using a chip level interconnect model," *IEEE Trans. Electron Devices*, vol. 47, no. 1, pp. 90–96, Jan. 2000.
- [2] D. Sylvester, C. Hu, O. S. Nakagawa, and S.-Y. Oh, "Interconnect scaling: Signal integrity and performance in future high-speed CMOS designs," in *Symp. VLSI Technol. Dig. Tech. Papers*, Jun. 1998, pp. 42–43.
- [3] S. Dhar and M. A. Franklin, "Optimum buffer circuits for driving long uniform lines," *IEEE J. Solid-State Circuits*, vol. 26, no. 1, pp. 32–40, Jan. 1991.
- [4] ITRS, "International Technology Roadmap for Semiconductors," [Online]. Available: http://www.public.itrs.net/
- [5] C. H. Kim and K. Roy, "Ultra-low power DLMS adaptive filter for hearing aid applications," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 11, no. 6, pp. 352–357, Dec. 2003.
- [6] T. Kim, H. Eom, J. Keane, and C. H. Kim, "A high-density sub-threshold SRAM with data-independent bitline leakage and virtual ground replica scheme," in *Proc. Int. Solid-State Circuits Conf.*, Feb. 2007, pp. 330–331.
- [7] A. Wang and A. P. Chandrakasan, "A 180 mV FFT processor using subthreshold circuit techniques," in *Proc. Int. Solid-State Circuits Conf.*, Feb. 2004, pp. 292–293.
- [8] B. H. Calhoun and A. P. Chandrakasan, "Ultra-dynamic voltage scaling using sub-threshold operation and local voltage dithering in 90 nm CMOS," in *Proc. Int. Solid-State Circuits Conf.*, Feb. 2005, pp. 300–302.
- [9] B. H. Calhoun, A. Wang, and A. P. Chandrakasan, "Modeling and sizing for minimum energy operation in subthreshold circuits," *IEEE J. Solid-State Circuits*, vol. 40, no. 9, pp. 1778–1786, Sep. 2005.
- [10] Y. Taur and T. Ning, Fundamentals of Modern VLSI Devices. Cambridge, U.K.: Cambridge University Press, 1998.
- [11] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, "Analysis and mitigation of variability in subthreshold design," in *Proc. Int. Symp. Low Power Electron. Des.*, Aug. 2005, pp. 20–25.
- [12] N. Tzartzanis and W. W. Walker, "Differential current-mode sensing for efficient on-chip global signaling," *IEEE J. Solid-State Circuits*, vol. 40, no. 11, pp. 2141–2147, Nov. 2005.
- [13] A. P. Jose, G. Patounakis, and K. L. Shepard, "Pulsed current-mode signaling for nearly speed-of-light intrachip communication," *IEEE J. Solid-State Circuits*, vol. 41, no. 4, pp. 772–780, Apr. 2006.
- [14] M. Khellah, J. Tschanz, Y. Ye, S. Narendra, and V. De, "Static pulsed bus for on-chip interconnects," in *Proc. VLSI Circuits Symp.*, Jun. 2002, pp. 78–79.

- [15] R. Bashirullah, W. Liu, R. Cavin, and E. Edwards, "A 16 Gb/s adaptive bandwidth on-chip bus based on hybrid current/voltage mode signaling," *IEEE J. Solid-State Circuits*, vol. 41, no. 2, pp. 461–473, Feb. 2006.
- [16] J. H. Lou and J. B. Kuo, "A 1.5 V full-swing bootstrapped CMOS large capacitive-load driver circuit suitable for low-voltage CMOS VLSI," *IEEE J. Solid-State Circuits*, vol. 32, no. 1, pp. 119–121, Jan. 1997.
- [17] S. Y. Choe and G. A. Rigby, "A 1 V bootstrapped CMOS digital logic family," in *Proc. Euro. Solid-State Circuits Conf.*, Sep. 1997, pp. 352–355.
- [18] J. H. T. Chen and J. B. Kuo, "Ultra-low-voltage SOI CMOS inverting driver circuit using effective charge pump based on bootstrap technique," *Electron. Lett.*, vol. 39, pp. 183–185, Jan. 2003.
- [19] D. Harris and S. Naffziger, "Statistical clock skew modeling with data delay variations," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 9, no. 6, pp. 888–898, Dec. 2001.
- [20] M. Shoji, "Elimination of process-dependent clock skew in CMOS VLSI," *IEEE J. Solid-State Circuits*, vol. 21, no. 5, pp. 875–880, Oct. 1986.
- [21] G. Geannopoulos and X. Dai, "An adaptive digital deskewing circuit for clock distribution networks," in *Proc. Int. Solid-State Circuits Conf.*, Feb. 1998, pp. 400–401.



**Jonggab Kil** received the B.S. degree in electronics engineering from Kookmin University, Seoul, Korea, in 2003, and the M.S. degree from University of Minnesota, Minneapolis, in 2006.

He is currently a Circuit Designer with Intel Corporation, Folsom, CA. His current research interests include circuit and architecture approaches for low power and high performance.



**Jie Gu** (M'06) received the B.S. degree from Tsinghua University, Beijing, China, and the M.S. degree from Texas A&M University, College Station, in 2001 and 2003, respectively. He is currently pursuing the Ph.D. degree in electrical and computer engineering at the University of Minnesota, Minneapolis.

He was an intern with Texas Instruments, where he was involved in power management and timing analysis for wireless ICs. His research interests include power integrity for digital and mixed-signal ICs, statistical modeling of nanometer devices, and circuits.



**Chris H. Kim** (M'04) received the B.S. degree in electrical engineering and the M.S. degree in biomedical engineering from Seoul National University, Seoul, Korea, in 1998 and 2000, respectively, and the Ph.D. degree in electrical and computer engineering from Purdue University, West Lafayette, IN.

After a short stint with Intel Circuit Research Lab, he joined the Electrical and Computer Engineering Department, University of Minnesota, Minneapolis, in 2004. He is a coauthor of over 40 journal and conference papers and serves as a technical program committee member for ISLPED, ASSCC, ICCAD, ISQED, and ICICDT. His current research interests include theoretical and experimental aspects of VLSI circuit design in nanoscale technologies.

Dr. Kim was a recipient of the 2006/2007 IBM Faculty Partnership Award, the 2005 IEEE Circuits and Systems Society Outstanding Young Author Award, the 2005 ISLPED Low Power Design Contest Award, the 2003 Intel Ph.D. Fellowship Award, and the 2001 Magoon's Award for Excellence in Teaching.