# A Systematic Approach for Analyzing and Optimizing Cell-Internal Signal Electromigration

Gracieli Posser Universidade Federal do Rio Grande do Sul - PPGC Porto Alegre, RS, Brazil Email: gposser@inf.ufrgs.br Vivek Mishra University of Minnesota Minneapolis, MN, USA Email: vivek@umn.edu

Palkesh Jain Texas Instruments India Bangalore, India Email: palkesh@ti.com Ricardo Reis Universidade Federal do Rio Grande do Sul - PPGC Porto Alegre, RS, Brazil Email: reis@inf.ufrgs.br Sachin S. Sapatnekar University of Minnesota Minneapolis, MN, USA Email: sachin@umn.edu

Abstract—Electromigration (EM) in on-chip metal interconnects is a critical reliability failure mechanism in nanometer-scale technologies. This work addresses the problem of EM on signal interconnects within a standard cell. An approach for modeling and efficient characterization of cell-internal EM is developed, incorporating Joule heating effects, and is used to analyze the lifetime of large benchmark circuits. Further, a method for optimizing the circuit lifetime using minor layout modifications is proposed.

# I. INTRODUCTION

Electromigration (EM) is an increasing on-chip concern in future technologies [1]. EM is initiated by current flow through metal wires and may cause open-circuit failures over time. Traditionally, EM has been a significant concern in power delivery networks, which largely experience unidirectional current flow. Of late, two new issues have emerged. First, EM has become increasingly important in signal wires, where the direction of current flow is bidirectional. This is due to increased current densities and Joule heating effects that accelerate EM [2], which depends exponentially on temperature. Second, traditional EM analysis has focused on higher metal layers. However, with shrinking wire dimensions and increasing currents, the current densities in lower metal layers are also now in the range where EM effects are visible. EM effects are visible at current densities of about 1MA/cm<sup>2</sup>, and such current densities are seen in the internal metal wires of standard cells, resulting in cell-internal signal EM [3]. These high current densities arise because local interconnect wires within standard cells typically use low wire widths to ensure compact cell layouts. However, the current that flows through these wires to charge/discharge the output load can be large enough to create significant EM effects over the lifetime of the chip.

Such high current densities are seen in the cell library used in our work, e.g., wires in an INV\_X4 cell have an effective average current density of 1.08 MA/cm<sup>2</sup> at 1GHz. This switching rate is very realistic, and can be seen in clock buffers in almost any modern design, as well as in cells that switch at 25% probability in a 4GHz design. While the cell-internal signal EM problem is described in industry publications such as [3], its efficient analysis is an open problem.

In this work, we study the problem of systematically analyzing cell-internal signal EM. We devise a solution that facilitates the analysis and optimization of cell-internal signal EM for a standard cell library based design. We first develop an approach to *efficiently characterize* cell-internal EM over all output pin locations within a cell, incorporating Joule heating effects into our analysis. We then formulate the *pin optimization problem* so that cell output pins are chosen during place-and-route so as to maximize the design lifetime.

Figure 1. (a) The layout and output pin position options for INV\_X4. Charge/discharge currents when the output pin is at (b) node 4 and (c) node 3. The red [blue] lines represent rise [fall] currents.

We motivate the problem using the INV\_X4 (inverter with size 4) cell, shown in Fig. 1(a), from the 45nm NANGATE library [4]. The input signal A is connected to the polysilicon structure. The layout uses four parallel transistors for the pull-up (poly over p-diffusion, upper half of the figure) and four for the pull-down (poly over n-diffusion, lower half of the figure), and the output signal can be tapped along the H-shaped metal net in the center of the cell. The positions where the output pin can be placed are numbered 1 through 7, and the edges of the structure are labeled $e_1$ through $e_6$, as shown in the figure. Since the four PMOS transistors are all identical, by symmetry, the currents injected at nodes 1 and 5 are equal; similarly, the NMOS-injected currents at nodes 3 and 7 are equal.

When the output pin is at node 4, the charge/discharge current is as shown in Fig. 1(b). Moving the pin changes the current distribution in  $e_1-e_6$ . If the pin is at node 3 (Fig. 1(c)), since the rise and fall discharge currents have similar values, the charging current in edge  $e_2$  is about  $2 \times$  larger than the earlier case, while the discharging current is about the same (with opposite direction). As quantified in Section II, the larger peak current leads to a stronger net electron wind that causes EM, resulting in a larger *effective average current*, and therefore, a lower lifetime. Based on exact parasitic extraction of the layout, fed to SPICE (thus including short-circuit and leakage currents), the average effective EM current through  $e_2$  is  $1.17 \times$  larger than when the pin is at node 4. Accounting for Joule heating, this results in a 19% lifetime reduction.

# II. MODELING CELL-INTERNAL EM

#### A. Modeling Time-to-Failure Under EM

EM is widely computed using Black's equation [5]:

$$TTF = A J^{-n} \exp\left(\frac{Q}{k_B T_m}\right) \tag{1}$$

where TTF is the time-to-failure, A is a constant that depends on material properties, J is the current density, the exponent n is typically between 1 and 2, Q is the activation energy,  $k_B$  is Boltzmann's constant and  $T_m$  is the metal temperature. The current density  $J = I_{avg}/(T_w \cdot W)$ , where W and  $T_w$  are the wire width and thickness and  $I_{avg}$  is the average current.

For unidirectional currents (e.g., in power grid wires), EM causes a steady unidirectional migration of metal atoms, and $I_{avg}$ is simply the time average of the current. In signal wires, currents may flow in both directions. For signal nets with bidirectional current flow, the time-average of the current waveform is often close to zero. However, even in cases where the current in both directions is identical, it is observed that EM effects are manifested. In this effect, often referred to as *AC EM*, the motion of atoms under one direction of current flow is partially, but not fully, negated by the "sweep-back" recovery effect that moves atoms in the opposite direction when the current is reversed. This partial recovery is captured by an *effective average current*, $I_{avg}$ [2], [3]:

$$I_{avg} = I_{avg}^+ - \mathcal{R} \cdot I_{avg}^-, \tag{2}$$

where $\mathcal{R}$ represents the *recovery factor* that captures sweep-back. Here, $I_{avg}^+$ is the larger of the average currents (forward-direction) and $I_{avg}^-$ is the smaller current (reverse-direction). For signal wires in a cell, the rise and fall cycle currents are not always in opposing directions. We consider two cases:

**Case I**: When the rise and fall currents, $I_{avg}^{r}$ and $I_{avg}^{f}$, are in opposite directions, as in edge $e_3$ in Fig. 1(c), Eq. (2) yields:

$$I_{avg} = \frac{\max\left(\left|I_{avg}^{r}\right|, \left|I_{avg}^{f}\right|\right) - \mathcal{R} \cdot \min\left(\left|I_{avg}^{r}\right|, \left|I_{avg}^{f}\right|\right)}{2} \tag{3}$$

where the factor of 2 arises because half the transitions correspond to an output rise and half to an output fall.

**Case II**: When the rise and fall currents are in the same direction (e.g., in edge  $e_1$  in Fig. 1(c), where the charging rise current and the short-circuit current (not shown) during the fall transition both flow downwards), then

$$I_{avg} = \frac{\left|I_{avg}^{r}\right| + \left|I_{avg}^{f}\right|}{2} \tag{4}$$

In this work, we use a recovery factor  $\mathcal{R}$  of 0.7 [2]. We use  $A = 1.47 \times 10^7 \text{As/m}^2$  in SI units, which corresponds to an allowable current density of  $10^{10} \text{ A/m}^2$  over a lifetime of 10 years at 378K, with an activation energy, Q = 0.85 eV [6].
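As a minimal sketch (not the authors' implementation), Eqs. (1)–(4) can be evaluated as follows. The function names, the sign convention (a negative value denotes current in the reverse direction along an edge), and the choice $n = 1$ are assumptions; $n = 1$ is used here because, with the values of A and Q quoted above, it reproduces the stated 10-year lifetime at $10^{10}$ A/m$^2$ and 378K.

```python
import math

K_B_EV = 8.617333e-5           # Boltzmann's constant in eV/K
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def effective_average_current(i_rise, i_fall, recovery=0.7):
    """Effective average EM current for one edge, Eqs. (3) and (4).

    i_rise, i_fall: signed average currents for the rise and fall
    half-cycles (sign = direction along the edge; an assumed convention).
    """
    a, b = abs(i_rise), abs(i_fall)
    if i_rise * i_fall < 0:
        # Case I: opposite directions, partial recovery (Eq. 3)
        return (max(a, b) - recovery * min(a, b)) / 2.0
    # Case II: same direction, contributions add (Eq. 4)
    return (a + b) / 2.0


def black_ttf_years(j, temp_k, a=1.47e7, q_ev=0.85, n=1.0):
    """Black's equation, Eq. (1); j in A/m^2, temp_k in kelvin.

    n = 1 is an assumption, see the lead-in text.
    """
    ttf_seconds = a * j ** (-n) * math.exp(q_ev / (K_B_EV * temp_k))
    return ttf_seconds / SECONDS_PER_YEAR
```

Calling `black_ttf_years(1e10, 378.0)` returns approximately 10 years, which is a useful sanity check on the constants.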

#### B. Joule Heating

Current flow in a wire causes Joule heating, which hastens EM, as seen in Eq. (1). The temperature  $T_m$  in a wire is given by:

$$T_m = T_{ref} + \Delta T_{Joule} \tag{5}$$

where  $T_{ref}$  is the reference chip temperature for EM analysis and  $\Delta T_{Joule}$  is the temperature rise due to Joule heating. In the steady-state, the wire temperature rises by [7]:

$$\Delta T_{Joule} = I_{rms}^2 R R_{\theta} \tag{6}$$

Here,  $I_{rms}$  is the root mean square (RMS) wire current, R is the wire resistance, and  $R_{\theta} = t_{ins}/(K_{ins}LW_{eff})$  is the thermal impedance of the wire to the substrate, where  $t_{ins}$  is the dielectric thickness,  $K_{ins}$  is the thermal conductivity normal to the plane of the dielectric, L is the wire length, and  $W_{eff} = W + 0.88t_{ins}$ , for a wire width W. We obtain R by parasitic extraction using a commercial tool and use  $t_{ins} = 59$ nm [8] and  $K_{ins} = 0.07$ W/m.K [7] at 22nm.
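The Joule heating estimate of Eqs. (5) and (6) can be sketched as below. This is an illustrative helper, not the authors' code; the function name and the wire dimensions in the usage note are hypothetical, while the default $t_{ins}$ and $K_{ins}$ values are the ones quoted above.

```python
def joule_heating_rise(i_rms, r_wire, length, width,
                       t_ins=59e-9, k_ins=0.07):
    """Steady-state temperature rise of a wire, Eq. (6), in kelvin.

    i_rms  : RMS wire current (A)
    r_wire : wire resistance (ohm), e.g. from parasitic extraction
    length : wire length L (m)
    width  : wire width W (m)
    t_ins  : dielectric thickness (m)
    k_ins  : dielectric thermal conductivity (W/m.K)
    """
    w_eff = width + 0.88 * t_ins                  # effective width
    r_theta = t_ins / (k_ins * length * w_eff)    # thermal impedance (K/W)
    return i_rms ** 2 * r_wire * r_theta


def metal_temperature(t_ref, delta_t_joule):
    """Eq. (5): metal temperature used in Black's equation."""
    return t_ref + delta_t_joule
```

For a hypothetical 1 µm long, 70 nm wide wire with R = 10 Ω carrying 100 µA RMS, `joule_heating_rise(1e-4, 10.0, 1e-6, 70e-9)` evaluates to roughly 0.69 K. Because the rise is quadratic in $I_{rms}$, doubling the RMS current quadruples the temperature rise.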

#### C. Current Divergence

A via in a copper interconnect allows the flow of electrical current but acts as a barrier for the migration of metal atoms under EM. Thus, the average current used for EM computation depends on the magnitude and direction of currents in neighboring wires where the metal migration flux is blocked by a via; for details, the reader is referred to [9]. The computation of the average EM current can be performed according to the flux-divergence criterion presented in [9], which says that the average EM current for a wire is the sum of the current through the wire and the divergence at the via. *This new average current replaces all average currents in Section II-A*.



Figure 2. Current divergence for a multifanout tree.

**Example:** Consider the example of Fig. 2 showing the left half of the H-shaped INV\_X4 output wire presented in Fig. 1. Note that all metal wires within the H-shaped structure are routed on the same metal layer, regardless of direction. Here, the output pin is placed at node 2 and consequently a via is placed over this node. The arrows in Fig. 2 indicate the direction of electron flow of the current in this wire during the rise and fall transitions. Poly-metal contacts (nodes 1, 3) are also blocking boundaries for metal atoms, and flux divergence must be used for wires at these nodes. Since voids in Cu interconnects are formed near the vias, we consider the two vias at either end of each edge. If an edge has multiple vias (e.g.,  $e_1$  has vias at nodes 1 and 2),  $I_{avg,d}$  uses the largest divergence.

For edge  $e_1$ , node 1 does not see a void: the electron flow in this edge, during both the rise and fall transitions, is in the direction of node 1, and EM voids are only caused by electron flow away from the via. However, for the via at node 2, there is an effective outflow and the EM average current for edge  $e_1$  with respect to via 2,  $I_{avg,d}(e_1)$ , is computed using Eq. (4):

$$I_{avg,d}(e_1) = \frac{I^{r}_{avg,d}(e_1) + I^{f}_{avg,d}(e_1)}{2}$$

where

$$\begin{aligned} I^{r}_{avg,d}(e_1) &= I^{r}_{avg}(e_1) - I^{r}_{avg}(e_2) + I^{r}_{avg}(e_3) \\ I^{f}_{avg,d}(e_1) &= I^{f}_{avg}(e_1) - I^{f}_{avg}(e_2) - I^{f}_{avg}(e_3) \end{aligned}$$

The expression for  $I^r_{avg,d}$  above has contributions from:

- Current in e<sub>1</sub>, which draws metal flux away from the via and adds to void formation.
- Current in e<sub>2</sub>, which inserts flux into the via: although this current flows to the output load through the via at node 2, due to the blocking boundary at the via, the metal flux does not pass through, but instead, accumulates atoms, thus negating void formation.
- Current in  $e_3$ , which draws flux away from node 2.

The expression for  $I_{avg,d}^f$  is similarly derived.
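The divergence computation for this particular example (edge $e_1$ with respect to the via at node 2) can be written out as a short sketch. The function name and the dictionary layout are hypothetical; the signs simply follow the two expressions above and are specific to this pin position, not a general flux-divergence routine.

```python
def divergence_current_e1(rise, fall):
    """I_avg,d(e1) at the via over node 2 for the Fig. 2 example.

    rise, fall: dicts mapping edge name -> average current magnitude
    during the rise and fall transitions (assumed data layout).
    """
    # Rise: e1 and e3 draw flux away from the via, e2 feeds flux into it.
    rise_d = rise["e1"] - rise["e2"] + rise["e3"]
    # Fall: signs as given in the expression for I^f_avg,d(e1).
    fall_d = fall["e1"] - fall["e2"] - fall["e3"]
    # Same-direction combination of the two half-cycles, as in Eq. (4).
    return (rise_d + fall_d) / 2.0
```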

# III. CURRENT CALCULATION

For a standard cell with m pin positions, characterization for delay and power can be performed at any one of the pin positions. Since the cell-internal wire parasitics in a standard cell are negligible and are dominated by transistor parasitics, this characterized value is accurate at all other pin locations.

However, the evaluation of EM TTF requires a characterization of the average currents,  $I_{avg}^r$  and  $I_{avg}^f$  and the RMS current  $I_{rms}$ , which is very dependent on the pin position. For a library with  $N_{lib}$  cells, each with an average of m pin positions, the CPU time required for standard cell characterization is given by:

$$T_{char} = N_{lib} \cdot m \cdot N_{corners} \cdot T^{avg}_{char,cell} \tag{7}$$

where $N_{corners}$ represents the number of corners at which the cell is characterized, and $T_{char,cell}^{avg}$ is the average characterization time (typically SPICE simulations for the output rising/falling cases) for each cell. A typical library may have $N_{lib} = 200$. In our experiments, the average characterization time to build the 7×7 .lib table for a cell in the 45nm NANGATE library is found to be $T_{char,cell}^{avg} = 17.5$s. For the NANGATE library, the average number of pin positions is $m = 12$, and the number of corners is $N_{corners} = 15$ at 45nm. This yields $T_{char} \approx 7$ days, which is m times the cost of characterizing each cell at one pin position. At more advanced process nodes, the number of corners goes up significantly, and therefore $T_{char}$ is much higher.

In this work, we show that a simpler approach is possible, speeding this up by a factor of almost m, implying that the above 7-day characterization can be conducted more practically, in about half a day. Our procedure extracts the average and RMS current information from the same simulations used for delay and power characterization, at a *reference pin position*, and then uses inexpensive graph traversals to evaluate EM for other pin positions. In other words, the additional overhead over conventional cell characterization is negligible.
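The arithmetic behind these figures is easy to check. In the short sketch below, the variable names are ours; the numbers are the ones quoted in the text, and the product of library size, pin positions, corners, and per-cell time reproduces the roughly 7-day estimate.

```python
N_LIB = 200          # cells in a typical library
M_PINS = 12          # average number of pin positions per cell
N_CORNERS = 15       # characterization corners at 45nm
T_CELL_S = 17.5      # average characterization time per cell (s)

SECONDS_PER_DAY = 24 * 3600

# Characterizing every pin position of every cell at every corner.
full_days = N_LIB * M_PINS * N_CORNERS * T_CELL_S / SECONDS_PER_DAY

# Characterizing only one reference pin position per cell.
reference_days = N_LIB * N_CORNERS * T_CELL_S / SECONDS_PER_DAY

print(f"all pin positions: {full_days:.2f} days")      # 7.29 days
print(f"reference pin only: {reference_days:.2f} days")  # 0.61 days
```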

To illustrate the EM characterization procedure, consider INV\_X4 in Fig. 1 with the output pin at node 4. We will temporarily ignore short-circuit and leakage currents to simplify the example. Here, all PMOS [NMOS] devices are identical and inject equal charge/discharge currents. When the pin is moved to node 2 [node 6], the distribution of currents in the branches remains similar, except edge  $e_3$  [ $e_4$ ], which now carries an equal current in the opposite direction. Therefore, the Joule heating and EM lifetime for each edge are unchanged, and only the current divergence calculations change.

When the pin is moved from node 4 to node 3, the PMOS current injected at node 5 is redirected to also flow through  $e_2$  and  $e_3$ . The only changed current magnitudes correspond to segments  $e_2$  and  $e_3$ ; those for the other wire segments remain almost the same since intracell wire parasitics are small.

Both cases above show small changes in current flow patterns when the pin is moved, indicating that it may be possible to reduce the characterization effort by performing a single SPICE simulation for one pin position, called the *reference case*, and inferring the current densities for every other pin position from this data by determining the current redirection. We develop a graph-based method for determining this redirection, and an algebra for computing  $I_{avg}$  and  $I_{rms}$  for each pin position based on the values from the reference case.

**Algorithm 1** Efficient cell EM current characterization.

**Input:** Undirected graph $G(V, E) \equiv$ cell output net; reference pin $ref \in V$; set of candidate pin positions $C \subseteq V$.

**Output:** $I^+_{avg}(e)$, $I^-_{avg}(e)$, $I_{rms}(e)\ \forall e \in E$, $\forall$ pin positions in $C$.

- 1: SPICE-simulate the cell with the output at *ref*; find the triangle representations and the averages of the edge currents during rise and fall.
- 2: **for each** current injection point $j$ **do**
- 3: $P_j^{\{r/f\}}$ = {charge/discharge} path from $j$ to *ref*.
- 4: Find the charge/discharge and short-circuit/leakage currents injected at $j$.
- 5: **end for**
- 6: **for each** pin position $i \in C$ **do**
- 7: Compute the unique path $P_i$ from *ref* to pin position $i$.
- 8: **for each** current injection point $j$ **do**
- 9: New {charge/discharge} path from $j$ to $i$: $P'^{\{r/f\}}_j$ = algebraic sum of paths $P_i$ and $P_j^{\{r/f\}}$.
- 10: Update the {charge/discharge} current for each edge in $P'^{\{r/f\}}_j$; keep short-circuit/leakage currents unchanged.
- 11: **end for**
- 12: Compute $I^+_{avg}(e)$, $I^-_{avg}(e)$, $I_{rms}(e)\ \forall e \in E$ for pin position $i$.
- 13: **end for**
- 14: **return**

The reference case is characterized for a fixed reference frequency, $f_{ref}$, chosen to be 1GHz in our experiments. If a given design operates at a frequency f and an activity factor $\alpha$, as long as the circuit operates correctly at that frequency (i.e., all transitions can be completed), it is easy to infer the average and RMS currents in each branch. The average and RMS currents are multiplicatively scaled by factors of $\alpha f/f_{ref}$ and $\sqrt{\alpha f/f_{ref}}$, respectively.
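The frequency and activity scaling of the reference-case currents can be sketched as follows. The function name and argument order are hypothetical; the scale factors are the ones stated in the text.

```python
import math


def scale_currents(i_avg_ref, i_rms_ref, freq, activity, f_ref=1e9):
    """Scale reference-case currents to a design frequency and activity.

    i_avg_ref, i_rms_ref: values characterized at f_ref
    freq     : design clock frequency (Hz)
    activity : switching activity factor alpha
    """
    ratio = activity * freq / f_ref
    i_avg = i_avg_ref * ratio             # average scales linearly
    i_rms = i_rms_ref * math.sqrt(ratio)  # RMS scales with the square root
    return i_avg, i_rms
```

For instance, a cell switching with 25% probability in a 4GHz design has $\alpha f / f_{ref} = 1$ for $f_{ref}$ = 1GHz, so its currents equal the reference values.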

#### A. Current Flows Using Graph Traversals

We present a graph-based algorithm that computes the currents through each edge when the pin position is moved from the reference case to another location. Our algorithm captures the effect of both charge/discharge currents and short-circuit and leakage currents (neglected in the example above), and its pseudocode is shown in Algorithm 1. The short-circuit and leakage currents are unaffected by the pin location, but Fig. 1 shows that the flow of the charge/discharge currents is affected by the output pin position. The algorithm uses graph traversals to trace the change in the current path when the pin position is moved from the reference pin position, *ref*, to any candidate pin position on the output net, as enumerated in a candidate set C.

Lines 1–5 perform a SPICE simulation at the reference pin location *ref* to compute the average and the triangle representation of each edge current during rise and fall. The charge/discharge and short-circuit/leakage currents for each edge are given by the simulation.

The output metallization has several points that are connected to the NMOS and PMOS transistors: we refer to these as *current injection points*. In Fig. 1, the PMOS and NMOS current injection points are at nodes {1, 5} and {3, 7}, respectively. Next, in the **for** loop that commences at line 6, we determine the current contribution for each candidate pin position in C during rise and fall transitions. The graph-based approach determines the unique path $P_i$ from the reference pin position *ref* to pin candidate *i* (line 7). For each current injection point, the charge/discharge path for pin candidate *i* (lines 8–11) is the algebraic sum of $P_i$ and the charge/discharge path $P_j$ for the reference pin position. The currents are updated in line 12.

Figure 3. Recomputation of the rise currents when the pin is moved from reference node 4 to node 3.

**Example:** The key idea is illustrated in Fig. 3 for the rise transition when the pin is moved from reference node 4 to node 3: the unique path $P_3$ between these nodes is shown at left. The two figures on the right show the algebraic addition of path $P_3$ with paths $P_1^r$ and $P_5^r$, respectively, corresponding to the two rise current injection points. After cancellations, the resulting path successfully shows the new path for charging currents: $\{e_1, e_2\}$ for the PMOS current from node 1, and $\{e_5, e_4, e_3, e_2\}$ for the PMOS current from node 5. The charge/discharge currents are updated in lines 9–11, while the short-circuit and leakage contributions are the same as the reference case.
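The path algebra in this example can be reproduced with a few lines of code. The sketch below is illustrative rather than the authors' implementation: the edge-to-node mapping of the H-shaped net is an assumption inferred from the paths named in the text, and the "algebraic sum with cancellation" is modeled as a symmetric difference of edge sets, which is valid because the output net is a tree.

```python
from collections import deque

# Assumed topology of the H-shaped INV_X4 output net (inferred from the text).
EDGES = {"e1": (1, 2), "e2": (2, 3), "e3": (2, 4),
         "e4": (4, 6), "e5": (5, 6), "e6": (6, 7)}


def tree_path(src, dst):
    """Return the set of edge names on the unique path from src to dst."""
    adj = {}
    for name, (u, v) in EDGES.items():
        adj.setdefault(u, []).append((v, name))
        adj.setdefault(v, []).append((u, name))
    parent = {src: None}              # node -> (previous node, edge name)
    queue = deque([src])
    while queue:                      # breadth-first search from src
        u = queue.popleft()
        for v, name in adj[u]:
            if v not in parent:
                parent[v] = (u, name)
                queue.append(v)
    path, node = set(), dst
    while parent[node] is not None:   # walk back from dst to src
        prev, name = parent[node]
        path.add(name)
        node = prev
    return path


def moved_path(injection, ref, pin):
    """Path from an injection point to the new pin position.

    Edges traversed by both the reference path and the ref-to-pin path
    cancel, so the result is their symmetric difference.
    """
    return tree_path(injection, ref) ^ tree_path(ref, pin)


print(sorted(moved_path(1, ref=4, pin=3)))  # ['e1', 'e2']
print(sorted(moved_path(5, ref=4, pin=3)))  # ['e2', 'e3', 'e4', 'e5']
```

Only the reference paths and one path per candidate pin are needed, so no additional SPICE run is required to find which edges carry each injected current.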

#### B. Algebra for Average/RMS Current Updates

The current waveforms in the wire segments, for the rise and fall transitions, are used to calculate the RMS and effective average current through the wire: the former is used to measure self-heating, and the latter is used in the EM TTF formula. We now develop an algebra for efficient RMS and effective average current updates for various pin positions, given information for the reference case.

1) Algebra for Computing Average Current: For edge e,  $I_{avg}$  during a rise or fall half-cycle is given by:

$$I_{avg}(\mathbf{e}) = \frac{1}{T/2} \int_0^{T/2} I(\mathbf{e})(t)\, dt = \frac{1}{T/2} \sum_{i \in S} \int_0^{T/2} I(p_i(\mathbf{e}))(t)\, dt \tag{8}$$

where the summation is over the set S of all current insertion points whose currents contribute to the current in edge e.

When the pin is moved, the set S is modified: some entries are added to the set and others are removed from it. For example, in Fig. 1, when the pin is moved from node 4 to node 3, the current in edge $e_2$ has new contributions from current insertion points 5 (rise) and 7 (fall) and a removal of the contribution from insertion point 3; the current in $e_3$ must subtract the contribution of current insertion points 1 (rise) and 3 (fall), and add contributions from insertion points 5 (rise) and 7 (fall). To perform these operations, we can simply add or subtract the average currents associated with the corresponding current insertion point. For a current $I(p_i)$ from a current insertion point $p_i$ that is added or subtracted, we can write

$$(I(\mathbf{e}) \pm I(p_i))_{avg} = \frac{1}{T/2} \int_0^{T/2} \left(I(\mathbf{e})(t) \pm I(p_i)(t)\right) dt = I_{avg}(\mathbf{e}) \pm I_{avg}(p_i)$$

Therefore,  $I_{avg}$  updates for a new pin position simply involve add/subtract operations on average reference case currents.
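A minimal sketch of this bookkeeping for the rise half-cycle is given below. The function name, the data layout, and the unit current values are hypothetical, and short-circuit and leakage currents are ignored. Because the net is a tree and all rise currents flow toward the single output pin, the contributions through an edge share one direction and their averages simply add.

```python
def edge_average_currents(paths, injected_avg):
    """Sum per-injection-point average currents along each edge.

    paths        : injection point -> set of edges on its path to the pin
    injected_avg : injection point -> average injected current
    """
    totals = {}
    for point, edges in paths.items():
        for edge in edges:
            totals[edge] = totals.get(edge, 0.0) + injected_avg[point]
    return totals


# Rise currents from the two PMOS injection points (arbitrary units).
i_pmos = {1: 1.0, 5: 1.0}

# Paths for the reference pin at node 4 and for the pin moved to node 3.
ref_paths = {1: {"e1", "e3"}, 5: {"e5", "e4"}}
new_paths = {1: {"e1", "e2"}, 5: {"e5", "e4", "e3", "e2"}}

ref_avg = edge_average_currents(ref_paths, i_pmos)
new_avg = edge_average_currents(new_paths, i_pmos)
```

With these placeholder values, $e_2$ goes from carrying no charging current to carrying both PMOS contributions, while $e_3$ swaps the contribution of point 1 for that of point 5.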

2) Algebra for Computing the RMS Current: The waveform for the current drawn by each device may be approximated by a triangle with height  $I_a$ , and with a nonzero current for a period of T' seconds, where T' < T, the clock period (this current model is widely used). It is well-known that the RMS value of such a waveform is

$$I_{rms,\Delta} = I_a \sqrt{\frac{T'}{3T}} \tag{9}$$

Due to the tree structure of the output wire, the current in each edge is a sum or difference of a set of such triangular signals, and this set can be determined based on a tree traversal. The sum (or difference) of a set of triangular waveforms, potentially each with different heights, start times, and end times, can be represented as a piecewise linear waveform, and thus each edge current has this form. To find the RMS value of such a piecewise linear waveform, we can decompose it into a set of nonintersecting (except at the edges) triangles and trapezoids, as shown in Figure 4.



Figure 4. The sum of the two upper triangular waveforms can be represented as a set of piecewise triangular or trapezoidal segments (below).

The RMS for this waveform can be shown to be:

$$I_{rms}^{2} = \sum_{\text{all triangles } i} I_{rms,\Delta_{i}}^{2} + \sum_{\text{all trapezoids } i} I_{rms,trap_{i}}^{2} \tag{10}$$

To use the above equation, we use Equation (9) for the RMS of a triangular waveform, and the following formula for the RMS of a trapezoid bounded by the time axis, with value  $I_b$  at time b and  $I_c$  at time c, where c > b:

$$I_{rms,trap} = \sqrt{\frac{(I_b^2 + I_b I_c + I_c^2) (c - b)}{3T}} \tag{11}$$

For INV\_X4, since the transistors of each type are all identical and are driven by the same input signal, each PMOS [NMOS] device injects an identical charging [discharging] current waveform; however, in general, the currents may be different. Since the intracell parasitics of the output metallization are small, some combination of these nearly unchanged currents is summed up along each edge during each half-cycle. The set of triangular PMOS waveforms that contribute to the current in each edge in Fig. 1 is simply the set of PMOS devices whose charging path (Algorithm 1) traverses that edge. When the output is moved from node 4 to node 3, the set for an edge loses some members and gains others. The updated set of triangles adds up, in general, to a waveform with triangles and trapezoids, whose RMS value is given by Equation (10).
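The RMS update of Eqs. (9)–(11) can be sketched as below. The representation of a waveform as a list of `(time, current)` breakpoints and the function names are assumptions made for illustration. Each linear segment contributes the trapezoid term of Eq. (11); a triangle side is the special case where one end value is zero.

```python
import math


def value_at(points, t):
    """Linearly interpolate a piecewise-linear waveform; zero outside it."""
    for (b, i_b), (c, i_c) in zip(points, points[1:]):
        if b <= t <= c:
            return i_b + (i_c - i_b) * (t - b) / (c - b) if c > b else i_b
    return 0.0


def add_waveforms(w1, w2):
    """Sum two waveforms that start and end at zero (e.g. triangles)."""
    times = sorted({t for t, _ in w1} | {t for t, _ in w2})
    return [(t, value_at(w1, t) + value_at(w2, t)) for t in times]


def rms_piecewise_linear(points, period):
    """RMS over one period: sum of squared segment terms, Eqs. (10)-(11)."""
    total = 0.0
    for (b, i_b), (c, i_c) in zip(points, points[1:]):
        total += (i_b**2 + i_b * i_c + i_c**2) * (c - b) / (3.0 * period)
    return math.sqrt(total)


# Two unit-height triangles of width 1 in a period T = 2, offset by 0.5.
tri_a = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]
tri_b = [(0.5, 0.0), (1.0, 1.0), (1.5, 0.0)]
combined = add_waveforms(tri_a, tri_b)   # triangle, trapezoid, triangle
i_rms = rms_piecewise_linear(combined, period=2.0)
```

A single triangle evaluated this way matches Eq. (9), which gives a quick consistency check on the segment formula.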

# IV. IMPLEMENTATION FLOW

We now present the implementation flow of this work for analyzing and improving circuit lifetime under cell-internal EM. Since we do not have access to a library at a recent technology node, where EM effects are significant [3], our evaluation is based on scaling layouts in the NANGATE 45nm cell library down to 22nm. While this may not strictly obey all design rules at a 22nm node, the transistor and wire sizes are comparable to 22nm libraries, and so are the currents.

Initially, the cells are characterized for the average and RMS currents in each cell under a reference pin position. The cells are characterized considering $f_{ref} = 1$GHz and for 7 different values each for the input slew and output load. The characterization thus generates a $7 \times 7$ look-up table with the RMS and average current values for the slew and load values, and these values are determined based on SPICE characterization of the scaled 22nm library using the publicly available 22nm ASU PTM SPICE models for high-performance applications (PTM HP).

We synthesize ITC'99 and ISCAS'89 benchmarks using Design Compiler with delay specs set to the best achievable frequency. The cells from the NANGATE library [4] are: NAND2\_X2, NAND2\_X4, NOR2\_X2, NOR2\_X4, AOI21\_X2, AOI21\_X4, INV\_X4, INV\_X8, INV\_X16, BUF\_X4, BUF\_X8, BUF\_X16, DFF\_X2, DFFR\_X2 and DFFS\_X2. We focus on EM in the combinational cells.

Each circuit is placed and routed using Cadence Encounter. The SPEF file with the extracted wire RCs and the Verilog netlist are saved. The timing, power, area and wirelength are reported. Synopsys PrimeTime reads the SPEF, Verilog, and SDC files and reports the input slew, output load, and switching probability for every cell instance. For each cell, based on the reported slew and load, we calculate $I_{avg}$ and $I_{rms}$ for each internal wire, interpolating from a $7 \times 7$ look-up table characterized for the reference pin position, and infer currents for each candidate position using the approach in this paper. The TTF is found using Eq. (1) at 378K, a typical EM specification.
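The table look-up step can be sketched as follows. The text does not specify the interpolation scheme, so bilinear interpolation is assumed here; the function name, the clamping at the table boundary, and the small 3×3 table in the test data are hypothetical placeholders for the characterized 7×7 tables.

```python
from bisect import bisect_right


def interp_table(slews, loads, table, slew, load):
    """Bilinear interpolation in a slew x load look-up table.

    slews, loads : ascending axis values
    table[i][j]  : characterized current at slews[i], loads[j]
    Queries outside the axes are linearly extrapolated from the
    nearest cell of the table (an assumed boundary policy).
    """
    def bracket(axis, x):
        # Index of the lower grid point and fractional position in the cell.
        i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
        return i, (x - axis[i]) / (axis[i + 1] - axis[i])

    i, ts = bracket(slews, slew)
    j, tl = bracket(loads, load)
    low = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    high = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return low * (1 - ts) + high * ts
```

The same routine would be applied once per internal wire and per quantity ($I_{avg}$ for rise and fall, and $I_{rms}$), using the slew and load reported for the cell instance.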

The **worst TTF** of the circuit is given by the cell in the circuit that has the smallest TTF. To compute the best TTF that the circuit can achieve under output pin selection, for each cell we determine the output pin position with the best TTF. The smallest such value over the entire circuit is the "weakest link" using the best possible pin positions, and is reported as the **best TTF** of the circuit.

Next, we turn to the problem of optimization, and the objective of our method is to optimize the lifetime of the circuit. We choose the lifetime specification to be the best TTF in the circuit. We report the critical pin positions (pin candidates for which the lifetime is smaller than the best TTF) for each cell instance in the circuit, and invalidate these pins. We also enforce a design requirement that limits the maximum allowable Joule heating in a wire. A typical Joule heating specification is a 5K temperature rise. We invalidate pin candidates in a cell that violate this requirement.
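The pin screening described above can be sketched as follows. The function name, the data layout, and the numbers in the example are hypothetical; in particular, the sketch computes the best TTF over all pin candidates before applying the Joule heating limit, which is one possible reading of the flow.

```python
def screen_pins(ttf, delta_t, max_heating=5.0):
    """Find the circuit's best TTF and the pin candidates to invalidate.

    ttf     : cell instance -> {pin candidate: TTF in years}
    delta_t : cell instance -> {pin candidate: Joule heating in K}
    """
    # Best TTF: the weakest cell when every cell uses its best pin.
    best_ttf = min(max(pins.values()) for pins in ttf.values())
    critical = {}
    for cell, pins in ttf.items():
        critical[cell] = sorted(
            pin for pin, years in pins.items()
            if years < best_ttf or delta_t[cell][pin] > max_heating
        )
    return best_ttf, critical


# Placeholder data for two cell instances.
ttf = {"u1": {2: 12.0, 3: 4.73, 4: 11.49}, "u2": {1: 8.0, 2: 9.0}}
heat = {"u1": {2: 6.0, 3: 1.0, 4: 1.0}, "u2": {1: 1.0, 2: 1.0}}
best, critical = screen_pins(ttf, heat)
```

The resulting per-cell lists are what would be passed on to the router as pin positions to avoid.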

We provide the above information, describing pin positions to be avoided, to the router. We implement this by changing the pin information in the Library Exchange Format (LEF) file to outlaw the critical pin positions as we build a new TTF-optimized layout.

Table I. COMPARISON WITH SPICE FOR $I_{avg}$ CALCULATED USING OUR ALGORITHM. FOR EACH CELL, THE VALUE CORRESPONDS TO THE EDGE CURRENT WITH THE LARGEST ERROR.

| Cell     | # Candidates | SPICE   | Ours    | Error (%) |
|----------|--------------|---------|---------|-----------|
| NAND2_X2 | 8            | 4.72e-5 | 4.70e-5 | 0.32%     |
| NAND2_X4 | 10           | 4.27e-5 | 4.31e-5 | 0.99%     |
| NOR2_X2  | 6            | 2.74e-5 | 2.76e-5 | 0.72%     |
| NOR2_X4  | 8            | 2.22e-5 | 2.23e-5 | 0.28%     |
| AOI21_X2 | 8            | 3.81e-5 | 3.81e-5 | 0.09%     |
| AOI21_X4 | 11           | 3.00e-5 | 2.96e-5 | 1.23%     |
| INV_X4   | 7            | 9.84e-5 | 9.88e-5 | 0.46%     |
| INV_X8   | 13           | 1.02e-4 | 1.02e-4 | 0.64%     |
| INV_X16  | 25           | 1.29e-4 | 1.28e-4 | 0.63%     |
| BUF_X4   | 7            | 9.79e-5 | 9.85e-5 | 0.57%     |
| BUF_X8   | 13           | 1.12e-4 | 1.11e-4 | 0.36%     |
| BUF_X16  | 25           | 1.24e-4 | 1.25e-4 | 0.08%     |
| AVG      | 11.8         |         |         | 0.53%     |

# V. RESULTS

Table I shows the results of our characterization approach for our library based on a single SPICE simulation, followed by graph traversals and the current update algebra. One reference case is chosen for each cell and the number of candidate pin positions varies from 6 to 25, with an average of about 12 pin candidates per cell. For this library, the number of SPICE simulations is therefore reduced by $12\times$, a significant and worthwhile saving even for a one-time library characterization task. The table shows the edge within each cell that shows the largest error for the effective average current: in each case, this error is seen to be small, and the computational savings for characterization are large.

Table II. TTF IN YEARS FOR EACH CELL IN THE LIBRARY.

| Cell     | Best TTF (50% switching) | Worst TTF (50% switching) | Best TTF (100% switching) | Worst TTF (100% switching) |
|----------|--------------------------|---------------------------|---------------------------|----------------------------|
| NAND2_X2 | 22.03                    | 21.85                     | 10.95                     | 10.85                      |
| NAND2_X4 | 27.65                    | 20.37                     | 8.75                      | 8.08                       |
| NOR2_X2  | 24.33                    | 24.30                     | 12.11                     | 12.07                      |
| NOR2_X4  | 29.61                    | 25.71                     | 14.74                     | 10.75                      |
| AOI21_X2 | 28.32                    | 28.30                     | 14.12                     | 14.11                      |
| AOI21_X4 | 13.13                    | 13.10                     | 6.47                      | 6.43                       |
| INV_X4   | 23.23                    | 9.90                      | 11.49                     | 4.73                       |
| INV_X8   | 33.80                    | 16.92                     | 16.82                     | 8.43                       |
| INV_X16  | 30.80                    | 2.42                      | 15.31                     | 0.20                       |
| BUF_X4   | 25.85                    | 12.93                     | 12.64                     | 6.35                       |
| BUF_X8   | 40.93                    | 13.55                     | 20.35                     | 6.01                       |
| BUF_X16  | 35.91                    | 3.17                      | 17.65                     | 0.50                       |

Table II presents the results of our lifetime evaluation scheme for the set of library cells. The best and worst TTF values correspond to the largest and smallest lifetimes over all pin candidates. The TTF is calculated for two different switching activities of 50% and 100% of the clock frequency: although few cells in a layout switch frequently, it is likely one of these cells that could be an EM bottleneck. The 100% switching case is a clear lower bound on the lifetime of the cell: typical cells, even worst-case cells, switch at a significantly lower rate, except on always-on networks such as core elements of the clock network. The table shows that the pin position is important: choosing a good pin position could better balance current flow and improve EM lifetime. It can be noted that the worst TTFs for the X16 cells are extremely small: this is due to the large number of pin choices for such cells, and due to the effects of large currents associated with specific pin positions, as well as divergence effects. While this result may include possible inaccuracies from our direct geometric scaling of the publicly-available 45nm cell layouts to 22nm, the impact of pin positions is real and can be extreme for large cells. To counter this effect, a library cell layout may use wider wires to control current densities, or more practically, outlaw a set of critical positions. For each of the X16 cells, pin positions that see more balanced currents provide high lifetimes (as shown by the best TTF for these cells).

Fig. 5 shows the TTF in years for the different pin position options for an INV\_X4, considering a switching activity of 100% at 2GHz. The TTF changes for different pin positions. When the pin is at node 4, the TTF is  $2 \times$  larger than when the pin is at PMOS or at node 2 or node 6 and  $2.43 \times$  larger than when the pin is at NMOS. For this cell, the best TTF is 11.49 years and the worst TTF is 4.73 years.

Table III presents the results for a set of ITC'99 and ISCAS'89 benchmark circuits mapped to our set of characterized cells and placed-and-routed. For each benchmark, the number of combinational cells, the clock period, the total power consumption (leakage and switching power), the core area and the total wirelength (WL) are presented, as reported by Encounter. The best and worst TTF values are computed as described in Section IV. These results correspond to a post place-and-route layout with no EM awareness, and the gap between the best and worst TTF values indicates how much the lifetime can

| Circuit | # of<br>comb.<br>cells | Period<br>(ns) | Power<br>(mW) | Area of<br>core<br>$(\mu m^2)$ | Total wire<br>length<br>(µm) | Worst TTF<br>(years) | Best TTF<br>(years) | TTF<br>Improv. | # of<br>critical<br>nets | # of<br>critical<br>cells |
|---------|------------------------|----------------|---------------|--------------------------------|------------------------------|----------------------|---------------------|----------------|--------------------------|---------------------------|
| b05     | 859                    | 0.544          | 0.551         | 504                            | 2682.50                      | 4.07                 | 6.53                | 37.59%         | -                        | 4                         |
| b07     | 461                    | 0.306          | 0.352         | 317                            | 1426.87                      | 3.81                 | 5.25                | 27.43%         | -                        | 3                         |
| b11     | 821                    | 0.384          | 0.460         | 471                            | 2439.83                      | 2.75                 | 5.82                | 52.80%         | 1                        | 5                         |
| b12     | 1217                   | 0.282          | 0.810         | 824                            | 4236.15                      | 3.13                 | 3.14                | 0.15%          | 3                        | 1                         |
| b13     | 340                    | 0.208          | 0.467         | 272                            | 1272.99                      | 3.89                 | 6.05                | 35.70%         | 1                        | 7                         |
| s5378   | 1219                   | 0.299          | 0.679         | 890                            | 6418.27                      | 2.74                 | 3.59                | 23.67%         | 2                        | 1                         |
| s9234   | 1044                   | 0.373          | 0.584         | 849                            | 4873.30                      | 2.73                 | 3.48                | 21.39%         | -                        | 1                         |
| s13207  | 1401                   | 0.720          | 1.063         | 1733                           | 7146.48                      | 4.94                 | 13.18               | 62.50%         | -                        | 7                         |
| s38417  | 10068                  | 0.493          | 8.836         | 7959                           | 46419.93                     | 3.43                 | 5.77                | 40.51%         | 2                        | 6                         |





Figure 5. TTF for various pin positions in INV\_X4, at 100% switching.

be improved. The number of critical nets is the number of nets that violate the Joule heating constraint, and the number of critical cells is the number of cells that have at least one pin position whose lifetime is below the best TTF. Interestingly, both numbers are small, implying that large improvements to the lifetime can be obtained through a few small changes to the layout. Note that the best TTF values are in the range required for many modern applications (e.g., mobile devices) with short TTF specs of 3 to 4 years.

| Circuit | Period<br>(ns) | Δ Period<br>(%) | Power<br>(mW) | Area<br>$(\mu m^2)$ | WL<br>(µm) | Δ WL<br>(%) |
|---------|----------------|-----------------|---------------|---------------------|------------|-------------|
| b05     | 0.544          | -                                                                | 0.551         | 504                                    | 2682.6     | 0.00                                                         |
| b07     | 0.306          | -                                                                | 0.353         | 317                                    | 1428.5     | 0.12                                                         |
| b11     | 0.384          | -                                                                | 0.460         | 471                                    | 2443.5     | 0.15                                                         |
| b12     | 0.280          | -0.89                                                            | 0.808         | 824                                    | 4112.8     | -2.91                                                        |
| b13     | 0.208          | -                                                                | 0.467         | 272                                    | 1273.5     | 0.04                                                         |
| s5378   | 0.299          | -                                                                | 0.679         | 890                                    | 6422.2     | 0.06                                                         |
| s9234   | 0.373          | -                                                                | 0.584         | 849                                    | 4873.4     | 0.00                                                         |
| s13207  | 0.720          | -                                                                | 1.063         | 1733                                   | 7146.6     | 0.02                                                         |
| s38417  | 0.493          | -                                                                | 8.836         | 7959                                   | 46420.2    | 0.00                                                         |

Table IV. PERFORMANCE IMPACT OF EM-AWARE PHYSICAL SYNTHESIS USING PIN OPTIMIZATION.

Table III shows that the lifetime of a circuit can be improved by up to 62.50% by altering the pin position of a few cells. The benchmark with the smallest TTF improvement is b12: the critical cell for this circuit is a NOR2\_X2 whose worst TTF is 3.13 years and whose best TTF is 3.14 years, i.e., changing the pin position does not change the lifetime significantly. The largest TTF improvement is for s13207, where the critical cell is an INV\_X4 whose worst TTF is 4.94 years and whose best TTF is 13.18 years.
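The text does not state how the "TTF Improv." column is computed. As a hedged sanity check, the sketch below assumes the gain is normalized by the best TTF, i.e., $100 \cdot (\text{best} - \text{worst})/\text{best}$. Under this assumed convention, the recomputed values match the reported ones to within about 0.2 percentage points, a gap that is plausibly explained by the TTFs being tabulated to two decimals. The names `TABLE_III`, `ttf_improvement` and `max_deviation` are illustrative only.

```python
# (worst TTF in years, best TTF in years, reported improvement in %),
# copied from Table III.
TABLE_III = {
    "b05": (4.07, 6.53, 37.59),
    "b07": (3.81, 5.25, 27.43),
    "b11": (2.75, 5.82, 52.80),
    "b12": (3.13, 3.14, 0.15),
    "b13": (3.89, 6.05, 35.70),
    "s5378": (2.74, 3.59, 23.67),
    "s9234": (2.73, 3.48, 21.39),
    "s13207": (4.94, 13.18, 62.50),
    "s38417": (3.43, 5.77, 40.51),
}


def ttf_improvement(worst, best):
    """Assumed convention: lifetime gain relative to the best TTF, in percent."""
    return 100.0 * (best - worst) / best


def max_deviation(table):
    """Largest gap (percentage points) between recomputed and reported values."""
    return max(abs(ttf_improvement(w, b) - r) for w, b, r in table.values())


if __name__ == "__main__":
    for name, (worst, best, reported) in TABLE_III.items():
        computed = ttf_improvement(worst, best)
        print(f"{name:7s} computed {computed:6.2f}%  reported {reported:6.2f}%")
    print(f"max deviation: {max_deviation(TABLE_III):.2f} points")
```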

We now redo the routing step to guarantee that the best TTF in Table III can be met. To do so, we outlaw all pin positions whose TTF is worse than the best TTF in Table III, or that result in a cell-internal Joule heating violation. The best TTF was computed by choosing the best pin position for each cell and then finding the weakest link, i.e., the shortest TTF among these cells. Consequently, a few cells may be forced to use a single pin, but most cells will have a choice of several pin positions, and the circuit lifetime will be significantly enhanced. (Note that, by the definition of the best TTF, each cell is guaranteed to have at least one allowable pin.)
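The pin-filtering rule described above can be sketched as follows. This is a minimal illustration rather than the actual flow: the cell instance names, pin names and TTF values are hypothetical, and the cell-internal Joule heating check is omitted. The circuit-level best TTF is the minimum over cells of each cell's best pin TTF, and every pin whose TTF falls below that target is disallowed.

```python
def circuit_best_ttf(cell_ttfs):
    """Each cell uses its best pin; the weakest such cell limits the circuit."""
    return min(max(pins.values()) for pins in cell_ttfs.values())


def circuit_worst_ttf(cell_ttfs):
    """No EM awareness: the router may pick the worst pin of any cell."""
    return min(min(pins.values()) for pins in cell_ttfs.values())


def allowed_pins(cell_ttfs):
    """Keep only the pins whose TTF meets the circuit-level best TTF."""
    target = circuit_best_ttf(cell_ttfs)
    return {
        cell: sorted(pin for pin, ttf in pins.items() if ttf >= target)
        for cell, pins in cell_ttfs.items()
    }


# Hypothetical per-pin TTFs (years) for three cell instances.
cells = {
    "u1": {"p1": 4.0, "p2": 9.0, "p3": 12.0},
    "u2": {"p1": 8.0, "p2": 8.5},
    "u3": {"p1": 3.0, "p2": 20.0},
}

if __name__ == "__main__":
    print("worst TTF:", circuit_worst_ttf(cells))
    print("best TTF: ", circuit_best_ttf(cells))
    print("allowed:  ", allowed_pins(cells))
```

In this toy example, `u2` is the weakest link and is forced to a single pin, while `u1` retains two choices. Because the target is the minimum of the per-cell maxima, every cell's best pin always passes the filter, which is the guarantee noted in the text.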

After these new constraints are imposed on the pin positions, the router makes incremental changes to some interconnect routes. Table IV shows the results after physical synthesis considering the best pin positions, i.e., for each cell, we disallow EM-unsafe pin positions. The circuit lifetime is improved by up to 62.50% while keeping the delay, area and power of the circuit essentially unchanged, and with marginal increases ($\leq 0.15\%$) in the total wirelength (in fact, for one circuit, b12, the wirelength and the clock period are even slightly improved). As there are only a few instances with critical pin positions and critical wire segments, the TTF can be increased without major changes to the circuit.
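As a quick cross-check of the wirelength figures, the sketch below recomputes the relative change from the total wirelengths in Tables III and IV, assuming that Δ WL is defined as $100 \cdot (\text{WL}_{\text{after}} - \text{WL}_{\text{before}})/\text{WL}_{\text{before}}$. The recomputed values agree with the reported column to within about 0.02 percentage points, which is consistent with the Table IV wirelengths being rounded to one decimal. The names `WIRELENGTH` and `pct_change` are illustrative.

```python
# (WL before, from Table III; WL after, from Table IV; reported delta WL in %).
# All wirelengths are in micrometers.
WIRELENGTH = {
    "b05": (2682.50, 2682.6, 0.00),
    "b07": (1426.87, 1428.5, 0.12),
    "b11": (2439.83, 2443.5, 0.15),
    "b12": (4236.15, 4112.8, -2.91),
    "b13": (1272.99, 1273.5, 0.04),
    "s5378": (6418.27, 6422.2, 0.06),
    "s9234": (4873.30, 4873.4, 0.00),
    "s13207": (7146.48, 7146.6, 0.02),
    "s38417": (46419.93, 46420.2, 0.00),
}


def pct_change(before, after):
    """Assumed definition: relative change with respect to the original value."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    for name, (before, after, reported) in WIRELENGTH.items():
        computed = pct_change(before, after)
        print(f"{name:7s} computed {computed:+6.2f}%  reported {reported:+6.2f}%")
```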

**Runtime**: As noted earlier, the circuit analysis is performed with the Encounter tool, and the runtime for each benchmark is less than 40 s. The critical pin positions for each circuit are reported in under 1 s.

# VI. CONCLUSION

We have developed an approach for analyzing cell-internal EM, i.e., EM on signal interconnects within a standard cell, based on a new modeling approach that includes Joule heating effects. The lifetimes of benchmark circuits are optimized using minor layout modifications. We demonstrate lifetime improvements of up to 62.50% at the same area, delay, and power.

# REFERENCES

- [1] J. Lienig, "Electromigration and its impact on physical design in future technologies," in *Proceedings of the ACM International Symposium on Physical Design*, 2013, pp. 33–40.
- [2] K.-D. Lee, "Electromigration recovery and short lead effect under bipolar- and unipolar-pulse current," in *Proceedings of the IEEE International Reliability Physics Symposium*, 2012, pp. 6.B.3.1–6.B.3.4.
- [3] P. Jain and A. Jain, "Accurate current estimation for interconnect reliability analysis," *IEEE Transactions on VLSI Systems*, vol. 20, no. 9, pp. 1634–1644, 2012.
- [4] "NanGate 45nm Open Cell Library," http://www.nangate.com.
- [5] J. R. Black, "Electromigration a brief survey and some recent results," *IEEE Transactions on Electron Devices*, vol. ED-16, pp. 338–347, 1969.
- [6] C.-K. Hu et al., "Impact of Cu microstructure on electromigration reliability," in *International Interconnect Technology Conference, IEEE*, 2007, pp. 93–95.
- [7] K. Banerjee and A. Mehrotra, "Global (interconnect) warming," *IEEE Circuits & Devices Magazine*, vol. 17, no. 5, pp. 16–32, Sep. 2001.
- [8] "FreePDK45 process design kit," http://www.eda.ncsu.edu/wiki/FreePDK45:Contents.
- [9] Y.-J. Park, P. Jain, and S. Krishnan, "New electromigration validation: Via node vector method," in *Proceedings of the International Reliability Physics Symposium*, 2010, pp. 698–704.