# Cell-Internal Electromigration: Analysis and Pin Placement Based Optimization

Gracieli Posser, *Student Member, IEEE*, Vivek Mishra, *Student Member, IEEE*, Palkesh Jain, *Member, IEEE*, Ricardo Reis, *Senior Member, IEEE*, and Sachin S. Sapatnekar, *Fellow, IEEE* 

Abstract—Electromigration (EM) in on-chip metal interconnects is a critical reliability failure mechanism in nanometer-scale technologies. This work addresses EM on signal interconnects and on the Vdd and Vss rails within a standard cell. We develop an approach for modeling and efficiently characterizing cell-internal EM that incorporates Joule heating effects. We also present a graph-based algorithm that computes the currents when a pin position is moved. This avoids a new characterization for each pin position and considerably reduces characterization time. We use the cell lifetime analysis to determine the lifetime of large benchmark circuits. We show that circuit lifetimes can be improved by about $2.5\times$ to $161\times$ by avoiding the EM-critical output, Vdd, and Vss pin positions of the cells, using minor layout modifications.

*Index Terms*—Electromigration, Cell-internal Signal Electromigration, Joule Heating, Current Divergence, Physical Design, EDA.

# I. INTRODUCTION

Electromigration (EM) is a major source of failure in on-chip wires and vias, and it is an increasing concern as feature sizes shrink [1]. EM is driven by current flow through metal wires and may cause open-circuit failures over time in copper interconnects.

Traditionally, EM has been a significant concern in global power delivery networks, which largely experience unidirectional current flow. Recently, two new issues have emerged.

First, EM analysis can no longer be restricted to global wires. Traditional EM analysis has focused on higher metal layers. With shrinking wire dimensions and increasing currents, however, current densities in lower metal layers are now also in the range where EM effects are manifested. EM effects become visible at current densities of about 1 MA/cm², and such current densities are seen in the internal metal wires of standard cells, resulting in cell-internal signal EM [2]. These high current densities arise because local interconnect wires within standard cells are typically narrow, to ensure compact cell layouts. In short metal wires, such effects were traditionally thought to be offset by Blech length considerations. For reasons discussed in Section II-D, however, the Blech effect does not protect intra-cell wires in designs at deeply scaled technology nodes.

Second, EM has become increasingly important in signal wires, where current flow is bidirectional. This is due to increased current densities, whose impact on EM is amplified by Joule heating [3], since EM depends exponentially on temperature. Therefore, the current that flows through these wires to charge and discharge the output load can be large enough to create significant EM effects over the lifetime of the chip.

Manuscript received October 29, 2014; revised April 13, 2015; accepted June 8, 2015. This work was supported in part by SRC 2012-TJ-2234, Brazilian National Council for Scientific and Technological Development (CNPq - Brazil) and Coordination for the Improvement of Higher Education Personnel (CAPES).

Gracieli Posser and Ricardo Reis are with the Universidade Federal do Rio Grande do Sul - PPGC/PGMicro, Porto Alegre, 91501-970 Brazil (e-mail: gposser@inf.ufrgs.br; reis@inf.ufrgs.br).

Vivek Mishra and Sachin S. Sapatnekar are with the University of Minnesota, ECE Department, Minneapolis, MN, USA (e-mail: vivek@umn.edu; sachin@umn.edu).

Palkesh Jain is with Qualcomm India, Bangalore, India (e-mail: palkesh@qti.qualcomm.com).

Intra-cell power networks are also subject to EM concerns. At deeply scaled technology nodes, the current through the power rails of the cells has remained roughly constant while the cross-sectional area of the rails has decreased, so the current density in power rails has increased [4]. Moreover, the power rails generally carry unidirectional current, referred to as DC electromigration. DC EM is more aggressive in causing failures than bidirectional (AC) current flow [5].

In the cell library used in this work, we observe high current densities on the Vdd and Vss power rails as well as on signal wires, and these reduce the lifetime of the cells. For example, in a 22nm technology, signal wires in an INV\_X4 cell have an effective average current density of 1.8 MA/cm<sup>2</sup> at 2GHz, while power wires have an effective current density of 2.15 MA/cm<sup>2</sup>. This switching rate is realistic; it occurs, for example, in clock buffers in almost any modern design.

While the cell-internal signal EM problem has been described in industry publications such as [2], its efficient analysis is an open problem. In this work<sup>1</sup>, we study the problem of systematically analyzing cell-internal EM due to both AC EM on signal wires and DC EM on the Vdd and Vss rails of the cells. We devise a solution that facilitates the analysis and optimization of cell-internal EM for designs based on a standard cell library. We first develop an approach to efficiently characterize cell-internal EM over all output, Vdd, and Vss pin locations within a cell, incorporating Joule heating effects into the analysis. We then formulate a pin optimization problem that chooses cell output, Vdd, and Vss pin positions during place-and-route so as to maximize the design lifetime.

We motivate the problem using the INV\_X4 (inverter with size 4) cell from the 45nm NANGATE library [7], shown in Fig. 1(a). The input signal A is connected to the polysilicon structure. The pull-up uses four parallel transistors (poly over p-diffusion, upper half of the figure), and the pull-down uses another four (poly over n-diffusion, lower half of the figure). The output signal can be tapped along the H-shaped metal net in the center of the cell. The positions where the output pin can be placed are numbered 1 through 7, and the edges of the structure are labeled $e_1$ through $e_6$, as shown in the figure. Since the four PMOS transistors are all identical, by symmetry, the currents injected at nodes 1 and 5 are equal. Similarly, the NMOS-injected currents at nodes 3 and 7 are equal.

<sup>1</sup>A preliminary version of this work was published in [6].

Figure 1: (a) The layout and output pin position options for INV\_X4. Charge/discharge currents when the output pin is at (b) node 4 and (c) node 3. The red [blue] lines represent rise [fall] currents. (d) The Vdd pin position options for INV\_X4 and the currents when the Vdd pin is at node 3' and (e) node 2'. (f) The Vss pin position options for INV\_X4 and the currents when the Vss pin is at node 4" and (g) node 1".

Let us first consider cell-internal signal EM. When the output pin is at node 4, the charge/discharge current is as shown in Fig. 1(b). Moving the pin changes the current distribution in edges $e_1$ through $e_6$. Consider the pin at node 3 (Fig. 1(c)). Because the charge and discharge currents have similar values, the charging current in edge $e_2$ is about $2\times$ larger than in the earlier case, while the discharging current is about the same (in the opposite direction). As quantified in Section II, the larger peak current produces a stronger net electron wind, which gives a larger effective average current and therefore a shorter lifetime. We fed an exact parasitic extraction of the layout to SPICE, so short-circuit and leakage currents are included. With the pin at node 3, the effective average EM current through $e_2$ is $1.17\times$ larger than with the pin at node 4. Accounting for Joule heating, this results in a 19% lifetime reduction. A similar effect occurs when the Vdd and Vss pin positions are changed.

Next, we consider EM on the supply wires. Figs. 1(d) and (e) represent the Vdd rail, where the Vdd pin can be placed at nodes 1' through 6'. Fig. 1(d) shows the charge current through the edges when the Vdd pin is at node 3'. For this pin position, the current flows are symmetric. Edge $e'_3$ supplies two transistors, as shown in Fig. 1(a), while each of the other edges supplies just one. The current through $e'_3$ is therefore larger than in any other edge, making $e'_3$ the critical edge when the Vdd pin is at node 3'. Fig. 1(e) shows the currents through the edges when the Vdd pin is at node 2'. In this case, edge $e'_1$ supplies three of the four transistors, so its current is $3\times$ larger than the current through the same edge when the pin is at node 3'. Edge $e'_1$ is thus the critical edge for this pin position. The resulting cell lifetime is $2\times$ shorter than when the pin is at node 3'.

Similarly, the Vss rail of the INV\_X4 cell is represented in Figs. 1(f) and (g). The Vss pin can be placed at nodes 1'' through 6''. Fig. 1(f) shows the discharge currents through the edges when the Vss pin is at node 4''. By an argument similar to the Vdd case, moving the pin from node 4'' in Fig. 1(f) to node 1'' in Fig. 1(g) changes the critical edge from $e''_3$ to $e''_1$. The lifetime again degrades by about $2\times$.

The rest of this paper is organized as follows. Section II describes the models used in this work to calculate the EM lifetime. Section III presents the current calculation approach for our cell-internal EM analysis flow. It includes the algebra used to calculate the average and RMS currents for different pin positions, and our graph-based algorithm that computes the current through each edge when a pin is moved. Next, we propose a method for optimizing the circuit lifetime using incremental layout modifications. The circuit lifetime can be increased by placing the output, Vdd, and Vss pins appropriately, avoiding the critical pin positions that reduce cell lifetime due to EM. The implementation flow is then discussed in Section IV, and the experimental results are presented in Section V.

# II. MODELING CELL-INTERNAL EM

## A. Modeling Time-to-Failure Under EM

For EM lifetime estimation, we use the well-known Black's equation [8], given by:

$$TTF = A J^{-n} \exp\left(\frac{Q}{k_B T_m}\right) \tag{1}$$

where TTF is the time-to-failure, A is a constant that depends on the material properties of the interconnect, J is the current density, the exponent n is typically between 1 and 2, Q is the activation energy, $k_B$ is Boltzmann's constant, and $T_m$ is the metal temperature. The current density is $J = I_{avg}/(T_w \cdot W)$, where W and $T_w$ are the wire width and thickness and $I_{avg}$ is the average current. Using Black's equation to predict wire failure is consistent with industrial design flows. A resistance evolution model was first suggested at the circuit level in [9]. That model is not in active use in industry at this time, however, so our approach is based on Black's equation.

For unidirectional currents (e.g., in power grid wires), EM causes a steady unidirectional migration of metal atoms, and $I_{avg}$ is simply the time average of the current. In signal wires, currents may flow in both directions, and the time average of the current waveform is often close to zero. However, EM effects are observed even when the currents in the two directions are identical. In this effect, often referred to as *AC EM*, the motion of atoms under one direction of current flow is partially, but not fully, negated by the "sweep-back" recovery effect. Sweep-back moves atoms in the opposite direction when the current is reversed. This partial recovery is captured by an effective average current, $I_{avg}$ [2], [3]:

$$I_{avg} = I_{avg}^{+} - \mathcal{R} \cdot I_{avg}^{-}, \tag{2}$$

where $I_{avg}^+$ is the larger of the two average currents (forward direction), $I_{avg}^-$ is the smaller one (reverse direction), and $\mathcal{R}$ is the *recovery factor* that captures sweep-back. For signal wires in a cell, the rise and fall currents are not always in opposite directions. We consider two cases:

**Case I**: When the rise and fall currents,  $I_{avg}^r$  and  $I_{avg}^f$ , are in opposite directions, as in edge  $e_3$  in Fig. 1(c), Eq. (2) yields:

$$I_{avg} = \frac{\max\left(\left|I_{avg}^{r}\right|, \left|I_{avg}^{f}\right|\right) - \mathcal{R} \cdot \min\left(\left|I_{avg}^{r}\right|, \left|I_{avg}^{f}\right|\right)}{2} \tag{3}$$

where the factor of 2 arises because half the transitions correspond to an output rise and half to an output fall.

**Case II**: When the rise and fall currents are in the same direction (e.g., in edge  $e_1$  in Fig. 1(c), where the charging rise current and the short-circuit current (not shown) during the fall transition both flow downwards), then

$$I_{avg} = \frac{\left|I_{avg}^r\right| + \left|I_{avg}^f\right|}{2} \tag{4}$$

The recovery factor $\mathcal{R}$ is determined empirically [10], [11]. In this work, we use $\mathcal{R} = 0.7$, corresponding to Cu interconnects [3]. We use $A = 1.47\times 10^7~\mathrm{A\cdot s/m^2}$ in SI units. This value corresponds to an allowable current density of $10^{10}~\mathrm{A/m^2}$ over a lifetime of 10 years at 378K, with an activation energy of $Q = 0.85~\mathrm{eV}$ [12].
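The sketch below evaluates Eqs. (1), (3), and (4) with the constants quoted above. It is a minimal illustration, not the authors' tool, and the helper names are hypothetical. The text does not state n. We assume n = 1, because with n = 1 the quoted A reproduces a lifetime of about 10 years at $10^{10}~\mathrm{A/m^2}$ and 378K.

```python
import math

K_B_EV = 8.617333262e-5              # Boltzmann constant, eV/K
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Constants quoted in Section II-A. N_EXP = 1 is an assumption: it is the
# exponent for which the quoted A gives ~10 years at 1e10 A/m^2 and 378 K.
A_BLACK = 1.47e7                     # A*s/m^2
Q_EV = 0.85                          # activation energy, eV
N_EXP = 1.0                          # current-density exponent n (assumed)
RECOVERY = 0.7                       # recovery factor R for Cu


def effective_avg_current(i_rise, i_fall, same_direction, recovery=RECOVERY):
    """Effective average current of a signal-wire edge.

    i_rise, i_fall: average currents during the rise and fall half-cycles.
    same_direction: True selects Case II (Eq. 4), False selects Case I (Eq. 3).
    """
    a, b = abs(i_rise), abs(i_fall)
    if same_direction:
        return (a + b) / 2.0                              # Eq. (4)
    return (max(a, b) - recovery * min(a, b)) / 2.0       # Eq. (3)


def black_ttf(i_avg, width, thickness, temp_k, a=A_BLACK, n=N_EXP, q_ev=Q_EV):
    """Time-to-failure in seconds from Black's equation, Eq. (1)."""
    j = i_avg / (width * thickness)                       # current density, A/m^2
    return a * j ** (-n) * math.exp(q_ev / (K_B_EV * temp_k))


if __name__ == "__main__":
    # Calibration check: J = 1e10 A/m^2 through a unit cross-section at 378 K.
    years = black_ttf(1e10, 1.0, 1.0, 378.0) / SECONDS_PER_YEAR
    print(f"TTF at the calibration point: {years:.2f} years")
```

In this sketch, `temp_k` plays the role of $T_m$, so a Joule-heated temperature from Eq. (5) can be passed directly.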

## B. Joule Heating

Current flow in a wire causes Joule heating, which hastens EM, as seen in Eq. (1). The temperature  $T_m$  in a wire is given by:

$$T_m = T_{ref} + \Delta T_{Joule} \tag{5}$$

where  $T_{ref}$  is the reference chip temperature for EM analysis and  $\Delta T_{Joule}$  is the temperature rise due to Joule heating. In the steady-state, the wire temperature rises by [11]:

$$\Delta T_{Joule} = I_{rms}^2 R R_{\theta} \tag{6}$$

Here, $I_{rms}$ is the root mean square (RMS) wire current, R is the wire resistance, and $R_{\theta} = t_{ins}/(K_{ins}LW_{eff})$ is the thermal impedance of the wire to the substrate. In this expression, $t_{ins}$ is the dielectric thickness, $K_{ins}$ is the thermal conductivity normal to the plane of the dielectric, L is the wire length, and $W_{eff}=W+0.88t_{ins}$ for a wire width W. We obtain R by parasitic extraction using a commercial tool. At 22nm, we use $t_{ins}=59~\mathrm{nm}$ [13] and $K_{ins}=0.07~\mathrm{W/(m\cdot K)}$ [11].
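A minimal sketch of Eqs. (5) and (6) follows, using the quoted $t_{ins}$ and $K_{ins}$ as defaults. The function names are hypothetical, and all lengths are in meters. The wire values in the usage note below are illustrative placeholders, not extracted data.

```python
T_INS = 59e-9      # dielectric thickness (m), value quoted for 22nm
K_INS = 0.07       # dielectric thermal conductivity, W/(m*K)


def thermal_impedance(length, width, t_ins=T_INS, k_ins=K_INS):
    """R_theta = t_ins / (K_ins * L * W_eff), with W_eff = W + 0.88 * t_ins (K/W)."""
    w_eff = width + 0.88 * t_ins
    return t_ins / (k_ins * length * w_eff)


def joule_temperature_rise(i_rms, r_wire, length, width, t_ins=T_INS, k_ins=K_INS):
    """Steady-state Joule heating, Eq. (6): dT = I_rms^2 * R * R_theta (K)."""
    return i_rms ** 2 * r_wire * thermal_impedance(length, width, t_ins, k_ins)


def metal_temperature(t_ref, i_rms, r_wire, length, width, t_ins=T_INS, k_ins=K_INS):
    """Eq. (5): T_m = T_ref + dT_Joule, in kelvin."""
    return t_ref + joule_temperature_rise(i_rms, r_wire, length, width, t_ins, k_ins)
```

For example, `metal_temperature(378.0, 0.5e-3, 10.0, 1e-6, 50e-9)` gives the heated temperature of a hypothetical 1 µm by 50 nm, 10 Ω segment carrying 0.5 mA RMS. The result can then be used as $T_m$ in Eq. (1).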

## C. Current Divergence

A via in a copper interconnect allows the flow of electrical current but acts as a barrier for the migration of metal atoms under EM. Thus, the average current used for EM computation depends on the magnitude and direction of currents in neighboring wires where the metal migration flux is blocked by a via; for details, the reader is referred to [14]. The computation of the average EM current can be performed according to the flux-divergence criterion presented in [14], which says that the average EM current for a wire is the sum of the current through the wire and the divergence at the via. *This new average current replaces all average currents in Section II-A*.



Figure 2: Current divergence for a multifanout tree.

**Example:** Consider Fig. 2, which shows the left half of the H-shaped INV\_X4 output wire presented in Fig. 1. All metal wires within the H-shaped structure are routed on the same metal layer, regardless of direction. Here, the output pin is placed at node 2, so a via is placed over this node. The arrows in Fig. 2 indicate the direction of electron flow in this wire during the rise and fall transitions. Poly-metal contacts (nodes 1 and 3) are also blocking boundaries for metal atoms, so flux divergence must be used for wires at these nodes. Since voids in Cu interconnects form near the vias, we consider the two vias at either end of each edge. If an edge has multiple vias (e.g., $e_1$ has vias at nodes 1 and 2), $I_{avg,d}$ uses the largest divergence.

For edge  $e_1$ , node 1 does not see a void: the electron flow in this edge, during both the rise and fall transitions, is in the direction of node 1, and EM voids are only caused by electron flow away from the via. However, for the via at node 2, there is an effective outflow and the EM average current for edge  $e_1$  with respect to via 2,  $I_{avg,d}(e_1)$ , is computed using Eq. (4):

$$\begin{array}{lcl} I_{avg,d}(\mathbf{e}_1) & = & (I_{avg,d}^r(\mathbf{e}_1) + I_{avg,d}^f(\mathbf{e}_1))/2 \\ \text{where } I_{avg,d}^r(\mathbf{e}_1) & = & I_{avg}^r(\mathbf{e}_1) - I_{avg}^r(\mathbf{e}_2) + I_{avg}^r(\mathbf{e}_3) \\ I_{avg,d}^f(\mathbf{e}_1) & = & I_{avg}^f(\mathbf{e}_1) - I_{avg}^f(\mathbf{e}_2) - I_{avg}^f(\mathbf{e}_3) \end{array}$$

The expression for $I^r_{avg,d}$ above has contributions from:

- The current in $e_1$, which draws metal flux away from the via and adds to void formation.
- The current in $e_2$, which inserts flux into the via. This current flows to the output load through the via at node 2, but the metal flux cannot pass the blocking boundary at the via. Atoms accumulate there instead, negating void formation.
- The current in $e_3$, which draws flux away from node 2.

The expression for $I_{avg,d}^f$ is derived similarly.
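The worked example for edge $e_1$ at the via on node 2 can be sketched as follows. This sketch assumes the sign convention implied above: an incident edge counts +I if its electron flow leaves the via and draws atoms away, and -I if its electron flow enters the via and deposits atoms. The current values are hypothetical. Like the example, the sketch combines rise and fall with the Eq. (4) form, so it covers only this Case II situation.

```python
def via_divergence(incident):
    """Signed flux divergence at a blocking via for one transition.

    incident: iterable of (current_magnitude, electrons_leave_via) pairs, one per
    edge meeting the via. Electron flow leaving the via drags atoms away and
    feeds void growth (+); electron flow into the via deposits atoms there (-).
    """
    return sum(i if leaves else -i for i, leaves in incident)


def divergence_avg_current_case2(rise_incident, fall_incident):
    """I_avg,d when both half-cycles draw net flux away from the via (Eq. 4 form)."""
    return (via_divergence(rise_incident) + via_divergence(fall_incident)) / 2.0


# Hypothetical half-cycle averages (uA) for e1, e2, e3 at the via on node 2,
# using the flow directions of the worked example for edge e1.
rise = [(30.0, True), (20.0, False), (10.0, True)]    # +e1 - e2 + e3
fall = [(25.0, True), (5.0, False), (8.0, False)]     # +e1 - e2 - e3
print(divergence_avg_current_case2(rise, fall))       # 16.0
```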

## D. The Impact of Blech Length on Cell-Internal Interconnects

As pointed out by Blech [15], the migration of metal atoms results in a concentration gradient and a back stress that opposes the electron wind force that causes electromigration. This is typically translated into a criterion whereby the product of the current density and wire length must exceed a critical value; if it does not, the wire is deemed immortal.

Although intracell wires are short, the Blech criterion cannot be applied directly to signal interconnects in standard cells, as indicated in [2], [3]. One explanation is that the bidirectional nature of AC EM prevents a substantial buildup of the concentration gradient, so the back stress that opposes the electron wind is limited. Vdd and Vss wires behave differently. The segment in an individual cell may be short, but the rails are typically concatenated along an entire row of standard cells. The actual wire length is therefore much larger than the segment seen in the layout of a single cell. The rail is long enough that it is not immortal under the Blech criterion.
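The rail argument can be illustrated with a minimal sketch of the classic Blech check. The critical product $(jL)_{crit}$ is technology-dependent and is not given here, so it is left as a parameter. The function names are hypothetical. The sketch covers only the DC rail case, since the text argues that the criterion should not be applied to signal wires under AC EM.

```python
def blech_immortal(j, length, jl_crit):
    """Blech criterion: a wire is deemed immortal if j * L does not exceed (jL)_crit.

    j in A/m^2, length in m, jl_crit in A/m (process-dependent input).
    """
    return j * length <= jl_crit


def rail_blech_immortal(j, cell_widths, jl_crit):
    """Vdd/Vss rails abut across a row, so the relevant length is the row
    length (sum of cell widths), not the width of a single cell."""
    return blech_immortal(j, sum(cell_widths), jl_crit)
```

With the same current density, one cell-width segment may pass the check while the full concatenated row fails it. This is the situation described above.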

# III. CURRENT CALCULATION

For a standard cell with m output pin positions, delay and power can be characterized at any one of the pin positions. The cell-internal wire resistances in a standard cell are negligible compared with the transistor parasitics, so the characterized values are accurate at all other pin locations. This also holds for the transients on the Vdd and Vss pin networks, which are essentially independent of the pin positions.

Evaluating EM TTF requires characterizing (a) the average and RMS currents through the Vdd/Vss wire segments and (b) the average currents $I_{avg}^r$ and $I_{avg}^f$ and the RMS current $I_{rms}$ through the output-net segments. All of these parameters depend on the pin position, as demonstrated in Fig. 1. An obvious approach would be to repeat the characterization for every combination of pin positions. Consider a library with $N_{lib}$ cells, each with an average of m output pin positions, d Vdd pin positions, and s Vss pin positions. Enumerating all combinations implies $m \times d \times s$ characterizations per cell. However, the EM evaluations of the Vdd, Vss, and signal nets are independent, so this can be reduced to $m+d+s$ characterizations. With this reduction, the CPU time required for standard cell characterization is given by:

$$T_{char} = (m + d + s) \cdot N_{corners} \cdot N_{lib} \cdot T_{char,cell}^{avg} \tag{7}$$

where $N_{corners}$ is the number of corners at which the cell is characterized, and $T_{char,cell}^{avg}$ is the average characterization time per cell (typically SPICE simulations for the output rising/falling cases). A typical library may have $N_{lib}=200$ cells. In our experiments, building the $7\times7$ .lib table for a cell in the 45nm NANGATE library takes $T_{char,cell}^{avg}=17.5$ s on average. This time was measured with Synopsys HSPICE on an Intel Core i7-3770 CPU @ 3.40GHz with 16GB of memory. For the NANGATE library, the average numbers of pin positions are $m=12$, $d=10$, and $s=10$, and the number of corners is $N_{corners}=15$. This yields $T_{char} \approx 19$ days. That is (m+d+s) times (32× for this example) the cost of characterizing each cell at one pin position for the output, Vdd, and Vss pins. At more advanced process nodes, the number of corners increases significantly, so $T_{char}$ is much higher.

In this work, we show that a simpler approach is possible, which reduces the characterization time by a factor of almost (m+d+s). With this approach, the 19-day characterization above can be completed in about half a day. Our procedure extracts the average and RMS current information at a *reference pin position* from the same simulations used for delay and power characterization. It then uses inexpensive graph traversals to evaluate EM for the other pin positions. The additional overhead over conventional cell characterization is therefore negligible.
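The arithmetic behind these figures can be checked with a short sketch. It applies the form of Eq. (7) with a different number of characterization runs per cell: all pin combinations, the m+d+s reduction, or only the reference case. The input values are those quoted above, and the helper name is hypothetical.

```python
SECONDS_PER_DAY = 86400.0


def char_time_seconds(runs_per_cell, n_corners, n_lib, t_cell):
    """CPU time: runs per cell x corners x cells x time per run (Eq. 7 form)."""
    return runs_per_cell * n_corners * n_lib * t_cell


# Values quoted in the text for the 45nm NANGATE library.
m, d, s = 12, 10, 10
N_CORNERS, N_LIB, T_CELL = 15, 200, 17.5

exhaustive = char_time_seconds(m * d * s, N_CORNERS, N_LIB, T_CELL)   # every pin combination
separate = char_time_seconds(m + d + s, N_CORNERS, N_LIB, T_CELL)     # Eq. (7)
reference = char_time_seconds(1, N_CORNERS, N_LIB, T_CELL)            # reference case only

for label, t in (("m*d*s", exhaustive), ("m+d+s", separate), ("reference", reference)):
    print(f"{label:>9}: {t / SECONDS_PER_DAY:7.2f} days")
```

The m+d+s and reference-only rows reproduce the roughly 19 days and half a day quoted above. The first row shows that exhaustive enumeration would take about two years.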

To illustrate the EM characterization procedure for the output signal wire, consider INV\_X4 in Fig. 1 with the output pin at node 4. To simplify the example, we temporarily ignore short-circuit and leakage currents. Here, all PMOS [NMOS] devices are identical and inject equal charge [discharge] currents. When the pin is moved to node 2 [node 6], the distribution of currents in the branches remains the same, except for edge $e_3$ [$e_4$], which now carries an equal current in the opposite direction. Therefore, the Joule heating and EM lifetime of each edge are unchanged, and only the current divergence calculations change.

When the pin is moved from node 4 to node 3, the PMOS current injected at node 5 is redirected to also flow through  $e_2$  and  $e_3$ . The only changed current magnitudes correspond to segments  $e_2$  and  $e_3$ ; those for the other wire segments remain almost the same since intracell wire parasitics are small.

Both cases above show incremental changes in current flow patterns when the pin is moved. Similar observations may be made when the Vdd and Vss pins are moved: in each case, the difference from moving a pin arises because of a redirection of a set of currents. These facts indicate that it may be possible to reduce the characterization effort by performing a single SPICE simulation for one pin position, called the *reference case*, and inferring the current densities for every other pin position from this data by determining the current redirection. We develop a graph-based method for determining this redirection, and an algebra for computing  $I_{avg}$  and  $I_{rms}$  for each pin position based on the values from the reference case.

The choice of the reference pin position does not matter, because the wire resistance parasitics are low. These low resistances have two consequences:

- Voltage drops on the output wire are negligible. As a result, virtually the same currents supply each transistor in the cell, and only the wire(s) through which this current is supplied change. We can therefore perform a set of additions and subtractions on the currents of the reference case, which is what our graph-based algorithm does, and any choice of reference will work.
- The time derivatives of the transient voltage waveforms on the wires remain virtually unchanged across pin positions. The voltage waveform is almost identical regardless of the pin position, so the coupling currents, $C\,dv/dt$, are also unchanged.

The reference case is characterized at a fixed reference frequency, $f_{ref}$, which is 1GHz in our experiments. Suppose a design operates at a frequency f with an activity factor $\alpha$. As long as the circuit operates correctly at that frequency (i.e., all transitions can be completed), the average and RMS currents in each branch are easily inferred: they are scaled by factors of $\alpha f/f_{ref}$ and $\sqrt{\alpha f/f_{ref}}$, respectively. For example, consider a circuit with $\alpha = 1$ operating at 4GHz whose currents were characterized at 1GHz. Its average current is multiplied by 4GHz/1GHz = 4, and its RMS current by $\sqrt{4\mathrm{GHz}/1\mathrm{GHz}} = 2$.
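A minimal sketch of this rescaling follows, with a hypothetical function name. The square root on the RMS factor follows because the time average of $I^2$ grows linearly with the number of transitions per unit time.

```python
import math

F_REF = 1e9   # reference characterization frequency used in the text (1 GHz)


def scale_reference_currents(i_avg_ref, i_rms_ref, f, alpha=1.0, f_ref=F_REF):
    """Scale reference-case currents to frequency f and activity factor alpha.

    The average current scales with the number of transitions per second; the
    RMS current scales with its square root because mean(I^2) scales linearly.
    """
    k = alpha * f / f_ref
    return i_avg_ref * k, i_rms_ref * math.sqrt(k)


print(scale_reference_currents(1.0, 1.0, 4e9))   # (4.0, 2.0)
```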

## A. Current Flows Using Graph Traversals

We present a graph-based algorithm that computes the currents through each edge when the pin position is moved from the reference case to another location. The pseudocode is shown in Algorithm 1. The algorithm captures the effect of charge/discharge currents, short-circuit currents, and leakage currents (neglected in the example above). For the output net (but not for the Vdd/Vss nets), the short-circuit and leakage currents are unaffected by the pin location. For all nets, the flow of the charge/discharge currents is affected by the pin position. Coupling capacitance currents are essentially unchanged for almost all nets, because moving the pin does not significantly change the transient waveforms in these nets.

Our algorithm uses graph traversals to trace the change in the current path when the pin is moved from the reference pin position, *ref*, to any candidate pin position in a candidate set C. Lines 1–6 run a SPICE simulation with the pins at *ref*. From it, they compute the average currents and the triangle representations of the edge currents during rise and fall on the output net, and over the cycle for the Vdd and Vss components. The charge/discharge, short-circuit/leakage, and coupling capacitance currents for each edge are determined from the simulation.

The output metallization has several points that are connected to the NMOS and PMOS transistors; we refer to these as *current injection points*. For example, in Fig. 1(b), the PMOS and NMOS current injection points are at nodes $\{1,5\}$ and $\{3,7\}$, respectively. Next, the **for** loop that begins at line 7 determines the current contribution for each candidate pin position in C. The graph-based approach determines the unique path $P_i$ from the reference pin position *ref* to pin candidate i (line 8). The Vss pin draws current out of the cell while the Vdd pin injects current, so the direction of $P_i$ is reversed between the two cases. For the output pin, we use the same direction as for the Vss pin. The precise direction does not matter, however, because of the max/min operators in Eq. (3).

**Algorithm 1** Efficient cell EM current characterization.

**Input:** Undirected graph G(V, E) with separate connected components for the cell output, Vdd, and Vss nets; a reference pin $ref \in V$ for each connected component (output, Vdd, and Vss); set of candidate pin positions $C \subseteq V$ for the output, Vdd, and Vss components.

**Output:** $I_{avg}(e)$ for all Vdd and Vss edges, $I_{avg}^+(e)$ and $I_{avg}^-(e)$ for all output edges, and $I_{rms}(e)$ $\forall e \in E$, for all pin positions in C.

- 1: SPICE-simulate the cell with the output, Vdd, and Vss pins at *ref*; find the triangle representations and the averages of the edge currents during rise and fall.
- 2: **for each** connected component $\in$ {Vdd, Vss, output} **do**
  - 3: **for each** current injection point j **do**
    - 4: $P_j^{\{r/f\}}$ = {charge/discharge} path from j to *ref*.
    - 5: Find the charge/discharge, short-circuit/leakage, and coupling capacitance currents injected at j.
  - 6: **end for**
  - 7: **for each** pin position $i \in C$ **do**
    - 8: Compute the unique path $P_i$ from *ref* to pin position i. The direction of $P_i$ is from *ref* to i for output and Vss, and from i to *ref* for Vdd.
    - 9: **for each** current injection point j **do**
      - 10: New {charge, discharge} path from j to i: $P_j^{\prime\{r/f\}}$ = algebraic sum of paths $P_i$ and $P_j^{\{r/f\}}$.
      - 11: Update the current for each edge in $P'_j$. For the output net, update only the {charge, discharge} current, keeping the short-circuit/leakage and coupling capacitance currents unchanged. For Vdd/Vss nets, update all currents except the coupling capacitance currents, which are unchanged.
    - 12: **end for**
    - 13: Compute $I_{avg}(e)$ for Vdd/Vss, or $\{I_{avg}^+(e), I_{avg}^-(e)\}$ for the output, as well as $I_{rms}(e)$ $\forall e \in E$, for pin position i.
  - 14: **end for**
- 15: **end for**
- 16: **return**

For each current injection point, the charge/discharge path for pin candidate i (lines 9–12) is the algebraic sum of $P_i$ and the reference charge/discharge path $P_j$. The edge currents are updated in line 11. The average and RMS currents for pin position i are then computed in line 13.



Figure 3: Recomputation of the rise currents when the pin is moved from reference node 4 to node 3.

**Example (output pin):** Fig. 3 illustrates the key idea for the rise transition when the pin is moved from reference node 4 to node 3. The unique path $P_3$ between these nodes is shown at left. The two figures on the right show the algebraic addition of path $P_3$ with paths $P_1^r$ and $P_5^r$, respectively, which correspond to the two rise current injection points. After cancellation, the resulting paths are the new charging paths: $\{e_1, e_2\}$ for the PMOS current from node 1, and $\{e_5, e_4, e_3, e_2\}$ for the PMOS current from node 5. The charge/discharge currents are updated in lines 9–11, while the short-circuit and leakage contributions remain as in the reference case.
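The path algebra of this example can be reproduced with a minimal Python sketch. It assumes the edge-to-node mapping $e_1=(1,2)$, $e_2=(2,3)$, $e_3=(2,4)$, $e_4=(4,6)$, $e_5=(5,6)$, $e_6=(6,7)$, which we infer from the paths quoted above. All names are hypothetical. Only the charge-path update (Algorithm 1, lines 8–11) is modeled; short-circuit, leakage, and coupling currents are omitted. For a Vdd net, the direction of $P_i$ would be reversed, as stated in line 8.

```python
from collections import deque

# Output net of INV_X4 (Fig. 1): edge -> (u, v), where u -> v is the edge's
# reference direction. Topology inferred from the paths quoted in the text.
EDGES = {
    "e1": (1, 2), "e2": (2, 3), "e3": (2, 4),
    "e4": (4, 6), "e5": (5, 6), "e6": (6, 7),
}


def tree_path(edges, src, dst):
    """Unique src -> dst path in a tree as {edge: +1 or -1} (sign = traversal direction)."""
    adj = {}
    for name, (u, v) in edges.items():
        adj.setdefault(u, []).append((v, name, +1))
        adj.setdefault(v, []).append((u, name, -1))
    parent = {src: None}
    queue = deque([src])
    while queue:                                   # BFS; in a tree the path is unique
        node = queue.popleft()
        for nxt, name, sign in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, name, sign)
                queue.append(nxt)
    path, node = {}, dst
    while parent[node] is not None:                # walk back from dst to src
        prev, name, sign = parent[node]
        path[name] = sign
        node = prev
    return path


def add_paths(p, q):
    """Algebraic sum of two signed paths; opposite traversals of an edge cancel."""
    total = dict(p)
    for name, sign in q.items():
        total[name] = total.get(name, 0) + sign
    return {name: s for name, s in total.items() if s != 0}


def edge_currents(paths_and_currents):
    """Superpose each injected current along its signed path."""
    currents = {}
    for path, i in paths_and_currents:
        for name, sign in path.items():
            currents[name] = currents.get(name, 0.0) + sign * i
    return currents


if __name__ == "__main__":
    ref, pin = 4, 3
    p_pin = tree_path(EDGES, ref, pin)             # P_3: from ref to candidate pin
    moved = {}
    for j in (1, 5):                               # PMOS injection points (rise)
        moved[j] = add_paths(tree_path(EDGES, j, ref), p_pin)   # P'_j = P_j^r + P_3
        print(j, sorted(moved[j]))
    # 1 ['e1', 'e2']
    # 5 ['e2', 'e3', 'e4', 'e5']
    print(edge_currents([(moved[1], 1.0), (moved[5], 1.0)])["e2"])   # 2.0
```

The dictionary-of-signs representation makes cancellation automatic. An edge traversed once in each direction sums to zero and is dropped, which is exactly the cancellation shown in Fig. 3.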

**Example (Vdd pin):** Fig. 4 shows how our graph-based algorithm is applied to the Vdd pin when the pin is moved from reference node 3' to node 2'. The unique path $P_2$ between these nodes is shown at left. According to line 8, the direction of this path is opposite to that used for the output and Vss components. The two figures on the right show the algebraic addition of path $P_2$ with paths $P_4^r$ and $P_6^r$, respectively, which correspond to the two rise current injection points. The resulting charging paths are $\{e'_3, e'_2, e'_1\}$ for the PMOS current from node 4', and $\{e'_5, e'_4, e'_2, e'_1\}$ for the PMOS current from node 6'.



Figure 4: Our graph based algorithm applied to the Vdd pin when the pin is moved from node 3' to node 2'.

## B. Algebra for Average/RMS Current Updates

The current waveforms in the wire segments, for the rise and fall transitions, are used to calculate the RMS and effective average current through the wire: the former is used to measure self-heating, and the latter is used in the EM TTF formula. We now develop an algebra for efficient RMS and effective average current updates for various pin positions, given information for the reference case.

1) Algebra for Computing Average Current: For edge e,  $I_{avg}$  during a rise or fall half-cycle is given by:

$$I_{avg}(\mathbf{e}) = \frac{1}{T/2} \int_0^{T/2} I(\mathbf{e})(t)\,dt = \frac{1}{T/2} \sum_{i \in S} \int_0^{T/2} I(p_i(\mathbf{e}))(t)\,dt \tag{8}$$

where the summation is over the set S of all current insertion points whose currents contribute to the current in edge e.

When the pin is moved, the set S changes: some entries are added to it and others are removed. For example, in Fig. 1, when the pin is moved from node 4 to node 3, the current in edge $e_2$ gains contributions from current insertion points 5 (rise) and 7 (fall) and loses the contribution from insertion point 3; the current in $e_3$ must subtract the contributions of insertion points 1 (rise) and 3 (fall), and add those of insertion points 5 (rise) and 6 (fall). To perform these operations, we simply add or subtract the average currents associated with the corresponding current insertion points. For a current $I(p_i)$ from a pin insertion point $p_i$ that is added or subtracted, we can write

$$\big(I(e) \pm I(p_i)\big)_{avg} = \frac{1}{T/2} \int_0^{T/2} \big(I(e)(t) \pm I(p_i)(t)\big)\,dt = I_{avg}(e) \pm I_{avg}(p_i)$$

Therefore,  $I_{avg}$  updates for a new pin position simply involve add/subtract operations on average reference case currents.
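The linearity argument can be checked numerically. The sketch below is illustrative only: it assumes uniformly sampled half-cycle waveforms, uses made-up sample values, and the helper names `avg_current` and `update_edge_avg` are hypothetical. The reference edge carries the currents of insertion points 1 and 3; after a hypothetical pin move it carries those of points 1 and 5, and the incremental update matches a full recomputation.

```python
import math


def avg_current(samples):
    """Discrete form of Eq. (8): mean of a uniformly sampled half-cycle waveform."""
    return sum(samples) / len(samples)


def update_edge_avg(ref_avg, added=(), removed=()):
    """Update an edge's I_avg using only the averages of insertion-point currents."""
    return ref_avg + sum(added) - sum(removed)


# Hypothetical sampled insertion-point currents (uA).
i_p1 = [0.0, 2.0, 4.0, 2.0, 0.0]
i_p3 = [0.0, 1.0, 3.0, 1.0, 0.0]
i_p5 = [0.0, 0.5, 1.5, 0.5, 0.0]

ref_edge = [a + b for a, b in zip(i_p1, i_p3)]  # reference pin position
new_edge = [a + b for a, b in zip(i_p1, i_p5)]  # after the pin move

fast = update_edge_avg(avg_current(ref_edge),
                       added=[avg_current(i_p5)],
                       removed=[avg_current(i_p3)])
assert math.isclose(fast, avg_current(new_edge))
print(round(fast, 6), round(avg_current(new_edge), 6))
```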

**Example:** For the Vdd net example shown in Fig. 4, we illustrate how the average current values are updated. Fig. 5(a) and (b) show the formal representation of how the currents change when the pin is moved from node 3' to node 2', while Fig. 5(c) and (d) show the SPICE simulation results when the pin is at node 3' and at node 2', respectively. Fig. 5(a) shows the rise currents $I(p_i)$ charging the pin insertion points $p_i$ when the Vdd pin is placed at node 3'. In this example, for the INV\_X4, there are three insertion points: 2', 4' and 6'. When the pin is moved from node 3' to node 2', the currents through the edges $e'_3$, $e'_4$ and $e'_5$ remain the same and are shown in grey in Fig. 5(b) and (d). Only the currents through the edges $e'_1$ and $e'_2$, $I_{avg}(e'_1)$ and $I_{avg}(e'_2)$, must be updated. Using our algebra with the notation and values in Fig. 5, $I_{avg}(e'_1)$ and $I_{avg}(e'_2)$ are each given by:

$$I_{avg}(e'_1) = I_{avg}(e'_2) = I(p_4) + I(p_6) = 27.6\,\mu A + 13.8\,\mu A = 41.4\,\mu A$$

This value, computed with our algebra, is very close to the value of $42\,\mu$A obtained from a SPICE simulation with the pin at the new position.



Figure 5: The Vdd pin position options for INV\_X4 and the current values when the Vdd pin is at (a) node 3' and (b) node 2'.
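The arithmetic of the Fig. 5 example, together with the relative error with respect to SPICE, can be reproduced as below. The 42 μA reference is the rounded SPICE value quoted in the text, so the computed error of about 1.4% is only approximate.

```python
i_p4, i_p6 = 27.6, 13.8  # average rise currents (uA) of insertion points 4' and 6', Fig. 5
i_e1 = i_p4 + i_p6       # I_avg(e1') = I_avg(e2') after moving the Vdd pin to node 2'
i_spice = 42.0           # SPICE value (uA) for the new pin position, as quoted in the text
rel_err_pct = 100.0 * abs(i_e1 - i_spice) / i_spice
print(f"{i_e1:.1f} uA, {rel_err_pct:.2f}% error")  # 41.4 uA, 1.43% error
```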

2) Algebra for Computing the RMS Current: The current drawn by each device may be approximated by a triangular waveform of height $I_a$ that is nonzero for a period of T' seconds, where T' < T, the clock period; this current model is widely used. It is well known [16] that the RMS value of such a waveform is

$$I_{rms,\Delta} = I_a \sqrt{\frac{T'}{3T}} \tag{9}$$
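Eq. (9) can be checked against a brute-force evaluation of the RMS definition. In this sketch, the function names, the 100 μA peak, the 50 ps pulse width and the 1 GHz period are illustrative assumptions; the peak may lie anywhere inside the pulse without changing the result.

```python
import math


def rms_triangle(i_a, t_pulse, period):
    """Eq. (9): RMS of a triangular pulse of height i_a and width t_pulse, repeating every period."""
    return i_a * math.sqrt(t_pulse / (3.0 * period))


def rms_sampled(i_a, t_pulse, period, peak_frac=0.5, n=100_000):
    """Brute-force RMS: midpoint-sample one period and take sqrt(mean(i^2))."""
    t_peak = peak_frac * t_pulse  # requires 0 < peak_frac < 1
    acc = 0.0
    for k in range(n):
        t = (k + 0.5) * period / n
        if t < t_peak:
            i = i_a * t / t_peak                           # rising edge
        elif t < t_pulse:
            i = i_a * (t_pulse - t) / (t_pulse - t_peak)   # falling edge
        else:
            i = 0.0                                        # device idle
        acc += i * i
    return math.sqrt(acc / n)


i_a, t_pulse, period = 100e-6, 50e-12, 1e-9
print(rms_triangle(i_a, t_pulse, period))
print(rms_sampled(i_a, t_pulse, period, peak_frac=0.3))
```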

Due to the tree structure of the output wire, the current in each edge is a sum or difference of a set of such triangular signals, and this set can be determined by a tree traversal. The sum (or difference) of a set of triangular waveforms, potentially each with different heights, start times, and end times, can be represented as a piecewise linear waveform, and thus each edge current has this form. To find the RMS value of such a piecewise linear waveform, we can decompose it into a set of triangles and trapezoids that do not overlap (except at their boundaries), as shown in Figure 6.



Figure 6: The sum of the two upper triangular waveforms can be represented as a set of piecewise triangular or trapezoidal segments (below).

The RMS for this waveform can be shown to be:

$$I_{rms}^2 = \sum_{\text{all triangles } i} I_{rms,\Delta_i}^2 + \sum_{\text{all trapezoids } i} I_{rms,trap_i}^2 \tag{10}$$

To use the above equation, we use Equation (9) for the RMS of a triangular waveform, and the following formula for the RMS of a trapezoid bounded by the time axis, with value  $I_b$  at time b and  $I_c$  at time c, where c > b:

$$I_{rms,trap} = \sqrt{\frac{(I_b^2 + I_b I_c + I_c^2)(c - b)}{3T}} \tag{11}$$

For INV\_X4, since the transistors of each type are all identical and are driven by the same input signal, each PMOS [NMOS] device injects an identical charging [discharging] current waveform; in general, however, the currents may differ. Since the intracell resistive parasitics of the output metallization are small, some combination of these nearly unchanged currents is summed along each edge during each half-cycle. The set of triangular PMOS [NMOS] waveforms that contribute to the current in an edge $e$ in Fig. 1 is simply the set of PMOS [NMOS] devices whose charge [discharge] path (Algorithm 1) traverses $e$. When the output is moved from node 4 to node 3, the set associated with an edge loses some members and gains others. The updated set of triangles adds up, in general, to a waveform with triangles and trapezoids, whose RMS value is given by Equation (10). For the Vdd and Vss rails, the currents are updated in the same way: the Vdd rail supplies the current that charges the output through the PMOS devices, and the Vss rail sinks the current discharged through the NMOS transistors.
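Eqs. (10) and (11) and the set-membership update can be combined into a short sketch. Assumptions: each device current is a triangle given as a hypothetical `(t_start, t_peak, t_end, height)` tuple with all times inside one period, and an edge current is a signed sum of such triangles. Because every triangle is linear between its own breakpoints, the sum is linear between consecutive points of the merged breakpoint list, so each interval is a trapezoid (or a triangle, when one end is zero) and Eq. (11) applies to it exactly. Unlike $I_{avg}$, the RMS value is not additive across contributions, so the sketch rebuilds the edge waveform from the updated set.

```python
import math


def tri_value(tri, t):
    """Value at time t of a triangle (t_start, t_peak, t_end, height); zero outside it."""
    t0, tp, t1, h = tri
    if t <= t0 or t >= t1:
        return 0.0
    if t <= tp:
        return h * (t - t0) / (tp - t0)
    return h * (t1 - t) / (t1 - tp)


def edge_rms(triangles, period, signs=None):
    """RMS of a signed sum of triangles via Eqs. (10)-(11)."""
    if signs is None:
        signs = [1.0] * len(triangles)
    pts = sorted({t for tri in triangles for t in tri[:3]})  # merged breakpoints

    def total(t):
        return sum(s * tri_value(tri, t) for s, tri in zip(signs, triangles))

    sq = 0.0
    for b, c in zip(pts, pts[1:]):
        i_b, i_c = total(b), total(c)
        # Squared Eq. (11) for the linear segment between b and c.
        sq += (i_b * i_b + i_b * i_c + i_c * i_c) * (c - b) / (3.0 * period)
    return math.sqrt(sq)  # Eq. (10): squared segment contributions add


T = 1e-9
dev = {  # hypothetical PMOS charging currents (A); times in seconds
    "A": (0.0, 10e-12, 40e-12, 50e-6),
    "B": (5e-12, 15e-12, 45e-12, 50e-6),
    "C": (20e-12, 30e-12, 60e-12, 40e-6),
}
ref_set, new_set = ["A", "B"], ["B", "C"]  # the edge loses A and gains C after a pin move
print(edge_rms([dev[k] for k in ref_set], T))
print(edge_rms([dev[k] for k in new_set], T))
```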

#### IV. IMPLEMENTATION FLOW

We now present the implementation flow of this work for analyzing and improving circuit lifetime under EM. Since we do not have access to a library at a recent technology node, where EM effects are significant [2], our evaluation is based on layouts from the NANGATE 45nm cell library scaled down to 22nm. While these layouts may not strictly obey all design rules at a 22nm node, their transistor and wire sizes are comparable to those of 22nm libraries, and so are their currents.

Initially, each cell is characterized for its average and RMS currents at a reference pin position. The cells are characterized at $f_{ref}=1 {\rm GHz}$ for 7 different values each of the input slew and output load. The characterization thus generates a $7 \times 7$ look-up table of RMS and average current values over slew and load, obtained from SPICE simulations of the scaled 22nm library using the publicly available 22nm ASU PTM models for high-performance applications (PTM HP) [17].

Hereafter, the analysis follows the flow presented in Figure 7. First, we synthesize the ITC'99, ISCAS'89 and OpenCores benchmarks using Design Compiler, with the delay constraints set to the best achievable frequency. The NANGATE library [7] cells used are: NAND2\_X2, NAND2\_X4, NOR2\_X2, NOR2\_X4, AOI21\_X2, AOI21\_X4, INV\_X4, INV\_X8, INV\_X16, BUF\_X4, BUF\_X8, BUF\_X16, DFF\_X2, DFFR\_X2 and DFFS\_X2. We focus on EM in the combinational cells.



Figure 7: Implementation flow used for analyzing and improving circuit lifetime under EM.

Using the synthesized netlist, each circuit is placed and routed with the Cadence Encounter tool. The SPEF file with the extracted wire RCs and the Verilog netlist are saved, and the timing, power, area and wirelength are reported. Synopsys PrimeTime reads the SPEF, Verilog, and SDC files and reports the input slew, output load, and switching probability for every cell instance. This information is used as input to the pin optimization step presented in Figure 8. For each instance, based on the reported slew and load, we calculate $I_{avg}$ and $I_{rms}$ for each internal wire by interpolating from the $7 \times 7$ look-up table characterized for the reference pin position, and infer the currents for each candidate position using the approach presented in this paper. The TTF is found using Eq. (1) at 378K, a typical EM specification.
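The table lookup can be sketched as follows. The text does not specify the interpolation scheme, so bilinear interpolation with clamping at the table edges is an assumption here, as are the slew and load axis values and the helper name `interp_lut`. The table is filled with a bilinear stand-in function so the interpolated value can be checked exactly.

```python
from bisect import bisect_right


def interp_lut(slews, loads, table, slew, load):
    """Bilinear interpolation in a slew x load table (table[i][j] at slews[i], loads[j])."""

    def locate(axis, x):
        x = min(max(x, axis[0]), axis[-1])  # clamp to the characterized range
        k = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
        return k, (x - axis[k]) / (axis[k + 1] - axis[k])

    i, u = locate(slews, slew)
    j, v = locate(loads, load)
    return ((1 - u) * (1 - v) * table[i][j] + u * (1 - v) * table[i + 1][j]
            + (1 - u) * v * table[i][j + 1] + u * v * table[i + 1][j + 1])


# Hypothetical 7 x 7 grid: input slew (ps) and output load (fF).
slews = [5, 10, 20, 40, 80, 160, 320]
loads = [1, 2, 4, 8, 16, 32, 64]


def f(s, c):
    """Stand-in for a characterized current (A); bilinear, so interpolation is exact."""
    return 1e-6 * (1 + 0.1 * s + 0.5 * c + 0.01 * s * c)


table = [[f(s, c) for c in loads] for s in slews]
print(interp_lut(slews, loads, table, 30, 5), f(30, 5))
```

In a real flow, one such table would exist per wire segment and per quantity ($I_{avg}$, $I_{rms}$) at the reference pin position.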

The worst TTF of the circuit is set by the cell that has the smallest TTF. To compute the best TTF that the circuit can achieve through output, Vdd or Vss pin selection, we determine, for each cell, the output, Vdd or Vss pin position with the best TTF. The smallest such value over the entire circuit is the "weakest link" under the best possible pin positions, and is reported as the best TTF of the circuit.
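The weakest-link computation of the best and worst circuit TTF can be sketched as below. The input format, instance names and TTF values are hypothetical, and taking the worst TTF over every candidate pin position is our reading of the text rather than a detail it states explicitly.

```python
def circuit_ttf(pin_ttfs):
    """pin_ttfs maps instance -> {pin position: TTF in years}.

    Worst TTF: smallest TTF over all instances and candidate pins.
    Best TTF: each instance uses its best pin, and the circuit is limited by
    the weakest of these per-instance best values.
    """
    worst = min(min(pins.values()) for pins in pin_ttfs.values())
    best = min(max(pins.values()) for pins in pin_ttfs.values())
    return worst, best


ttfs = {
    "u1": {"P1": 4.0, "P2": 9.0, "P3": 8.0},
    "u2": {"P1": 6.0, "P2": 7.5},
    "u3": {"P1": 12.0, "P2": 11.0},
}
print(circuit_ttf(ttfs))  # (4.0, 7.5)
```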

Next, we turn to optimization: the objective of our method is to maximize the lifetime of the circuit. We set the lifetime specification to the best TTF of the circuit. For each cell instance, we report the critical pin positions (pin candidates whose lifetime is smaller than the best TTF) and invalidate them. We also enforce a design requirement that limits the maximum allowable Joule heating in a wire; a typical Joule heating specification is a 5°C temperature rise. We invalidate all pin positions in a cell that violate this requirement.
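The pin invalidation step amounts to a simple filter, sketched below. The function name, the per-pin temperature-rise input and the example values are hypothetical; the 5°C default mirrors the typical Joule heating specification quoted in the text.

```python
def split_pins(pin_ttf, pin_temp_rise, ttf_spec, max_rise_c=5.0):
    """Split candidate pins into (allowed, invalidated) lists.

    A pin is invalidated if its TTF is below the lifetime spec (a critical pin)
    or if the temperature rise of its wires exceeds the Joule heating limit.
    """
    allowed, invalid = [], []
    for pin, ttf in pin_ttf.items():
        ok = ttf >= ttf_spec and pin_temp_rise[pin] <= max_rise_c
        (allowed if ok else invalid).append(pin)
    return allowed, invalid


pin_ttf = {"P1": 4.0, "P2": 9.0, "P3": 8.0}        # years
pin_temp_rise = {"P1": 1.0, "P2": 6.2, "P3": 2.5}  # degrees C
print(split_pins(pin_ttf, pin_temp_rise, ttf_spec=7.5))  # (['P3'], ['P1', 'P2'])
```

Here P1 fails the lifetime specification and P2 fails the Joule heating limit, so only P3 remains available to the router.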



Figure 8: Pin optimization flow to maximize the lifetime of the circuit.

For the output pin position optimization, we provide the above information, describing the pin positions to be avoided, to the router. We implement this by changing the pin information in the Library Exchange Format (LEF) file to outlaw the critical pin positions, only for the critical cells, and then build a new TTF-optimized layout, as shown in the last steps of Figure 7.

For the Vdd and Vss pins, we did not modify the LEF file to avoid the critical pin positions, nor did we re-synthesize the circuit with the restricted Vdd and Vss pin positions, because these minor changes have a negligible impact on the global power grid. A local analysis is therefore sufficient.

#### V. RESULTS

Table I shows the results of our characterization approach for our library, based on a single SPICE simulation per cell followed by graph traversals and the current update algebra. These results were calculated for the output pin positions. One reference case is chosen for each cell, and the number of output candidate pin positions varies from 6 to 25, with an average of about 12 pin candidates per cell, as shown in Table II. The number of Vdd candidate pin positions varies from 4 to 26, and that of Vss candidate pin positions from 5 to 26, with an average of about 10 pin candidates per cell, as Tables III and IV show. For this library, the number of SPICE simulations is therefore reduced by about 32×, a significant and worthwhile saving even for a one-time library characterization task. For each cell, Table I reports the edge with the largest error in the effective average current: in each case this error is small, 0.53% on average, while the computational savings for characterization are large.
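The roughly 32× reduction can be reproduced from the candidate counts in Tables II, III and IV. The sketch assumes that exhaustive characterization would need one SPICE run per candidate pin position of each type, whereas our approach needs one reference run per cell; the ratio is unchanged if every run is repeated for all 49 slew/load points of the look-up table.

```python
# Candidate pin positions per cell (NAND2_X2 ... BUF_X16), from Tables II, III and IV.
out_pins = [8, 10, 6, 8, 8, 11, 7, 13, 25, 7, 13, 25]
vdd_pins = [6, 10, 5, 6, 4, 5, 6, 10, 18, 8, 14, 26]
vss_pins = [5, 5, 6, 10, 6, 10, 6, 10, 18, 8, 14, 26]

exhaustive_runs = sum(out_pins) + sum(vdd_pins) + sum(vss_pins)  # one run per candidate
reference_runs = len(out_pins)                                   # one run per cell
print(exhaustive_runs, reference_runs, round(exhaustive_runs / reference_runs, 1))
# 383 12 31.9
```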

Table I: Comparison with SPICE for  $I_{avg}$  calculated using our algorithm. For each cell, the value corresponds to the edge current with the largest error.

| Cell     | SPICE $I_{avg}$ (A) | Ours $I_{avg}$ (A) | Error (%) | Runtime (ms) |
|----------|---------|---------|-----------|--------------|
| NAND2_X2 | 4.72e-5 | 4.70e-5 | 0.32%     | 10.7         |
| NAND2_X4 | 4.27e-5 | 4.31e-5 | 0.99%     | 15.4         |
| NOR2_X2  | 2.74e-5 | 2.76e-5 | 0.72%     | 6.5          |
| NOR2_X4  | 2.22e-5 | 2.23e-5 | 0.28%     | 9.7          |
| AOI21_X2 | 3.81e-5 | 3.81e-5 | 0.09%     | 10.4         |
| AOI21_X4 | 3.00e-5 | 2.96e-5 | 1.23%     | 17.2         |
| INV_X4   | 9.84e-5 | 9.88e-5 | 0.46%     | 7.9          |
| INV_X8   | 1.02e-4 | 1.02e-4 | 0.64%     | 18.7         |
| INV_X16  | 1.29e-4 | 1.28e-4 | 0.63%     | 38.7         |
| BUF_X4   | 9.79e-5 | 9.85e-5 | 0.57%     | 8.1          |
| BUF_X8   | 1.12e-4 | 1.11e-4 | 0.36%     | 17.8         |
| BUF_X16  | 1.24e-4 | 1.25e-4 | 0.08%     | 37.1         |
| AVG      |         |         | 0.53%     | 16.5         |

Table II: TTF in years for each cell in the library for the output pin positions.

| Cell     | # Candidates | Best TTF, 50% switching | Worst TTF, 50% switching | Best TTF, 100% switching | Worst TTF, 100% switching |
|----------|------------|--------|----------|--------|----------|
| NAND2_X2 | 8          | 22.03  | 21.85    | 10.95  | 10.85    |
| NAND2_X4 | 10         | 27.65  | 20.37    | 8.75   | 8.08     |
| NOR2_X2  | 6          | 24.33  | 24.30    | 12.11  | 12.07    |
| NOR2_X4  | 8          | 29.61  | 25.71    | 14.74  | 10.75    |
| AOI21_X2 | 8          | 28.32  | 28.30    | 14.12  | 14.11    |
| AOI21_X4 | 11         | 13.13  | 13.10    | 6.47   | 6.43     |
| INV_X4   | 7          | 23.23  | 9.90     | 11.49  | 4.73     |
| INV_X8   | 13         | 33.80  | 16.92    | 16.82  | 8.43     |
| INV_X16  | 25         | 30.80  | 2.42     | 15.31  | 0.20     |
| BUF_X4   | 7          | 25.85  | 12.93    | 12.64  | 6.35     |
| BUF_X8   | 13         | 40.93  | 13.55    | 20.35  | 6.01     |
| BUF_X16  | 25         | 35.91  | 3.17     | 17.65  | 0.50     |
| AVG      | 11.8       |        |          |        |          |

Tables II, III and IV present the results of our lifetime evaluation scheme for the set of library cells, considering the output, Vdd, and Vss pin placement, respectively. The best and worst TTF values correspond to the largest and smallest lifetimes over all pin candidates. The TTF is calculated for two switching activities, 50% and 100% of the clock frequency: although few cells in a layout switch this frequently, one of these cells is likely to be an EM bottleneck. The 100% switching case is a clear lower bound on the lifetime

Table III: TTF in years for each cell in the library for different Vdd pin positions considering two different switching activities of 50% and 100% of the clock frequency.

| Cell     | # Candidates | Best TTF, 50% switching | Worst TTF, 50% switching | Best TTF, 100% switching | Worst TTF, 100% switching |
|----------|------------|--------|----------|--------|----------|
| NAND2_X2 | 6          | 24.84  | 22.28    | 12.38  | 11.10    |
| NAND2_X4 | 10         | 23.66  | 11.36    | 11.80  | 5.57     |
| NOR2_X2  | 5          | 51.10  | 24.13    | 25.48  | 12.02    |
| NOR2_X4  | 6          | 24.84  | 12.14    | 12.39  | 5.93     |
| AOI21_X2 | 4          | 28.34  | 28.23    | 14.11  | 14.05    |
| AOI21_X4 | 5          | 26.45  | 13.40    | 13.16  | 6.61     |
| INV_X4   | 6          | 18.75  | 9.03     | 9.32   | 4.41     |
| INV_X8   | 10         | 18.43  | 4.31     | 9.16   | 1.57     |
| INV_X16  | 18         | 15.69  | 1.42     | 7.64   | 0.25     |
| BUF_X4   | 8          | 22.45  | 7.35     | 11.12  | 3.52     |
| BUF_X8   | 14         | 21.40  | 3.24     | 10.37  | 1.24     |
| BUF_X16  | 26         | 11.03  | 1.24     | 5.31   | 0.25     |
| AVG      | 9.83       |        |          |        |          |

Table IV: TTF in years for each cell in the library for different Vss pin positions considering two different switching activities of 50% and 100% of the clock frequency.

| Cell     | # Candidates | Best TTF, 50% switching | Worst TTF, 50% switching | Best TTF, 100% switching | Worst TTF, 100% switching |
|----------|------------|--------|----------|----------------|-------|
| NAND2_X2 | 5          | 41.38  | 22.57    | 20.63          | 11.20 |
| NAND2_X4 | 5          | 23.22  | 10.99    | 11.52          | 5.33  |
| NOR2_X2  | 6          | 43.05  | 22.57    | 21.49          | 11.20 |
| NOR2_X4  | 10         | 43.39  | 10.81    | 21.65          | 4.20  |
| AOI21_X2 | 6          | 52.59  | 25.74    | 26.26          | 12.80 |
| AOI21_X4 | 10         | 30.56  | 12.39    | 14.98          | 5.39  |
| INV_X4   | 6          | 18.67  | 8.68     | 9.21           | 4.10  |
| INV_X8   | 10         | 18.35  | 3.32     | 9.06           | 0.95  |
| INV_X16  | 18         | 15.68  | 0.92     | 7.61           | 0.11  |
| BUF_X4   | 8          | 22.28  | 7.04     | 10.93          | 3.23  |
| BUF_X8   | 14         | 21.42  | 2.77     | 10.35          | 0.81  |
| BUF_X16  | 26         | 15.68  | 0.92     | 7.61           | 0.11  |
| AVG      | 10.33      |        |          |                |       |

of the cell: typical cells, even worst-case cells, switch at a significantly lower rate, except in always-on networks such as the core elements of the clock network. The tables show that the pin position is important: choosing a good pin position can better balance the current flow and improve the EM lifetime. Note that the worst TTFs for the X16 cells are extremely small: these cells have a large number of pin choices, some of which are associated with large currents, and they are also affected by current divergence.

Placing the Vdd pin at the best position could improve the lifetime of the INV\_X16 by about 31×, for a cell switching 100% of the time. For the Vss pin, the EM lifetime could be improved by about $69 \times$ for the X16 cells. It is important to note that these low lifetimes correspond to very high switching rates: in other words, some pin positions would be impermissible on clock buffers, but may be permissible on nets with low switching activity. For the cell AOI21\_X2, the TTF barely changes when the Vdd pin position changes; however, changing the Vss pin position can improve the lifetime by about $2\times$. Overall, pin placement yields a larger lifetime improvement for the Vss pin than for the Vdd pin. This is because, for some cells (AOI21\_X2, for example), the geometries of the Vdd and Vss wires are different, producing different pin position options and consequently different current distributions. While this result may include inaccuracies from our direct geometric scaling of the publicly available 45nm cell layouts to 22nm, the impact of pin positions is real and can be extreme for large cells. To counter this effect, a library cell layout may use wider wires to control current densities or, more practically, outlaw a set of critical positions. For example, for each of the X16 cells, pin positions that see more balanced currents provide high lifetimes (as shown by the best TTF for these cells).

Fig. 9 shows the TTF in years for the different output, Vdd, and Vss pin position options for an INV\_X4, considering a switching activity of 100% at 2GHz. The pin positions are labeled P1 to P6, and the TTF differs for each position. Relating to Fig. 1, P1 is when the output pin is at node 3, Vdd is at node 2 and Vss is at node 1; P2 is when the output pin is at node 7, Vdd is at node 4 and Vss is at node 3. These are the critical pin positions for this cell, i.e., those with the smallest TTF. When the critical pin positions are avoided, the largest TTF that the INV\_X4 can achieve is 9.21 years, limited by the Vss pin. Therefore, to achieve the maximum TTF, all pin positions with a TTF smaller than 9.21 years are avoided, as shown by the shaded area in the chart. In this way, the TTF of the INV\_X4 can be improved by $2.25 \times$ by avoiding the critical output, Vdd, and Vss pin positions. For this cell, the best TTF given by the output pin is 11.49 years, but this value cannot be achieved because it is limited by the best TTF of the Vss pin, 9.21 years. Moreover, the worst TTF of this cell, 4.1 years, is also given by the Vss pin.
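The INV\_X4 figures quoted above follow directly from the 100%-switching columns of Tables II, III and IV: with all three pin types optimized, the cell is limited by the pin type whose best TTF is smallest, and its worst case is the smallest of the three worst TTFs. A short check:

```python
# 100%-switching TTFs (years) for INV_X4, from Tables II, III and IV.
inv_x4 = {
    "output": {"best": 11.49, "worst": 4.73},
    "Vdd": {"best": 9.32, "worst": 4.41},
    "Vss": {"best": 9.21, "worst": 4.10},
}

limiting_pin = min(inv_x4, key=lambda p: inv_x4[p]["best"])  # weakest best TTF
cell_best = inv_x4[limiting_pin]["best"]
cell_worst = min(v["worst"] for v in inv_x4.values())
print(limiting_pin, cell_best, cell_worst, round(cell_best / cell_worst, 2))
# Vss 9.21 4.1 2.25
```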

Figure 9: TTF (in years) for various output, Vdd, and Vss pin positions in INV\_X4, at 100% switching.

Table V presents the results for a set of ITC'99, ISCAS'89 and OpenCores benchmark circuits mapped to our set of characterized cells and placed-and-routed. For each benchmark, the number of combinational cells, the clock period, the total power consumption (leakage and switching power), the core area and the total wirelength (WL) are presented, as reported by Encounter. The best and worst TTF values for the output pin positions are computed as described in Section IV. These results correspond to a post place-and-route layout with no EM awareness, and the gap between the best and worst TTF values indicates how much the lifetime can be improved by avoiding the critical output pin positions. The number of critical nets is the number of output nets that violate the Joule heating constraint, and the number of critical cells is the number of cells with output pin positions whose lifetimes are below the best TTF. Interestingly, these numbers are both

| Circuit   | # of comb. cells | Period (ns) | Power (mW) | Core area ($\mu m^2$) | Total wirelength ($\mu m$) | Worst TTF (years) | Best TTF (years) | TTF Improv. | # of crit. nets | # of crit. cells | Opt. RT (s) | Total Enc. RT | Ovhd (%) |
|-----------|--------|--------|---------|-------------|------------|---------|---------|---------|-------|-------|-------|----------|-------|
| b05       | 859    | 0.544  | 0.551   | 504         | 2682.50    | 4.07    | 6.53    | 1.60×   | -     | 4     | 1.1   | 12.0s    | 9.14  |
| b07       | 461    | 0.306  | 0.352   | 317         | 1426.87    | 3.81    | 5.25    | 1.38×   | -     | 3     | 1.1   | 11.5s    | 9.47  |
| b11       | 821    | 0.384  | 0.460   | 471         | 2439.83    | 2.75    | 5.82    | 2.12×   | 1     | 5     | 1.0   | 12.5s    | 8.30  |
| b12       | 1217   | 0.282  | 0.810   | 824         | 4236.15    | 3.13    | 3.14    | 1.001×  | 3     | 1     | 1.9   | 16.0s    | 12.10 |
| b13       | 340    | 0.208  | 0.467   | 272         | 1272.99    | 3.89    | 6.05    | 1.56×   | 1     | 7     | 1.1   | 11.0s    | 9.68  |
| s5378     | 1219   | 0.299  | 0.679   | 890         | 6418.27    | 2.74    | 3.59    | 1.31×   | 2     | 1     | 2     | 14.4s    | 13.78 |
| s9234     | 1044   | 0.373  | 0.584   | 849         | 4873.30    | 2.73    | 3.48    | 1.27×   | -     | 1     | 1.9   | 13.8s    | 13.80 |
| s13207    | 1401   | 0.720  | 1.063   | 1733        | 7146.48    | 4.94    | 13.18   | 2.67×   | -     | 7     | 2.6   | 22.5s    | 11.67 |
| s38417    | 10068  | 0.493  | 8.836   | 7959        | 46419.93   | 3.43    | 5.77    | 1.68×   | 2     | 6     | 5.3   | 50.0s    | 10.49 |
| aes_core  | 27420  | 0.345  | 25.393  | 13356       | 206199.45  | 2.28    | 5.06    | 2.22×   | 63    | 5     | 43.3  | 96.0min  | 0.75  |
| wb_conmax | 34562  | 0.438  | 14.228  | 18176       | 321431.88  | 2.26    | 5.25    | 2.32×   | 6     | 59    | 60.3  | 117.5min | 0.86  |
| des_perf  | 90112  | 0.441  | 121.190 | 59206       | 727368.54  | 1.91    | 5.05    | 2.65×   | 10    | 12    | 163.7 | 320.3min | 0.85  |
| vga_lcd   | 103774 | 0.331  | 70.128  | 73450       | 1189099.87 | 0.18    | 2.87    | 15.77×  | 2308  | 183   | 225.8 | 380.4min | 0.99  |

Table V: Cell-internal EM analysis for a set of benchmark circuits considering the output pin positions.

small, implying that large improvements to the lifetime can be obtained through a few small changes to the layout. Note that the best TTF values are in the range required for many modern applications (e.g., mobile devices) with short TTF specs of 3-4 years.

Table V shows that the lifetime of a circuit can be improved by up to $15.77 \times$ by altering the output pin position of a few cells. The benchmark with the smallest TTF improvement is b12: its critical cell is a NOR2\_X2 whose worst TTF is 3.13 years and best TTF is 3.14 years, i.e., changing the output pin position does not change the lifetime significantly. The largest TTF improvement is for the vga\_lcd circuit, where the critical cell is an INV\_X8 with a worst TTF of 0.18 years, and the best TTF, 2.87 years, is given by an instance of an INV\_X4 cell.

The last two columns in Table V show, respectively, the runtime required by Encounter to place and route the design (Total Enc. RT) and the additional runtime overhead incurred by our method (Ovhd). This overhead corresponds to the cost of finding critical cells and then performing the rerouting step to achieve a better circuit TTF. The overhead is under 15% for all circuits. Most importantly, the larger overheads occur for circuits with small runtimes, and the overheads for the largest circuits are all below 1%. This is because our optimizations are all incremental changes, and the bulk of the runtime for the design is incurred in the original place-and-route step in Encounter.
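As a sanity check, the Ovhd column appears to equal the optimization runtime divided by the total Encounter runtime. This definition is our assumption, and small mismatches for some rows (for example, about 9.17% rather than 9.14% for b05) are plausibly due to rounding of the reported runtimes.

```python
def overhead_pct(opt_rt_s, enc_rt_s):
    """Assumed definition: optimization runtime as a percentage of the Encounter runtime."""
    return 100.0 * opt_rt_s / enc_rt_s


# (Opt. RT in s, Total Enc. RT converted to s) for a few rows of Table V.
rows = {
    "b05": (1.1, 12.0),
    "aes_core": (43.3, 96.0 * 60),
    "des_perf": (163.7, 320.3 * 60),
    "vga_lcd": (225.8, 380.4 * 60),
}
for name, (opt_s, enc_s) in rows.items():
    print(f"{name}: {overhead_pct(opt_s, enc_s):.2f}%")
```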

We now redo the routing step to guarantee that the best TTF in Table V can be met, by outlawing all output pin positions whose TTF is worse than the best TTF in Table V or that result in a cell-internal Joule heating violation. Since the best TTF was computed by choosing the best output pin position for each cell and then finding the weakest link, i.e., the shortest TTF among these cells, a few cells may be forced to use a single output pin, but most cells will have a choice of several pin positions, and the circuit lifetime will be significantly enhanced. (Note that, by the definition of the best TTF, each cell is guaranteed to have at least one allowable pin.)

After these new constraints are imposed on the pin positions, the router makes incremental changes to some interconnect routes. Table VI shows the results after physical synthesis considering the best output pin positions, i.e., for each cell,

Table VI: Performance impact of EM-aware physical synthesis using output pin optimization.

| Circuit   | Period (ns) | Δ Period (%) | Power (mW) | Area ($\mu m^2$) | WL ($\mu m$) | Δ WL (%) |
|-----------|--------|---------------|---------|----------------------|---------------|-----------|
| b05       | 0.544  | -             | 0.551   | 504                  | 2682.6        | 0.00      |
| b07       | 0.306  | -             | 0.353   | 317                  | 1428.5        | 0.12      |
| b11       | 0.384  | -             | 0.460   | 471                  | 2443.5        | 0.15      |
| b12       | 0.280  | -0.89         | 0.808   | 824                  | 4112.8        | -2.91     |
| b13       | 0.208  | -             | 0.467   | 272                  | 1273.5        | 0.04      |
| s5378     | 0.299  | -             | 0.679   | 890                  | 6422.2        | 0.06      |
| s9234     | 0.373  | -             | 0.584   | 849                  | 4873.4        | 0.00      |
| s13207    | 0.720  | -             | 1.063   | 1733                 | 7146.6        | 0.02      |
| s38417    | 0.493  | -             | 8.836   | 7959                 | 46420.2       | 0.00      |
| aes_core  | 0.345  | -             | 25.393  | 13356                | 206207.8      | 0.00      |
| wb_conmax | 0.438  | -             | 14.228  | 18176                | 321409.6      | -0.01     |
| des_perf  | 0.440  | -0.11         | 121.190 | 59206                | 727319.6      | -0.01     |
| vga_lcd   | 0.331  | -             | 70.128  | 73450                | 1189356.6     | 0.02      |

we disallow EM-unsafe output pin positions. Thus, the circuit lifetime is improved by up to $15.77\times$ while keeping the delay, area and power of the circuit essentially unchanged, and with marginal increases ($\le 0.15\%$) in total wirelength (in fact, for two circuits, b12 and des\_perf, the wirelength and the clock period are even slightly improved). As only a few instances have critical output pin positions and critical wire segments, the TTF can be increased without major changes to the circuit.

Tables VII and VIII present the lifetime optimization results considering the Vdd and Vss pin placement, respectively. The results are obtained in the same way as those for the output pin placement, and for the same set of benchmark circuits. For each circuit, the worst and best TTF and the TTF improvement are shown, together with the number of critical nets and critical cells that have to be avoided to achieve the best TTF. For the Vdd pin, avoiding the critical pin positions improves the TTF of the circuits by $1.63 \times$ to $81.73 \times$, as shown in Table VII. For most circuits the number of critical cells is very small, about 10. For the circuits s13207 and s38417, the number of critical cells is 48 and 39, representing about 3.4% and 0.4% of the total number of cells, respectively. For the vga\_lcd circuit, about 10% of the instances are critical, i.e., have Vdd pin positions that give a TTF smaller than 2.94 years.

The results for the Vss pin placement are shown in Table VIII, where choosing the best Vss pin position yields a higher TTF improvement than choosing the best output or

Table VII: Vdd pin analysis for a set of benchmark circuits.

| Circuit   | Worst TTF<br>(years) | Best TTF<br>(years) | TTF<br>Improv. | # of<br>critical<br>nets | # of<br>critical<br>cells |
|-----------|----------------------|---------------------|----------------|--------------------------|---------------------------|
| b05       | 4.26                 | 7.87                | 1.85×          | -                        | 7                         |
| b07       | 1.15                 | 5.36                | 4.66×          | 9                        | 7                         |
| b11       | 2.94                 | 7.18                | 2.44×          | -                        | 10                        |
| b12       | 2.60                 | 4.23                | 1.63×          | -                        | 8                         |
| b13       | 2.08                 | 5.06                | 2.43×          | -                        | 9                         |
| s5378     | 2.40                 | 5.27                | 2.20×          | -                        | 11                        |
| s9234     | 2.38                 | 6.04                | 2.54×          | -                        | 7                         |
| s13207    | 5.20                 | 11.85               | 2.28×          | -                        | 48                        |
| s38417    | 3.73                 | 6.30                | 1.69×          | -                        | 39                        |
| aes_core  | 0.85                 | 4.66                | 5.47×          | 267                      | 64                        |
| wb_conmax | 0.37                 | 5.35                | 14.44×         | 958                      | 205                       |
| des_perf  | 1.51                 | 5.84                | 3.87×          | 169                      | 274                       |
| vga_lcd   | 0.04                 | 2.94                | 81.73×         | 11202                    | 1162                      |

Vdd pin positions. The TTF can be improved by $2.5\times$ to $161\times$ by avoiding the critical Vss pin positions. The numbers of critical nets and cells are also larger than for the output and Vdd pins. The largest number is for the b13 circuit, with 204 critical cells, which is 60% of the total number of cells in the circuit. Des\_perf is another circuit with a large number of critical cells, more than 10% of the total number of combinational cells. For the other circuits, the number of critical cells is not larger than 3.8% of the total number of cells in the circuit.

Table VIII: Vss pin analysis for a set of benchmark circuits.

| Circuit   | Worst TTF<br>(years) | Best TTF<br>(years) | TTF<br>Improv. | # of<br>critical<br>nets | # of<br>critical<br>cells |
|-----------|----------------------|---------------------|----------------|--------------------------|---------------------------|
| b05       | 3.56                 | 9.37                | 2.63×          | 1                        | 14                        |
| b07       | 1.00                 | 7.77                | 7.77×          | 9                        | 29                        |
| b11       | 2.17                 | 6.64                | 3.06×          | 15                       | 10                        |
| b12       | 1.32                 | 7.34                | 5.56×          | 27                       | 47                        |
| b13       | 1.04                 | 10.29               | 9.89×          | 12                       | 204                       |
| s5378     | 1.22                 | 5.61                | 4.60×          | 15                       | 12                        |
| s9234     | 2.23                 | 5.73                | 2.57×          | 8                        | 8                         |
| s13207    | 4.41                 | 11.31               | 2.56×          | 2                        | 12                        |
| s38417    | 2.51                 | 8.29                | 3.30×          | 22                       | 68                        |
| aes_core  | 0.59                 | 4.35                | 7.34×          | 532                      | 68                        |
| wb_conmax | 0.25                 | 6.08                | 24.09×         | 1041                     | 230                       |
| des_perf  | 1.31                 | 5.48                | 4.18×          | 1067                     | 269                       |
| vga_lcd   | 0.02                 | 2.51                | 160.66×        | 14164                    | 1238                      |
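The "TTF Improv." column of Table VIII is the ratio of the best TTF to the worst TTF for each circuit. The short sketch below recomputes that ratio for a few rows as printed; the function name `ttf_improvement` is a hypothetical helper for illustration, not part of the authors' tool flow.

```python
def ttf_improvement(worst_ttf: float, best_ttf: float) -> float:
    """TTF improvement factor: best achievable TTF over the baseline (worst) TTF."""
    if worst_ttf <= 0:
        raise ValueError("worst_ttf must be positive")
    return best_ttf / worst_ttf


# A few rows of Table VIII as printed (years, rounded to two decimals).
TABLE_VIII = {
    "b05": (3.56, 9.37),
    "b07": (1.00, 7.77),
    "b11": (2.17, 6.64),
    "s9234": (2.23, 5.73),
}

if __name__ == "__main__":
    for circuit, (worst, best) in TABLE_VIII.items():
        print(f"{circuit}: {ttf_improvement(worst, best):.2f}x")
    # Prints, one per line: b05: 2.63x, b07: 7.77x, b11: 3.06x, s9234: 2.57x
```

Because the printed TTFs are rounded to two decimals, this check reproduces the table only where the worst TTF is not too small. For vga\_lcd, 2.51/0.02 gives 125.5 rather than the reported 160.66×, presumably because the reported ratio was computed from unrounded TTFs.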

Tables V, VII, and VIII show the TTF improvement when the output, Vdd, or Vss pin positions, respectively, are optimized separately. Table IX shows the results when the benchmark circuits are optimized to avoid all critical pin positions simultaneously. The best TTF of a circuit is the smallest of the best TTFs obtained from the output, Vdd, and Vss pin optimizations, since the weakest pin type limits the circuit lifetime. Similarly, the worst TTF is the smallest of the three worst TTFs. For most circuits, the number of critical cells is lower than with Vss pin optimization alone, because the TTF limit (best TTF) is smaller, which reduces the number of critical pin positions and consequently the number of critical cells. By optimizing the pin positions, the circuit lifetimes improve by 1.56× to 161× (Table IX); the largest improvement is for the vga\_lcd circuit, whose lifetime increases from 0.02 years to 2.51 years when the critical pin positions are avoided. Note that our objective in this work is to obtain the best possible TTF merely by moving pin positions.
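A minimal sketch of the combination rule described above, assuming each pin-type optimization is summarized by a worst and a best circuit TTF, and assuming that a pin position counts as critical when its TTF falls below the circuit's target (best) TTF. The names `PinResult`, `combine`, and `critical_positions`, and all numbers, are hypothetical illustrations rather than data from Tables V, VII, VIII, or IX.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PinResult:
    """Worst and best circuit TTF (years) for one pin type."""
    worst_ttf: float
    best_ttf: float


def combine(output: PinResult, vdd: PinResult, vss: PinResult) -> PinResult:
    """Joint optimization: the weakest pin type bounds the circuit TTF."""
    results = (output, vdd, vss)
    return PinResult(
        worst_ttf=min(r.worst_ttf for r in results),
        best_ttf=min(r.best_ttf for r in results),
    )


def critical_positions(
    pin_ttfs: dict[str, list[float]], target: float
) -> dict[str, list[int]]:
    """Map each cell to the indices of its pin positions with TTF < target.

    Assumption (hypothetical rule): a position is critical if its TTF is
    below the circuit-level target; cells with no such position are omitted.
    """
    critical: dict[str, list[int]] = {}
    for cell, ttfs in pin_ttfs.items():
        bad = [i for i, ttf in enumerate(ttfs) if ttf < target]
        if bad:
            critical[cell] = bad
    return critical


if __name__ == "__main__":
    # Hypothetical per-pin-type results for one circuit (not from the tables).
    joint = combine(PinResult(1.5, 6.0), PinResult(1.2, 8.0), PinResult(1.0, 4.0))
    print(joint)  # PinResult(worst_ttf=1.0, best_ttf=4.0)

    # Hypothetical TTFs (years) of each candidate pin position per cell.
    ttfs = {"cell_a": [3.0, 5.0, 9.0], "cell_b": [4.5, 7.0]}
    print(critical_positions(ttfs, target=6.0))  # {'cell_a': [0, 1], 'cell_b': [0]}
    print(critical_positions(ttfs, target=joint.best_ttf))  # {'cell_a': [0]}
```

Lowering the target from 6.0 to 4.0 years removes `cell_b` and one position of `cell_a` from the critical set, which mirrors why a smaller TTF limit in the joint optimization tends to yield fewer critical cells.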

Table IX: TTF results optimizing the output, Vdd, and Vss pin positions for a set of benchmark circuits.

| Circuit   | Worst TTF<br>(years) | Best TTF<br>(years) | TTF<br>Improv. | # of<br>critical<br>nets | # of<br>critical<br>cells |
|-----------|----------------------|---------------------|----------------|--------------------------|---------------------------|
| b05       | 3.56                 | 6.53                | 1.83×          | 1                        | 4                         |
| b07       | 1.00                 | 5.25                | 5.25×          | 9                        | 7                         |
| b11       | 2.17                 | 5.82                | 2.68×          | 17                       | 10                        |
| b12       | 1.32                 | 3.14                | 2.38×          | 30                       | 8                         |
| b13       | 1.04                 | 5.06                | 4.87×          | 16                       | 8                         |
| s5378     | 1.22                 | 3.59                | 2.94×          | 18                       | 7                         |
| s9234     | 2.23                 | 3.48                | 1.56×          | 9                        | 3                         |
| s13207    | 4.41                 | 11.31               | 2.56×          | 3                        | 48                        |
| s38417    | 2.51                 | 5.77                | 2.30×          | 26                       | 26                        |
| aes_core  | 0.59                 | 4.35                | 7.34×          | 532                      | 68                        |
| wb_conmax | 0.25                 | 5.25                | 20.83×         | 1041                     | 220                       |
| des_perf  | 1.31                 | 5.05                | 3.86×          | 1067                     | 263                       |
| vga_lcd   | 0.02                 | 2.51                | 160.66×        | 14164                    | 1238                      |

**Runtime**: As mentioned previously, the circuit analysis is performed with the Encounter tool, and the runtime is less than 50 s for the benchmarks with up to 10k cells. For aes\_core and wb\_conmax, the runtime is less than 2 hours, and for des\_perf and vga\_lcd it is about 5 to 7 hours, as shown in Table V. The critical pin positions for each circuit are reported in under 1 s.

## VI. CONCLUSION

We have developed an approach to cell-internal EM that addresses EM on signal interconnects and on the Vdd and Vss rails within a standard cell, using a new modeling approach that includes Joule heating effects. By avoiding the critical pin positions, the lifetime of the cells can be improved significantly, and the lifetimes of benchmark circuits are optimized using only minor layout modifications. We demonstrate lifetime improvements of up to $15.77\times$ at the same area, delay, and power by avoiding only the critical output pin positions. When the output, Vdd, and Vss pin positions are optimized together, the circuit lifetimes improve by $1.56\times$ to $161\times$.

# REFERENCES

- [1] J. Lienig, "Electromigration and its impact on physical design in future technologies," in *Proceedings of the ACM International Symposium on Physical Design*, 2013, pp. 33–40.
- [2] P. Jain and A. Jain, "Accurate current estimation for interconnect reliability analysis," *IEEE Transactions on VLSI Systems*, vol. 20, no. 9, pp. 1634–1644, 2012.
- [3] K.-D. Lee, "Electromigration recovery and short lead effect under bipolar- and unipolar-pulse current," in *Proceedings of the IEEE International Reliability Physics Symposium*, 2012, pp. 6.B.3.1–6.B.3.4.
- [4] C.-H. Wang, K.-H. Tam, and H.-Y. Chen, "Automatic place and route method for electromigration tolerant power distribution," Apr. 8 2014, US Patent 8,694,945.
- [5] J.-L. Pelloie, "Method of adapting a layout of a standard cell of an integrated circuit," Feb. 19 2013, US Patent 8,381,162.
- [6] G. Posser, V. Mishra, P. Jain, R. Reis, and S. S. Sapatnekar, "A systematic approach for analyzing and optimizing cell-internal signal electromigration," in *Proceedings of the International Conference on Computer-Aided Design*, ser. ICCAD '14, 2014.
- [7] "NanGate 45nm Open Cell Library," http://www.nangate.com.
- [8] J. R. Black, "Electromigration: A brief survey and some recent results," *IEEE Transactions on Electron Devices*, vol. ED-16, pp. 338–347, 1969.
- [9] V. Mishra and S. S. Sapatnekar, "The impact of electromigration in copper interconnects on power grid integrity," in *The 50th Annual Design Automation Conference 2013, DAC '13, Austin, TX, USA, May 29 - June 07, 2013*, 2013, p. 88. [Online]. Available: http://doi.acm.org/10.1145/2463209.2488842
- [10] W. R. Hunter, "Self-consistent solutions for allowed interconnect current density. II. Application to design guidelines," *IEEE Transactions on Electron Devices*, vol. 44, no. 2, pp. 310–316, 1997.
- [11] K. Banerjee and A. Mehrotra, "Global (interconnect) warming," *IEEE Circuits & Devices Magazine*, vol. 17, no. 5, pp. 16–32, Sep. 2001.
- [12] C.-K. Hu *et al.*, "Impact of Cu microstructure on electromigration reliability," in *Proceedings of the IEEE International Interconnect Technology Conference*, 2007, pp. 93–95.
- [13] "FreePDK45 process design kit," http://www.eda.ncsu.edu/wiki/FreePDK45:Contents.
- [14] Y.-J. Park, P. Jain, and S. Krishnan, "New electromigration validation: Via node vector method," in *Proceedings of the International Reliability Physics Symposium*, 2010, pp. 698–704.
- [15] I. A. Blech, "Electromigration in thin aluminum films on titanium nitride," *Journal of Applied Physics*, pp. 1203–1208, 1976.
- [16] A. S. Nastase, "How to derive the RMS value of a triangle waveform," [Online]. Available: http://masteringelectronicsdesign.com/how-to-derive-the-rms-value-of-a-triangle-waveform.
- [17] "Predictive technology model (PTM)," [Online]. Available: ptm.asu.edu.