# HIGH-PERFORMANCE POWER GRIDS FOR NANOMETER TECHNOLOGIES

Sachin S. Sapatnekar

ECE Dept, Univ. of Minnesota 200 Union St. SE, Minneapolis, MN 55455

# ABSTRACT

With shrinking noise margins and increasing numbers of on-chip noise sources, power grid design has become a critical performance determinant. This paper presents an overview of recent techniques for the analysis and optimization of supply networks, and discusses future trends in power grid design.

## 1. INTRODUCTION

Performance forecasts for current and upcoming designs show a constantly reducing value for the supply voltage. In conjunction with the increased number of on-chip noise sources, this implies a reduction in noise margins. A major contributor to noise is the on-chip power grid, and therefore, controlling power grid noise is of paramount importance for current and future designs. Moreover, variations in the voltage level of the supply and ground networks causes gate delays to vary, which can result in malfunctioning circuits.

In principle, the supply voltage should be at a perfect  $V_{dd}$  or ground level all over the chip. However, in practice, imperfections arise due to the non-ideal nature of the conductors that carry the supply and ground signals from the supply pins to the gates. These on-chip problems manifest themselves in several ways:

• **IR drops** are caused by the flow of currents through wires that are imperfect conductors with nonzero resistances. From one technology generation to the next, wire resistances of proportionately scaled wires have increased significantly, causing IR drops to become major factors in the performance of supply networks. A typical power grid voltage contour is illustrated in Figure 1.

• L dI/dt effects result from the increasing inductive behavior seen in global lines, particularly in the uppermost levels of metal, in newer process generations. These inductance effects, which include both self and mutual inductances in the supply lines and in the package, could result in unwanted ringing and noise spikes. • Electromigration is an aging effect that appears on supply net wires that often carry currents in the same direction over many years, resulting in the migration of metal over an extended period of time. It has been empirically observed that the mean time to failure for a wire can be controlled by limiting its current density.



Figure 1: An IR drop contour on a ground network for benchmark ac3 and the corresponding grid on M3/M4.

Package design can also affect on-chip supply voltages. The supply grid pins may either be distributed on the periphery of the chip, or may use C4 connections distributed over the entire surface of the chip. In the former case, the voltage contours typically show a degradation as the signal travels the periphery to the interior of the chip, while the latter allows for a more

THIS RESEARCH WAS SUPPORTED IN PART BY THE SRC UNDER CONTRACT 99-TJ-714 AND BY THE NSF UNDER AWARD CCR-0205227.

even distribution, as in the top figure in Figure 1.

# 2. ANALYSIS OF SUPPLY NETWORKS

#### 2.1. Formulation of equations

Current-day supply networks may contain tens or hundreds of millions of nodes, and analyzing them accurately is therefore acutely computational as it requires the solution of systems of linear equations in about as many variables. The formulation of these equations first requires the elements of the supply grid to be modeled using basic electrical elements. The wires in the  $V_{dd}$  grid can be modeled as a linear RLC network that is connected through pads to multiple  $V_{dd}$  sources, and to switching elements in the circuit that draw current through the power grid. These switching elements are most often modeled either as current sources or as RC switch elements [1,2]. In either case, the simulation of the network requires the solution of the following system of differential equations, which may be formulated using modified nodal analysis (MNA):

$$\mathbf{G} \cdot \mathbf{x}(t) + \mathbf{C} \cdot \mathbf{x}'(t) = \mathbf{b}(t), \qquad (1)$$

where **G** is a conductance matrix, **C** is the admittance matrix resulting from capacitive and inductive elements,  $\mathbf{x}(t)$  is the time-varying vector of voltages at the nodes, and currents through inductors and voltage sources, and  $\mathbf{b}(t)$  is the vector of independent timevarying current sources. This differential system is efficiently solved in the time domain by reducing it to a linear algebraic system

$$(\mathbf{G} + \mathbf{C}/h) \cdot \mathbf{x}(t) = \mathbf{b}(t) + \mathbf{C}/h \cdot \mathbf{x}(t-h), \quad (2)$$

using the Backward Euler (BE) technique with a small fixed time step, h. The coefficient matrix can be constructed to be symmetric and positive definite when the power grid is represented by RC elements and current sources; if inductances are used, the K matrix [3] can maintain this property. Since the matrix remains constant under a fixed time step, transient analysis entails only one Cholesky factorization followed by forward/backward substitution at each time point.

### 2.2. Solution using hierarchical methods

Even with the use of the most efficient methods that exploit sparsity and positive definiteness, the solution of a system of a hundred million equations is highly computation- and memory-intensive. To alleviate this problem, hierarchy may be used in the analysis of these large systems. Several techniques have been suggested in this connection, and we will describe a few promising approaches in this context. In addition to these approaches, other methods using model order reduction techniques have been proposed. Exploiting existing hierarchy



Figure 2: Hierarchical power network analysis

The macromodel-based approach in [4] is illustrated in Figure 2. This is based on the fact that the power grid is inherently hierarchical since it is created as a part of a hierarchical design process, where individual blocks with locally constructed power are first designed individually and then assembled at the chip level.

Based on this structure, the power grid can be divided into k local partitions, corresponding to blocks, and a global partition that connects the power grids within these blocks. A node in a local partition having links only to other nodes in the same partition is called an *internal node*, a node in the global partition is called a *global node*, and a node in a local partition that is connected to some node outside the local partition (i.e., in the global partition or in another local partition) is called a *port*. The *global grid* is then defined to include the set of nodes that lie in the global partition and the port nodes, while the grid in a local partition constitutes a *local grid*. The runtime improvements of this approach arise from the fact that existing design hierarchies often permit the power grid to be divided into partitions with a small number of port nodes, relative to the number of internal nodes. The technique consists of the following steps:

• Step 1: Macromodeling Typically, the number of port nodes is much smaller than the number of internal nodes, and therefore each of the local grids may be modeled as a multi-port linear element represented by a macromodel of the type  $\mathbf{I} = A \cdot \mathbf{V} + \mathbf{S}$ , where  $\mathbf{I}$  and  $\mathbf{V}$  are the vectors of port voltages, and A and  $\mathbf{S}$  are, respectively, a constant matrix and a constant vector.

• Step 2: Solution of the global grid Once the macromodels for all the local grids are generated, the entire network is abstracted simply as the global grid, with the macromodel elements connected to it at the port nodes. The equations of each local grid may be stamped into nodal equations of the global grid, so that the system to be solved at the global level is considerably smaller than the original system as all internal nodes have been removed from consideration. However, the A matrices for each partition are typically dense,

which could severely limit the gains of the method. Hence it is important to sparsify these matrices while preserving the positive definiteness of the system. The approach in [4] presents a sparsification scheme that formulates the problem as an integer knapsack problem and bounds the error obtained from sparsification.

• Step 3: Solution of the local grids For each partition, I is obtained from the above solution and from using the port voltages. Equation (2) is solved for each partition using I on the right hand side, to obtain voltages at the internal nodes of partitions.

Experimental results in [4] show this technique simulating networks with up to 63 million nodes, providing savings in the run time as compared to the non-hierarchical approach. Even more importantly, the non-hierarchical is limited by memory requirements and cannot simulate networks of 20 million nodes or more. In contrast, the peak memory utilization of this approach is much smaller, making the algorithm scalable. *"Introducing" hierarchy into the solution* 

The approach in [5] implicitly introduces hierarchy into a supply grid by solving it in two steps: first, by coarsening form of the network with a reduced number of nodes, which can be solved efficiently, and then by propagating the result of this solution to the full network. This technique is inspired by the multigrid approach for solving partial differential equations, where the coarse solution controls the low frequency error components in the solution, and the finer solution corresponds to a relaxation step that reduces the high frequency error. The technique consists of four steps:

• Grid reduction, in which the large grid is coarsened by selecting a subset of nodes to be maintained, while the other nodes are removed. The number of variables is significantly reduced from from n to m.

• Interpolation, in which an  $n \times m$  interpolation operator matrix P is defined to map the original grid to the coarsened grid. This operator relates the voltages on the removed nodes to those on the coarsened grid, thereby allowing the solution of the coarsened grid to accurately reflect that of the original grid.

• Solution of the coarsened grid, in which a solution is found for the voltages in the coarsened grid by solving the above linear equations.

• Mapping the solution from the coarsened grid to the original grid, by applying the interpolation operator concludes the process.

The proposed multigrid-like method is a fusion of the algebraic multigrid (AMG) method that forms a reduction matrix based on directly on the A matrix, and the standard multigrid (SMG) method that forms the reduction matrix based on geometric information of the power grid. While both AMG and SMG are iterative methods starting from the initial solution predicted from the coarsened grid, the multigrid-like method in [5] requires no relaxation. Experimental results in [5] on real ASIC designs with about half a million nodes show improvements of about  $16-20 \times$  over an analysis of the flat network for DC analysis, and even better for transient analysis, where the speedups are of the order of  $600 \times$ .

#### 2.3. Localized solutions using random walks

The solution to any network with resistors and voltage sources can be interpreted as being equivalent to an equivalent probabilistic problem. The method in [6] applies this observation to solve large power grids.



Figure 3: A typical node in a power grid.

For the DC analysis of a power grid, consider a single node x in the circuit, as illustrated in Figure 3. Applying Kirchoff's Current Law, Kirchoff's Voltage Law and the device equations for the conductances,

$$\sum_{i=1}^{degree(x)} g_i(V_i - V_x) = I_x \tag{3}$$

where the nodes adjacent to x are labeled  $1, 2, \dots$ , degree(x),  $V_x$  is the voltage at node x,  $V_i$  is the voltage at node i,  $g_i$  is the conductance between node i and node x, and  $I_x$  is the current load connected to node x. This corresponds to one equation out of the set in Equation (1), and can be reformulated as:

$$V_x = \sum_{i=1}^{degree(x)} \frac{g_i}{g_{sum}} V_i - \frac{I_x}{g_{sum}}$$
(4)

where  $g_{sum} = \sum_{i=1}^{degree(x)} g_i$ . In other words, the voltage at any node is a linear function of the voltages at its neighbors, linear coefficients sum up to 1.

An equivalent random walk can be postulated as follows. Given a finite undirected connected graph representing a street map, a walker starts from one of the nodes x, and goes to an adjacent node k every day with probability  $p_{x,k}$  for  $k = 1, 2, \dots, degree(x)$ , where degree(x) is the number of edges connected to node x. The walker pays an amount  $m_x$  to a motel for lodging everyday, until he/she reaches one of the homes, which are a subset of the nodes. If the walker reaches home, he/she will stay there and be awarded a certain amount of money,  $m_0$ . We will consider the problem of calculating the expected amount of money f(x) that the walker has accumulated at the end of the walk, as a function of the starting node x, assuming he/she starts with nothing. Clearly,  $f(\text{one of the homes}) = m_0$ . For a non-home node x,

$$f(x) = p_{x,1}f(1) + p_{x,2}f(2) + \cdots + p_{x,degree(x)}f(degree(x)) - m_x \quad (5)$$

Equation (5) becomes similar to Equation (4) if we set  $f(x) \equiv V_x$  and we map  $p_i \equiv \frac{g_i}{g_{sum}}$ ,  $m_x \equiv \frac{I_x}{g_{sum}}$  and  $m_0 \equiv V_{dd}$ . Practically, this means that the power grid can be solved by running a set of random walks, and averaging the money left at the end as the voltage.

A desirable feature of this algorithm is that it localizes the computation, i.e., it can calculate a single node voltage without having to solve the whole circuit. This is especially meaningful when the designer knows which part of his power grid is problematic, or when the designer makes a minor change in the design and want to see the impact. This method is extendable to transient analysis of RC supply grids, since a capacitor can be replaced by a conductance and a current source during each step of transient analysis.

#### 3. OPTIMIZING SUPPLY NETWORKS

An analysis of the dependence of the voltage drop using ITRS parameter projections is presented in [7]. It is shown that the projected trends can be overcome using one of the following techniques:

- Adding *decoupling capacitors* (decaps) to the grid
- Placing the capacitance closer to the load and/or by using *wider wires/denser topologies* to reduce conductance from a load to the decaps
- Increasing the grid conductance by using wider wires or denser topologies

We consider several techniques that may be used to reduce noise on the power grid: specifically, those of selecting the power grid topology, of widening wires, and of inserting decaps. A useful metric to estimate power-grid-induced noise at a node is the integral of the voltage drop below a user specified noise ceiling [8]:

$$z_j(p) = \int_{t_s}^{t_e} \{NM_H - v_j(t, p)\} dt,$$
 (6)

where p represents the tunable circuit parameters, which may include the values of the decoupling capacitors or wire widths. The voltage drop integral beyond the expressed by Equation (6) represents the shaded area in Figure 4. We define the measure of goodness for the whole circuit as the sum of the individual node metrics:

$$Z = \sum_{j=1}^{K} z_j(p), \tag{7}$$

where K is the number of nodes. This metric penalizes more harshly transients that exceed the imposed noise ceiling by a large amount for a long time, and has empirically been seen to be more effective in practice than one that penalizes merely the maximum noise violation. Intuitively, this can be explained by the fact the metric incorporates, in a sense, both the voltage and time axes together, as well as spatial considerations through the summation over all nodes in the circuit.



Figure 4: Illustration of a noise metric for voltage drop.

### 3.1. Topology selection

Power grids are typically structured as dense meshes. The work in [9, 10] presented a method that is intermediate to the two extremes of dense meshes (which are reliable but hard to analyze) and trees (which are easy to analyze but provide poor voltage levels), and employs a hybrid of the two. An overlying coarse mesh structure of a user-specified topology provides global distribution of the P/G signals across the chip. From various nodes of this mesh, tree structures of userspecified topologies originate and distribute the supply voltage to the utilization points, each of which is modeled as an equivalent RC branch. Other types of hierarchical power grids may also be constructed with, for example, meshes driving meshes. As long as the number of connections between meshes is relatively sparse, fast analysis in both the time domain and frequency domain [10] is possible.

## 3.2. Wire widening

Various techniques for wire widening have been proposed in the literature. These include techniques for flat netlists [11, 12] as well as those for hierarchical grids [10]. As an example that illustrates an approach to wire widening, we describe here the method used in [11], where the power grid sizing problem is directly formulated as a nonlinearly constrained nonlinear programming problem that minimizes the wire area, subject to an upper bound on the noise metric, Z, and upper and lower bounds on the wire width. A sequential quadratic programming (SQP) method is used to solve the optimization problem.

It is possible to solve a modified version of the optimization problem using linear programming methods, as is done in [12], where a nonlinear program formulation adapted from [13] is cleverly solved using an iterative sequence of linear programs. The objective function there, however, is the worst-case voltage drop and extending the solution to handle the integral of the voltage violation is nontrivial.

### 3.3. Decoupling capacitance insertion

The placement of decaps can have a large influence on the integrity of the supply signals. In this section, we will describe a decap optimization and placement algorithm for row-based standard-cell design typical of Application Specific Integrated Circuits (ASIC) where each row has a fixed height. An approach for decap insertion during floorplanning for custom layouts is described in [14].

The ASIC layout here has N rows, with the *i*th row having  $M_i$  cells (blocks). A ratio  $r_i (\leq 100\%)$  of the *i*th row is filled by cells, and the remainder of the row, corresponding to a fraction of  $(1 - r_i)$ , may be used for placing decaps. This corresponds to a post-placement optimization, after cells have been assigned to rows, but the rows may be perturbed slightly in order to insert decaps, with the objective of minimizing the power grid noise metric Z described in Equation (7). The problem of decoupling capacitor optimization is therefore formulated as minimizing Z subject to ensuring that the inserted decaps only use "idle" regions in the layout, and do not increase its width. This is solved using quadratic programming, followed by a postprocessing step that removes extremely small decaps.

### 4. CODESIGN OF SUPPLY, SIGNAL AND CLOCK NETWORKS

#### 4.1. Codesign of supply and signal networks

The need for dense power grids with widened wires implies that a major consumer of these resources is the supply network. Global signal wires also compete for the same routing resources, as they often require shortest-path routes to meet their own performance requirements. Traditionally, supply and signal nets have been designed independently, with the routing needs for a regular power grid being determined first, after which the remaining resources are calculated to provide routing resource budgets for the signal nets. This is unsustainable in nanometer designs, where the contention for resources is critical, and a unified approach to designing signal wires an power grids is essential for an integrated approach to routing resource management.

While it is convenient to build a regular power grid with a constant pitch (defined as the distance between adjacent wires in the grid), some degrees of freedom exist and it is desirable that they be exploited. For instance, in regions where the demand for routing resources from signal nets is high, a sparser power grid



Figure 5: A congestion-driven routing paradigm.

may be used as long as the performance constraints on the supply and ground lines can be met; likewise, signal nets should avoid the hot spots of the chip if possible, since these may need a locally dense power grid.

The procedure in [11] presents a method for the codesign of supply and signal networks, which is illustrated in Figure 5. The entire chip is tessellated into an array of grid cells. The width of a boundary between two neighboring grid cells represents the limited resources that must be shared on each layer by the supply lines and the signal lines that traverse the boundary. The procedure consists of the following steps:

• Initialization An initial uniform width power grid is built by choosing sufficiently wide wires, and this corresponds to a feasible solution that satisfies the supply network constraints. The global nets are then routed in an attempt to meet boundary capacities. This is a resource-intensive solution that is likely to result in routing capacity violations at several boundaries.

• Supply net readjustment and sizing The adjustments made to the power grid consist of wire removal from the grid in congested regions, and sizing of the power grid to compensate for this removal. The wire removal heuristic identifies a set of critical wires that traverse "hot spots" of the chip with high voltage drops, and considers only noncritical wires as candidates for removal. The removal of any power wires is likely to degrade the quality of the power grid, and therefore, the removal step is followed by a power grid resizing step in which the remaining wires in the grid are individually sized using an SQP-based formulation to compensate for the loss of the removed wires.

• Signal net rerouting The signal nets that cross the overflowing boundaries are then ripped up and rerouted to utilize the newly available capacity. If this satisfies all constraints, the procedure ends; otherwise, it iterates through another step of adjusting and resizing the supply net, followed by rerouting the signal nets.

The structure at the right in Figure 1 shows an example power grid topology that is created by this method for the benchmark circuit ac3. It can be seen that the power grid is sparser in some places than in others, and these correspond to regions where the demand from signal wires is large while the requirements on the power grid are less stringent.

## 5. CONCLUSION

The problem of analyzing and optimizing power grids will become extremely important for future designs. Techniques that use hierarchy and locality are likely to be the most successful in finding good solutions to the analysis and optimization problems. A trend that will be increasingly visible in the future will be the need to codesign signal, clock and supply lines as more lines will require shielding from capacitive and inductive crosstalk and routing resources will be limited.

### 6. ACKNOWLEDGEMENTS

The author gratefully acknowledges the help of collaborations in this area with David Blaauw, Kaushik Gala, Jiang Hu, Sani Nassif, Rajendran Panda, Haifeng Qian, Narendra Shenoy, Haihua Su, and Min Zhao.

#### 7. REFERENCES

- H. H. Chen and D. D. Ling, "Power supply noise analysis methodology for deep-submicron VLSI chip design," in *Proceedings of the ACM/IEEE Design Automation Conference*, (Anaheim, CA), pp. 638–643, June 1997.
- [2] A. Dharchowdhury, R. Panda, D. Blaauw, R. Vaidyanathan, B. Tutuianu, and D. Bearden, "Design and analysis of power distribution networks in PowerPC<sup>TM</sup> microprocessors," in *Proceedings of the ACM/IEEE Design Automation Conference*, pp. 738–743, June 1998.
- [3] A. Devgan, H. Ji, and W. Dai, "How to efficiently capture on-chip inductance effects: Introducing a new circuit element K," in *Proceedings* of the IEEE/ACM International Conference on Computer-Aided Design, (San Jose, CA), pp. 150– 155, Nov. 2000.
- [4] M. Zhao, R. V. Panda, S. S. Sapatnekar, and D. Blaauw, "Hierarchical analysis of power distribution networks," *IEEE Transactions on Computer-Aided Design*, vol. 21, pp. 159–168, Feb. 2002.
- [5] J. Kozhaya, S. R. Nassif, and F. N. Najm, "A multigrid-like technique for power grid analysis,"

*IEEE Transactions on Computer-Aided Design*, vol. 21, pp. 1148–1160, Oct. 2002.

- [6] H. Qian, S. R. Nassif, and S. S. Sapatnekar, "Random walks in a supply network," in *Proceedings of* the ACM/IEEE Design Automation Conference, pp. 93–98, 2003.
- [7] H. Su, S. S. Sapatnekar, and S. R. Nassif, "An algorithm for optimal decoupling capacitor sizing and placement for standard cell layouts," in *Proceedings of the ACM International Symposium on Physical Design*, (San Diego, CA), pp. 477–480, Apr. 2002.
- [8] A. R. Conn, R. A. Haring, and C. Visweswariah, "Noise considerations in circuit optimization," in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, (San Jose, CA), pp. 220–227, Nov. 1998.
- [9] J. C. Shah, A. A. Younis, S. S. Sapatnekar, and M. M. Hassoun, "An algorithm for simulating power/ground networks using Padé approximants and its symbolic implementation," *IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications*, vol. 45, pp. 1372–1382, Oct. 1998.
- [10] H. Su, K. H. Gala, and S. S. Sapatnekar, "Fast analysis and optimization of power/ground networks," in *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, (San Jose, CA), pp. 477–480, Nov. 2000.
- [11] H. Su, J. Hu, S. R. Nassif, and S. S. Sapatnekar, "Congestion-driven codesign of power and signal networks," in *Proceedings of the ACM/IEEE De*sign Automation Conference, (New Orleans, LA), pp. 477–480, June 2002.
- [12] X.-D. Tan, C.-J. R. Shi, D. Lungeanu, J.-C. Lee, and L.-P. Yuan, "Reliability-constrained area optimization of vlsi power/ground networks via sequence of linear programmings," in *Proceedings of the ACM/IEEE Design Automation Conference*, (New Orleans, LA), pp. 78–83, June 1999.
- [13] S. Chowdhury, "Optimum design of reliable IC power networks having general graph topologies," in *Proceedings of the ACM/IEEE Design Automa*tion Conference, (Las Vegas, NV), pp. 787–790, June 1989.
- [14] S. Zhao, K. Roy, and C.-K. Koh, "Decoupling Capacitance Allocation for Power Supply Noise Suppression," in *Proceedings of the ACM International Symposium on Physical Design*, (Napa, CA), pp. 66–71, Apr. 2001.