



### A Probabilistic Self-Annealing Compute Fabric Based on 560 Hexagonally Coupled Ring Oscillators for Solving Combinatorial Optimization Problems

### Ibrahim Ahmed<sup>1</sup>, Po-Wei Chiu<sup>1</sup>, and Chris H. Kim<sup>1</sup>

<sup>1</sup> University of Minnesota

- Motivation and Prior Art
- Circuit and Architecture Design
- Mapping Graph Problems to Hardware
- Measured Results
- Conclusion

- Motivation and Prior Art
- Circuit and Architecture Design
- Mapping Graph Problems to Hardware
- Measured Results
- Conclusion

### Motivation



## Applications of combinatorial optimization problems (COPs)



- Solution space and time increases rapidly with the number of variables
- Intractable to solve using traditional von Neumann computer

#### CA3.3

## Motivation



- Ising computation uses energy transfer between spins to reach global minima
- A network of coupled oscillators can solve many NP-hard and NP-complete problems, such as COPs A. Lucas, Front. Phys., 2014

# **Prior Art**

### Quantum computer

#### Special process





#### D-Wave

#### <u>Google</u>

### Pros:

Can theoretically solve a wide variety or problems

### <u>Cons:</u>

- 1. Requires sophisticated hardware
- 2. Very sensitive to noise
- 3. Operating temperature -273.14°C





#### Nano-magnet based

### Pros:

Can achieve coupling dynamics

### Cons:

- 1. Requires special process with fabrication challenges
- 2. Limited practical demonstrations

#### CA3.3

# **Prior Art**

#### **ASIC** implementation



#### **Breadboard and PCB implementation**





### Pros:

Can theoretically solve a wide variety or problems

### Cons:

- 1. No coupling dynamics : computation time and energy may be higher
- 2. Search for better solution is time dependent

#### Pros:

Can achieve coupling dynamics <u>Cons:</u> Not practical for large number of

programmable spins

#### CMOS integrated Ising computer has the best of both worlds

- Motivation and Prior Art
- Circuit and Architecture Design
- Mapping Graph Problems to Hardware
- Measured Results
- Conclusion

# **Choice of CMOS Spin**

### LC oscillator:

- Pros: low phase noise, frequency insensitive to PVT
- Cons: very large area, narrow frequency tuning range

### Ring oscillator (ROSC):

- Pros: small area, simple design, wide frequency tuning range
- Cons: higher phase noise, higher sensitivity to PVT



Type of oscillator not important for solving Ising problems (e.g. mechanical, electrical, chemical, etc.)

## **ROSC Coupled Using Digital Latches**



- Any coupling medium that enables energy transfer may couple ROSCs
- ROSC and digital latches are designed with global and local enable signals

## **Choice of Architecture**



- Hexagonal unit cell maximizes the number of neighbors in 2D plane
- Latch based coupling between cells is digitally controlled

Slide 10

## **ROSC and Latch Design**



**ROSC:** 

- Seven stage ROSC
- Global and local enable signals, G\_REN and L\_REN, respectively, controls ROSC activation
- Measured frequency: 120MHz

#### **Digital Latch:**

- Designed with back to back inverters
- Global and local enable signals, G\_LEN and L\_LEN, respectively, controls latch activation

#### CA3.3

#### 2020 Symposia on VLSI Technology and Circuits

#### Slide 11



- Read block samples one of the neighboring ROSC
- Scan block programs the graph: four program bit per cell, 2240 bit for the chip

## **Timing Based Annealing Mechanism**



- Annealing helps the chip to find a better local minima
- We periodically turn off the coupling latches and let spin states to change
- To reduce delay, we only annealed the chip three times for measurement

## **Die Photo and Chip Summary**



|         | Pe | riph | eral  | circ | uit |  |
|---------|----|------|-------|------|-----|--|
| Ę       |    |      |       |      |     |  |
| H<br>Ta |    |      |       |      |     |  |
|         |    |      |       |      |     |  |
|         |    |      |       |      |     |  |
|         |    |      |       |      |     |  |
|         |    |      |       |      |     |  |
|         |    | 28x2 | 20 ai | rray |     |  |
|         |    |      |       |      |     |  |

| Application    | Combinatorial<br>optimization problems     |  |  |
|----------------|--------------------------------------------|--|--|
| Process        | 65nm CMOS                                  |  |  |
| Architecture   | ROSC, latch based coupling, self-annealing |  |  |
| Voltage        | 1.0V                                       |  |  |
|                | Chip: 1.44mm <sup>2</sup>                  |  |  |
| Area           | Core: 0.53mm <sup>2</sup>                  |  |  |
|                | Unit cell: 0.00095mm <sup>2</sup>          |  |  |
| Peak power     | 23mW                                       |  |  |
| Power per cell | 41µW                                       |  |  |

Full chip layout

#### **Chip summary**

• 28x20=560 coupled oscillators (only limiting factor is chip area)

2020 Symposia on VLSI Technology and Circuits

Slide 14

- Motivation and Prior Art
- Circuit and Architecture Design
- Mapping Graph Problems to Hardware
- Measured Results
- Conclusion

## Mapping Graph Problems to Hardware



- Randomly generate spatial graph using an algorithm that can be directly mapped to the chip without complicated embedding algorithm
- "Easy" COP: 2-color graphs such as checker board pattern
- "Difficult" COP: 4 or more color graphs

### Introduction to Max Cut Problem



- The problem of finding a maximum cut in a graph is known as the Max-Cut Problem
- Finding max-cut of a graph is an NPhard problem



### Introduction to Max Cut Problem





• Ising Hamiltonian = [sum of all weights] - 2×[cut size]

- Motivation and Prior Art
- Circuit and Architecture Design
- Mapping Graph Problems to Hardware
- Measured Results
- Conclusion

### **Repeated Measurement of Same Graph**



• Unit cells may change states between iterations while the final cut values remain similar : proving chip is **exploring various local minima** 

### Max-Cut Results for Graph Size 15x15



- 150 difficult COPs are mapped and max-cut results are measured
- Annealing is done only three times as compared to thousands of times in quantum computers

### **Max-Cut Results for Different Graph Sizes**



• Consistently better max-cut solution compared to 1 million random search

## **Comparison with Prior Arts**

|                          | PRL '19 [1]         | IEDM '19 [2]        | ISSCC '19 [3]       | ISSCC '20 [4]       | Quantum<br>computer [5] | This work                           |
|--------------------------|---------------------|---------------------|---------------------|---------------------|-------------------------|-------------------------------------|
| Architecture             | All to all          | All to all          | Near neighbor       | Near neighbor       | Near neighbor           | Near neighbor                       |
| Technology               | Photonics           | IMT devices         | CMOS 40nm           | CMOS 65nm           | Superconductor          | CMOS 65nm                           |
| Coupling                 | Light<br>modulation | IMT<br>interaction  | Digital logic       | Digital logic       | Qubit interaction       | ROSC interaction                    |
| Operating<br>temperature | Room<br>temperature | Room<br>temperature | Room<br>temperature | Room<br>temperature | -273.14°C               | Room<br>temperature                 |
| Total measured solution  | 16                  | 1                   | 10                  | 1                   | Many                    | 1000                                |
| Delay                    | 1000 cycles         | 1ms                 | 22µs                | 30 cycles           | -                       | 1µs - 10µs (est.)                   |
| Peak power               | Not reported        | Not reported        | Not reported        | Not reported        | 25KW                    | 23mW                                |
| Measured<br>accuracy     | >95%<br>(87% cases) | 97.6%<br>(4 spins)  | 98.8%               | 100%<br>(Easy COP)  | -                       | 98%-100% (Easy)<br>82%-100% (Diff.) |

[1] D. Pierangeli, PRL, 2019 [2] S. Dutta, IEDM, 2019 [3] T. Takemoto, ISSCC, 2019
[4] Y. Su, ISSCC, 2020 [5] Z. Bian, D-Wave Systems, 2010

- Motivation and Prior Art
- Circuit and Architecture Design
- Mapping Graph Problems to Hardware
- Measured Results
- Conclusion

## Conclusion

- First hardware demonstration of a true coupling based integrated CMOS Ising computer
- Probabilistic exploration of various local minima
- Mapped and solved 1000 COPs in the chip with an accuracy of 82%-100%