# A 104.8TOPS/W One-Shot Time-Based Neuromorphic Chip Employing Dynamic Threshold Error Correction in 65nm

### Luke Everson, Muqing Liu, Nakul Pande, Chris Kim

Department of Electrical & Computer Engineering, Univ. of Minnesota, Minnesota, USA

# Outline

- Motivation
- Time Based Neural Network
- DTEC: Dynamic Threshold Error Correction
- Measurement results
- Conclusion

### **Motivation**












- + Low area and power via subthreshold operation
- − Sensitive to noise and PVT



IBM TrueNorth (Digital neurons)<sup>[2]</sup>

- + Robust to PVT
- + Technology scaling
- − Large area overhead



STT-MRAM (Emerging neurons)<sup>[3]</sup>

- + Compact, low write energy, scalable
- − Beginning early production

### **Time-Based Neuron**



Advantages of time-based circuits:

- Compact area
- Low power consumption
- MAC is intrinsic to structure
- High precision tunability



Advantages of digital arithmetic:

- Binary Representation
- Less "buy-in" required
- Existing IP for rapid SoC development
- No calibration
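The point that MAC is intrinsic to a time-based structure can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the per-stage delay model, the `t_base` and `t_unit` values, the binary inputs, and the function names `ddl_delay` and `fires` are hypothetical and are not taken from the chip. Only the stage count (129) and the 3b weight width come from the slides.

```python
import random

N_STAGES = 129  # stages per delay line, as stated in the conclusions

def ddl_delay(pixels, weights, t_base=1.0, t_unit=0.1):
    """Edge arrival time of one delay line, in arbitrary time units.

    Hypothetical model: every stage adds a fixed base delay plus an extra
    delay proportional to pixel * weight, so the total delay is an affine
    function of the dot product (the MAC result).
    """
    return sum(t_base + t_unit * p * w for p, w in zip(pixels, weights))

def fires(delay, reference_delay):
    """Phase-detector stand-in: True if the edge arrives after the reference."""
    return delay > reference_delay

random.seed(0)
pixels = [random.randint(0, 1) for _ in range(N_STAGES)]   # binary inputs (assumed)
weights = [random.randint(0, 7) for _ in range(N_STAGES)]  # 3b weights, 0..7

mac = sum(p * w for p, w in zip(pixels, weights))   # digital reference result
delay = ddl_delay(pixels, weights)                  # time-domain result
reference = N_STAGES * 1.0 + 0.1 * 200              # hypothetical threshold at MAC = 200
print(mac, round(delay, 1), fires(delay, reference))
```

In this model no separate multiplier or adder is needed: the edge accumulates the weighted sum as it propagates, and a single comparison against the reference edge produces the neuron output.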

## **Top Level Schematic**



### **Pixel Unit Detail**



## **Complex Tristate**



### **Phase Detector Detail**

**A-SSCC 2018** 



### **DTEC: Dynamic Threshold Error Correction**

- DTEC applied to an MNIST application
  - Single layer
  - 3b weights
  - 11x11 image
- Trained with TensorFlow
- Coarse evaluation: 69.8% accuracy
- Ambiguous predictions: 26%
- 1<sup>st</sup> fine DTEC: 46% recovered
- 2<sup>nd</sup> fine DTEC: 37% recovered
- Total accuracy: 82%
- DTEC overhead: 41% per image for 89% error recovery
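The coarse-then-fine flow listed above can be sketched as a retry loop. The slides do not spell out how ambiguity is detected or how the threshold is moved, so this sketch assumes that a prediction is ambiguous when anything other than exactly one neuron fires, and that each fine step shifts the reference by a fixed amount. The names `evaluate`, `classify_with_dtec`, and `fine_step` are hypothetical.

```python
def evaluate(delays, reference):
    """One-shot evaluation: indices of neurons whose delay exceeds the reference."""
    return [i for i, d in enumerate(delays) if d > reference]

def classify_with_dtec(delays, coarse_ref, fine_step, max_fine_passes=2):
    """Return (label or None, number of extra fine evaluations used)."""
    ref = coarse_ref
    fired = evaluate(delays, ref)
    extra = 0
    # Assumed ambiguity rule: zero or multiple neurons fired.
    while len(fired) != 1 and extra < max_fine_passes:
        # Too many winners: raise the threshold. No winner: lower it.
        ref += fine_step if len(fired) > 1 else -fine_step
        fired = evaluate(delays, ref)
        extra += 1
    label = fired[0] if len(fired) == 1 else None
    return label, extra

# Two neurons fire at the coarse threshold; one fine step separates them.
print(classify_with_dtec([10.0, 12.4, 12.9, 9.5], coarse_ref=12.0, fine_step=0.6))
```

Only ambiguous images pay for extra evaluations, which is why the overhead is quoted per image rather than as a fixed multiple of the one-shot cost.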



# **Multi-Layer Dataflow**



## **Handwritten Digit Recognition**



- Application: Handwritten digit recognition
- Training network: Single layer & MLP
- Learning method: Supervised learning
- Input database: MNIST

## **Classifier Networks**




### **Measurement Results**




## **Die Photo**

[Die photo: 772um die dimension; labeled blocks: WL CONTROL, BL CONTROL, OUTPUT REGISTER]

| Parameter     | Value                   |
|---------------|-------------------------|
| Process       | 65nm LP CMOS            |
| Core Area     | 0.644mm<sup>2</sup>     |
| VDD           | 0.7-1.4V                |
| # of Neurons  | 64                      |
| # of Synapses | 8256 (8K)               |
| Throughput    | 1.7Gpixels/s/DDL        |
| Power         | 47.1uW/DDL              |
| Energy/SOP    | 27.6fJ                  |
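As a rough cross-check of the chip summary, the energy per synaptic operation follows from the power and throughput entries. The sketch below assumes both figures refer to the same delay line and the same operating point, and that one pixel processed corresponds to one synaptic operation; the variable names are illustrative.

```python
power_w = 47.1e-6        # power per delay line, from the chip summary
throughput_sops = 1.7e9  # pixels/s per delay line, assumed 1 pixel = 1 SOP

# Energy per synaptic operation = power / throughput
energy_per_sop_fj = power_w / throughput_sops * 1e15

# Energy efficiency implied by the reported 27.6fJ/SOP, in TSop/s/W
reported_fj = 27.6
efficiency_tsops_per_w = 1.0 / (reported_fj * 1e-15) / 1e12

print(f"{energy_per_sop_fj:.1f} fJ/SOP, {efficiency_tsops_per_w:.1f} TSop/s/W")
```

Under these assumptions the result lands within rounding of the reported 27.6fJ, and its reciprocal is consistent with the 36.2 TSop/s/W listed for this work at nominal VDD in the comparison table.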

## **Comparison Table**

|                                | This Work (Nom.) | This Work (E<sub>MAX</sub>) | A-SSCC'16 [1] | CICC'17 [2] | ISSCC'17 [3] | ISSCC'17 [4] | ISSCC'16 [5] | ISSCC'16 [6] | Science'14 [7] |
|--------------------------------|------------------|-----------------------------|---------------|-------------|--------------|--------------|--------------|--------------|----------------|
| Chip Architecture              | Time-Based       | Time-Based                  | Time-Based    | Time-Based  | Digital      | Digital      | Digital      | Sw. Cap      | Digital        |
| Algorithm Target               | FCDNN & CNN      | FCDNN & CNN                 | FCDNN & CNN   | FCDNN & CNN | FCDNN & CNN  | FCDNN & FFT  | CNN          | CNN & SGD    | FCDNN & CNN    |
| Technology [nm]                | 65               | 65                          | 65            | 65          | 28 FDSOI     | 40           | 65           | 40           | 28             |
| Chip Area [mm<sup>2</sup>]     | 0.644            | 0.644                       | 3.61          | 0.24        | 1.87         | 7.1          | 12.25        | 0.012        | 430            |
| Precision* [b]                 | [B,T,2,3]        | [B,T,2,3]                   | B             | 3           | [4-16]       | [6-32]       | 16           | 3            | [B,T]          |
| On-Chip SRAM [kB]              | 8.06             | 8.06                        | 20            | 3           | 144          | 270          | 181.5        | [-]          | 256MB          |
| VDD [V]                        | 1.2              | 0.7                         | 1             | 1.2         | 0.6          | 0.65         | 0.82         | 1            | 0.85           |
| Frequency [MHz]                | 1700             | 285                         | 23041         | 792         | 200          | 19.3         | 250          | 1000         | 0.001          |
| Energy Efficiency** [TSop/s/W] | 36.2             | 52.4                        | 48.2          | 2.47        | 5.0          | 0.19         | 0.18         | 3.86         | 0.04           |
| Hardware Efficiency [GE/PE][1] | 38.4             | 38.4                        | 76.5          | 33.2        | 7456         | 18269        | 50637        | 288          | 6.5            |

\*B=Binary, T=Ternary

\*\*Synaptic Op=MAC

# Conclusions

- Time-Based Neuromorphic Core in 65nm LP CMOS
  - 64 DDLs with 129 stages, 1 shared reference
- One-shot evaluation drives high energy efficiency
- Introduced DTEC to recover ambiguous predictions
- Evaluated on the MNIST dataset; accuracy is within ~1% of software accuracy
- 104.8TOp/s/W @ 0.7V with 3b = 19.1fJ/MAC
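The equivalence in the last bullet can be checked with a short conversion. This sketch assumes the common convention that one MAC counts as two operations (one multiply and one add); the slides do not state this convention explicitly, and the helper name `tops_per_watt` is illustrative.

```python
def tops_per_watt(energy_per_mac_fj, ops_per_mac=2):
    """Convert energy per MAC (fJ) into TOp/s/W.

    ops_per_mac=2 is an assumed convention (multiply + add);
    ops_per_mac=1 gives TMAC/s/W, i.e. TSop/s/W with Synaptic Op = MAC.
    """
    macs_per_joule = 1.0 / (energy_per_mac_fj * 1e-15)
    return ops_per_mac * macs_per_joule / 1e12

print(round(tops_per_watt(19.1), 1))     # TOp/s/W at 2 Op per MAC
print(round(tops_per_watt(19.1, 1), 1))  # TSop/s/W at 1 Sop per MAC
```

With two operations per MAC, 19.1fJ/MAC comes out within rounding of the headline 104.8TOp/s/W, and with one operation per MAC it matches the 52.4 TSop/s/W entry in the comparison table.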