# Improving FPGA Performance with a S44 LUT Structure

Wenyi Feng, Jonathan Greene (Microsemi Corporation); Alan Mishchenko (Department of EECS, UC Berkeley)

FPGA 2018, Monterey, CA



© 2018 Microsemi Corporation. Company Proprietary

Power Matters.<sup>TM</sup>

## **Motivation-1: Changing Technology**

- Previous LUT studies were done in 1999-2005
  - LUT6 is best for performance
    - ~15% advantage over LUT4 at 180nm
  - LUT4 is best for area
    - Fracturable LUT6/ALM were invented to recoup part of this area gap
- Since then:
  - Process scaling has made interconnect delay more prominent
    - Inter-cluster delay grew 1.5x relative to intra-cluster/logic delays from 65nm to 14nm
  - A recent study found the LUT6 performance advantage declined to ~11% at 65nm
- We study the effect at 14nm



## **Motivation-2: LUT Structure Synthesis**

- A recent synthesis and mapping algorithm showed LUT structures can reduce logic depth as effectively as LUT6
  - S44 cell can be seen as an incomplete LUT7 cell
  - Can also be fractured into two independent LUT4s
- The previous study was limited to mapping
- We study the impact of S44 through the whole flow
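The two S44 properties listed above can be illustrated with a minimal Python sketch. The wiring is an assumption that the slide does not spell out: the output of one LUT4 drives one input of the second LUT4, leaving 7 external inputs. The names `lut`, `s44`, and `s44_fractured` are hypothetical helpers, not part of ABC or Libero.

```python
from itertools import product

def lut(truth_table, inputs):
    """Evaluate a k-input LUT; truth_table holds 2**k bits, inputs holds k bits."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return truth_table[index]

def s44(tt_first, tt_second, x):
    """Cascaded mode (assumed wiring): the first LUT4 reads x[0:4];
    the second LUT4 reads that result plus x[4:7]."""
    assert len(x) == 7
    mid = lut(tt_first, tuple(x[0:4]))
    return lut(tt_second, (mid,) + tuple(x[4:7]))

def s44_fractured(tt_a, tt_b, a, b):
    """Fractured mode: two independent LUT4s, each with its own 4 inputs."""
    return lut(tt_a, tuple(a)), lut(tt_b, tuple(b))

# A 4-input AND: only the all-ones row of the truth table is 1.
AND4 = [0] * 15 + [1]

# Cascading two AND4 tables gives a 7-input AND.
and7_ok = all(
    s44(AND4, AND4, x) == int(all(x)) for x in product((0, 1), repeat=7)
)

# Why the cell is "incomplete": 2 * 16 configuration bits versus 2**7 for a full LUT7.
S44_CONFIG_BITS = 2 * 2**4
LUT7_CONFIG_BITS = 2**7
```

With only 32 configuration bits, the cell can realize at most 2^32 of the 2^128 possible 7-input functions, which is why it behaves as an incomplete LUT7.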




## **Motivation-3: Modern Industrial Designs Differ**

- Previous work used simple logic designs
  - MCNC-20
- Modern designs have carry chains, IP blocks
  - These diminish the benefit of big LUTs
- We study modern industrial designs




## **This Work**

- Created a complete flow to study three logic cell options: LUT4, S44, and LUT6
  - At 14nm node
  - With latest synthesis/mapping algorithms from ABC
  - Using both MCNC-20 and modern industrial designs
  - Cluster-based architecture
  - Place & route based on Microsemi's Libero SoC Design Suite



## **Results: MCNC-20 Designs**



[1] E. Ahmed and J. Rose, The effect of LUT and cluster size on deep-submicron FPGA performance and density, *IEEE Trans. on VLSI*, vol. 12, pp. 288-298, 2004.

[2] G. Zgheib, Leading the blind: automated transistor-level modeling for FPGA architects, Ph.D. thesis, EPFL, 2017.




## **Results: Modern Industrial Designs**

|             | LUT4 | S44  | LUT6 | Fracturable<br>LUT6[2] |
|-------------|------|------|------|------------------------|
| Performance | 100% | 103% | 103% | <103%                  |
| Area[1]     | 100% | 96%  | 123% | 108%                   |

[1] Area is computed as (number of clusters) \* (die area per cluster)

[2] Fracturable LUT6 results are taken from the following paper, which reports a 10-15% area saving with a 1.6%-12% performance loss: T. Ahmed, P. Kundarewich, J. Anderson, Packing techniques for Virtex-5 FPGAs, *ACM TRETS*, vol. 2, no. 3, Article 18, 2009.
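To compare the cells on both axes at once, the table values can be folded into a single performance-per-area ratio. This derived metric does not appear in the slides; it is only an illustrative way to read the table. The fracturable LUT6 column is left out because its performance is given only as an upper bound.

```python
# Values copied from the results table above (percent, normalized to LUT4 = 100).
results = {
    "LUT4": {"performance": 100, "area": 100},
    "S44": {"performance": 103, "area": 96},
    "LUT6": {"performance": 103, "area": 123},
}

def perf_per_area(entry):
    """Illustrative metric (not from the slides): relative performance / relative area."""
    return entry["performance"] / entry["area"]

ratios = {cell: round(perf_per_area(v), 3) for cell, v in results.items()}

for cell, ratio in ratios.items():
    print(f"{cell}: {ratio}")
# LUT4: 1.0
# S44: 1.073
# LUT6: 0.837
```

On this reading, S44 is the only option that improves on LUT4 in both performance and area.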




## **Conclusion**

The combined effect of technology scaling, S44 mapping, and the use of modern industrial designs allows LUT4s to approach the performance of LUT6s while retaining their area advantage.


