FPGA 2017: Program

Wednesday February 22 (All Technical Sessions in San Carlos 2-4)
   Int'l Workshop on Overlay Architectures for FPGAs (OLAF)
Chair: Hayden So, The University of Hong Kong
Co-Chair: John Wawrzynek, UC Berkeley
9:00 - 9:10    Welcome and Opening Remarks
John Wawrzynek (UC Berkeley)
9:10 - 12:00    Paper Presentations (http://olaf.eecs.berkeley.edu/program)
12:00 - 1:30    Lunch (Ferrantes Room, 10th Floor)
   Afternoon Special Session: The Role of FPGAs in Machine Learning
Chair: Andrew Ling, Intel
1:30 - 2:30    Deep Learning -- Tutorial and Recent Trends [slides]
Song Han (Stanford and DeePhi)
2:30 - 3:00    Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks? [slides]
Eriko Nurvitadhi, Ganesh Venkatesh, Jaewoong Sim, Debbie Marr, Randy Huang, Jason Gee Hock Ong, Yeong Tat Liew, Srivatsan Krishnan, Duncan Moss, Suchit Subhaschandra, Guy Boudoukh
Intel
3:00 - 3:30    Break
3:30 - 3:55    Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs [slides]
Ritchie Zhao1, Weinan Song1, Wentao Zhang1, Tianwei Xing2, Jeng-Hau Lin3, Mani Srivastava2, Rajesh Gupta3, Zhiru Zhang1
1Cornell University, 2UCLA, 3UCSD
3:55 - 4:20    Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network [slides]
Jialiang Zhang and Jing Li
UW-Madison
4:20 - 4:45    Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System [slides]
Chi Zhang and Viktor Prasanna
USC
4:45 - 5:10    Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks [slides]
Yufei Ma, Yu Cao, Sarma Vrudhula, Jae-sun Seo
Arizona State University
7:00    Opening Reception (Ferrantes Room, 10th Floor)
Thursday February 23 (All Technical Sessions in San Carlos 2-4)
8:00    Continental Breakfast
8:45 - 9:00    Welcome and Opening Remarks
Jason Anderson (University of Toronto), Jonathan Greene (Microsemi)
   Machine Learning
Chair: Jason Cong, UCLA
9:00 - 9:25    An OpenCL Deep Learning Accelerator on Arria 10 (Best Paper Candidate)
Utku Aydonat, Shane O'Connell, Davor Capalija, Andrew Ling, Gordon Chiu
Intel
9:25 - 9:50    FINN: A Framework for Fast, Scalable Binarized Neural Network Inference [slides]
Yaman Umuroglu1,2, Nicholas J. Fraser1,3, Giulio Gambardella1, Michaela Blott1, Philip Leong3, Magnus Jahre2, Kees Vissers1
1Xilinx Research Labs, 2Norwegian University of Science and Technology, 3University of Sydney
9:50 - 10:15    ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA (Best Paper Award) [slides]
Song Han1, Junlong Kang2, Huizi Mao1, Yiming Hu3, Xin Li2, Yubin Li2, Dongliang Xie2, Hong Luo2, Song Yao2, Yu Wang3, Huazhong Yang3, Bill Dally1
1Stanford University, 2DeePhi, 3Tsinghua University
10:15 - 11:15    Poster Session 1 and Break (San Carlos 1)
   Interconnect and Routing
Chair: Sinan Kaptanoglu, Microsemi
11:15 - 11:40    Quality-Time Tradeoffs in Component-Specific Mapping [slides]
Hans Giesen1, Raphael Rubin1, Benjamin Gojman2, Andre DeHon1
1University of Pennsylvania, 2Google
11:40 - 12:05    Synchronization Constraints for Interconnect Synthesis [slides]
Alex Rodionov and Jonathan Rose
University of Toronto
12:05 - 12:30    Corolla: GPU-Accelerated FPGA Routing Based on Subgraph Dynamic Expansion [slides]
Minghua Shen and Guojie Luo
Peking University
12:30 - 2:00    Lunch (Ferrantes Room, 10th Floor)
   Architecture
Chair: Steve Wilton, University of British Columbia
2:00 - 2:25    Don't Forget the Memory: Automatic Block RAM Modelling, Optimization, and Architecture Exploration (Best Paper Candidate) [slides]
Sadegh Yazdanshenas, Kosuke Tatsumura, Vaughn Betz
University of Toronto
2:25 - 2:50    Automatic Construction of Program-Optimized FPGA Memory Networks [slides]
Hsin-Jung Yang1, Kermin Fleming2, Felix Winterstein3, Annie Chen1, Michael Adler4, Joel Emer1
1MIT, 2Intel, 3Imperial College London, 4Intel Corporation
2:50 - 2:55    NAND-NOR: A Compact, Fast, and Delay Balanced FPGA Logic Element [slides]
Zhihong Huang1, Xing Wei1, Grace Zgheib2, Wei Li1, Yu Lin1, Zhenghong Jiang1, Kaihui Tu1, Paolo Ienne2, Haigang Yang1
1Chinese Academy of Sciences, 2EPFL
2:55 - 3:00    120-core microAptiv MIPS Overlay for the Terasic DE5-NET FPGA board [slides]
Nachiket Kapre1, Prashanth Ravi2, Gourav Modi2, Chethan Kumar H B2
1University of Waterloo, 2Nanyang Technological University
3:00 - 4:00    Poster Session 2 and Break (San Carlos 1)
   CAD Tools
Chair: Lesley Shannon, Simon Fraser University
4:00 - 4:25    A Parallelized Iterative Improvement Approach to Area Optimization for LUT-Based Technology Mapping (Best Paper Candidate) [slides]
Gai Liu and Zhiru Zhang
Cornell University
4:25 - 4:50    A Parallel Bandit-Based Approach for Autotuning FPGA Compilation [slides]
Chang Xu1, Gai Liu2, Ritchie Zhao2, Stephen Yang3, Guojie Luo1, Zhiru Zhang2
1Peking University, 2Cornell University, 3Xilinx
6:30 - 9:30    Banquet (San Carlos 2-4)
7:45 - 9:00    Panel: FPGAs in the Cloud
Chair: George Constantinides, Imperial College London
Panelists: Andrew Putnam (Microsoft, USA), Wei Qi (Baidu, China), Gaurav Singh (Xilinx, USA), Mark Shand (Waymo, USA), Ling Shao (IBM Research, China), Richard Veitch (Maxeler, USA)
Friday February 24 (All Technical Sessions in San Carlos 2-4)
8:00    Continental Breakfast
   High-Level Synthesis -- Tools and Applications
Chair: Stephen Neuendorffer, Xilinx
9:00 - 9:25    Hardware Synthesis of Weakly Consistent C Concurrency
Nadesh Ramanathan, Shane Fleming, John Wickerson, George Constantinides
Imperial College London
9:25 - 9:50    A New Approach to Automatic Memory Banking using Trace-Based Address Mining [slides]
Yuan Zhou, Khalid Al-Hawaj, Zhiru Zhang
Cornell University
9:50 - 9:55    Dynamic Hazard Resolution for Pipelining Irregular Loops in High-Level Synthesis [slides]
Steve Dai1, Ritchie Zhao1, Gai Liu1, Shreesha Srinath1, Udit Gupta2, Christopher Batten1, Zhiru Zhang1
1Cornell University, 2Harvard University
9:55 - 10:00    Accelerating Face Detection on Programmable SoC Using C-Based Synthesis [slides]
Nitish Srivastava1, Steve Dai1, Rajit Manohar2, Zhiru Zhang1
1Cornell University, 2Cornell NYC Tech
10:00 - 10:05    Packet Matching on FPGAs Using HMC Memory: Towards One Million Rules [slides]
Daniel Rozhko, Geoffrey Elliott, Daniel Ly-Ma, Paul Chow, Hans-Arno Jacobsen
University of Toronto
10:05 - 11:00    Poster Session 3 and Break (San Carlos 1)
   Graph Processing Applications
Chair: Nachiket Kapre, University of Waterloo
11:00 - 11:25    Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search [slides]
Jialiang Zhang, Soroosh Khoram, Jing Li
UW-Madison
11:25 - 11:50    ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture [slides]
Guohao Dai1, Tianhao Huang1, Yuze Chi2, Ningyi Xu3, Yu Wang1, Huazhong Yang1
1Tsinghua University, 2UCLA, 3Microsoft Research Asia
11:50 - 12:15    FPGA-Accelerated Transactional Execution of Graph Workloads [slides]
Xiaoyu Ma1, Dan Zhang1, Derek Chiou1,2
1University of Texas at Austin, 2Microsoft
12:15 - 2:00    Lunch (Ferrantes Room, 10th Floor)
   Virtualization and Applications
Chair: John Lockwood, Algo-Logic Systems
2:00 - 2:25    Enabling Flexible Network FPGA Clusters in a Heterogeneous Cloud Data Center [slides]
Naif Tarafdar, Thomas Lin, Eric Fukuda, Hadi Bannazadeh, Alberto Leon-Garcia, Paul Chow
University of Toronto
2:25 - 2:50    Energy Efficient Scientific Computing on FPGAs using OpenCL
Dennis Weller1, Fabian Oboril1, Dimitar Lukarski2, Juergen Becker1, Mehdi Tahoori1
1Karlsruhe Institute of Technology, 2PARALUTION Labs
2:50 - 3:15    Secure Function Evaluation using an FPGA Overlay Architecture [slides]
Xin Fang, Stratis Ioannidis, Miriam Leeser
Northeastern University
3:15 - 3:45    Break
   Applications
Chair: Miriam Leeser, Northeastern University
3:45 - 4:10    FPGA Acceleration for Computational Glass-Free Displays [slides]
Zhuolun He and Guojie Luo
Peking University
4:10 - 4:35    Hardware Acceleration of the Pair-HMM Algorithm for DNA Variant Calling [slides]
Sitao Huang1, Gowthami Jayashri Manikandan1, Anand Ramachandran1, Kyle Rupnow2, Wen-mei W. Hwu1, Deming Chen1
1University of Illinois at Urbana-Champaign, 2Advanced Digital Sciences Center
4:35 - 4:45    Conference Closing and Best Paper Award
Jason Anderson (University of Toronto), Jonathan Greene (Microsemi)
   Poster Session 1
   Measuring the Power-Constrained Performance and Energy Gap between FPGAs and Processors
Andy Ye1 and Karthik Ganesan2
1Ryerson University, 2University of Toronto
   A Mixed-Signal Data-Centric Reconfigurable Architecture enabled by RRAM Technology
Yue Zha1, Jialiang Zhang1, Zhiqiang Wei2, Jing Li1
1UW-Madison, 2Panasonic
   A Framework for Iterative Stencil Algorithm Synthesis on FPGAs from OpenCL Programming Model
Shuo Wang and Yun Liang
Peking University
   Scala Based FPGA Design Flow
Yanqiang Liu1, Yao Li1, Weilun Xiong1, Meng Lai1, Cheng Chen2, Zhengwei Qi1, Haibing Guan1
1Shanghai JiaoTong University, 2Morgan Stanley
   Thermal Flattening in 3D FPGAs using Embedded Cooling
Girish Deshpande and Dinesh Bhatia
UT-Dallas
   A Machine Learning Framework for FPGA Placement
Gary Grewal, Shawki Areibi, Matthew Westrik, Ziad Abuowaimer, Betty Zhao
University of Guelph
   Precise Coincidence Detection on FPGAs: Three Case Studies
Ralf Salomon and Ralf Joost
University of Rostock
   DTP: Enabling Exhaustive Exploration of FPGA Temporal Partitions for Streaming HPC Applications
Mostafa Koraei1, Magnus Jahre2, S. Omid Fatemi1
1University of Tehran, 2Norwegian University of Science and Technology
   Accurate and Efficient Hyperbolic Tangent Activation Function on FPGA using the DCT Interpolation Filter
Ahmed Abdelsalam, Pierre Langlois, Farida Cheriet
École Polytechnique de Montréal
   An FPGA Overlay Architecture for Cost Effective Regular Expression Search
Thomas Luinaud, J.M. Pierre Langlois, Yvon Savaria
École Polytechnique de Montréal
   Storage-Efficient Batching for Minimizing Bandwidth of Fully-Connected Neural Network Layers
Yongming Shen, Michael Ferdman, Peter Milder
Stony Brook University
   Poster Session 2
   Using Vivado-HLS for Structural Design: a NoC Case Study
Zhipeng Zhao and James C. Hoe
CMU
   Automatic Generation of Hardware Sandboxes for Trojan Mitigation in Systems on Chip
Christophe Bobda1, Taylor Whitaker1, Charles Kamhoua2, Kevin Kwiat2, Laurent Njilla2
1University of Arkansas, 2Air Force Research Lab
   Accelerating Financial Market Server through Hybrid List Design
Haohuan Fu1, Conghui He1, Huabin Ruan1, Itay Greenspon2, Wayne Luk3, Yongkang Zheng4, Junfeng Liao1, Qing Zhang4, Guangwen Yang1
1Tsinghua University, 2Maxeler Technologies, 3Imperial College London, 4China Financial Futures Exchange
   Joint Modulo Scheduling and Memory Partitioning with Multi-Bank Memory for High-Level Synthesis
Tianyi Lu, Shouyi Yin, Xianqing Yao, Zhicong Xie, Leibo Liu, Shaojun Wei
Tsinghua University
   A Batch Normalization Free Binarized Convolutional Deep Neural Network on an FPGA
Hiroki Nakahara1, Haruyoshi Yonekawa1, Hisashi Iwamoto2, Masato Motomura3
1Tokyo Institute of Technology, 2Poco a Poco Networks, 3Hokkaido University
   A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks
Yixing Li1, Zichuan Liu2, Kai Xu1, Fengbo Ren1, Hao Yu2
1Arizona State University, 2Nanyang Technological University
   CPU-FPGA Co-Optimization for Big Data Applications: A Case Study of In-Memory Samtool Sorting
Jason Cong1, Zhenman Fang1, Muhuan Huang2, Libo Wang1, Di Wu1
1UCLA, 2University of California, Los Angeles
   Stochastic-Based Multi-stage Streaming Realization of a Deep Convolutional Neural Network
Mingjie Lin1 and Mohammed Alawad2
1University of Central Florida, 2UCF
   fpgaConvNet: Automated Mapping of Convolutional Neural Networks on FPGAs
Stylianos Venieris and Christos Bouganis
Imperial College London
   Poster Session 3
   FPGA-based Hardware Accelerator for Image Reconstruction in Magnetic Resonance Imaging
Emanuele Pezzotti1, Alex Iacobucci1, Gregory Nash2, Umer Cheema1, Paolo Vinella1, Rashid Ansari1
1University of Illinois at Chicago, 2University of Illinois at Chicago, Altera
   ASAP: Accelerated Short Read Alignment on Programmable Hardware
Subho Banerjee, Mohamed El Hadedy, Jong Bin Lim, Daniel Chen, Zbigniew T. Kalbarczyk, Deming Chen, Ravishankar K. Iyer
UIUC
   RxRE: Throughput Optimization for High-Level Synthesis using Resource-Aware Regularity Extraction
Atieh Lotfi and Rajesh Gupta
UCSD
   GRT 2.0: An FPGA-based SDR Platform for Cognitive Radio Networks
Haoyang Wu1, Tao Wang1, Zhiwei Li1, Boyan Ding1, Xiaoguang Li1, Tianfu Jiang1, Jun Liu1, Songwu Lu2
1Peking University, 2UCLA
   FPGA Implementation of Non-Uniform DFT for Accelerating Wireless Channel Simulations
Srinivas Siripurapu1, Aman Gayasen2, Nitin Chandrachoodan1, Padmini Gopalakrishnan2
1IIT Madras, 2Xilinx
   Learning Convolutional Neural Networks for Data-Flow Graph Mapping on Spatial Programmable Architectures
Shouyi Yin, Dajiang Liu, Lifeng Sun, Xinhan Lin, Leibo Liu, Shaojun Wei
Tsinghua University
   Cache Timing Attacks from The SoCFPGA Coherency Port
Sumanta Chaudhuri
Telecom ParisTech
   Dynamic Partitioning for Library based Placement on Heterogeneous FPGAs
Fubing Mao1, Wei Zhang2, Bingsheng He3, Siew Kei Lam1
1Nanyang Technological University, 2Hong Kong University of Science and Technology, 3National University of Singapore
   An Energy-Efficient Design-Time Scheduler for FPGAs Leveraging Dynamic Frequency Scaling Emulation
Wei Ting Loke1 and Chin Yang Koay2
1National University of Singapore, 2Xilinx Asia Pacific