Research
Designing Application-Specific Hardware
The demand for computing power is higher than ever as new AI applications emerge. However, Moore's law no longer provides enough computing power to meet this demand. In this era, specialization may be the only way to keep delivering more computing power. We currently focus on developing specialized hardware for artificial intelligence using Application-Specific Integrated Circuits (ASICs) and Field-Programmable Gate Arrays (FPGAs).
AI for Designing Computer Chips
Designing computer chips remains a labor-intensive and time-consuming process even today. Will it stay that way in the future? We strongly believe that AI can revolutionize the way we design computer chips. However, despite the astonishing progress in machine learning, applying AI to chip design still poses many challenges. We foster AI experts and chip design experts in a single lab and develop the best AI for designing chips.
Efficient AI Algorithms
The predictive performance of recent AI often scales with more data and larger models, and we have witnessed what such AI is capable of. However, large models often require massive amounts of computation, memory, and storage, which presents formidable challenges in serving them at Internet scale and deploying them on mobile devices. We tackle these challenges purely through novel algorithms, without relying on new hardware.
Microarchitecture-Logic Co-Design
2D spatial architectures such as systolic arrays are at the heart of modern deep learning accelerators such as NPUs and TPUs. From an architectural point of view, arithmetic circuits such as multipliers and adders are atomic black-box primitives. We open this box and co-design them with the microarchitecture, finding new opportunities for better designs (see the sketch after the publications below).
Inayat Ullah, Kashif Inayat, Joon-Sung Yang, and Jaeyong Chung, "Factored Radix-8 Systolic Array for Tensor Processing", Proc. IEEE/ACM Design Automation Conference (DAC), July 2020
Kashif Inayat and Jaeyong Chung, "Hybrid Accumulator Factored Systolic Array for Machine Learning Acceleration", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2022
Kashif Inayat, Inayat Ullah, and Jaeyong Chung, "Factored Systolic Arrays based on Radix-8 Multiplication for Machine Learning Acceleration", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2024
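To make the setting concrete, here is a minimal Python sketch of an output-stationary systolic array computing a matrix product, with the multiplier isolated as the black-box primitive that co-design opens up. It illustrates only the dataflow, not the factored radix-8 architecture from the papers above:

import numpy as np

def multiply(a, b):
    # The atomic black-box primitive. Co-design replaces this, e.g., with a
    # radix-8 multiplier whose internals are factored across the array.
    return a * b

def systolic_matmul(A, B):
    # Cycle-level sketch of an output-stationary systolic array:
    # PE (i, j) accumulates C[i, j]; with skewed operand injection it
    # receives A[i, k] and B[k, j] together at cycle i + j + k.
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    for cycle in range(M + N + K - 2):
        for i in range(M):
            for j in range(N):
                k = cycle - i - j
                if 0 <= k < K:
                    C[i, j] += multiply(A[i, k], B[k, j])
    return C

A = np.random.randint(-8, 8, (3, 4))
B = np.random.randint(-8, 8, (4, 5))
assert np.array_equal(systolic_matmul(A, B), A @ B)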
FPGA-based Systems/Neuromorphic Computing Systems
FPGAs allocate exclusive computational resources to each task, providing true concurrency along with software-like flexibility. We are experts on FPGAs. We built a neuromorphic computing system on top of FPGAs back in 2015, and to the best of our knowledge, it still has the lowest latency in neural network processing today (a toy sketch of a common neuromorphic primitive follows the publications below). If your application needs low, predictable latency for real-time processing of neural networks, please let us know.
<Selected Publications>
Taehwan Shin, Yongshin Kang, Seungho Yang, Seban Kim, and Jaeyong Chung, “Live Demonstration: Real-Time Image Classification on a Neuromorphic Computing System with Zero Off-Chip Memory Access”, Proc. IEEE International Symposium on Circuits and Systems (ISCAS), May 2016
Jaeyong Chung and Taehwan Shin, “Simplifying Deep Neural Networks for Neuromorphic Architectures”, Proc. IEEE/ACM Design Automation Conference (DAC), June 2016
Seban Kim and Jaeyong Chung, “Synthesis of Activation-Parallel Convolution Structures for Neuromorphic Architectures”, Proc. IEEE Design, Automation & Test in Europe Conference (DATE), March 2017 (Accepted for Presentation)
Yongshin Kang, Seban Kim, Taehwan Shin, and Jaeyong Chung, “Running Convolutional Layers of AlexNet in Neuromorphic Computing System”, Proc. IEEE Design, Automation & Test in Europe Conference (DATE), March 2017 (U-booth Demo)
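For readers unfamiliar with the computing model, the Python sketch below steps a leaky integrate-and-fire layer, a common neuromorphic primitive, through a few timesteps. It is a generic illustration under assumed parameters (leak factor, threshold), not the neuron model of our FPGA system:

import numpy as np

def lif_layer_step(v, spikes_in, W, leak=0.9, v_th=1.0):
    # One timestep of a leaky integrate-and-fire (LIF) layer.
    # v: membrane potentials, spikes_in: binary input vector, W: weights.
    v = leak * v + W @ spikes_in               # integrate weighted input spikes
    spikes_out = (v >= v_th).astype(np.uint8)  # fire where threshold is reached
    v = np.where(spikes_out == 1, 0.0, v)      # reset fired neurons
    return v, spikes_out

# Drive a 4-neuron layer with random spike trains for a few timesteps.
rng = np.random.default_rng(0)
W = rng.normal(0, 0.5, (4, 8))
v = np.zeros(4)
for t in range(10):
    v, out = lif_layer_step(v, rng.integers(0, 2, 8), W)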
AI-based General Optimization Frameworks for Chip Design
AI based on reinforcement learning (RL) is surpassing humans in games, algorithm design, computational biology, and more. However, RL is expensive, especially in chip design, because chip design simulations are far slower than those in other domains. We are developing a general, cost-efficient, learning-based optimization framework for chip design as an alternative to RL (see the sketch after the publication below).
Phuoc Pham and Jaeyong Chung, "AGD: A Learning-based Optimization Framework for EDA and its Application to Gate Sizing", Proc. IEEE/ACM Design Automation Conference (DAC), July 2023
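The AGD framework itself is described in the paper; as a generic illustration of why learning-based optimization can be cheaper than RL, the hypothetical PyTorch sketch below fits a small differentiable surrogate to a handful of slow simulator calls and then optimizes gate sizes by gradient descent on the surrogate. The cost function slow_simulator and all hyperparameters are made up for illustration:

import torch

def slow_simulator(sizes):
    # Stand-in for a slow timing/power simulation (hypothetical cost:
    # larger gates are faster but burn more power). Treated as a black box.
    delay = (1.0 / sizes).sum()
    power = (sizes ** 2).sum()
    return delay + 0.1 * power

# 1. Sample the black box sparsely (the expensive step such methods minimize).
train_x = torch.rand(64, 8) * 3.0 + 0.5   # 64 random sizings of 8 gates
train_y = torch.stack([slow_simulator(x) for x in train_x])

# 2. Fit a small differentiable surrogate to the samples.
model = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.Tanh(),
                            torch.nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = ((model(train_x).squeeze(-1) - train_y) ** 2).mean()
    loss.backward()
    opt.step()

# 3. Optimize gate sizes by gradient descent on the surrogate, not the simulator.
for p in model.parameters():
    p.requires_grad_(False)                # freeze the surrogate
sizes = torch.full((8,), 1.0, requires_grad=True)
size_opt = torch.optim.Adam([sizes], lr=5e-2)
for _ in range(200):
    size_opt.zero_grad()
    model(sizes.unsqueeze(0)).squeeze().backward()
    size_opt.step()
    sizes.data.clamp_(0.5, 3.5)            # respect library size bounds

print(slow_simulator(sizes.detach()))      # check against the real cost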
Model Compression
Deep learning models are often redundant and can be compressed with several techniques. We started model compression research back in 2014 and achieved state-of-the-art results in binary and multi-bit quantization in 2021 (see the sketch after the publications below).
Jaeyong Chung and Taehwan Shin, “Simplifying Deep Neural Networks for Neuromorphic Architectures”, Proc. IEEE/ACM Design Automation Conference (DAC), June 2016
Phuoc Pham, Jacob A. Abraham, and Jaeyong Chung, "Training Multi-bit Quantized and Binarized Networks with A Learnable Symmetric Quantizer", IEEE Access, Vol. 9, March 2021 (github)
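As an illustration of the core mechanism behind learnable symmetric quantization, the PyTorch sketch below implements a multi-bit symmetric quantizer with a learnable scale, trained through a straight-through estimator. The class name, default bit-width, and initialization are our own illustrative choices; see the IEEE Access paper above and its github for the exact formulation:

import torch

class SymmetricQuantizer(torch.nn.Module):
    # b-bit symmetric quantization with a learnable scale alpha.
    # Illustrative sketch for bits >= 2; binarization typically uses a
    # sign-based variant instead.
    def __init__(self, bits=4):
        super().__init__()
        self.levels = 2 ** (bits - 1) - 1                    # symmetric integer levels
        self.alpha = torch.nn.Parameter(torch.tensor(1.0))   # learnable scale

    def forward(self, w):
        x = torch.clamp(w / self.alpha, -1.0, 1.0) * self.levels
        # Straight-through estimator: round in the forward pass, identity
        # gradient in the backward pass, so both w and alpha get gradients.
        x = x + (torch.round(x) - x).detach()
        return self.alpha * x / self.levels

q = SymmetricQuantizer(bits=4)
w = torch.randn(128, requires_grad=True)
loss = (q(w) ** 2).sum()
loss.backward()    # populates both w.grad and q.alpha.grad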