Optimizing Efficiency in AI and Machine Learning Hardware and Software

Recent developments in this research area point to a significant push toward optimizing computational efficiency, energy consumption, and memory usage in hardware and software systems, particularly for AI and machine learning workloads. Innovations focus on novel architectures and algorithms that reduce complexity, enhance performance, and enable the deployment of advanced models on resource-constrained devices. Key trends include biologically inspired neural network architectures, hardware-software co-design for network coding acceleration, and memory-efficient training methods for transformers on edge devices. There is also a notable emphasis on improving the efficiency of matrix operations and neural network training through innovative algorithms and hardware designs.

Noteworthy Papers

  • Hardware-In-The-Loop Training of a 4f Optical Correlator: Achieves near-backpropagation accuracy with reduced complexity, showcasing the potential of optical computing in neural network training.
  • Planarian Neural Networks: Demonstrates improved image classification accuracy by mimicking the neural architecture of planarians, highlighting the benefits of biologically inspired designs.
  • Towards High-Performance Network Coding: Introduces a hardware-friendly variant of BATS codes and a scalable FPGA-based accelerator, significantly enhancing network coding efficiency.
  • Axon: Proposes a novel systolic array architecture with on-chip im2col that improves runtime and energy efficiency for GeMM and Conv operations, crucial for AI applications (an im2col sketch follows this list).
  • Ultra Memory-Efficient On-FPGA Training of Transformers: Presents a tensor-compressed optimization method for transformer training on FPGAs, achieving significant memory and energy savings.
  • COMPASS: A compiler framework for resource-constrained crossbar-array based in-memory deep learning accelerators, improving throughput and energy efficiency.
  • Monolithic 3D FPGAs: Leverages back-end-of-line configuration memories to enhance area, latency, and power efficiency in FPGAs.
  • FlexQuant: An elastic quantization framework for deploying LLMs on edge devices, offering improved granularity and storage efficiency.
  • Karatsuba Matrix Multiplication: Extends the Karatsuba algorithm to matrix multiplication, providing area and execution-time improvements in custom hardware (illustrated after this list).
  • Mono-Forward: Introduces a backpropagation-free algorithm that trains neural networks from local errors, reducing memory usage and improving parallelizability (a local-error training sketch follows this list).
  • Atleus: A 3D heterogeneous architecture designed to accelerate transformers on the edge, offering significant performance and energy efficiency improvements.
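
The Axon entry relies on im2col, the standard lowering that turns a convolution into a single GeMM so it can run on a systolic array. The NumPy sketch below shows the transformation in its generic software form; the shapes, function names, and strided loop are illustrative choices, not details taken from the paper, which performs the unfolding on-chip.

```python
# Minimal NumPy sketch of im2col: lowering a 2D convolution to one GEMM.
import numpy as np

def im2col(x, kh, kw, stride=1):
    """Unfold a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix of patches."""
    c, h, w = x.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i*stride:i*stride+kh, j*stride:j*stride+kw]
            cols[:, i * out_w + j] = patch.reshape(-1)
    return cols, out_h, out_w

def conv2d_as_gemm(x, weights, stride=1):
    """Convolution expressed as one matrix multiply: (F, C*kh*kw) x (C*kh*kw, P)."""
    f, c, kh, kw = weights.shape
    cols, out_h, out_w = im2col(x, kh, kw, stride)
    out = weights.reshape(f, -1) @ cols   # the GeMM a systolic array would execute
    return out.reshape(f, out_h, out_w)

# Quick shape check on random data.
x = np.random.rand(3, 8, 8).astype(np.float32)
w = np.random.rand(4, 3, 3, 3).astype(np.float32)
print(conv2d_as_gemm(x, w).shape)  # (4, 6, 6)
```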
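
The Karatsuba Matrix Multiplication entry extends the classic three-multiplication identity from scalars to matrices. One common reading, assumed in the sketch below, is to split each integer element into high and low bit-halves and replace the four half-width matrix products with three; the paper's exact blocking and hardware mapping may differ.

```python
# Minimal sketch of the Karatsuba identity applied at the matrix level:
# each integer element is split into high/low halves, and the four
# half-width products are replaced by three matrix multiplications.
import numpy as np

def karatsuba_matmul(A, B, bits=8):
    """Multiply integer matrices A @ B using three half-width matrix products."""
    half = bits // 2
    mask = (1 << half) - 1
    Ah, Al = A >> half, A & mask                   # element-wise high/low split
    Bh, Bl = B >> half, B & mask
    P_hh = Ah @ Bh                                 # high x high
    P_ll = Al @ Bl                                 # low  x low
    P_mid = (Ah + Al) @ (Bh + Bl) - P_hh - P_ll    # Karatsuba middle term
    return (P_hh << bits) + (P_mid << half) + P_ll

rng = np.random.default_rng(0)
A = rng.integers(0, 256, size=(4, 4), dtype=np.int64)
B = rng.integers(0, 256, size=(4, 4), dtype=np.int64)
assert np.array_equal(karatsuba_matmul(A, B), A @ B)
```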
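
Mono-Forward belongs to the family of backpropagation-free methods that train each layer from a purely local error, so no activations need to be kept for a global backward pass. The sketch below is a generic layer-local training loop in that spirit: every layer owns a small projection to class scores and is updated only from its own cross-entropy error. The update rules, layer sizes, and toy data are illustrative assumptions, not the paper's algorithm.

```python
# Generic layer-local, backpropagation-free training sketch: no gradient
# flows between layers; each layer learns from its own local error.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class LocalLayer:
    def __init__(self, d_in, d_out, n_classes, lr=0.05):
        self.W = rng.normal(0, 0.1, (d_in, d_out))       # layer weights
        self.M = rng.normal(0, 0.1, (d_out, n_classes))  # local projection to class scores
        self.lr = lr

    def forward(self, x):
        return np.maximum(x @ self.W, 0.0)               # ReLU activations

    def local_update(self, x, y_onehot):
        h = self.forward(x)
        p = softmax(h @ self.M)                          # local class probabilities
        err = (p - y_onehot) / len(x)                    # local cross-entropy error
        dM = h.T @ err
        dh = err @ self.M.T
        dh[h <= 0] = 0.0                                 # ReLU gradient, confined to this layer
        dW = x.T @ dh
        self.M -= self.lr * dM
        self.W -= self.lr * dW
        return h                                         # detached input for the next layer

# Toy run on random data: each layer trains on its own local error only.
X = rng.normal(size=(256, 20))
y = rng.integers(0, 3, size=256)
Y = np.eye(3)[y]
layers = [LocalLayer(20, 32, 3), LocalLayer(32, 16, 3)]
for _ in range(200):
    a = X
    for layer in layers:
        a = layer.local_update(a, Y)
pred = softmax(layers[-1].forward(layers[0].forward(X)) @ layers[-1].M).argmax(axis=1)
print("train accuracy:", (pred == y).mean())
```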

Sources

Hardware-In-The-Loop Training of a 4f Optical Correlator with Logarithmic Complexity Reduction for CNNs

Planarian Neural Networks: Evolutionary Patterns from Basic Bilateria Shaping Modern Artificial Neural Network Architectures

Towards High-Performance Network Coding: FPGA Acceleration With Bounded-value Generators

Axon: A novel systolic array architecture for improved run time and energy efficient GeMM and Conv operation with on-chip im2col

Evaluation of High-Speed Universal Shift Register with 4-bit ALU

Ultra Memory-Efficient On-FPGA Training of Transformers via Tensor-Compressed Optimization

COMPASS: A Compiler Framework for Resource-Constrained Crossbar-Array Based In-Memory Deep Learning Accelerators

Monolithic 3D FPGAs Utilizing Back-End-of-Line Configuration Memories

FlexQuant: Elastic Quantization Framework for Locally Hosted LLM on Edge Devices

Karatsuba Matrix Multiplication and its Efficient Custom Hardware Implementations

Mono-Forward: Backpropagation-Free Algorithm for Efficient Neural Network Training Harnessing Local Errors

Atleus: Accelerating Transformers on the Edge Enabled by 3D Heterogeneous Manycore Architectures
