Compute-in-Memory (CiM) Research

Report on Current Developments in Compute-in-Memory (CiM) Research

General Direction of the Field

The field of Compute-in-Memory (CiM) is evolving rapidly, with recent work focused on improving both the efficiency and the functionality of in-memory computation architectures. The primary thrust of current research is toward architectures that not only reduce latency and energy consumption but also broaden the range of computational tasks that can be performed within memory. This includes enabling more complex operations, such as floating-point computation and sparsity-centric processing, which are central to modern AI and machine learning workloads.
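
The core argument for CiM is that moving operands between off-chip memory and the processor typically costs far more energy than the arithmetic itself. The toy accounting below makes that trade-off concrete for a dot product; the per-operation energy values are arbitrary placeholders chosen only for illustration, not measured figures.

    # Toy energy model contrasting a conventional fetch-then-compute dot product
    # with one where the multiply-accumulate happens inside the memory array.
    # The per-operation energies are arbitrary placeholders, not measured data.

    E_DRAM_FETCH = 100.0   # assumed energy to move one operand from DRAM to the core
    E_MAC        = 1.0     # assumed energy of one multiply-accumulate on the core
    E_CIM_MAC    = 1.5     # assumed energy of one multiply-accumulate inside memory

    def conventional_dot(n):
        # every pair of operands crosses the memory bus before the MAC
        return n * (2 * E_DRAM_FETCH + E_MAC)

    def cim_dot(n):
        # operands stay in place; only the scalar result leaves the array
        return n * E_CIM_MAC + E_DRAM_FETCH

    if __name__ == "__main__":
        n = 1024
        print(f"conventional: {conventional_dot(n):10.1f} energy units")
        print(f"in-memory:    {cim_dot(n):10.1f} energy units")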

One of the key trends is the integration of concurrent computation and data flow within the memory banks themselves. This approach aims to minimize the overhead associated with data movement between memory and processors, thereby improving overall system performance. Innovations in this area are particularly significant for tasks that require high-speed data processing, such as matrix and polynomial multiplications, as well as graph processing algorithms.
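
A simple way to see the benefit of overlapping data movement with in-memory computation is a toy timing model in which the transfer for one step hides under the computation of the previous step. The cycle counts below are arbitrary placeholders, and the pipeline model is an assumption of this sketch rather than a description of any specific DRAM design.

    # Toy timing model for a stream of (transfer, compute) steps inside a DRAM bank.
    # Baseline: each in-bank computation waits for the preceding row transfer.
    # Overlapped: the transfer for step i+1 runs during the computation of step i.
    # Cycle counts are arbitrary placeholders.

    COMPUTE_CYCLES  = 50   # assumed cycles per in-bank bulk operation
    TRANSFER_CYCLES = 30   # assumed cycles to move a row between banks/subarrays

    def serialized(steps):
        # transfer and compute never overlap
        return steps * (TRANSFER_CYCLES + COMPUTE_CYCLES)

    def overlapped(steps):
        # the first transfer is exposed; every later transfer hides under compute
        hidden = min(TRANSFER_CYCLES, COMPUTE_CYCLES) * (steps - 1)
        return serialized(steps) - hidden

    if __name__ == "__main__":
        for steps in (1, 16, 256):
            s, o = serialized(steps), overlapped(steps)
            print(f"{steps:4d} steps: serialized {s:6d} cycles, overlapped {o:6d} cycles")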

Another notable direction is the adoption of approximate computing techniques to improve the power efficiency of CiM systems. These techniques use probabilistic methods to approximate costly operations, such as multiply-and-accumulate (MAC) operations, while keeping accuracy loss small. This reduces both the energy spent on computation and the number of memory accesses, yielding substantial improvements in system-level efficiency.
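
As a rough illustration of this idea, the sketch below estimates a dot product from a randomly sampled subset of operand pairs and rescales the partial sum. It captures only the general principle of trading a bounded accuracy loss for fewer memory accesses and multiplications; the sampling scheme and parameters are assumptions of this sketch, not the bit-level approximation used by PACiM.

    # Toy probabilistic MAC approximation: instead of reading and multiplying
    # every weight/activation pair, sample a random subset and rescale.
    # This is a generic Monte Carlo estimate, not any paper's exact scheme.

    import random

    def exact_mac(weights, activations):
        return sum(w * a for w, a in zip(weights, activations))

    def approx_mac(weights, activations, sample_frac=0.25, seed=0):
        rng = random.Random(seed)
        n = len(weights)
        k = max(1, int(n * sample_frac))
        idx = rng.sample(range(n), k)           # only these k pairs are ever fetched
        partial = sum(weights[i] * activations[i] for i in idx)
        return partial * (n / k)                # unbiased rescaling of the sampled sum

    if __name__ == "__main__":
        rng = random.Random(42)
        w = [rng.gauss(0, 1) for _ in range(4096)]
        a = [rng.gauss(0, 1) for _ in range(4096)]
        print(f"exact {exact_mac(w, a):.2f}  approx {approx_mac(w, a):.2f}  (~4x fewer accesses)")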

Additionally, there is a growing emphasis on optimizing AI models, particularly Transformer models, for deployment on resource-constrained devices. This involves exploring quantization-aware training to reduce model size and memory footprint without compromising performance. Such optimizations are critical for enabling on-device AI applications, especially in pervasive computing environments.
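
Quantization-aware training typically inserts a "fake quantization" step into the forward pass so that the loss already reflects the rounding error the deployed integer model will see. The sketch below shows that step for a symmetric per-tensor int8 scheme; the scheme, the shapes, and the omission of the straight-through estimator used during backpropagation are simplifying assumptions of this sketch, not the recipe of the cited paper.

    # Minimal fake-quantization step for quantization-aware training (QAT):
    # weights are rounded onto an int8 grid in the forward pass while training
    # keeps full-precision copies. A real framework would route gradients
    # through the rounding with a straight-through estimator.

    import numpy as np

    def fake_quantize(x, num_bits=8):
        qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for a symmetric int8 range
        scale = np.max(np.abs(x)) / qmax + 1e-12  # assumed per-tensor scale factor
        q = np.clip(np.round(x / scale), -qmax, qmax)
        return q * scale                          # dequantized values used in the forward pass

    if __name__ == "__main__":
        w = np.random.randn(4, 4).astype(np.float32)
        w_q = fake_quantize(w)
        print("max abs quantization error:", np.max(np.abs(w - w_q)))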

Noteworthy Innovations

  • Shared-PIM: This architecture significantly reduces data-movement latency and energy by enabling concurrent computation and data transfer within memory banks, yielding substantial performance improvements in tasks such as matrix and polynomial multiplication and graph processing.

  • PACiM: A sparsity-centric architecture that leverages probabilistic approximation to reduce both computation power and memory accesses, achieving high efficiency while maintaining accuracy in deep neural network processing.

  • TimeFloats: An innovative train-in-memory architecture that performs floating-point scalar products in the time domain, offering high energy efficiency and easier integration with digital circuits; a toy illustration of time-domain accumulation follows this list.
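
To give a flavor of what "computing in the time domain" means, the toy emulation below encodes each product as an integer number of time units (a stand-in for a pulse width or delay) and reads the scalar product off the accumulated delay. The quantization grid, the handling of signs, and the fixed-point encoding are assumptions of this sketch; the TimeFloats paper describes a circuit-level floating-point technique, which this does not reproduce.

    # Toy emulation of time-domain accumulation: each product contributes a
    # discrete delay, and the scalar product is read off the total delay.
    # The resolution and sign handling are assumptions of this sketch.

    TIME_RESOLUTION = 1e-3   # assumed value represented by one time unit

    def to_time_units(value):
        # quantize a real-valued product onto the discrete time grid
        return int(round(value / TIME_RESOLUTION))

    def time_domain_dot(xs, ws):
        # positive and negative contributions accumulate on separate "delay lines"
        pos_delay = sum(to_time_units(x * w) for x, w in zip(xs, ws) if x * w >= 0)
        neg_delay = sum(-to_time_units(x * w) for x, w in zip(xs, ws) if x * w < 0)
        return (pos_delay - neg_delay) * TIME_RESOLUTION

    if __name__ == "__main__":
        xs = [0.5, -1.25, 0.75, 2.0]
        ws = [1.0, 0.5, -2.0, 0.25]
        exact = sum(x * w for x, w in zip(xs, ws))
        print("exact:", exact, " time-domain approx:", time_domain_dot(xs, ws))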

Sources

Shared-PIM: Enabling Concurrent Computation and Data Flow for Faster Processing-in-DRAM

PACiM: A Sparsity-Centric Hybrid Compute-in-Memory Architecture via Probabilistic Approximation

On-device AI: Quantization-aware Training of Transformers in Time-Series

TimeFloats: Train-in-Memory with Time-Domain Floating-Point Scalar Products