Advancements in Compute-in-Memory and Accelerator Technologies

Computer architecture research is seeing rapid progress in compute-in-memory and accelerator technologies, with recent work improving performance, energy efficiency, and scalability across artificial intelligence, machine learning, and high-performance computing. Compute-in-memory architectures built on Processing-in-Memory (PIM), together with memory-expansion interconnects such as Compute Express Link (CXL), reduce data movement between memory and processors and thereby raise processing efficiency. In parallel, accelerators such as GPUs, FPGAs, and specialized ASICs are being tailored to specific workloads to improve overall system performance. Noteworthy papers in this area include CIMPool, which proposes a CIM-aware compression and acceleration framework for neural networks, and MVDRAM, which enables GeMV execution in unmodified DRAM for low-bit LLM acceleration. Together, these advances point toward faster, more efficient, and more scalable processing of complex workloads.
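
To make the target workload concrete, the sketch below shows the kind of low-bit GeMV (general matrix-vector multiply) kernel that PIM and in-DRAM accelerators like MVDRAM aim to speed up, computed bit-plane by bit-plane in plain NumPy. This is only an illustration of the arithmetic, not MVDRAM's or CIMPool's actual method; the function names, the per-row scale quantization, and the `bits` parameter are assumptions made for the example.

```python
# Illustrative low-bit GeMV kernel (software sketch, not an in-DRAM implementation).
# Weights are quantized to `bits` signed bits with a per-row scale, and the
# matrix-vector product is accumulated one weight bit-plane at a time, mirroring
# the bit-serial style of in-memory execution. All names here are hypothetical.
import numpy as np

def quantize_rows(W, bits=2):
    """Symmetric per-row quantization of W to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 1 for 2-bit, 7 for 4-bit
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0
    Wq = np.clip(np.round(W / scale), -qmax - 1, qmax).astype(np.int32)
    return Wq, scale

def gemv_bitserial(Wq, scale, x, bits=2):
    """GeMV y = (scale * Wq) @ x, accumulated one weight bit-plane at a time."""
    # Shift to unsigned so each bit-plane is a 0/1 matrix, then correct the offset.
    offset = 2 ** (bits - 1)
    Wu = Wq + offset                              # values in [0, 2**bits - 1]
    acc = np.zeros(Wq.shape[0], dtype=np.float64)
    for b in range(bits):                         # one pass per weight bit-plane
        plane = (Wu >> b) & 1                     # 0/1 matrix: cheap AND/popcount-style work
        acc += (2 ** b) * (plane @ x)
    acc -= offset * x.sum()                       # undo the unsigned offset
    return scale[:, 0] * acc

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 16))
    x = rng.standard_normal(16)
    Wq, s = quantize_rows(W, bits=2)
    y_bitserial = gemv_bitserial(Wq, s, x, bits=2)
    y_reference = (s[:, 0][:, None] * Wq) @ x     # dense reference on the same quantized weights
    print(np.allclose(y_bitserial, y_reference))  # True: the bit-serial path matches
```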

Sources

Performance Characterizations and Usage Guidelines of Samsung CXL Memory Module Hybrid Prototype

CIMPool: Scalable Neural Network Acceleration for Compute-In-Memory using Weight Pools

CIMR-V: An End-to-End SRAM-based CIM Accelerator with RISC-V for AI Edge Device

Memory-aware Adaptive Scheduling of Scientific Workflows on Heterogeneous Architectures

Benchmarking Ultra-Low-Power $\mu$NPUs

A Performance Analysis of Task Scheduling for UQ Workflows on HPC Systems

A Pilot Study on Tunable Precision Emulation via Automatic BLAS Offloading

Plug & Offload: Transparently Offloading TCP Stack onto Off-path SmartNIC with PnO-TCP

Globus Service Enhancements for Exascale Applications and Facilities

Concorde: Fast and Accurate CPU Performance Modeling with Compositional Analytical-ML Fusion

Improved algorithms for single machine serial-batch scheduling to minimize makespan and maximum cost

Generalized Capacity Planning for the Hospital-Residents Problem

Visual Acuity Consistent Foveated Rendering towards Retinal Resolution

Beware, PCIe Switches! CXL Pools Are Out to Get You

Gaussian Blending Unit: An Edge GPU Plug-in for Real-Time Gaussian-Based Rendering in AR/VR

Uni-Render: A Unified Accelerator for Real-Time Rendering Across Diverse Neural Renderers

MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration

HeteroPod: XPU-Accelerated Infrastructure Offloading for Commodity Cloud-Native Applications

Banked Memories for Soft SIMT Processors

GPU-centric Communication Schemes for HPC and ML Applications

SPRING: Systematic Profiling of Randomly Interconnected Neural Networks Generated by HLS

A batch production scheduling problem in a reconfigurable hybrid manufacturing-remanufacturing system

NeuraLUT-Assemble: Hardware-aware Assembling of Sub-Neural Networks for Efficient LUT Inference

Sim-is-More: Randomizing HW-NAS with Synthetic Devices

Green computing toward SKA era with RICK

GigaAPI for GPU Parallelization

MEEK: Re-thinking Heterogeneous Parallel Error Detection Architecture for Real-World OoO Superscalar Processors

Shared-Memory Hierarchical Process Mapping

FireGuard: A Generalized Microarchitecture for Fine-Grained Security Analysis on OoO Superscalar Cores

MERE: Hardware-Software Co-Design for Masking Cache Miss Latency in Embedded Processors

HH-PIM: Dynamic Optimization of Power and Performance with Heterogeneous-Hybrid PIM for Edge AI Devices

A flexible framework for early power and timing comparison of time-multiplexed CGRA kernel executions

PIMDAL: Mitigating the Memory Bottleneck in Data Analytics using a Real Processing-in-Memory System

Efficient Trace for RISC-V: Design, Evaluation, and Integration in CVA6

PIM-LLM: A High-Throughput Hybrid PIM Architecture for 1-bit LLMs

SProBench: Stream Processing Benchmark for High Performance Computing Infrastructure

ARCANE: Adaptive RISC-V Cache Architecture for Near-memory Extensions
