Samsung improves AI in-memory processing with CXL

Using the latest interconnect standards, Samsung has developed in-memory processing technology to improve the performance of AI systems in data centers.

Samsung fitted AMD's MI100 AI accelerator with HBM-PIM (high-bandwidth memory with processing-in-memory) chips. It then built an HBM-PIM cluster of 96 MI100 cards connected by 200Gbit/s InfiniBand switches and ran a range of new AI and high-performance computing (HPC) applications on it.

Tests showed that, on average, adding HBM-PIM more than doubled performance and cut energy consumption by more than 50% compared with conventional GPU accelerators.

The accuracy of recent AI models tends to correlate directly with model size, which poses a significant challenge. If DRAM capacity and data-transfer bandwidth cannot keep up with hyperscale AI models, processing that much data can be bottlenecked by current memory solutions.
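The memory bottleneck described above can be illustrated with a simple roofline-style estimate. The sketch below is illustrative only: the function names and the hardware figures are hypothetical placeholders, not MI100 or HBM-PIM specifications.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline estimate: throughput is capped either by the compute peak
    or by how fast memory can feed the compute units.

    arithmetic_intensity is the number of FLOPs performed per byte moved.
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def is_memory_bound(peak_flops, mem_bandwidth, arithmetic_intensity):
    """True when memory traffic, not compute, limits throughput."""
    return mem_bandwidth * arithmetic_intensity < peak_flops


# Hypothetical accelerator (assumed values, chosen for round numbers).
PEAK_FLOPS = 100e12     # 100 TFLOP/s
MEM_BANDWIDTH = 1e12    # 1 TB/s

workloads = [
    ("low data reuse (e.g. matrix-vector)", 0.5),
    ("high data reuse (e.g. large matrix-matrix)", 200.0),
]

for name, intensity in workloads:
    flops = attainable_flops(PEAK_FLOPS, MEM_BANDWIDTH, intensity)
    bound = "memory" if is_memory_bound(PEAK_FLOPS, MEM_BANDWIDTH, intensity) else "compute"
    print(f"{name}: {flops / 1e12:.1f} TFLOP/s attainable, {bound}-bound")
```

In this toy model, the low-reuse workload reaches only a small fraction of peak throughput because the accelerator sits idle waiting for data, which is the kind of case that moving computation into or near memory is meant to address.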

Reportedly, training a large-capacity language model proposed by Google on a cluster of 8 accelerators would use 2,100 GWh less energy per year and emit 960,000 fewer tons of carbon when the GPU accelerators are equipped with HBM-PIM.
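As a rough consistency check, the two reported figures imply a particular carbon intensity for the electricity saved. The short calculation below assumes metric tons and uses a hypothetical helper name; the article does not state which grid mix or emissions factor was used.

```python
def implied_carbon_intensity(energy_saved_gwh, co2_avoided_tons):
    """Return the kg of CO2 per kWh implied by an energy/emissions pair.

    Assumes metric tons (1 ton = 1,000 kg); 1 GWh = 1,000,000 kWh.
    """
    kg_co2 = co2_avoided_tons * 1_000
    kwh = energy_saved_gwh * 1_000_000
    return kg_co2 / kwh


# Figures as reported in the article.
intensity = implied_carbon_intensity(energy_saved_gwh=2_100, co2_avoided_tons=960_000)
print(f"{intensity:.3f} kg CO2 per kWh")  # prints "0.457 kg CO2 per kWh"
```

The result is only the ratio of the two reported numbers, so it says nothing about how either figure was derived.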

Combining commercially available GPUs and HBM-PIM with software integration can ease the bottleneck caused by memory capacity and bandwidth limits in hyperscale AI data centers.

As a result, customers will be able to use PIM memory solutions in an integrated software environment. SYCL, the Khronos Group's open standard for heterogeneous C++ programming, was developed in part by Codeplay, which Intel recently acquired.

Samsung has applied PIM technology to AI models based on dense data and PNM (processing-near-memory) technology to AI models based on sparse data.

Cheolmin Park, head of the New Business Planning Team at Samsung Electronics' Memory Business Division, said that HBM-PIM Cluster technology is the industry’s first customized memory solution for large-scale artificial intelligence.
