Intel goes all-in on high-bandwidth memory with its Xeon Max CPUs, GPUs - Technology News, Firstpost

It seems that Intel’s latest plan to ward off rivals from high-performance computing workloads involves a CPU with large stacks of high-bandwidth memory and new kinds of accelerators, plus its long-awaited data center GPU that will go head-to-head against Nvidia’s most powerful chips.

The Intel Xeon Max processor is part of the company’s new branding for HPC products. Along with the Max Series GPUs, the new Xeon Max CPUs are optimized for HPC deployments. The CPU is the product previously known as “Sapphire Rapids HBM,” which combines a next-gen Xeon CPU with on-package HBM to increase memory bandwidth for HPC applications.

The Xeon Max CPUs
The Xeon Max Series will pack up to 56 performance cores based on the same Golden Cove microarchitecture as Intel’s 12th-Gen Core CPUs, which debuted last year. Like the vanilla Sapphire Rapids chips coming next year, these chips will support DDR5, PCIe 5.0 and Compute Express Link (CXL) 1.1, which will enable memory to be directly attached to the CPU over PCIe 5.0.

Xeon Max, which has a thermal design power (TDP) of 350W, comes with 20 built-in accelerators for artificial intelligence and HPC workloads. The accelerator types include Intel Advanced Vector Extensions 512 (AVX-512), Intel Deep Learning Boost (DL Boost), Intel Data Streaming Accelerator (DSA), and Intel Advanced Matrix Extensions (AMX).

With AVX-512, Intel claimed a Xeon Max-based system can provide double the deep learning training performance of a system using AMD’s high-end Epyc 7763 CPU on the MLPerf DeepCAM benchmark. With AMX, the company said the Xeon Max system can provide 3.6 times the performance. As usual, vendor performance claims should be taken with a grain of salt.
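To put the two claimed figures side by side, here is a small sketch of the arithmetic. It assumes both multipliers are measured against the same Epyc 7763 baseline on the same benchmark, which the article implies but does not state outright. The function name is made up for illustration, and the result is a derived figure, not a number Intel published.

```python
# Claimed speedups over the Epyc 7763 system, as quoted in the article.
AVX512_SPEEDUP = 2.0
AMX_SPEEDUP = 3.6


def implied_gain(new_speedup, old_speedup):
    """Ratio of two speedups that share the same baseline (an assumption)."""
    if old_speedup <= 0:
        raise ValueError("baseline speedup must be positive")
    return new_speedup / old_speedup


# Implied uplift of AMX over AVX-512 on the Xeon Max system itself.
print(implied_gain(AMX_SPEEDUP, AVX512_SPEEDUP))  # 1.8
```

If the baselines differ between the two claims, the ratio above does not hold.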

Unlike vanilla Sapphire Rapids, Xeon Max will come with 64GB of HBM2e, which will give the CPU roughly 1TB/s of memory bandwidth and more than 1GB per core.

With 64GB of HBM2e, a dual-socket server with two Xeon Max CPUs will pack 128GB in total. This is significant because you can use the HBM as system memory and, as a result, forget about putting in any DRAM modules if you’re fine with that kind of capacity.
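The capacity figures above follow from simple arithmetic, sketched here using only the numbers quoted in the article (64GB of HBM2e and up to 56 cores per CPU). The function names are illustrative.

```python
HBM_PER_SOCKET_GB = 64      # HBM2e per Xeon Max CPU, per the article
MAX_CORES_PER_SOCKET = 56   # top core count, per the article


def hbm_per_core_gb(hbm_gb=HBM_PER_SOCKET_GB, cores=MAX_CORES_PER_SOCKET):
    """HBM capacity available per core on one socket."""
    return hbm_gb / cores


def total_hbm_gb(sockets, hbm_gb=HBM_PER_SOCKET_GB):
    """Total HBM capacity across all sockets in a server."""
    return sockets * hbm_gb


print(round(hbm_per_core_gb(), 2))  # 1.14
print(total_hbm_gb(2))              # 128
```

Parts with fewer than 56 cores would have more HBM per core, since the HBM capacity stays at 64GB.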

Jeff McVeigh, head of Intel’s Super Compute Group, said this configuration, called HBM-only mode, can help data center operators save money as well as power, and no code changes are needed for software to recognize the HBM.

But for data center operators who want to use DDR memory as extra capacity or as the system memory, there are options. In HBM flat mode, the HBM and DDR act as two memory regions, but for software to recognize this, code changes are needed. In HBM caching mode, the HBM acts as a cache for the DDR; this requires no code changes.
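The trade-offs between the three modes can be summarized as a rough decision rule. The sketch below is an illustration built only from the article's description, not Intel guidance: the function name and thresholds are hypothetical, and it ignores memory needed by the operating system and other real-world factors.

```python
HBM_PER_SOCKET_GB = 64  # per-socket HBM2e capacity quoted in the article


def pick_memory_mode(working_set_gb, sockets=2, can_change_code=False):
    """Suggest a Xeon Max memory mode (hypothetical rule of thumb)."""
    if working_set_gb <= 0 or sockets <= 0:
        raise ValueError("working_set_gb and sockets must be positive")
    hbm_total_gb = sockets * HBM_PER_SOCKET_GB
    if working_set_gb <= hbm_total_gb:
        # Everything fits in HBM: no DDR modules and no code changes.
        return "hbm-only"
    if can_change_code:
        # HBM and DDR are separate regions; software must place data itself.
        return "flat"
    # HBM transparently caches DDR; existing software runs unmodified.
    return "cache"


print(pick_memory_mode(100))                        # hbm-only
print(pick_memory_mode(400, can_change_code=True))  # flat
print(pick_memory_mode(400))                        # cache
```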

McVeigh claimed that HBM helps Xeon Max deliver a major improvement in performance per watt over AMD’s HPC-focused Epyc 7773X, which comes with 768MB of L3 cache. With DDR5 memory installed, Intel said a Xeon Max-based system uses 63 per cent less power than the Epyc-based system to provide the same level of performance on the High-Performance Conjugate Gradients benchmark. With only HBM, the Xeon Max system uses 67 per cent less power, according to Intel.
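The power figures can be converted into performance per watt. If a system delivers the same performance while drawing a fraction (1 - r) of the power, its performance per watt is 1 / (1 - r) times higher. The sketch below applies that to Intel's claimed percentages. The function name is illustrative, and the resulting ratios are derived here, not quoted by Intel.

```python
def perf_per_watt_gain(power_reduction_pct):
    """Performance-per-watt ratio at equal performance.

    power_reduction_pct: how much less power the system draws, in per cent.
    """
    if not 0 <= power_reduction_pct < 100:
        raise ValueError("power reduction must be in [0, 100)")
    return 100 / (100 - power_reduction_pct)


print(round(perf_per_watt_gain(63), 2))  # 2.7  (DDR5 installed)
print(round(perf_per_watt_gain(67), 2))  # 3.03 (HBM only)
```

In other words, the claims work out to roughly 2.7 times and 3 times the performance per watt of the Epyc system on that benchmark, if Intel's figures hold.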

The Max Series GPUs
While Intel’s Data Center GPU Max Series lacks a creative brand name like Xeon, the company is hoping the accelerator formerly known as Ponte Vecchio will make the company more competitive with data center GPUs from Nvidia, which has a solid lead, and AMD, which is catching up.

The chipmaker called the Max Series GPU its “highest density processor” because of how it packs more than 100 billion transistors into a system-on-package comprising 47 chiplets, known as “tiles” in Intel lingo. These tiles are brought together on the package using Intel’s advanced packaging technologies: embedded multi-die interconnect bridge (EMIB) and Foveros.

The Max Series GPU comes with up to 128 cores based on the Intel Xe HPC microarchitecture, an HPC-focused branch of the chipmaker’s Xe GPU architecture. McVeigh said this allows the GPU’s most powerful configuration to provide 52 teraflops of peak FP64 throughput, a key measure for HPC.

The GPU also comes with up to 128 ray tracing units, which are geared for traditional simulation software as well as digital content creation and pre-visualization applications. Each GPU has 16 Xe Link ports to allow multiple GPUs to directly communicate with each other.

Like Xeon Max, the Max Series GPU comes equipped with HBM2e, except the capacity in this case goes up to 128GB. The GPU also packs a lot of cache, with a maximum of 408MB of Rambo L2 cache (Rambo stands for “random access memory, bandwidth optimized”) and up to 64MB of L1 cache.