Intel claims that in classic FP32 (single-precision) and FP64 (double-precision) floating-point tests, its silicon is highly competitive with the H100 “Hopper.” The company claims 52 TFLOP/s FP32 for “Ponte Vecchio,” compared to 60 TFLOP/s for the H100, and a significantly higher 52 TFLOP/s FP64, compared to 30 TFLOP/s for the H100. This has to do with the SIMD units of the Xe-HPC architecture all being natively capable of double-precision floating-point operations, whereas NVIDIA’s architectures typically rely on dedicated FP64 units within their streaming multiprocessors.
Where Intel claims dominance over NVIDIA is with the XMX-accelerated XMX-Float, an architecture-specific workload, where it scores 419 TFLOP/s. This test doesn’t work on “Hopper,” as it lacks the specialized hardware. In XMX-accelerated half-precision tests such as bfloat16 (BF16) and FP16, however, performance is sub-par, with Intel claiming 839 TFLOP/s, compared to 2 PFLOP/s for the NVIDIA chip. With 8-bit operations such as INT8, even with XMX acceleration, “Ponte Vecchio” scores 1.678 POP/s (peta-operations per second), compared to 4 POP/s for the NVIDIA chip.
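To put the claimed figures side by side, here is a minimal Python sketch that computes the ratio of “Ponte Vecchio” to H100 throughput for each format. The numbers are the vendor-claimed peaks quoted above, converted to tera-operations per second; the names `CLAIMED_TOPS` and `ratio_vs_h100` are illustrative only, and the result says nothing about measured application performance.

```python
# Vendor-claimed peak throughput quoted in the article, in tera-operations
# per second: (Ponte Vecchio, H100). None means no H100 figure is given,
# because the article says the test does not run on "Hopper".
CLAIMED_TOPS = {
    "FP32": (52.0, 60.0),
    "FP64": (52.0, 30.0),
    "XMX-Float": (419.0, None),
    "BF16/FP16": (839.0, 2000.0),   # 2 PFLOP/s expressed in TFLOP/s
    "INT8": (1678.0, 4000.0),       # 1.678 and 4 peta-ops expressed in tera-ops
}


def ratio_vs_h100(fmt):
    """Return claimed Ponte Vecchio throughput divided by claimed H100 throughput.

    Returns None when the article gives no H100 figure for the format.
    """
    pvc, h100 = CLAIMED_TOPS[fmt]
    if h100 is None:
        return None
    return pvc / h100


if __name__ == "__main__":
    for fmt in CLAIMED_TOPS:
        ratio = ratio_vs_h100(fmt)
        if ratio is None:
            print(f"{fmt:10s} no H100 figure to compare against")
        else:
            print(f"{fmt:10s} Ponte Vecchio / H100 = {ratio:.2f}x")
```

By these claimed peaks, “Ponte Vecchio” lands at roughly 0.87x the H100 in FP32 and 1.73x in FP64, but only about 0.42x in BF16/FP16 and INT8.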
Whether Intel has “missed the bus” for this generation in the HPC accelerator market will now boil down to pricing and availability. If Intel can manage good volumes, leverage its oneAPI developer ecosystem, score design wins with major HPC projects and cloud-compute providers, and, most importantly, beat “Hopper” in price-performance and energy efficiency, then it could remain relevant in this generation and continue investing in the next.