GPUs Intel Gaudi 3 for Artificial Intelligence

OLEKSANDR SYZOV

Intel Gaudi 3 is the latest generation of Intel's AI accelerators, specifically designed to handle the demanding workloads of generative AI and large language models (LLMs) for both training and inference. Intel aims to offer a competitive alternative to NVIDIA's dominant GPUs in the AI market, emphasizing open standards and cost-effectiveness.

Intel Gaudi 3 is available in two main form factors:

  1. HL-325L (OCP Accelerator Module - OAM Mezzanine Card): This is the high-performance, high-power version designed for dense server configurations.
  2. HL-338 (PCIe Add-In Card): This is a more standard PCIe form factor for broader server compatibility.

Here are the key technical characteristics of Intel Gaudi 3, combining specifications for both form factors where applicable:

General Architecture and Core Features:

  • Manufacturing Process: Built on TSMC's 5nm process node.
  • Compute Engines:
    • Matrix Multiplication Engines (MMEs): 8 units. These are specialized cores for efficient matrix operations, crucial for deep learning.
    • Tensor Processor Cores (TPCs): 64 units. These are programmable vector processors designed for deep learning training and inference workloads.
  • On-die SRAM: 96 MB with 12.8 TB/s bandwidth, providing fast local memory for the cores.
  • Media Engines: 14 decoders and 4 rotator engines. These let the accelerator decode and preprocess image and video data on-chip for vision and multimodal AI workloads.
  • Host Interface: PCIe Gen 5.0 x16, offering high bandwidth (128 GB/s bidirectional) for communication with the host CPU.
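The 128 GB/s host-interface figure can be checked with simple arithmetic. The sketch below assumes the standard PCIe 5.0 signalling rate of 32 GT/s per lane and 128b/130b line coding. It ignores packet and protocol overhead, so the result is a ceiling, not measured throughput. The function name is illustrative.

```python
# Ceiling estimate for PCIe Gen 5.0 x16 host bandwidth.
# Assumptions: 32 GT/s per lane, 128b/130b line coding, no TLP/protocol overhead.
GT_PER_S_PER_LANE = 32e9          # PCIe 5.0 transfers per second per lane (1 bit each)
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line code
LANES = 16


def pcie_bandwidth_gb_s(lanes: int = LANES) -> tuple:
    """Return (per-direction, bidirectional) bandwidth in GB/s, with 1 GB = 1e9 bytes."""
    bits_per_s = GT_PER_S_PER_LANE * ENCODING_EFFICIENCY * lanes
    per_direction = bits_per_s / 8 / 1e9
    return per_direction, 2 * per_direction


if __name__ == "__main__":
    one_way, both_ways = pcie_bandwidth_gb_s()
    print(f"per direction: {one_way:.1f} GB/s, bidirectional: {both_ways:.1f} GB/s")
```

The raw rate (32 GT/s × 16 lanes ÷ 8 = 64 GB/s per direction) gives the rounded 128 GB/s headline. After line coding, the figure is about 126 GB/s.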

Memory Subsystem:

  • HBM (High Bandwidth Memory): 128 GB of HBM2e memory.
  • HBM Bandwidth: 3.7 TB/s, providing extremely high data throughput for memory-intensive AI models. The HBM controller is optimized for both random and linear access patterns.
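Here is a rough single-card sizing sketch that makes the capacity and bandwidth figures concrete. It rests on three assumptions:

- It counts weights only, with no KV cache, activations, or framework overhead.
- It assumes batch size 1.
- Each generated token streams every weight from HBM once. This is the usual first-order model of memory-bound LLM decoding.

The model sizes are hypothetical examples, and the results are upper bounds, not benchmarks.

```python
# Rough single-card sizing from the Gaudi 3 memory figures quoted above.
# Assumptions: weights only (no KV cache, activations or runtime overhead),
# batch size 1, and every generated token reads all weights from HBM once.
HBM_CAPACITY_GB = 128
HBM_BANDWIDTH_GB_S = 3700  # 3.7 TB/s expressed in GB/s

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weights_gb(params_billion: float, dtype: str) -> float:
    """Weight footprint in GB: 1e9 params times bytes per param gives GB directly."""
    return params_billion * BYTES_PER_PARAM[dtype]


def fits_in_hbm(params_billion: float, dtype: str) -> bool:
    """True if the weights alone fit in one card's HBM."""
    return weights_gb(params_billion, dtype) <= HBM_CAPACITY_GB


def max_decode_tokens_per_s(params_billion: float, dtype: str) -> float:
    """Upper bound on batch-1 decode speed when generation is memory-bandwidth bound."""
    return HBM_BANDWIDTH_GB_S / weights_gb(params_billion, dtype)


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only for illustration.
    for params, dtype in [(8, "bf16"), (70, "bf16"), (70, "fp8")]:
        size = weights_gb(params, dtype)
        if fits_in_hbm(params, dtype):
            bound = max_decode_tokens_per_s(params, dtype)
            print(f"{params}B {dtype}: {size:.0f} GB, <= {bound:.0f} tokens/s")
        else:
            print(f"{params}B {dtype}: {size:.0f} GB, does not fit on one card")
```

Under these assumptions, a 70B-parameter model needs 140 GB in BF16 and does not fit on one card. In FP8 it needs 70 GB, which fits, with a decode ceiling of roughly 53 tokens/s.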

Networking and Scalability:

  • On-chip Ethernet: 24 integrated 200 Gbps RoCE (RDMA over Converged Ethernet) ports. This is a significant differentiator, promoting an open and flexible Ethernet-based fabric for scale-up (within a server) and scale-out (across multiple servers) connectivity.
  • Total Bidirectional Network Bandwidth: 1200 GB/s. This gives each accelerator high-bandwidth links to its peers within and across nodes, which is crucial for large-scale distributed AI training.
  • Open Standard: Intel emphasizes the use of industry-standard Ethernet, which aims to reduce vendor lock-in and simplify integration compared to proprietary interconnects.
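The 1200 GB/s total follows directly from the port count: 24 × 200 Gb/s = 4.8 Tb/s, or 600 GB/s in each direction. The sketch below reproduces that figure. As a hypothetical illustration, it also adds the bandwidth term of a ring all-reduce, a common gradient-synchronization collective.

The sketch makes these simplifications:

- It ignores RoCE and protocol overhead.
- It ignores per-step latency.
- The caller chooses how much bandwidth the ring actually gets. Plugging in the full 600 GB/s gives an optimistic floor, not a prediction.

```python
# Aggregate Ethernet bandwidth from the port count, plus the bandwidth term of a
# ring all-reduce. Assumptions: 1 GB = 1e9 bytes, no RoCE/protocol overhead, no
# latency term, and the caller supplies the per-direction bandwidth the ring can use.
PORTS = 24
GBIT_PER_PORT = 200


def network_bandwidth_gb_s(ports: int = PORTS, gbit_per_port: int = GBIT_PER_PORT) -> tuple:
    """Return (per-direction, bidirectional) bandwidth in GB/s."""
    per_direction = ports * gbit_per_port / 8  # Gb/s -> GB/s
    return per_direction, 2 * per_direction


def ring_allreduce_seconds(message_gb: float, n_devices: int, per_direction_gb_s: float) -> float:
    """Bandwidth-only ring all-reduce time: each device sends 2(N-1)/N of the message."""
    if n_devices < 2:
        return 0.0
    traffic_gb = 2 * (n_devices - 1) / n_devices * message_gb
    return traffic_gb / per_direction_gb_s


if __name__ == "__main__":
    one_way, both_ways = network_bandwidth_gb_s()
    print(f"per direction: {one_way:.0f} GB/s, bidirectional: {both_ways:.0f} GB/s")
    # Hypothetical: 14 GB of BF16 gradients (about 7B parameters) across 8 devices,
    # optimistically using the full per-direction bandwidth.
    t = ring_allreduce_seconds(14, 8, one_way)
    print(f"ring all-reduce floor: {t * 1e3:.1f} ms")
```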

Performance Metrics (Intel's figures, with comparisons to Gaudi 2 and NVIDIA H100):

  • AI Compute (FP8, matrix): 1835 TFLOPS.
  • AI Compute (BF16, matrix): 1835 TFLOPS.
  • AI Compute (BF16, vector): 28.7 TFLOPS.
  • Generational Improvement: Intel claims 2x AI compute (FP8), 4x AI compute (BF16), 2x network bandwidth, and 1.5x memory bandwidth compared to Gaudi 2.
  • Time-to-Train: Intel projects that Gaudi 3 delivers, on average, 1.5x faster time-to-train than NVIDIA H100 across selected models.
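One way to read the peak figures is a roofline model. Dividing peak compute by HBM bandwidth gives the arithmetic intensity (FLOPs per byte) a kernel needs before it stops being memory-bound. The sketch below uses only the vendor peak numbers quoted above. The GEMM shapes are hypothetical, and the byte count assumes each matrix is read or written exactly once (ideal caching).

```python
# Roofline-style reading of the vendor peak figures quoted above.
# Assumptions: peak (not sustained) throughput, and each GEMM operand is moved
# between HBM and the chip exactly once.
PEAK_MATRIX_TFLOPS = 1835.0  # BF16/FP8 matrix peak
PEAK_VECTOR_TFLOPS = 28.7    # BF16 vector peak
HBM_TB_S = 3.7


def ridge_point_flops_per_byte(peak_tflops: float, bandwidth_tb_s: float = HBM_TB_S) -> float:
    """Arithmetic intensity where the memory roof meets the compute roof (TFLOP/s / TB/s)."""
    return peak_tflops / bandwidth_tb_s


def attainable_tflops(intensity: float, peak_tflops: float = PEAK_MATRIX_TFLOPS,
                      bandwidth_tb_s: float = HBM_TB_S) -> float:
    """Classic roofline: min(compute peak, bandwidth x arithmetic intensity)."""
    return min(peak_tflops, bandwidth_tb_s * intensity)


def gemm_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for C = A @ B with A (m x k), B (k x n), C (m x n)."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


if __name__ == "__main__":
    print(f"ridge point: {ridge_point_flops_per_byte(PEAK_MATRIX_TFLOPS):.0f} FLOP/byte")
    print(f"matrix/vector peak ratio: {PEAK_MATRIX_TFLOPS / PEAK_VECTOR_TFLOPS:.0f}x")
    # Hypothetical shapes: a large square GEMM vs. a batch-1 matrix-vector product.
    for shape in [(8192, 8192, 8192), (1, 8192, 8192)]:
        ai = gemm_intensity(*shape)
        print(f"{shape}: {ai:.1f} FLOP/byte -> {attainable_tflops(ai):.1f} TFLOPS attainable")
```

With these numbers, the ridge point is roughly 496 FLOP/byte, and matrix peak is about 64 times vector peak. That gap suggests work which cannot be mapped onto the MMEs has a much lower performance ceiling.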

Power and Form Factor Specifics:

  • HL-325L (OAM):
    • TDP: 900W.
    • Form Factor: OCP Accelerator Module (OAM) v2.0 compliant. These modules are designed to be mounted on specialized baseboards, typically eight per server node (e.g., in a 7.6 kW integrated subsystem).
  • HL-338 (PCIe Card):
    • TDP: 600W.
    • Form Factor: Full-height, double-wide, 10.5-inch PCIe card. This allows installation in a wider range of servers that support double-wide PCIe cards.
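The TDP figures translate directly into a power budget. The sketch assumes every accelerator runs at its full TDP. Host CPUs, memory, fans, and power-supply losses are counted only if you pass them in as `other_load_w`. The 10 kW budget in the example is hypothetical.

```python
# Power-budget arithmetic from the TDPs quoted above.
# Assumptions: every accelerator draws its full TDP; any other load (CPUs, memory,
# fans, PSU losses) is ignored unless passed in explicitly.
OAM_TDP_W = 900   # HL-325L
PCIE_TDP_W = 600  # HL-338


def accelerator_power_w(count: int, tdp_w: int) -> int:
    """Total accelerator draw at full TDP."""
    return count * tdp_w


def max_accelerators(budget_w: int, tdp_w: int, other_load_w: int = 0) -> int:
    """How many cards fit in a power budget after subtracting non-accelerator load."""
    available = budget_w - other_load_w
    return max(0, int(available // tdp_w))


if __name__ == "__main__":
    print(f"8 x HL-325L: {accelerator_power_w(8, OAM_TDP_W)} W")
    # Hypothetical 10 kW budget with 2 kW reserved for the rest of the system.
    print("OAM modules:", max_accelerators(10_000, OAM_TDP_W, other_load_w=2_000))
    print("PCIe cards:", max_accelerators(10_000, PCIE_TDP_W, other_load_w=2_000))
```

Eight HL-325L modules alone draw 7.2 kW, just under the 7.6 kW subsystem figure quoted above. The remainder presumably covers the baseboard and other components.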

Intended Applications:

Intel Gaudi 3 is designed for the most demanding AI workloads, including:

  • Large Language Model (LLM) Training and Inference: Its high memory capacity, bandwidth, and compute power make it well suited to the massive computational and memory requirements of LLMs.
  • Generative AI: Powering multi-modal generative AI applications, including text-to-image, text-to-video, and other content creation tasks.
  • High-Performance Computing (HPC): Accelerating complex scientific simulations and data analytics that leverage parallel processing.
  • Enterprise AI: Providing a scalable and efficient solution for various enterprise AI use cases.

Intel's strategy with Gaudi 3 is to provide a compelling alternative in the AI hardware market, focusing on performance, open software development, and cost-effectiveness through its use of Ethernet for scaling.
