
List of Nvidia GPUs for Artificial Intelligence
OLEKSANDR SYZOV
NVIDIA offers a diverse range of graphics accelerators (GPUs) specifically designed and optimized for Artificial Intelligence (AI) and deep learning workloads. These GPUs leverage specialized cores like Tensor Cores to dramatically speed up computations critical for AI training, inference, and data processing.
Here's a list of notable NVIDIA GPUs for AI, along with their key technical characteristics:
NVIDIA Data Center GPUs (Designed for high-performance AI workloads)
1. NVIDIA H200 (Hopper Architecture)
- Architecture: Hopper
- Form Factor: Available in SXM (for HGX systems) and PCIe (as the H200 NVL)
- GPU Memory: 141 GB HBM3e (High Bandwidth Memory 3e)
- Memory Bandwidth: 4.8 TB/s
- Interconnect:
- NVLink: 900 GB/s (bidirectional)
- PCIe Gen5: 128 GB/s
- Tensor Core Performance (with sparsity):
- FP8: up to 3958 TFLOPS (SXM), 3341 TFLOPS (PCIe)
- FP16/BF16: up to 1979 TFLOPS (SXM), 1671 TFLOPS (PCIe)
- TF32: up to 989 TFLOPS (SXM), 835 TFLOPS (PCIe)
- FP32 Performance: 67 TFLOPS (SXM), 60 TFLOPS (PCIe)
- FP64 Performance: 34 TFLOPS (SXM), 30 TFLOPS (PCIe)
- TDP (Thermal Design Power): Up to 700W (SXM), Up to 600W (PCIe)
- Multi-Instance GPU (MIG): Yes, up to 7 instances
- Key for AI: The H200 is designed for the most demanding AI workloads, especially large language model (LLM) training and inference, offering significant memory capacity and bandwidth improvements over the H100.
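To make that memory headroom concrete, here is a back-of-the-envelope sketch in plain Python. The 70B-parameter model is a hypothetical example, and the estimate covers weights only (activations, KV cache, and optimizer state come on top):
```python
# VRAM needed just to hold model weights at a given precision.
# 70B parameters is a hypothetical example model size.
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1e9

print(weights_gb(70, 2))  # 140.0 GB: FP16/BF16 weights just fit in the H200's 141 GB
print(weights_gb(70, 1))  # 70.0 GB: FP8 halves that, leaving room for KV cache
```
The same arithmetic shows why an 80 GB card must split such a model across multiple GPUs.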
2. NVIDIA H100 (Hopper Architecture)
- Architecture: Hopper
- Form Factor: Available in SXM (for HGX systems) and PCIe
- GPU Memory: 80 GB HBM3 (or HBM2e for some variants)
- Memory Bandwidth: Up to 3.35 TB/s (HBM3)
- Interconnect:
- NVLink: 900 GB/s (bidirectional)
- PCIe Gen5: 128 GB/s
- Tensor Core Performance (with sparsity):
- FP8: Up to 3958 TFLOPS
- FP16/BF16: Up to 1979 TFLOPS
- TF32: Up to 989 TFLOPS
- FP32 Performance: Up to 67 TFLOPS
- FP64 Performance: Up to 34 TFLOPS
- TDP: Up to 700W (SXM), 350W (PCIe)
- Multi-Instance GPU (MIG): Yes
- Key for AI: The H100 is a top-tier GPU for large-scale AI training, especially for generative AI and LLMs, offering groundbreaking performance through its Hopper architecture and Tensor Cores.
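For a sense of how these Tensor Core figures are reached in practice, here is a minimal mixed-precision sketch, assuming PyTorch and a CUDA-capable GPU; the model and data are toy stand-ins:
```python
# Minimal mixed-precision training step: autocast runs the matmuls in
# BF16 on Tensor Cores while the master weights stay in FP32.
import torch

model = torch.nn.Linear(4096, 4096).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
x = torch.randn(64, 4096, device="cuda")
target = torch.randn(64, 4096, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = torch.nn.functional.mse_loss(model(x), target)

loss.backward()   # unlike FP16, BF16 generally needs no loss scaling
optimizer.step()
```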
3. NVIDIA L40S (Ada Lovelace Architecture)
- Architecture: Ada Lovelace
- Form Factor: Dual-slot FHFL (Full-Height, Full-Length) PCIe
- GPU Memory: 48 GB GDDR6 with ECC
- Memory Bandwidth: 864 GB/s
- Interconnect: PCIe Gen4 x16 (64 GB/s)
- Tensor Core Performance (with sparsity):
- FP8: 1466 TFLOPS
- FP16/BF16: 733 TFLOPS
- TF32: 366 TFLOPS
- FP32 Performance: 91.6 TFLOPS
- TDP: Up to 350W
- Key for AI: Designed as a universal GPU for generative AI, large language model inference and training, and 3D rendering. It combines powerful AI capabilities with excellent graphics features.
4. NVIDIA A100 (Ampere Architecture)
- Architecture: Ampere
- Form Factor: Available in SXM (for HGX systems) and PCIe
- GPU Memory: 40 GB HBM2 or 80 GB HBM2e
- Memory Bandwidth: Up to 1.55 TB/s (40 GB) / 2.03 TB/s (80 GB)
- Interconnect:
- NVLink: 600 GB/s (bidirectional)
- PCIe Gen4 x16 (64 GB/s)
- Tensor Core Performance (with sparsity):
- FP16/BF16: Up to 624 TFLOPS
- TF32: Up to 312 TFLOPS
- INT8: Up to 1248 TOPS
- FP32 Performance: 19.5 TFLOPS
- FP64 Performance: 9.7 TFLOPS (19.5 TFLOPS with Tensor Core)
- TDP: 250W (40 GB PCIe), 300W (80 GB PCIe), up to 400W (SXM)
- Multi-Instance GPU (MIG): Yes, up to 7 instances
- Key for AI: A workhorse for a wide range of AI/ML workloads, including training complex deep learning models, high-performance computing (HPC), and data analytics. MIG functionality allows for efficient multi-tenant environments.
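MIG partitioning is driven by the nvidia-smi CLI; the sketch below wraps it in Python to keep one language across the examples, and assumes root privileges, GPU 0, and the 1g.5gb profile of a 40 GB A100:
```python
# Illustrative MIG setup on an A100 via nvidia-smi (standard MIG workflow).
import subprocess

def run(cmd: str) -> None:
    result = subprocess.run(cmd.split(), capture_output=True, text=True)
    print(result.stdout or result.stderr)

run("nvidia-smi -i 0 -mig 1")         # enable MIG mode on GPU 0 (may require a GPU reset)
run("nvidia-smi mig -lgip")           # list the GPU instance profiles on offer
run("nvidia-smi mig -cgi 1g.5gb -C")  # create one 1g.5gb instance plus its compute instance
```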
5. NVIDIA A40 (Ampere Architecture)
- Architecture: Ampere
- Form Factor: Dual-slot FHFL (Full-Height, Full-Length) PCIe
- GPU Memory: 48 GB GDDR6 with ECC
- Memory Bandwidth: 696 GB/s
- Interconnect:
- NVLink: 112.5 GB/s (bidirectional, when linked)
- PCIe Gen4 x16 (64 GB/s)
- Tensor Core Performance (with sparsity):
- FP16/BF16: Up to 299.4 TFLOPS
- TF32: Up to 149.6 TFLOPS
- INT8: Up to 598.6 TOPS (INT4: up to 1197.4 TOPS)
- FP32 Performance: 37.4 TFLOPS
- TDP: 300W
- Key for AI: Excellent for visual computing combined with AI, such as virtual workstations, 3D rendering, simulation, and enterprise AI inference, especially where high memory capacity is beneficial.
6. NVIDIA L4 (Ada Lovelace Architecture)
- Architecture: Ada Lovelace
- Form Factor: Single-slot, low-profile PCIe
- GPU Memory: 24 GB GDDR6
- Memory Bandwidth: 300 GB/s
- Interconnect: PCIe Gen4 x16 (64 GB/s)
- Tensor Core Performance (with sparsity):
- FP8: 485 TFLOPS
- FP16/BF16: 242 TFLOPS
- TF32: 120 TFLOPS
- FP32 Performance: 30.3 TFLOPS
- TDP: 72W
- Key for AI: A highly energy-efficient GPU ideal for AI inference and smaller-scale AI training at the edge or in data centers where power and space are constrained. Also supports video processing and generative AI tasks.
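A quick performance-per-watt comparison, computed from the sparsity figures quoted in this list, shows where the L4 fits:
```python
# Peak FP16 Tensor TFLOPS per watt, from the datasheet numbers above.
cards = {
    "L4":   (242, 72),    # FP16 TFLOPS (with sparsity), TDP in watts
    "L40S": (733, 350),
    "H100": (1979, 700),
}
for name, (tflops, watts) in cards.items():
    print(f"{name}: {tflops / watts:.1f} TFLOPS/W")
# L4 ~3.4, H100 ~2.8, L40S ~2.1 -> the L4 leads on efficiency, not raw speed
```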
7. NVIDIA A2 (Ampere Architecture)
- Architecture: Ampere
- Form Factor: Single-slot, low-profile PCIe
- GPU Memory: 16 GB GDDR6
- Memory Bandwidth: 200 GB/s
- Interconnect: PCIe Gen4 x8
- Tensor Core Performance (with sparsity):
- FP16/BF16: Up to 36 TFLOPS
- INT8: Up to 72 TOPS
- FP32 Performance: 4.5 TFLOPS
- TDP: 40-60W (configurable)
- Key for AI: Entry-level inference GPU designed for edge deployments and smaller AI workloads where low power consumption and a compact form factor are critical.
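Before the glossary below, here is a small sanity-check sketch (PyTorch assumed) that maps the spec-sheet numbers above to what your own system reports:
```python
# Report name, VRAM, and compute capability for each visible CUDA GPU.
import torch

for i in range(torch.cuda.device_count()):
    p = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {p.name}, {p.total_memory / 1e9:.0f} GB VRAM, "
          f"compute capability {p.major}.{p.minor}")
# Compute capability 8.0 = A100, 8.6 = A40/A2, 8.9 = Ada (L4/L40S), 9.0 = Hopper
```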
Key Technical Characteristics Explained:
- Architecture (e.g., Hopper, Ampere, Ada Lovelace): The underlying design of the GPU, which dictates its core capabilities, efficiency, and features like Tensor Cores. Newer architectures generally offer significant performance gains.
- GPU Memory (VRAM): The amount of dedicated high-speed memory on the GPU. Crucial for handling large datasets and complex AI models (e.g., large language models). HBM (High Bandwidth Memory) provides significantly more bandwidth than GDDR.
- Memory Bandwidth: How quickly data can be moved to and from the GPU's memory. Higher bandwidth is essential for data-intensive AI workloads.
- Tensor Cores: Specialized processing units on NVIDIA GPUs designed to accelerate matrix multiplications, which are fundamental operations in deep learning. They support various precision formats (FP16, BF16, TF32, FP8, INT8).
- TFLOPS (TeraFLOPS) / TOPS (TeraOPS): Trillions of floating-point operations per second (TFLOPS) or integer operations per second (TOPS). Higher numbers indicate greater computational power; a crude way to measure achieved throughput is sketched after this list.
- FP32 (Single-Precision Floating Point): General-purpose computation.
- FP16 (Half-Precision Floating Point): Common for AI training to save memory and increase speed with minimal accuracy loss.
- BF16 (Bfloat16): Another 16-bit floating-point format, offering a wider dynamic range than FP16, often used in AI training.
- TF32 (Tensor Float 32): NVIDIA's format that provides FP32 range with FP16 precision, accelerating AI training on Tensor Cores.
- FP8 / INT8: Lower precision formats used primarily for highly efficient AI inference.
- Sparsity: A technique where parts of a neural network with negligible impact are removed, allowing Tensor Cores to achieve even higher performance.
- Interconnect (NVLink, PCIe):
- NVLink: NVIDIA's high-speed, point-to-point interconnect technology that allows GPUs to communicate directly with each other and with CPUs at much higher bandwidths than PCIe, crucial for multi-GPU training.
- PCIe (PCI Express): The standard interface for connecting GPUs to the server's motherboard. PCIe Gen5 offers double the bandwidth of Gen4.
- TDP (Thermal Design Power): The maximum amount of heat generated by the GPU that the cooling system needs to dissipate. Impacts power consumption and cooling requirements.
- Multi-Instance GPU (MIG): A feature that allows a single GPU to be partitioned into multiple, isolated GPU instances, each with dedicated resources. This improves GPU utilization for diverse or smaller workloads.
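As promised in the TFLOPS entry above, here is a crude throughput measurement, assuming PyTorch and a CUDA GPU; it times dense BF16 matmuls, so the sparsity peaks quoted in this article will not be reached:
```python
# Crude dense-BF16 matmul throughput measurement. One n x n matmul costs
# roughly 2 * n^3 floating-point operations.
import time
import torch

n, iters = 8192, 50
a = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)
b = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)

for _ in range(5):        # warm-up so clocks and kernels settle
    a @ b
torch.cuda.synchronize()

t0 = time.perf_counter()
for _ in range(iters):
    a @ b
torch.cuda.synchronize()
elapsed = time.perf_counter() - t0

print(f"{2 * n**3 * iters / elapsed / 1e12:.0f} TFLOPS (dense BF16)")
```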