Compare XPUs

Vendor	XPU	TFLOPs
Alibaba	Hanguang 800	—
AMD	MI100	23.1
AMD	MI210	181
AMD	MI250X	383
AMD	MI300X	1,307
AMD	MI325X	1,400
AMD	MI350X	2,100
AMD	MI355X	5,300
AMD	Radeon Pro V520	23.04
AWS	Inferentia2	190
AWS	Trainium	190
AWS	Trainium2	680
Baidu	Kunlun II	—
Biren Technology	BR100	—
Cambricon	MLU370	256
Cerebras	WSE-3	—
Enflame Technology	CloudBlazer T20	—
FuriosaAI	RNGD (Renegade)	256
FuriosaAI	Warboy	—
Google	TPU v4	275
Google	TPU v5e	197
Google	TPU v5p	459
Graphcore	Bow IPU	—
Graphcore	IPU-M2000	—
Groq	LPU Inference Engine	—
Huawei	Ascend 910B	—
Iluvatar CoreX	BI-V150	300
Intel	Data Center GPU Max 1100	177
Intel	Data Center GPU Max 1550	419
Intel Habana	Gaudi 2	432
Intel Habana	Gaudi 3	1,835

Multi-Metric Comparison

Relative performance across 5 key metrics (normalized to 100 = best in comparison)

Chart axes: Compute Performance (BF16), Memory Capacity, Power Consumption, Power Efficiency
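The "normalized to 100 = best in comparison" scaling can be sketched as follows. This is a minimal illustration, not the page's actual code: the function name `normalize_to_best` and the `higher_is_better` flag are hypothetical, and treating Power Consumption as lower-is-better is an assumption the page does not confirm. The input values are the VRAM and TDP rows from the Specifications table.

```python
def normalize_to_best(values, higher_is_better=True):
    """Scale positive values so the best entry scores 100; None stays None."""
    present = [v for v in values.values() if v is not None]
    if not present:
        return dict(values)
    best = max(present) if higher_is_better else min(present)
    scores = {}
    for name, v in values.items():
        if v is None:
            scores[name] = None  # missing spec, nothing to plot
        elif higher_is_better:
            scores[name] = round(100 * v / best, 1)
        else:
            # Assumption: for "lower is better" metrics the smallest value scores 100.
            scores[name] = round(100 * best / v, 1)
    return scores


# VRAM (GB) and TDP (W) as listed in the Specifications table.
vram_gb = {"LPU Inference Engine": 230, "RTX 5880 Ada Generation": 48,
           "GeForce RTX 5060 Ti": 16, "T4G": 16, "A100 SXM": 80, "RTX 4000": 8}
tdp_w = {"LPU Inference Engine": 300, "RTX 5880 Ada Generation": 250,
         "GeForce RTX 5060 Ti": 220, "T4G": 70, "A100 SXM": 400, "RTX 4000": 160}

memory_scores = normalize_to_best(vram_gb)
power_scores = normalize_to_best(tdp_w, higher_is_better=False)
print(memory_scores)
print(power_scores)
```

Missing specifications are passed through as `None` so that an XPU without a published figure is left off that axis instead of being scored as zero.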

Specifications

Specification	Groq LPU Inference Engine	NVIDIA RTX 5880 Ada Generation	NVIDIA GeForce RTX 5060 Ti	NVIDIA T4G	NVIDIA A100 SXM	NVIDIA RTX 4000
Architecture	TSP (Tensor Streaming Processor)	Ada Lovelace	Blackwell	Turing	Ampere	Turing
Form Factor	—	PCIe	PCIe	PCIe	SXM	PCIe
VRAM	230 GB	48 GB	16 GB	16 GB	80 GB	8 GB
Memory Bandwidth	—	960 GB/s	544 GB/s	320 GB/s	2,039 GB/s	416 GB/s
TFLOPs (FP32)	—	83.5	25	8.1	19.5	7.1
TFLOPs (FP16)	—	—	—	—	312	—
TFLOPs	—	167	88	65	312	57.6
TFLOPs (FP8)	—	—	—	—	—	—
TDP	300 W	250 W	220 W	70 W	400 W	160 W
Launch Date	Feb 2024	Mar 2024	Mar 2025	May 2020	May 2020	Nov 2018

Efficiency Metrics

Metric	LPU Inference Engine	RTX 5880 Ada Generation	GeForce RTX 5060 Ti	T4G	A100 SXM	RTX 4000
TFLOPs per Watt (FP32-eq)	—	0.33	0.20	0.46	0.39	0.18
Memory Bandwidth per GB of VRAM (GB/s per GB)	—	20.0	34.0	20.0	25.5	52.0
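The efficiency figures can be reproduced from the Specifications table. The page does not define "FP32-eq"; the published numbers are consistent with it being half of the unlabeled TFLOPs row (for example 312 / 2 = 156 for the A100 SXM), so the sketch below treats that as an assumption. The `specs` dictionary and `efficiency` function are illustrative names only.

```python
# name: (TFLOPs row, TDP in W, VRAM in GB, memory bandwidth in GB/s)
specs = {
    "RTX 5880 Ada Generation": (167, 250, 48, 960),
    "GeForce RTX 5060 Ti": (88, 220, 16, 544),
    "T4G": (65, 70, 16, 320),
    "A100 SXM": (312, 400, 80, 2039),
    "RTX 4000": (57.6, 160, 8, 416),
}


def efficiency(tflops_row, tdp_w, vram_gb, bandwidth_gbs):
    """Return (TFLOPs per watt, GB/s of bandwidth per GB of VRAM)."""
    # Assumption inferred from the numbers: FP32-eq = TFLOPs row / 2.
    fp32_eq = tflops_row / 2
    return round(fp32_eq / tdp_w, 2), round(bandwidth_gbs / vram_gb, 1)


for name, row in specs.items():
    print(name, efficiency(*row))
```

The Groq LPU Inference Engine is omitted because the table lists neither a TFLOPs figure nor a memory bandwidth for it.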

Performance Equivalence

How many units of each XPU are needed to match the performance of the others?

To match 1x Groq LPU Inference Engine

Units needed of	VRAM
NVIDIA RTX 5880 Ada Generation	4.79x
NVIDIA GeForce RTX 5060 Ti	14.38x
NVIDIA T4G	14.38x
NVIDIA A100 SXM	2.88x
NVIDIA RTX 4000	28.75x

To match 1x NVIDIA RTX 5880 Ada Generation

Units needed of	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Groq LPU Inference Engine	—	—	0.21x (has 4.79x more)	—
NVIDIA GeForce RTX 5060 Ti	1.90x	3.34x	3.00x	1.76x
NVIDIA T4G	2.57x	10.31x	3.00x	3.00x
NVIDIA A100 SXM	0.54x (1.87x faster)	4.28x	0.60x (has 1.67x more)	0.47x (has 2.12x more)
NVIDIA RTX 4000	2.90x	11.76x	6.00x	2.31x

To match 1x NVIDIA GeForce RTX 5060 Ti

Units needed of	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Groq LPU Inference Engine	—	—	0.07x (has 14.38x more)	—
NVIDIA RTX 5880 Ada Generation	0.53x (1.90x faster)	0.30x (3.34x faster)	0.33x (has 3.00x more)	0.57x (has 1.76x more)
NVIDIA T4G	1.35x	3.09x	1.00x (same capacity)	1.70x
NVIDIA A100 SXM	0.28x (3.55x faster)	1.28x	0.20x (has 5.00x more)	0.27x (has 3.75x more)
NVIDIA RTX 4000	1.53x	3.52x	2.00x	1.31x

To match 1x NVIDIA T4G

Units needed of	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Groq LPU Inference Engine	—	—	0.07x (has 14.38x more)	—
NVIDIA RTX 5880 Ada Generation	0.39x (2.57x faster)	0.10x (10.31x faster)	0.33x (has 3.00x more)	0.33x (has 3.00x more)
NVIDIA GeForce RTX 5060 Ti	0.74x (1.35x faster)	0.32x (3.09x faster)	1.00x (same capacity)	0.59x (has 1.70x more)
NVIDIA A100 SXM	0.21x (4.80x faster)	0.42x (2.41x faster)	0.20x (has 5.00x more)	0.16x (has 6.37x more)
NVIDIA RTX 4000	1.13x	1.14x	2.00x	0.77x (has 1.30x more)

To match 1x NVIDIA A100 SXM

Units needed of	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Groq LPU Inference Engine	—	—	0.35x (has 2.88x more)	—
NVIDIA RTX 5880 Ada Generation	1.87x	0.23x (4.28x faster)	1.67x	2.12x
NVIDIA GeForce RTX 5060 Ti	3.55x	0.78x (1.28x faster)	5.00x	3.75x
NVIDIA T4G	4.80x	2.41x	5.00x	6.37x
NVIDIA RTX 4000	5.42x	2.75x	10.00x	4.90x

To match 1x NVIDIA RTX 4000

Units needed of	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Groq LPU Inference Engine	—	—	0.03x (has 28.75x more)	—
NVIDIA RTX 5880 Ada Generation	0.34x (2.90x faster)	0.09x (11.76x faster)	0.17x (has 6.00x more)	0.43x (has 2.31x more)
NVIDIA GeForce RTX 5060 Ti	0.65x (1.53x faster)	0.28x (3.52x faster)	0.50x (has 2.00x more)	0.76x (has 1.31x more)
NVIDIA T4G	0.89x (1.13x faster)	0.88x (1.14x faster)	0.50x (has 2.00x more)	1.30x
NVIDIA A100 SXM	0.18x (5.42x faster)	0.36x (2.75x faster)	0.10x (has 10.00x more)	0.20x (has 4.90x more)
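The equivalence figures in this section appear to be plain ratios of the Specifications table values: the target's value divided by the compared XPU's value. A minimal sketch, with hypothetical function names `units_needed` and `describe`, and assuming that the reciprocal shown for ratios below 1 is computed from the raw values and not from the rounded ratio:

```python
def units_needed(target_value, candidate_value):
    """Units of the candidate needed to match one unit of the target."""
    if target_value is None or candidate_value is None:
        return None  # spec missing for one side, no comparison possible
    return round(target_value / candidate_value, 2)


def describe(target_value, candidate_value, candidate_name, word="faster"):
    """Render the ratio the way the comparison cards phrase it."""
    if target_value > candidate_value:
        return f"Need {target_value / candidate_value:.2f}x {candidate_name}"
    if target_value == candidate_value:
        return f"{candidate_name} matches 1:1"
    # The candidate exceeds the target, so report the reciprocal instead.
    return f"{candidate_name} is {candidate_value / target_value:.2f}x {word}"


# Target: RTX 5880 Ada Generation (TFLOPs row 167, FP32 83.5, VRAM 48 GB).
print(units_needed(167, 88))                       # vs GeForce RTX 5060 Ti
print(describe(167, 88, "GeForce RTX 5060 Ti"))
print(describe(167, 312, "A100 SXM"))
print(units_needed(230, 48))                       # Groq VRAM vs RTX 5880 VRAM
```

Because FP32-eq appears to be a constant multiple of the TFLOPs row, the "Compute (FP32-eq)" ratios come out the same whether they are computed from that row or from the halved values.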

Pricing

Price Type	LPU Inference Engine	RTX 5880 Ada Generation	GeForce RTX 5060 Ti	T4G	A100 SXM	RTX 4000
CAPEX (Street Price)	—	—	—	—	$15,000	—
OPEX (per hour)	—	—	$0.09/hr	$0.42/hr	$4.05/hr	$0.34/hr
Price per TFLOPs (FP32-eq)	—	—	—	—	$96	—
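The one price-per-TFLOPs figure in the table can be checked against the A100 SXM street price. As before, FP32-eq being half of the TFLOPs row is an inference from the published numbers, and `price_per_tflop` is an illustrative name.

```python
def price_per_tflop(capex_usd, tflops_row):
    """Street price divided by FP32-equivalent TFLOPs, rounded to whole dollars."""
    fp32_eq = tflops_row / 2  # assumption: FP32-eq = TFLOPs row / 2
    return round(capex_usd / fp32_eq)


a100_price = price_per_tflop(15_000, 312)  # 15000 / 156 is about 96.15
print(f"${a100_price}")
```

The other columns have no CAPEX entry, so no price per TFLOPs can be derived for them from this table.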