Compare XPUs

Select up to 5 XPUs to compare side-by-side

Select XPUs to Compare

Showing 128 XPUs • 7 selected

Alibaba

Hanguang 800

AMD

MI100

23.1 TFLOPs

AMD

MI210

181 TFLOPs

AMD

MI250X

383 TFLOPs

AMD

MI300X

1,307 TFLOPs

AMD

MI325X

1,400 TFLOPs

AMD

MI350X

2,100 TFLOPs

AMD

MI355X

5,300 TFLOPs

AMD

Radeon Pro V520

23.04 TFLOPs

AWS

Inferentia2

190 TFLOPs

AWS

Trainium

190 TFLOPs

AWS

Trainium2

680 TFLOPs

Baidu

Kunlun II

Biren Technology

BR100

Cambricon

MLU370

256 TFLOPs

Cerebras

WSE-3

Enflame Technology

CloudBlazer T20

FuriosaAI

RNGD (Renegade)

256 TFLOPs

FuriosaAI

Warboy

Google

TPU v4

275 TFLOPs

Google

TPU v5e

197 TFLOPs

Google

TPU v5p

459 TFLOPs

Graphcore

Bow IPU

Graphcore

IPU-M2000

Groq

LPU Inference Engine

Huawei

Ascend 910B

Iluvatar CoreX

BI-V150

300 TFLOPs

Intel

Data Center GPU Max 1100

177 TFLOPs

Intel

Data Center GPU Max 1550

419 TFLOPs

Intel Habana

Gaudi 2

432 TFLOPs

Intel Habana

Gaudi 3

1,835 TFLOPs

Multi-Metric Comparison

Relative performance across 5 key metrics (normalized to 100 = best in comparison)

Chart axes: Compute Performance (BF16), Memory Capacity, Power Consumption, Power Efficiency
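The "normalized to 100 = best in comparison" scaling can be sketched as below. This is a minimal illustration, not the page's actual code: the function name `normalize_to_best` and the `lower_is_better` switch are hypothetical, and the page does not say how it treats metrics such as power consumption, where a lower value may count as best. The VRAM figures are taken from the Specifications table.

```python
def normalize_to_best(values, lower_is_better=False):
    """Scale a metric so that the best entry in the comparison scores 100.

    `lower_is_better` is a hypothetical switch for metrics such as power
    draw; the source page does not document how it handles them.
    Entries with a missing value (None) are skipped.
    """
    present = {name: v for name, v in values.items() if v is not None}
    if not present:
        return {}
    if lower_is_better:
        best = min(present.values())
        return {name: round(100 * best / v, 1) for name, v in present.items()}
    best = max(present.values())
    return {name: round(100 * v / best, 1) for name, v in present.items()}


# VRAM in GB, as listed in the Specifications table.
vram_gb = {
    "LPU Inference Engine": 230,
    "MI300X": 192,
    "GeForce RTX 5060 Ti": 16,
    "B100": 192,
    "P40": 24,
    "GeForce GT 710": 2,
    "A100 SXM": 80,
}

vram_scores = normalize_to_best(vram_gb)
for name, score in vram_scores.items():
    print(f"{name}: {score}")
```

Under this scheme the entry with the largest listed VRAM scores 100 and every other entry is a percentage of it.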

Specifications

Specification	Groq LPU Inference Engine	AMD MI300X	NVIDIA GeForce RTX 5060 Ti	NVIDIA B100	NVIDIA P40	NVIDIA GeForce GT 710	NVIDIA A100 SXM
Architecture	TSP (Tensor Streaming Processor)	CDNA 3	Blackwell	Blackwell	Pascal	Kepler	Ampere
Form Factor	—	OAM	PCIe	SXM	PCIe	PCIe	SXM
VRAM	230 GB	192 GB	16 GB	192 GB	24 GB	2 GB	80 GB
Memory Bandwidth	—	5,300 GB/s	544 GB/s	8,000 GB/s	346 GB/s	14.4 GB/s	2,039 GB/s
TFLOPs (FP32)	—	163.4	25	45	12	0.366	19.5
TFLOPs (FP16)	—	1,307	—	—	—	—	312
TFLOPs (headline)	—	1,307	88	1,800	12	0.366	312
TFLOPs (FP8)	—	2,614	—	—	—	—	—
TDP	300 W	750 W	220 W	700 W	250 W	19 W	400 W
Launch Date	Feb 2024	Dec 2023	Mar 2025	Mar 2024	Sep 2016	Mar 2014	May 2020

Efficiency Metrics

Metric	LPU Inference Engine	MI300X	GeForce RTX 5060 Ti	B100	P40	GeForce GT 710	A100 SXM
TFLOPs per Watt (FP32-eq)	—	0.87	0.20	1.29	0.05	0.02	0.39
Memory Bandwidth per GB	—	27.6 GB/s	34.0 GB/s	41.7 GB/s	14.4 GB/s	7.2 GB/s	25.5 GB/s
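The two efficiency rows can be reproduced from the Specifications table with a short sketch. One assumption is needed: the page does not define "FP32-eq", but the published values are consistent with halving the headline TFLOPs figure when that figure is a 16-bit number and using it unchanged when it is already FP32. The `Xpu` class, its field names, and the `headline_is_16bit` flag are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Xpu:
    name: str
    headline_tflops: float   # the unlabelled "TFLOPs" row in the table
    headline_is_16bit: bool  # assumption: inferred by matching the FP16/FP32 rows
    tdp_w: float
    vram_gb: float
    bandwidth_gb_s: float


def fp32_equivalent_tflops(x: Xpu) -> float:
    # Assumed rule (not documented on the page): a 16-bit headline figure
    # counts as half its value in FP32-equivalent terms.
    return x.headline_tflops / 2 if x.headline_is_16bit else x.headline_tflops


def tflops_per_watt(x: Xpu) -> float:
    return round(fp32_equivalent_tflops(x) / x.tdp_w, 2)


def bandwidth_per_gb(x: Xpu) -> float:
    return round(x.bandwidth_gb_s / x.vram_gb, 1)


mi300x = Xpu("MI300X", 1307, True, 750, 192, 5300)
a100 = Xpu("A100 SXM", 312, True, 400, 80, 2039)
p40 = Xpu("P40", 12, False, 250, 24, 346)

for x in (mi300x, a100, p40):
    print(x.name, tflops_per_watt(x), "TFLOPs/W,", bandwidth_per_gb(x), "GB/s per GB")
```

For these three parts the sketch gives the same figures as the table: 0.87, 0.39 and 0.05 TFLOPs per watt, and 27.6, 25.5 and 14.4 GB/s per GB.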

Performance Equivalence

How many units of each XPU are needed to match one unit of another, metric by metric?
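Each figure in this section appears to be a simple ratio: the target's value for a metric divided by the candidate's value. The sketch below reproduces a few entries under that reading. The function names and the `verb`/`tail` parameters are hypothetical, and the wording for an exact 1:1 match is a choice made here rather than the page's own.

```python
def units_to_match(target_value: float, candidate_value: float) -> float:
    """Return how many candidate units equal one target unit on a metric."""
    if candidate_value <= 0:
        raise ValueError("candidate_value must be positive")
    return target_value / candidate_value


def describe_match(candidate: str, target_value: float, candidate_value: float,
                   verb: str = "has", tail: str = "more") -> str:
    """Phrase the ratio the way the comparison cards do."""
    ratio = units_to_match(target_value, candidate_value)
    if ratio > 1:
        return f"Need {ratio:.2f}x {candidate}"
    if ratio == 1:
        # Wording for an exact tie is an assumption of this sketch.
        return f"{candidate} matches 1:1"
    return f"{candidate} {verb} {1 / ratio:.2f}x {tail}"


# VRAM (GB): target Groq LPU Inference Engine (230) vs. candidate MI300X (192).
print(describe_match("MI300X", 230, 192))
# VRAM (GB): target MI300X (192) vs. candidate LPU Inference Engine (230).
print(describe_match("LPU Inference Engine", 192, 230))
# FP32 TFLOPs: target GeForce RTX 5060 Ti (25) vs. candidate B100 (45).
print(describe_match("B100", 25, 45, verb="is", tail="faster"))
```

A ratio above 1 means several candidate units are needed; a ratio below 1 means a single candidate already exceeds the target, so the inverse is reported instead.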

To match 1x Groq LPU Inference Engine

AMD MI300X

VRAM

1.20x

Need 1.20x MI300X

NVIDIA GeForce RTX 5060 Ti

VRAM

14.38x

Need 14.38x GeForce RTX 5060 Ti

NVIDIA B100

VRAM

1.20x

Need 1.20x B100

NVIDIA P40

VRAM

9.58x

Need 9.58x P40

NVIDIA GeForce GT 710

VRAM

115.00x

Need 115.00x GeForce GT 710

NVIDIA A100 SXM

VRAM

2.88x

Need 2.88x A100 SXM

To match 1x AMD MI300X

Groq LPU Inference Engine

VRAM

0.83x

LPU Inference Engine has 1.20x more

NVIDIA GeForce RTX 5060 Ti

Compute (FP32-eq)

14.85x

Need 14.85x GeForce RTX 5060 Ti

FP32 Compute

6.54x

Need 6.54x GeForce RTX 5060 Ti

VRAM

12.00x

Need 12.00x GeForce RTX 5060 Ti

Memory Bandwidth

9.74x

Need 9.74x GeForce RTX 5060 Ti

NVIDIA B100

Compute (FP32-eq)

0.73x

B100 is 1.38x faster

FP32 Compute

3.63x

Need 3.63x B100

VRAM

1.00x

B100 has the same VRAM

Memory Bandwidth

0.66x

B100 has 1.51x more

NVIDIA P40

Compute (FP32-eq)

54.46x

Need 54.46x P40

FP32 Compute

13.62x

Need 13.62x P40

VRAM

8.00x

Need 8.00x P40

Memory Bandwidth

15.32x

Need 15.32x P40

NVIDIA GeForce GT 710

Compute (FP32-eq)

1785.52x

Need 1785.52x GeForce GT 710

FP32 Compute

446.45x

Need 446.45x GeForce GT 710

VRAM

96.00x

Need 96.00x GeForce GT 710

Memory Bandwidth

368.06x

Need 368.06x GeForce GT 710

NVIDIA A100 SXM

Compute (FP32-eq)

4.19x

Need 4.19x A100 SXM

FP32 Compute

8.38x

Need 8.38x A100 SXM

VRAM

2.40x

Need 2.40x A100 SXM

Memory Bandwidth

2.60x

Need 2.60x A100 SXM

To match 1x NVIDIA GeForce RTX 5060 Ti

Groq LPU Inference Engine

VRAM

0.07x

LPU Inference Engine has 14.38x more

AMD MI300X

Compute (FP32-eq)

0.07x

MI300X is 14.85x faster

FP32 Compute

0.15x

MI300X is 6.54x faster

VRAM

0.08x

MI300X has 12.00x more

Memory Bandwidth

0.10x

MI300X has 9.74x more

NVIDIA B100

Compute (FP32-eq)

0.05x

B100 is 20.45x faster

FP32 Compute

0.56x

B100 is 1.80x faster

VRAM

0.08x

B100 has 12.00x more

Memory Bandwidth

0.07x

B100 has 14.71x more

NVIDIA P40

Compute (FP32-eq)

3.67x

Need 3.67x P40

FP32 Compute

2.08x

Need 2.08x P40

VRAM

0.67x

P40 has 1.50x more

Memory Bandwidth

1.57x

Need 1.57x P40

NVIDIA GeForce GT 710

Compute (FP32-eq)

120.22x

Need 120.22x GeForce GT 710

FP32 Compute

68.31x

Need 68.31x GeForce GT 710

VRAM

8.00x

Need 8.00x GeForce GT 710

Memory Bandwidth

37.78x

Need 37.78x GeForce GT 710

NVIDIA A100 SXM

Compute (FP32-eq)

0.28x

A100 SXM is 3.55x faster

FP32 Compute

1.28x

Need 1.28x A100 SXM

VRAM

0.20x

A100 SXM has 5.00x more

Memory Bandwidth

0.27x

A100 SXM has 3.75x more

To match 1x NVIDIA B100

Groq LPU Inference Engine

VRAM

0.83x

LPU Inference Engine has 1.20x more

AMD MI300X

Compute (FP32-eq)

1.38x

Need 1.38x MI300X

FP32 Compute

0.28x

MI300X is 3.63x faster

VRAM

1.00x

MI300X has the same VRAM

Memory Bandwidth

1.51x

Need 1.51x MI300X

NVIDIA GeForce RTX 5060 Ti

Compute (FP32-eq)

20.45x

Need 20.45x GeForce RTX 5060 Ti

FP32 Compute

1.80x

Need 1.80x GeForce RTX 5060 Ti

VRAM

12.00x

Need 12.00x GeForce RTX 5060 Ti

Memory Bandwidth

14.71x

Need 14.71x GeForce RTX 5060 Ti

NVIDIA P40

Compute (FP32-eq)

75.00x

Need 75.00x P40

FP32 Compute

3.75x

Need 3.75x P40

VRAM

8.00x

Need 8.00x P40

Memory Bandwidth

23.12x

Need 23.12x P40

NVIDIA GeForce GT 710

Compute (FP32-eq)

2459.02x

Need 2459.02x GeForce GT 710

FP32 Compute

122.95x

Need 122.95x GeForce GT 710

VRAM

96.00x

Need 96.00x GeForce GT 710

Memory Bandwidth

555.56x

Need 555.56x GeForce GT 710

NVIDIA A100 SXM

Compute (FP32-eq)

5.77x

Need 5.77x A100 SXM

FP32 Compute

2.31x

Need 2.31x A100 SXM

VRAM

2.40x

Need 2.40x A100 SXM

Memory Bandwidth

3.92x

Need 3.92x A100 SXM

To match 1x NVIDIA P40

Groq LPU Inference Engine

VRAM

0.10x

LPU Inference Engine has 9.58x more

AMD MI300X

Compute (FP32-eq)

0.02x

MI300X is 54.46x faster

FP32 Compute

0.07x

MI300X is 13.62x faster

VRAM

0.13x

MI300X has 8.00x more

Memory Bandwidth

0.07x

MI300X has 15.32x more

NVIDIA GeForce RTX 5060 Ti

Compute (FP32-eq)

0.27x

GeForce RTX 5060 Ti is 3.67x faster

FP32 Compute

0.48x

GeForce RTX 5060 Ti is 2.08x faster

VRAM

1.50x

Need 1.50x GeForce RTX 5060 Ti

Memory Bandwidth

0.64x

GeForce RTX 5060 Ti has 1.57x more

NVIDIA B100

Compute (FP32-eq)

0.01x

B100 is 75.00x faster

FP32 Compute

0.27x

B100 is 3.75x faster

VRAM

0.13x

B100 has 8.00x more

Memory Bandwidth

0.04x

B100 has 23.12x more

NVIDIA GeForce GT 710

Compute (FP32-eq)

32.79x

Need 32.79x GeForce GT 710

FP32 Compute

32.79x

Need 32.79x GeForce GT 710

VRAM

12.00x

Need 12.00x GeForce GT 710

Memory Bandwidth

24.03x

Need 24.03x GeForce GT 710

NVIDIA A100 SXM

Compute (FP32-eq)

0.08x

A100 SXM is 13.00x faster

FP32 Compute

0.62x

A100 SXM is 1.63x faster

VRAM

0.30x

A100 SXM has 3.33x more

Memory Bandwidth

0.17x

A100 SXM has 5.89x more

To match 1x NVIDIA GeForce GT 710

Groq LPU Inference Engine

VRAM

0.01x

LPU Inference Engine has 115.00x more

AMD MI300X

Compute (FP32-eq)

<0.01x

MI300X is 1785.52x faster

FP32 Compute

<0.01x

MI300X is 446.45x faster

VRAM

0.01x

MI300X has 96.00x more

Memory Bandwidth

<0.01x

MI300X has 368.06x more

NVIDIA GeForce RTX 5060 Ti

Compute (FP32-eq)

0.01x

GeForce RTX 5060 Ti is 120.22x faster

FP32 Compute

0.01x

GeForce RTX 5060 Ti is 68.31x faster

VRAM

0.13x

GeForce RTX 5060 Ti has 8.00x more

Memory Bandwidth

0.03x

GeForce RTX 5060 Ti has 37.78x more

NVIDIA B100

Compute (FP32-eq)

<0.01x

B100 is 2459.02x faster

FP32 Compute

0.01x

B100 is 122.95x faster

VRAM

0.01x

B100 has 96.00x more

Memory Bandwidth

<0.01x

B100 has 555.56x more

NVIDIA P40

Compute (FP32-eq)

0.03x

P40 is 32.79x faster

FP32 Compute

0.03x

P40 is 32.79x faster

VRAM

0.08x

P40 has 12.00x more

Memory Bandwidth

0.04x

P40 has 24.03x more

NVIDIA A100 SXM

Compute (FP32-eq)

<0.01x

A100 SXM is 426.23x faster

FP32 Compute

0.02x

A100 SXM is 53.28x faster

VRAM

0.03x

A100 SXM has 40.00x more

Memory Bandwidth

0.01x

A100 SXM has 141.60x more

To match 1x NVIDIA A100 SXM

Groq LPU Inference Engine

VRAM

0.35x

LPU Inference Engine has 2.88x more

AMD MI300X

Compute (FP32-eq)

0.24x

MI300X is 4.19x faster

FP32 Compute

0.12x

MI300X is 8.38x faster

VRAM

0.42x

MI300X has 2.40x more

Memory Bandwidth

0.38x

MI300X has 2.60x more

NVIDIA GeForce RTX 5060 Ti

Compute (FP32-eq)

3.55x

Need 3.55x GeForce RTX 5060 Ti

FP32 Compute

0.78x

GeForce RTX 5060 Ti is 1.28x faster

VRAM

5.00x

Need 5.00x GeForce RTX 5060 Ti

Memory Bandwidth

3.75x

Need 3.75x GeForce RTX 5060 Ti

NVIDIA B100

Compute (FP32-eq)

0.17x

B100 is 5.77x faster

FP32 Compute

0.43x

B100 is 2.31x faster

VRAM

0.42x

B100 has 2.40x more

Memory Bandwidth

0.25x

B100 has 3.92x more

NVIDIA P40

Compute (FP32-eq)

13.00x

Need 13.00x P40

FP32 Compute

1.63x

Need 1.63x P40

VRAM

3.33x

Need 3.33x P40

Memory Bandwidth

5.89x

Need 5.89x P40

NVIDIA GeForce GT 710

Compute (FP32-eq)

426.23x

Need 426.23x GeForce GT 710

FP32 Compute

53.28x

Need 53.28x GeForce GT 710

VRAM

40.00x

Need 40.00x GeForce GT 710

Memory Bandwidth

141.60x

Need 141.60x GeForce GT 710

Pricing

Price Type	LPU Inference Engine	MI300X	GeForce RTX 5060 Ti	B100	P40	GeForce GT 710	A100 SXM
CAPEX (Street Price)	—	$35,000	—	—	—	—	$15,000
OPEX (per hour)	—	$10.40/hr	$0.09/hr	—	$2.07/hr	$0.07/hr	$4.05/hr
Price per TFLOPs (FP32-eq)	—	$54	—	—	—	—	$96
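The price-per-TFLOPs row can be checked with a short calculation. As a labelled assumption, "FP32-eq" is taken here to be half of the 16-bit headline TFLOPs figure; the page does not define it, but that reading reproduces both published values. The function name `price_per_tflop` is hypothetical.

```python
def price_per_tflop(street_price_usd: float, fp32_eq_tflops: float) -> int:
    """Street price divided by FP32-equivalent TFLOPs, rounded to whole dollars."""
    if fp32_eq_tflops <= 0:
        raise ValueError("fp32_eq_tflops must be positive")
    return round(street_price_usd / fp32_eq_tflops)


# Assumption: FP32-eq = 16-bit headline TFLOPs / 2 (inferred, not documented).
mi300x_usd_per_tflop = price_per_tflop(35_000, 1307 / 2)
a100_usd_per_tflop = price_per_tflop(15_000, 312 / 2)

print(f"MI300X: ${mi300x_usd_per_tflop}")
print(f"A100 SXM: ${a100_usd_per_tflop}")
```

Only the MI300X and A100 SXM have a listed street price, so the other columns cannot be computed from this table.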