Compare XPUs

Select up to 5 XPUs to compare side-by-side

Showing 128 XPUs • 6 selected

Vendor	XPU	TFLOPs
Alibaba	Hanguang 800	—
AMD	MI100	23.1
AMD	MI210	181
AMD	MI250X	383
AMD	MI300X	1,307
AMD	MI325X	1,400
AMD	MI350X	2,100
AMD	MI355X	5,300
AMD	Radeon Pro V520	23.04
AWS	Inferentia2	190
AWS	Trainium	190
AWS	Trainium2	680
Baidu	Kunlun II	—
Biren Technology	BR100	—
Cambricon	MLU370	256
Cerebras	WSE-3	—
Enflame Technology	CloudBlazer T20	—
FuriosaAI	RNGD (Renegade)	256
FuriosaAI	Warboy	—
Google	TPU v4	275
Google	TPU v5e	197
Google	TPU v5p	459
Graphcore	Bow IPU	—
Graphcore	IPU-M2000	—
Groq	LPU Inference Engine	—
Huawei	Ascend 910B	—
Iluvatar CoreX	BI-V150	300
Intel	Data Center GPU Max 1100	177
Intel	Data Center GPU Max 1550	419
Intel Habana	Gaudi 2	432
Intel Habana	Gaudi 3	1,835

Multi-Metric Comparison

Relative performance across 5 key metrics (normalized to 100 = best in comparison)

[Chart axes: Compute Performance (BF16), Memory Capacity, Power Consumption, Power Efficiency]
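
The "normalized to 100 = best in comparison" scaling can be sketched as below. This is a minimal illustration under stated assumptions, not the page's actual code: the `normalize` helper and its `higher_is_better` flag are hypothetical names, and treating the lowest TDP as "best" for Power Consumption is an assumption. The inputs are the VRAM and TDP rows of the Specifications table.

```python
def normalize(values, higher_is_better=True):
    """Scale each value so the best entry in the comparison scores 100."""
    # Skip entries the page leaves blank (represented here as None).
    known = {name: v for name, v in values.items() if v is not None}
    if higher_is_better:
        best = max(known.values())
        return {name: 100 * v / best for name, v in known.items()}
    # Assumption: for lower-is-better metrics the smallest value scores 100.
    best = min(known.values())
    return {name: 100 * best / v for name, v in known.items()}


# VRAM (GB) and TDP (W) as listed in the Specifications table.
vram_gb = {"Gaudi 3": 128, "MTIA v1": 128, "L40S": 48,
           "RNGD (Renegade)": 48, "A16": 64, "GeForce RTX 3090": 24}
tdp_w = {"Gaudi 3": 900, "MTIA v1": 400, "L40S": 350,
         "RNGD (Renegade)": 180, "A16": 250, "GeForce RTX 3090": 350}

memory_score = normalize(vram_gb)
power_score = normalize(tdp_w, higher_is_better=False)

for name in vram_gb:
    print(f"{name}: memory {memory_score[name]:.1f}, power {power_score[name]:.1f}")
```

Under these assumptions the two 128 GB parts score 100 on Memory Capacity, and the 180 W RNGD scores 100 on Power Consumption.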

Specifications

Specification	Intel Habana Gaudi 3	Meta MTIA v1	NVIDIA L40S	FuriosaAI RNGD (Renegade)	NVIDIA A16	NVIDIA GeForce RTX 3090
Architecture	Gaudi Gen3	Meta Training & Inference Accelerator	Ada Lovelace	Tensor Contraction Processor	Ampere	Ampere
Form Factor	OAM	—	PCIe	PCIe	PCIe	PCIe
VRAM	128 GB	128 GB	48 GB	48 GB	64 GB	24 GB
Memory Bandwidth	3,700 GB/s	—	864 GB/s	1,500 GB/s	232 GB/s	936 GB/s
TFLOPs (FP32)	—	—	91.6	—	9.5	35.6
TFLOPs (FP16)	—	—	733	—	—	—
TFLOPs	1,835	—	733	256	71.2	71
TFLOPs (FP8)	3,670	—	1,466	512	—	—
TDP	900 W	400 W	350 W	180 W	250 W	350 W
Launch Date	Apr 2024	May 2023	Oct 2023	Jan 2024	Apr 2021	Sep 2020

Efficiency Metrics

Metric	Gaudi 3	MTIA v1	L40S	RNGD (Renegade)	A16	GeForce RTX 3090
TFLOPs per Watt (FP32-eq)	1.02	—	1.05	0.71	0.14	0.10
Memory Bandwidth per GB	28.9 GB/s	—	18.0 GB/s	31.3 GB/s	3.6 GB/s	39.0 GB/s
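
The efficiency figures above can be reproduced from the Specifications table. The sketch below is an inference from the published numbers, not a documented formula: the "FP32-eq" values match the unlabeled TFLOPs row multiplied by 0.5, so `FP32_EQ_FACTOR` is an assumption, and the dictionary layout and function names are hypothetical.

```python
# Values copied from the Specifications table; None marks a blank cell.
SPECS = {
    "Gaudi 3": {"tflops": 1835, "tdp_w": 900, "vram_gb": 128, "bw_gbs": 3700},
    "MTIA v1": {"tflops": None, "tdp_w": 400, "vram_gb": 128, "bw_gbs": None},
    "L40S": {"tflops": 733, "tdp_w": 350, "vram_gb": 48, "bw_gbs": 864},
    "RNGD (Renegade)": {"tflops": 256, "tdp_w": 180, "vram_gb": 48, "bw_gbs": 1500},
    "A16": {"tflops": 71.2, "tdp_w": 250, "vram_gb": 64, "bw_gbs": 232},
    "GeForce RTX 3090": {"tflops": 71, "tdp_w": 350, "vram_gb": 24, "bw_gbs": 936},
}

# Assumption: FP32-equivalent throughput is half the headline TFLOPs figure.
FP32_EQ_FACTOR = 0.5


def tflops_per_watt(spec):
    """FP32-equivalent TFLOPs divided by TDP, or None if TFLOPs is unknown."""
    if spec["tflops"] is None:
        return None
    return spec["tflops"] * FP32_EQ_FACTOR / spec["tdp_w"]


def bandwidth_per_gb(spec):
    """Memory bandwidth (GB/s) per GB of VRAM, or None if bandwidth is unknown."""
    if spec["bw_gbs"] is None:
        return None
    return spec["bw_gbs"] / spec["vram_gb"]


for name, spec in SPECS.items():
    print(name, tflops_per_watt(spec), bandwidth_per_gb(spec))
```

For RNGD the exact bandwidth ratio is 31.25 GB/s per GB, which the table shows rounded up to 31.3.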

Performance Equivalence

How many units of each XPU are needed to match one unit of another on compute, VRAM, and memory bandwidth?

To match 1x Intel Habana Gaudi 3

XPU	Metric	Units needed	Comparison
Meta MTIA v1	VRAM	1.00x	MTIA v1 has the same VRAM
NVIDIA L40S	Compute (FP32-eq)	2.50x	Need 2.50x L40S
NVIDIA L40S	VRAM	2.67x	Need 2.67x L40S
NVIDIA L40S	Memory Bandwidth	4.28x	Need 4.28x L40S
FuriosaAI RNGD (Renegade)	Compute (FP32-eq)	7.17x	Need 7.17x RNGD (Renegade)
FuriosaAI RNGD (Renegade)	VRAM	2.67x	Need 2.67x RNGD (Renegade)
FuriosaAI RNGD (Renegade)	Memory Bandwidth	2.47x	Need 2.47x RNGD (Renegade)
NVIDIA A16	Compute (FP32-eq)	25.77x	Need 25.77x A16
NVIDIA A16	VRAM	2.00x	Need 2.00x A16
NVIDIA A16	Memory Bandwidth	15.95x	Need 15.95x A16
NVIDIA GeForce RTX 3090	Compute (FP32-eq)	25.85x	Need 25.85x GeForce RTX 3090
NVIDIA GeForce RTX 3090	VRAM	5.33x	Need 5.33x GeForce RTX 3090
NVIDIA GeForce RTX 3090	Memory Bandwidth	3.95x	Need 3.95x GeForce RTX 3090

To match 1x Meta MTIA v1

XPU	Metric	Units needed	Comparison
Intel Habana Gaudi 3	VRAM	1.00x	Gaudi 3 has the same VRAM
NVIDIA L40S	VRAM	2.67x	Need 2.67x L40S
FuriosaAI RNGD (Renegade)	VRAM	2.67x	Need 2.67x RNGD (Renegade)
NVIDIA A16	VRAM	2.00x	Need 2.00x A16
NVIDIA GeForce RTX 3090	VRAM	5.33x	Need 5.33x GeForce RTX 3090

To match 1x NVIDIA L40S

XPU	Metric	Units needed	Comparison
Intel Habana Gaudi 3	Compute (FP32-eq)	0.40x	Gaudi 3 is 2.50x faster
Intel Habana Gaudi 3	VRAM	0.38x	Gaudi 3 has 2.67x more
Intel Habana Gaudi 3	Memory Bandwidth	0.23x	Gaudi 3 has 4.28x more
Meta MTIA v1	VRAM	0.38x	MTIA v1 has 2.67x more
FuriosaAI RNGD (Renegade)	Compute (FP32-eq)	2.86x	Need 2.86x RNGD (Renegade)
FuriosaAI RNGD (Renegade)	VRAM	1.00x	RNGD (Renegade) has the same VRAM
FuriosaAI RNGD (Renegade)	Memory Bandwidth	0.58x	RNGD (Renegade) has 1.74x more
NVIDIA A16	Compute (FP32-eq)	10.29x	Need 10.29x A16
NVIDIA A16	FP32 Compute	9.64x	Need 9.64x A16
NVIDIA A16	VRAM	0.75x	A16 has 1.33x more
NVIDIA A16	Memory Bandwidth	3.72x	Need 3.72x A16
NVIDIA GeForce RTX 3090	Compute (FP32-eq)	10.32x	Need 10.32x GeForce RTX 3090
NVIDIA GeForce RTX 3090	FP32 Compute	2.57x	Need 2.57x GeForce RTX 3090
NVIDIA GeForce RTX 3090	VRAM	2.00x	Need 2.00x GeForce RTX 3090
NVIDIA GeForce RTX 3090	Memory Bandwidth	0.92x	GeForce RTX 3090 has 1.08x more

To match 1x FuriosaAI RNGD (Renegade)

XPU	Metric	Units needed	Comparison
Intel Habana Gaudi 3	Compute (FP32-eq)	0.14x	Gaudi 3 is 7.17x faster
Intel Habana Gaudi 3	VRAM	0.38x	Gaudi 3 has 2.67x more
Intel Habana Gaudi 3	Memory Bandwidth	0.41x	Gaudi 3 has 2.47x more
Meta MTIA v1	VRAM	0.38x	MTIA v1 has 2.67x more
NVIDIA L40S	Compute (FP32-eq)	0.35x	L40S is 2.86x faster
NVIDIA L40S	VRAM	1.00x	L40S has the same VRAM
NVIDIA L40S	Memory Bandwidth	1.74x	Need 1.74x L40S
NVIDIA A16	Compute (FP32-eq)	3.60x	Need 3.60x A16
NVIDIA A16	VRAM	0.75x	A16 has 1.33x more
NVIDIA A16	Memory Bandwidth	6.47x	Need 6.47x A16
NVIDIA GeForce RTX 3090	Compute (FP32-eq)	3.61x	Need 3.61x GeForce RTX 3090
NVIDIA GeForce RTX 3090	VRAM	2.00x	Need 2.00x GeForce RTX 3090
NVIDIA GeForce RTX 3090	Memory Bandwidth	1.60x	Need 1.60x GeForce RTX 3090

To match 1x NVIDIA A16

XPU	Metric	Units needed	Comparison
Intel Habana Gaudi 3	Compute (FP32-eq)	0.04x	Gaudi 3 is 25.77x faster
Intel Habana Gaudi 3	VRAM	0.50x	Gaudi 3 has 2.00x more
Intel Habana Gaudi 3	Memory Bandwidth	0.06x	Gaudi 3 has 15.95x more
Meta MTIA v1	VRAM	0.50x	MTIA v1 has 2.00x more
NVIDIA L40S	Compute (FP32-eq)	0.10x	L40S is 10.29x faster
NVIDIA L40S	FP32 Compute	0.10x	L40S is 9.64x faster
NVIDIA L40S	VRAM	1.33x	Need 1.33x L40S
NVIDIA L40S	Memory Bandwidth	0.27x	L40S has 3.72x more
FuriosaAI RNGD (Renegade)	Compute (FP32-eq)	0.28x	RNGD (Renegade) is 3.60x faster
FuriosaAI RNGD (Renegade)	VRAM	1.33x	Need 1.33x RNGD (Renegade)
FuriosaAI RNGD (Renegade)	Memory Bandwidth	0.15x	RNGD (Renegade) has 6.47x more
NVIDIA GeForce RTX 3090	Compute (FP32-eq)	1.00x	Need 1.00x GeForce RTX 3090
NVIDIA GeForce RTX 3090	FP32 Compute	0.27x	GeForce RTX 3090 is 3.75x faster
NVIDIA GeForce RTX 3090	VRAM	2.67x	Need 2.67x GeForce RTX 3090
NVIDIA GeForce RTX 3090	Memory Bandwidth	0.25x	GeForce RTX 3090 has 4.03x more

To match 1x NVIDIA GeForce RTX 3090

XPU	Metric	Units needed	Comparison
Intel Habana Gaudi 3	Compute (FP32-eq)	0.04x	Gaudi 3 is 25.85x faster
Intel Habana Gaudi 3	VRAM	0.19x	Gaudi 3 has 5.33x more
Intel Habana Gaudi 3	Memory Bandwidth	0.25x	Gaudi 3 has 3.95x more
Meta MTIA v1	VRAM	0.19x	MTIA v1 has 5.33x more
NVIDIA L40S	Compute (FP32-eq)	0.10x	L40S is 10.32x faster
NVIDIA L40S	FP32 Compute	0.39x	L40S is 2.57x faster
NVIDIA L40S	VRAM	0.50x	L40S has 2.00x more
NVIDIA L40S	Memory Bandwidth	1.08x	Need 1.08x L40S
FuriosaAI RNGD (Renegade)	Compute (FP32-eq)	0.28x	RNGD (Renegade) is 3.61x faster
FuriosaAI RNGD (Renegade)	VRAM	0.50x	RNGD (Renegade) has 2.00x more
FuriosaAI RNGD (Renegade)	Memory Bandwidth	0.62x	RNGD (Renegade) has 1.60x more
NVIDIA A16	Compute (FP32-eq)	1.00x	A16 is about equal
NVIDIA A16	FP32 Compute	3.75x	Need 3.75x A16
NVIDIA A16	VRAM	0.38x	A16 has 2.67x more
NVIDIA A16	Memory Bandwidth	4.03x	Need 4.03x A16
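
The equivalence figures in this section are consistent with a plain ratio of the two spec values. The sketch below illustrates that reading; the function names are hypothetical, and the inputs come from the Specifications table. Because "Compute (FP32-eq)" is a ratio, any common scaling of the TFLOPs row cancels out.

```python
def units_needed(target_value, candidate_value):
    """Units of the candidate XPU needed to match one unit of the target."""
    return target_value / candidate_value


def advantage(target_value, candidate_value):
    """How many times more of the metric the candidate has than the target."""
    return candidate_value / target_value


# To match 1x Gaudi 3 with L40S (TFLOPs 1,835 vs 733; VRAM 128 vs 48 GB;
# memory bandwidth 3,700 vs 864 GB/s).
print(f"Compute: {units_needed(1835, 733):.2f}x")
print(f"VRAM: {units_needed(128, 48):.2f}x")
print(f"Memory Bandwidth: {units_needed(3700, 864):.2f}x")

# To match 1x L40S with RNGD on memory bandwidth (864 vs 1,500 GB/s):
# under one unit is needed, so the page reports the inverse as an advantage.
print(f"Units needed: {units_needed(864, 1500):.2f}x")
print(f"RNGD has {advantage(864, 1500):.2f}x more")
```

A value below 1.00x means a single candidate unit already exceeds the target on that metric, which is why those rows are phrased as "is N x faster" or "has N x more" rather than "Need".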

Pricing

Price Type	Gaudi 3	MTIA v1	L40S	RNGD (Renegade)	A16	GeForce RTX 3090
CAPEX (Street Price)	$15,000	—	$10,000	—	—	—
OPEX (per hour)	$1.20/hr	—	$1.50/hr	—	$0.47/hr	$0.11/hr
Price per TFLOPs (FP32-eq)	$16	—	$27	—	—	—
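
The "Price per TFLOPs (FP32-eq)" row appears to be the street price divided by half of the TFLOPs row in the Specifications table. The sketch below reproduces it under that assumption; the 0.5 factor is inferred from the numbers rather than documented, and the names are hypothetical.

```python
# Assumption: FP32-equivalent throughput is half the headline TFLOPs figure.
FP32_EQ_FACTOR = 0.5


def price_per_tflop(capex_usd, tflops):
    """Street price in USD per FP32-equivalent TFLOP."""
    return capex_usd / (tflops * FP32_EQ_FACTOR)


# (CAPEX in USD, TFLOPs) for the two XPUs that list a street price.
priced = {"Gaudi 3": (15_000, 1835), "L40S": (10_000, 733)}

for name, (capex_usd, tflops) in priced.items():
    print(f"{name}: ${price_per_tflop(capex_usd, tflops):.0f} per TFLOP (FP32-eq)")
```

The other four XPUs list no street price, so the row cannot be computed for them.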