Compare XPUs

Select up to 5 XPUs to compare side-by-side

Select XPUs to Compare

Showing 128 XPUs • 7 selected

Alibaba

Hanguang 800

AMD

MI100

23.1 TFLOPs

AMD

MI210

181 TFLOPs

AMD

MI250X

383 TFLOPs

AMD

MI300X

1,307 TFLOPs

AMD

MI325X

1,400 TFLOPs

AMD

MI350X

2,100 TFLOPs

AMD

MI355X

5,300 TFLOPs

AMD

Radeon Pro V520

23.04 TFLOPs

AWS

Inferentia2

190 TFLOPs

AWS

Trainium

190 TFLOPs

AWS

Trainium2

680 TFLOPs

Baidu

Kunlun II

Biren Technology

BR100

Cambricon

MLU370

256 TFLOPs

Cerebras

WSE-3

Enflame Technology

CloudBlazer T20

FuriosaAI

RNGD (Renegade)

256 TFLOPs

FuriosaAI

Warboy

Google

TPU v4

275 TFLOPs

Google

TPU v5e

197 TFLOPs

Google

TPU v5p

459 TFLOPs

Graphcore

Bow IPU

Graphcore

IPU-M2000

Groq

LPU Inference Engine

Huawei

Ascend 910B

Iluvatar CoreX

BI-V150

300 TFLOPs

Intel

Data Center GPU Max 1100

177 TFLOPs

Intel

Data Center GPU Max 1550

419 TFLOPs

Intel Habana

Gaudi 2

432 TFLOPs

Intel Habana

Gaudi 3

1,835 TFLOPs

Multi-Metric Comparison

Relative performance across 5 key metrics (normalized to 100 = best in comparison)

[Chart. Axis labels recovered: Compute Performance (BF16), Memory Capacity, Power Consumption, Power Efficiency]
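
The "normalized to 100 = best in comparison" scaling described above can be sketched as follows. This is a minimal illustration, not the page's actual implementation: the function name `normalize_to_best` is hypothetical, and treating lower power consumption as "better" is an assumption the page does not confirm. The VRAM and TDP inputs are copied from the Specifications table.

```python
def normalize_to_best(values, higher_is_better=True):
    """Scale each value so that the best entry in the comparison scores 100."""
    present = [v for v in values.values() if v is not None]
    best = max(present) if higher_is_better else min(present)
    scores = {}
    for name, value in values.items():
        if value is None:
            scores[name] = None  # missing spec, nothing to plot
        elif higher_is_better:
            scores[name] = 100 * value / best
        else:
            # ASSUMPTION: for "lower is better" metrics the score is best / value
            scores[name] = 100 * best / value
    return scores


# Values from the Specifications table
vram_gb = {"Gaudi 2": 96, "Gaudi 3": 128, "TPU v4": 32, "Quadro P600": 2,
           "GeForce RTX 3090": 24, "A100 SXM": 80, "GeForce RTX 3070 Ti": 8}
tdp_w = {"Gaudi 2": 600, "Gaudi 3": 900, "TPU v4": 300, "Quadro P600": 40,
         "GeForce RTX 3090": 350, "A100 SXM": 400, "GeForce RTX 3070 Ti": 290}

vram_scores = normalize_to_best(vram_gb)
power_scores = normalize_to_best(tdp_w, higher_is_better=False)
print(vram_scores["Gaudi 2"])  # 75.0
```

Under these assumptions Gaudi 3 scores 100 on memory capacity and the Quadro P600 scores 100 on power consumption.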

Specifications

Specification	Intel Habana Gaudi 2	Intel Habana Gaudi 3	Google TPU v4	NVIDIA Quadro P600	NVIDIA GeForce RTX 3090	NVIDIA A100 SXM	NVIDIA GeForce RTX 3070 Ti
Architecture	Gaudi Gen2	Gaudi Gen3	TPU v4	Pascal	Ampere	Ampere	Ampere
Form Factor	OAM	OAM	Mezzanine	PCIe	PCIe	SXM	PCIe
VRAM	96 GB	128 GB	32 GB	2 GB	24 GB	80 GB	8 GB
Memory Bandwidth	2,450 GB/s	3,700 GB/s	—	64 GB/s	936 GB/s	2,039 GB/s	608 GB/s
TFLOPs (FP32)	—	—	—	1.117	35.6	19.5	21.8
TFLOPs (FP16)	—	—	—	—	—	312	—
TFLOPs	432	1,835	275	1.117	71	312	43.2
TFLOPs (FP8)	—	3,670	—	—	—	—	—
TDP	600 W	900 W	300 W	40 W	350 W	400 W	290 W
Launch Date	May 2022	Apr 2024	May 2021	Feb 2017	Sep 2020	May 2020	Jun 2021

Efficiency Metrics

Metric	Gaudi 2	Gaudi 3	TPU v4	Quadro P600	GeForce RTX 3090	A100 SXM	GeForce RTX 3070 Ti
TFLOPs per Watt (FP32-eq)	0.36	1.02	0.46	0.03	0.10	0.39	0.07
Memory Bandwidth per GB of VRAM	25.5 GB/s	28.9 GB/s	—	32.0 GB/s	39.0 GB/s	25.5 GB/s	76.0 GB/s
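
The efficiency figures above can be reproduced from the Specifications table if "FP32-eq" is taken to be half of the headline TFLOPs figure, with the Quadro P600's FP32 figure used unchanged. That halving rule is inferred from the numbers and is not stated on the page, so treat the sketch below as an assumption-laden check rather than the tool's definition. The dictionary layout and function names are hypothetical.

```python
# Spec values copied from the Specifications table.
# "fp32_eq" is an ASSUMPTION: headline TFLOPs / 2, except for the Quadro P600,
# whose headline figure is already FP32.
SPECS = {
    "Gaudi 2":             {"fp32_eq": 432 / 2,   "tdp_w": 600, "vram_gb": 96,  "bw_gbs": 2450},
    "Gaudi 3":             {"fp32_eq": 1835 / 2,  "tdp_w": 900, "vram_gb": 128, "bw_gbs": 3700},
    "TPU v4":              {"fp32_eq": 275 / 2,   "tdp_w": 300, "vram_gb": 32,  "bw_gbs": None},
    "Quadro P600":         {"fp32_eq": 1.117,     "tdp_w": 40,  "vram_gb": 2,   "bw_gbs": 64},
    "GeForce RTX 3090":    {"fp32_eq": 71 / 2,    "tdp_w": 350, "vram_gb": 24,  "bw_gbs": 936},
    "A100 SXM":            {"fp32_eq": 312 / 2,   "tdp_w": 400, "vram_gb": 80,  "bw_gbs": 2039},
    "GeForce RTX 3070 Ti": {"fp32_eq": 43.2 / 2,  "tdp_w": 290, "vram_gb": 8,   "bw_gbs": 608},
}


def tflops_per_watt(chip):
    """FP32-equivalent TFLOPs divided by TDP in watts."""
    spec = SPECS[chip]
    return spec["fp32_eq"] / spec["tdp_w"]


def bandwidth_per_gb(chip):
    """Memory bandwidth (GB/s) per GB of VRAM, or None if bandwidth is not listed."""
    spec = SPECS[chip]
    if spec["bw_gbs"] is None:
        return None
    return spec["bw_gbs"] / spec["vram_gb"]


for chip in SPECS:
    bw = bandwidth_per_gb(chip)
    bw_text = "n/a" if bw is None else f"{bw:.1f} GB/s per GB"
    print(f"{chip}: {tflops_per_watt(chip):.2f} TFLOPs/W, {bw_text}")
```

With these inputs the sketch gives 0.36 TFLOPs/W for Gaudi 2 and 76.0 GB/s per GB for the GeForce RTX 3070 Ti, matching the table.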

Performance Equivalence

How many units of each XPU are needed to match the performance of one unit of the others? A ratio below 1x means the listed XPU already exceeds the target on that metric.
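
The ratios in this section appear to be plain quotients of the two spec values: the target's value divided by the candidate's value. A minimal sketch follows, with hypothetical function names and with the FP32-eq inputs assumed to be half of the headline TFLOPs figure (an inference from the numbers, not something the page states).

```python
def units_needed(target_value, candidate_value):
    """Units of the candidate required to match one unit of the target."""
    return target_value / candidate_value


def describe(candidate_name, target_value, candidate_value, verb="is", suffix="faster"):
    """Render a ratio in the two phrasings used on the page."""
    ratio = units_needed(target_value, candidate_value)
    if ratio >= 1:
        return f"Need {ratio:.2f}x {candidate_name}"
    # Below 1x the candidate already exceeds the target, so report the inverse
    return f"{candidate_name} {verb} {1 / ratio:.2f}x {suffix}"


# ASSUMPTION: FP32-eq = headline TFLOPs / 2
gaudi2_fp32_eq = 432 / 2
gaudi3_fp32_eq = 1835 / 2
a100_fp32_eq = 312 / 2

print(describe("A100 SXM", gaudi2_fp32_eq, a100_fp32_eq))  # Need 1.38x A100 SXM
print(describe("Gaudi 3", gaudi2_fp32_eq, gaudi3_fp32_eq))  # Gaudi 3 is 4.25x faster
print(describe("Gaudi 3", 96, 128, verb="has", suffix="more"))  # Gaudi 3 has 1.33x more
```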

To match 1x Intel Habana Gaudi 2

Intel Habana Gaudi 3

Compute (FP32-eq)

0.24x

Gaudi 3 is 4.25x faster

VRAM

0.75x

Gaudi 3 has 1.33x more

Memory Bandwidth

0.66x

Gaudi 3 has 1.51x more

Google TPU v4

Compute (FP32-eq)

1.57x

Need 1.57x TPU v4

VRAM

3.00x

Need 3.00x TPU v4

NVIDIA Quadro P600

Compute (FP32-eq)

193.38x

Need 193.38x Quadro P600

VRAM

48.00x

Need 48.00x Quadro P600

Memory Bandwidth

38.28x

Need 38.28x Quadro P600

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

6.08x

Need 6.08x GeForce RTX 3090

VRAM

4.00x

Need 4.00x GeForce RTX 3090

Memory Bandwidth

2.62x

Need 2.62x GeForce RTX 3090

NVIDIA A100 SXM

Compute (FP32-eq)

1.38x

Need 1.38x A100 SXM

VRAM

1.20x

Need 1.20x A100 SXM

Memory Bandwidth

1.20x

Need 1.20x A100 SXM

NVIDIA GeForce RTX 3070 Ti

Compute (FP32-eq)

10.00x

Need 10.00x GeForce RTX 3070 Ti

VRAM

12.00x

Need 12.00x GeForce RTX 3070 Ti

Memory Bandwidth

4.03x

Need 4.03x GeForce RTX 3070 Ti

To match 1x Intel Habana Gaudi 3

Intel Habana Gaudi 2

Compute (FP32-eq)

4.25x

Need 4.25x Gaudi 2

VRAM

1.33x

Need 1.33x Gaudi 2

Memory Bandwidth

1.51x

Need 1.51x Gaudi 2

Google TPU v4

Compute (FP32-eq)

6.67x

Need 6.67x TPU v4

VRAM

4.00x

Need 4.00x TPU v4

NVIDIA Quadro P600

Compute (FP32-eq)

821.40x

Need 821.40x Quadro P600

VRAM

64.00x

Need 64.00x Quadro P600

Memory Bandwidth

57.81x

Need 57.81x Quadro P600

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

25.85x

Need 25.85x GeForce RTX 3090

VRAM

5.33x

Need 5.33x GeForce RTX 3090

Memory Bandwidth

3.95x

Need 3.95x GeForce RTX 3090

NVIDIA A100 SXM

Compute (FP32-eq)

5.88x

Need 5.88x A100 SXM

VRAM

1.60x

Need 1.60x A100 SXM

Memory Bandwidth

1.81x

Need 1.81x A100 SXM

NVIDIA GeForce RTX 3070 Ti

Compute (FP32-eq)

42.48x

Need 42.48x GeForce RTX 3070 Ti

VRAM

16.00x

Need 16.00x GeForce RTX 3070 Ti

Memory Bandwidth

6.09x

Need 6.09x GeForce RTX 3070 Ti

To match 1x Google TPU v4

Intel Habana Gaudi 2

Compute (FP32-eq)

0.64x

Gaudi 2 is 1.57x faster

VRAM

0.33x

Gaudi 2 has 3.00x more

Intel Habana Gaudi 3

Compute (FP32-eq)

0.15x

Gaudi 3 is 6.67x faster

VRAM

0.25x

Gaudi 3 has 4.00x more

NVIDIA Quadro P600

Compute (FP32-eq)

123.10x

Need 123.10x Quadro P600

VRAM

16.00x

Need 16.00x Quadro P600

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

3.87x

Need 3.87x GeForce RTX 3090

VRAM

1.33x

Need 1.33x GeForce RTX 3090

NVIDIA A100 SXM

Compute (FP32-eq)

0.88x

A100 SXM is 1.13x faster

VRAM

0.40x

A100 SXM has 2.50x more

NVIDIA GeForce RTX 3070 Ti

Compute (FP32-eq)

6.37x

Need 6.37x GeForce RTX 3070 Ti

VRAM

4.00x

Need 4.00x GeForce RTX 3070 Ti

To match 1x NVIDIA Quadro P600

Intel Habana Gaudi 2

Compute (FP32-eq)

0.01x

Gaudi 2 is 193.38x faster

VRAM

0.02x

Gaudi 2 has 48.00x more

Memory Bandwidth

0.03x

Gaudi 2 has 38.28x more

Intel Habana Gaudi 3

Compute (FP32-eq)

<0.01x

Gaudi 3 is 821.40x faster

VRAM

0.02x

Gaudi 3 has 64.00x more

Memory Bandwidth

0.02x

Gaudi 3 has 57.81x more

Google TPU v4

Compute (FP32-eq)

0.01x

TPU v4 is 123.10x faster

VRAM

0.06x

TPU v4 has 16.00x more

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

0.03x

GeForce RTX 3090 is 31.78x faster

FP32 Compute

0.03x

GeForce RTX 3090 is 31.87x faster

VRAM

0.08x

GeForce RTX 3090 has 12.00x more

Memory Bandwidth

0.07x

GeForce RTX 3090 has 14.63x more

NVIDIA A100 SXM

Compute (FP32-eq)

0.01x

A100 SXM is 139.66x faster

FP32 Compute

0.06x

A100 SXM is 17.46x faster

VRAM

0.03x

A100 SXM has 40.00x more

Memory Bandwidth

0.03x

A100 SXM has 31.86x more

NVIDIA GeForce RTX 3070 Ti

Compute (FP32-eq)

0.05x

GeForce RTX 3070 Ti is 19.34x faster

FP32 Compute

0.05x

GeForce RTX 3070 Ti is 19.52x faster

VRAM

0.25x

GeForce RTX 3070 Ti has 4.00x more

Memory Bandwidth

0.11x

GeForce RTX 3070 Ti has 9.50x more

To match 1x NVIDIA GeForce RTX 3090

Intel Habana Gaudi 2

Compute (FP32-eq)

0.16x

Gaudi 2 is 6.08x faster

VRAM

0.25x

Gaudi 2 has 4.00x more

Memory Bandwidth

0.38x

Gaudi 2 has 2.62x more

Intel Habana Gaudi 3

Compute (FP32-eq)

0.04x

Gaudi 3 is 25.85x faster

VRAM

0.19x

Gaudi 3 has 5.33x more

Memory Bandwidth

0.25x

Gaudi 3 has 3.95x more

Google TPU v4

Compute (FP32-eq)

0.26x

TPU v4 is 3.87x faster

VRAM

0.75x

TPU v4 has 1.33x more

NVIDIA Quadro P600

Compute (FP32-eq)

31.78x

Need 31.78x Quadro P600

FP32 Compute

31.87x

Need 31.87x Quadro P600

VRAM

12.00x

Need 12.00x Quadro P600

Memory Bandwidth

14.63x

Need 14.63x Quadro P600

NVIDIA A100 SXM

Compute (FP32-eq)

0.23x

A100 SXM is 4.39x faster

FP32 Compute

1.83x

Need 1.83x A100 SXM

VRAM

0.30x

A100 SXM has 3.33x more

Memory Bandwidth

0.46x

A100 SXM has 2.18x more

NVIDIA GeForce RTX 3070 Ti

Compute (FP32-eq)

1.64x

Need 1.64x GeForce RTX 3070 Ti

FP32 Compute

1.63x

Need 1.63x GeForce RTX 3070 Ti

VRAM

3.00x

Need 3.00x GeForce RTX 3070 Ti

Memory Bandwidth

1.54x

Need 1.54x GeForce RTX 3070 Ti

To match 1x NVIDIA A100 SXM

Intel Habana Gaudi 2

Compute (FP32-eq)

0.72x

Gaudi 2 is 1.38x faster

VRAM

0.83x

Gaudi 2 has 1.20x more

Memory Bandwidth

0.83x

Gaudi 2 has 1.20x more

Intel Habana Gaudi 3

Compute (FP32-eq)

0.17x

Gaudi 3 is 5.88x faster

VRAM

0.63x

Gaudi 3 has 1.60x more

Memory Bandwidth

0.55x

Gaudi 3 has 1.81x more

Google TPU v4

Compute (FP32-eq)

1.13x

Need 1.13x TPU v4

VRAM

2.50x

Need 2.50x TPU v4

NVIDIA Quadro P600

Compute (FP32-eq)

139.66x

Need 139.66x Quadro P600

FP32 Compute

17.46x

Need 17.46x Quadro P600

VRAM

40.00x

Need 40.00x Quadro P600

Memory Bandwidth

31.86x

Need 31.86x Quadro P600

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

4.39x

Need 4.39x GeForce RTX 3090

FP32 Compute

0.55x

GeForce RTX 3090 is 1.83x faster

VRAM

3.33x

Need 3.33x GeForce RTX 3090

Memory Bandwidth

2.18x

Need 2.18x GeForce RTX 3090

NVIDIA GeForce RTX 3070 Ti

Compute (FP32-eq)

7.22x

Need 7.22x GeForce RTX 3070 Ti

FP32 Compute

0.89x

GeForce RTX 3070 Ti is 1.12x faster

VRAM

10.00x

Need 10.00x GeForce RTX 3070 Ti

Memory Bandwidth

3.35x

Need 3.35x GeForce RTX 3070 Ti

To match 1x NVIDIA GeForce RTX 3070 Ti

Intel Habana Gaudi 2

Compute (FP32-eq)

0.10x

Gaudi 2 is 10.00x faster

VRAM

0.08x

Gaudi 2 has 12.00x more

Memory Bandwidth

0.25x

Gaudi 2 has 4.03x more

Intel Habana Gaudi 3

Compute (FP32-eq)

0.02x

Gaudi 3 is 42.48x faster

VRAM

0.06x

Gaudi 3 has 16.00x more

Memory Bandwidth

0.16x

Gaudi 3 has 6.09x more

Google TPU v4

Compute (FP32-eq)

0.16x

TPU v4 is 6.37x faster

VRAM

0.25x

TPU v4 has 4.00x more

NVIDIA Quadro P600

Compute (FP32-eq)

19.34x

Need 19.34x Quadro P600

FP32 Compute

19.52x

Need 19.52x Quadro P600

VRAM

4.00x

Need 4.00x Quadro P600

Memory Bandwidth

9.50x

Need 9.50x Quadro P600

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

0.61x

GeForce RTX 3090 is 1.64x faster

FP32 Compute

0.61x

GeForce RTX 3090 is 1.63x faster

VRAM

0.33x

GeForce RTX 3090 has 3.00x more

Memory Bandwidth

0.65x

GeForce RTX 3090 has 1.54x more

NVIDIA A100 SXM

Compute (FP32-eq)

0.14x

A100 SXM is 7.22x faster

FP32 Compute

1.12x

Need 1.12x A100 SXM

VRAM

0.10x

A100 SXM has 10.00x more

Memory Bandwidth

0.30x

A100 SXM has 3.35x more

Pricing

Price Type	Gaudi 2	Gaudi 3	TPU v4	Quadro P600	GeForce RTX 3090	A100 SXM	GeForce RTX 3070 Ti
CAPEX (Street Price)	—	$15,000	—	—	—	$15,000	—
OPEX (per hour)	—	$1.20/hr	$3.00/hr	$0.05/hr	$0.11/hr	$4.05/hr	$0.08/hr
Price per TFLOPs (FP32-eq)	—	$16	—	—	—	$96	—
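
The "Price per TFLOPs (FP32-eq)" row is consistent with dividing the street price by half of the headline TFLOPs figure. The sketch below checks that arithmetic for the two XPUs with a listed street price. The halving rule is an assumption inferred from the numbers, and `price_per_tflop` is a hypothetical helper.

```python
def price_per_tflop(street_price_usd, headline_tflops):
    """Street price divided by FP32-equivalent TFLOPs."""
    # ASSUMPTION: FP32-eq = headline TFLOPs / 2 (not stated on the page)
    fp32_eq = headline_tflops / 2
    return street_price_usd / fp32_eq


gaudi3 = price_per_tflop(15_000, 1_835)
a100 = price_per_tflop(15_000, 312)
print(f"Gaudi 3: ${gaudi3:.0f} per TFLOP, A100 SXM: ${a100:.0f} per TFLOP")
# Gaudi 3: $16 per TFLOP, A100 SXM: $96 per TFLOP
```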