Compare XPUs

The catalog lists 128 XPUs. The 8 selected below are compared side by side.

Vendor	Model	TFLOPs (as listed)
Alibaba	Hanguang 800	—
AMD	MI100	23.1
AMD	MI210	181
AMD	MI250X	383
AMD	MI300X	1,307
AMD	MI325X	1,400
AMD	MI350X	2,100
AMD	MI355X	5,300
AMD	Radeon Pro V520	23.04
AWS	Inferentia2	190
AWS	Trainium	190
AWS	Trainium2	680
Baidu	Kunlun II	—
Biren Technology	BR100	—
Cambricon	MLU370	256
Cerebras	WSE-3	—
Enflame Technology	CloudBlazer T20	—
FuriosaAI	RNGD (Renegade)	256
FuriosaAI	Warboy	—
Google	TPU v4	275
Google	TPU v5e	197
Google	TPU v5p	459
Graphcore	Bow IPU	—
Graphcore	IPU-M2000	—
Groq	LPU Inference Engine	—
Huawei	Ascend 910B	—
Iluvatar CoreX	BI-V150	300
Intel	Data Center GPU Max 1100	177
Intel	Data Center GPU Max 1550	419
Intel Habana	Gaudi 2	432
Intel Habana	Gaudi 3	1,835

Multi-Metric Comparison

[Chart: relative performance, normalized to 100 = best in comparison, for Compute Performance (BF16), Memory Capacity, Power Consumption and Power Efficiency]
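Scaling each metric so the best device in the comparison scores 100 can be sketched as below. This is a minimal sketch under an assumption: higher-is-better metrics are scored as value / best, and lower-is-better metrics such as power consumption as best / value. The page does not document how it treats lower-is-better metrics, and the `normalize` helper is hypothetical.

```python
# Minimal sketch of "normalized to 100 = best in comparison".
# Assumption: lower-is-better metrics (such as power consumption) are scored
# as best/value; the page does not document its exact rule.


def normalize(values: dict[str, float], higher_is_better: bool = True) -> dict[str, float]:
    """Scale raw metric values so the best device in the set scores 100."""
    if higher_is_better:
        best = max(values.values())
        return {name: 100.0 * v / best for name, v in values.items()}
    best = min(values.values())
    return {name: 100.0 * best / v for name, v in values.items()}


# VRAM (GB) and TDP (W) for three devices, from the specification table.
vram_gb = {"Gaudi 2": 96, "A100 SXM": 80, "L40": 48}
tdp_w = {"Gaudi 2": 600, "A100 SXM": 400, "L40": 300}

print(normalize(vram_gb))                        # Gaudi 2: 100.0, A100 SXM: ~83.33, L40: 50.0
print(normalize(tdp_w, higher_is_better=False))  # L40: 100.0, A100 SXM: 75.0, Gaudi 2: 50.0
```

Scores are relative to the devices in the current selection, so adding or removing a device can change every other device's score.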

Specifications

Specification	Intel Habana Gaudi 2	NVIDIA T4G	NVIDIA GeForce GTX 1050 Ti	NVIDIA GeForce RTX 3090	NVIDIA L40	NVIDIA A100 SXM	NVIDIA Quadro M4000	AMD Radeon Pro V520
Architecture	Gaudi Gen2	Turing	Pascal	Ampere	Ada Lovelace	Ampere	Maxwell	RDNA 2
Form Factor	OAM	PCIe	PCIe	PCIe	PCIe	SXM	PCIe	PCIe
VRAM	96 GB	16 GB	4 GB	24 GB	48 GB	80 GB	8 GB	32 GB
Memory Bandwidth	2,450 GB/s	320 GB/s	112 GB/s	936 GB/s	864 GB/s	2,039 GB/s	192 GB/s	448 GB/s
TFLOPs (FP32)	—	8.1	2.138	35.6	45	19.5	2.57	10.1
TFLOPs (FP16)	—	—	—	—	—	312	—	—
TFLOPs (headline; precision varies by device)	432	65	2.138	71	181.05	312	2.57	23.04
TFLOPs (FP8)	—	—	—	—	—	—	—	—
TDP	600 W	70 W	75 W	350 W	300 W	400 W	120 W	225 W
Launch Date	May 2022	May 2020	Oct 2016	Sep 2020	Oct 2022	May 2020	Jun 2015	Jun 2021
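The unlabeled headline TFLOPs row mixes precisions. For the GeForce GTX 1050 Ti and Quadro M4000 it equals the FP32 figure, and for the other devices it is a larger 16-bit number. The "FP32-eq" figures used later on this page are consistent with halving 16-bit headline values and leaving FP32 values unchanged. The sketch below reproduces that inferred rule. It is reverse-engineered from the published numbers, not a documented formula, and the `headline_is_fp32` flag is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    headline_tflops: float  # the unlabeled "TFLOPs" row of the specification table
    headline_is_fp32: bool  # hypothetical flag: True if the headline figure is already FP32

    @property
    def fp32_eq_tflops(self) -> float:
        # Inferred rule: a 16-bit headline figure is halved to express it in FP32 terms.
        if self.headline_is_fp32:
            return self.headline_tflops
        return self.headline_tflops / 2


devices = {
    d.name: d
    for d in [
        Device("Gaudi 2", 432, False),
        Device("A100 SXM", 312, False),
        Device("L40", 181.05, False),
        Device("GeForce GTX 1050 Ti", 2.138, True),
        Device("Quadro M4000", 2.57, True),
    ]
}

for d in devices.values():
    print(f"{d.name}: {d.fp32_eq_tflops:.2f} TFLOPs (FP32-eq)")

# Gaudi 2 vs GTX 1050 Ti under this rule: (432 / 2) / 2.138, about 101.03x.
ratio = devices["Gaudi 2"].fp32_eq_tflops / devices["GeForce GTX 1050 Ti"].fp32_eq_tflops
print(f"{ratio:.2f}x")
```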

Efficiency Metrics

Metric	Gaudi 2	T4G	GeForce GTX 1050 Ti	GeForce RTX 3090	L40	A100 SXM	Quadro M4000	Radeon Pro V520
TFLOPs per Watt (FP32-eq)	0.36	0.46	0.03	0.10	0.30	0.39	0.02	0.05
Memory Bandwidth per GB of VRAM	25.5 GB/s	20.0 GB/s	28.0 GB/s	39.0 GB/s	18.0 GB/s	25.5 GB/s	24.0 GB/s	14.0 GB/s
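Both efficiency metrics are simple quotients of specification-table values. The sketch below assumes FP32-eq TFLOPs are obtained by halving 16-bit headline figures, an inference from the published numbers. The `efficiency` helper is hypothetical.

```python
# Each entry: (FP32-eq TFLOPs, TDP in W, VRAM in GB, memory bandwidth in GB/s).
# FP32-eq halves 16-bit headline figures (inferred rule); Quadro M4000 is already FP32.
specs = {
    "Gaudi 2": (432 / 2, 600, 96, 2450),
    "T4G": (65 / 2, 70, 16, 320),
    "Quadro M4000": (2.57, 120, 8, 192),
}


def efficiency(tflops: float, tdp_w: float, vram_gb: float, bw_gbps: float) -> tuple[float, float]:
    """Return (TFLOPs per watt, GB/s of bandwidth per GB of VRAM)."""
    return tflops / tdp_w, bw_gbps / vram_gb


for name, row in specs.items():
    per_watt, bw_per_gb = efficiency(*row)
    print(f"{name}: {per_watt:.2f} TFLOPs/W, {bw_per_gb:.1f} GB/s per GB")
```

With these inputs the output is 0.36, 0.46 and 0.02 TFLOPs/W and 25.5, 20.0 and 24.0 GB/s per GB, matching the Gaudi 2, T4G and Quadro M4000 columns of the table.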

Performance Equivalence

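How many units of each device are needed to match one unit of another? Each figure divides the target's value by the other device's value on one metric, so a figure below 1x means a single unit of that device already exceeds the target. "Compute (FP32-eq)" is based on the headline TFLOPs row. The published ratios are consistent with 16-bit headline figures being halved. For example, Gaudi 2 vs GeForce GTX 1050 Ti gives (432 / 2) / 2.138 ≈ 101.03x. "FP32 Compute" uses the FP32 row and is omitted for Gaudi 2, which lists no FP32 figure.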
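Each equivalence figure is one division per metric. This minimal sketch reproduces the Gaudi 2 vs T4G figures, assuming 16-bit headline TFLOPs are halved as the published ratios imply. The `units_needed` helper is hypothetical. The sketch treats units as perfectly additive, which real multi-device setups only approximate because of interconnect and parallelization overheads.

```python
def units_needed(target: float, other: float) -> float:
    """Units of `other` needed to match one unit of `target` on one metric."""
    return target / other


# (FP32-eq TFLOPs, VRAM in GB, memory bandwidth in GB/s); 16-bit headline
# TFLOPs are halved, which is the convention the published ratios imply.
gaudi2 = (432 / 2, 96, 2450)
t4g = (65 / 2, 16, 320)

metrics = ("Compute (FP32-eq)", "VRAM", "Memory Bandwidth")
for metric, target, other in zip(metrics, gaudi2, t4g):
    # A result below 1 would mean one T4G already exceeds Gaudi 2 on this metric.
    print(f"{metric}: need {units_needed(target, other):.2f}x T4G")
```

This prints 6.65x, 6.00x and 7.66x, the T4G figures listed for matching one Gaudi 2.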

To match 1x Intel Habana Gaudi 2

Device	Compute (FP32-eq)	VRAM	Memory Bandwidth
NVIDIA T4G	6.65x	6.00x	7.66x
NVIDIA GeForce GTX 1050 Ti	101.03x	24.00x	21.88x
NVIDIA GeForce RTX 3090	6.08x	4.00x	2.62x
NVIDIA L40	2.39x	2.00x	2.84x
NVIDIA A100 SXM	1.38x	1.20x	1.20x
NVIDIA Quadro M4000	84.05x	12.00x	12.76x
AMD Radeon Pro V520	18.75x	3.00x	5.47x

To match 1x NVIDIA T4G

Device	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Intel Habana Gaudi 2	0.15x	—	0.17x	0.13x
NVIDIA GeForce GTX 1050 Ti	15.20x	3.79x	4.00x	2.86x
NVIDIA GeForce RTX 3090	0.92x	0.23x	0.67x	0.34x
NVIDIA L40	0.36x	0.18x	0.33x	0.37x
NVIDIA A100 SXM	0.21x	0.42x	0.20x	0.16x
NVIDIA Quadro M4000	12.65x	3.15x	2.00x	1.67x
AMD Radeon Pro V520	2.82x	0.80x	0.50x	0.71x

To match 1x NVIDIA GeForce GTX 1050 Ti

Device	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Intel Habana Gaudi 2	0.01x	—	0.04x	0.05x
NVIDIA T4G	0.07x	0.26x	0.25x	0.35x
NVIDIA GeForce RTX 3090	0.06x	0.06x	0.17x	0.12x
NVIDIA L40	0.02x	0.05x	0.08x	0.13x
NVIDIA A100 SXM	0.01x	0.11x	0.05x	0.05x
NVIDIA Quadro M4000	0.83x	0.83x	0.50x	0.58x
AMD Radeon Pro V520	0.19x	0.21x	0.13x	0.25x

To match 1x NVIDIA GeForce RTX 3090

Device	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Intel Habana Gaudi 2	0.16x	—	0.25x	0.38x
NVIDIA T4G	1.09x	4.40x	1.50x	2.92x
NVIDIA GeForce GTX 1050 Ti	16.60x	16.65x	6.00x	8.36x
NVIDIA L40	0.39x	0.79x	0.50x	1.08x
NVIDIA A100 SXM	0.23x	1.83x	0.30x	0.46x
NVIDIA Quadro M4000	13.81x	13.85x	3.00x	4.88x
AMD Radeon Pro V520	3.08x	3.52x	0.75x	2.09x

To match 1x NVIDIA L40

Device	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Intel Habana Gaudi 2	0.42x	—	0.50x	0.35x
NVIDIA T4G	2.79x	5.56x	3.00x	2.70x
NVIDIA GeForce GTX 1050 Ti	42.34x	21.05x	12.00x	7.71x
NVIDIA GeForce RTX 3090	2.55x	1.26x	2.00x	0.92x
NVIDIA A100 SXM	0.58x	2.31x	0.60x	0.42x
NVIDIA Quadro M4000	35.22x	17.51x	6.00x	4.50x
AMD Radeon Pro V520	7.86x	4.46x	1.50x	1.93x

To match 1x NVIDIA A100 SXM

Device	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Intel Habana Gaudi 2	0.72x	—	0.83x	0.83x
NVIDIA T4G	4.80x	2.41x	5.00x	6.37x
NVIDIA GeForce GTX 1050 Ti	72.97x	9.12x	20.00x	18.21x
NVIDIA GeForce RTX 3090	4.39x	0.55x	3.33x	2.18x
NVIDIA L40	1.72x	0.43x	1.67x	2.36x
NVIDIA Quadro M4000	60.70x	7.59x	10.00x	10.62x
AMD Radeon Pro V520	13.54x	1.93x	2.50x	4.55x

To match 1x NVIDIA Quadro M4000

Device	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Intel Habana Gaudi 2	0.01x	—	0.08x	0.08x
NVIDIA T4G	0.08x	0.32x	0.50x	0.60x
NVIDIA GeForce GTX 1050 Ti	1.20x	1.20x	2.00x	1.71x
NVIDIA GeForce RTX 3090	0.07x	0.07x	0.33x	0.21x
NVIDIA L40	0.03x	0.06x	0.17x	0.22x
NVIDIA A100 SXM	0.02x	0.13x	0.10x	0.09x
AMD Radeon Pro V520	0.22x	0.25x	0.25x	0.43x

To match 1x AMD Radeon Pro V520

Device	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Intel Habana Gaudi 2	0.05x	—	0.33x	0.18x
NVIDIA T4G	0.35x	1.25x	2.00x	1.40x
NVIDIA GeForce GTX 1050 Ti	5.39x	4.72x	8.00x	4.00x
NVIDIA GeForce RTX 3090	0.32x	0.28x	1.33x	0.48x
NVIDIA L40	0.13x	0.22x	0.67x	0.52x
NVIDIA A100 SXM	0.07x	0.52x	0.40x	0.22x
NVIDIA Quadro M4000	4.48x	3.93x	4.00x	2.33x

Pricing

Price Type	Gaudi 2	T4G	GeForce GTX 1050 Ti	GeForce RTX 3090	L40	A100 SXM	Quadro M4000	Radeon Pro V520
CAPEX (Street Price)	—	—	—	—	—	$15,000	—	—
OPEX (per hour)	—	$0.42/hr	$0.04/hr	$0.11/hr	$0.69/hr	$4.05/hr	$0.45/hr	$0.38/hr
Price per TFLOPs (FP32-eq)	—	—	—	—	—	$96	—	—
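The only price-per-TFLOP figure, for the A100 SXM, follows from dividing the street price by FP32-eq TFLOPs. This minimal sketch assumes the 312 TFLOPs headline (a 16-bit figure) is halved to 156 FP32-eq, the same convention the compute ratios imply. The `price_per_tflop` helper is hypothetical. Devices without a listed street price cannot be scored this way.

```python
def price_per_tflop(street_price_usd: float, fp32_eq_tflops: float) -> float:
    """CAPEX divided by FP32-equivalent TFLOPs."""
    return street_price_usd / fp32_eq_tflops


# A100 SXM: $15,000 street price; 312 TFLOPs headline (16-bit), halved to
# 156 FP32-eq under the inferred convention.
a100_price = price_per_tflop(15_000, 312 / 2)
print(f"${a100_price:.0f} per TFLOP (FP32-eq)")  # $96
```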