Compare XPUs


Vendor	XPU	TFLOPs
Alibaba	Hanguang 800	—
AMD	MI100	23.1
AMD	MI210	181
AMD	MI250X	383
AMD	MI300X	1,307
AMD	MI325X	1,400
AMD	MI350X	2,100
AMD	MI355X	5,300
AMD	Radeon Pro V520	23.04
AWS	Inferentia2	190
AWS	Trainium	190
AWS	Trainium2	680
Baidu	Kunlun II	—
Biren Technology	BR100	—
Cambricon	MLU370	256
Cerebras	WSE-3	—
Enflame Technology	CloudBlazer T20	—
FuriosaAI	RNGD (Renegade)	256
FuriosaAI	Warboy	—
Google	TPU v4	275
Google	TPU v5e	197
Google	TPU v5p	459
Graphcore	Bow IPU	—
Graphcore	IPU-M2000	—
Groq	LPU Inference Engine	—
Huawei	Ascend 910B	—
Iluvatar CoreX	BI-V150	300
Intel	Data Center GPU Max 1100	177
Intel	Data Center GPU Max 1550	419
Intel Habana	Gaudi 2	432
Intel Habana	Gaudi 3	1,835

Multi-Metric Comparison

Relative performance across 5 key metrics (normalized to 100 = best in comparison)

Chart axes: Compute Performance (BF16), Memory Capacity, Power Consumption, Power Efficiency
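The "normalized to 100 = best in comparison" scaling can be sketched as follows. This is a minimal illustration, not the page's actual implementation: the function name `normalize_to_best` and the `higher_is_better` flag are hypothetical, and treating a lower value as better for power consumption is an assumption the page does not confirm.

```python
def normalize_to_best(values, higher_is_better=True):
    """Scale each value so the best entry in the comparison scores 100.

    Entries with no published figure (None) stay None.
    """
    present = [v for v in values.values() if v is not None]
    if not present:
        return {name: None for name in values}
    best = max(present) if higher_is_better else min(present)
    scores = {}
    for name, value in values.items():
        if value is None:
            scores[name] = None
        elif higher_is_better:
            scores[name] = 100 * value / best
        else:
            # Assumed convention: the lowest value is "best" and scores 100.
            scores[name] = 100 * best / value
    return scores


# VRAM figures (GB) taken from the Specifications table.
vram_gb = {"Gaudi 2": 96, "LPU Inference Engine": 230, "T4G": 16, "A100 SXM": 80}
vram_scores = normalize_to_best(vram_gb)

# TDP figures (W) from the same table, scored with lower = better.
tdp_w = {"T4G": 70, "A100 SXM": 400}
tdp_scores = normalize_to_best(tdp_w, higher_is_better=False)
```

With these inputs the LPU Inference Engine scores 100 on memory capacity and Gaudi 2 scores about 41.7.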

Specifications

Specification	Intel Habana Gaudi 2	Groq LPU Inference Engine	NVIDIA T4G	NVIDIA GeForce GTX 1050 Ti	NVIDIA GeForce RTX 3090	NVIDIA L40	NVIDIA A100 SXM	NVIDIA RTX 4000	NVIDIA Quadro M4000
Architecture	Gaudi Gen2	TSP (Tensor Streaming Processor)	Turing	Pascal	Ampere	Ada Lovelace	Ampere	Turing	Maxwell
Form Factor	OAM	—	PCIe	PCIe	PCIe	PCIe	SXM	PCIe	PCIe
VRAM	96 GB	230 GB	16 GB	4 GB	24 GB	48 GB	80 GB	8 GB	8 GB
Memory Bandwidth	2,450 GB/s	—	320 GB/s	112 GB/s	936 GB/s	864 GB/s	2,039 GB/s	416 GB/s	192 GB/s
TFLOPs (FP32)	—	—	8.1	2.138	35.6	45	19.5	7.1	2.57
TFLOPs (FP16)	—	—	—	—	—	—	312	—	—
TFLOPs	432	—	65	2.138	71	181.05	312	57.6	2.57
TFLOPs (FP8)	—	—	—	—	—	—	—	—	—
TDP	600 W	300 W	70 W	75 W	350 W	300 W	400 W	160 W	120 W
Launch Date	May 2022	Feb 2024	May 2020	Oct 2016	Sep 2020	Oct 2022	May 2020	Nov 2018	Jun 2015

Efficiency Metrics

Metric	Gaudi 2	LPU Inference Engine	T4G	GeForce GTX 1050 Ti	GeForce RTX 3090	L40	A100 SXM	RTX 4000	Quadro M4000
TFLOPs per Watt (FP32-eq)	0.36	—	0.46	0.03	0.10	0.30	0.39	0.18	0.02
Memory Bandwidth per GB	25.5 GB/s	—	20.0 GB/s	28.0 GB/s	39.0 GB/s	18.0 GB/s	25.5 GB/s	52.0 GB/s	24.0 GB/s
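The page does not define "FP32-eq". The sketch below uses an inferred rule that happens to reproduce the figures in the table: the headline TFLOPs value is halved unless it is identical to the FP32 figure. Treat that rule, and the helper names, as assumptions rather than the tool's documented method.

```python
def fp32_equivalent(headline_tflops, fp32_tflops=None):
    """Assumed rule: halve a reduced-precision headline figure.

    If the headline figure is the FP32 figure itself, use it unchanged.
    """
    if fp32_tflops is not None and headline_tflops == fp32_tflops:
        return headline_tflops
    return headline_tflops / 2


def tflops_per_watt(headline_tflops, tdp_watts, fp32_tflops=None):
    """FP32-equivalent TFLOPs divided by TDP in watts."""
    return fp32_equivalent(headline_tflops, fp32_tflops) / tdp_watts


def bandwidth_per_gb(bandwidth_gb_s, vram_gb):
    """Memory bandwidth (GB/s) available per GB of VRAM."""
    return bandwidth_gb_s / vram_gb


# Inputs come from the Specifications table.
gaudi2_eff = round(tflops_per_watt(432, 600), 2)            # no FP32 figure listed
t4g_eff = round(tflops_per_watt(65, 70, fp32_tflops=8.1), 2)
gtx1050ti_eff = round(tflops_per_watt(2.138, 75, fp32_tflops=2.138), 2)
a100_bw_per_gb = round(bandwidth_per_gb(2039, 80), 1)
```

Under this assumption the results match the table rows for Gaudi 2 (0.36), T4G (0.46), GeForce GTX 1050 Ti (0.03), and the A100 SXM bandwidth per GB (25.5 GB/s).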

Performance Equivalence

How many units of each XPU are needed to match one unit of each of the others?
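Each entry in this section is a simple ratio: the target's value divided by the candidate's value. A ratio above 1 means several candidate units are needed, and a ratio below 1 means the candidate already exceeds the target. The sketch below reproduces the wording of the entries; the function names and the `compute` flag are hypothetical.

```python
def units_to_match(target_value, candidate_value):
    """Number of candidate units needed to equal one target unit."""
    return target_value / candidate_value


def describe(candidate, target_value, candidate_value, compute=False):
    """Return the summary line shown under each ratio."""
    ratio = units_to_match(target_value, candidate_value)
    if ratio >= 1:
        return f"Need {ratio:.2f}x {candidate}"
    # Below 1x, report how far the candidate exceeds the target instead.
    verb, noun = ("is", "faster") if compute else ("has", "more")
    return f"{candidate} {verb} {1 / ratio:.2f}x {noun}"


# VRAM: matching one Gaudi 2 (96 GB) with T4G (16 GB) or the Groq LPU (230 GB).
vram_t4g = describe("T4G", 96, 16)
vram_lpu = describe("LPU Inference Engine", 96, 230)

# Compute: matching one T4G with Gaudi 2, using the FP32-eq figures
# implied by the tables (32.5 and 216 TFLOPs, an inferred assumption).
compute_gaudi2 = describe("Gaudi 2", 32.5, 216, compute=True)
```

These calls yield "Need 6.00x T4G", "LPU Inference Engine has 2.40x more", and "Gaudi 2 is 6.65x faster", matching the corresponding entries.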

To match 1x Intel Habana Gaudi 2

Groq LPU Inference Engine

VRAM

0.42x

LPU Inference Engine has 2.40x more

NVIDIA T4G

Compute (FP32-eq)

6.65x

Need 6.65x T4G

VRAM

6.00x

Need 6.00x T4G

Memory Bandwidth

7.66x

Need 7.66x T4G

NVIDIA GeForce GTX 1050 Ti

Compute (FP32-eq)

101.03x

Need 101.03x GeForce GTX 1050 Ti

VRAM

24.00x

Need 24.00x GeForce GTX 1050 Ti

Memory Bandwidth

21.88x

Need 21.88x GeForce GTX 1050 Ti

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

6.08x

Need 6.08x GeForce RTX 3090

VRAM

4.00x

Need 4.00x GeForce RTX 3090

Memory Bandwidth

2.62x

Need 2.62x GeForce RTX 3090

NVIDIA L40

Compute (FP32-eq)

2.39x

Need 2.39x L40

VRAM

2.00x

Need 2.00x L40

Memory Bandwidth

2.84x

Need 2.84x L40

NVIDIA A100 SXM

Compute (FP32-eq)

1.38x

Need 1.38x A100 SXM

VRAM

1.20x

Need 1.20x A100 SXM

Memory Bandwidth

1.20x

Need 1.20x A100 SXM

NVIDIA RTX 4000

Compute (FP32-eq)

7.50x

Need 7.50x RTX 4000

VRAM

12.00x

Need 12.00x RTX 4000

Memory Bandwidth

5.89x

Need 5.89x RTX 4000

NVIDIA Quadro M4000

Compute (FP32-eq)

84.05x

Need 84.05x Quadro M4000

VRAM

12.00x

Need 12.00x Quadro M4000

Memory Bandwidth

12.76x

Need 12.76x Quadro M4000

To match 1x Groq LPU Inference Engine

Intel Habana Gaudi 2

VRAM

2.40x

Need 2.40x Gaudi 2

NVIDIA T4G

VRAM

14.38x

Need 14.38x T4G

NVIDIA GeForce GTX 1050 Ti

VRAM

57.50x

Need 57.50x GeForce GTX 1050 Ti

NVIDIA GeForce RTX 3090

VRAM

9.58x

Need 9.58x GeForce RTX 3090

NVIDIA L40

VRAM

4.79x

Need 4.79x L40

NVIDIA A100 SXM

VRAM

2.88x

Need 2.88x A100 SXM

NVIDIA RTX 4000

VRAM

28.75x

Need 28.75x RTX 4000

NVIDIA Quadro M4000

VRAM

28.75x

Need 28.75x Quadro M4000

To match 1x NVIDIA T4G

Intel Habana Gaudi 2

Compute (FP32-eq)

0.15x

Gaudi 2 is 6.65x faster

VRAM

0.17x

Gaudi 2 has 6.00x more

Memory Bandwidth

0.13x

Gaudi 2 has 7.66x more

Groq LPU Inference Engine

VRAM

0.07x

LPU Inference Engine has 14.38x more

NVIDIA GeForce GTX 1050 Ti

Compute (FP32-eq)

15.20x

Need 15.20x GeForce GTX 1050 Ti

FP32 Compute

3.79x

Need 3.79x GeForce GTX 1050 Ti

VRAM

4.00x

Need 4.00x GeForce GTX 1050 Ti

Memory Bandwidth

2.86x

Need 2.86x GeForce GTX 1050 Ti

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

0.92x

GeForce RTX 3090 is 1.09x faster

FP32 Compute

0.23x

GeForce RTX 3090 is 4.40x faster

VRAM

0.67x

GeForce RTX 3090 has 1.50x more

Memory Bandwidth

0.34x

GeForce RTX 3090 has 2.92x more

NVIDIA L40

Compute (FP32-eq)

0.36x

L40 is 2.79x faster

FP32 Compute

0.18x

L40 is 5.56x faster

VRAM

0.33x

L40 has 3.00x more

Memory Bandwidth

0.37x

L40 has 2.70x more

NVIDIA A100 SXM

Compute (FP32-eq)

0.21x

A100 SXM is 4.80x faster

FP32 Compute

0.42x

A100 SXM is 2.41x faster

VRAM

0.20x

A100 SXM has 5.00x more

Memory Bandwidth

0.16x

A100 SXM has 6.37x more

NVIDIA RTX 4000

Compute (FP32-eq)

1.13x

Need 1.13x RTX 4000

FP32 Compute

1.14x

Need 1.14x RTX 4000

VRAM

2.00x

Need 2.00x RTX 4000

Memory Bandwidth

0.77x

RTX 4000 has 1.30x more

NVIDIA Quadro M4000

Compute (FP32-eq)

12.65x

Need 12.65x Quadro M4000

FP32 Compute

3.15x

Need 3.15x Quadro M4000

VRAM

2.00x

Need 2.00x Quadro M4000

Memory Bandwidth

1.67x

Need 1.67x Quadro M4000

To match 1x NVIDIA GeForce GTX 1050 Ti

Intel Habana Gaudi 2

Compute (FP32-eq)

0.01x

Gaudi 2 is 101.03x faster

VRAM

0.04x

Gaudi 2 has 24.00x more

Memory Bandwidth

0.05x

Gaudi 2 has 21.88x more

Groq LPU Inference Engine

VRAM

0.02x

LPU Inference Engine has 57.50x more

NVIDIA T4G

Compute (FP32-eq)

0.07x

T4G is 15.20x faster

FP32 Compute

0.26x

T4G is 3.79x faster

VRAM

0.25x

T4G has 4.00x more

Memory Bandwidth

0.35x

T4G has 2.86x more

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

0.06x

GeForce RTX 3090 is 16.60x faster

FP32 Compute

0.06x

GeForce RTX 3090 is 16.65x faster

VRAM

0.17x

GeForce RTX 3090 has 6.00x more

Memory Bandwidth

0.12x

GeForce RTX 3090 has 8.36x more

NVIDIA L40

Compute (FP32-eq)

0.02x

L40 is 42.34x faster

FP32 Compute

0.05x

L40 is 21.05x faster

VRAM

0.08x

L40 has 12.00x more

Memory Bandwidth

0.13x

L40 has 7.71x more

NVIDIA A100 SXM

Compute (FP32-eq)

0.01x

A100 SXM is 72.97x faster

FP32 Compute

0.11x

A100 SXM is 9.12x faster

VRAM

0.05x

A100 SXM has 20.00x more

Memory Bandwidth

0.05x

A100 SXM has 18.21x more

NVIDIA RTX 4000

Compute (FP32-eq)

0.07x

RTX 4000 is 13.47x faster

FP32 Compute

0.30x

RTX 4000 is 3.32x faster

VRAM

0.50x

RTX 4000 has 2.00x more

Memory Bandwidth

0.27x

RTX 4000 has 3.71x more

NVIDIA Quadro M4000

Compute (FP32-eq)

0.83x

Quadro M4000 is 1.20x faster

FP32 Compute

0.83x

Quadro M4000 is 1.20x faster

VRAM

0.50x

Quadro M4000 has 2.00x more

Memory Bandwidth

0.58x

Quadro M4000 has 1.71x more

To match 1x NVIDIA GeForce RTX 3090

Intel Habana Gaudi 2

Compute (FP32-eq)

0.16x

Gaudi 2 is 6.08x faster

VRAM

0.25x

Gaudi 2 has 4.00x more

Memory Bandwidth

0.38x

Gaudi 2 has 2.62x more

Groq LPU Inference Engine

VRAM

0.10x

LPU Inference Engine has 9.58x more

NVIDIA T4G

Compute (FP32-eq)

1.09x

Need 1.09x T4G

FP32 Compute

4.40x

Need 4.40x T4G

VRAM

1.50x

Need 1.50x T4G

Memory Bandwidth

2.92x

Need 2.92x T4G

NVIDIA GeForce GTX 1050 Ti

Compute (FP32-eq)

16.60x

Need 16.60x GeForce GTX 1050 Ti

FP32 Compute

16.65x

Need 16.65x GeForce GTX 1050 Ti

VRAM

6.00x

Need 6.00x GeForce GTX 1050 Ti

Memory Bandwidth

8.36x

Need 8.36x GeForce GTX 1050 Ti

NVIDIA L40

Compute (FP32-eq)

0.39x

L40 is 2.55x faster

FP32 Compute

0.79x

L40 is 1.26x faster

VRAM

0.50x

L40 has 2.00x more

Memory Bandwidth

1.08x

Need 1.08x L40

NVIDIA A100 SXM

Compute (FP32-eq)

0.23x

A100 SXM is 4.39x faster

FP32 Compute

1.83x

Need 1.83x A100 SXM

VRAM

0.30x

A100 SXM has 3.33x more

Memory Bandwidth

0.46x

A100 SXM has 2.18x more

NVIDIA RTX 4000

Compute (FP32-eq)

1.23x

Need 1.23x RTX 4000

FP32 Compute

5.01x

Need 5.01x RTX 4000

VRAM

3.00x

Need 3.00x RTX 4000

Memory Bandwidth

2.25x

Need 2.25x RTX 4000

NVIDIA Quadro M4000

Compute (FP32-eq)

13.81x

Need 13.81x Quadro M4000

FP32 Compute

13.85x

Need 13.85x Quadro M4000

VRAM

3.00x

Need 3.00x Quadro M4000

Memory Bandwidth

4.88x

Need 4.88x Quadro M4000

To match 1x NVIDIA L40

Intel Habana Gaudi 2

Compute (FP32-eq)

0.42x

Gaudi 2 is 2.39x faster

VRAM

0.50x

Gaudi 2 has 2.00x more

Memory Bandwidth

0.35x

Gaudi 2 has 2.84x more

Groq LPU Inference Engine

VRAM

0.21x

LPU Inference Engine has 4.79x more

NVIDIA T4G

Compute (FP32-eq)

2.79x

Need 2.79x T4G

FP32 Compute

5.56x

Need 5.56x T4G

VRAM

3.00x

Need 3.00x T4G

Memory Bandwidth

2.70x

Need 2.70x T4G

NVIDIA GeForce GTX 1050 Ti

Compute (FP32-eq)

42.34x

Need 42.34x GeForce GTX 1050 Ti

FP32 Compute

21.05x

Need 21.05x GeForce GTX 1050 Ti

VRAM

12.00x

Need 12.00x GeForce GTX 1050 Ti

Memory Bandwidth

7.71x

Need 7.71x GeForce GTX 1050 Ti

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

2.55x

Need 2.55x GeForce RTX 3090

FP32 Compute

1.26x

Need 1.26x GeForce RTX 3090

VRAM

2.00x

Need 2.00x GeForce RTX 3090

Memory Bandwidth

0.92x

GeForce RTX 3090 has 1.08x more

NVIDIA A100 SXM

Compute (FP32-eq)

0.58x

A100 SXM is 1.72x faster

FP32 Compute

2.31x

Need 2.31x A100 SXM

VRAM

0.60x

A100 SXM has 1.67x more

Memory Bandwidth

0.42x

A100 SXM has 2.36x more

NVIDIA RTX 4000

Compute (FP32-eq)

3.14x

Need 3.14x RTX 4000

FP32 Compute

6.34x

Need 6.34x RTX 4000

VRAM

6.00x

Need 6.00x RTX 4000

Memory Bandwidth

2.08x

Need 2.08x RTX 4000

NVIDIA Quadro M4000

Compute (FP32-eq)

35.22x

Need 35.22x Quadro M4000

FP32 Compute

17.51x

Need 17.51x Quadro M4000

VRAM

6.00x

Need 6.00x Quadro M4000

Memory Bandwidth

4.50x

Need 4.50x Quadro M4000

To match 1x NVIDIA A100 SXM

Intel Habana Gaudi 2

Compute (FP32-eq)

0.72x

Gaudi 2 is 1.38x faster

VRAM

0.83x

Gaudi 2 has 1.20x more

Memory Bandwidth

0.83x

Gaudi 2 has 1.20x more

Groq LPU Inference Engine

VRAM

0.35x

LPU Inference Engine has 2.88x more

NVIDIA T4G

Compute (FP32-eq)

4.80x

Need 4.80x T4G

FP32 Compute

2.41x

Need 2.41x T4G

VRAM

5.00x

Need 5.00x T4G

Memory Bandwidth

6.37x

Need 6.37x T4G

NVIDIA GeForce GTX 1050 Ti

Compute (FP32-eq)

72.97x

Need 72.97x GeForce GTX 1050 Ti

FP32 Compute

9.12x

Need 9.12x GeForce GTX 1050 Ti

VRAM

20.00x

Need 20.00x GeForce GTX 1050 Ti

Memory Bandwidth

18.21x

Need 18.21x GeForce GTX 1050 Ti

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

4.39x

Need 4.39x GeForce RTX 3090

FP32 Compute

0.55x

GeForce RTX 3090 is 1.83x faster

VRAM

3.33x

Need 3.33x GeForce RTX 3090

Memory Bandwidth

2.18x

Need 2.18x GeForce RTX 3090

NVIDIA L40

Compute (FP32-eq)

1.72x

Need 1.72x L40

FP32 Compute

0.43x

L40 is 2.31x faster

VRAM

1.67x

Need 1.67x L40

Memory Bandwidth

2.36x

Need 2.36x L40

NVIDIA RTX 4000

Compute (FP32-eq)

5.42x

Need 5.42x RTX 4000

FP32 Compute

2.75x

Need 2.75x RTX 4000

VRAM

10.00x

Need 10.00x RTX 4000

Memory Bandwidth

4.90x

Need 4.90x RTX 4000

NVIDIA Quadro M4000

Compute (FP32-eq)

60.70x

Need 60.70x Quadro M4000

FP32 Compute

7.59x

Need 7.59x Quadro M4000

VRAM

10.00x

Need 10.00x Quadro M4000

Memory Bandwidth

10.62x

Need 10.62x Quadro M4000

To match 1x NVIDIA RTX 4000

Intel Habana Gaudi 2

Compute (FP32-eq)

0.13x

Gaudi 2 is 7.50x faster

VRAM

0.08x

Gaudi 2 has 12.00x more

Memory Bandwidth

0.17x

Gaudi 2 has 5.89x more

Groq LPU Inference Engine

VRAM

0.03x

LPU Inference Engine has 28.75x more

NVIDIA T4G

Compute (FP32-eq)

0.89x

T4G is 1.13x faster

FP32 Compute

0.88x

T4G is 1.14x faster

VRAM

0.50x

T4G has 2.00x more

Memory Bandwidth

1.30x

Need 1.30x T4G

NVIDIA GeForce GTX 1050 Ti

Compute (FP32-eq)

13.47x

Need 13.47x GeForce GTX 1050 Ti

FP32 Compute

3.32x

Need 3.32x GeForce GTX 1050 Ti

VRAM

2.00x

Need 2.00x GeForce GTX 1050 Ti

Memory Bandwidth

3.71x

Need 3.71x GeForce GTX 1050 Ti

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

0.81x

GeForce RTX 3090 is 1.23x faster

FP32 Compute

0.20x

GeForce RTX 3090 is 5.01x faster

VRAM

0.33x

GeForce RTX 3090 has 3.00x more

Memory Bandwidth

0.44x

GeForce RTX 3090 has 2.25x more

NVIDIA L40

Compute (FP32-eq)

0.32x

L40 is 3.14x faster

FP32 Compute

0.16x

L40 is 6.34x faster

VRAM

0.17x

L40 has 6.00x more

Memory Bandwidth

0.48x

L40 has 2.08x more

NVIDIA A100 SXM

Compute (FP32-eq)

0.18x

A100 SXM is 5.42x faster

FP32 Compute

0.36x

A100 SXM is 2.75x faster

VRAM

0.10x

A100 SXM has 10.00x more

Memory Bandwidth

0.20x

A100 SXM has 4.90x more

NVIDIA Quadro M4000

Compute (FP32-eq)

11.21x

Need 11.21x Quadro M4000

FP32 Compute

2.76x

Need 2.76x Quadro M4000

VRAM

1.00x

Quadro M4000 has the same amount

Memory Bandwidth

2.17x

Need 2.17x Quadro M4000

To match 1x NVIDIA Quadro M4000

Intel Habana Gaudi 2

Compute (FP32-eq)

0.01x

Gaudi 2 is 84.05x faster

VRAM

0.08x

Gaudi 2 has 12.00x more

Memory Bandwidth

0.08x

Gaudi 2 has 12.76x more

Groq LPU Inference Engine

VRAM

0.03x

LPU Inference Engine has 28.75x more

NVIDIA T4G

Compute (FP32-eq)

0.08x

T4G is 12.65x faster

FP32 Compute

0.32x

T4G is 3.15x faster

VRAM

0.50x

T4G has 2.00x more

Memory Bandwidth

0.60x

T4G has 1.67x more

NVIDIA GeForce GTX 1050 Ti

Compute (FP32-eq)

1.20x

Need 1.20x GeForce GTX 1050 Ti

FP32 Compute

1.20x

Need 1.20x GeForce GTX 1050 Ti

VRAM

2.00x

Need 2.00x GeForce GTX 1050 Ti

Memory Bandwidth

1.71x

Need 1.71x GeForce GTX 1050 Ti

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

0.07x

GeForce RTX 3090 is 13.81x faster

FP32 Compute

0.07x

GeForce RTX 3090 is 13.85x faster

VRAM

0.33x

GeForce RTX 3090 has 3.00x more

Memory Bandwidth

0.21x

GeForce RTX 3090 has 4.88x more

NVIDIA L40

Compute (FP32-eq)

0.03x

L40 is 35.22x faster

FP32 Compute

0.06x

L40 is 17.51x faster

VRAM

0.17x

L40 has 6.00x more

Memory Bandwidth

0.22x

L40 has 4.50x more

NVIDIA A100 SXM

Compute (FP32-eq)

0.02x

A100 SXM is 60.70x faster

FP32 Compute

0.13x

A100 SXM is 7.59x faster

VRAM

0.10x

A100 SXM has 10.00x more

Memory Bandwidth

0.09x

A100 SXM has 10.62x more

NVIDIA RTX 4000

Compute (FP32-eq)

0.09x

RTX 4000 is 11.21x faster

FP32 Compute

0.36x

RTX 4000 is 2.76x faster

VRAM

1.00x

RTX 4000 has the same amount

Memory Bandwidth

0.46x

RTX 4000 has 2.17x more

Pricing

Price Type	Gaudi 2	LPU Inference Engine	T4G	GeForce GTX 1050 Ti	GeForce RTX 3090	L40	A100 SXM	RTX 4000	Quadro M4000
CAPEX (Street Price)	—	—	—	—	—	—	$15,000	—	—
OPEX (per hour)	—	—	$0.42/hr	$0.04/hr	$0.11/hr	$0.69/hr	$4.05/hr	$0.34/hr	$0.45/hr
Price per TFLOPs (FP32-eq)	—	—	—	—	—	—	$96	—	—
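The one populated "Price per TFLOPs (FP32-eq)" cell can be reproduced from the tables above. This sketch assumes the FP32-eq figure for the A100 SXM is half its 312 TFLOPs headline value (156 TFLOPs), which is inferred from the efficiency table rather than stated by the page; the function name is hypothetical.

```python
def price_per_tflops(street_price_usd, fp32_eq_tflops):
    """Street price in USD divided by FP32-equivalent TFLOPs."""
    return street_price_usd / fp32_eq_tflops


# A100 SXM: $15,000 street price, 312 TFLOPs headline figure.
a100_fp32_eq = 312 / 2  # assumed FP32-eq conversion
a100_price_per_tflops = round(price_per_tflops(15_000, a100_fp32_eq))
```

The result rounds to $96, which agrees with the table.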