Compare XPUs

Select up to 5 XPUs to compare side-by-side

Select XPUs to Compare

Showing 128 XPUs • 7 selected

Vendor	Model	TFLOPs
Alibaba	Hanguang 800	—
AMD	MI100	23.1
AMD	MI210	181
AMD	MI250X	383
AMD	MI300X	1,307
AMD	MI325X	1,400
AMD	MI350X	2,100
AMD	MI355X	5,300
AMD	Radeon Pro V520	23.04
AWS	Inferentia2	190
AWS	Trainium	190
AWS	Trainium2	680
Baidu	Kunlun II	—
Biren Technology	BR100	—
Cambricon	MLU370	256
Cerebras	WSE-3	—
Enflame Technology	CloudBlazer T20	—
FuriosaAI	RNGD (Renegade)	256
FuriosaAI	Warboy	—
Google	TPU v4	275
Google	TPU v5e	197
Google	TPU v5p	459
Graphcore	Bow IPU	—
Graphcore	IPU-M2000	—
Groq	LPU Inference Engine	—
Huawei	Ascend 910B	—
Iluvatar CoreX	BI-V150	300
Intel	Data Center GPU Max 1100	177
Intel	Data Center GPU Max 1550	419
Intel Habana	Gaudi 2	432
Intel Habana	Gaudi 3	1,835

Multi-Metric Comparison

[Chart: relative performance across 5 key metrics, normalized so that 100 = best in this comparison. Axes include Compute Performance (BF16), Memory Capacity, Power Consumption, and Power Efficiency.]
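The chart's "100 = best" scaling can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: higher-is-better metrics are scaled as value ÷ best × 100, and lower-is-better metrics such as power draw as best ÷ value × 100. The page does not say how it orients Power Consumption, and `normalize` is a hypothetical helper. The inputs are the VRAM and TDP figures from the Specifications table.

```python
# Sketch of "normalized to 100 = best in comparison" scoring.
# Assumption: higher-is-better metrics scale as value / best * 100, and
# lower-is-better metrics (e.g. power draw) scale as best / value * 100.


def normalize(values: dict[str, float], higher_is_better: bool = True) -> dict[str, float]:
    """Rescale one metric so that the best device in the comparison scores 100."""
    if higher_is_better:
        best = max(values.values())
        return {name: v / best * 100 for name, v in values.items()}
    best = min(values.values())
    return {name: best / v * 100 for name, v in values.items()}


# VRAM (GB) and TDP (W) for the seven selected devices, from the Specifications table.
vram_gb = {
    "Groq LPU Inference Engine": 230,
    "NVIDIA P40": 24,
    "NVIDIA GeForce GT 710": 2,
    "NVIDIA GeForce GTX 1650": 4,
    "NVIDIA RTX A2000": 12,
    "NVIDIA GeForce RTX 3090": 24,
    "NVIDIA A100 SXM": 80,
}
tdp_w = {
    "Groq LPU Inference Engine": 300,
    "NVIDIA P40": 250,
    "NVIDIA GeForce GT 710": 19,
    "NVIDIA GeForce GTX 1650": 75,
    "NVIDIA RTX A2000": 70,
    "NVIDIA GeForce RTX 3090": 350,
    "NVIDIA A100 SXM": 400,
}

memory_scores = normalize(vram_gb)
power_scores = normalize(tdp_w, higher_is_better=False)  # lowest TDP scores 100

for name in vram_gb:
    print(f"{name:26s} memory={memory_scores[name]:6.1f} power={power_scores[name]:6.1f}")
```

Because the scores are relative, adding or removing a device from the selection can change every other device's score, even though no specification changed.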

Specifications

Specification	Groq LPU Inference Engine	NVIDIA P40	NVIDIA GeForce GT 710	NVIDIA GeForce GTX 1650	NVIDIA RTX A2000	NVIDIA GeForce RTX 3090	NVIDIA A100 SXM
Architecture	TSP (Tensor Streaming Processor)	Pascal	Kepler	Turing	Ampere	Ampere	Ampere
Form Factor	—	PCIe	PCIe	PCIe	PCIe	PCIe	SXM
VRAM	230 GB	24 GB	2 GB	4 GB	12 GB	24 GB	80 GB
Memory Bandwidth	—	346 GB/s	14.4 GB/s	128 GB/s	288 GB/s	936 GB/s	2,039 GB/s
TFLOPs (FP32)	—	12	0.366	2.984	8	35.6	19.5
TFLOPs (FP16)	—	—	—	—	—	—	312
TFLOPs	—	12	0.366	2.984	16	71	312
TFLOPs (FP8)	—	—	—	—	—	—	—
TDP	300 W	250 W	19 W	75 W	70 W	350 W	400 W
Launch Date	Feb 2024	Sep 2016	Mar 2014	Apr 2019	Oct 2021	Sep 2020	May 2020

Efficiency Metrics

Metric	LPU Inference Engine	P40	GeForce GT 710	GeForce GTX 1650	RTX A2000	GeForce RTX 3090	A100 SXM
TFLOPs per Watt (FP32-eq)	—	0.05	0.02	0.04	0.11	0.10	0.39
Memory Bandwidth per GB of VRAM	—	14.4 GB/s	7.2 GB/s	32.0 GB/s	24.0 GB/s	39.0 GB/s	25.5 GB/s
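Both efficiency rows are ratios of Specifications-table columns, and the Python sketch below reproduces them. One input is an assumption. The page never lists FP32-eq throughput. For every card except the A100, the listed FP32 figure reproduces the row. The A100's 0.39 TFLOPs/W at 400 W, however, implies about 156 TFLOPs, which matches its dense TF32 Tensor Core rating rather than its 19.5 TFLOPs FP32 figure. The 156 value below is back-calculated, not quoted by the page.

```python
# Reproduce the Efficiency Metrics rows from Specifications-table columns.
# Assumption: FP32-eq TFLOPs equals the listed FP32 figure, except for the
# A100, whose 156 TFLOPs is back-calculated from 0.39 TFLOPs/W x 400 W.

specs = {
    # name: (fp32_eq_tflops, tdp_w, vram_gb, mem_bandwidth_gb_s)
    "P40": (12, 250, 24, 346),
    "GeForce GT 710": (0.366, 19, 2, 14.4),
    "GeForce GTX 1650": (2.984, 75, 4, 128),
    "RTX A2000": (8, 70, 12, 288),
    "GeForce RTX 3090": (35.6, 350, 24, 936),
    "A100 SXM": (156, 400, 80, 2039),
}

# TFLOPs per watt = throughput / TDP; bandwidth per GB = bandwidth / VRAM.
tflops_per_watt = {name: t / w for name, (t, w, _, _) in specs.items()}
bandwidth_per_gb = {name: bw / gb for name, (_, _, gb, bw) in specs.items()}

for name in specs:
    print(f"{name:18s} {tflops_per_watt[name]:.2f} TFLOPs/W  "
          f"{bandwidth_per_gb[name]:.1f} GB/s per GB")
```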

Performance Equivalence

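How many units of each XPU are needed to match one unit of another, metric by metric? A value above 1x is the number of units required. A value below 1x means that a single unit already exceeds the target, and the inverse factor is shown next to it.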
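Each entry is a ratio of one metric between two devices. The following minimal Python sketch shows the rule the cards appear to follow. It assumes higher-is-better metrics and plain division of listed specs, and `equivalence` is a hypothetical helper. The inputs come from the Specifications table, except the A100's FP32-eq figure of 156 TFLOPs, which is back-calculated from the page's own efficiency and price rows.

```python
def equivalence(target: float, candidate: float, name: str) -> str:
    """Describe how many `candidate` units match one `target` unit on one metric.

    Assumes a higher-is-better metric (compute, VRAM, bandwidth) and perfect
    linear scaling across units, which real multi-device setups do not achieve.
    """
    ratio = target / candidate
    if ratio > 1:
        return f"{ratio:.2f}x: need {ratio:.2f}x {name}"
    if ratio == 1:
        return f"{ratio:.2f}x: {name} is equal"
    # Below 1x a single candidate already exceeds the target; report the inverse.
    return f"{ratio:.2f}x: {name} exceeds the target by {candidate / target:.2f}x"


# To match 1x NVIDIA P40 (TFLOPs, VRAM in GB, bandwidth in GB/s):
print(equivalence(12, 0.366, "GeForce GT 710"))   # 32.79x: need 32.79x GeForce GT 710
print(equivalence(12, 156, "A100 SXM"))           # 0.08x: A100 SXM exceeds the target by 13.00x
print(equivalence(24, 24, "GeForce RTX 3090"))    # 1.00x: GeForce RTX 3090 is equal
print(equivalence(346, 936, "GeForce RTX 3090"))  # 0.37x: GeForce RTX 3090 exceeds the target by 2.71x
```

The sketch reports a tie as equal instead of as "1.00x more." These spec-sheet ratios ignore interconnect and scaling overheads, so treat them as upper bounds on what multiple units can actually deliver.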

To match 1x Groq LPU Inference Engine

XPU	VRAM
NVIDIA P40	9.58x
NVIDIA GeForce GT 710	115.00x
NVIDIA GeForce GTX 1650	57.50x
NVIDIA RTX A2000	19.17x
NVIDIA GeForce RTX 3090	9.58x
NVIDIA A100 SXM	2.88x

To match 1x NVIDIA P40

XPU	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Groq LPU Inference Engine	—	—	0.10x (9.58x more)	—
NVIDIA GeForce GT 710	32.79x	32.79x	12.00x	24.03x
NVIDIA GeForce GTX 1650	4.02x	4.02x	6.00x	2.70x
NVIDIA RTX A2000	1.50x	1.50x	2.00x	1.20x
NVIDIA GeForce RTX 3090	0.34x (2.96x faster)	0.34x (2.97x faster)	1.00x (equal)	0.37x (2.71x more)
NVIDIA A100 SXM	0.08x (13.00x faster)	0.62x (1.63x faster)	0.30x (3.33x more)	0.17x (5.89x more)

To match 1x NVIDIA GeForce GT 710

XPU	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Groq LPU Inference Engine	—	—	0.01x (115.00x more)	—
NVIDIA P40	0.03x (32.79x faster)	0.03x (32.79x faster)	0.08x (12.00x more)	0.04x (24.03x more)
NVIDIA GeForce GTX 1650	0.12x (8.15x faster)	0.12x (8.15x faster)	0.50x (2.00x more)	0.11x (8.89x more)
NVIDIA RTX A2000	0.05x (21.86x faster)	0.05x (21.86x faster)	0.17x (6.00x more)	0.05x (20.00x more)
NVIDIA GeForce RTX 3090	0.01x (96.99x faster)	0.01x (97.27x faster)	0.08x (12.00x more)	0.02x (65.00x more)
NVIDIA A100 SXM	0.00x (426.23x faster)	0.02x (53.28x faster)	0.03x (40.00x more)	0.01x (141.60x more)

To match 1x NVIDIA GeForce GTX 1650

XPU	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Groq LPU Inference Engine	—	—	0.02x (57.50x more)	—
NVIDIA P40	0.25x (4.02x faster)	0.25x (4.02x faster)	0.17x (6.00x more)	0.37x (2.70x more)
NVIDIA GeForce GT 710	8.15x	8.15x	2.00x	8.89x
NVIDIA RTX A2000	0.37x (2.68x faster)	0.37x (2.68x faster)	0.33x (3.00x more)	0.44x (2.25x more)
NVIDIA GeForce RTX 3090	0.08x (11.90x faster)	0.08x (11.93x faster)	0.17x (6.00x more)	0.14x (7.31x more)
NVIDIA A100 SXM	0.02x (52.28x faster)	0.15x (6.53x faster)	0.05x (20.00x more)	0.06x (15.93x more)

To match 1x NVIDIA RTX A2000

XPU	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Groq LPU Inference Engine	—	—	0.05x (19.17x more)	—
NVIDIA P40	0.67x (1.50x faster)	0.67x (1.50x faster)	0.50x (2.00x more)	0.83x (1.20x more)
NVIDIA GeForce GT 710	21.86x	21.86x	6.00x	20.00x
NVIDIA GeForce GTX 1650	2.68x	2.68x	3.00x	2.25x
NVIDIA GeForce RTX 3090	0.23x (4.44x faster)	0.22x (4.45x faster)	0.50x (2.00x more)	0.31x (3.25x more)
NVIDIA A100 SXM	0.05x (19.50x faster)	0.41x (2.44x faster)	0.15x (6.67x more)	0.14x (7.08x more)

To match 1x NVIDIA GeForce RTX 3090

XPU	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Groq LPU Inference Engine	—	—	0.10x (9.58x more)	—
NVIDIA P40	2.96x	2.97x	1.00x (equal)	2.71x
NVIDIA GeForce GT 710	96.99x	97.27x	12.00x	65.00x
NVIDIA GeForce GTX 1650	11.90x	11.93x	6.00x	7.31x
NVIDIA RTX A2000	4.44x	4.45x	2.00x	3.25x
NVIDIA A100 SXM	0.23x (4.39x faster)	1.83x	0.30x (3.33x more)	0.46x (2.18x more)

To match 1x NVIDIA A100 SXM

XPU	Compute (FP32-eq)	FP32 Compute	VRAM	Memory Bandwidth
Groq LPU Inference Engine	—	—	0.35x (2.88x more)	—
NVIDIA P40	13.00x	1.63x	3.33x	5.89x
NVIDIA GeForce GT 710	426.23x	53.28x	40.00x	141.60x
NVIDIA GeForce GTX 1650	52.28x	6.53x	20.00x	15.93x
NVIDIA RTX A2000	19.50x	2.44x	6.67x	7.08x
NVIDIA GeForce RTX 3090	4.39x	0.55x (1.83x faster)	3.33x	2.18x

Pricing

Price Type	LPU Inference Engine	P40	GeForce GT 710	GeForce GTX 1650	RTX A2000	GeForce RTX 3090	A100 SXM
CAPEX (Street Price)	—	—	—	—	—	—	$15,000
OPEX (per hour)	—	$2.07/hr	$0.07/hr	$0.04/hr	$0.04/hr	$0.11/hr	$4.05/hr
Price per TFLOPs (FP32-eq)	—	—	—	—	—	—	$96
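The price-per-TFLOPs row divides CAPEX by FP32-eq throughput. The quick Python check below assumes an FP32-eq figure of 156 TFLOPs for the A100. The page does not list this value; it is back-calculated from the page's own efficiency row. The check shows why the row reads $96 rather than the roughly $769 that the 19.5 TFLOPs FP32 spec would give.

```python
def price_per_tflops(capex_usd: float, tflops: float) -> float:
    """Up-front hardware cost per TFLOPs of throughput."""
    return capex_usd / tflops


a100_capex = 15_000  # CAPEX (Street Price) from the Pricing table

# Assumed FP32-eq throughput (156 TFLOPs) vs. the listed FP32 spec (19.5 TFLOPs).
print(f"FP32-eq (assumed 156 TFLOPs): ${price_per_tflops(a100_capex, 156):.0f}")
print(f"FP32 (19.5 TFLOPs):           ${price_per_tflops(a100_capex, 19.5):.0f}")
```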