Compare XPUs

Select up to 5 XPUs to compare side-by-side


Vendor	Model	TFLOPs
Alibaba	Hanguang 800	—
AMD	MI100	23.1
AMD	MI210	181
AMD	MI250X	383
AMD	MI300X	1,307
AMD	MI325X	1,400
AMD	MI350X	2,100
AMD	MI355X	5,300
AMD	Radeon Pro V520	23.04
AWS	Inferentia2	190
AWS	Trainium	190
AWS	Trainium2	680
Baidu	Kunlun II	—
Biren Technology	BR100	—
Cambricon	MLU370	256
Cerebras	WSE-3	—
Enflame Technology	CloudBlazer T20	—
FuriosaAI	RNGD (Renegade)	256
FuriosaAI	Warboy	—
Google	TPU v4	275
Google	TPU v5e	197
Google	TPU v5p	459
Graphcore	Bow IPU	—
Graphcore	IPU-M2000	—
Groq	LPU Inference Engine	—
Huawei	Ascend 910B	—
Iluvatar CoreX	BI-V150	300
Intel	Data Center GPU Max 1100	177
Intel	Data Center GPU Max 1550	419
Intel Habana	Gaudi 2	432
Intel Habana	Gaudi 3	1,835

Multi-Metric Comparison

[Chart: relative performance across key metrics, normalized so that 100 = best in comparison. Axes shown: Compute Performance (BF16), Memory Capacity, Power Consumption, Power Efficiency.]
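The page does not spell out how the chart values are normalized. A minimal sketch, assuming each metric is scaled linearly so that the best value among the selected XPUs maps to 100. The function name `normalize_to_best` and the `lower_is_better` flag are hypothetical, and treating lower power consumption as "best" is an assumption rather than something the page states:

```python
def normalize_to_best(values, lower_is_better=False):
    """Scale a list of metric values so the best entry scores 100."""
    if lower_is_better:
        # Assumption: for metrics such as power draw, the smallest value is "best".
        best = min(values)
        return [100.0 * best / v for v in values]
    best = max(values)
    return [100.0 * v / best for v in values]


# VRAM in GB, in the column order of the Specifications table.
vram_gb = [32, 16, 11, 12, 24, 48, 80]
print([round(score, 1) for score in normalize_to_best(vram_gb)])

# TDP in watts, same column order, scored with lower-is-better.
tdp_w = [150, 300, 250, 285, 350, 300, 400]
print([round(score, 1) for score in normalize_to_best(tdp_w, lower_is_better=True)])
```

With this scaling the A100 SXM (80 GB) scores 100 on memory capacity and every other XPU is expressed as a percentage of it.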

Specifications

Specification	AWS Inferentia2	NVIDIA Quadro M60	NVIDIA GeForce GTX 1080 Ti	NVIDIA GeForce RTX 4070 Ti	NVIDIA GeForce RTX 3090	NVIDIA L40	NVIDIA A100 SXM
Architecture	Inferentia Gen2	Maxwell	Pascal	Ada Lovelace	Ampere	Ada Lovelace	Ampere
Form Factor	—	PCIe	PCIe	PCIe	PCIe	PCIe	SXM
VRAM	32 GB	16 GB	11 GB	12 GB	24 GB	48 GB	80 GB
Memory Bandwidth	—	320 GB/s	484 GB/s	504 GB/s	936 GB/s	864 GB/s	2,039 GB/s
TFLOPs (FP32)	—	4.825	11.34	40.1	35.6	45	19.5
TFLOPs (FP16)	—	—	—	—	—	—	312
TFLOPs	190	4.825	11.34	80.16	71	181.05	312
TFLOPs (FP8)	—	—	—	—	—	—	—
TDP	150 W	300 W	250 W	285 W	350 W	300 W	400 W
Launch Date	Nov 2022	Mar 2016	Mar 2017	Jan 2023	Sep 2020	Oct 2022	May 2020

Efficiency Metrics

Metric	Inferentia2	Quadro M60	GeForce GTX 1080 Ti	GeForce RTX 4070 Ti	GeForce RTX 3090	L40	A100 SXM
TFLOPs per Watt (FP32-eq)	0.63	0.02	0.05	0.14	0.10	0.30	0.39
Memory Bandwidth per GB of VRAM	—	20.0 GB/s	44.0 GB/s	42.0 GB/s	39.0 GB/s	18.0 GB/s	25.5 GB/s
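Both efficiency rows are simple ratios of values in the Specifications table. The sketch below reproduces the A100 SXM and Quadro M60 entries. The page does not define "FP32-eq"; the published figures are consistent with taking half of the unlabelled TFLOPs row for Inferentia2, RTX 4070 Ti, RTX 3090, L40 and A100 SXM, and the FP32 row for Quadro M60 and GTX 1080 Ti, but that mapping is inferred from the numbers and should be treated as an assumption. The function names are hypothetical.

```python
def tflops_per_watt(fp32_eq_tflops, tdp_watts):
    """FP32-equivalent TFLOPs divided by TDP; None if either input is missing."""
    if fp32_eq_tflops is None or tdp_watts is None:
        return None
    return fp32_eq_tflops / tdp_watts


def bandwidth_per_gb(bandwidth_gb_s, vram_gb):
    """Memory bandwidth (GB/s) divided by VRAM capacity (GB); None if missing."""
    if bandwidth_gb_s is None or vram_gb is None:
        return None
    return bandwidth_gb_s / vram_gb


# A100 SXM: 312 TFLOPs halved (assumed FP32-eq = 156), 400 W TDP, 2,039 GB/s, 80 GB.
print(round(tflops_per_watt(312 / 2, 400), 2))
print(round(bandwidth_per_gb(2039, 80), 1))

# Quadro M60: 4.825 TFLOPs FP32, 300 W TDP, 320 GB/s, 16 GB.
print(round(tflops_per_watt(4.825, 300), 2))
print(round(bandwidth_per_gb(320, 16), 1))

# Inferentia2 lists no memory bandwidth, so the ratio stays undefined.
print(bandwidth_per_gb(None, 32))
```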

Performance Equivalence

How many units of each XPU are needed to match the performance of the others?
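Each equivalence figure is the target's value divided by the candidate's value for the same metric. A minimal sketch of that arithmetic and of the caption wording, assuming a ratio of 1 or more is reported as "Need Nx" and anything below 1 is reported as the inverse ratio. The function names, the `compute` flag and the handling of a ratio of exactly 1 are hypothetical; the FP32-eq inputs (95 for Inferentia2, 156 for A100 SXM) are inferred as half of the unlabelled TFLOPs row, which the page does not state.

```python
def units_needed(target_value, candidate_value):
    """How many candidate units it takes to match one target unit."""
    return target_value / candidate_value


def describe(candidate_name, ratio, compute=True):
    """Render the ratio the way the comparison captions are worded."""
    if ratio >= 1:
        return f"Need {ratio:.2f}x {candidate_name}"
    inverse = 1 / ratio
    if compute:
        return f"{candidate_name} is {inverse:.2f}x faster"
    return f"{candidate_name} has {inverse:.2f}x more"


# Target: 1x AWS Inferentia2 (assumed FP32-eq 95 TFLOPs, 32 GB VRAM).
print(describe("Quadro M60", units_needed(95, 4.825)))
print(describe("Quadro M60", units_needed(32, 16), compute=False))
print(describe("A100 SXM", units_needed(95, 156)))
print(describe("A100 SXM", units_needed(32, 80), compute=False))
```

Because each ratio is computed from one metric at a time, the same pair of XPUs can need many units on compute and less than one on VRAM.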

To match 1x AWS Inferentia2

XPU	Metric	Units needed	Result
NVIDIA Quadro M60	Compute (FP32-eq)	19.69x	Need 19.69x Quadro M60
NVIDIA Quadro M60	VRAM	2.00x	Need 2.00x Quadro M60
NVIDIA GeForce GTX 1080 Ti	Compute (FP32-eq)	8.38x	Need 8.38x GeForce GTX 1080 Ti
NVIDIA GeForce GTX 1080 Ti	VRAM	2.91x	Need 2.91x GeForce GTX 1080 Ti
NVIDIA GeForce RTX 4070 Ti	Compute (FP32-eq)	2.37x	Need 2.37x GeForce RTX 4070 Ti
NVIDIA GeForce RTX 4070 Ti	VRAM	2.67x	Need 2.67x GeForce RTX 4070 Ti
NVIDIA GeForce RTX 3090	Compute (FP32-eq)	2.68x	Need 2.68x GeForce RTX 3090
NVIDIA GeForce RTX 3090	VRAM	1.33x	Need 1.33x GeForce RTX 3090
NVIDIA L40	Compute (FP32-eq)	1.05x	Need 1.05x L40
NVIDIA L40	VRAM	0.67x	L40 has 1.50x more
NVIDIA A100 SXM	Compute (FP32-eq)	0.61x	A100 SXM is 1.64x faster
NVIDIA A100 SXM	VRAM	0.40x	A100 SXM has 2.50x more

To match 1x NVIDIA Quadro M60

XPU	Metric	Units needed	Result
AWS Inferentia2	Compute (FP32-eq)	0.05x	Inferentia2 is 19.69x faster
AWS Inferentia2	VRAM	0.50x	Inferentia2 has 2.00x more
NVIDIA GeForce GTX 1080 Ti	Compute (FP32-eq)	0.43x	GeForce GTX 1080 Ti is 2.35x faster
NVIDIA GeForce GTX 1080 Ti	FP32 Compute	0.43x	GeForce GTX 1080 Ti is 2.35x faster
NVIDIA GeForce GTX 1080 Ti	VRAM	1.45x	Need 1.45x GeForce GTX 1080 Ti
NVIDIA GeForce GTX 1080 Ti	Memory Bandwidth	0.66x	GeForce GTX 1080 Ti has 1.51x more
NVIDIA GeForce RTX 4070 Ti	Compute (FP32-eq)	0.12x	GeForce RTX 4070 Ti is 8.31x faster
NVIDIA GeForce RTX 4070 Ti	FP32 Compute	0.12x	GeForce RTX 4070 Ti is 8.31x faster
NVIDIA GeForce RTX 4070 Ti	VRAM	1.33x	Need 1.33x GeForce RTX 4070 Ti
NVIDIA GeForce RTX 4070 Ti	Memory Bandwidth	0.63x	GeForce RTX 4070 Ti has 1.58x more
NVIDIA GeForce RTX 3090	Compute (FP32-eq)	0.14x	GeForce RTX 3090 is 7.36x faster
NVIDIA GeForce RTX 3090	FP32 Compute	0.14x	GeForce RTX 3090 is 7.38x faster
NVIDIA GeForce RTX 3090	VRAM	0.67x	GeForce RTX 3090 has 1.50x more
NVIDIA GeForce RTX 3090	Memory Bandwidth	0.34x	GeForce RTX 3090 has 2.92x more
NVIDIA L40	Compute (FP32-eq)	0.05x	L40 is 18.76x faster
NVIDIA L40	FP32 Compute	0.11x	L40 is 9.33x faster
NVIDIA L40	VRAM	0.33x	L40 has 3.00x more
NVIDIA L40	Memory Bandwidth	0.37x	L40 has 2.70x more
NVIDIA A100 SXM	Compute (FP32-eq)	0.03x	A100 SXM is 32.33x faster
NVIDIA A100 SXM	FP32 Compute	0.25x	A100 SXM is 4.04x faster
NVIDIA A100 SXM	VRAM	0.20x	A100 SXM has 5.00x more
NVIDIA A100 SXM	Memory Bandwidth	0.16x	A100 SXM has 6.37x more

To match 1x NVIDIA GeForce GTX 1080 Ti

XPU	Metric	Units needed	Result
AWS Inferentia2	Compute (FP32-eq)	0.12x	Inferentia2 is 8.38x faster
AWS Inferentia2	VRAM	0.34x	Inferentia2 has 2.91x more
NVIDIA Quadro M60	Compute (FP32-eq)	2.35x	Need 2.35x Quadro M60
NVIDIA Quadro M60	FP32 Compute	2.35x	Need 2.35x Quadro M60
NVIDIA Quadro M60	VRAM	0.69x	Quadro M60 has 1.45x more
NVIDIA Quadro M60	Memory Bandwidth	1.51x	Need 1.51x Quadro M60
NVIDIA GeForce RTX 4070 Ti	Compute (FP32-eq)	0.28x	GeForce RTX 4070 Ti is 3.53x faster
NVIDIA GeForce RTX 4070 Ti	FP32 Compute	0.28x	GeForce RTX 4070 Ti is 3.54x faster
NVIDIA GeForce RTX 4070 Ti	VRAM	0.92x	GeForce RTX 4070 Ti has 1.09x more
NVIDIA GeForce RTX 4070 Ti	Memory Bandwidth	0.96x	GeForce RTX 4070 Ti has 1.04x more
NVIDIA GeForce RTX 3090	Compute (FP32-eq)	0.32x	GeForce RTX 3090 is 3.13x faster
NVIDIA GeForce RTX 3090	FP32 Compute	0.32x	GeForce RTX 3090 is 3.14x faster
NVIDIA GeForce RTX 3090	VRAM	0.46x	GeForce RTX 3090 has 2.18x more
NVIDIA GeForce RTX 3090	Memory Bandwidth	0.52x	GeForce RTX 3090 has 1.93x more
NVIDIA L40	Compute (FP32-eq)	0.13x	L40 is 7.98x faster
NVIDIA L40	FP32 Compute	0.25x	L40 is 3.97x faster
NVIDIA L40	VRAM	0.23x	L40 has 4.36x more
NVIDIA L40	Memory Bandwidth	0.56x	L40 has 1.79x more
NVIDIA A100 SXM	Compute (FP32-eq)	0.07x	A100 SXM is 13.76x faster
NVIDIA A100 SXM	FP32 Compute	0.58x	A100 SXM is 1.72x faster
NVIDIA A100 SXM	VRAM	0.14x	A100 SXM has 7.27x more
NVIDIA A100 SXM	Memory Bandwidth	0.24x	A100 SXM has 4.21x more

To match 1x NVIDIA GeForce RTX 4070 Ti

XPU	Metric	Units needed	Result
AWS Inferentia2	Compute (FP32-eq)	0.42x	Inferentia2 is 2.37x faster
AWS Inferentia2	VRAM	0.38x	Inferentia2 has 2.67x more
NVIDIA Quadro M60	Compute (FP32-eq)	8.31x	Need 8.31x Quadro M60
NVIDIA Quadro M60	FP32 Compute	8.31x	Need 8.31x Quadro M60
NVIDIA Quadro M60	VRAM	0.75x	Quadro M60 has 1.33x more
NVIDIA Quadro M60	Memory Bandwidth	1.57x	Need 1.57x Quadro M60
NVIDIA GeForce GTX 1080 Ti	Compute (FP32-eq)	3.53x	Need 3.53x GeForce GTX 1080 Ti
NVIDIA GeForce GTX 1080 Ti	FP32 Compute	3.54x	Need 3.54x GeForce GTX 1080 Ti
NVIDIA GeForce GTX 1080 Ti	VRAM	1.09x	Need 1.09x GeForce GTX 1080 Ti
NVIDIA GeForce GTX 1080 Ti	Memory Bandwidth	1.04x	Need 1.04x GeForce GTX 1080 Ti
NVIDIA GeForce RTX 3090	Compute (FP32-eq)	1.13x	Need 1.13x GeForce RTX 3090
NVIDIA GeForce RTX 3090	FP32 Compute	1.13x	Need 1.13x GeForce RTX 3090
NVIDIA GeForce RTX 3090	VRAM	0.50x	GeForce RTX 3090 has 2.00x more
NVIDIA GeForce RTX 3090	Memory Bandwidth	0.54x	GeForce RTX 3090 has 1.86x more
NVIDIA L40	Compute (FP32-eq)	0.44x	L40 is 2.26x faster
NVIDIA L40	FP32 Compute	0.89x	L40 is 1.12x faster
NVIDIA L40	VRAM	0.25x	L40 has 4.00x more
NVIDIA L40	Memory Bandwidth	0.58x	L40 has 1.71x more
NVIDIA A100 SXM	Compute (FP32-eq)	0.26x	A100 SXM is 3.89x faster
NVIDIA A100 SXM	FP32 Compute	2.06x	Need 2.06x A100 SXM
NVIDIA A100 SXM	VRAM	0.15x	A100 SXM has 6.67x more
NVIDIA A100 SXM	Memory Bandwidth	0.25x	A100 SXM has 4.05x more

To match 1x NVIDIA GeForce RTX 3090

XPU	Metric	Units needed	Result
AWS Inferentia2	Compute (FP32-eq)	0.37x	Inferentia2 is 2.68x faster
AWS Inferentia2	VRAM	0.75x	Inferentia2 has 1.33x more
NVIDIA Quadro M60	Compute (FP32-eq)	7.36x	Need 7.36x Quadro M60
NVIDIA Quadro M60	FP32 Compute	7.38x	Need 7.38x Quadro M60
NVIDIA Quadro M60	VRAM	1.50x	Need 1.50x Quadro M60
NVIDIA Quadro M60	Memory Bandwidth	2.92x	Need 2.92x Quadro M60
NVIDIA GeForce GTX 1080 Ti	Compute (FP32-eq)	3.13x	Need 3.13x GeForce GTX 1080 Ti
NVIDIA GeForce GTX 1080 Ti	FP32 Compute	3.14x	Need 3.14x GeForce GTX 1080 Ti
NVIDIA GeForce GTX 1080 Ti	VRAM	2.18x	Need 2.18x GeForce GTX 1080 Ti
NVIDIA GeForce GTX 1080 Ti	Memory Bandwidth	1.93x	Need 1.93x GeForce GTX 1080 Ti
NVIDIA GeForce RTX 4070 Ti	Compute (FP32-eq)	0.89x	GeForce RTX 4070 Ti is 1.13x faster
NVIDIA GeForce RTX 4070 Ti	FP32 Compute	0.89x	GeForce RTX 4070 Ti is 1.13x faster
NVIDIA GeForce RTX 4070 Ti	VRAM	2.00x	Need 2.00x GeForce RTX 4070 Ti
NVIDIA GeForce RTX 4070 Ti	Memory Bandwidth	1.86x	Need 1.86x GeForce RTX 4070 Ti
NVIDIA L40	Compute (FP32-eq)	0.39x	L40 is 2.55x faster
NVIDIA L40	FP32 Compute	0.79x	L40 is 1.26x faster
NVIDIA L40	VRAM	0.50x	L40 has 2.00x more
NVIDIA L40	Memory Bandwidth	1.08x	Need 1.08x L40
NVIDIA A100 SXM	Compute (FP32-eq)	0.23x	A100 SXM is 4.39x faster
NVIDIA A100 SXM	FP32 Compute	1.83x	Need 1.83x A100 SXM
NVIDIA A100 SXM	VRAM	0.30x	A100 SXM has 3.33x more
NVIDIA A100 SXM	Memory Bandwidth	0.46x	A100 SXM has 2.18x more

To match 1x NVIDIA L40

XPU	Metric	Units needed	Result
AWS Inferentia2	Compute (FP32-eq)	0.95x	Inferentia2 is 1.05x faster
AWS Inferentia2	VRAM	1.50x	Need 1.50x Inferentia2
NVIDIA Quadro M60	Compute (FP32-eq)	18.76x	Need 18.76x Quadro M60
NVIDIA Quadro M60	FP32 Compute	9.33x	Need 9.33x Quadro M60
NVIDIA Quadro M60	VRAM	3.00x	Need 3.00x Quadro M60
NVIDIA Quadro M60	Memory Bandwidth	2.70x	Need 2.70x Quadro M60
NVIDIA GeForce GTX 1080 Ti	Compute (FP32-eq)	7.98x	Need 7.98x GeForce GTX 1080 Ti
NVIDIA GeForce GTX 1080 Ti	FP32 Compute	3.97x	Need 3.97x GeForce GTX 1080 Ti
NVIDIA GeForce GTX 1080 Ti	VRAM	4.36x	Need 4.36x GeForce GTX 1080 Ti
NVIDIA GeForce GTX 1080 Ti	Memory Bandwidth	1.79x	Need 1.79x GeForce GTX 1080 Ti
NVIDIA GeForce RTX 4070 Ti	Compute (FP32-eq)	2.26x	Need 2.26x GeForce RTX 4070 Ti
NVIDIA GeForce RTX 4070 Ti	FP32 Compute	1.12x	Need 1.12x GeForce RTX 4070 Ti
NVIDIA GeForce RTX 4070 Ti	VRAM	4.00x	Need 4.00x GeForce RTX 4070 Ti
NVIDIA GeForce RTX 4070 Ti	Memory Bandwidth	1.71x	Need 1.71x GeForce RTX 4070 Ti
NVIDIA GeForce RTX 3090	Compute (FP32-eq)	2.55x	Need 2.55x GeForce RTX 3090
NVIDIA GeForce RTX 3090	FP32 Compute	1.26x	Need 1.26x GeForce RTX 3090
NVIDIA GeForce RTX 3090	VRAM	2.00x	Need 2.00x GeForce RTX 3090
NVIDIA GeForce RTX 3090	Memory Bandwidth	0.92x	GeForce RTX 3090 has 1.08x more
NVIDIA A100 SXM	Compute (FP32-eq)	0.58x	A100 SXM is 1.72x faster
NVIDIA A100 SXM	FP32 Compute	2.31x	Need 2.31x A100 SXM
NVIDIA A100 SXM	VRAM	0.60x	A100 SXM has 1.67x more
NVIDIA A100 SXM	Memory Bandwidth	0.42x	A100 SXM has 2.36x more

To match 1x NVIDIA A100 SXM

XPU	Metric	Units needed	Result
AWS Inferentia2	Compute (FP32-eq)	1.64x	Need 1.64x Inferentia2
AWS Inferentia2	VRAM	2.50x	Need 2.50x Inferentia2
NVIDIA Quadro M60	Compute (FP32-eq)	32.33x	Need 32.33x Quadro M60
NVIDIA Quadro M60	FP32 Compute	4.04x	Need 4.04x Quadro M60
NVIDIA Quadro M60	VRAM	5.00x	Need 5.00x Quadro M60
NVIDIA Quadro M60	Memory Bandwidth	6.37x	Need 6.37x Quadro M60
NVIDIA GeForce GTX 1080 Ti	Compute (FP32-eq)	13.76x	Need 13.76x GeForce GTX 1080 Ti
NVIDIA GeForce GTX 1080 Ti	FP32 Compute	1.72x	Need 1.72x GeForce GTX 1080 Ti
NVIDIA GeForce GTX 1080 Ti	VRAM	7.27x	Need 7.27x GeForce GTX 1080 Ti
NVIDIA GeForce GTX 1080 Ti	Memory Bandwidth	4.21x	Need 4.21x GeForce GTX 1080 Ti
NVIDIA GeForce RTX 4070 Ti	Compute (FP32-eq)	3.89x	Need 3.89x GeForce RTX 4070 Ti
NVIDIA GeForce RTX 4070 Ti	FP32 Compute	0.49x	GeForce RTX 4070 Ti is 2.06x faster
NVIDIA GeForce RTX 4070 Ti	VRAM	6.67x	Need 6.67x GeForce RTX 4070 Ti
NVIDIA GeForce RTX 4070 Ti	Memory Bandwidth	4.05x	Need 4.05x GeForce RTX 4070 Ti
NVIDIA GeForce RTX 3090	Compute (FP32-eq)	4.39x	Need 4.39x GeForce RTX 3090
NVIDIA GeForce RTX 3090	FP32 Compute	0.55x	GeForce RTX 3090 is 1.83x faster
NVIDIA GeForce RTX 3090	VRAM	3.33x	Need 3.33x GeForce RTX 3090
NVIDIA GeForce RTX 3090	Memory Bandwidth	2.18x	Need 2.18x GeForce RTX 3090
NVIDIA L40	Compute (FP32-eq)	1.72x	Need 1.72x L40
NVIDIA L40	FP32 Compute	0.43x	L40 is 2.31x faster
NVIDIA L40	VRAM	1.67x	Need 1.67x L40
NVIDIA L40	Memory Bandwidth	2.36x	Need 2.36x L40

Pricing

Price Type	Inferentia2	Quadro M60	GeForce GTX 1080 Ti	GeForce RTX 4070 Ti	GeForce RTX 3090	L40	A100 SXM
CAPEX (Street Price)	—	—	—	—	—	—	$15,000
OPEX (per hour)	$6.49/hr	$0.75/hr	$0.04/hr	$0.10/hr	$0.11/hr	$0.69/hr	$4.05/hr
Price per TFLOPs (FP32-eq)	—	—	—	—	—	—	$96
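The only populated price-per-TFLOPs cell can be reproduced by dividing the street price by the FP32-equivalent throughput. A minimal sketch, assuming the A100 SXM's FP32-eq figure is 156 TFLOPs (half of the 312 listed in the Specifications table; the page does not state this mapping) and that a missing street price leaves the cell empty. The function name is hypothetical.

```python
def price_per_tflop(street_price_usd, fp32_eq_tflops):
    """Street price divided by FP32-equivalent TFLOPs; None when either is missing."""
    if street_price_usd is None or fp32_eq_tflops is None:
        return None
    return street_price_usd / fp32_eq_tflops


# A100 SXM: $15,000 street price, assumed FP32-eq of 312 / 2 TFLOPs.
a100 = price_per_tflop(15_000, 312 / 2)
print(f"${a100:,.0f}")

# L40: no street price is listed, so no figure can be derived.
print(price_per_tflop(None, 181.05 / 2))
```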