Compare XPUs

Select up to 5 XPUs to compare side-by-side

Select XPUs to Compare

Showing 128 XPUs • 7 selected

Vendor	Model	TFLOPs
Alibaba	Hanguang 800	—
AMD	MI100	23.1
AMD	MI210	181
AMD	MI250X	383
AMD	MI300X	1,307
AMD	MI325X	1,400
AMD	MI350X	2,100
AMD	MI355X	5,300
AMD	Radeon Pro V520	23.04
AWS	Inferentia2	190
AWS	Trainium	190
AWS	Trainium2	680
Baidu	Kunlun II	—
Biren Technology	BR100	—
Cambricon	MLU370	256
Cerebras	WSE-3	—
Enflame Technology	CloudBlazer T20	—
FuriosaAI	RNGD (Renegade)	256
FuriosaAI	Warboy	—
Google	TPU v4	275
Google	TPU v5e	197
Google	TPU v5p	459
Graphcore	Bow IPU	—
Graphcore	IPU-M2000	—
Groq	LPU Inference Engine	—
Huawei	Ascend 910B	—
Iluvatar CoreX	BI-V150	300
Intel	Data Center GPU Max 1100	177
Intel	Data Center GPU Max 1550	419
Intel Habana	Gaudi 2	432
Intel Habana	Gaudi 3	1,835

Multi-Metric Comparison

Relative performance across 5 key metrics (normalized to 100 = best in comparison)

Chart axes: Compute Performance (BF16), Memory Capacity, Power Consumption, Power Efficiency
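The "normalized to 100 = best in comparison" scaling can be sketched as below. This is a minimal illustration, not the page's actual implementation: the function name `normalize_to_best` is hypothetical, and treating lower power consumption as "better" is an assumption the page does not confirm. The VRAM and TDP values are taken from the Specifications table.

```python
def normalize_to_best(values, higher_is_better=True):
    """Scale a metric so the best entry scores 100; missing entries stay None."""
    present = [v for v in values.values() if v is not None]
    if not present:
        return {name: None for name in values}
    # "Best" is the maximum for capacity-like metrics and (by assumption)
    # the minimum for cost-like metrics such as power draw.
    best = max(present) if higher_is_better else min(present)
    scores = {}
    for name, v in values.items():
        if v is None:
            scores[name] = None
        elif higher_is_better:
            scores[name] = 100.0 * v / best
        else:
            scores[name] = 100.0 * best / v
    return scores


# Values copied from the Specifications table.
vram_gb = {
    "Gaudi 3": 128, "Kunlun II": 32, "GeForce GT 730": 2,
    "Quadro RTX 6000": 24, "T1000": 8, "GeForce RTX 3090": 24, "T4": 16,
}
tdp_w = {
    "Gaudi 3": 900, "Kunlun II": 200, "GeForce GT 730": 49,
    "Quadro RTX 6000": 295, "T1000": 50, "GeForce RTX 3090": 350, "T4": 70,
}

vram_scores = normalize_to_best(vram_gb)
power_scores = normalize_to_best(tdp_w, higher_is_better=False)
```

Under these assumptions the Gaudi 3 scores 100 on memory capacity and the Kunlun II scores 25, while the GeForce GT 730 scores 100 on power consumption because it has the lowest TDP.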

Specifications

Specification	Intel Habana Gaudi 3	Baidu Kunlun II	NVIDIA GeForce GT 730	NVIDIA Quadro RTX 6000	NVIDIA T1000	NVIDIA GeForce RTX 3090	NVIDIA T4
Architecture	Gaudi Gen3	Kunlun Core	Kepler	Turing	Turing	Ampere	Turing
Form Factor	OAM	—	PCIe	PCIe	PCIe	PCIe	PCIe
VRAM	128 GB	32 GB	2 GB	24 GB	8 GB	24 GB	16 GB
Memory Bandwidth	3,700 GB/s	—	28.5 GB/s	672 GB/s	160 GB/s	936 GB/s	320 GB/s
TFLOPs (FP32)	—	—	0.693	16.3	2.5	35.6	8.1
TFLOPs (FP16)	—	256	—	—	—	—	—
TFLOPs	1,835	—	0.693	130.5	2.5	71	65
TFLOPs (FP8)	3,670	—	—	—	—	—	—
TDP	900 W	200 W	49 W	295 W	50 W	350 W	70 W
Launch Date	Apr 2024	Aug 2021	Jun 2014	Aug 2018	May 2019	Sep 2020	Sep 2018

Efficiency Metrics

Metric	Gaudi 3	Kunlun II	GeForce GT 730	Quadro RTX 6000	T1000	GeForce RTX 3090	T4
TFLOPs per Watt (FP32-eq)	1.02	—	0.01	0.22	0.05	0.10	0.46
Memory Bandwidth per GB of VRAM (GB/s per GB)	28.9	—	14.3	28.0	20.0	39.0	20.0
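The page does not state how "FP32-eq" is derived. The figures above are consistent with one simple rule, sketched here as an assumption: when the headline TFLOPs value differs from the FP32 value (or no FP32 value is listed), the headline value is halved; otherwise the FP32 value is used directly. The function names and the `SPECS` layout are hypothetical; the numbers come from the Specifications table.

```python
# (name, headline TFLOPs, FP32 TFLOPs, TDP in W, VRAM in GB, bandwidth in GB/s)
SPECS = [
    ("Gaudi 3", 1835.0, None, 900, 128, 3700.0),
    ("Kunlun II", None, None, 200, 32, None),
    ("GeForce GT 730", 0.693, 0.693, 49, 2, 28.5),
    ("Quadro RTX 6000", 130.5, 16.3, 295, 24, 672.0),
    ("T1000", 2.5, 2.5, 50, 8, 160.0),
    ("GeForce RTX 3090", 71.0, 35.6, 350, 24, 936.0),
    ("T4", 65.0, 8.1, 70, 16, 320.0),
]


def fp32_equivalent(headline_tflops, fp32_tflops):
    """Assumed rule inferred from the page's numbers, not a documented one."""
    if headline_tflops is None:
        return fp32_tflops  # may itself be None (no compute data)
    if headline_tflops == fp32_tflops:
        return fp32_tflops  # headline figure is already FP32
    return headline_tflops / 2  # treat reduced precision as half weight


def tflops_per_watt(fp32_eq, tdp_w):
    """FP32-equivalent TFLOPs divided by TDP; None if data is missing."""
    if fp32_eq is None or not tdp_w:
        return None
    return fp32_eq / tdp_w


def bandwidth_per_gb(bandwidth_gbs, vram_gb):
    """Memory bandwidth (GB/s) divided by VRAM capacity (GB)."""
    if bandwidth_gbs is None or not vram_gb:
        return None
    return bandwidth_gbs / vram_gb


efficiency = {}
for name, headline, fp32, tdp, vram, bw in SPECS:
    eq = fp32_equivalent(headline, fp32)
    efficiency[name] = {
        "tflops_per_watt": tflops_per_watt(eq, tdp),
        "bandwidth_per_gb": bandwidth_per_gb(bw, vram),
    }
```

For the Gaudi 3 this gives 917.5 / 900, about 1.02 TFLOPs per watt, and 3,700 / 128, about 28.9 GB/s per GB, matching the table. Devices with missing inputs, such as the Kunlun II, yield `None` rather than a number.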

Performance Equivalence

How many units of each XPU are needed to match one unit of another? A value below 1x means a single unit already exceeds the target on that metric.
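Each equivalence figure appears to be a plain ratio: the target's metric divided by the candidate's metric. The sketch below illustrates that arithmetic with values from the Specifications table. The function names are hypothetical, the output wording only approximates the page's labels, and the Gaudi 3 compute value of 917.5 assumes FP32-eq is half of its 1,835 headline TFLOPs.

```python
def units_to_match(target_value, candidate_value):
    """Units of the candidate needed to equal one unit of the target."""
    if target_value is None or candidate_value is None:
        return None  # metrics with missing data are simply not compared
    return target_value / candidate_value


def describe(candidate_name, ratio, is_compute=False):
    """Render a ratio roughly the way the page words it (illustrative)."""
    if ratio is None:
        return None
    if ratio > 1:
        return f"Need {ratio:.2f}x {candidate_name}"
    # At or below 1x, report the candidate's advantage as the inverse ratio.
    advantage = 1 / ratio
    if is_compute:
        return f"{candidate_name} is {advantage:.2f}x faster"
    return f"{candidate_name} has {advantage:.2f}x more"


# Values from the Specifications table; fp32_eq for Gaudi 3 is an assumption.
gaudi3 = {"fp32_eq": 917.5, "vram_gb": 128, "bandwidth_gbs": 3700.0}
gt730 = {"fp32_eq": 0.693, "vram_gb": 2, "bandwidth_gbs": 28.5}

need_vram = units_to_match(gaudi3["vram_gb"], gt730["vram_gb"])
need_compute = units_to_match(gaudi3["fp32_eq"], gt730["fp32_eq"])
reverse_vram = units_to_match(gt730["vram_gb"], gaudi3["vram_gb"])

print(describe("GeForce GT 730", need_vram))
print(describe("GeForce GT 730", need_compute, is_compute=True))
print(describe("Gaudi 3", reverse_vram))
```

Because each direction is the inverse of the other, every "Need N x" entry in one block has a matching "N x faster" or "N x more" entry in the reverse block.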

To match 1x Intel Habana Gaudi 3

Baidu Kunlun II

VRAM

4.00x

Need 4.00x Kunlun II

NVIDIA GeForce GT 730

Compute (FP32-eq)

1323.95x

Need 1323.95x GeForce GT 730

VRAM

64.00x

Need 64.00x GeForce GT 730

Memory Bandwidth

129.82x

Need 129.82x GeForce GT 730

NVIDIA Quadro RTX 6000

Compute (FP32-eq)

14.06x

Need 14.06x Quadro RTX 6000

VRAM

5.33x

Need 5.33x Quadro RTX 6000

Memory Bandwidth

5.51x

Need 5.51x Quadro RTX 6000

NVIDIA T1000

Compute (FP32-eq)

367.00x

Need 367.00x T1000

VRAM

16.00x

Need 16.00x T1000

Memory Bandwidth

23.13x

Need 23.13x T1000

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

25.85x

Need 25.85x GeForce RTX 3090

VRAM

5.33x

Need 5.33x GeForce RTX 3090

Memory Bandwidth

3.95x

Need 3.95x GeForce RTX 3090

NVIDIA T4

Compute (FP32-eq)

28.23x

Need 28.23x T4

VRAM

8.00x

Need 8.00x T4

Memory Bandwidth

11.56x

Need 11.56x T4

To match 1x Baidu Kunlun II

Intel Habana Gaudi 3

VRAM

0.25x

Gaudi 3 has 4.00x more

NVIDIA GeForce GT 730

VRAM

16.00x

Need 16.00x GeForce GT 730

NVIDIA Quadro RTX 6000

VRAM

1.33x

Need 1.33x Quadro RTX 6000

NVIDIA T1000

VRAM

4.00x

Need 4.00x T1000

NVIDIA GeForce RTX 3090

VRAM

1.33x

Need 1.33x GeForce RTX 3090

NVIDIA T4

VRAM

2.00x

Need 2.00x T4

To match 1x NVIDIA GeForce GT 730

Intel Habana Gaudi 3

Compute (FP32-eq)

<0.01x

Gaudi 3 is 1323.95x faster

VRAM

0.02x

Gaudi 3 has 64.00x more

Memory Bandwidth

0.01x

Gaudi 3 has 129.82x more

Baidu Kunlun II

VRAM

0.06x

Kunlun II has 16.00x more

NVIDIA Quadro RTX 6000

Compute (FP32-eq)

0.01x

Quadro RTX 6000 is 94.16x faster

FP32 Compute

0.04x

Quadro RTX 6000 is 23.52x faster

VRAM

0.08x

Quadro RTX 6000 has 12.00x more

Memory Bandwidth

0.04x

Quadro RTX 6000 has 23.58x more

NVIDIA T1000

Compute (FP32-eq)

0.28x

T1000 is 3.61x faster

FP32 Compute

0.28x

T1000 is 3.61x faster

VRAM

0.25x

T1000 has 4.00x more

Memory Bandwidth

0.18x

T1000 has 5.61x more

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

0.02x

GeForce RTX 3090 is 51.23x faster

FP32 Compute

0.02x

GeForce RTX 3090 is 51.37x faster

VRAM

0.08x

GeForce RTX 3090 has 12.00x more

Memory Bandwidth

0.03x

GeForce RTX 3090 has 32.84x more

NVIDIA T4

Compute (FP32-eq)

0.02x

T4 is 46.90x faster

FP32 Compute

0.09x

T4 is 11.69x faster

VRAM

0.13x

T4 has 8.00x more

Memory Bandwidth

0.09x

T4 has 11.23x more

To match 1x NVIDIA Quadro RTX 6000

Intel Habana Gaudi 3

Compute (FP32-eq)

0.07x

Gaudi 3 is 14.06x faster

VRAM

0.19x

Gaudi 3 has 5.33x more

Memory Bandwidth

0.18x

Gaudi 3 has 5.51x more

Baidu Kunlun II

VRAM

0.75x

Kunlun II has 1.33x more

NVIDIA GeForce GT 730

Compute (FP32-eq)

94.16x

Need 94.16x GeForce GT 730

FP32 Compute

23.52x

Need 23.52x GeForce GT 730

VRAM

12.00x

Need 12.00x GeForce GT 730

Memory Bandwidth

23.58x

Need 23.58x GeForce GT 730

NVIDIA T1000

Compute (FP32-eq)

26.10x

Need 26.10x T1000

FP32 Compute

6.52x

Need 6.52x T1000

VRAM

3.00x

Need 3.00x T1000

Memory Bandwidth

4.20x

Need 4.20x T1000

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

1.84x

Need 1.84x GeForce RTX 3090

FP32 Compute

0.46x

GeForce RTX 3090 is 2.18x faster

VRAM

1.00x

GeForce RTX 3090 has the same VRAM (24 GB)

Memory Bandwidth

0.72x

GeForce RTX 3090 has 1.39x more

NVIDIA T4

Compute (FP32-eq)

2.01x

Need 2.01x T4

FP32 Compute

2.01x

Need 2.01x T4

VRAM

1.50x

Need 1.50x T4

Memory Bandwidth

2.10x

Need 2.10x T4

To match 1x NVIDIA T1000

Intel Habana Gaudi 3

Compute (FP32-eq)

<0.01x

Gaudi 3 is 367.00x faster

VRAM

0.06x

Gaudi 3 has 16.00x more

Memory Bandwidth

0.04x

Gaudi 3 has 23.13x more

Baidu Kunlun II

VRAM

0.25x

Kunlun II has 4.00x more

NVIDIA GeForce GT 730

Compute (FP32-eq)

3.61x

Need 3.61x GeForce GT 730

FP32 Compute

3.61x

Need 3.61x GeForce GT 730

VRAM

4.00x

Need 4.00x GeForce GT 730

Memory Bandwidth

5.61x

Need 5.61x GeForce GT 730

NVIDIA Quadro RTX 6000

Compute (FP32-eq)

0.04x

Quadro RTX 6000 is 26.10x faster

FP32 Compute

0.15x

Quadro RTX 6000 is 6.52x faster

VRAM

0.33x

Quadro RTX 6000 has 3.00x more

Memory Bandwidth

0.24x

Quadro RTX 6000 has 4.20x more

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

0.07x

GeForce RTX 3090 is 14.20x faster

FP32 Compute

0.07x

GeForce RTX 3090 is 14.24x faster

VRAM

0.33x

GeForce RTX 3090 has 3.00x more

Memory Bandwidth

0.17x

GeForce RTX 3090 has 5.85x more

NVIDIA T4

Compute (FP32-eq)

0.08x

T4 is 13.00x faster

FP32 Compute

0.31x

T4 is 3.24x faster

VRAM

0.50x

T4 has 2.00x more

Memory Bandwidth

0.50x

T4 has 2.00x more

To match 1x NVIDIA GeForce RTX 3090

Intel Habana Gaudi 3

Compute (FP32-eq)

0.04x

Gaudi 3 is 25.85x faster

VRAM

0.19x

Gaudi 3 has 5.33x more

Memory Bandwidth

0.25x

Gaudi 3 has 3.95x more

Baidu Kunlun II

VRAM

0.75x

Kunlun II has 1.33x more

NVIDIA GeForce GT 730

Compute (FP32-eq)

51.23x

Need 51.23x GeForce GT 730

FP32 Compute

51.37x

Need 51.37x GeForce GT 730

VRAM

12.00x

Need 12.00x GeForce GT 730

Memory Bandwidth

32.84x

Need 32.84x GeForce GT 730

NVIDIA Quadro RTX 6000

Compute (FP32-eq)

0.54x

Quadro RTX 6000 is 1.84x faster

FP32 Compute

2.18x

Need 2.18x Quadro RTX 6000

VRAM

1.00x

Quadro RTX 6000 has the same VRAM (24 GB)

Memory Bandwidth

1.39x

Need 1.39x Quadro RTX 6000

NVIDIA T1000

Compute (FP32-eq)

14.20x

Need 14.20x T1000

FP32 Compute

14.24x

Need 14.24x T1000

VRAM

3.00x

Need 3.00x T1000

Memory Bandwidth

5.85x

Need 5.85x T1000

NVIDIA T4

Compute (FP32-eq)

1.09x

Need 1.09x T4

FP32 Compute

4.40x

Need 4.40x T4

VRAM

1.50x

Need 1.50x T4

Memory Bandwidth

2.92x

Need 2.92x T4

To match 1x NVIDIA T4

Intel Habana Gaudi 3

Compute (FP32-eq)

0.04x

Gaudi 3 is 28.23x faster

VRAM

0.13x

Gaudi 3 has 8.00x more

Memory Bandwidth

0.09x

Gaudi 3 has 11.56x more

Baidu Kunlun II

VRAM

0.50x

Kunlun II has 2.00x more

NVIDIA GeForce GT 730

Compute (FP32-eq)

46.90x

Need 46.90x GeForce GT 730

FP32 Compute

11.69x

Need 11.69x GeForce GT 730

VRAM

8.00x

Need 8.00x GeForce GT 730

Memory Bandwidth

11.23x

Need 11.23x GeForce GT 730

NVIDIA Quadro RTX 6000

Compute (FP32-eq)

0.50x

Quadro RTX 6000 is 2.01x faster

FP32 Compute

0.50x

Quadro RTX 6000 is 2.01x faster

VRAM

0.67x

Quadro RTX 6000 has 1.50x more

Memory Bandwidth

0.48x

Quadro RTX 6000 has 2.10x more

NVIDIA T1000

Compute (FP32-eq)

13.00x

Need 13.00x T1000

FP32 Compute

3.24x

Need 3.24x T1000

VRAM

2.00x

Need 2.00x T1000

Memory Bandwidth

2.00x

Need 2.00x T1000

NVIDIA GeForce RTX 3090

Compute (FP32-eq)

0.92x

GeForce RTX 3090 is 1.09x faster

FP32 Compute

0.23x

GeForce RTX 3090 is 4.40x faster

VRAM

0.67x

GeForce RTX 3090 has 1.50x more

Memory Bandwidth

0.34x

GeForce RTX 3090 has 2.92x more

Pricing

Price Type	Gaudi 3	Kunlun II	GeForce GT 730	Quadro RTX 6000	T1000	GeForce RTX 3090	T4
CAPEX (Street Price)	$15,000	—	—	—	—	—	—
OPEX (per hour)	$1.20/hr	—	$0.04/hr	$0.50/hr	$0.17/hr	$0.11/hr	$0.27/hr
Price per TFLOPs (FP32-eq)	$16	—	—	—	—	—	—
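The price-per-TFLOPs figure looks like the street price divided by FP32-equivalent compute. A minimal sketch, assuming the Gaudi 3's FP32-equivalent value is half of its 1,835 headline TFLOPs (an inference from the efficiency numbers on this page, not a documented rule); the function name is hypothetical.

```python
def price_per_tflop(street_price_usd, fp32_eq_tflops):
    """Street price divided by FP32-equivalent TFLOPs; None if data is missing."""
    if street_price_usd is None or not fp32_eq_tflops:
        return None
    return street_price_usd / fp32_eq_tflops


# Gaudi 3: $15,000 street price, 1,835 / 2 = 917.5 FP32-equivalent TFLOPs (assumed).
gaudi3_price_per_tflop = price_per_tflop(15_000, 1835 / 2)
print(f"${gaudi3_price_per_tflop:.0f}")
```

The other devices list no street price, so the same function returns `None` for them, which corresponds to the dashes in the table.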