NVIDIA Readies Ampere A100 PCIe GPU With 80 GB HBM2e Memory & Up To 2 TB/s Bandwidth

NVIDIA is possibly making its fastest GPU, the Ampere A100, even faster with twice the memory capacity and record-breaking memory bandwidth. The upgrade is confirmed by NVIDIA’s own product listing, which was spotted by Videocardz.

NVIDIA’s Fastest GPU, The Ampere A100, Getting Even Faster With Twice The Memory & Higher HBM2e Bandwidth

The existing NVIDIA A100 HPC accelerator was introduced in June last year, and it looks like the green team is planning to give it a major spec upgrade. The chip is based on NVIDIA’s largest Ampere GPU, the A100, which measures 826 mm2 and houses an insane 54 billion transistors. NVIDIA typically gives its HPC accelerators a mid-cycle spec boost, which suggests that we will be hearing about the next-generation accelerators at GTC 2022.


The A100 PCIe GPU accelerator doesn’t change much in terms of core configuration. The GA100 GPU retains the specifications we got to see on the existing 250W variant, with 6912 CUDA cores arranged in 108 SM units and 432 Tensor Cores. What changes is the memory: the card now carries 80 GB of HBM2e that delivers 2.0 TB/s of bandwidth, up from 1.55 TB/s on the 40 GB variant.
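To put the memory upgrade in perspective, here is a quick back-of-the-envelope sketch of the relative gains. It only uses the capacity and bandwidth figures quoted above; the helper name `uplift_percent` is our own and purely illustrative.

```python
# Figures quoted in the article for the two A100 PCIe variants.
OLD_BANDWIDTH_TBS = 1.55  # 40 GB variant
NEW_BANDWIDTH_TBS = 2.0   # 80 GB variant
OLD_CAPACITY_GB = 40
NEW_CAPACITY_GB = 80


def uplift_percent(old: float, new: float) -> float:
    """Relative increase of `new` over `old`, in percent (illustrative helper)."""
    return (new - old) / old * 100.0


bandwidth_gain = uplift_percent(OLD_BANDWIDTH_TBS, NEW_BANDWIDTH_TBS)
capacity_gain = uplift_percent(OLD_CAPACITY_GB, NEW_CAPACITY_GB)

print(f"Bandwidth: +{bandwidth_gain:.0f}%")  # roughly +29%
print(f"Capacity:  +{capacity_gain:.0f}%")   # +100%, i.e. double
```

In other words, capacity doubles while bandwidth grows by a little under a third, so bandwidth per gigabyte of memory actually goes down on the 80 GB card.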

A featured image of the NVIDIA GA100 die.

The A100 SXM variant already comes with 80 GB memory but it doesn’t feature the faster HBM2e dies like this upcoming A100 PCIe variant. This is also the most memory ever featured on a PCIe-based graphics card, but don’t expect consumer graphics cards to feature such high capacities any time soon. What’s interesting is that the power rating remains unchanged, which means that we are looking at higher density chips binned for high-performance use cases.

Specifications of the A100 PCIe 80 GB graphics card as listed over at NVIDIA’s webpage. (Image Credits: Videocardz)

The FP64 performance is still rated at 9.7/19.5 TFLOPs, FP32 performance is rated at 19.5/156/312 TFLOPs (Sparsity), FP16 performance is rated at 312/624 TFLOPs (Sparsity), and INT8 is rated at 624/1248 TOPs (Sparsity). NVIDIA is planning to release its latest HPC accelerator next week, and we can expect pricing of over $20,000 US considering the 40 GB A100 variant sells for around $15,000 US.

NVIDIA Ampere GA100 GPU Based Tesla A100 Specs:

NVIDIA Tesla Graphics Card	Tesla K40 (PCI-Express)	Tesla M40 (PCI-Express)	Tesla P100 (PCI-Express)	Tesla P100 (SXM2)	Tesla V100 (SXM2)	Tesla V100S (PCIe)	NVIDIA A100 (SXM4)	NVIDIA A100 (PCIe4)
GPU	GK110 (Kepler)	GM200 (Maxwell)	GP100 (Pascal)	GP100 (Pascal)	GV100 (Volta)	GV100 (Volta)	GA100 (Ampere)	GA100 (Ampere)
Process Node	28nm	28nm	16nm	16nm	12nm	12nm	7nm	7nm
Transistors	7.1 Billion	8 Billion	15.3 Billion	15.3 Billion	21.1 Billion	21.1 Billion	54.2 Billion	54.2 Billion
GPU Die Size	551 mm2	601 mm2	610 mm2	610 mm2	815 mm2	815 mm2	826 mm2	826 mm2
SMs	15	24	56	56	80	80	108	108
TPCs	15	24	28	28	40	40	54	54
FP32 CUDA Cores Per SM	192	128	64	64	64	64	64	64
FP64 CUDA Cores Per SM	64	4	32	32	32	32	32	32
FP32 CUDA Cores	2880	3072	3584	3584	5120	5120	6912	6912
FP64 CUDA Cores	960	96	1792	1792	2560	2560	3456	3456
Tensor Cores	N/A	N/A	N/A	N/A	640	640	432	432
Texture Units	240	192	224	224	320	320	432	432
Boost Clock	875 MHz	1114 MHz	1329 MHz	1480 MHz	1530 MHz	1601 MHz	1410 MHz	1410 MHz
TOPs (DNN/AI)	N/A	N/A	N/A	N/A	125 TOPs	130 TOPs	1248 TOPs; 2496 TOPs with Sparsity	1248 TOPs; 2496 TOPs with Sparsity
FP16 Compute	N/A	N/A	18.7 TFLOPs	21.2 TFLOPs	30.4 TFLOPs	32.8 TFLOPs	312 TFLOPs; 624 TFLOPs with Sparsity	312 TFLOPs; 624 TFLOPs with Sparsity
FP32 Compute	5.04 TFLOPs	6.8 TFLOPs	10.0 TFLOPs	10.6 TFLOPs	15.7 TFLOPs	16.4 TFLOPs	156 TFLOPs (19.5 TFLOPs standard)	156 TFLOPs (19.5 TFLOPs standard)
FP64 Compute	1.68 TFLOPs	0.2 TFLOPs	4.7 TFLOPs	5.30 TFLOPs	7.80 TFLOPs	8.2 TFLOPs	19.5 TFLOPs (9.7 TFLOPs standard)	19.5 TFLOPs (9.7 TFLOPs standard)
Memory Interface	384-bit GDDR5	384-bit GDDR5	4096-bit HBM2	4096-bit HBM2	4096-bit HBM2	4096-bit HBM2	6144-bit HBM2e	6144-bit HBM2e
Memory Size & Bandwidth	12 GB GDDR5 @ 288 GB/s	24 GB GDDR5 @ 288 GB/s	16 GB HBM2 @ 732 GB/s; 12 GB HBM2 @ 549 GB/s	16 GB HBM2 @ 732 GB/s	16 GB HBM2 @ 900 GB/s	16 GB HBM2 @ 1134 GB/s	Up To 40 GB HBM2 @ 1.6 TB/s; Up To 80 GB HBM2 @ 1.6 TB/s	Up To 40 GB HBM2 @ 1.6 TB/s; Up To 80 GB HBM2e @ 2.0 TB/s
L2 Cache Size	1536 KB	3072 KB	4096 KB	4096 KB	6144 KB	6144 KB	40960 KB	40960 KB
TDP	235W	250W	250W	300W	300W	250W	400W	250W
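
The standard (non-Tensor) FP32 and FP64 figures in the table can be sanity-checked from the core counts and boost clocks listed above. The sketch below assumes the usual peak-throughput convention that each CUDA core retires one fused multiply-add (counted as 2 FLOPs) per clock; the function name `peak_tflops` is our own and not an NVIDIA API.

```python
def peak_tflops(cores: int, boost_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak throughput in TFLOPs.

    Assumes every core issues one FMA (2 FLOPs) per clock at the boost
    frequency, which is the common convention for quoted peak numbers.
    """
    return cores * flops_per_clock * boost_mhz * 1e6 / 1e12


# Core counts and boost clocks taken from the spec table above.
a100_fp32 = peak_tflops(cores=6912, boost_mhz=1410)
a100_fp64 = peak_tflops(cores=3456, boost_mhz=1410)
v100_fp32 = peak_tflops(cores=5120, boost_mhz=1530)

print(f"A100 FP32: {a100_fp32:.1f} TFLOPs")
print(f"A100 FP64: {a100_fp64:.1f} TFLOPs")
print(f"V100 (SXM2) FP32: {v100_fp32:.1f} TFLOPs")
```

Under that assumption, the results line up with the table's standard figures of 19.5 TFLOPs (FP32) and 9.7 TFLOPs (FP64) for the A100, and 15.7 TFLOPs (FP32) for the V100 SXM2. The much larger 156/312 TFLOPs numbers come from the Tensor Cores and are not covered by this simple formula.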
