NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator

While NVIDIA’s usual presentation efforts for the year were dashed by the current coronavirus outbreak, the company’s march towards developing and releasing newer products has continued unabated. To that end, at today’s now digital GPU Technology Conference 2020 keynote, the company and its CEO Jensen Huang are taking to the virtual stage to announce NVIDIA’s next-generation GPU architecture, Ampere, and the first products that will be using it.

Like the Volta reveal 3 years ago – and is now traditional for NVIDIA GTC reveals – today’s focus is on the very high end of the market. In 2017 NVIDIA launched the Volta-based GV100 GPU, and with it the V100 accelerator. V100 was a massive success for the company, greatly expanding their datacenter business on the back of the Volta architecture’s novel tensor cores and sheer brute force that can only be provided by a 800mm2+ GPU. Now in 2020, the company is looking to continue that growth with Volta’s successor, the Ampere architecture.

Now a much more secretive company than they once were, NVIDIA has been holding its future GPU roadmap close to its chest. While the Ampere codename (among others) has been floating around for quite some time now, it’s only this morning that we’re finally getting confirmation that Ampere is in, as well as our first details on the architecture. Due to the nature of NVIDIA’s digital presentation – as well as the limited information given in NVIDIA’s press pre-briefings – we don’t have all of the details on Ampere quite yet. However for this morning at least, NVIDIA is touching upon the highlights of the architecture for its datacenter compute and AI customers, and what major innovations Ampere is bringing to help with their workloads.

Kicking things off for the Ampere family is the A100. Officially, this is the name of both the GPU and the accelerator incorporating it; and at least for the moment they’re both one in the same, since there is only the single accelerator using the GPU.

NVIDIA Accelerator Specification Comparison
  A100 V100 P100
FP32 CUDA Cores 6912 5120 3584
Boost Clock ~1.41GHz 1530MHz 1480MHz
Memory Clock 3.2Gbps HBM2? 1.75Gbps HBM2 1.4Gbps HBM2
Memory Bus Width 5120-bit? 4096-bit 4096-bit
Memory Bandwidth 1.6TB/sec 900GB/sec 720GB/sec
VRAM 40GB 16GB/32GB 16GB
Single Precision 19.5 TFLOPs 15.7 TFLOPs 10.6 TFLOPs
Double Precision 9.7 TFLOPs
(1/2 rate)
7.8 TFLOPs
(1/2 rate)
5.3 TFLOPs
(1/2 rate)
INT8 Tensor 624 TOPs N/A N/A
FP16 Tensor 312 TFLOPs 125 TFLOPs N/A
TF32 Tensor 156 TFLOPs N/A N/A
Interconnect NVLink 3?
12 Links (600GB/sec)
NVLink 2
6 Links (300GB/sec)
NVLink 1
4 Links (160GB/sec)
GPU A100
(826mm2)
GV100
(815mm2)
GP100
(610mm2)
Transistor Count 54B 21.1B 15.3B
TDP 400W 300W/350W 300W
Manufacturing Process TSMC 7N TSMC 12nm FFN TSMC 16nm FinFET
Architecture Ampere Volta Pascal

Designed to be the successor to the V100 accelerator, the A100 aims just as high, just as we’d expect from NVIDIA’s new flagship accelerator for compute.  The leading Ampere part is built on TSMC’s 7nm process and incorporates a whopping 54 billion transistors, 2.5x as many as the V100 before it. NVIDIA has put the full density improvements offered by the 7nm process in use, and then some, as the resulting GPU die is 826mm2 in size, even larger than the GV100. NVIDIA went big on the last generation, and in order to top themselves they’ve gone even bigger this generation.

We’ll touch more on the individual specifications a bit later, but at a high level it’s clear that NVIDIA has invested more in some areas than others. FP32 performance is, on paper, only modestly improved from the V100. Meanwhile tensor performance is greatly improved – almost 2.5x for FP16 tensors – and NVIDIA has greatly expanded the formats that can be used with INT8/4 support, as well as a new FP32-ish format called TF32. Memory bandwidth is also significantly expected, with multiple stacks of HBM2 memory delivering a total of 1.6TB/second of bandwidth to feed the beast that is Ampere.

NVIDIA will be delivering the initial version of this accelerator in their now-common SXM form factor, which is a mezzanine-style card well-suited for installation in servers. On a generation-over-generation basis, power consumption has once again gone up, which is probably fitting for a generation called Ampere. Altogether the A100 is rated for 400W, as opposed to 300W and 350W for various versions of the V100. This makes the SXM form factor all the more important for NVIDIA’s efforts, as PCIe cards would not be suitable for that kind of power consumption.

As for the Ampere architecture itself, NVIDIA is releasing limited details about it today. Expect we’ll hear more over the coming weeks, but for now NVIDIA is confirming that they are keeping their various product lines architecturally compatible, albeit in potentially vastly different configurations. So while the company is not talking about Ampere (or derivatives) for video cards today, they are making it clear that what they’ve been working on is not a pure compute architecture, and that Ampere’s technologies will be coming to graphics parts as well, presumably with some new features for them as well. Ultimately this is part of NVIDIA’s ongoing strategy to ensure that they have a single ecosystem, where, to quote Jensen, “Every single workload runs on every single GPU.”

This is breaking news