NVIDIA Announces the GeForce RTX 30 Series: Ampere For Gaming, Starting With RTX 3080 & RTX 3090

With much anticipation and more than a few leaks, NVIDIA this morning is announcing the next generation of video cards, the GeForce RTX 30 series. Based upon the gaming and graphics variant of NVIDIA’s Ampere architecture and built on an optimized version of Samsung’s 8nm process, NVIDIA is touting the new cards as delivering some of their greatest gains ever in gaming performance. All the while, the latest generation of GeForce will also be coming with some new features to further set the cards apart from and ahead of NVIDIA’s Turing-based RTX 20 series.

Out of the gate, NVIDIA is announcing the first three cards to make up the new RTX 30 series: the RTX 3090, RTX 3080, and RTX 3070. These cards are all launching within the next month and a half – albeit at slightly different times – with the RTX 3090 and RTX 3080 leading the charge. The two cards, in turn, will serve as the successors to NVIDIA’s GeForce RTX 2080 Ti and RTX 2080/2080S respectively, hitting new highs in graphics performance, while also hitting a new high in price in the case of the RTX 3090.

The first card out the door will be the GeForce RTX 3080. With NVIDIA touting upwards of 2x the performance of the RTX 2080, this card will go on sale on September 17th for $699. That will be followed up a week later by the even more powerful GeForce RTX 3090, which hits the shelves September 24th for $1499. Finally, the RTX 3070, which is being positioned as more of a traditional sweet spot card, will arrive next month at $499.

NVIDIA GeForce Specification Comparison
                       | RTX 3090        | RTX 3080      | RTX 3070     | RTX 2080 Ti
CUDA Cores             | 10496           | 8704          | 5888         | 4352
Boost Clock            | 1.7GHz          | 1.71GHz       | 1.73GHz      | 1545MHz
Memory Clock           | 19.5Gbps GDDR6X | 19Gbps GDDR6X | 16Gbps GDDR6 | 14Gbps GDDR6
Memory Bus Width       | 384-bit         | 320-bit       | 256-bit      | 352-bit
Single Precision Perf. | 35.7 TFLOPs     | 29.8 TFLOPs   | 20.4 TFLOPs  | 13.4 TFLOPs
Tensor Perf. (FP16)    | 285 TFLOPs      | 238 TFLOPs    | 163 TFLOPs   | 114 TFLOPs
Ray Perf.              | 69 TFLOPs       | 58 TFLOPs     | 40 TFLOPs    | ?
TDP                    | 350W            | 320W          | 220W         | 250W
GPU                    | GA102?          | GA102?        | GA104?       | TU102
Transistor Count       | 28B             | 28B           | ?            | 18.6B
Architecture           | Ampere          | Ampere        | Ampere       | Turing
Manufacturing Process  | Samsung 8nm     | Samsung 8nm   | Samsung 8nm  | TSMC 12nm “FFN”
Launch Date            | 09/24/2020      | 09/17/2020    | 10/2020      | 09/20/2018
Launch Price           | MSRP: $1499     | MSRP: $699    | MSRP: $499   | MSRP: $999, Founders $1199
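
As a rough cross-check on the table, the single precision figures can be reproduced from the CUDA core counts and boost clocks. The sketch below assumes the usual convention for peak throughput, namely that each CUDA core retires one fused multiply-add (counted as 2 FLOPs) per clock; that factor is an assumption on our part rather than something NVIDIA stated in the presentation, and the function name is purely illustrative.

```python
# Peak FP32 throughput = CUDA cores x 2 FLOPs per clock x boost clock.
# Core counts and boost clocks come from the specification table;
# the 2-FLOPs-per-core-per-clock factor (one FMA) is an assumed convention.
CARDS = {
    "RTX 3090": (10496, 1.70),
    "RTX 3080": (8704, 1.71),
    "RTX 3070": (5888, 1.73),
    "RTX 2080 Ti": (4352, 1.545),
}

def peak_fp32_tflops(cuda_cores, boost_ghz):
    """Return theoretical peak single precision throughput in TFLOPs."""
    flops_per_core_per_clock = 2  # assumed: one fused multiply-add per clock
    return cuda_cores * flops_per_core_per_clock * boost_ghz / 1000.0

for name, (cores, ghz) in CARDS.items():
    print(f"{name}: {peak_fp32_tflops(cores, ghz):.1f} TFLOPs")
```

Under that assumption the results round to the same values listed in the table, which suggests the quoted TFLOPs numbers are theoretical peaks at boost clock rather than measured throughput.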

Ampere for Gaming: GA102

As is traditionally the case for NVIDIA, this morning’s public presentation was not an architectural deep dive. Though the purely virtual presentation was certainly a change of pace for a company that treats every video card launch like a party, NVIDIA stuck to their successful launch playbook. That means a lot of demonstrations, testimonials, and promotional videos, along with some high-level overviews of several of the technologies and engineering design decisions that went into making their latest generation of GPUs. The net result is that we have a decent idea of what’s in store for the RTX 30 series, but we’ll have to wait for NVIDIA to offer some deep dive technical briefings to fill in the blanks and get to the heart of matters in true AnandTech style.

At a high level, Ampere and the GA102 GPU being used in these top-tier cards bring several major hardware advancements to NVIDIA’s lineup. The biggest of which is the ever-shrinking size of transistors, thanks to a customized version of Samsung’s 8nm process. We only have limited information about this process – mostly because it hasn’t been used in too many places – but at a high level it’s Samsung’s densest traditional, non-EUV process, derived from their earlier 10nm process. All told, NVIDIA has ended up as a bit of a latecomer in moving to smaller processes, but as the company has re-developed an affinity for shipping large GPUs first, they need higher wafer yields (fewer defects) to get chips out the door.

In any case, for NVIDIA’s products Samsung’s 8nm process is a full generational jump from their previous process, TSMC’s 12nm “FFN”, which itself was an optimized version of TSMC’s 16nm process. So NVIDIA’s transistor densities have gone up significantly, which is reflected in the sheer number of CUDA cores and other hardware available. Whereas mid-generation architectures like Turing and Maxwell saw most of their gains at an architectural level, Ampere (like Pascal before it) benefits greatly from a proper jump in lithographic processes. The only hitch in all of this is that Dennard Scaling has died and isn’t coming back, so while NVIDIA can pack more transistors than ever into a chip, power consumption is creeping back up, which is reflected in the cards’ TDPs.

NVIDIA hasn’t given us specific die sizes for GA102, but based on some photos we’re reasonably confident it’s over 500mm². Which is notably smaller than the ridiculously-sized 754mm² TU102, but it’s still a sizable chip, and among the largest chips produced at Samsung.

Moving on, let’s talk about the Ampere architecture itself. First introduced this spring as part of NVIDIA’s A100 accelerator, until now we’ve only seen Ampere from a matching compute-oriented perspective. GA100 lacked several graphics features so that NVIDIA could maximize the amount of die space allocated to compute, so while graphics-focused Ampere GPUs like GA102 are still members of the Ampere family, there are a significant number of differences between the two. Which is to say that NVIDIA was able to keep a lot under wraps about the gaming side of Ampere until now.

From a compute perspective, Ampere looked a fair bit like Volta before it, and the same can be said from a graphics perspective. GA102 doesn’t introduce any exotic new functional blocks like RT cores or tensor cores, but their capabilities and relative sizes have been tweaked. The most notable change here is that, like Ampere GA100, the gaming Ampere parts inherit updated and more powerful tensor cores. A single GA102 SM can provide double the tensor throughput of a Turing SM – despite having half as many distinct tensor cores – and can support features like sparsity for additional performance, underscoring NVIDIA’s commitment to neural networking and AI performance. NVIDIA’s Deep Learning Super Sampling (DLSS) tech relies in part on this, and NVIDIA is still looking at more ways to put their tensor cores to good use.

The RT cores have also been beefed up, though to what degree we’re not certain. Besides having more of them overall by virtue of GA102 having a larger number of SMs, the individual RT cores are said to be faster. Which is very good news for the gaming industry’s ray tracing ambitions, as ray tracing had a heavy performance cost on RTX 20 series cards. Now with that said, nothing NVIDIA does is going to completely eliminate that penalty – ray tracing is a lot of work, period – but more and rebalanced hardware can help bring that cost down.

GDDR6X: Cooking With PAM

Outside of the core GPU architecture itself, GA102 also introduces support for another new memory type: GDDR6X. A Micron- and NVIDIA-developed evolution of GDDR6, GDDR6X is designed to allow for higher memory bus speeds (and thus more memory bandwidth) by using multi-level signaling on the memory bus. By employing this strategy, NVIDIA and Micron can continue to push the envelope on cost-effective discrete memory technologies, and thus continue to feed the beast that is NVIDIA’s latest generation of GPUs. This marks the third memory technology in as many generations for NVIDIA, having gone from GDDR5X to GDDR6 to GDDR6X.

Micron accidentally spilt the beans on the subject last month, when they posted some early technical documents on the technology. By employing Pulse Amplitude Modulation-4 (PAM4), GDDR6X is able to transmit one of four different symbols per clock, in essence moving two bits per clock instead of the usual one bit per clock. For the sake of brevity I won’t completely rehash that discussion, but I’ll go over the highlights.

At a very high level, what PAM4 does versus NRZ (binary coding) is to take a page from the MLC NAND playbook, and double the number of electrical states a single cell (or in this case, transmission) will hold. Rather than traditional 0/1 high/low signaling, PAM4 uses 4 signal levels, so that a single symbol can encode one of four possible two-bit patterns: 00/01/10/11. This allows PAM4 to carry twice as much data as NRZ without having to double the bus frequency, which would have presented an even greater challenge.

NRZ vs. PAM4 (Base Diagram Courtesy Intel)
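
To make the two-bits-per-symbol idea concrete, here is a minimal sketch of NRZ versus PAM4 encoding. The level numbering (0 to 3) and the Gray-coded mapping of bit pairs to levels are illustrative assumptions only; the actual level assignment and voltages used by GDDR6X are defined by Micron and are not covered here.

```python
# Hypothetical mapping of two-bit patterns to four signal levels
# (0 = lowest level, 3 = highest). Gray coding is assumed purely for
# illustration, so that adjacent levels differ by a single bit.
PAM4_LEVELS = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
PAM4_BITS = {level: bits for bits, level in PAM4_LEVELS.items()}

def nrz_encode(bits):
    """NRZ: one two-level symbol per bit."""
    return list(bits)

def pam4_encode(bits):
    """PAM4: one four-level symbol per pair of bits."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [PAM4_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def pam4_decode(symbols):
    """Recover the original bit stream from PAM4 symbols."""
    out = []
    for symbol in symbols:
        out.extend(PAM4_BITS[symbol])
    return out

data = [1, 0, 0, 1, 1, 1, 0, 0]
nrz = nrz_encode(data)
pam4 = pam4_encode(data)
print(len(nrz), "NRZ symbols vs", len(pam4), "PAM4 symbols")
```

The same eight bits need eight symbols on an NRZ bus but only four on a PAM4 bus, which is where the doubling of data rate at a given symbol rate comes from. The cost is that the receiver has to distinguish four levels instead of two.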

PAM4 in turn requires more complex memory controllers and memory devices to handle the multiple signal states, but it also backs off on the memory bus frequency, simplifying some other aspects. Perhaps the most important of these for NVIDIA at this point is that it’s more power efficient, taking around 15% less power per bit of bandwidth. To be sure, total DRAM power consumption is still up because that’s more than offset by bandwidth gains, but every joule saved on DRAM is another joule that can be dedicated to the GPU instead.
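
The trade-off between efficiency per bit and total power can be sketched with some back-of-the-envelope arithmetic. This is a rough estimate under loud assumptions: that the roughly 15% per-bit saving applies uniformly at the data rates in the specification table, that memory power scales linearly with bits moved, and that the bus width is unchanged. None of these are confirmed by NVIDIA or Micron.

```python
GDDR6_GBPS = 14.0           # per-pin data rate of the previous generation (from the table)
POWER_PER_BIT_RATIO = 0.85  # assumed: GDDR6X uses ~15% less power per bit than GDDR6

def relative_dram_power(gddr6x_gbps):
    """Estimated memory power relative to 14Gbps GDDR6 on an identical bus width."""
    return POWER_PER_BIT_RATIO * gddr6x_gbps / GDDR6_GBPS

for rate in (19.0, 19.5):
    print(f"{rate}Gbps GDDR6X: {relative_dram_power(rate):.2f}x the power of 14Gbps GDDR6")
```

Under these assumptions, memory power still rises by something like 15% to 18% at an identical bus width, even though each bit is cheaper to move.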

According to Micron’s documents, the company designed the first generation of their GDDR6X to go to 21Gbps; however NVIDIA is keeping things a bit more conservative and stopping at 19.5Gbps for the RTX 3090, and 19Gbps for the RTX 3080. Even at those speeds, that’s still a 36%-39% increase in memory bandwidth over the previous generation of cards, assuming identically-sized memory buses. Overall this kind of progress remains the exception rather than the norm; historically speaking we typically don’t see memory bandwidth gains quite this large over successive generations. But with many more SMs to feed, I can only imagine that NVIDIA’s product teams are glad to have it.
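
For reference, total memory bandwidth follows directly from the per-pin data rate and the bus width. The sketch below uses the figures from the specification table; the function and variable names are illustrative.

```python
# (per-pin data rate in Gbps, memory bus width in bits), from the specification table
CARDS = {
    "RTX 3090": (19.5, 384),
    "RTX 3080": (19.0, 320),
    "RTX 3070": (16.0, 256),
    "RTX 2080 Ti": (14.0, 352),
}

def bandwidth_gbs(data_rate_gbps, bus_width_bits):
    """Total memory bandwidth in GB/sec: per-pin rate x pins / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8

def per_pin_gain(new_gbps, old_gbps=14.0):
    """Fractional bandwidth gain at an identical bus width."""
    return new_gbps / old_gbps - 1

for name, (rate, width) in CARDS.items():
    print(f"{name}: {bandwidth_gbs(rate, width):.0f} GB/sec")

print(f"19Gbps vs 14Gbps:   +{per_pin_gain(19.0):.0%}")
print(f"19.5Gbps vs 14Gbps: +{per_pin_gain(19.5):.0%}")
```

The per-pin comparison is what yields the 36%-39% figure. Actual card-to-card gains also depend on bus width, which is why the RTX 3090's 384-bit bus pulls further ahead of the RTX 2080 Ti's 352-bit bus than the per-pin numbers alone would suggest.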

GDDR6X does come with one somewhat immediate drawback however: capacity. While Micron has plans for 16Gbit chips in the future, to start things out today they’re only making 8Gbit chips. This is the same density as the memory chips on NVIDIA’s RTX 20 series cards, and their GTX 1000 series cards for that matter. So there are no “free” memory capacity upgrades, at least for these initial cards. RTX 3080 only gets 10GB of VRAM versus 8GB on RTX 2080, and that’s by virtue of using a larger 320-bit memory bus (which is to say, 10 chips instead of 8). Meanwhile RTX 3090 gets 24GB of VRAM, but only by using 12 pairs of chips in clamshell mode on a 384-bit memory bus, making for more than twice as many memory chips as on RTX 2080 Ti.
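
The chip counts and capacities above fall out of simple arithmetic. The sketch below assumes each memory chip presents a 32-bit interface in normal mode (consistent with the 10 chips on a 320-bit bus mentioned above) and that clamshell mode pairs two chips on each 32-bit slice of the bus; the function name is illustrative.

```python
CHIP_INTERFACE_BITS = 32  # assumed: one chip per 32 bits of bus width in normal mode
CHIP_DENSITY_GBIT = 8     # 8Gbit (1GB) chips, per the article

def vram_config(bus_width_bits, clamshell=False):
    """Return (number of chips, total capacity in GB) for a given bus width."""
    chips = bus_width_bits // CHIP_INTERFACE_BITS
    if clamshell:
        chips *= 2  # two chips share each 32-bit slice of the bus
    capacity_gb = chips * CHIP_DENSITY_GBIT // 8
    return chips, capacity_gb

print("RTX 3080:   ", vram_config(320))
print("RTX 3090:   ", vram_config(384, clamshell=True))
print("RTX 2080 Ti:", vram_config(352))
```

With only 8Gbit chips available, capacity can grow only by widening the bus or doubling up chips in clamshell mode, which is the bind described above.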

HDMI 2.1 Is In, VirtualLink Is Out

Finally, on the display I/O front, Ampere and the new GeForce RTX 30 series cards make a couple of notable changes. The most important of which is that, at long last, HDMI 2.1 support has arrived. Already shipping in TVs (and set to ship in this year’s consoles), HDMI 2.1 brings a few features to the table, most notably support for much greater cable bandwidth. An HDMI 2.1 cable can carry up to 48Gbps of data – more than 2.6x as much as HDMI 2.0 – allowing for much higher display resolutions and refresh rates, such as 8K TVs or 4K displays running at upwards of 165Hz. This significant jump in bandwidth even puts HDMI ahead of DisplayPort, at least for now; DisplayPort 1.4 only offers around 66% of the bandwidth, and while DisplayPort 2.0 will eventually beat that, it would seem that Ampere is just a bit too early for that technology.
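
To put those link rates in perspective, the sketch below compares them and estimates the raw pixel data rate of a couple of display modes. The 48Gbps HDMI 2.1 figure is from the text above; the 18Gbps figure for HDMI 2.0 and the 32.4Gbps figure for DisplayPort 1.4 are the commonly quoted raw link rates from the respective standards and are not taken from NVIDIA's presentation. The pixel rate calculation is a deliberate simplification that ignores blanking intervals and link encoding overhead, so real requirements are higher than what it prints.

```python
# Raw link rates in Gbps. Only the HDMI 2.1 value comes from the article;
# the other two are the usual headline figures for those standards.
LINK_GBPS = {"HDMI 2.0": 18.0, "DisplayPort 1.4": 32.4, "HDMI 2.1": 48.0}

def pixel_data_rate_gbps(width, height, refresh_hz, bits_per_pixel=24):
    """Uncompressed active-pixel data rate in Gbps (no blanking, no encoding overhead)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9

print(f"HDMI 2.1 vs HDMI 2.0: {LINK_GBPS['HDMI 2.1'] / LINK_GBPS['HDMI 2.0']:.2f}x")
print(f"DisplayPort 1.4 vs HDMI 2.1: {LINK_GBPS['DisplayPort 1.4'] / LINK_GBPS['HDMI 2.1']:.1%}")

for label, mode in {"4K @ 120Hz": (3840, 2160, 120), "4K @ 165Hz": (3840, 2160, 165)}.items():
    print(f"{label}: {pixel_data_rate_gbps(*mode):.1f} Gbps of pixel data")
```

Even before overhead is counted, 8-bit 4K at 120Hz already needs more raw pixel data than an HDMI 2.0 link can carry, which is why the extra HDMI 2.1 bandwidth matters for high refresh rate 4K displays.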

With all of that said, I’m still waiting on confirmation from NVIDIA about whether they support a full 48Gbps signaling rate with their new GeForce cards. Some HDMI 2.1 TVs have been shipping with support for lower data rates, so it’s not inconceivable that NVIDIA may do the same here.

HDMI 2.1’s other marquee feature from a gaming standpoint is support for variable refresh rates over HDMI. However, this feature is not exclusive to HDMI 2.1, and indeed has already been backported to NVIDIA’s RTX 20 cards, so while support for it is going to be more useful here with the greater cable bandwidth, it technically is not a new feature for NVIDIA’s cards.

Meanwhile VirtualLink ports, which were introduced on the RTX 20 series of cards, are on their way out. The industry’s attempt to build a port combining video, data, and power in a single cable for VR headsets has fizzled, and none of the big 3 headset manufacturers (Oculus, HTC, Valve) used the port. So you will not find the port returning on RTX 30 series cards.

Lastly, it looks like SLI support will be sticking with us, for at least one more generation. NVIDIA’s RTX 3090 card includes a single NVLink connector for SLI and other multi-GPU purposes. I suspect this is more a play for compute users – many of whom will be drooling over a card with 24GB of VRAM – but NVIDIA is never one to pass up an opportunity to upsell on the graphics front as well.