AMD Reveals The Radeon RX 6000 Series: RDNA2 Starts At The High-End, Coming November 18th

Preparing to close out a major month of announcements for AMD – and to open the door to the next era of architectures across the company – AMD wrapped up its final keynote presentation of the month by announcing its Radeon RX 6000 series of video cards. Hosted once more by AMD CEO Dr. Lisa Su, AMD’s hour-long keynote revealed the first three parts in AMD’s new RDNA2 architecture video card family: the Radeon RX 6800, 6800 XT, and 6900 XT. These cards form the core of AMD’s new high-end lineup, and with them AMD means to do battle with the best of the best out of arch-rival NVIDIA. And we’ll get to see first-hand if AMD can retake the high-end market on November 18th, when the first two cards hit retail shelves.

AMD’s forthcoming video card launch has been a long time coming for the company, and one they’ve been teasing particularly heavily. For AMD, the Radeon RX 6000 series represents the culmination of efforts from across the company as everyone from the GPU architecture team and the semi-custom SoC team to the Zen CPU team has played a role in developing AMD’s latest GPU technology. All the while, these new cards are AMD’s best chance in at least half a decade to finally catch up to NVIDIA at the high-end of the video card market. So understandably, the company is jazzed – and in more than just a marketing manner – about what the RX 6000 means.

Anchoring the new cards is AMD’s RDNA2 GPU architecture. RDNA2 is launching near-simultaneously across consoles and PC video cards next month, where it will be the backbone of some 200 million video game consoles plus countless AMD GPUs and APUs to come. Accordingly, AMD has pulled out all of the stops in designing it, assembling an architecture that’s on the cutting-edge of technical features like ray tracing and DirectX 12 Ultimate support, all the while leveraging the many things they’ve learned from their successful Zen CPU architectures to maximize RDNA2’s performance. RDNA2 is also rare in that it isn’t being built on a new manufacturing process, so coming from AMD’s earlier RDNA (1) architecture and associated video cards, AMD is relying on architectural improvements to deliver virtually all of their performance gains. Truly, it’s AMD’s RDNA2 architecture that’s going to make or break their new cards.

Over the coming months, RDNA2 will filter down into an increasing number of AMD chip designs. But for now, in the PC space, AMD is starting with enthusiast-level video cards. The first RDNA2 GPU out of the works is Navi 21 – AKA “Big Navi” – which AMD will be using as the basis of a trio of video cards: the Radeon RX 6900 XT, 6800 XT, and 6800. With these latest cards, AMD is aiming squarely at NVIDIA’s recently-launched GeForce RTX 30-series lineup, looking to meet (or beat) the RTX 3090, 3080, and 3070 respectively. Suffice it to say, AMD hasn’t been able to match NVIDIA’s top cards for several years now, so these are very bold claims from a company that has re-learned how to become very bold in the last five years.

AMD Radeon RX Series Specification Comparison

| | AMD Radeon RX 6900 XT | AMD Radeon RX 6800 XT | AMD Radeon RX 6800 | AMD Radeon RX 5700 XT |
|---|---|---|---|---|
| Stream Processors | 5120? (80 CUs) | 4608? (72 CUs) | 3840 (60 CUs) | 2560 (40 CUs) |
| Game Clock | 2015MHz | 2015MHz | 1815MHz | 1755MHz |
| Boost Clock | 2250MHz | 2250MHz | 2105MHz | 1905MHz |
| Throughput (FP32) | 20.6 TFLOPs | 18.6 TFLOPs | 13.9 TFLOPs | 9.75 TFLOPs |
| Memory Clock | 16 Gbps(?) GDDR6 | 16 Gbps(?) GDDR6 | 14 Gbps(?) GDDR6 | 14 Gbps GDDR6 |
| Memory Bus Width | 256-bit | 256-bit | 256-bit | 256-bit |
| VRAM | 16GB | 16GB | 16GB | 8GB |
| Infinity Cache | 128MB | 128MB | 128MB | N/A |
| Total Board Power | 300W | 300W | 250W | 225W |
| Manufacturing Process | TSMC 7nm | TSMC 7nm | TSMC 7nm | TSMC 7nm |
| Transistor Count | 26.8B | 26.8B | 26.8B | 10.3B |
| Architecture | RDNA2 | RDNA2 | RDNA2 | RDNA (1) |
| GPU | Navi 21 | Navi 21 | Navi 21 | Navi 10 |
| Launch Date | 12/08/2020 | 11/18/2020 | 11/18/2020 | 07/07/2019 |
| Launch Price | $999 | $649 | $579 | $399 |

As today’s announcement is not a full tech deep dive – like the Zen 3 announcement, the deep dive will come closer to launch – AMD is only sharing some high-level specifications of the new cards. But with information on the number of CUs, memory support, and power consumption now in hand, we have a good idea of what AMD is bringing to the table, and why they believe the RX 6000 series will once again make them fully-competitive at the high-end.

Radeon RX 6900 XT

With its video card stack, AMD is taking a very similar tack to NVIDIA. So although the RX 6900 XT is technically their flagship part, at $999 it’s still in a similar “more money than sense” category as the RTX 3090 – which is to say that AMD is looking to charge a good premium for the card. Since it’s based on a fully-enabled and heavily binned Navi 21 GPU, it’s the hardest part for AMD to produce, and thus the rarest. How rare remains to be seen, but as Navi 21 is undoubtedly a large chip thanks to the 26.8 billion transistors housing 80 CUs and 128MB Infinity Cache (more on that later), I don’t expect AMD to be flooding the market with 6900 XTs any time soon.

At any rate, the card will be the full Navi 21 experience. All 80CUs will be enabled, and thanks to AMD’s efforts in optimizing the RDNA2 architecture for clockspeeds, it will have a game clock of 2015MHz and a peak turbo clock (boost clock) of 2250MHz. Assuming that AMD hasn’t done anything too crazy with the RDNA2 architecture, this means that the card will have an average FP32 shader throughput of around 20.6 TFLOPS, more than double the throughput of the RX 5700 XT – and right in line with AMD’s goals to double their performance over the RX 5000 series.
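For readers who want to check the math, the throughput figures in the table can be reproduced with some simple arithmetic. The sketch below assumes that RDNA2 keeps RDNA (1)’s 64 stream processors per CU and one fused multiply-add (2 FLOPs) per stream processor per clock; AMD has not confirmed either detail for RDNA2, and the function name is purely illustrative. Note that the RX 6000 figures line up with the game clocks, while the RX 5700 XT’s 9.75 TFLOPS figure only works out at its 1905MHz boost clock.

```python
# Rough FP32 throughput estimate.
# Assumptions (not confirmed by AMD for RDNA2): 64 stream processors per CU,
# and 2 FLOPs (one fused multiply-add) per stream processor per clock.
SP_PER_CU = 64
FLOPS_PER_SP_PER_CLOCK = 2


def fp32_tflops(cus: int, clock_mhz: float) -> float:
    """Return theoretical FP32 throughput in TFLOPS for a CU count and clock."""
    flops_per_clock = cus * SP_PER_CU * FLOPS_PER_SP_PER_CLOCK
    return flops_per_clock * clock_mhz * 1e6 / 1e12


# (CU count, clock in MHz) taken from the specification table.
cards = {
    "RX 6900 XT": (80, 2015),  # game clock
    "RX 6800 XT": (72, 2015),  # game clock
    "RX 6800": (60, 1815),     # game clock
    "RX 5700 XT": (40, 1905),  # boost clock; this is what reproduces 9.75
}

for name, (cus, mhz) in cards.items():
    print(f"{name}: {fp32_tflops(cus, mhz):.2f} TFLOPS")

# Ratio of the flagship to the last-generation card.
ratio = fp32_tflops(80, 2015) / fp32_tflops(40, 1905)
print(f"RX 6900 XT vs RX 5700 XT: {ratio:.2f}x")
```

Under those assumptions the flagship comes out at a bit over 2.1x the RX 5700 XT, which is consistent with the "more than double" characterization.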

On the memory side of matters the card comes with 16GB of GDDR6. AMD is not officially disclosing the memory clockspeed, but based on current tech trends we’re expecting that GDDR6 to be running at 16Gbps/pin. Meanwhile, based on other AMD slides, it looks like this memory is attached to a 256-bit memory bus, which would give the card a total memory bandwidth of 512GB/sec. This is admittedly a bit anemic for a flagship video card, and is where AMD’s new Infinity Cache technology will come into play.
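The bandwidth estimate follows directly from the bus width and the per-pin data rate. As a quick sketch, using my expected (not AMD-confirmed) 16Gbps and 14Gbps memory speeds and an illustrative function name:

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/sec.

    Each bit of bus width is one data pin running at data_rate_gbps,
    and dividing by 8 converts gigabits to gigabytes.
    """
    return bus_width_bits * data_rate_gbps / 8


# The data rates are this article's expectations, not official AMD figures.
print(memory_bandwidth_gbs(256, 16))  # 512.0, RX 6900 XT / RX 6800 XT
print(memory_bandwidth_gbs(256, 14))  # 448.0, RX 6800 if it uses 14Gbps memory
```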

Finally, AMD says that the card will have a Total Board Power (TBP) of 300W. At a time when ASIC designers everywhere are struggling to keep power consumption in check – and when even NVIDIA’s top cards now hit 350W – a 300W TBP is a significant claim. Not only is it potentially a competitive advantage for AMD, but it means they haven’t needed to increase their TBP versus their previous high-end card, the also 7nm-based Radeon VII.

With all of that said, however, some expectation management is in order. Although AMD is comparing the RX 6900 XT to NVIDIA’s RTX 3090, there are a few potential gotchas involved. We’ll break down AMD’s specific performance claims and benchmark slides later, but suffice it to say that, as AMD is pitching things now, if the RX 6900 XT can match the 3090, it’s going to be at the very top edge of the AMD card’s performance profile, overclocking included. This would go hand-in-hand with the heavy binning mentioned earlier.

Radeon RX 6800 XT

Moving down the line we have the Radeon RX 6800 XT. Based on a slightly cut-down Navi 21 GPU, this is the AMD card that mere mortals can afford. At $650 it’s still expensive, to say the least, but significantly less so than the $1000 6900 XT.

Out of the box, the RX 6800 XT comes with 72 CUs enabled. Combined with clockspeeds identical to the RX 6900 XT – a game clock of 2015MHz and a boost clock of 2250MHz – and pending additional technical details from AMD, the card should average around 18.6 TFLOPS of FP32 shader performance. Relative to the RX 5700 XT, this is just under 2x the performance of AMD’s last-gen leading card.

Other than the reduction in CUs, according to AMD’s specifications there are no other notable differences between the RX 6800 XT and RX 6900 XT. The more affordable card still comes with 16GB of GDDR6 memory, and I expect it’ll be clocked at 16Gbps/pin as well, giving it an expected total bandwidth of 512GB/sec. Augmenting this, like all of today’s cards, will be 128MB of AMD’s Infinity Cache. Meanwhile the card’s TBP carries the same 300W rating as the 6900 XT as well.

AMD is positioning this card directly against NVIDIA’s RTX 3080; and unlike the 6900 XT, there are no asterisks to speak of here. So if AMD can deliver on their claims, then this 6800 XT will be doing so right out of the box. In which case, meeting the 3080 in performance and possibly beating it in power consumption would be a big win for AMD and the RDNA2 architecture, and a sign of how far they’ve come in the last few years.

Overall, given AMD’s performance expectations and pricing, along with the expected competition, it’s unsurprising that AMD led with this card today for the Radeon keynote. It’s technically not the flagship, but like the RTX 3080 on NVIDIA’s side, it’s going to be the card that garners the most attention.

Radeon RX 6800

It will be the final card in AMD’s new product stack, however, that will garner the most in sales. Rounding out the Navi 21 trio of cards we have the Radeon RX 6800, a further cut-down Navi 21 part that trades off a drop in performance for lower power consumption and a lower $579 price.

The RX 6800 will ship with 60 CUs enabled, making it three-quarters of a full Navi 21. Relative to the other cards, clockspeeds are going to be lower, with a game clock of 1815MHz and a boost clock of 2105MHz. Which, per our math, would give the card 13.9 TFLOPS of FP32 shader performance, or around 75% of the shader performance of the RX 6800 XT.

Surprisingly, AMD isn’t dialing back much (if at all) on the memory. Despite being the third and most budget-friendly card in the stack, it still ships with 16GB of GDDR6 memory. And while I suspect AMD will go with cheaper 14Gbps memory here – giving the card 448GB/sec of bandwidth – that’s still a lot of memory overall. Furthermore the complete 128MB Infinity Cache will be available, so the GPU itself still has a full pool of its fastest local memory on-hand.

Typical for lower-tier cards, the RX 6800 also carries a lower TBP rating of 250W, 50W less than the RX 6800 XT. AMD of course is no stranger to offering high-powered cards, but in some respects the RX 6800 is going to be the most interesting of the bunch, as it’s going to be our first chance to look at what RDNA2 can do at a more reasonable TBP.

For the moment, AMD is classifying the RX 6800’s performance relative to NVIDIA’s RTX 2080 Ti. In reality this is a card that will (roughly) compete with the RTX 3070 – the same card that replaced the RTX 2080 Ti – but as the RTX 3070 doesn’t launch until tomorrow, AMD doesn’t have a card that they can draw apples-to-apples comparisons against. So for the moment, AMD is instead playing with the transitive property of equality. With AMD’s own launch not coming until November 18th, there’s still plenty of time for AMD to frame this card’s performance with respect to the RTX 3070, so it will be interesting to see what (if anything) they have to say by the time the card launches. AMD’s promotional material places the RX 6800 ahead of the RTX 2080 Ti, but we’ll see if that holds true for the RTX 3070 as well. For now, AMD seems very confident that they’re going to beat NVIDIA’s cheapest RTX 30-series card, and with a projected performance and VRAM capacity lead they’ve priced it accordingly at $579.

One other curiosity on positioning is that AMD is also promoting the RX 6800 as a 4K gaming card, just like its more powerful siblings. Working to the card’s advantage, its 16GB of VRAM means that it has plenty of memory to handle the larger buffers that come with the higher resolution. However as the card only has 75% of the RX 6800 XT’s theoretical shader performance, I do wonder whether it’s going to be outright fast enough for 4K over the long run. Current-generation games should be very doable, judging from what the RTX 3080 and RTX 2080 Ti can do, however there’s a bit more concern over the long run as game developers switch to the next-gen consoles as their performance and feature baselines. In which case, the RX 6800 will instead be AMD’s 1440p gaming king, and fittingly the company has published some 1440p performance slides for the card as well, alongside the 4K performance slides.

Overall, AMD has some very ambitious performance goals for the Radeon RX 6000 series of cards. At this point AMD knows exactly what kind of performance they need to deliver to take on NVIDIA for the high-end, and the company believes that the Navi 21 GPU they’ve built is just the tool to do that. Now with 3 weeks to go until the first RX 6800 series cards launch, it’s just a matter of time until we can see first-hand whether AMD can deliver on those ambitious, RDNA2-fueled promises.

Underpinning virtually everything that AMD wants to accomplish with their new RX 6000 series cards will be the RDNA2 GPU architecture. Not to harp on this point more than is necessary, but it needs to be underscored that AMD doesn’t get the benefit of a new process node for the RX 6000 series cards; the company will be using the same TSMC 7nm process that they used for the RX 5000 family. So almost everything AMD gets out of the new parts performance-wise is going to come from architectural improvements. And architectural improvements are not easy to come by.

Ever since AMD announced the RDNA2 architecture, they have reiterated a singular goal: they wanted to achieve a 50% jump in perf-per-watt over RDNA1. And that they would do it entirely with architectural improvements, not process improvements.

To put it diplomatically, this is a tall order, especially for AMD. RDNA (1), even with its complete replumbing of AMD’s core GPU architecture and the benefit of the move to TSMC’s 7nm node, only achieved a bit more than a 50% improvement. And now AMD has to do the same thing only a bit over a year later, without a new manufacturing node? These kinds of major leaps have been done before – most famously with NVIDIA’s Maxwell architecture – but they are few and far between, as they are very hard to pull off.

But pull it off AMD has, according to today’s keynote. The company is touting that RDNA2 offers a 54% perf-per-watt increase over RDNA (1) thanks to the combined effect of their enhancements. We will obviously be testing AMD’s RX 6000 series cards in great detail in the coming weeks to confirm this, but taking AMD’s claim at face value, it’s a major accomplishment on their end.

AMD will disclose more about RDNA2 and its optimizations later on in their full tech deep dives, but for today they are offering a high-level overview focusing on the three big sources of efficiency gains for RDNA2: more energy-efficient CUs, higher frequencies at the same power levels, and an increase in real-world perf-per-clock/IPC thanks to the Infinity Cache.

Starting with the CUs, RDNA2 incorporates a revised CU design that has undergone extensive power management upgrades. According to AMD they’re not only doing more fine-grained clock gating than ever before to eliminate energy wastage, but they have reworked the CU’s data paths as well to reduce energy spent there. This is a more important change than it would first appear, as the costs of moving data, rather than just processing it, are quickly becoming a major energy burden. Shuffling bits in and out of VRAM is very expensive, and even sending them elsewhere in the chip has a cost. So minimizing the amount of energy spent on moving data is a core optimization strategy for ASICs in this day and age.

AMD’s presentation also quickly talked about aggressive pipeline rebalancing, though the company isn’t disclosing any more details than that. For the moment, I’m assuming that the core architecture hasn’t seen any significant changes – that each CU still contains two SIMD32 units – but AMD may have made some smaller changes around that. So we’ll have to see what AMD tells us over the coming weeks.

Going hand-in-hand with the energy optimizations to the CUs, AMD has also done chip-wide optimizations to improve on how high they can clock their silicon, and the power costs of those high frequencies. So if you’ve been wondering how Sony is able to get the PS5’s iGPU up to 2.2GHz+, we’re about to see just how that’s been done on the RX 6000, as AMD has implemented similar changes here. Overall AMD is claiming that they can clock 30% higher at the same power, which alone would deliver a lot of AMD’s claimed 54% improvement in perf-per-watt over RDNA (1). AMD has previously talked about how their experiences with the Zen architecture have guided their GPU efforts as well, and this is an area in particular where their Zen learnings have been very applicable.
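To put the 30% and 54% figures side by side, here is a minimal sketch of the arithmetic. It rests on two idealized assumptions of mine rather than anything AMD has stated: that performance scales linearly with clockspeed, and that the individual gains multiply together. Real workloads rarely scale that cleanly, so treat the result as a rough upper bound on what the clockspeed work contributes.

```python
def residual_gain(total_gain: float, known_gain: float) -> float:
    """Return the multiplicative gain left over once known_gain is factored out.

    Assumes the individual efficiency gains compound multiplicatively,
    which is a simplification made for this illustration.
    """
    return total_gain / known_gain


claimed_perf_per_watt = 1.54  # AMD's claimed RDNA2 vs. RDNA (1) improvement
clock_at_same_power = 1.30    # AMD's claimed clockspeed uplift at equal power

# Under ideal clock scaling, a 30% higher clock at the same power is a 1.30x
# perf-per-watt gain by itself; the remainder must come from other sources.
other_sources = residual_gain(claimed_perf_per_watt, clock_at_same_power)
print(f"Left for CU and Infinity Cache improvements: {other_sources:.3f}x")
```

Under those assumptions, the CU power optimizations and the Infinity Cache would together need to supply a bit under a 1.2x improvement to reach AMD’s 54% claim.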

Infinity Cache

Last, but not least, we have the mysterious Infinity Cache. AMD is not going into too many details about the Infinity Cache today, but at a high level, this is a sizable on-chip cache that according to AMD functions a lot like an L3 cache on a CPU. Just how similar remains to be seen, but conceptually, it can be thought of as a local cache that buffers against reads and writes to the main memory, and provides a backstop for large operations that’s a lot faster than having to go out to VRAM.

On-chip caches for GPU usage are not a new idea, especially for AMD. The company included a 32MB eSRAM cache for the Xbox One (and Xbox One S) SoC, and even before that the Xbox 360 had an on-package cache as well. But this is the first time we’ve seen a large cache on a PC GPU.

Navi 21 will have a 128MB Infinity Cache. Meanwhile AMD isn’t speaking about other GPUs, but those will presumably include smaller caches as fast caches eat up a lot of die space. On which note, doing some quick paper napkin math and assuming AMD is using standard 6T SRAM, Navi 21’s Infinity Cache would be at least 6 billion transistors in size, which is a significant number of transistors even on TSMC’s 7nm process (for reference, the entirety of Navi 10 is 10.3B transistors). In practice I suspect AMD has some optimizations in place to minimize the number of transistors used and space occupied, but regardless, the amount of die space they have to be devoting to the Infinity Cache is significant. So this is a major architectural trade-off for the company.
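The napkin math behind that transistor estimate is easy to reproduce. The sketch below counts only the storage cells of a standard 6T SRAM array and ignores tags, redundancy, and control logic, so it is a lower bound rather than a description of AMD’s actual design; the function name is illustrative.

```python
def sram_cell_transistors(capacity_mib: int, transistors_per_bit: int = 6) -> int:
    """Transistors needed for the storage cells alone of an SRAM array.

    Assumes 6T SRAM by default; tag arrays, ECC, and peripheral logic
    are deliberately left out of this estimate.
    """
    bits = capacity_mib * 1024 * 1024 * 8
    return bits * transistors_per_bit


cache = sram_cell_transistors(128)
navi21_total = 26.8e9  # Navi 21 transistor count from the spec table
navi10_total = 10.3e9  # Navi 10 transistor count from the spec table

print(f"Infinity Cache cells: {cache / 1e9:.2f}B transistors")
print(f"Share of Navi 21: {cache / navi21_total:.0%}")
print(f"Relative to all of Navi 10: {cache / navi10_total:.0%}")
```

By this estimate the cache cells alone would account for roughly a quarter of Navi 21’s transistor budget.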

The advantages of the Infinity Cache, in turn, are seen in a few areas. As far as perf-per-watt goes, the cache further improves RDNA2’s energy efficiency by reducing the amount of traffic that has to go to energy-expensive VRAM. It also allows AMD to get away with a smaller memory subsystem with fewer DRAM chips and fewer memory controllers, reducing the power consumed there. Along these lines, AMD justifies the use of the cache in part by comparing the power costs of the cache versus a 384-bit memory bus configuration. Here a 256-bit bus with an Infinity Cache only consumes 90% of the power of a 384-bit solution, all the while delivering more than twice the peak bandwidth.

Furthermore, according to AMD the cache improves the amount of real-world work achieved per clock cycle on the GPU, presumably by allowing the GPU to more quickly fetch data rather than having to wait around for it to come in from VRAM. And finally, the Infinity Cache is also a big factor in AMD’s ray tracing accelerator cores, which keep parts of their significant BVH scene data in the cache.

Ultimately, today is just a teaser of sorts for the Infinity Cache. Besides the die space costs, there are numerous outstanding questions about performance, how the cache is used (can developers directly access it?), and how well it works with existing software. Large GPU caches as a whole are a risky decision – it should be noted that while Microsoft followed a similar strategy for the Xbox One, they went with a tried and true large GDDR memory bus for the Xbox One X – so I’m very curious how AMD is going to use a relatively small 128MB cache to make up for what would have otherwise been another 4GB+ of VRAM and 256GB/sec more VRAM bandwidth. Of everything AMD is doing today the Infinity Cache is the most revolutionary change, but it’s also the most questionable for a PC GPU.
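For a sense of what AMD is giving up by not going wider, the comparison between the two bus widths can be sketched as follows. This assumes the same 16Gbps GDDR6 on both configurations (my expectation, not an AMD disclosure) and one GDDR6 chip per 32 bits of bus width; the function name is hypothetical.

```python
GDDR6_CHIP_INTERFACE_BITS = 32  # each GDDR6 chip connects over a 32-bit interface


def describe_bus(bus_width_bits: int, data_rate_gbps: float) -> tuple:
    """Return (DRAM chip count, peak bandwidth in GB/sec) for a memory bus."""
    chips = bus_width_bits // GDDR6_CHIP_INTERFACE_BITS
    bandwidth_gbs = bus_width_bits * data_rate_gbps / 8
    return chips, bandwidth_gbs


# Both configurations assume 16Gbps memory, which AMD has not confirmed.
narrow_chips, narrow_bw = describe_bus(256, 16)
wide_chips, wide_bw = describe_bus(384, 16)

print(f"256-bit: {narrow_chips} chips, {narrow_bw:.0f} GB/sec")
print(f"384-bit: {wide_chips} chips, {wide_bw:.0f} GB/sec")
print(f"Raw bandwidth forgone: {wide_bw - narrow_bw:.0f} GB/sec")
```

The four extra chips and 256GB/sec of raw bandwidth are what the 128MB cache has to compensate for.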

Finally, it should be noted that as far as the Infinity Cache goes, AMD isn’t talking about the consoles either. While it’s billed as a central part of the RDNA2 architecture, it’s unclear at this time whether either console vendor opted to include it (even in a smaller form) for their respective devices. Both the PS5 and XSX come with significant DRAM subsystems, and neither console maker has talked about a large cache in their technical disclosures.

RDNA2 Features: Making The Jump To DirectX 12 Ultimate

Along with numerous optimizations to the power efficiency of their GPU architecture, RDNA2 also includes a much-needed update to the graphics side of AMD’s GPU architecture. RDNA (1), though a massive replumbing of the core compute architecture, did not include any graphics feature upgrades. As a result, AMD only offered a DirectX feature level 12_1 feature set – the same as the Radeon RX Vega series – at a time when NVIDIA was offering ray tracing and the other features that have since become DirectX 12 Ultimate (fl 12_2). So RDNA2 is AMD’s chance to catch up in features as well as performance, as they are integrating the latest and greatest in GPU graphics tech.

AMD has previously disclosed that RDNA2 would be a DX12 Ultimate-level architecture, so there isn’t much in the way of major reveals to be had here, nor will I recap DX12U in great depth.

DirectX 12 Feature Levels

| | 12_2 (DX12 Ult.) | 12_1 | 12_0 |
|---|---|---|---|
| GPU Architectures (Introduced as of) | NVIDIA: Turing; AMD: RDNA2; Intel: Xe-HPG | NVIDIA: Maxwell 2; AMD: Vega; Intel: Gen9 | NVIDIA: Maxwell 2; AMD: Hawaii; Intel: Gen9 |
| Ray Tracing (DXR 1.1) | Yes | No | No |
| Variable Rate Shading (Tier 2) | Yes | No | No |
| Mesh Shaders | Yes | No | No |
| Sampler Feedback | Yes | No | No |
| Conservative Rasterization | Yes | Yes | No |
| Raster Order Views | Yes | Yes | No |
| Tiled Resources (Tier 2) | Yes | Yes | Yes |
| Bindless Resources (Tier 2) | Yes | Yes | Yes |
| Typed UAV Load | Yes | Yes | Yes |

Overall, DirectX 12 Ultimate focuses on 4 major features (and some minor spec changes): Ray tracing, tier 2 variable rate shading, mesh shaders, and sampler feedback. Ray tracing has received the most attention for obvious reasons, especially as NVIDIA has heavily promoted it. The rendering model based on how light bounces around in the real world can add a lot of detail with better lighting and reflections, but it is relatively expensive to use.

Meanwhile variable rate shading and mesh shaders are going to be less visible to end users, but they offer tangible performance improvements, and in the case of mesh shaders will eventually dramatically alter the geometry pipeline for games designed with mesh shaders as a baseline feature. Finally, sampler feedback will allow game developers to get a better idea of what textures and texel blocks within those textures are being used, allowing developers to better manage what assets are in VRAM and what needs to be pre-loaded.
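Because each feature level is a strict superset of the one below it, the table can be modeled as nested sets. The sketch below is only an illustration of that relationship, built from the rows of the table; it is not how Direct3D itself reports feature levels, the real requirements include more than these headline features, and every name in it is hypothetical.

```python
from typing import Optional, Set

# Headline features per feature level, transcribed from the table.
FL_12_0 = {"Tiled Resources (Tier 2)", "Bindless Resources (Tier 2)", "Typed UAV Load"}
FL_12_1 = FL_12_0 | {"Conservative Rasterization", "Raster Order Views"}
DX12_ULTIMATE_ADDITIONS = {
    "Ray Tracing (DXR 1.1)",
    "Variable Rate Shading (Tier 2)",
    "Mesh Shaders",
    "Sampler Feedback",
}
FL_12_2 = FL_12_1 | DX12_ULTIMATE_ADDITIONS

# Checked from highest to lowest so the best match wins.
LEVELS = [("12_2", FL_12_2), ("12_1", FL_12_1), ("12_0", FL_12_0)]


def highest_feature_level(supported: Set[str]) -> Optional[str]:
    """Return the highest feature level whose listed features are all supported."""
    for name, required in LEVELS:
        if required <= supported:  # subset test
            return name
    return None


# A 12_1-class GPU stays at 12_1 until all four new features are present.
print(highest_feature_level(FL_12_1))                             # 12_1
print(highest_feature_level(FL_12_1 | {"Mesh Shaders"}))          # 12_1
print(highest_feature_level(FL_12_1 | DX12_ULTIMATE_ADDITIONS))   # 12_2
```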

Ray tracing itself does require additional functional hardware blocks, and AMD has confirmed for the first time that RDNA2 includes this hardware. Using what they are terming a ray accelerator, there is an accelerator in each CU. The ray accelerator in turn will be leaning on the Infinity Cache in order to improve its performance, by allowing the cache to help hold and manage the large amount of data that ray tracing requires, exploiting the cache’s high bandwidth while reducing the amount of data that goes to VRAM.

AMD is not offering any performance estimates at this time, or discussing in depth how these ray accelerators work. So that will be something else to look forward to once AMD offers deep dives on the technology.

The inclusion of sampler feedback also means that AMD will be able to support Microsoft’s forthcoming DirectStorage API. Derived from tech going into the next-gen consoles, DirectStorage will allow game assets to be streamed directly from storage to GPUs, with the GPUs decompressing assets on their own. This bypasses the CPU, which under the current paradigm has to do the decompression and then send those decompressed assets to the GPU.

Platform Synergy: AMD Smart Access Memory

Besides fleshing out their feature capabilities at the hardware level in the RDNA2 architecture directly, as a CPU and GPU (and APU) maker, AMD has also been looking at how to leverage any possible synergy between their CPU and GPU platforms. And for their forthcoming Ryzen 5000 platforms, they’ve added a new trick to their toolkit when the Radeon RX 6000 is used with those new CPUs, in a feature AMD calls Smart Access Memory.

For the purposes of today’s keynote from AMD, the company is saying very little about the technology. From the bits and pieces that AMD has disclosed, the company has told me that the tech adjusts how data is transferred between the GPU and the CPU by giving the CPU direct access to the full 16GB of the GPU’s VRAM. The net result is that Smart Access Memory will be able to reduce memory fragmentation within the VRAM pool, which will improve performance.