
With AMD’s first-wave of Zen 4 CPUs now within the books with the Ryzen 7000 sequence, the patron arm of AMD is now shifting its consideration over to its graphics enterprise. In a presentation that ended moments in the past dubbed “collectively we advance_gaming”, Dr. Lisa Su and different AMD leaders laid out the way forward for AMD’s graphics merchandise. And that future is the RDNA 3 structure, which would be the foundation of the brand new Radeon RX 7900 XTX and Radeon RX 7900 XT video playing cards.
The 2 playing cards, set to be launched on December 13th, would be the first merchandise launched utilizing the RDNA 3 structure. Based on AMD, the brand new flagship 7900 XTX will ship as much as 70% extra efficiency at 4K than their earlier flagship, the 6950 XT. This efficiency increase comes curtesy of a number of architectural enhancements in RDNA that cumulatively provide 54% larger efficiency per watt than RDNA 2, in addition to larger clockspeeds courtesy of TSMC’s 5nm (and 6nm) processes, and better total energy consumption.
The total-fledged RX 7900 XTX can be hitting the streets at $999. In the meantime the second-tier RX 7900 XT will run for $899.
AMD Radeon RX 7000 Sequence Specification Comparability | ||||||
AMD Radeon RX 7900 XTX | AMD Radeon RX 7900 XT | AMD Radeon RX 6950 XT | AMD Radeon RX 6900 XT | |||
Stream Processors | 12288 (96 CUs) |
10752 (84 CUs) |
5120 (80 CUs) |
5120 (80 CUs) |
||
ROPs | 192 | 192 | 128 | 128 | ||
Recreation Clock | 2.3GHz | 2.0GHz | 2100MHz | 2015MHz | ||
Enhance Clock | 2.5GHz | 2.4GHz | 2310MHz | 2250MHz | ||
Throughput (FP32) | 56.5 TFLOPS | 43 TFLOPS | 21.5 TFLOPS | 20.6 TFLOPS | ||
Reminiscence Clock | 20 Gbps GDDR6 | 20 Gbps GDDR6 | 18 Gbps GDDR6 | 16 Gbps GDDR6 | ||
Reminiscence Bus Width | 384-bit | 320-bit | 256-bit | 256-bit | ||
VRAM | 24GB | 20GB | 16GB | 16GB | ||
Infinity Cache | 96MB | 80MB | 128MB | 128MB | ||
Complete Board Energy | 355W | 300W | 335W | 300W | ||
Manufacturing Course of | GCD: TSMC 5nm MCD: TSMC 6nm |
GCD: TSMC 5nm MCD: TSMC 6nm |
TSMC 7nm | TSMC 7nm | ||
Transistor Rely | 58B | 58B – (1 MCD) | 26.8B | 26.8B | ||
Structure | RDNA3 | RDNA3 | RDNA2 | RDNA2 | ||
GPU | Large Navi 3x | Large Navi 3x | Navi 21 | Navi 21 | ||
Launch Date | 12/13/2022 | 12/13/2022 | 05/10/2022 | 12/08/2020 | ||
Launch Worth | $999 | $899 | $1099 | $999 |
AMD’s eagerly anticipated replace to their GPU structure comes as the corporate has been firing on all cylinders for the previous couple of years. On the CPU aspect of issues the Zen 3 and Zen 4 architectures specifically have confirmed very performant, and in the meantime AMD has been capable of claw its approach again from its graphics stoop with the RDNA household of GPU architectures. RDNA 2, the idea of the Radeon RX 6000 sequence, exceeded expectations and proved to be a really sturdy competitor, and now AMD is seemingly setup to exceed expectations as soon as once more, with RDNA 3’s 54% performance-per-watt coming in forward of AMD’s earliest guarantees of a 50% acquire.
AMD Goes Chiplets For GPUs
Whereas as we speak’s reveal from AMD was a extra carefully guarded occasion than the Ryzen 7000 unveil a few months in the past, AMD has nonetheless given us a fairly a little bit of element on the RDNA 3 structure and the playing cards – greater than we’ve got time to cowl right here – so let’s get began from the highest, with the development of the primary RDNA 3 GPU.
The Navi 3x GPU (AMD isn’t confirming the precise GPU identify right now) breaks new floor for AMD not solely on the efficiency entrance, however by way of its building. For the primary time from any of the massive 3 GPU producers, AMD is using chiplets within the building of the GPU.
Chiplets are in some respects the holy grail of GPU building, as a result of they offer GPU designers choices for breaking up complicated monolithic GPU designs into a number of smaller components – permitting for brand spanking new choices for scaling, in addition to mixing and matching the method node utilized in manufacturing. That stated, it’s additionally a holy grail as a result of the immense quantity of information that have to be handed between totally different components of a GPU (on the order of terabytes per second) may be very arduous to do – and really essential to do in order for you a multi-chip GPU to have the ability to current itself as a single machine.
For his or her huge Navi 3x chip, AMD has assembled two sorts of chiplets, basically breaking off the reminiscence capabilities from a basic GPU into their very own chiplets. Which means the core capabilities of the GPU are housed in what AMD is asking the Graphics Compute Die (GCD), which homes all the ALU/compute {hardware}, the graphics {hardware}, in addition to ancillary blocks just like the show and media engines.
As a result of the GCD homes the performance-critical elements of the general GPU, it’s being constructed on TSMC’s 5nm course of. This offers AMD the perfect density, energy consumption, and clockspeeds for these components, although clearly at the next manufacturing price. The GCD die measurement measures 300mm2.
In the meantime the brand new Reminiscence Cache Die (MCD) homes AMD’s infinity cache (L3 cache), in addition to a 64-bit (technically 2×32-bit) GDDR6 reminiscence controllers. The MCD is among the scalable elements of the chiplet design, as Large Navi 3x GPU SKUs may be configured by paring them with extra or fewer MCDs. A full configuration on this case is 6 energetic MCDs, which is what we see within the 7900 XTX. In the meantime the 7900 XT may have 5 energetic MCDs, with a 6th faulty/spacer MCD current for salvaging functions and bodily package deal stability.
A person MCD is 37mm2 in die measurement, and is constructed on TSMC’s 6nm course of. That is an instance of AMD’s course of node flexibility, placing the much less vital GDDR6 reminiscence controllers and Infinity Cache on a less expensive course of node. GDDR6 controllers are a type of basic examples of a know-how that doesn’t scale very effectively with smaller course of geometries (like most types of I/O), so it’s simple to see why AMD would wish to keep away from constructing it on 5nm for minimal advantages.
Within the full 6 MCD configuration (7900 XTX), Large Navi 3x provides a 384-bit GDDR6 reminiscence bus, together with 96MB of L3 cache. In the meantime a 5 MCD (7900 XT) provides a 320-bit GDDR6 reminiscence bus and 80MB of L3 cache.
For the needs of as we speak’s announcement, AMD has not gone into nice depth on how they managed to make a chiplet-based GPU work, however they’ve confirmed a number of essential particulars. In the beginning, with a view to provide the die-to-die bandwidth wanted have the reminiscence subsystem positioned off-chip, AMD is utilizing an unspecified fanout bridge know-how. their Elevated Fanout Bridge (EFB) packaging know-how, which AMD first used for his or her Intuition MI200 sequence accelerators (CDNA2). On these accelerator components it was used to hook up the monolithic GPUs to one another, in addition to HBM2e reminiscence. On RDNA 3, it’s getting used to hook up the MCDs to the GCD.
Notably, fanout bridges are a non-organic packaging know-how, which is to say it’s complicated. That AMD is ready to get 5.3TB/second of die-to-die bandwidth by way of it underscores its utility, however it additionally signifies that AMD is undoubtedly paying deal extra for packaging on Large Navi 3x than they have been on Navi 21 (or Ryzen 7000).
Internally, AMD is asking this memory-to-graphics hyperlink Infinity Hyperlink. Which, because the identify implies, is liable for (transparently) routing AMD’s Infinity Cloth between dies.
As talked about earlier than, the cumulative bandwidth right here between the MCDs and GCD is 5.3TB/second. It’s unclear if the constraining issue is the bandwidth of the Infinity Hyperlink, or that the mixed Infinity Cache + GDDR6 reminiscence controllers can not transfer sufficient information to totally saturate the hyperlink. However regardless, it means there’s basically simply shy of 900GB/second of bandwidth between a person MCD and GCD – greater than all the mixed off-die reminiscence bandwidth of the last-generation Radeon RX 6950 XT (and a pair of.7x greater than Navi 21’s on-die bandwidth).
Whereas we’re as regards to AMD’s L3 Infinity Cache, it’s notable right here that it’s really a bit smaller on Large Navi 3x than it was on Navi 21, with a most capability of 96MB versus 128MB on the previous. Based on AMD they’ve made additional enhancements to enhance information reuse on the Infinity Cache to offset this drop in capability. At this level it’s not clear if the change is a operate of software program algorithms, or in the event that they’ve made extra basic {hardware} adjustments.
Lastly, whereas AMD is quoting die sizes for the GCD and MCD, they aren’t quoting particular person transistor counts. So whereas we all know {that a} full 6 MCD Large Navi 3x configuration is comprised of 58 billion transistors (2.16x greater than Navi 21), we don’t know the way a lot of that’s the GCD versus the MCDs.
AMD RDNA 3 Compute & Graphics Structure: Bringing Again ILP & Enhancing RT
Diving down a stage, let’s check out the precise graphics and compute structure backing RDNA 3 and Large Navi 3x.
Whereas nonetheless clearly sharing lots of the core design parts of AMD’s overarching RDNA structure, RDNA 3 is in some respects a a lot larger shift in architectural design than RDNA 2 was. Whereas RDNA 2’s compute core was basically unchanged from RDNA (1)’s, RDNA 3 makes a number of huge adjustments.
The most important influence is how AMD is organizing their ALUs. Briefly, AMD has doubled the variety of ALUs (Stream Processors) inside a CU, going from 64 ALUs in a single Twin Compute Unit to 128 inside the identical unit. AMD is conducting this not by doubling up on the Twin Compute Items, however as an alternative by giving the Twin Compute Items the power to dual-issue directions. Briefly, every SIMD lane can now execute as much as two directions per cycle.
However, as with all dual-issue configurations, there’s a trade-off concerned. The SIMDs can solely difficulty a second instruction when AMD’s {hardware} and software program can extract a second instruction from the present wavefront. Which means RDNA 3 is now explicitly reliant on extracting Instruction Stage Parallelism (ILP) from wavefronts with a view to hit most utilization. If the subsequent instruction in a wavefront can’t be executed in parallel with the present instruction, then these further ALUs will go unfilled.
This can be a notable change as a result of AMD developed RDNA (1) partly to get away from a reliance on ILP, which was recognized as a weak spot of GCN – which was why AMD’s real-world throughput was not as quick as their on-paper FLOPS numbers would indicated. So AMD has, in some respects, walked backwards on that change by re-introducing an ILP dependence.
We’re nonetheless ready on extra info from AMD outlining why they made this modification. However dual-issue is often an inexpensive approach so as to add extra throughput to a processor design (you don’t must do all of the instruction monitoring required for a completely separate Twin Compute Unit), and it may be worthwhile tradeoff in the event you can make sure you’ll have the ability to dual-issue more often than not. But it surely signifies that AMD’s real-world ALU utilization fee is probably going decrease on RDNA 3 than RDNA 2, as a result of bubbles from not having the ability to dual-issue.
Which to carry issues again to gaming and the merchandise at hand, it signifies that the FLOPS numbers between RDNA 3 and RDNA 2 components are usually not going to be totally comparable. 7900 XTX might push 2.6x as many FP32 FLOPs as 6950 XTX on paper, however the true world benefit on something lower than ultimate code goes to be much less. Which is among the the explanation why AMD is simply selling a real-world efficiency uplift of 1.7x for the 7900 XTX.
In any case, SIMDs aren’t the one adjustments to the core compute structure of RDNA 3. Feeding the beast, AMD has made the Vector Common Goal Register (VGPR) financial institution 50% bigger than on RDNA 2.
Extra vital than that’s that AMD is lastly integrating devoted silicon for AI processing on their shopper GPUs. That is an space the place each of AMD’s opponents (NVIDIA and Intel) have already made the funding on their shopper components, and as the usage of GPU inference in workloads continues to develop, it’s not one thing AMD can ignore any longer.
Given the gaming-centric focus of as we speak’s presentation, AMD didn’t spend a lot time speaking concerning the new AI items. Every RDNA 3 CU may have 2 of those items, and they’re going to help new AI directions (some form of INT8 tensor operation looks as if a given). All informed, AMD is saying that the brand new AI items give the Radeon RX 7900 XTX 2.7x the AI efficiency, which AMD is measuring as bfloat16 efficiency versus the RX 6950 XT.
General, the significance of this to players is one thing that continues to be to be seen. AMD isn’t at the moment utilizing AI items for FSR 2 (in contrast to NVIDIA’s DLSS 2). However that might change for future initiatives. In any other case, for extra skilled customers (or anybody who likes to mess with Steady Diffusion), that is an addition that’s excellent news.
Transferring on, AMD has additionally up to date their raytracing {hardware} for RDNA 3. The second-generation RT accelerator, as AMD calls it, can deal with 1.5x extra rays in flight. There are additionally new {hardware} field sorting and traversal options that weren’t current in RDNA 2’s preliminary RT performance. AMD’s presentation gave the technical particulars a light-weight remedy, however it actually seems to be like AMD is transferring to doing an even bigger a part of the ray tracing course of in devoted {hardware}. Which in flip would assist enhance their efficiency, and preserve efficiency steadier by not stealing fairly so many assets from the remainder of the CU.
AMD’s personal efficiency slides tout wherever between a 47% and 84% enhance in RT efficiency. Although it needs to be famous that AMD’s numbers are with FSR enabled; so we can not divorce these positive aspects from any adjustments that enhance FSR efficiency on the 7900 XTX.
Final, however not least, AMD has made an fascinating resolution with clockspeeds on the RDNA 3. Briefly, AMD has decoupled their clocks; somewhat than operating your complete GCD on the similar clockspeed, AMD can be operating the shaders and front-end at totally different clockspeeds. Within the case of the 7900 XTX, it will see the shaders operating at 2.3GHz (the marketed sport clock pace), whereas the front-end will run at a barely speedier 2.5GHz (about 9% sooner).
AMD didn’t go into nice element on why they’ve made this modification, however at a excessive stage it’s all about balancing efficiency versus energy consumption. The shaders might run at 2.5GHz as effectively (certainly, the 7900 XTX’s rated increase clock is 2.5GHz), however as we’ve seen time and time once more, these last clocks are the costliest by way of energy as you go up the v/f curve. So AMD has made the selection to surrender a little bit of potential efficiency to save lots of quite a bit on energy, as 96 CUs/12288 ALUs is quite a lot of silicon to gentle up. Conversely, the front-end is comparatively small, and with AMD having beefed up their CUs by a lot, spending a bit extra energy on the front-end is presumably price it to maintain them from bottlenecking the remainder of the GPU.
RDNA 3 Show & Media Engines: The Newest and the Best
AMD’s core compute/graphics structure was not the one a part of the RDNA 3 structure to get an replace on this era. AMD has additionally used the chance to improve their show and media engines to help new options and new codecs.
On the show engine entrance, AMD’s show engine, which they’re now calling the “AMD Radiance Show Engine” has been upgraded to help DisplayPort 2.1. Particularly, AMD has added help for the DisplayPort 2.x feature suite in addition to the UHBR 10 and UHBR 13.5 information charges. Which means RDNA 3 playing cards can provide 2x the DisplayPort bandwidth of their DisplayPort 1.4-enabled predecessors, which in flip permits for larger resolutions and better refresh charges. Notably, this ever so barely exceeds HDMI 2.1’s bandwidth as effectively, placing DisplayPort again into the lead, no less than on AMD playing cards.
Unsurprisingly, AMD is utilizing this performance to push forthcoming larger decision and better refresh fee gaming screens, together with a Samsung ultrawide show set to launch in 2023 with a horizonal 8K decision. So it’s not only for displaying off specs, and AMD and its companions are intending to place it to good use.
AMD has not stated something concerning the complete variety of supported shows. So at this level I anticipate it’s nonetheless a most of 4 shows.
In the meantime on the media engine entrance, AMD has given RDNA 3 help for the most recent and best video codecs. Together with the same old H.264 and H.265 help, RDNA 3’s media engines additionally add full AV1 encode and decode help, making this the most recent GPU household to roll out help for the next-generation open format codec. RDNA 3 will have the ability to encode and decode AV1 at as much as 8Kp60.
The general efficiency of the media engine has been elevated considerably. Based on AMD the media engine runs 80% sooner than it did on RDNA 2 components, permitting for simultaneous encoding (or decoding) of as much as two H.264/H.265 streams. Although it’s unclear if that additionally applies to AV1.
Lastly as regards to AMD’s GPU uncore, whereas not explicitly known as out in AMD’s presentation, it’s price noting that AMD has not up to date their PCIe controller. So RDNA 3 nonetheless maxes out at PCIe 4.0 speeds, with Large Navi 3x providing the same old 16 lanes. Which means despite the fact that AMD’s newest Ryzen platform helps PCIe 5.0 for graphics (and different PCIe playing cards), their video playing cards gained’t be reciprocating on this era. In actual fact, which means that nobody may have a PCIe 5.0 shopper video card.
Radeon RX 7900 XTX & Radeon RX 7900 XT: Launching December 13th
Bringing as we speak’s reveal full circle, let’s flip again to the playing cards themselves, the Radeon RX 7900 XTX and RX 7900 XT.
AMD’s flagship card would be the Radeon RX 7900 XTX. Whereas we’re nonetheless ready on affirmation of this, this is able to appear to be a fully-enabled Large Navi 3x half, with all the blocks in each the GCDs and the person MCDs themselves enabled. As talked about beforehand, AMD is touting a broad efficiency uplift of as much as 70% versus the previous-generation flagship, the RX 6950 XT.
Internally, this implies 96CUs and 96MB of L3 Infinity Cache can be accessible on the cardboard. The sport clock (common clockspeed) can be 2.3GHz, whereas based mostly on different AMD figures, we are able to infer that the increase (most) clockspeed can be 2.5GHz. The sport clock specifically is a ~10% enchancment over the 6950 XT, so AMD is having fun with a modest frequency uplift generation-over-generation, however nothing too huge. A lot of the heavy lifting will come courtesy of the structure and reminiscence adjustments.
Talking of reminiscence, the RX 7900 XTX can be paired with 24GB of GDDR6 reminiscence operating at (no less than) 20Gbps. Apparently, AMD’s companions have the headroom to go even larger than this with manufacturing facility overclocking, however the flooring worth for the half can be 20Gbps total. This can be a modest enhance in reminiscence clockspeeds versus the 6950 XT (11%). As a substitute, the majority of the VRAM bandwidth positive aspects will come from the 50% bigger reminiscence bus, with the 7900 XTX transferring to a large 384-bit bus. In complete, this implies the 7900 XTX may have 960GB/sec of reminiscence bandwidth, 66% greater than its predecessor. AMD obtained their “free” reminiscence subsystem efficiency increase within the final era with Infinity Cache, so for this time round, they’re again to needing so as to add extra bodily reminiscence bandwidth to maintain the ever-growing beast correctly fed.
In the meantime, the 7900 XT can be a chip off the block, with fewer CUs, much less VRAM, and decrease clockspeeds. All informed we’re 84CUs paired with 20GB of 20Gbps GDDR6, and backed by an 80MB infinity cache. The cardboard’s sport clockspeed score is 2.0GHz, and we should not have any info on the increase clockspeed. The mix of a 13% drop in clockspeeds and 13% drop in CUs provides up to what’s, on paper, a 24% deficit in compute/shading efficiency. That stated, AMD’s pricing signifies that the real-world efficiency hole shouldn’t be this excessive, and we’re nonetheless lacking some essential particulars comparable to ROP counts. So for higher or worse, we don’t have body of reference fright now for the way the 7900 XT will carry out relative to anything, current-generation or final.
Unsurprisingly, energy consumption on the excessive finish can be going up. The 7900 XTX can be a 355W card, up 20W from the 335W 6950 XT (and 55W from the 300W 6900 XT). This can be a extra modest energy requirement than on NVIDIA’s high-end RTX 4090 Ti (450W), however we’re nonetheless speaking a few card effectively north of 300W. For players with a barely smaller urge for food for giant energy payments, the 7900 XT can be holding the road at 300W. Each playing cards would require 2 8-pin PCIe energy connectors.
AMD has additionally despatched over footage of each the reference 7900 XTX and 7900 XT. Of specific be aware, each playing cards will function a USB-C port for show outputs. This can be a function that AMD launched with the RX 6000 sequence and has opted to hold ahead into the RX 7000 sequence. As with the previous-gen playing cards, the presence of the USB-C port is for straight hooking up screens that depend on DisplayPort Alt Mode over USB-C. In the meantime, rounding out the gathering can be a paid of DisplayPorts (2.1) and an HDMI 2.1 port.
Each playing cards are utilizing a brand new triple fan blower design from AMD. We’re nonetheless ready on additional particulars right here, however AMD has informed us that the 7900 XTX measures 287mm lengthy, and is 2.5 slots vast.
Wrapping issues up, each playing cards can be launching on December 13th, with AMD planning on having each reference and AIB companions’ playing cards on the shelf for launch day. The 7900 XTX will begin at $999, in the meantime the 7900 XT can be proper behind it at $899. AMD isn’t providing any efficiency comparisons versus NVIDIA playing cards, however at this juncture it looks as if the wildcard is the soon-to-launch GeForce RTX 4080 16GB. By the point AMD launches in December, we must always have a significantly better thought of the place AMD and NVIDIA’s dueling lineups stand compared to one another.