Back in 2014, Advanced Micro Devices set an aggressive goal of 25×20, or achieving 25 times better energy efficiency for its processors and graphics chips by 2020. The company exceeded that goal, and now it has set a new 30×25 goal, or 30 times better energy efficiency by 2025 in the machine learning and high-performance computing space in data centers.
I talked about this ambition with Sam Naffziger, who is AMD senior vice president, corporate fellow and product technology architect. Naffziger said that AMD’s graphics processing units (GPUs) and central processing units (CPUs) have undergone big changes over the past few generations as the company tries to balance the demands of enthusiast gamers, data center computing, and the need to deliver better power efficiency and performance-per-watt.
It’s a recognition that performance isn’t the only useful metric to pursue. If our data centers melt the polar ice caps, they’re not very useful anymore. While the chip industry is bumping up against the limits of Moore’s Law, Naffziger says he has a lot of confidence in the industry and his fellow engineers to innovate.
Here’s an edited transcript of our interview.

VentureBeat: Can you tell us about your background and AMD’s interest in energy efficiency?
Sam Naffziger: I’ve been at AMD 16 years. I’ve been leading our power efficiency and power technology for much of that time. For the past few years I’ve been in a product architecture role across the company, optimizing all of our products to make them the best in the world. Starting in late 2017, I went to the graphics division to lead an effort to drive performance-per-watt and overall performance and efficiency, to regain competitiveness and leadership there. That’s what I’ve been focused on for a number of years.
We’ve developed an extremely strong track record now that we’re pretty excited about. It comes at a compelling time for where the industry is at. The power consumption of virtually everything, from servers to high-performance computing to gaming, is going up and to the right. It’s a very opportune time to focus on efficiency improvements. That’s what we’ve been doing for quite a while. In fact, it goes back to the 25 by 20 initiative that kicked off long ago; I don’t know if you’re familiar with it. It seems like a whole different world now. But that was a bold goal, set in 2014, to bring our notebook processors to a 25X efficiency improvement.
The way we like to do things at AMD is to set very clear goals, not broad, unmeasurable ones, the kind that sound compelling but that you can’t be held accountable to. We’re very clear about the methodology for measuring there. We tracked generational improvements over time. By the 2020 product deployment, we had met and exceeded that 25X goal, which was not an easy thing to do. It required driving performance up and power down simultaneously, and a lot of innovation at the engineering level.
We wanted to build on that success. Notebooks are great, and certainly efficiency and battery life drive a lot of the consumer experience improvements there. But as far as having a big environmental impact and improving the overall energy footprint of IT equipment, we raised our sights to the data center as well, with the 30 by 25 goal that we rolled out last year to drive a 30X efficiency gain in the machine learning and high-performance computing space. That’s an area that you watch closely. I was super excited that we got into the recent Top500 and Green500 lists and took the top spots there with our Epyc products. That’s the first step on the road to 30X efficiency.
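To put the two goals in perspective, the constant yearly improvement each one implies follows from the compound-growth relation r = F^(1/n), where F is the total factor and n the number of years. The sketch below is a back-of-the-envelope illustration only. It assumes the 25x20 window runs from 2014 to 2020 and that 30x25 is measured from a 2020 baseline. It does not reflect how AMD actually weights workloads when it measures efficiency, which the interview does not detail.

```python
def implied_annual_gain(total_factor: float, years: int) -> float:
    """Return the constant yearly multiplier r such that r ** years == total_factor."""
    return total_factor ** (1.0 / years)


# Assumed goal windows: 25x20 spans 2014-2020; 30x25 is taken to start from 2020.
GOALS = {
    "25x20": (25.0, 2020 - 2014),
    "30x25": (30.0, 2025 - 2020),
}

if __name__ == "__main__":
    for name, (factor, years) in GOALS.items():
        r = implied_annual_gain(factor, years)
        print(f"{name}: {r:.2f}x per year over {years} years")
```

Under those assumptions, 25x20 works out to roughly 1.71X per year. 30x25 works out to roughly 1.97X per year, which means the newer goal calls for nearly doubling efficiency every year.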
These CDNA products go hand in glove with RDNA. They share a common core of graphics IP and components. The methodologies and approaches apply to both. That’s where we’ve been focusing on the gaming side as well. What we did, back when I joined the graphics group, was lay out a long-term road map. These kinds of improvements take many years to develop and to deliver to the market. We set a long-term plan that encompassed four generations of GPU development. We started with the ground-up RDNA architecture, with the Navi 10 product. With 7nm and everything else, we got a great 50% performance-per-watt boost with that product. Then, in 2020, we delivered what people called Big Navi, Navi 21. It was on the same 7nm technology, but it was the recipient of many of the methodologies and approaches that we drove in the intervening years to deliver another 50% plus on top of the first RDNA generation.
What was particularly interesting about that achievement, and something that we continue to build on, is that we’re leveraging AMD’s unique strength in having leadership CPU and GPU technology. Our competitors either have good CPUs or good GPUs, but nobody has both, at least not yet. We have a very collaborative engineering culture here. We just thrive on innovating, solving hard problems, and working together across the company. As we looked at what it would take to hit our efficiency goals for graphics, we engaged our CPU designers, who had done a fantastic job with the Zen architecture and its delivery.
Graphics architecture is a very different design space. It’s handling textures and pixels, and it’s highly parallel. Historically it has hovered around 1 GHz forever. We did a bunch of deep dives and design reviews to figure out how we could leverage CPU capabilities and radically improve what graphics could deliver for efficiency. That’s where a lot of the RDNA 2 gains came from.

VentureBeat: My impression over the years has been that Nvidia always pushed for performance, and very often didn’t care much about power consumption. They tried to set themselves apart on that front, and relative to someone like Intel that made sense. AMD, by contrast, was in a different space that looked at tradeoffs between performance and energy efficiency. You could compete well against someone like Nvidia by putting two graphics cards into the space where one Nvidia card would fit, because the Nvidia card was using so much power. I thought that was an interesting way to position, but is there more nuance you can bring to that picture as far as how you see these competitive dynamics? Maybe you’d leapfrog at one point, but then they’d leapfrog at another. The competition and market share would constantly swing back and forth.
Naffziger: There are a lot of games that can be played. A dual-GPU setup can run at a more efficient point and deliver more performance-per-watt. Whether that benefits the average gaming experience is another question, because it’s difficult to coordinate. But it’s a matter of focus. I’m not short-changing Nvidia’s contributions; they do have very power-efficient designs, and have had them for a while. We were behind for a number of years. We made a strategic plan to never fall behind again on performance-per-watt.
Power efficiency gives more flexibility in design. With a more power-efficient design, we can choose either to maximize performance, while still burning a lot of power, or to optimize efficiency. Another aspect that we’ve exploited and invested in significantly is power management. It takes advantage of the wide operating range of these products. We’ve pushed the frequency up, and that’s something unique to AMD. Our GPU frequencies are 2.5 GHz plus now, which is hitting levels never achieved before. It’s not that the process technology is that much faster. We’ve systematically gone through the design and re-architected the critical paths at a low level, the things that get in the way of high frequency, and done that in a power-efficient way.
Frequency tends to have a reputation for resulting in high power. But in reality, if it’s done right, we can get the work done faster by re-architecting the paths to reduce the levels of logic required, without adding a bunch of big gates and extra pipe stages and such. If you look at what drives power consumption in silicon processors, it’s voltage. Voltage has a quadratic effect on power. Nvidia could hit 2.5 GHz, and in fact they do with overclocked parts, but that drives the voltage up to very high levels, 1.2 or 1.3 volts, and that has a squared impact on power. We, on the other hand, achieve those high frequencies at modest voltages and do so much more efficiently.
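The voltage point follows from the textbook first-order model of CMOS dynamic (switching) power, P = α·C·V²·f. Here α is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. The sketch compares two hypothetical designs clocked at the same 2.5 GHz. The 1.25 V figure sits in the 1.2 to 1.3 V range Naffziger mentions. The 1.0 V "modest" voltage, the capacitance and the activity factor are illustrative assumptions, not measured figures for any AMD or Nvidia part. Leakage power is ignored.

```python
def dynamic_power_watts(c_farads: float, volts: float, freq_hz: float,
                        activity: float = 1.0) -> float:
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * c_farads * volts ** 2 * freq_hz


FREQ_HZ = 2.5e9   # same clock target for both designs
C_EFF = 100e-9    # hypothetical effective switched capacitance, in farads
ACTIVITY = 0.2    # hypothetical fraction of capacitance switching per cycle

high_v = dynamic_power_watts(C_EFF, 1.25, FREQ_HZ, ACTIVITY)    # high-voltage operating point
modest_v = dynamic_power_watts(C_EFF, 1.00, FREQ_HZ, ACTIVITY)  # assumed modest voltage

if __name__ == "__main__":
    print(f"power at 1.25 V: {high_v:.1f} W")
    print(f"power at 1.00 V: {modest_v:.1f} W")
    # In exact arithmetic this ratio is (1.25 / 1.00) ** 2; f, C and activity cancel.
    print(f"ratio: {high_v / modest_v:.4f}")
```

At a fixed frequency, a 25% higher supply voltage costs about 56% more switching power. That is why reaching a clock target at lower voltage matters more than the clock target itself.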
With smart power management, we can detect whether we’re in a part of a game that needs high frequency, or in a part that’s limited by memory bandwidth, for instance. We can then modulate the operating point of the processor to be as power efficient as possible. There’s no need to run the engine at maximum frequency if you’re waiting on memory accesses. We invested heavily in that, with some very high-bandwidth microcontrollers that tap into the performance monitors deep in the design. They get insight into what’s going on in the engine and modulate the operating point up and down very rapidly. When you combine that capability with high frequency, you end up with a much more balanced design.
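The memory-bound case he describes can be illustrated with a toy frequency governor. This is a hypothetical sketch, not AMD’s firmware, and it rests on three assumptions. First, a performance counter reports the fraction of time the engine was stalled on memory at full clock. Second, that stalled time does not shrink when the clock rises. Third, the voltage/frequency table below, whose values are invented, lists the available operating points. The governor picks the slowest point whose predicted slowdown stays within a tolerance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_ghz: float
    volts: float


# Hypothetical voltage/frequency table; the values are illustrative only.
OPERATING_POINTS = [
    OperatingPoint(1.0, 0.75),
    OperatingPoint(1.5, 0.85),
    OperatingPoint(2.0, 0.95),
    OperatingPoint(2.5, 1.05),
]


def predicted_slowdown(mem_stall_fraction: float, freq_ghz: float, f_max_ghz: float) -> float:
    """Run time at freq_ghz relative to run time at f_max_ghz.

    The compute-bound share scales with 1/f; the memory-stall share is assumed
    to be set by memory bandwidth and latency, so it does not change with f.
    """
    compute_share = 1.0 - mem_stall_fraction
    return compute_share * (f_max_ghz / freq_ghz) + mem_stall_fraction


def pick_operating_point(mem_stall_fraction: float, max_slowdown: float = 1.05) -> OperatingPoint:
    """Choose the lowest-frequency point whose predicted slowdown is acceptable."""
    points = sorted(OPERATING_POINTS, key=lambda p: p.freq_ghz)
    f_max = points[-1].freq_ghz
    for point in points:
        if predicted_slowdown(mem_stall_fraction, point.freq_ghz, f_max) <= max_slowdown:
            return point
    return points[-1]  # only reached if max_slowdown < 1.0


def relative_power(point: OperatingPoint, reference: OperatingPoint) -> float:
    """Dynamic power relative to a reference point, using P proportional to V^2 * f."""
    return (point.volts / reference.volts) ** 2 * (point.freq_ghz / reference.freq_ghz)


if __name__ == "__main__":
    top = OPERATING_POINTS[-1]
    for stall in (0.0, 0.9, 0.97):
        p = pick_operating_point(stall)
        print(f"stall={stall:.2f} -> {p.freq_ghz} GHz, "
              f"{relative_power(p, top):.2f}x the switching power at {top.freq_ghz} GHz")
```

The key modeling choice is treating the stalled share as independent of frequency. When most of the time is spent waiting on memory, dropping the clock (and with it the voltage) barely changes run time but cuts switching power sharply.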
The other thing is just the bread-and-butter of switching capacitance optimizations. Most of my background is in CPU design. I drove a lot of the power improvements there that culminated in the Zen architecture. There are a lot of detailed engineering metrics that we use to analyze the efficiency of the architecture. As you can imagine, we have billions of transistors in these things. We should only be wiggling the ones that are delivering useful work. We’d burn thousands of watts if we switched all the transistors simultaneously. Only a tiny fraction of them are needed to do the work at any given point in time.
We analyze our design pre-silicon, while we’re still developing it, to assess that efficiency. In other words, when a gate switches, did we actually need to switch it? It’s a change in mentality: analyzing the implementation, looking at every bit of activity, and asking whether it’s required for performance. If it’s not, shut it off. We brought those approaches and that thinking over from our CPU side and drove a pretty dramatic improvement in all of those switching metrics. We certainly analyzed the Nvidia designs heavily to see what they were doing, and of course we targeted doing much better.
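The question "when a gate switches, did we actually need to switch it?" can be illustrated with a toy activity analysis over a simulated signal trace. The sketch assumes a hypothetical per-cycle trace of one register’s output, plus a flag saying whether downstream logic consumed the value that cycle. Toggles on unconsumed cycles count as wasted, and a gated version simply holds the register’s previous value on those cycles. Real pre-silicon power analysis runs on full netlist simulations, and nothing here reflects AMD’s internal tools.

```python
from typing import List, Sequence


def toggles(trace: Sequence[int]) -> int:
    """Count 0->1 and 1->0 transitions between consecutive cycles."""
    return sum(1 for prev, cur in zip(trace, trace[1:]) if prev != cur)


def wasted_toggles(trace: Sequence[int], needed: Sequence[bool]) -> int:
    """Count transitions on cycles where nothing downstream used the value."""
    return sum(
        1
        for i in range(1, len(trace))
        if trace[i] != trace[i - 1] and not needed[i]
    )


def gated(trace: Sequence[int], needed: Sequence[bool]) -> List[int]:
    """Model clock gating: the register only updates on cycles where it is needed."""
    out = [trace[0]]
    for value, use in zip(trace[1:], needed[1:]):
        out.append(value if use else out[-1])
    return out


if __name__ == "__main__":
    # Hypothetical 8-cycle trace and consumption flags.
    trace = [0, 1, 0, 1, 1, 0, 1, 0]
    needed = [True, True, False, False, True, False, False, True]
    print("toggles:", toggles(trace))
    print("wasted:", wasted_toggles(trace, needed))
    print("toggles after gating:", toggles(gated(trace, needed)))
```

In this toy trace, four of the six transitions are wasted. Gating removes them, and every cycle whose value is actually consumed still sees the same value.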

VentureBeat: I remember when Raja Koduri moved over to Intel in 2017. I know that one person can’t make that big a difference, but is there anything you’d trace to pre-Raja and post-Raja in terms of how AMD looks at graphics? Is there anything you gravitated more or less toward?
Naffziger: Raja is a visionary. He paints a great and compelling picture of the gaming future and the features required to take the gaming experience to the next level. He’s great at that. As for hands-on silicon execution, his background is in software. He definitely helped AMD improve our software game and feature sets. I worked closely with Raja, but I didn’t join the graphics group until after he had left. He took a sabbatical and then went to Intel. So the performance-per-watt work wasn’t really Raja’s footprint, though some of the software dimensions were.
VentureBeat: How much do you credit things like manufacturing staying on track, as well as design taking the right approach? It was an interesting time in the last few years, when TSMC outdid Intel. That was such a shock to the system. It was so different from what people had been used to. How important was it to have these things happening at the same time: interesting directions in design, but also much more competitive foundries?
Naffziger: That’s a critical point. The underlying manufacturing technology is absolutely critical. In fact, usually when we do product launches, we break out the percentage gains that we got from each dimension: performance-per-watt, power efficiency optimizations, process technology. That was key. We placed our bets with TSMC, and 7nm delivered. Of course, we’re continuing to leverage their latest generation of technology. Nvidia has the freedom to choose TSMC as well. As you know, Intel is going to be leveraging TSMC too, particularly for graphics. Their new Arc line uses the same process technology as our GPUs. In some sense, with freedom of choice, we have a level playing field there in technology. But it’s key.
The other thing to point out is that RDNA 1 and RDNA 2 were both on the same 7nm, and we still managed to squeeze out a doubling of performance and a 50% gain in performance-per-watt. That’s just design prowess, and we’re proud of it. Some of that was not just the basics of optimizing switching. We also made innovative architectural developments. The Infinity Cache in particular was an exciting thing to bring to market. That, along with some of the power optimizations, was a CPU-leveraged capability. At its core is the same dense SRAM array that we use in our CPU designs for the L3 cache. It’s very power-efficient and very high bandwidth, and it turned out to be a great fit for graphics. No one had done such a large last-level cache before. In fact, there was a lot of uncertainty about whether the hit rates would be high enough to justify it. But we placed a bet, because going to a much wider GDDR6 interface is really a high-power way to get that bandwidth. We went with a narrower bus interface and a large cache, and that’s worked well for us. We see Nvidia following suit with larger last-level caches. But nobody’s at 128MB yet.
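The Infinity Cache bet can be framed with a simple bandwidth model. With a last-level cache hit rate h, only the misses (a fraction 1 - h of the traffic) reach GDDR6. A narrow bus can therefore sustain roughly B_dram / (1 - h) of demand, capped by what the cache itself can deliver. The GDDR6 figure uses the standard bus width × per-pin data rate / 8 calculation. The 16 Gbps pin rate, the hit rates and the 1,600 GB/s cache bandwidth are hypothetical inputs, not AMD specifications. The model also ignores latency and assumes hits and misses are served in parallel.

```python
def gddr_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak DRAM bandwidth in GB/s: bus width (bits) x per-pin data rate (Gb/s) / 8."""
    return bus_width_bits * pin_rate_gbps / 8


def effective_bandwidth_gbs(dram_gbs: float, cache_gbs: float, hit_rate: float) -> float:
    """Highest demand bandwidth the cache + DRAM pair can sustain in this simple model.

    Misses (1 - hit_rate of the traffic) must fit in DRAM bandwidth and hits
    must fit in cache bandwidth; whichever limit is reached first caps the result.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    limits = []
    if hit_rate < 1.0:
        limits.append(dram_gbs / (1.0 - hit_rate))
    if hit_rate > 0.0:
        limits.append(cache_gbs / hit_rate)
    return min(limits)


if __name__ == "__main__":
    narrow = gddr_bandwidth_gbs(256, 16.0)  # narrower bus paired with a large cache
    wide = gddr_bandwidth_gbs(384, 16.0)    # wider, higher-power bus with no large cache
    cache = 1600.0                          # hypothetical last-level cache bandwidth, GB/s
    print(f"wide bus alone: {wide:.0f} GB/s")
    for h in (0.0, 0.4, 0.6):
        print(f"narrow bus + cache, hit rate {h:.0%}: "
              f"{effective_bandwidth_gbs(narrow, cache, h):.0f} GB/s")
```

With these inputs, the narrow bus plus cache matches the 384-bit bus at a hit rate of one third, since 512 / (1 - h) = 768, and pulls ahead above that. This is why the hit-rate uncertainty Naffziger mentions was the crux of the bet.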
VentureBeat: What has it been like for AMD to get into the data center in a much bigger way with graphics, and into supercomputers as well?
Naffziger: It’s been a great engineering challenge. We made a strategic choice to bifurcate our graphics line into two architecture lines, Compute DNA and Radeon DNA, which share a lot of common components. That enabled us to optimize the compute architecture to be the best at just those capabilities: much wider math data paths, and much higher bandwidth to the caches and, of course, to memory, using HBM. We also jettisoned the overhead for 3D rendering. There’s no need for pixel processing if you’re just deploying in a supercomputer or an AI-training network. That freed up more area for high-bandwidth memory, for big math data paths, and for the capabilities that compute needs.

That was a lot of fun once we had that separate sandbox, if you will, where it’s just a compute-optimized design: let’s go and just kill it for that market space. And the same approaches to optimizing the switching, the clocking, the power management and everything else could of course be shared between gaming and compute. That’s been great. It’s a continual learning process. But as you can see, we’ve achieved great efficiency.
The other thing we rolled out at our financial analyst day, which we’re looking forward to delivering later this year, is RDNA 3. We’re not going to let our momentum in efficiency gains slow at all. We publicly committed to another 50% performance-per-watt improvement. That’s three generations of compounded efficiency gains, at 1.5X or more each. We’re not talking about all the details of how we’re going to do it, but one element is leveraging our chiplet expertise to unlock the full capabilities of the silicon we can purchase. It’s going to be fun as we get more of those details out.
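The compounding he describes is multiplicative rather than additive. Three successive 50% performance-per-watt gains multiply to 1.5 × 1.5 × 1.5 rather than adding up to 2.5X. The sketch assumes each generation lands at exactly 1.5X over the one before it, measured against the pre-RDNA baseline. The interview says "50% plus" for RDNA 2 and commits to "another 50%" for RDNA 3, so real figures may be higher.

```python
from itertools import accumulate
from operator import mul

# Assumed per-generation performance-per-watt multipliers, each relative to the previous generation.
GENERATIONS = [
    ("RDNA (Navi 10)", 1.5),
    ("RDNA 2 (Navi 21)", 1.5),
    ("RDNA 3 (committed)", 1.5),
]


def cumulative_gains(multipliers):
    """Running product of per-generation multipliers."""
    return list(accumulate(multipliers, mul))


if __name__ == "__main__":
    names = [name for name, _ in GENERATIONS]
    totals = cumulative_gains(m for _, m in GENERATIONS)
    for name, total in zip(names, totals):
        print(f"{name}: {total:.3f}x vs. the pre-RDNA baseline")
```

Three 50% steps therefore compound to about 3.4X, compared with the 2.5X that simply adding the percentages would suggest.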
VentureBeat: As for the concern that we’re running into walls, with Moore’s Law hitting its limits and other physical limitations looming, how concerned are you about that at this point?
Naffziger: I’m concerned in the sense that it forces new dimensions of innovation to get the efficiencies. The silicon technology is not going to do it for us. We’ve seen this coming for a long time. Like I said, lead times are long. We’ve been investing in things like the Infinity Cache, chiplet architecture and all these approaches that exploit new dimensions to keep the gains coming. So yes, it’s a big concern, but for those who prepare in advance and invest in the right technology, there’s still a lot of opportunity.

VentureBeat: Compared to Nvidia and Intel, do you feel like we’re in a state of divergence when it comes to designs, or some kind of convergence?
Naffziger: It’s hard to speculate. Nvidia really hasn’t jumped on the chiplet bandwagon yet. We have a big lead there, and we see big opportunities with it. They’ll be forced to do so; we’ll see when they deploy it. Intel certainly has jumped on it. Ponte Vecchio is the poster child for chiplet extremes. I would say that there’s more convergence than divergence. But the companies that innovate in the right space soonest gain an advantage. It’s about when you deliver the new technology as much as what the technology is. Whoever is first with innovation has the advantage.