The architecture of choice to facilitate further software advances in AI
John Hennessy and David Patterson gave their Turing Lecture, “A New Golden Age for Computer Architecture,” on June 4, 2018, as the recipients of the 2017 Turing Award, often described as the Nobel Prize of Computer Science. The three key insights of the lecture are:
- Software advances can inspire architecture innovation.
- Elevating the hardware/software interface creates opportunities for architecture innovation.
- The marketplace ultimately settles the architecture debates.
I want to complete the loop by amending the three key insights with a fourth one:
The winning architecture facilitates the subsequent software advances.
Since the Hennessy/Patterson lecture, the marketplace has arguably played out insight #3 in AI and settled on the Graphics Processing Unit (GPU) as the winning architecture that facilitated the AI revolution. In this article, I explore how the AI revolution is inspiring architecture innovations and reinventing the GPU. I hope to answer a significant question of my own:
Will the GPU star in a new golden age for computer architecture?
Domain-Specific Architecture
Hennessy and Patterson proposed the Domain-Specific Architecture (DSA) as the way to innovate computer architecture and strive toward a new golden age. As the name suggests, the GPU is a DSA for 3D Graphics: it aims to render photo-realistic images of the 3D virtual world. However, almost all AI researchers use the GPU to explore ideas beyond 3D Graphics, making breakthroughs in AI “software,” a.k.a. Neural Network architectures. While still indispensable in 3D, the GPU has become the “CPU” of AI because it facilitates software innovations in AI. GPU architects have been making the GPU’s computing resources available for non-3D uses in addition to 3D ones. We dub this design philosophy the General-Purpose GPU (GPGPU).
Nowadays, we see a proliferation of AI DSAs, rather than GPGPUs, attempting to displace the GPU with better performance. Even the GPU itself is torn between its dual personalities: AI DSA and 3D DSA. The reason is that an AI DSA requires accelerating tensor operations, which are abundant in AI but not in 3D, while 3D fixed-function hardware seems unnecessary for AI.
Thus, the primary architecture debate seems to ask:
- Will the GPU keep its throne as the “CPU” of AI?
- Will the GPU diverge into two DSAs, one for AI and the other for 3D?
My prediction is the following:
- The GPU hardware/software interface will keep the GPU the “CPU” for AI.
- AI-based Rendering will make tensor acceleration a mainstay in the GPU.
- Digital Twins, in which the virtual and the real worlds mirror each other, will preside over the marketplace, at last settling the architecture debate.
GPU Hardware/Software Interface
We can attribute the GPU’s dominance in 3D and runaway success in AI to its hardware/software interface, which GPU architects and 3D Graphics software architects endeavor to uphold. This interface is the key to resolving the following paradox: while the GPU community continues to make the GPU more general-purpose, the rest of the industry has switched to more specialized hardware to work around the demise of Moore’s Law.
Two-Tier Programmability
A GPU is conceptually a long linear pipeline of processing stages. Different types of work items are processed as they flow through the pipeline. In the early days, each processing stage was a fixed-function block, and the only control programmers had over the GPU was to tweak the parameters of each block. These days, the GPU hardware/software interface gives programmers the freedom to do as they please with each work item, be it a vertex or a pixel. Programmers never need to write the loop head of each vertex or pixel loop, because GPU architects implement it in fixed-function hardware. This architectural choice leaves programmers responsible only for the loop body, or “shader,” which is often named after the type of work item it processes, such as the “vertex shader” for vertices and the “pixel shader” for pixels.
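To make this two-tier split concrete, here is a minimal sketch in Python pseudocode rather than a real graphics API; the Pixel type, run_pixel_stage, and checkerboard_shader names are hypothetical illustrations. The hardware owns the loop head; the programmer supplies only the loop body.

```python
from dataclasses import dataclass

@dataclass
class Pixel:
    x: int
    y: int
    color: tuple = (0.0, 0.0, 0.0)

def run_pixel_stage(pixels, pixel_shader):
    """The fixed-function 'loop head': the hardware iterates over work items.
    Programmers cannot change this loop; they only supply its body."""
    for p in pixels:                  # loop head: fixed in hardware
        p.color = pixel_shader(p)     # loop body: the programmable shader
    return pixels

def checkerboard_shader(p):
    """The programmable 'loop body': decides the color of one pixel."""
    white = (p.x // 8 + p.y // 8) % 2 == 0
    return (1.0, 1.0, 1.0) if white else (0.0, 0.0, 0.0)

# The pipeline schedules the fixed loop; we only wrote the shader.
frame = run_pixel_stage([Pixel(x, y) for y in range(16) for x in range(16)],
                        checkerboard_shader)
```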
How do modern games produce stunning pictures with such a linear pipeline? In addition to controlling different types of shaders in one pass through the pipeline, programmers can progressively go through the pipeline multiple times to produce intermediate images that ultimately yield the final image seen on the screen. Programmers effectively create a computation graph, describing the relationships among the intermediate images. Each node in the graph represents one pass through the GPU pipeline.
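As a rough illustration of such a computation graph, here is a small Python sketch with hypothetical pass and image names, not any real engine’s API; each node is one pass through the GPU pipeline, and its inputs are intermediate images produced by earlier passes.

```python
# Each entry: pass name -> (list of input images, output image).
# A hypothetical frame graph; real engines build similar structures.
frame_graph = {
    "shadow_pass":   ([],                           "shadow_map"),
    "gbuffer_pass":  ([],                           "gbuffer"),
    "lighting_pass": (["gbuffer", "shadow_map"],    "lit_image"),
    "bloom_pass":    (["lit_image"],                "bloom_image"),
    "final_pass":    (["lit_image", "bloom_image"], "framebuffer"),
}

def schedule(graph):
    """Topologically order the passes so every input image exists before use."""
    done, order = set(), []
    while len(order) < len(graph):
        for name, (inputs, output) in graph.items():
            if name not in (n for n, _ in order) and all(i in done for i in inputs):
                order.append((name, output))
                done.add(output)
    return [n for n, _ in order]

print(schedule(frame_graph))
# ['shadow_pass', 'gbuffer_pass', 'lighting_pass', 'bloom_pass', 'final_pass']
```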
A Centralized Pool of General-Purpose Computing Resources
A centralized pool of general-purpose computing resources is shared among the processing stages and does the heavy lifting. The initial motivation for such a scheme was load balancing; a processing stage may have drastically varying workloads in different usage scenarios. The computing resources, referred to as the Shader Cores, have become more general-purpose to achieve flexibility and product differentiation.
GPU architects opportunistically made the centralized shader pool available to non-3D applications under the banner of GPGPU. This design scheme enabled the GPU to achieve breakthroughs in AI even though running AI tasks was only its part-time job.
Balanced Specialization
GPU architects regularly “accelerate” or “domain-specify” the shader pool by adding co-processing units without altering the hardware/software interface. The Texture unit is one such co-processing unit, with which texels in texture maps are fetched and filtered on their way to the shader pool. The Special Function Unit (SFU) is another, performing transcendental math functions such as logarithms, inverse square roots, etc. Although having multiple co-processing units sounds similar to the superscalar design of a CPU, there is one significant difference: GPU architects apportion the throughput of a co-processing unit according to how often an “average” shader program uses it. For example, we can give the Texture units one-eighth of the throughput of the shader pool, assuming Texture operations appear in benchmarks or games one-eighth of the time on average. When a co-processing unit is busy, the GPU switches to other tasks to keep itself occupied.
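The apportioning argument is easiest to see with a back-of-the-envelope calculation; the numbers below are made up purely for illustration.

```python
# Made-up numbers, purely to illustrate the balancing argument.
shader_ops_per_clock = 128          # ALU operations the shader pool retires per clock
texture_op_fraction = 1.0 / 8.0     # share of Texture ops in an "average" shader

# Texture throughput needed so that, on average, neither side waits on the other:
texture_ops_per_clock = shader_ops_per_clock * texture_op_fraction
print(texture_ops_per_clock)        # 16.0, i.e., 1/8 of the shader pool's throughput

# A shader that uses Textures more often than average will saturate the Texture
# unit; the GPU then switches to other ready tasks rather than sitting idle.
```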
Tensor Acceleration for 3D
In my introduction, I pointed out that the GPU has struggled to justify tensor acceleration for 3D. Let’s see how this trend might reverse when we change how a GPU renders a typical game frame. The GPU first generates and stores, for each pixel, all the information necessary to shade it in a G-buffer. From the G-buffer, we calculate how to light each pixel, followed by several processing steps, including:
- Remove jagged edges (anti-aliasing (AA))
- Upscale a low-resolution image to a higher one (super-resolution (SR))
- Add specific visual effects to the whole frame, such as Ambient Occlusion, Motion Blur, Bloom Filter, or Depth-of-Field.
We call this rendering scheme Deferred Shading since shading a pixel is “deferred” until every pixel gets the information it needs. We refer to the processing steps following lighting as Post-Processing. Today, Post-Processing consumes about 90% of the rendering time, meaning that a GPU spends its screen time predominantly in 2D instead of 3D!
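The following Python sketch summarizes this pass structure; the pass functions are hypothetical stubs, not a real renderer. The point is that everything after the lighting step is an image-to-image (2D) operation.

```python
import numpy as np

# Hypothetical stubs standing in for the real passes; shapes are illustrative.
def rasterize_gbuffer(scene):     return {"albedo": np.ones((270, 480, 3)), "depth": np.ones((270, 480))}
def light(gb):                    return gb["albedo"] * 0.8
def anti_alias(img):              return img                                       # AA
def super_resolve(img, scale=4):  return np.kron(img, np.ones((scale, scale, 1)))  # SR
def ambient_occlusion(img, gb):   return img * 0.95
def depth_of_field(img, gb):      return img

def render_frame(scene):
    """Deferred Shading, sketched: one geometry pass, then 2D image passes."""
    gbuffer = rasterize_gbuffer(scene)        # per-pixel albedo, depth, normals, ...
    image = light(gbuffer)                    # lighting computed from the G-buffer
    # Post-Processing: every step below is an image-to-image (2D) operation,
    # which is what makes it a natural fit for AI-based, tensor-heavy methods.
    image = anti_alias(image)
    image = super_resolve(image)              # low resolution -> display resolution
    image = ambient_occlusion(image, gbuffer)
    image = depth_of_field(image, gbuffer)
    return image

print(render_frame(scene=None).shape)         # (1080, 1920, 3)
```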
NVIDIA has demonstrated AI-based DLSS 2.0 for AA and SR, which it claims produces better-looking images than those rendered natively without DLSS 2.0. In addition, NVIDIA offers AI-based Monte-Carlo denoising for Ray Tracing, with which we can use far fewer rays to achieve a quality that is otherwise only possible with many more rays. Moreover, AI inspires a new class of solutions to many other types of Post-Processing, such as NNAO for Ambient Occlusion and DeepLens for Depth-of-Field.
If AI-based Post-Processing becomes mainstream, tensor acceleration will become a mainstay in the 3D side of the GPU’s personality. The GPU’s divergence into 3D DSA and AI DSA will become less likely.
3D/AI Convergence
To settle the architecture debate, we need to address the last piece of the puzzle: should we eventually remove the 3D fixed-function hardware, especially if the GPU serves AI? Note that through GPGPU, the GPU can already do 3D rendering as pure “software” without using any fixed-function hardware.
In a strict sense, given the scene parameters, 3D rendering simulates how photons are transported from light sources through space to interact with objects in the 3D virtual world. Conventional 3D rendering by the GPU is a very crude approximation of this process. Thus, Microsoft said that “[Conventional rasterization-based] 3D Graphics is a lie” in an announcement to promote Ray Tracing as “the full 3D effects of tomorrow.” However, a 3D rendering purist might still dismiss Ray Tracing, in which we achieve 3D rendering by tracing the rays of light backward from the pixels into the 3D virtual world, as also not truthful.
Both approaches are approximations to simulation-based 3D rendering, and in either case we decouple the modeling of the 3D virtual world, or content creation, from rendering. On the modeling side, describing every object and the physical properties governing how it interacts with light demands a vast amount of laborious and creative work from engineers and artists. On the rendering side, total truthfulness is impossible since we must drastically simplify 3D rendering to meet different performance targets within resource budgets.
In contrast to finding a solution with the best-known scientific knowledge and mathematical theories for a given problem, the AI approach is about “learning” a computational model, or a Neural Network, from data. We adjust the network parameters iteratively by trial and error: we run the network forward with the current parameter estimates and measure the mismatch, or “loss.” We then adjust the parameters to reduce the loss according to its gradient, effectively navigating the loss landscape in the direction opposite to the gradient. This mechanism, referred to as backpropagation, requires all computations along the forward path to be differentiable so that they can participate in calculating the gradient.
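Here is a minimal sketch of that training loop, assuming a deliberately tiny one-parameter model and an analytic gradient in place of full backpropagation through a deep network; all names and numbers are illustrative.

```python
import numpy as np

# Toy data: observations generated by y = 3x plus noise; we "learn" w from data.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=256)
y = 3.0 * x + 0.05 * rng.normal(size=256)

w, lr = 0.0, 0.5                           # initial parameter estimate, learning rate
for step in range(100):
    y_hat = w * x                          # forward pass with the current estimate
    loss = np.mean((y_hat - y) ** 2)       # mismatch, or "loss"
    grad = np.mean(2.0 * (y_hat - y) * x)  # dloss/dw; backprop would derive this
    w -= lr * grad                         # move against the gradient

print(round(w, 3))                         # close to 3.0
```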
Neural Rendering is an emerging AI research field that studies 3D rendering using the approach described above. Below is my mindmap to keep track of progress in Neural Rendering:
[Mindmap: progress in Neural Rendering]
In Neural Rendering, the model of the 3D virtual world is represented implicitly as Neural Network parameters (see NeRF, GRAF, GIRAFFE), which we infer by comparing real-world images with the ones we render from the virtual world. We then backpropagate the gradient of the comparison to adjust the Neural Network parameters. Optionally, we can learn explicit 3D meshes from data (see Deep Marching Cube, GAN2Shape). Effectively, modeling the 3D virtual world is the same thing as learning the Neural Network parameters. This process requires us to include a 3D rendering pipeline in the forward path and to integrate modeling and rendering of the 3D virtual world in tight loops. Through iterations of rendering and testing against real-world images, we obtain the desired models and scene parameters that we can use to render new views of the virtual world.
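To make this analysis-by-synthesis loop tangible, here is a toy sketch that stands in for the real thing: the “renderer” is a single 1D Gaussian blob with two scene parameters (location and width), and numerical gradients replace backpropagation through a differentiable rendering pipeline. Systems such as NeRF do the same thing with a neural scene representation, a real renderer, and true gradients.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 128)              # pixel coordinates of a 1D "image"

def render(params):
    """Toy 'renderer': a bright blob described by two scene parameters."""
    center, width = params
    return np.exp(-((xs - center) ** 2) / (2.0 * width ** 2))

def loss(params, target):
    return np.mean((render(params) - target) ** 2)   # rendered vs. real-world image

target = render(np.array([0.73, 0.12]))      # the observed "real-world" image
params = np.array([0.30, 0.30])              # initial guess of the scene model
lr, eps = 0.05, 1e-5

for step in range(500):
    # Numerical gradients stand in for backpropagation through the renderer.
    base = loss(params, target)
    grad = np.array([(loss(params + eps * np.eye(2)[i], target) - base) / eps
                     for i in range(2)])
    params -= lr * grad                      # adjust the scene parameters

print(np.round(params, 2))                   # approaches [0.73, 0.12]
```

After the loop, the learned parameters let us “render” the blob again at will, the toy analogue of rendering new views of the virtual world.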
Within this framework, we can choose to adjust only some of the parameters, for example, keeping the shape of an object intact while estimating its location (see iNeRF). This way, we effectively try to recognize and locate the object in question instead of modeling it. There is no longer a difference between modeling and recognition tasks; it is only a matter of which scene parameters we want to “learn” or “estimate.”
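Continuing the toy sketch above in the spirit of iNeRF: freeze the learned shape (the width) and estimate only the location of a newly observed blob, turning the same loop into a recognition/localization task. The snippet reuses render and loss from the previous sketch.

```python
known_width = 0.12
observation = render(np.array([0.41, known_width]))   # a new "real-world" image

def pose_loss(c):                                      # shape frozen, location free
    return loss(np.array([c, known_width]), observation)

location, lr, eps = 0.70, 0.05, 1e-5
for step in range(300):
    grad = (pose_loss(location + eps) - pose_loss(location)) / eps
    location -= lr * grad                              # adjust only the location

print(round(location, 2))                              # approaches 0.41: object located
```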
Conclusion
Thus, under the AI problem-solving paradigm, 3D rendering is not only about producing photo-realistic images of the 3D virtual world but also a means of building the virtual world from the real one. Furthermore, the new framework redefines 3D and AI in the following ways:
- 3D rendering becomes an essential operation in the training loop of AI.
- Training, or “gradient descent,” which used to happen only when training Neural Networks in the cloud, is now also part of inference.
- Photo-realism is as much about looking great as maintaining the correspondence between the real and the virtual worlds.
Digital Twins will demand bringing the massive and ever-changing real world into its under-developed virtual twin and constantly maintaining the correspondence between the twins. Virtual objects acquired through Neural Rendering will need to co-exist with classically built ones. Hence, I believe Neural Rendering and conventional rendering will converge on the GPU, leveraging its mature and performant 3D pipeline, and the demands of Digital Twins will fall on the shoulders of future GPUs. Work remains to make the GPU’s 3D pipeline “differentiable” so that it can participate in the gradient calculation of the AI training loop.
If the GPU becomes natively differentiable and tensor-accelerated in response to the AI advances in 3D, I foresee the dual personalities of the GPU becoming one.
Then, the GPU will maintain its position as the architecture of choice to facilitate further software advances in AI and at last star in a new golden age for computer architecture.
Will The GPU Star in A New Golden Age of Computer Architecture? was originally published in Towards Data Science on Medium.