How do computers render graphics on the screen?
When you play a game, have you ever wondered how all those images and characters get to where they are? How does a 2D screen display something 3D? Or, in the jargon, how does it all get rendered on the screen?
This article attempts to answer these questions by walking through the OpenGL graphics pipeline. The pipeline is broadly similar in other graphics APIs such as Vulkan, DirectX, and Metal.
For those who don’t know about OpenGL, here is a quick overview.
OpenGL is a graphics specification maintained by the Khronos Group. The specification defines an API for interacting with the GPU (Graphics Processing Unit).
It only states which functions must exist and what those functions should do; the implementation is left to GPU vendors such as Nvidia and AMD, usually as part of their drivers. OpenGL itself is therefore not a piece of code at all. The specification is an open, publicly available standard (hence the “Open” in the name), while most vendor implementations are proprietary, although open-source implementations such as Mesa also exist.
So, how does the stuff get rendered?
As with anything related to computers, everything starts with data.
In the case of graphics programming, the data is in the form of Vertex Data.
In graphics, a vertex is more than just a coordinate of a shape: it is any data that describes a point of the shape, image, or texture to be rendered, such as its position, color, or texture coordinates. In other words, it is a collection of data per 3D coordinate.
A texture is an image that is typically mapped onto the surface of a shape when it is rendered (though textures can store other kinds of data as well).
Vertex data is stored in buffers called Vertex Buffer Objects (VBOs), which are copied from CPU memory to GPU memory (for example with glBufferData). Vertex attributes tell OpenGL how to interpret the bytes in those buffers (for example via glVertexAttribPointer) before the data travels down the pipeline.
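To make the idea of attributes inside a buffer concrete, here is a small Python sketch (not real OpenGL calls) that packs three vertices, each with a position and a color, into one interleaved byte buffer, the way you might lay out data before uploading it as a VBO. The helper read_attribute is hypothetical: it recovers one attribute from a size, a stride, and a byte offset, which are the same three quantities you describe to OpenGL with glVertexAttribPointer.

```python
import struct

FLOAT_SIZE = 4  # bytes per 32-bit float (what OpenGL calls GL_FLOAT)

# Three vertices, interleaved: x, y, z position followed by r, g, b color.
vertices = [
    -0.5, -0.5, 0.0,   1.0, 0.0, 0.0,
     0.5, -0.5, 0.0,   0.0, 1.0, 0.0,
     0.0,  0.5, 0.0,   0.0, 0.0, 1.0,
]

# Pack the floats into raw bytes (native byte order), like the CPU-side
# memory you would hand to glBufferData.
buffer = struct.pack(f"={len(vertices)}f", *vertices)

STRIDE = 6 * FLOAT_SIZE          # bytes from the start of one vertex to the next
POSITION_OFFSET = 0              # position starts at byte 0 of each vertex
COLOR_OFFSET = 3 * FLOAT_SIZE    # color starts after the three position floats


def read_attribute(buf, vertex_index, size, stride, offset):
    """Hypothetical helper: read `size` floats of one attribute of one vertex."""
    start = vertex_index * stride + offset
    return struct.unpack_from(f"={size}f", buf, start)


print(read_attribute(buffer, 1, 3, STRIDE, POSITION_OFFSET))  # (0.5, -0.5, 0.0)
print(read_attribute(buffer, 2, 3, STRIDE, COLOR_OFFSET))     # (0.0, 0.0, 1.0)
```

Interleaving keeps all the data of one vertex next to each other in memory; storing positions and colors in separate buffers would also work, with a different stride and offset.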
The first stage of the pipeline processes each vertex on its own, independently of the others (in practice, the GPU handles many vertices in parallel).
In brief, the pipeline takes in 3D coordinates (OpenGL is a 3D graphics specification) and transforms them so that they can be displayed on a 2D screen.
In fact, this is what computer graphics is all about: converting a 3D scene description into a 2D image.
How does it do that?
Early GPUs implemented a fixed-function pipeline, but today we have much more flexibility: several stages of the pipeline are programmable, and the remaining fixed-function stages can still be configured.
The vertex data that the user sent to the GPU is first processed by a shader, specifically the vertex shader. Shaders are small programs that run on the GPU, much like a C program runs on the CPU.
OpenGL shaders are written in GLSL (OpenGL Shading Language), a C-like language.
Vertex Shader: the purpose of the vertex shader is to transform each vertex’s 3D coordinates into a coordinate system that OpenGL understands, known as clip space (clip-space positions have a fourth component, w). After the perspective divide, which divides x, y, and z by w, these become normalized device coordinates (NDC), in which everything visible lies between -1 and 1 on every axis. These coordinates are then used to assemble shapes.
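The sketch below does on the CPU what a typical vertex shader and the fixed stages right after it do to a single position. It assumes the model and view matrices are the identity (the camera sits at the origin looking down the negative z axis), so the only matrix is a perspective projection built with the classic gluPerspective formula. The function names are illustrative, not part of any OpenGL API.

```python
import math


def perspective(fovy_deg, aspect, near, far):
    """4x4 perspective projection matrix (row-major), as built by gluPerspective."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def vertex_shader(position, mvp):
    """Return clip-space coordinates, the value a GLSL shader writes to gl_Position."""
    v = (*position, 1.0)  # homogeneous coordinates: w = 1 for a point
    return tuple(sum(mvp[row][col] * v[col] for col in range(4)) for row in range(4))


def perspective_divide(clip):
    """Clip space -> normalized device coordinates (visible range is [-1, 1])."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)


def viewport_transform(ndc, vp_x, vp_y, vp_width, vp_height):
    """NDC -> window (pixel) coordinates, as configured by glViewport."""
    x, y, _ = ndc
    return ((x + 1.0) * vp_width / 2.0 + vp_x, (y + 1.0) * vp_height / 2.0 + vp_y)


proj = perspective(90.0, 800 / 600, 1.0, 10.0)
ndc = perspective_divide(vertex_shader((0.0, 0.0, -1.0), proj))
print(viewport_transform(ndc, 0, 0, 800, 600))  # (400.0, 300.0): the screen center
```

Points on the near plane end up with an NDC depth of -1 and points on the far plane with +1; dividing by w is what makes distant objects appear smaller.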
This brings us on to the next checkpoint in the pipeline:
Shape or Primitive Assembly: OpenGL requires us to specify what kind of shape to form from the data, such as points, lines, or triangles. This is important because many different shapes can be drawn from a given set of coordinates (vertices). These predefined shapes are called primitives (for example GL_POINTS, GL_LINES, and GL_TRIANGLES; older versions of OpenGL also had quads, which were removed from the core profile).
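To see why the primitive type matters, the hypothetical function below groups the same four vertex indices into primitives according to a mode. The mode strings are named after the real OpenGL constants (which you pass to draw calls such as glDrawArrays), but this is a simplified Python sketch, not how a driver actually assembles primitives.

```python
def assemble(indices, mode):
    """Group vertex indices into primitives the way the named draw mode would."""
    if mode == "GL_POINTS":
        return [(i,) for i in indices]
    if mode == "GL_LINES":  # every pair of vertices is a separate line
        return [tuple(indices[i:i + 2]) for i in range(0, len(indices) - 1, 2)]
    if mode == "GL_TRIANGLES":  # every three vertices form a separate triangle
        return [tuple(indices[i:i + 3]) for i in range(0, len(indices) - 2, 3)]
    if mode == "GL_TRIANGLE_STRIP":  # each new vertex forms a triangle with the previous two
        triangles = []
        for i in range(len(indices) - 2):
            a, b, c = indices[i], indices[i + 1], indices[i + 2]
            # Swap the first two vertices of every odd triangle so that all
            # triangles in the strip keep the same winding order.
            triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
        return triangles
    raise ValueError(f"unsupported mode: {mode}")


quad = [0, 1, 2, 3]
for mode in ("GL_POINTS", "GL_LINES", "GL_TRIANGLES", "GL_TRIANGLE_STRIP"):
    print(mode, assemble(quad, mode))
```

With four vertices, GL_TRIANGLES yields a single triangle and ignores the leftover vertex (OpenGL likewise drops incomplete primitives), while GL_TRIANGLE_STRIP yields two triangles that share an edge.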
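In most cases, triangles are used as primitives because a triangle always defines a plane. In simpler words, its three vertices always lie in the same plane (and, as long as they are not collinear, that plane is unique), which is not guaranteed for four or more vertices.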
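The planarity argument can be checked numerically. In this small sketch (plain Python tuples, no graphics library, helper names made up for the example), the cross product of two edges of a triangle gives the normal of its plane, while a fourth vertex lies in that plane only if its offset from the first vertex is perpendicular to the normal.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangle_normal(a, b, c):
    """Normal of the plane through a, b, c (the zero vector if they are collinear)."""
    return cross(sub(b, a), sub(c, a))


def is_coplanar(a, b, c, d, eps=1e-9):
    """True if d lies in the plane of triangle (a, b, c); assumes a, b, c are not collinear."""
    return abs(dot(triangle_normal(a, b, c), sub(d, a))) <= eps


flat = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
bent = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0.5)]
print(is_coplanar(*flat), is_coplanar(*bent))  # True False
```

Lifting one corner of a four-vertex shape makes it non-planar, whereas any three of those vertices still form a perfectly flat triangle.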
Hence, you can say we enter the world of triangles when we play video games.
The assembled shape is then passed on to the next checkpoint on the way to the screen:
Geometry Shader: This optional stage takes as input the set of vertices that form a single primitive, e.g., a point or a triangle. The geometry shader can then transform these vertices as it sees fit.
For example, it can discard a primitive entirely or emit new ones. Note that splitting a primitive that crosses the edge of the screen into smaller primitives, keeping the parts inside the bounds and discarding the rest, is not done here but by the fixed-function clipping stage.
The “exploding” effect, in which each triangle of a model is pushed outward along its normal, is also obtained by programming this shader.
The geometry shader is interesting because it can convert the original primitive (set of vertices) to completely different primitives, possibly generating more vertices than were initially given.
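Here is a rough CPU-side imitation of two geometry-shader tricks, written as plain Python functions rather than GLSL: explode moves a triangle along its normal, and point_to_quad turns one input point into two triangles (one vertex in, six out), the kind of amplification a real geometry shader performs with EmitVertex() and EndPrimitive(). Both function names are made up for this example.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(v, s):
    return tuple(x * s for x in v)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


def explode(triangle, amount):
    """Push every vertex of a triangle outward along the triangle's normal."""
    a, b, c = triangle
    n = normalize(cross(sub(b, a), sub(c, a)))
    return tuple(add(v, scale(n, amount)) for v in triangle)


def point_to_quad(center, half_size):
    """Emit two counter-clockwise triangles forming a square around one point."""
    x, y, z = center
    h = half_size
    bottom_left, bottom_right = (x - h, y - h, z), (x + h, y - h, z)
    top_left, top_right = (x - h, y + h, z), (x + h, y + h, z)
    return [(bottom_left, bottom_right, top_left), (top_left, bottom_right, top_right)]


print(explode(((0, 0, 0), (1, 0, 0), (0, 1, 0)), 0.5))
# ((0.0, 0.0, 0.5), (1.0, 0.0, 0.5), (0.0, 1.0, 0.5))
print(len(point_to_quad((0.0, 0.0, 0.0), 0.1)))  # 2 triangles from 1 point
```

Applying explode to every triangle of a model, with an amount that grows over time, gives the exploding animation; point_to_quad is the usual way particles are expanded from single points into visible sprites.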
With the shapes to be rendered finalized, the vertex data is passed on to the rasterizer.
Rasterizer: This stage maps primitives to pixels. For each primitive it determines which pixels of the screen it covers and generates a fragment for each of them, interpolating the vertex attributes (such as colors or texture coordinates) across the primitive. These fragments eventually make up the raster image: a rectangular grid of pixels (a dot matrix) which, displayed together, forms the picture. Several algorithms can be used for this step, for instance the Digital Differential Analyzer (DDA) for line rasterization, or edge-function tests for triangles.
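The two functions below sketch the rasterizer’s core job in Python. dda_line is the Digital Differential Analyzer, assuming integer endpoints; rasterize_triangle uses edge functions and tests each pixel’s center against all three edges of the triangle. Real GPUs also apply a tie-breaking rule so that a pixel center lying exactly on an edge shared by two triangles is drawn only once; this sketch skips that.

```python
import math


def dda_line(x0, y0, x1, y1):
    """Digital Differential Analyzer: step along the longer axis one pixel at a time."""
    dx, dy = x1 - x0, y1 - y0
    steps = max(abs(dx), abs(dy))
    if steps == 0:
        return [(x0, y0)]
    x_inc, y_inc = dx / steps, dy / steps  # at most 1 pixel per step on each axis
    x, y = float(x0), float(y0)
    pixels = []
    for _ in range(steps + 1):
        pixels.append((math.floor(x + 0.5), math.floor(y + 0.5)))  # round to nearest pixel
        x += x_inc
        y += y_inc
    return pixels


def edge(a, b, p):
    """Signed area test (y up): > 0 if p is left of edge a->b, < 0 if right, 0 if on it."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize_triangle(v0, v1, v2, width, height):
    """Return the (x, y) pixels whose centers lie inside the triangle."""
    covered = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
            # Inside if p is on the same side of all three edges (either winding).
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                covered.append((x, y))
    return covered


print(dda_line(0, 0, 4, 2))  # [(0, 0), (1, 1), (2, 1), (3, 2), (4, 2)]
print(len(rasterize_triangle((0, 0), (4, 0), (0, 4), 4, 4)))  # 10
```

Each covered pixel would become one fragment; the same edge values, divided by the triangle’s area, are the barycentric weights used to interpolate vertex attributes.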
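Parts of a primitive may lie outside the visible area, though, and we definitely don’t want to waste work rendering parts that the user won’t see.
Hence, those parts are removed in the Clipping stage of the pipeline.
Clipping: It discards everything outside the view. Strictly speaking, OpenGL clips primitives before they are rasterized: a primitive that lies partly outside the view volume is split into smaller primitives, the parts inside are kept, and the rest are discarded, so no fragments are generated for off-screen regions. A fragment is the data required to render a single pixel.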
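Clipping a polygon against the edges of the view can be sketched with the classic Sutherland-Hodgman algorithm. This is a simplified 2D Python version that clips against the NDC square from -1 to 1; actual OpenGL clipping works in 4D clip space against -w ≤ x, y, z ≤ w, and hardware may use other methods, so treat it as an illustration of the idea rather than what a driver does.

```python
def clip_against_plane(polygon, axis, sign):
    """Keep the part of a polygon where sign * coordinate <= 1 (one edge of the NDC square)."""
    def inside(p):
        return sign * p[axis] <= 1.0

    def intersect(p, q):
        # Parameter t where segment p->q crosses the boundary sign * coord == 1.
        t = (1.0 - sign * p[axis]) / (sign * q[axis] - sign * p[axis])
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    output = []
    for i, current in enumerate(polygon):
        previous = polygon[i - 1]  # for i == 0 this wraps around to the last vertex
        if inside(current):
            if not inside(previous):
                output.append(intersect(previous, current))  # entering: add crossing point
            output.append(current)
        elif inside(previous):
            output.append(intersect(previous, current))  # leaving: add crossing point only
    return output


def clip_polygon(polygon):
    """Sutherland-Hodgman: clip against x <= 1, x >= -1, y <= 1, y >= -1 in turn."""
    for axis, sign in ((0, 1.0), (0, -1.0), (1, 1.0), (1, -1.0)):
        if not polygon:
            break  # entirely outside: nothing left to draw
        polygon = clip_against_plane(polygon, axis, sign)
    return polygon


print(clip_polygon([(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]))  # corners (0, 1), (0, 0), (1, 0), (1, 1)
print(clip_polygon([(2.0, 2.0), (3.0, 2.0), (2.0, 3.0)]))  # [] (entirely off screen)
```

The first triangle is partly visible, so only the square part inside the view survives (the GPU would then split such a polygon back into triangles); the second is discarded without producing any fragments.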
The fragments that survive are then handed to the fragment shader, which is responsible for all the vibrant colors we see on the screen.
Fragment Shader: It processes each fragment to calculate the final color of its pixel. The color can be specified directly using the usual RGBA scheme, or sampled from an image or texture.
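The Python sketch below imitates what the rasterizer hands the fragment shader and what a very simple fragment shader does with it. barycentric computes the weights used to interpolate per-vertex values at a pixel (ignoring the perspective correction that OpenGL applies by default), sample_nearest mimics a nearest-neighbor texture lookup with clamping at the edges, and fragment_shader multiplies the texel by a tint color. All of these names are hypothetical.

```python
def edge(a, b, p):
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def barycentric(a, b, c, p):
    """Weights (w_a, w_b, w_c) of point p inside triangle abc; they sum to 1."""
    area = edge(a, b, c)
    return (edge(b, c, p) / area, edge(c, a, p) / area, edge(a, b, p) / area)


def interpolate(weights, values):
    """Blend per-vertex attribute tuples using barycentric weights."""
    return tuple(sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0])))


def sample_nearest(texture, u, v):
    """Nearest-neighbor lookup; texture[row][col] holds RGBA tuples, out-of-range u, v are clamped."""
    height, width = len(texture), len(texture[0])
    col = min(max(int(u * width), 0), width - 1)
    row = min(max(int(v * height), 0), height - 1)
    return texture[row][col]


def fragment_shader(uv, tint, texture):
    """Final RGBA color: the texel color multiplied component-wise by a tint."""
    texel = sample_nearest(texture, *uv)
    return tuple(t * c for t, c in zip(texel, tint))


WHITE, BLACK = (1.0, 1.0, 1.0, 1.0), (0.0, 0.0, 0.0, 1.0)
checker = [[WHITE, BLACK],
           [BLACK, WHITE]]

weights = barycentric((0, 0), (3, 0), (0, 3), (1, 1))  # the triangle's centroid
print(interpolate(weights, [(1, 0, 0), (0, 1, 0), (0, 0, 1)]))  # about (0.333, 0.333, 0.333)
print(fragment_shader((0.25, 0.25), (1.0, 0.5, 0.5, 1.0), checker))  # (1.0, 0.5, 0.5, 1.0)
```

At the centroid, a red, a green, and a blue vertex contribute equally, which is why a triangle with differently colored corners shows a smooth gradient.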
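After this, we are pretty much done, apart from one last, optional stage:
Alpha Testing and Blending: this stage checks each fragment’s depth (and stencil) value to decide whether it lies in front of or behind what has already been drawn, discarding it if it is hidden, and uses its alpha (opacity) value to blend it with the color already on screen. These tests are disabled by default and are switched on with calls such as glEnable(GL_DEPTH_TEST) and glEnable(GL_BLEND). Describing them in detail is beyond the scope of this article.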
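As a rough illustration, this Python sketch applies the two most common per-fragment operations to a one-pixel framebuffer: a depth test that keeps a fragment only if it is closer than what is stored (the comparison OpenGL uses by default, GL_LESS), and alpha blending with the widely used factors GL_SRC_ALPHA and GL_ONE_MINUS_SRC_ALPHA, i.e. result = src × alpha + dst × (1 − alpha). The write_fragment helper and the list-of-lists buffers are assumptions made for the example.

```python
def depth_test_less(fragment_depth, stored_depth):
    """Pass if the new fragment is closer than what is already stored (GL_LESS)."""
    return fragment_depth < stored_depth


def blend(src, dst):
    """src * src_alpha + dst * (1 - src_alpha), applied to all four RGBA channels."""
    alpha = src[3]
    return tuple(s * alpha + d * (1.0 - alpha) for s, d in zip(src, dst))


def write_fragment(color_buffer, depth_buffer, x, y, color, depth):
    """Hypothetical per-fragment stage: depth test, then blend into the color buffer."""
    if not depth_test_less(depth, depth_buffer[y][x]):
        return False  # hidden behind something already drawn: discard
    depth_buffer[y][x] = depth
    color_buffer[y][x] = blend(color, color_buffer[y][x])
    return True


color_buffer = [[(0.0, 0.0, 1.0, 1.0)]]  # one opaque blue pixel
depth_buffer = [[1.0]]                    # 1.0 is the default depth clear value

print(write_fragment(color_buffer, depth_buffer, 0, 0, (1.0, 0.0, 0.0, 0.5), 0.5))  # True
print(color_buffer[0][0])  # (0.5, 0.0, 0.5, 0.75): half-transparent red over blue
print(write_fragment(color_buffer, depth_buffer, 0, 0, (0.0, 1.0, 0.0, 1.0), 0.7))  # False
```

The green fragment is rejected because its depth (0.7) is farther than the stored 0.5, so it never reaches the blending step.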
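One important thing to note is that modern (core-profile) OpenGL provides no default vertex or fragment shader, so in practice we must write at least these two ourselves. Other programmable stages, such as the geometry shader, are optional, and the rest of the pipeline is fixed-function but configurable through API calls.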
And with that, finally, you get the image rendered to the screen.