An introduction to how modern graphics work in video games.
Graphics programming is often the story of balancing a dozen different restrictions to find the sweet spot. We saw last time that each time the scene is drawn, we build it up out of a lot of smaller building blocks - people, cars, traffic signs, buildings, everything that is on-screen is composed out of individual components that need to be drawn. There are several fine balancing acts going on here that we'll dive into.
The CPU and the graphics card have to work together to draw a frame. The CPU is where all of the rest of the game is running, so it is involved in deciding what things to draw for this frame, where the camera is looking, what animations are playing. The graphics card is the workhorse that does all of the mind-bogglingly complicated work that goes into drawing the pixels, and that's the big reason why the graphics card is so specialised for what it does.
It turns out then that both the CPU and the graphics card have limitations on how fast or how much they can process - but they care about different types of limitations.
Generally the CPU will care most about its own kind of work; how many objects are you drawing total? How different are those objects - is it 100 identical lamp posts, or 100 bushes, plants, trees? Are those objects animated and dynamically moving about, or are many of them static and unmoving?
The first thing we do to squeeze out as much as we can is to only draw the things that you can see on-screen. This seems obvious, but it does require a good deal of careful work. Remember that we are building every frame from scratch, so every time we might need to look at every object and determine if it is visible or not. What this means then is that in every game there is an empty black void following you around, and whenever you aren't looking, objects and people will vanish from sight.
In this animation we rotate the camera around on a pivot, showing the empty void behind you. Careful viewers will note that it's not quite empty...
Getting a rule of thumb for this is difficult, but in broad strokes you can draw about 1000 objects in a scene without worrying and with room to spare - but if you're rendering 5000 objects you'd better have some tricks up your sleeve. Remember that in most games where the player can control a camera, you never know which angle they're looking from, so you have to allow some wiggle room.
It turns out there are a lot of tricks to getting the most out of this limitation, and especially in a game like Watch Dogs this is a primary concern. The closer you can get to the limits and the more visual fidelity you can get out of the same number of objects, the better the game will look.
Even if you look in front of you in the scene, that doesn't mean that you have to draw the entire rest of the city from here to the countryside. If there's a giant building to your right or to your left, everything behind that is blocked from sight - so let's just skip drawing it. Likewise some objects in the far distance get really small, so we don't need to worry about drawing tiny plants and shrubs if they're far away.
This animation shows a tour down the street where we were originally looking, showing everything that's missing from side streets and the lack of detail in the distance.
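The "too small to bother with" rule can be sketched as a check on roughly how many pixels tall an object would end up on screen. This is only a minimal illustration and not how Watch Dogs does it: the function names, the bounding-sphere approximation, the 60 degree field of view and the 4 pixel cut-off are all assumptions made up for the example.

```python
import math

def projected_height_px(radius, distance, fov_y_deg, screen_height_px):
    """Approximate on-screen height, in pixels, of an object's bounding sphere."""
    if distance <= radius:
        # The camera is inside or touching the sphere, so treat it as filling the screen.
        return float(screen_height_px)
    # Height of the slice of the world that is visible at this distance.
    view_height = 2.0 * distance * math.tan(math.radians(fov_y_deg) / 2.0)
    return (2.0 * radius / view_height) * screen_height_px

def worth_drawing(radius, distance, fov_y_deg=60.0, screen_height_px=1080, min_px=4.0):
    """Hypothetical rule: skip anything that would cover fewer than min_px pixels."""
    return projected_height_px(radius, distance, fov_y_deg, screen_height_px) >= min_px
```

With these made-up numbers a shrub with a half-metre radius is drawn at 10 metres but skipped at 500 metres, while a building with a 30 metre radius is still drawn at 500 metres. Objects hidden behind a building need a separate and much more involved test that isn't shown here.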
In fact there are still many objects being drawn in this scene that don't end up being visible. This is an area where there's still a lot of work going on and different complicated techniques. It's always a trade-off, but if you can spend a little bit of time - or come up with a very clever method - and avoid drawing 100 objects without much work, you can make the scene that much more complicated or that much denser.
There's also another subtle issue in here. In some cases, we'll draw a building that's only barely on the screen, and extends quite far off the screen. This is quite wasteful. We could always split these objects into multiple parts - then each part could either be drawn or skipped and we'd have less waste overall. Except now we've increased the number of objects we're drawing in total when they are on screen and caused a problem for ourselves the other way!
This particular case is one of hundreds, but it's an easy one to explain and it gives a window into the kind of decisions and experimentation that needs to happen from game to game to find the sweet spot in the middle.
Since this captured frame of Watch Dogs takes place in the city, we can take an aerial view to get a rough idea of what's currently being drawn. You can see especially in the static picture that Watch Dogs probably does some amount of work on a block-by-block basis.
Here we zoom out from above to show the visible area in front of the camera (forgive the jumpcut to save time).
Overlaid on the zoomed-out image, here's roughly what section is visible to the camera. How narrow or wide this triangle is depends on the Field of View - sometimes an option in games, sometimes a fixed value.
Here is a wireframe view of the scene, with the visible area from the camera outlined in white.
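The top-down visible area can be sketched as a simple angle check. This is a simplified 2D illustration under my own assumptions: real games test a 3D bounding volume for each object against the camera's view volume, and the function name, the maximum draw distance and the flat (x, y) map coordinates here are all invented for the example. Strictly speaking it tests a wedge rather than a triangle, since it cuts off at a fixed distance.

```python
import math

def in_view_wedge(cam_pos, cam_dir_deg, fov_deg, max_dist, point):
    """Return True if a 2D point lies inside the camera's top-down view wedge."""
    dx = point[0] - cam_pos[0]
    dy = point[1] - cam_pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return True   # the point is exactly at the camera
    if dist > max_dist:
        return False  # too far away to draw
    angle_to_point = math.degrees(math.atan2(dy, dx))
    # Smallest signed difference between the two angles, in [-180, 180).
    diff = (angle_to_point - cam_dir_deg + 180.0) % 360.0 - 180.0
    # Inside the wedge if within half the field of view on either side.
    return abs(diff) <= fov_deg / 2.0
```

A wider field of view makes the wedge wider, so more of the map passes the test and more objects need drawing.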
By now someone may have thought of a neat little cheat to get around this annoying limitation of only being able to draw a limited number of objects - what if you made objects a really complicated combination of everything in a small area, down to scattered leaves? That way drawing 1000 objects would be more than enough.
Here we run smack into a whole new set of limitations - graphics cards only have a limited amount of power, and the more complicated an object is, the longer it will take to draw. That means that even one object could mean the game runs at 20 FPS if it was complicated enough. That's part of the reason why I said that the object budget was a bit fuzzy.
How time-consuming an object is depends on how complicated and detailed the model and textures are, as well as how sophisticated the lighting and shading are. This is part of the reason why often games that attempt more complex and sophisticated graphics techniques will trend towards less complicated and detailed scenes - the balancing see-saw tips one way or another, so you can give yourself more breathing room by sacrificing where it's not as important to your game.
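To put a number on that: 20 FPS means each frame takes 50 milliseconds. The arithmetic can be sketched as below. This is a deliberately naive model that assumes the cost of each object simply adds up, which real hardware does not do exactly; the function names and the per-object costs are invented for illustration.

```python
def frame_time_ms(fps):
    """Time available for one frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

def achievable_fps(object_costs_ms):
    """Frame rate if drawing the frame took exactly the sum of the object costs."""
    total = sum(object_costs_ms)
    if total <= 0.0:
        raise ValueError("need at least one object with a positive cost")
    return 1000.0 / total
```

Under this model 1000 cheap objects at 0.01 ms each come to 10 ms, or about 100 FPS, while a single object that takes 50 ms drags the whole frame down to 20 FPS on its own.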
This is a kind of heat-map showing which areas of the scene have particularly complicated models. Note that trees and vegetation can be sons of bitches.
There is another set of techniques called "level of detail" or LODs that are specifically built to address this kind of issue. In the same way that we squeezed out the most that we could out of our object budget by stripping everything that wasn't necessary, so too can we squeeze the most out of our "complexity" budget by removing waste.
One form of this is in fact very well known - texture resolution. This topic tends to intersect with many other things so I'll try and keep it straightforward.
Textures in games are typically rectangles that are exact powers of 2 in each direction - so 512, 1024, 2048, 4096. There are many many good reasons for this, but one benefit is that you can take a texture that is 1024x1024 and make a smaller version of it that's only 512x512 very easily.
For reasons that I will talk about later, you always want to have every smaller size available when you have a texture. So the 1024x1024 texture above will have 512x512, 256x256, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, 1x1 around. One benefit of this shows up in the distance: for objects that are only a small size on screen, using a 1024x1024 texture is totally wasteful, so we can save some complexity by just using the smaller versions of the same texture.
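The chain of smaller versions, and the idea of picking one based on how big the object is on screen, can be sketched like this. It is a simplified illustration: graphics cards make this choice per pixel in hardware, and the function names and the "at least one texel per screen pixel" rule are my own assumptions for the example.

```python
import math

def mip_chain(size):
    """Sizes of a square power-of-two texture and all of its smaller versions."""
    if size < 1 or (size & (size - 1)) != 0:
        raise ValueError("size must be a power of two")
    sizes = []
    while size >= 1:
        sizes.append(size)
        size //= 2
    return sizes

def pick_mip(texture_size, pixels_on_screen):
    """Index of the smallest version with at least one texel per screen pixel."""
    smallest = int(math.log2(texture_size))  # index of the 1x1 version
    if pixels_on_screen <= 0:
        return smallest
    level = math.floor(math.log2(texture_size / pixels_on_screen))
    # Clamp so we never ask for something bigger than the original or smaller than 1x1.
    return max(0, min(smallest, level))
```

For a 1024x1024 texture on an object about 100 pixels across, `pick_mip` returns index 3, which is the 128x128 version. Keeping the whole chain around costs roughly a third more memory than the full-size texture alone.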
Likewise even nearby, we can make the decision that not every object needs to be super detailed and allow smaller textures.
Usually whenever you get up close to an object the largest possible texture will be used, but as most people have seen in one game or another this doesn't always happen and the textures look overly blurry before they load in. This is usually because the textures can't be loaded from DVD or hard disk all the way to the graphics card's memory in time for rendering. It's most common any time you suddenly change location, either by respawning, spawning in a new level, or moving really quickly. In most other cases, the textures can be brought in gradually as you move around the world.
You can also apply this simplification process to the models used in the game, although it's a lot more complicated to do. By making simple versions of complicated objects, you can ensure that in the distance they're not using more of your complexity budget than needed.
Having simplified objects and cut-down models can be a huge saving and is required to pull off any game like Watch Dogs, but it can also be a huge drain on resources. You have to make a very careful and considered decision about how many simplified versions to have. Too few, and either you don't save enough or the transition 'pop' will be visible. Too many, and you waste memory and man hours creating objects with little gain.
This is what complex objects and characters look like in the distance, when they're indistinguishable from their high-detail versions.
Hopefully now you have some idea of the issues facing graphics programmers, artists and level designers when trying to squeeze the highest visual fidelity while preserving performance. On consoles the equation is a little simpler than on PCs since the hardware is absolutely fixed in stone.
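Choosing which simplified version of a model to draw can be sketched as a lookup against a list of switch-over distances. This is a minimal illustration, not the scheme any particular game uses: the function name and the example distances are assumptions, and real engines often base the decision on screen size and blend between versions to hide the 'pop'.

```python
def pick_lod(distance, switch_distances):
    """Pick a model version by distance; 0 is the most detailed.

    switch_distances must be sorted ascending. An object closer than the
    first entry uses version 0, closer than the second uses version 1,
    and so on. Anything past the last entry uses the simplest version.
    """
    for lod, limit in enumerate(switch_distances):
        if distance < limit:
            return lod
    return len(switch_distances)
```

With hypothetical switch distances of `[20.0, 60.0, 150.0]` there are four versions of the model: an object 5 metres away uses version 0, one 100 metres away uses version 2, and one 500 metres away uses version 3. Adding more entries makes each transition smaller but means more versions to author and keep in memory.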
I said last time that I would talk about why reflections and other things are really difficult to do right. The reason is fairly simple - those budgets I've been yammering on about don't change depending on whether you want to do reflections or not. If you want to do a reflection that might show the whole scene again, you have to do all of that work I've been talking about over again. On top of this, reflections are very specific to where you are viewing them from, so to get accurate reflections every reflective object would need its own reflections!
This can very quickly get out of control, and you'll usually find that games that do have reflections take some liberties or shortcuts. Very rarely will a game have a reflection in a complicated environment - it might be limited to a mirror in a bathroom where the budget isn't strained and they can afford to pay the extra cost. Perhaps they will have true reflections only on a rippling or wavy water surface, so that a very rough and un-detailed scene is enough to provide convincing reflections.
Typically it's rare to completely avoid reflective surfaces - so games will use a pre-rendered image of a nearby environment that's "close enough". This doesn't hold up to scrutiny and if you look straight at the reflections you will see they are nonsense. There are some modern techniques to help provide some reflections in certain circumstances (I may talk about them later), but these pre-rendered images are still needed.
This pre-rendered image called a 'cube map' isn't accurate to where Aiden is actually standing, but it's close enough to get by.
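A cube map is six square images, one for each side of a cube, and it is looked up using only a direction. That can be sketched as picking the side the direction points at most strongly. This is a minimal illustration with a made-up function name and face labels; it leaves out the step of finding the exact pixel within the chosen side.

```python
def cube_face(direction):
    """Return which of the six cube sides a direction (x, y, z) points at."""
    x, y, z = direction
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax == 0 and ay == 0 and az == 0:
        raise ValueError("direction must not be the zero vector")
    # The axis with the largest magnitude decides the side.
    if ax >= ay and ax >= az:
        return "+x" if x > 0 else "-x"
    if ay >= az:
        return "+y" if y > 0 else "-y"
    return "+z" if z > 0 else "-z"
```

Notice that the position of the reflective object never enters into it, only the direction. That is why the same image gives the same answer wherever you stand, and why it can only ever be "close enough".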
Watch Dogs does render reflections in real-time. I haven't investigated thoroughly but I believe they are always rendered as long as you are outside, and they are primarily used to get accurate reflections on whichever car you are in, to help ground the image and give a subtle sense of reality. Since you will always be focussing on your own car and the nearby surroundings, the fact that reflections are incorrect on other vehicles further away is barely noticeable.
There are a number of approximations used to help render these reflections quickly. For one, there are far fewer objects rendered than would be in a normal scene - only about 350 - and of them many are very simplified instead of the full-complexity versions. I suspect complex objects like people are entirely skipped no matter how close they are, but I haven't verified that. There are no shadows at all on these objects, and the lighting is very basic and simple - only coming from the sun and sky. The reflections are rendered from the ground up as a fisheye - meaning that reflections of the ground are impossible, and anything not immediately above will be very low in detail.
Even so, with all of those approximations, it works for what it's intended for. If you drive under an L-train track you get the right reflection from above your car, something that would not be practical otherwise.
This decision will have been deliberate and not taken lightly. The budget is a fixed quantity, so making room for these reflections means that something else somewhere was sacrificed.
This is a very fish-eyed view of the scene around Aiden from the ground up, used for reflections. You can orient yourself with the two lamp-posts and the L-train track.
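To see why reflections of the ground are impossible with an image like this, here is a sketch of one way to map a direction onto an upward-looking fisheye image. I don't know which fisheye projection Watch Dogs actually uses, so this picks a simple one where distance from the image centre is proportional to the angle away from straight up. The function name and the convention that y points up are assumptions.

```python
import math

def fisheye_uv(direction):
    """Map a direction (x, y, z), y up, to (u, v) in an upward fisheye image.

    Returns None for directions that point below the horizon, since the
    image only covers the upper half of the world.
    """
    x, y, z = direction
    length = math.sqrt(x * x + y * y + z * z)
    if length == 0.0:
        raise ValueError("direction must not be the zero vector")
    # Angle away from straight up: 0 is directly overhead, pi/2 is the horizon.
    theta = math.acos(max(-1.0, min(1.0, y / length)))
    if theta > math.pi / 2.0:
        return None
    r = theta / (math.pi / 2.0)   # 0 at the image centre, 1 at the edge
    phi = math.atan2(z, x)        # compass direction around the up axis
    return (0.5 + 0.5 * r * math.cos(phi), 0.5 + 0.5 * r * math.sin(phi))
```

Straight up lands in the middle of the image, the horizon lands on the edge of the circle, and anything pointing downwards has nowhere to go.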
There is another batch of work that I want to touch on here - shadows. I plan to talk about how shadows actually work later in this series, since it's an interesting topic, but for now the important thing to remember is that shadows are similar to reflections - each light that casts a shadow needs to render an image of the scene from its point of view. This time there are not nearly so many possible fudges - to correctly calculate shadows, each light needs that image.
I'm only talking about in-game calculated lights here. Historically some lights were calculated by pre-rendering all the information needed before the game runs, and so the shadows and lighting would be "baked" into the level - fixed and unable to change or react dynamically. This technique isn't so common in modern games and is obviously impossible with moving light sources, or lights that can be shot out.
The most obvious and certainly the most significant shadowing light is the sun (or the moon when at night). Since the sun covers such a large area, it will typically have 3-5 images rendered for it, instead of just the 1 you might need for a headlight or a torch.
Unfortunately this is one case where Watch Dogs does not serve as a great illustration since their shadowing is fairly complex and I believe it's optimised specifically for the case of shadowing within the city - instead I'm going to briefly switch over to Far Cry 4 and look at the shadowing in a captured frame there.
Here's the Far Cry 4 scene that I'm hopping over to, just for reference.
Here are the shadowing information images for that scene - each of these requires a whole new render of the scene.
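One common way of ending up with several sun images is to split the view distance into ranges, nearest to farthest, and render one image per range so that nearby shadows get more detail. The sketch below computes where those ranges might end by blending an even split with a logarithmic one. This is a widely used general approach and not something I have confirmed for either game; the function name, the blend value and the example distances are assumptions.

```python
def cascade_splits(near, far, count, blend=0.5):
    """Far distance of each shadow range, nearest range first.

    blend = 0 gives evenly sized ranges, blend = 1 gives logarithmic ones
    that are much smaller close to the camera.
    """
    if near <= 0.0 or far <= near or count < 1:
        raise ValueError("need 0 < near < far and at least one range")
    splits = []
    for i in range(1, count + 1):
        t = i / count
        log_split = near * (far / near) ** t
        uniform_split = near + (far - near) * t
        splits.append(blend * log_split + (1.0 - blend) * uniform_split)
    return splits
```

Calling `cascade_splits(1.0, 1000.0, 4)` gives four ranges where the first ends far closer to the camera than an even split would put it, and the last ends at the full 1000 units. Each extra range is one more render of the scene.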
The result of this is that if you want to add shadow casting to a light you are going to have to render the scene once again. Some of the approximations we saw in the reflections can still apply, but far fewer. You can skip small or distant objects, but bear in mind that it means those objects will appear not to have a shadow. You can render the image quite small, but then the shadows will be blobby and low-detailed. You usually cannot use a very simplified version of the object since then the object will appear to cast shadows on itself, or there will be gaps of light in between the object and its shadow.
Another implication that is easy to overlook is the necessity for a shadowing image per-light. In many cases you can get away with simplifying the lights by joining them together - in Watch Dogs this happens for the headlights on cars.
When both headlights are on, only one light is drawn but with a special shape to make it look like two beams. If the headlights have shadows then this is no longer so easily possible, and it will be more obvious if you walk in front of a car that the light is coming from somewhere in between the headlights. Perhaps you have to split the headlights up - now not only do you have the extra cost of the shadows, but on top you have extra lights to draw.
The main thing that I want to emphasise is that in all that I've talked about there are trade-offs. It's perfectly possible to eliminate many of these approximations, but you have to be willing to spend the budget on it - and that means sacrificing elsewhere. Each game developer has to decide what are the important things to concentrate on for their game, and what they think will be most impressive or least annoying for the player.