in development

a mild correction

The discussion of overdraw in the last post really needs a qualifier. I wrote that “Modern GPUs are fantastic at eliminating hidden surfaces within a single drawcall, and hardware depth-testing avoids [overdraw] across multiple drawcalls”. This last bit is, in a sense, correct, but it misses the point. The depth test prevents polygons from being filled in if they’re behind those you’ve already drawn. However, if you draw multiple surfaces across multiple drawcalls, and each one is closer to the camera than the last, the GPU will happily overdraw them.

I ran into this problem in my current project, which has a skybox and a play area. In the rendering code, I would draw the skybox, then draw the play area. It’s something I’ve done before in desktop projects without an issue. On my Android tablet, I was running into framerate drops while panning the camera across the scene.

The problem, of course, was overdraw. If my camera was pointed just above the horizon, I was filling the screen with skybox pixels, then filling it a second time with play area pixels.

The solution? Draw the play area first, then the skybox. The GPU uses the depth information from the play area polygons to reject the bits of the skybox that aren’t visible, and only fills in the areas that are. No more framerate drops!
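
For concreteness, here is roughly what the reordered frame looks like. This is a minimal sketch against OpenGL ES; drawPlayArea() and drawSkybox() are hypothetical stand-ins for my actual draw routines.

```c
#include <GLES2/gl2.h>

/* Hypothetical stand-ins for the project's real draw routines. */
void drawPlayArea(void);
void drawSkybox(void);

void renderFrame(void)
{
    /* Clear color and depth together (see the comments below for why
     * the depth clear matters so much). */
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glEnable(GL_DEPTH_TEST);

    /* Near geometry first: its depth values land in the z-buffer... */
    drawPlayArea();

    /* ...so the skybox, drawn afterward at the far plane, is rejected
     * wherever the play area already covers the screen. LEQUAL lets
     * skybox fragments at exactly depth 1.0 survive the test. */
    glDepthFunc(GL_LEQUAL);
    drawSkybox();
    glDepthFunc(GL_LESS);   /* restore the default */
}
```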

  1. Also, this may sound counter-intuitive, but make sure you always clear the frame buffer with a Clear() call, even if you’re going to overwrite every pixel. Some GPUs implement hierarchical z-buffers, and the glClear() call is how the hierarchy is re-established. That is, they use a set of z-buffers, where each location in a given z-buffer covers an entire rectangle of pixels in the frame buffer. As you move down the list of z-buffers, the rectangles get smaller, until finally you’re down to single pixels (or fragments, if you’re doing FSAA). It’s kind of like mipmaps, but for depth. This allows for cheaper z-testing and culling. Some GPUs implement just two levels of depth, often called a tiled depth buffer. It’s not necessary to clear the color buffer, but it is necessary (or rather, highly desirable) to clear the depth buffer.
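
    A toy model of the two-level case may make this concrete. Everything here is a sketch: real hardware does this in fixed function, and the names, the 8x8 tile size, and the 512x512 target are made up for illustration.

    ```c
    #include <stdbool.h>

    #define TILE 8   /* assumed tile size: one coarse entry per 8x8 pixels */

    typedef struct {
        float fine[512][512];    /* per-pixel depth, as in a normal z-buffer */
        float tile_zmax[64][64]; /* farthest depth stored anywhere in each
                                    8x8 tile (a conservative upper bound)    */
    } TiledDepth;

    /* GL_LESS-style depth test with coarse rejection. Returns true if
     * the fragment survives. */
    bool depthTestAndWrite(TiledDepth *zb, int x, int y, float z)
    {
        int tx = x / TILE, ty = y / TILE;

        /* Coarse test: if the fragment is at least as far as the farthest
         * value stored in its tile, it cannot pass the per-pixel test
         * anywhere in that tile, and the fine buffer is never even read. */
        if (z >= zb->tile_zmax[ty][tx])
            return false;

        /* Fine test: the ordinary per-pixel comparison. */
        if (z >= zb->fine[y][x])
            return false;

        zb->fine[y][x] = z;

        /* Lower the tile's bound. This toy recomputes the max over the
         * tile; real hardware tracks it incrementally and approximately. */
        float m = 0.0f;
        for (int j = 0; j < TILE; ++j)
            for (int i = 0; i < TILE; ++i)
                if (zb->fine[ty * TILE + j][tx * TILE + i] > m)
                    m = zb->fine[ty * TILE + j][tx * TILE + i];
        zb->tile_zmax[ty][tx] = m;

        return true;
    }

    /* What glClear(GL_DEPTH_BUFFER_BIT) corresponds to in this model:
     * resetting every per-pixel value *and* every tile bound, which is
     * what "re-establishes the hierarchy". Hardware does the tile reset
     * essentially for free by rewriting the tile metadata. */
    void clearDepth(TiledDepth *zb)
    {
        for (int y = 0; y < 512; ++y)
            for (int x = 0; x < 512; ++x)
                zb->fine[y][x] = 1.0f;
        for (int ty = 0; ty < 64; ++ty)
            for (int tx = 0; tx < 64; ++tx)
                zb->tile_zmax[ty][tx] = 1.0f;
    }
    ```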

    If you just overwrite every pixel on every frame, even if you’re drawing without blending or z-testing so that you assign all the depth values yourself, you never get the benefit of the hierarchical z-buffers.

    Of course, you should always test; the performance trade-offs will depend on the particular machine you’re working with.

    Also, never write the depth value from your pixel / fragment shader. Doing so prevents early z-testing: the fragment shader now has to actually run before the final depth value is known, rather than the GPU using the depth value produced by the vertex stage.
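
    For concreteness, the anti-pattern looks like this (sketched as a desktop-GLSL fragment shader in a C string; the shader body is made up, and gl_FragDepth needs desktop GL or ES 3.0):

    ```c
    /* Any write to gl_FragDepth, however trivial, forces the GPU to run
     * the shader before it can depth-test the fragment. */
    static const char *depth_writing_fs =
        "#version 120\n"
        "uniform sampler2D u_tex;   /* assumed texture input */\n"
        "varying vec2 v_uv;\n"
        "void main() {\n"
        "    gl_FragColor = texture2D(u_tex, v_uv);\n"
        "    /* This one line disables early z-testing: */\n"
        "    gl_FragDepth = gl_FragCoord.z + 0.001;\n"
        "}\n";
    ```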

  2. Good points. I tend to instinctively call glClear() before doing anything, but was a little hazy on the mechanics of why it’s a good idea.

    I did consider writing depth values from a fragment shader at one point, to round the corners of some billboards. I tried alpha blending and discard, but experienced performance problems with both. One article I read talked about modulating the depth value to drop fragments; I never tried it, and now I’m glad I didn’t bother. In the end, the best advice I found was simply to spend the extra vertices and make the corners round, along the lines of the sketch below.
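
    A minimal sketch of that approach, assuming 2D (x, y) positions and a GL_TRIANGLE_FAN draw; the function name and output layout are made up for illustration:

    ```c
    #include <math.h>

    /* Emit a small triangle fan that rounds off one corner: (cx, cy) is
     * the center of the corner circle, r its radius, a0..a1 the arc in
     * radians, and segs how many segments to spend on it. The caller
     * provides room for 2 * (segs + 2) floats in out[]. */
    static int roundedCornerFan(float *out, float cx, float cy, float r,
                                float a0, float a1, int segs)
    {
        int n = 0;
        out[n++] = cx;                      /* fan center */
        out[n++] = cy;
        for (int i = 0; i <= segs; ++i) {
            float a = a0 + (a1 - a0) * (float)i / (float)segs;
            out[n++] = cx + r * cosf(a);    /* points along the arc */
            out[n++] = cy + r * sinf(a);
        }
        return n / 2;  /* vertex count, for glDrawArrays(GL_TRIANGLE_FAN, ...) */
    }
    ```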

  3. Have you seen the Loop-Blinn paper on drawing filled Bezier curves using pixel shaders? It’s pretty spiffy. http://research.microsoft.com/en-us/um/people/cloop/LoopBlinn05.pdf Basically, with some rather modest math, you can determine whether a given pixel is inside or outside a curve, or, if it’s very close to the curve, choose an appropriate alpha value for it. If you want to do something like rounded corners, and you want them to be smooth at any resolution, you could do something like this. For simpler curves like ellipse sections, you probably don’t even need the full Loop-Blinn technique. For a circular corner, a simple multiply (squaring the distance from the corner’s center) and a comparison against radius^2 (is the point inside or outside the curve?) would do; see the sketch below.
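
    In the spirit of that simple case, here is the circular-corner test sketched as a GLSL ES fragment shader in a C string. Every name is made up, and the smoothstep band is just one way to realize the “choose an appropriate alpha value” idea:

    ```c
    static const char *rounded_corner_fs =
        "precision mediump float;\n"
        "varying vec2 v_local;  /* position relative to the corner's center */\n"
        "uniform float u_radius;\n"
        "void main() {\n"
        "    float d2 = dot(v_local, v_local);  /* squared distance */\n"
        "    float r2 = u_radius * u_radius;    /* squared radius   */\n"
        "    /* Alpha ramps from 1 inside the circle to 0 outside,  */\n"
        "    /* over a thin band near the curve:                    */\n"
        "    float alpha = 1.0 - smoothstep(0.98 * r2, r2, d2);\n"
        "    gl_FragColor = vec4(1.0, 1.0, 1.0, alpha);\n"
        "}\n";
    ```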

    Also, I should probably not have said “never” to writing depth values; just keep in mind that this capability has a significant cost. It gets even worse when you’re writing into a multi-sampled target, since you’ll run N copies of the fragment shader even if they all end up with the same (or roughly the same) depth value.

    Another good thing about calling glClear() (or ClearRenderTargetView() in DX) is that it gives the device driver a clear indication that there are no data-flow dependencies between the commands before the Clear() and the commands that follow it. On resource-rich machines, many drivers will allocate more than one physical buffer for a render target. Let’s say you’re currently rendering into buffer #0, and you call Clear() and then issue some rendering commands. The driver can immediately begin clearing buffer #1 and then execute the new rendering commands against it. This is most commonly done around presentation / flipping, but I know of at least one driver that will do this on any render target around a Clear(), regardless of whether a Present() is in the pipeline or not.

  4. I hadn’t seen that paper; very interesting! I’ve played with selective alpha-blending (and discards) to create resolution-independent discs and squiggles across a single polygon, but doing it across multiple polygons is new to me. I’ll have to review that one in full.

    Never say never: agreed. When I get back to doing WebGL and desktop stuff again I might look at that technique. On mobile I have to do everything I can to stay out of the fragment shader.
