I have been working on a feature that requires reading the depth buffer and storing the data in an image. While this was quickly done in the OpenGL-based renderer, I struggled to get it working in the Metal-based renderer. Things got weirder and weirder … until the source of all the problems turned out to be a single missing line of code!
Private GPU Memory
I had had code that grabbed the color buffer working for months, so I assumed getting the depth buffer would be just as easy. The first problem was that, starting with iOS 9, one cannot simply call getBytes on depthAttachment.texture because
“Textures with a depth, stencil, or depth/stencil pixel format can only be allocated with the private storage mode.” – Metal Programming Guide, What’s New in OS X and iOS.
So a blit is needed to copy the depth data to a separate buffer located in CPU-accessible memory. This can be done with MTLBlitCommandEncoder's copyFromTexture method, as described in the guide. After ending the render encoding of the frame, but before committing the command buffer, create a new blit command encoder, encode the blit and then commit the buffer.
Strange Blit Problems
And this is where the weird problems started. When I used a large render target size (e.g. 2048×2048 or more), everything worked fine. But when I dropped the resolution to, for example, 512×512, the depth buffer was either empty or looked as if it had been blitted before all of the drawables were rendered. Some parts were in there, some were not. In the case of a large mesh, it looked like the blit had happened in the middle of processing the triangles.
Although the algorithm normally uses the render-to-texture function only once, during scene loading, I moved it to the beginning of the frame so that it would show up in the Xcode GPU frame capture. And lo and behold, there was a warning at the creation of the blit command encoder:
“The application created a command encoder but did not encode any work with it.”
This was particularly strange because two lines below in the same frame capture, sure enough, there was the blit operation.
I then replaced the blit encoder with a compute shader that manually copied the depth values to an RGBA texture in CPU-accessible memory. This made things even weirder! When stepping through the frame capture and looking at the bound texture, everything seemed to have been copied correctly to the target, but in the application the result was still incorrect.
StoreAction to the Rescue
So how could that be? Was the blit encoder buggy? Did I read the bytes incorrectly? Well, the answer was hidden in a completely different place: the store action of the depth attachment! By default, the depth attachment uses MTLStoreActionDontCare, which allows Metal to discard or recycle the texture contents whenever it wants to once the render pass has ended. After changing it to MTLStoreActionStore, everything suddenly worked! Apparently, for smaller sizes the texture was recycled sooner, and for larger ones it survived long enough for the blit to happen correctly.
What made this tricky to find is that reading the bytes of the color attachment had worked from the start without setting the store action explicitly, because color attachments default to MTLStoreActionStore. In addition, Xcode's GPU frame capture seems to prevent the textures from being recycled (because it has to take snapshots for the visualization), which forced the correct behavior whenever I looked at the capture.
The following example is an abbreviated version of rendering to texture (non-multisampled):
MTLRenderPassDescriptor * renderPass = [MTLRenderPassDescriptor renderPassDescriptor];

// Create the color buffer
MTLTextureDescriptor * colorBufferDescriptor =
    [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
                                                       width:imageSize.getWidth()
                                                      height:imageSize.getHeight()
                                                   mipmapped:NO];
colorBufferDescriptor.usage = MTLTextureUsageRenderTarget;
renderPass.colorAttachments[0].texture = [self.mtlDevice newTextureWithDescriptor:colorBufferDescriptor];
renderPass.colorAttachments[0].clearColor = MTLClearColorMake(0.0, 0.0, 0.0, 0.0);
renderPass.colorAttachments[0].loadAction = MTLLoadActionClear;

// Create the depth buffer
MTLTextureDescriptor * depthBufferDescriptor =
    [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatDepth32Float_Stencil8
                                                       width:imageSize.getWidth()
                                                      height:imageSize.getHeight()
                                                   mipmapped:NO];
depthBufferDescriptor.usage = MTLTextureUsageRenderTarget | MTLTextureUsageShaderRead;
renderPass.depthAttachment.texture = [self.mtlDevice newTextureWithDescriptor:depthBufferDescriptor];
renderPass.depthAttachment.loadAction = MTLLoadActionClear;
// The crucial line: without it, the depth contents may be discarded before the blit.
renderPass.depthAttachment.storeAction = MTLStoreActionStore;
renderPass.stencilAttachment.texture = renderPass.depthAttachment.texture;

// We create a new command buffer for this render-to-texture frame.
id<MTLCommandBuffer> commandBuffer = [self.mtlCommandQueue commandBuffer];
id<MTLRenderCommandEncoder> renderEncoder = [commandBuffer renderCommandEncoderWithDescriptor:renderPass];

// [...] the actual rendering is done here

[renderEncoder endEncoding];

// Now add a blit to the CPU-accessible buffer (4 bytes per 32-bit float depth value)
id<MTLBuffer> depthImageBuffer = [self.mtlDevice newBufferWithLength:(4 * pixelCount)
                                                             options:MTLResourceOptionCPUCacheModeDefault];
id<MTLBlitCommandEncoder> blitCommandEncoder = commandBuffer.blitCommandEncoder;
[blitCommandEncoder copyFromTexture:renderPass.depthAttachment.texture
                        sourceSlice:0
                        sourceLevel:0
                       sourceOrigin:MTLOriginMake(0, 0, 0)
                         sourceSize:MTLSizeMake(imageSize.getWidth(), imageSize.getHeight(), 1)
                           toBuffer:depthImageBuffer
                  destinationOffset:0
             destinationBytesPerRow:(4 * imageSize.getWidth())
           destinationBytesPerImage:(4 * pixelCount)
                            options:MTLBlitOptionDepthFromDepthStencil];
[blitCommandEncoder endEncoding];

// Commit and wait for completion of rendering
[commandBuffer commit];
[commandBuffer waitUntilCompleted];

// Now the depth values can be accessed in the buffer.
float * depthValues = (float*)[depthImageBuffer contents];
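Since the goal was to store the depth data in an image, here is a minimal sketch of one way to turn the copied values into 8-bit grayscale pixels. It is plain C, so it also compiles as part of an Objective-C file. The helper name depthToGray8 and the output array are hypothetical and not part of the original renderer; the sketch assumes the buffer is tightly packed (as it is with the destinationBytesPerRow used above) and that the depth values lie in [0, 1].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not from the original renderer): converts 32-bit float
// depth values, assumed to lie in [0, 1], to 8-bit grayscale, one byte per pixel.
static void depthToGray8(const float *depth, size_t pixelCount, uint8_t *gray)
{
    assert(depth != NULL && gray != NULL);
    for (size_t i = 0; i < pixelCount; ++i) {
        float d = depth[i];
        // Clamp defensively. The negated comparison also maps NaN to 0.
        if (!(d > 0.0f)) d = 0.0f;
        if (d > 1.0f) d = 1.0f;
        // Scale to 0..255 and round to the nearest integer.
        gray[i] = (uint8_t)(d * 255.0f + 0.5f);
    }
}
```

With the buffer from the example above, this could be called as depthToGray8(depthValues, pixelCount, grayPixels), where grayPixels is a caller-allocated array of pixelCount bytes. A linear mapping like this is only meant as a quick visual check; with a perspective projection most values cluster near 1.0, so a real export would probably linearize the depth first.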