Moving from OpenGLES to Metal

I used the purchase of my iPad Pro as a welcome reason to finally develop a Metal API backend for my modeler code. I already had OpenGL 3.3, OpenGLES 2.0 and OpenGLES 3.0 renderers, so adding a new one wasn’t much of a problem. It’s also a good test for the architecture of the scene graph, as Metal is one of the next-gen APIs in line with Vulkan and DirectX 12.

The Evolution of Rendering APIs

Roughly speaking, there are five types/eras of 3D APIs:

  1. Immediate rendering (OpenGL 1.x/2.x era): The rendering pipeline is fixed (no shaders) and instead one uses the state engine to activate/deactivate features (e.g. whether the first texture unit is on, how it blends, etc.). Each primitive is manually sent to the API (e.g. glBegin(GL_TRIANGLES) followed by a number of vertex positions, colors and texture coordinates).
  2. Vertex Array Objects (OpenGL 2.x/3.x era): For the most part, the fixed function pipeline is used as before but one uses VAOs (or display lists, VBOs, etc) to combine multiple state changes into one object and renders a list of primitives in one go instead of issuing each primitive individually.
  3. Shader Usage (OpenGL 3.x/DirectX 9 era): The fixed function pipeline is augmented by shader code. People still use the fixed function pipeline in the majority of cases and use shaders only in a limited fashion (e.g. the one object that uses a water shader in a scene).
  4. Shader-only (DirectX 10, OpenGL 4.x era): The fixed function pipeline has been removed completely, but providing the input to a shader (e.g. material parameters, vertex attributes) is still done using the traditional methods.
  5. Low-Overhead APIs (Metal, Vulkan, DirectX 12 era): This is where we currently are. Everything is shader-only, but the data has also been abstracted. The user creates abstract buffers of data, and how they are used is dependent on the shaders. Also, the complete pipeline state (with a few exceptions) is stored in a single object and the user switches it according to the object that is rendered (see the sketch after this list).
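
To make point 5 concrete, here is a minimal Swift sketch of the “abstract buffers” idea (the Vertex layout is an assumption for illustration): the API only sees untyped bytes, and the shader decides what a vertex is.

    import Metal

    // The API only sees bytes; this layout is a convention shared with the shader.
    struct Vertex {
        var position: SIMD3<Float>
        var normal: SIMD3<Float>
    }

    let device = MTLCreateSystemDefaultDevice()!

    // One triangle. In classic OpenGL each attribute was announced to the
    // fixed function pipeline; here it is an opaque buffer bound to a shader slot.
    let triangle: [Vertex] = [
        Vertex(position: [ 0,  1, 0], normal: [0, 0, 1]),
        Vertex(position: [-1, -1, 0], normal: [0, 0, 1]),
        Vertex(position: [ 1, -1, 0], normal: [0, 0, 1]),
    ]
    let vertexBuffer = device.makeBuffer(bytes: triangle,
                                         length: triangle.count * MemoryLayout<Vertex>.stride,
                                         options: [])!
    // Later, in the render pass:
    // encoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)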

So there have been a lot of changes over the years and the way we write renderers has completely changed. Where in the beginning, the mesh was the primary object and the state engine was toggled to produce the desired rendering output, it now is the other way round. The Render Pipeline State sets up how the GPU works and then we have to provide the data for it accordingly. Basically everything feels turned upside down for someone who is used to the good old OpenGL 1.x/2.x days.

Using Metal

Apple has been smart in that Metal in many places feels like OpenGLES. For example, cube maps have the same weird face definition as in OpenGLES (see an older post on this blog) and matrices are still in column-major format. The shading language is not too different from other shading languages, and depth testing, viewports and so on are pretty much the same as before.
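
As a small illustration of the column-major convention (a sketch using the simd types that Metal code is commonly paired with, not code from this renderer): a float4x4 is built from column vectors, so a translation ends up in the fourth column, just as in OpenGL.

    import simd

    // Column-major, as in OpenGL: the matrix is constructed from its columns,
    // so the translation components live in the fourth column.
    func translationMatrix(_ t: SIMD3<Float>) -> float4x4 {
        return float4x4(columns: (
            SIMD4<Float>(1, 0, 0, 0),
            SIMD4<Float>(0, 1, 0, 0),
            SIMD4<Float>(0, 0, 1, 0),
            SIMD4<Float>(t.x, t.y, t.z, 1)
        ))
    }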

The biggest adaptation hurdle therefore lies in the way one thinks about rendering. Instead of toggling textures, one has to create render pipeline states (and therefore shaders) for each combination of rendering aspects that the application needs. Those pipeline state objects are expensive to create, so this should not happen every frame. And for performance reasons, similar materials should use the same render pipeline state with just different parameters (e.g. two Phong materials with exactly one diffuse texture each would just provide different textures instead of redefining the complete pipeline).
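
As a rough sketch of that pattern (the cache, the shader function names phong_vertex/phong_fragment and the pixel formats are assumptions, not this renderer’s actual code): build each pipeline state once and reuse it for every material that shares the same rendering aspects.

    import Metal

    // One pipeline state per combination of rendering aspects, created once
    // and cached -- building these per frame would be far too expensive.
    var pipelineCache: [String: MTLRenderPipelineState] = [:]

    func pipelineState(for key: String,
                       device: MTLDevice,
                       library: MTLLibrary) throws -> MTLRenderPipelineState {
        if let cached = pipelineCache[key] { return cached }

        let descriptor = MTLRenderPipelineDescriptor()
        // Hypothetical shader functions in the default library.
        descriptor.vertexFunction = library.makeFunction(name: "phong_vertex")
        descriptor.fragmentFunction = library.makeFunction(name: "phong_fragment")
        descriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
        descriptor.depthAttachmentPixelFormat = .depth32Float

        let state = try device.makeRenderPipelineState(descriptor: descriptor)
        pipelineCache[key] = state
        return state
    }

Two Phong materials with different diffuse textures would then share one cached state and only rebind the texture per draw call.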

In general, one is kindly (but relentlessly) forced into a direction of “preparing things in advance” instead of just traversing the scene graph and toggling states on the fly… which, unfortunately, is what pretty much every old-style renderer was based on. But that is an interesting point: even before the latest generation of APIs, it did make sense to render sorted by pipeline state, it did make sense to not re-traverse the scene graph when nothing has changed but instead keep a flat list of “things to be rendered” (sketched below), and it did make sense to store accumulated transformation matrices instead of doing the same multiplication each frame… but most people didn’t.
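
Here is what such a flat, pre-sorted list might look like (the RenderItem type and its fields are illustrative assumptions): it is rebuilt only when the scene changes, sorted by pipeline state, and carries the accumulated transforms.

    import Metal
    import simd

    // A flat "things to be rendered" list, rebuilt only when the scene changes.
    struct RenderItem {
        var pipelineKey: String          // used for sorting/batching state changes
        var pipeline: MTLRenderPipelineState
        var worldTransform: float4x4     // accumulated once, not re-multiplied every frame
        var vertexBuffer: MTLBuffer
        var vertexCount: Int
    }

    var renderQueue: [RenderItem] = []

    func rebuildQueue(from items: [RenderItem]) {
        // Sorting by pipeline state keeps expensive state switches to a minimum.
        renderQueue = items.sorted { $0.pipelineKey < $1.pipelineKey }
    }

    func draw(with encoder: MTLRenderCommandEncoder) {
        var currentKey: String? = nil
        for item in renderQueue {
            if item.pipelineKey != currentKey {
                encoder.setRenderPipelineState(item.pipeline)
                currentKey = item.pipelineKey
            }
            encoder.setVertexBuffer(item.vertexBuffer, offset: 0, index: 0)
            var transform = item.worldTransform
            encoder.setVertexBytes(&transform,
                                   length: MemoryLayout<float4x4>.stride,
                                   index: 1)
            encoder.drawPrimitives(type: .triangle,
                                   vertexStart: 0,
                                   vertexCount: item.vertexCount)
        }
    }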

In the end, I feel like we’re forced to clean up the mess we always should have dealt with but never did, more so than being forced to completely relearn how 3D rendering is done. It has gotten a bit more difficult to get something on screen compared to the good old immediate mode days, but maybe that’s a good thing if it pushes us in the right direction.

The Metal Toolchain and Documentation

I have to say, the tools Apple gives us are very impressive. I particularly liked the low-level profiling (shown in the screenshot at the top of this post). However, make sure to watch the WWDC 2014/2015 videos on Metal! I felt that the Metal documentation is still lacking, and there were a couple of questions I found no answer for (e.g. how exactly the sides of a cube map are defined regarding their orientation). Other areas are a bit vague with regard to which of two ways should be used to do something. This shouldn’t be much of a problem for new developers, but as someone coming from another API you’ll be constantly looking for details on how to adapt your existing renderer’s concepts and you won’t find them.

Performance

At the beginning, I was a bit shocked: my initial attempt at porting my OpenGLES renderer to Metal was 20% slower. But after some profiling and watching the WWDC videos (especially the one on using multiple uniform buffers per object and cycling through them to prevent stalling), performance is now up by 30% compared to the OpenGLES renderer. Right now, I seem to be limited primarily by the number of draw calls, as the car model shown in the image above consists of a ton of small meshes.
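
That buffer-cycling technique, roughly sketched (the buffer count, the Uniforms struct and the surrounding setup are assumptions): a semaphore guards a small ring of uniform buffers so the CPU never overwrites a buffer the GPU is still reading.

    import Metal
    import Dispatch
    import simd

    struct Uniforms {
        var modelViewProjection: float4x4
    }

    let inflightBufferCount = 3   // triple buffering, as suggested in the WWDC talks
    let inflightSemaphore = DispatchSemaphore(value: inflightBufferCount)
    var uniformBuffers: [MTLBuffer] = []   // created once, one per in-flight frame
    var bufferIndex = 0

    func drawFrame(commandQueue: MTLCommandQueue, uniforms: Uniforms) {
        // Block if the GPU still owns all in-flight buffers.
        inflightSemaphore.wait()

        let uniformBuffer = uniformBuffers[bufferIndex]
        bufferIndex = (bufferIndex + 1) % inflightBufferCount

        // Write this frame's uniforms into a buffer the GPU is NOT reading.
        var u = uniforms
        uniformBuffer.contents().copyMemory(from: &u,
                                            byteCount: MemoryLayout<Uniforms>.stride)

        let commandBuffer = commandQueue.makeCommandBuffer()!
        commandBuffer.addCompletedHandler { _ in
            // The GPU is done with this frame; hand the buffer back to the CPU.
            inflightSemaphore.signal()
        }
        // ... encode the render pass, binding uniformBuffer at the right index ...
        commandBuffer.commit()
    }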

I’m actually not sure how much of the performance increase can be attributed to Metal itself and how much to the fact that it – by its API design – forced me to completely rethink the way my renderer code works. But the whole notion of switching the complete rendering pipeline state is quite nice once you get used to it.

Conclusions

In general, I like Metal (or rather, using any of the next-gen APIs) a lot. It is a bit of a pain at first, but it forces one to write a better renderer. I haven’t reached the full feature set of my OpenGLES renderer yet, but that mostly has to do with the fact that, as a modeler, things in my scene graph can change a lot. I would assume that for games or other applications where the list of objects and their render pipeline states are static, writing a Metal renderer would be even easier.
