One of the key moments of WWDC 2014 was the announcement of Metal, a graphics application programming interface. Unity recently announced support for it and explained why it matters. Below is a translation of that post, kindly provided to App2Top.ru by Unity’s Russian office.

Exciting times are coming for graphics on iOS 8!

At the recent WWDC, Apple introduced a new graphics API called Metal, which promises high efficiency, low overhead and a design tailored to the A7 chip. This lets developers take full advantage of the hardware in iOS devices and achieve a much higher level of realism, detail and interactivity in games than before. Work on Metal support is already under way; in the meantime, we would like to tell you about this technology and explain why it is so cool.

Glossy Metal

Metal has several key features designed to reduce CPU overhead, make performance more predictable and give developers tighter control over rendering:

  • Up-front creation and validation of render state. Shaders can be compiled and partially optimized offline. All render pipeline state (shaders, vertex layouts, blending modes, render target formats, etc.) can be created and validated before any rendering actually begins. In other words, no state checks are needed during rendering, which frees the CPU for other work.
  • Much more flexible multithreading. Resources can be created from any thread, and rendering commands can be prepared from several threads in parallel.
  • All iOS devices have unified memory shared by the CPU and GPU. There is no need to pretend that data from the CPU has to be ‘copied’ into some kind of video memory. When you create a buffer, you simply get a pointer to its data, and this is exactly the same memory that the GPU will use.
  • Synchronization is handled by the engine, which controls when the CPU and the GPU access that memory.
  • All GPUs in iOS devices use a tile-based deferred rendering architecture. The Metal API takes advantage of this, especially where render targets are concerned. The API no longer has to guess anything: all framebuffer actions, such as loading and storing tiles and resolving antialiasing, happen explicitly.
  • Together, the points above should significantly reduce CPU overhead and make performance much more predictable.
  • A new C++11-based language serves both graphics and compute shaders. This means that iOS gets compute shaders, atomics, arbitrary buffer writes and other fancy capabilities of modern GPUs.
  • No legacy baggage: the new API is very simple and streamlined. It also has a super-useful optional ‘debugging layer’ that performs extensive extra validation and warns you about mistakes and pitfalls.

Now let’s dig into the details!
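The first point in the list, up-front creation and validation of render state, can be illustrated with a small conceptual model. This is only a sketch in Python, not the Metal API: the names `PipelineDescriptor`, `PipelineState` and `draw`, as well as the lists of valid blend modes and formats, are hypothetical and exist purely to show where the checks happen.

```python
# Conceptual model only: these classes are hypothetical and are not the Metal API.
from dataclasses import dataclass

VALID_BLEND_MODES = {"opaque", "alpha", "additive"}      # assumed for the example
VALID_TARGET_FORMATS = {"rgba8", "bgra8", "rgba16f"}     # assumed for the example


@dataclass(frozen=True)
class PipelineDescriptor:
    vertex_shader: str
    fragment_shader: str
    blend_mode: str
    target_format: str


class PipelineState:
    """Immutable state object: all checks run once, in the constructor."""

    validations_run = 0  # class-wide counter, used only to make the cost visible

    def __init__(self, desc: PipelineDescriptor):
        PipelineState.validations_run += 1
        if not desc.vertex_shader or not desc.fragment_shader:
            raise ValueError("both shaders are required")
        if desc.blend_mode not in VALID_BLEND_MODES:
            raise ValueError(f"unknown blend mode: {desc.blend_mode}")
        if desc.target_format not in VALID_TARGET_FORMATS:
            raise ValueError(f"unknown target format: {desc.target_format}")
        self.desc = desc


def draw(pipeline: PipelineState, mesh: str) -> str:
    # No validation here: the pipeline was checked when it was created.
    return f"draw {mesh} with {pipeline.desc.fragment_shader}"


# Load time: build and validate the state once.
pipeline = PipelineState(PipelineDescriptor("vs_main", "fs_lit", "alpha", "bgra8"))

# Render loop: a thousand draws, no further checks.
commands = [draw(pipeline, f"mesh_{i}") for i in range(1000)]
print(len(commands), "draws,", PipelineState.validations_run, "validation")
```

The design choice being modeled is that an invalid combination fails loudly at creation time, so the per-draw path can stay as cheap as possible.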

If you make games, especially for mobile devices, you probably know that a large number of draw calls puts a heavy load on the CPU. Every rendered object carries some CPU cost, and in practice on mobile devices you cannot afford more than a few hundred visible objects at a time.

In reality, you would much rather spend CPU time on other things: gameplay logic, physics, AI, character animation and so on.

[Image: Ori and the Blind Forest]

Unity has several features for minimizing the number of draw calls: static and dynamic batching, occlusion culling, LOD (levels of detail) and distance-based layer culling. In addition, you can combine nearby objects and pack textures into atlases to reduce the number of materials.
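The effect of batching on the number of draw calls can be shown with a toy model. The sketch below is not Unity code: the scene data and function names are made up, and it assumes that objects sharing a material can always be merged into one batch, which real engines only do under further conditions.

```python
from collections import defaultdict

# Hypothetical scene: (object name, material name) pairs.
scene = [
    ("crate_1", "wood"), ("barrel_1", "metal"), ("crate_2", "wood"),
    ("fence_1", "wood"), ("barrel_2", "metal"), ("lamp_1", "glass"),
]


def draw_calls_unbatched(objects):
    # Without batching, every object costs one draw call.
    return len(objects)


def batch_by_material(objects):
    # Simplifying assumption: objects with the same material form one batch.
    batches = defaultdict(list)
    for name, material in objects:
        batches[material].append(name)
    return dict(batches)


batches = batch_by_material(scene)
print(draw_calls_unbatched(scene), "draw calls without batching")
print(len(batches), "draw calls with batching")
```

This is also why packing textures into atlases helps: fewer distinct materials means fewer groups, and therefore fewer draw calls.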

A fair question: why should CPU time be spent on rendering at all?

After all, rendering is what the GPU is supposed to do. Some of the cost arises on the ‘engine’ side: the CPU has to manage visible objects and work out which shader to use, which light sources affect which objects, which material parameters to apply, and so on. Some of this is cached and some runs on multiple threads; by and large, it is platform-independent code.

With every Unity release we try to optimize this part, and Metal does not affect it at all. The rest of the CPU cost, however, comes from the ‘graphics API and driver’. Depending on the game, this part can be very significant. Metal is an attempt to tackle it: the API is a much better fit for modern hardware, sits at a slightly lower level, and does vastly less guesswork than OpenGL ES typically did.

Up-front creation and validation of render state, explicit load and store actions for render targets, and no synchronization hoops to jump through on the API side all help reduce CPU overhead. In our tests so far, the new API and driver account for only a few percent of CPU time. That is a significant drop, given that this figure used to be 15-40% of total CPU load! It also means that the remaining overhead is somewhere in our own code, and it seems we need to keep optimizing it :) I also can’t wait to try Metal’s ability to issue rendering commands from multiple threads, which opens up some very interesting optimization opportunities for us.

[Image: CounterSpy]

Compute capabilities

Thanks to Metal, it will be possible to use the GPU for more than the typical scenarios: not only vertex and fragment shaders, but also so-called ‘compute’ shaders. In essence, this makes it possible to run any kind of ‘parallel computation’ on the many small processors inside the GPU.

Compute shaders also have the concept of ‘local storage’: a very fast dedicated piece of on-GPU memory that can be used to exchange data between parallel work items. This memory lets you use the GPU for things that would be very hard to implement with good old vertex and fragment shaders. There are many interesting uses for compute shaders: optimizing post-processing effects, particle systems, light and shadow culling, and so on. Compute shaders are not yet used in Unity, and we are all looking forward to using them for many, many cool things. Exciting times are coming!
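To give a feel for how work items in a group exchange data through local storage, here is a CPU-side simulation of a classic pattern, a per-group tree reduction (summing values). It is a sketch only: it runs sequentially in Python rather than on a GPU, the function name `group_sum` is hypothetical, and it assumes the group size is a power of two.

```python
def group_sum(values, group_size):
    """Sum `values` the way a compute kernel might: one partial sum per group."""
    if group_size < 1 or group_size & (group_size - 1):
        raise ValueError("group_size must be a power of two")  # sketch assumption

    partial_sums = []
    for start in range(0, len(values), group_size):
        # "Local storage": a scratch array shared by the work items of one group.
        local = list(values[start:start + group_size])
        local += [0] * (group_size - len(local))  # pad the last group with zeros

        # Tree reduction: each step halves the number of active work items.
        stride = group_size // 2
        while stride > 0:
            for item in range(stride):  # on a GPU these iterations run in parallel
                local[item] += local[item + stride]
            # A real kernel would need a barrier here before the next step.
            stride //= 2
        partial_sums.append(local[0])
    return partial_sums


data = list(range(1, 17))       # the numbers 1..16
partials = group_sum(data, 4)   # four groups of four work items each
print(partials, sum(partials))
```

Each step reads only values written in an earlier step, which is what makes it safe for the work items of a group to run simultaneously as long as they synchronize between steps.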

FAQ

When can I use it? 

We can’t wait to ship it, but we can’t name specific dates yet. We have already done a lot, but there is still plenty of work left before everything is ready. Our current plan is to integrate first all the parts of Metal that provide a big CPU performance gain. We hope this will happen in Unity 5.0. A little later we want to add support for compute shaders (this is somewhat more complicated and needs more attention on our part).

What will be the system requirements?

Metal will work on iOS 8 on devices with an A7 chip or later (iPhone 5S, iPad Air, iPad mini with Retina display).

[Image: Shattered Planet]

What will I need to do to benefit from the lower CPU overhead that Metal brings?

In general, nothing. As soon as we add Metal support to Unity, everything will work by itself. All your existing projects, shaders and graphics effects will just work. Just enjoy the low CPU usage!

But what about shaders? Metal uses a different language for them.

We’ll take care of that, too. Right now you are most likely writing shaders in Cg/HLSL, and we convert them to GLSL for OpenGL ES behind the scenes. For Metal, we will do the conversion in much the same way.

So what can I do with the CPU time that is freed up?

Improve physics and AI, or make your game logic and gameplay richer and more complex. Place and draw more objects on the screen. Or just enjoy the longer battery life. It’s all up to you!
