This is a collection of experiences I've gained from using the profiler to optimize Unity games (mainly Operation Starfall). The project wasn't really optimized when I joined, so I started profiling and found a lot of things that could be better.
Software - Unity (Profiler), C#
Team size - 1 person
Duration - Ongoing
The profiler showed a lot of recurring spikes labelled WaitForTargetFPS. Each spike accounted for 49.2% of the CPU time, because the CPU was finishing its work too fast for the frame rate the game was capped to and then had to wait. After turning VSync off, the game could reach a higher frame rate and no longer had to wait for the next frame. As a result we get a smooth frame rate instead of a spiky one.
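As a rough illustration of the VSync change, here is a minimal sketch of how it could be applied from a script instead of the Quality settings window. The class name FramePacingSettings and the targetFrameRate field are my own assumptions, not code from the project.

```csharp
using UnityEngine;

// Hypothetical helper (not from the project): applies frame pacing settings at startup.
public class FramePacingSettings : MonoBehaviour
{
    // Assumed default: -1 means "use the platform default" (uncapped on most standalone builds).
    [SerializeField] private int targetFrameRate = -1;

    private void Awake()
    {
        // 0 disables VSync, so the main thread no longer waits for the display refresh.
        QualitySettings.vSyncCount = 0;

        // The target frame rate is only honored while vSyncCount is 0.
        Application.targetFrameRate = targetFrameRate;
    }
}
```

Note that setting targetFrameRate to a positive cap will most likely bring the WaitForTargetFPS marker back, since the main thread then idles to hold that cap.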
While profiling the GPU I found a marker called DXGI.WaitOnSwapChain that was constantly using 37.3% of the GPU's performance. It basically means that the CPU is waiting for the GPU to finish rendering. So I started looking for the most performance-demanding asset currently loaded.
After testing most of the assets, I noticed a giant gain in frames once I turned off the trees. So I started looking at the individual parts of the tree and found out that the leaves are really heavy on the GPU.
I set up a test scene to test the leaves of the trees, because I was wondering what was causing the performance issues. At first I thought it was the shader, because the leaves are always moving. So I made one group of trees with a normal lit shader and one with the shader that moves the leaves. The profiler showed that the trees with the normal lit shader were heavier than the ones with the leaf shader. In conclusion, it wasn't the shader; it's all the alpha-textured leaves of the trees clipping through each other.
In the same test scene as above I put 1000 grass prefabs, 250 of each kind. When they are turned on there is a big change in performance. So I tested each prefab individually and found that the 4th prefab is a combination of the other 3 prefabs, which makes it very heavy.
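A test setup like the one described above could be scripted roughly as follows. This is only a sketch: the class name GrassStressTest, the spawn radius and the per-kind grouping are assumptions of mine, and the original scene may well have been built by hand.

```csharp
using UnityEngine;

// Hypothetical test-scene helper (not from the project): spawns the same number
// of instances of each grass prefab so they can be profiled one kind at a time.
public class GrassStressTest : MonoBehaviour
{
    [SerializeField] private GameObject[] grassPrefabs;   // e.g. the 4 grass prefabs
    [SerializeField] private int countPerPrefab = 250;    // 4 x 250 = 1000 instances
    [SerializeField] private float areaRadius = 25f;      // assumed spawn radius in meters

    private void Start()
    {
        foreach (GameObject prefab in grassPrefabs)
        {
            // One parent per prefab kind, so each kind can be toggled in the Hierarchy.
            Transform group = new GameObject(prefab.name + " group").transform;
            group.SetParent(transform, false);

            for (int i = 0; i < countPerPrefab; i++)
            {
                // Random point on the ground plane around this object.
                Vector2 offset = Random.insideUnitCircle * areaRadius;
                Vector3 position = transform.position + new Vector3(offset.x, 0f, offset.y);
                Instantiate(prefab, position, Quaternion.identity, group);
            }
        }
    }
}
```

Disabling all but one group object while the profiler is recording gives a per-prefab comparison.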
Unity manages its own heap with a garbage collector, so you don't need to free memory yourself. But your code can make its job easier or harder. Things like raycasts, changing strings, and creating new lists or arrays generate a lot of garbage.
The ForceSystem is a system that manages all the force bodies by looping through them every update. 2.0 KB was being allocated every frame, caused by a few lists that were being created every time it looped. By simply changing from new List to List.Clear I got it down to 144 B per update.
Before
After
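The ForceSystem change can be sketched in plain C# like this. The real ForceSystem is not shown in this write-up, so the class name ForceSystemSketch, the float "forces" and both method names are hypothetical stand-ins. Only the pattern (a new List per loop versus one reused list with Clear) is taken from the description above.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the project's ForceSystem; all names here are assumptions.
public class ForceSystemSketch
{
    private readonly List<float> bodies = new List<float>();

    // Allocated once and reused by CollectReusing on every update.
    private readonly List<float> activeForces = new List<float>();

    public void Add(float force)
    {
        bodies.Add(force);
    }

    // Before: a new list is allocated on every call and becomes garbage afterwards.
    public List<float> CollectAllocating()
    {
        var result = new List<float>();
        for (int i = 0; i < bodies.Count; i++)
        {
            if (bodies[i] != 0f) result.Add(bodies[i]);
        }
        return result;
    }

    // After: Clear() empties the list but keeps its backing array,
    // so no new memory is needed once the list has grown to its working size.
    public List<float> CollectReusing()
    {
        activeForces.Clear();
        for (int i = 0; i < bodies.Count; i++)
        {
            if (bodies[i] != 0f) activeForces.Add(bodies[i]);
        }
        return activeForces;
    }
}
```

The trade-off is that callers of CollectReusing all share the same list instance, so the result should be consumed before the next update overwrites it.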
The Interactor is a feature that constantly checks for things to interact with. It allocated a lot of memory because it used an OverlapBox, which was allocating 7.0 KB every update. By simply adding a specific layer for the box to scan, I brought it down to 32 B per update.
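The layer change can be sketched as below. The real Interactor code is not shown here, so the class name InteractorSketch, the field names and the "pick the closest collider" logic are assumptions. The relevant part is the layer mask argument passed to Physics.OverlapBox.

```csharp
using UnityEngine;

// Hypothetical stand-in for the project's Interactor; names and logic are assumptions.
public class InteractorSketch : MonoBehaviour
{
    // Set in the Inspector to the layer(s) that hold interactable objects.
    [SerializeField] private LayerMask interactableLayers;
    [SerializeField] private Vector3 halfExtents = new Vector3(1f, 1f, 1f);

    // Closest interactable found in the last update, or null if there is none.
    public Collider Current { get; private set; }

    private void Update()
    {
        // The layer mask limits the query to interactable layers, so the returned
        // array no longer contains every collider inside the box.
        Collider[] hits = Physics.OverlapBox(
            transform.position, halfExtents, transform.rotation, interactableLayers);

        Collider closest = null;
        float closestSqr = float.MaxValue;
        for (int i = 0; i < hits.Length; i++)
        {
            float sqr = (hits[i].transform.position - transform.position).sqrMagnitude;
            if (sqr < closestSqr)
            {
                closestSqr = sqr;
                closest = hits[i];
            }
        }
        Current = closest;
    }
}
```

Physics.OverlapBox still returns a new array on each call, which would explain why a small allocation remains. Physics.OverlapBoxNonAlloc, which fills a preallocated Collider[] buffer, might remove that as well, but that is a possible next step rather than something I measured here.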