When I developed the Unity asset Fast Shadow Receiver, I needed to implement a thread pool myself.
Basically, Unity runs on a single thread. On multi-core processors, skinning and rendering might be processed on multiple threads, but code written in scripts is supposed to run on a single thread.
However, it is possible to run script code on another thread, as long as it doesn’t touch any Unity Objects. You cannot even check whether a Unity Object is null from another thread, because the Object class overrides the comparison operators and the overridden versions access the Unity engine. So, if you want to do multithreading in Unity, you first need to collect data from Unity Objects, on the main thread, into a struct or a class that is not derived from Object. Then you can distribute your tasks across multiple threads.
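The copy-then-process pattern described above can be sketched as follows. This is a minimal illustration that runs without Unity: `PositionSnapshot` and `SnapshotExample` are hypothetical names, and the hard-coded values stand in for data that a real script would read from Unity Objects on the main thread.

```csharp
using System;
using System.Threading;

// Plain value type holding only the data the worker needs.
// Assumption: in Unity this would be filled on the main thread,
// for example from a Transform or a Mesh.
public struct PositionSnapshot
{
    public float X, Y, Z;
}

public static class SnapshotExample
{
    // Pure computation on plain data; it never touches engine objects,
    // so it can run on any thread.
    public static float SumOfSquaredLengths(PositionSnapshot[] snapshots)
    {
        float sum = 0f;
        foreach (PositionSnapshot p in snapshots)
        {
            sum += p.X * p.X + p.Y * p.Y + p.Z * p.Z;
        }
        return sum;
    }

    public static void Main()
    {
        // Step 1 (main thread): copy the data into plain structs.
        var snapshots = new[]
        {
            new PositionSnapshot { X = 1f, Y = 2f, Z = 2f },
            new PositionSnapshot { X = 0f, Y = 3f, Z = 4f },
        };

        // Step 2 (worker thread): process the copies only.
        float result = 0f;
        var worker = new Thread(() => result = SumOfSquaredLengths(snapshots));
        worker.Start();

        // Step 3 (main thread): wait for the worker, then use the result.
        worker.Join();
        Console.WriteLine(result); // prints 34
    }
}
```

The worker only ever sees the copied structs, so it has no reason to call into the engine.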
If you have a lot of tasks to process on multiple threads, System.Threading.ThreadPool might be useful, or so I thought at first. But it was not a good idea. When I tested it on an iPhone 4S, the ThreadPool created 20 threads. I had assumed that ThreadPool would create the same number of threads as there are hardware threads (SystemInfo.processorCount) to reduce context switches and unnecessary synchronization cost. I don’t know whether this behavior is specified by the .NET Framework or is due to the Mono implementation.
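One way to see how far the system pool is allowed to grow is to compare its configured limit with the processor count. The sketch below is only a diagnostic under assumptions: `ThreadPoolInfo` is a hypothetical name, `Environment.ProcessorCount` stands in for Unity's `SystemInfo.processorCount`, and `ThreadPool.GetMaxThreads` reports the upper limit, not the number of threads actually created.

```csharp
using System;
using System.Threading;

public static class ThreadPoolInfo
{
    // Returns the pool's configured upper limit for worker threads.
    public static int GetMaxWorkerThreads()
    {
        int maxWorkers, maxIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        return maxWorkers;
    }

    public static void Main()
    {
        // The two numbers can differ widely depending on the runtime.
        Console.WriteLine("Logical processors: " + Environment.ProcessorCount);
        Console.WriteLine("Max worker threads: " + GetMaxWorkerThreads());
    }
}
```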
Anyway, I checked the performance with Instruments, using the Fast Shadow Receiver Demo. The images below show the result. The left image shows that 36.7% of CPU cycles were taken by the worker threads, and 22.6% of CPU cycles were used for synchronization objects. Only 12.0% went to processing the tasks! Also, in the main thread, it took 8.8% of CPU time to push tasks into a queue (the right image).
So, I decided to implement a thread pool myself. The images below show how the performance improved. The CPU cycles used for synchronization dropped from 22.6% to 3.4% (the left image). This is the case where the thread pool created 2 threads. With one thread in the thread pool, the synchronization cost was only 1.2%. Also, it took only 2.0% of CPU cycles to push tasks into a queue in the main thread (the right image).
However, my thread pool has a restriction: the thread pool itself is not thread-safe. Only a single thread can push tasks into it. The source code of this thread pool is included in Fast Shadow Receiver, and I also attach it below. If you would like to use this thread pool, please test your program thoroughly yourself. It is difficult to test multithreaded programs perfectly, and I cannot accept liability for any damage caused by this thread pool. If you find any bugs, please let me know via e-mail or in the comments.
Usage of my thread pool is very simple. It has only 2 public functions, InitInstance() and QueueUserWorkItem(). The difference from System.Threading.ThreadPool is that you need to initialize a singleton instance by calling InitInstance(). Please note that you must not wait on synchronization objects inside the multi-threaded tasks, as this might cause a deadlock. Of course, it is OK to wait for the multi-threaded tasks from the main thread.
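To illustrate the calling pattern, here is a minimal stand-in pool with the same two entry points. It is not the pool shipped with Fast Shadow Receiver: `SimpleThreadPool` and `PoolDemo` are hypothetical names, the signatures are assumed to mirror System.Threading.ThreadPool, and the queue is protected by an ordinary lock instead of the single-producer design described above, so it does not reproduce the measured synchronization savings.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in, not the author's implementation.
public sealed class SimpleThreadPool
{
    private static SimpleThreadPool s_instance;
    private readonly Queue<Action> m_tasks = new Queue<Action>();
    private readonly object m_lock = new object();

    private SimpleThreadPool(int threadCount)
    {
        for (int i = 0; i < threadCount; ++i)
        {
            var thread = new Thread(WorkerLoop);
            thread.IsBackground = true; // do not keep the process alive
            thread.Start();
        }
    }

    // Call once, from the main thread, before queuing any work.
    public static void InitInstance(int threadCount)
    {
        if (s_instance == null)
        {
            s_instance = new SimpleThreadPool(threadCount);
        }
    }

    public static void QueueUserWorkItem(WaitCallback callback, object state)
    {
        if (s_instance == null)
        {
            throw new InvalidOperationException("Call InitInstance() first.");
        }
        s_instance.Enqueue(() => callback(state));
    }

    private void Enqueue(Action task)
    {
        lock (m_lock)
        {
            m_tasks.Enqueue(task);
            Monitor.Pulse(m_lock); // wake one sleeping worker
        }
    }

    private void WorkerLoop()
    {
        while (true)
        {
            Action task;
            lock (m_lock)
            {
                while (m_tasks.Count == 0)
                {
                    Monitor.Wait(m_lock); // sleep until a task arrives
                }
                task = m_tasks.Dequeue();
            }
            task(); // run the task outside the lock
        }
    }
}

public static class PoolDemo
{
    public static void Main()
    {
        SimpleThreadPool.InitInstance(Environment.ProcessorCount);

        int total = 0;
        int remaining = 4;
        var done = new ManualResetEvent(false);
        for (int i = 1; i <= 4; ++i)
        {
            SimpleThreadPool.QueueUserWorkItem(state =>
            {
                Interlocked.Add(ref total, (int)state);
                if (Interlocked.Decrement(ref remaining) == 0) done.Set();
            }, i);
        }

        // Only the main thread waits; the tasks themselves never block.
        done.WaitOne();
        Console.WriteLine(total); // prints 10
    }
}
```

The tasks signal completion without waiting on anything themselves, which matches the rule above: waiting belongs to the main thread.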
If you already have Fast Shadow Receiver, please have a look at MeshShadowReceiver.cs for reference.
Cool implementation *thumbsup*. I’d like to delve deeper into multi-threading myself, and I’ll probably look to your code for reference.
By the way, regarding the issue where System.Threading.ThreadPool created 20 or so worker threads, I believe we can set the max number of concurrent threads now (as well as the max number of concurrent file I/O operations). This is done by calling ThreadPool.SetMaxThreads(maxThreads, maxFileWriteOperations). If we use this, could the performance issues you initially described go away?
Thank you for the comment! I didn’t know that ThreadPool had a SetMaxThreads method. Yes, it could actually reduce the synchronization cost a lot. However, my thread pool is still faster than System.Threading.ThreadPool.
I no longer have the iOS devices that I used for the earlier test, so I couldn’t repeat the same test. I just checked the FPS on my MacBook Air. The results were:
System ThreadPool: 100FPS
System ThreadPool + SetMaxThreads: 150FPS
My ThreadPool: 160FPS
I think the reason my thread pool is still faster is that the pool itself is not implemented to be thread-safe, which minimizes the synchronization cost.
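For reference, capping the system pool as discussed in the comment might look like the sketch below. `CapSystemThreadPool` is a hypothetical helper name. The second argument of `ThreadPool.SetMaxThreads` limits asynchronous I/O threads, so the sketch reads the current value with `GetMaxThreads` and passes it back unchanged. `SetMaxThreads` returns false when the runtime rejects the request, for example when the requested worker count is lower than the number of processors.

```csharp
using System;
using System.Threading;

public static class CapSystemThreadPool
{
    // Tries to limit worker threads to the logical processor count.
    // Returns false if the runtime refuses the new limit.
    public static bool CapWorkersToProcessorCount()
    {
        int maxWorkers, maxIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);

        // Keep the I/O thread limit as it is; only change the worker limit.
        return ThreadPool.SetMaxThreads(Environment.ProcessorCount, maxIo);
    }

    public static void Main()
    {
        bool applied = CapWorkersToProcessorCount();
        Console.WriteLine("Limit applied: " + applied);
    }
}
```

In a Unity script, SystemInfo.processorCount would be the natural replacement for `Environment.ProcessorCount`.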