When I developed the Unity asset Fast Shadow Receiver, I needed to implement a thread pool myself.
Basically, Unity runs on a single thread. On a multi-core processor, skinning and rendering might be processed in multiple threads, but programs written in scripts are supposed to run in a single thread.
However, it is possible to run a script in another thread, as long as the script doesn’t touch any Unity Objects. You cannot even check whether a Unity Object is null from another thread, because the Object class overrides the comparison operators and those overrides access the Unity engine. So, if you want to do multithreading in Unity, you first need to collect data from Unity Objects, in the main thread, into a struct or a class which is not derived from Object. Then you can distribute your tasks to multiple threads.
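The copy-then-distribute idea can be sketched as follows. This is only an illustration: `PositionSnapshot`, `SnapshotExample`, and the computation are hypothetical names, and the sketch uses a plain `System.Threading.Thread` so that it runs outside Unity. In a real script, the array would be filled on the main thread (for example from `transform.position`) before the worker starts.

```csharp
using System;
using System.Threading;

// Hypothetical plain value type. It has no link to the engine,
// so it can be read from any thread.
public struct PositionSnapshot
{
    public float X, Y, Z;
}

public static class SnapshotExample
{
    // Pure computation on copied data; it touches no engine objects.
    public static float SumOfSquaredLengths(PositionSnapshot[] snapshots)
    {
        float sum = 0f;
        foreach (PositionSnapshot p in snapshots) {
            sum += p.X * p.X + p.Y * p.Y + p.Z * p.Z;
        }
        return sum;
    }

    public static float RunOnWorker(PositionSnapshot[] snapshots)
    {
        float result = 0f;
        // Assumption: the snapshots were already collected on the main thread.
        Thread worker = new Thread(() => { result = SumOfSquaredLengths(snapshots); });
        worker.Start();
        worker.Join(); // After Join, the worker's write to result is visible here.
        return result;
    }
}
```

The worker only ever sees the copied structs, so it never needs to call into the engine.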
If you have a lot of tasks to process in multiple threads, System.Threading.ThreadPool might be useful, or so I thought at first. But it was not a good idea. When I tested it on an iPhone 4S, the ThreadPool created 20 threads. I had assumed that ThreadPool would create the same number of threads as hardware threads (SystemInfo.processorCount) to reduce context switches and unnecessary synchronization cost. I don’t know whether this behavior is specified by the .NET Framework or is due to the Mono implementation.
Anyway, I checked the performance with Instruments using the Fast Shadow Receiver Demo. The images below show the result. The left image shows that 36.7% of CPU cycles were taken by the worker threads, and 22.6% of CPU cycles were spent on synchronization objects. Only 12.0% went to processing the tasks! Also, in the main thread, it took 8.8% of CPU time to push tasks into the queue (the right image).
So, I decided to implement a thread pool myself. The images below show how the performance improved. CPU cycles used for synchronization dropped from 22.6% to 3.4% (the left image). This is the case where the thread pool created 2 threads. With one thread in the thread pool, the synchronization cost was only 1.2%. Also, it took only 2.0% of CPU cycles to push tasks into the queue in the main thread (the right image).
However, my thread pool has a restriction: the thread pool itself is not thread-safe. Only a single thread can push tasks into the thread pool. The source code of this thread pool is included in Fast Shadow Receiver, and I also attach it below. If you would like to use this thread pool, please test your program thoroughly yourself. It is difficult to test multithreaded programs perfectly. I cannot accept liability for any damage caused by this thread pool. If you find any bugs, please let me know via e-mail or comments.
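One way to catch accidental violations of the single-producer restriction during testing is a small guard that remembers which thread created it. The class below is a hypothetical helper, not part of Fast Shadow Receiver; it is a minimal sketch that assumes the guard is created on the same thread that will queue tasks.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of Fast Shadow Receiver.
public sealed class SingleProducerGuard
{
    private readonly int m_ownerThreadId;

    public SingleProducerGuard()
    {
        // Remember the thread that created the guard (the intended producer).
        m_ownerThreadId = Thread.CurrentThread.ManagedThreadId;
    }

    public bool IsOwnerThread
    {
        get { return Thread.CurrentThread.ManagedThreadId == m_ownerThreadId; }
    }

    // Call this right before pushing a task into a single-producer queue.
    public void AssertOwner()
    {
        if (!IsOwnerThread) {
            throw new InvalidOperationException(
                "Tasks must be queued from the thread that created the guard.");
        }
    }
}
```

Calling `AssertOwner()` before each `QueueUserWorkItem()` call in a debug build would turn a silent race into an immediate exception.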
Usage of my thread pool is very simple. It has only 2 public functions, InitInstance() and QueueUserWorkItem(). The difference from System.Threading.ThreadPool is that you need to initialize a singleton instance by calling InitInstance(). Please note that you must not wait on synchronization objects inside tasks running on the pool. It might cause a deadlock. Of course, it is OK to wait for those tasks in the main thread.
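The pattern of queuing tasks and waiting for them only in the queuing thread can be sketched like this. To keep the sketch runnable on its own it uses System.Threading.ThreadPool; `WaitInMainThreadExample` and `SquareAll` are hypothetical names. With the pool from this article, the assumption is that you would call `Nyahoon.ThreadPool.InitInstance()` once and then pass the same `(WaitCallback, object)` arguments to `Nyahoon.ThreadPool.QueueUserWorkItem()`.

```csharp
using System;
using System.Threading;

public static class WaitInMainThreadExample
{
    public static int[] SquareAll(int[] input)
    {
        int[] output = new int[input.Length];
        if (input.Length == 0) {
            return output;
        }
        int remaining = input.Length;
        ManualResetEvent allDone = new ManualResetEvent(false);
        for (int i = 0; i < input.Length; ++i) {
            // The index is passed as the state object of the WaitCallback.
            System.Threading.ThreadPool.QueueUserWorkItem(state => {
                int index = (int)state;
                output[index] = input[index] * input[index];
                // The task never blocks; the last one to finish only signals.
                if (Interlocked.Decrement(ref remaining) == 0) {
                    allDone.Set();
                }
            }, i);
        }
        // Only the queuing thread waits.
        allDone.WaitOne();
        return output;
    }
}
```

The tasks themselves only decrement a counter and signal, so none of them can block another task that is still waiting in the queue.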
If you already have Fast Shadow Receiver, please have a look at MeshShadowReceiver.cs for reference.
    //
    // ThreadPool.cs
    //
    // Copyright 2014 NYAHOON GAMES PTE. LTD. All Rights Reserved.
    //
    using UnityEngine;
    using System.Threading;

    namespace Nyahoon
    {
        /// <summary>
        /// Thread pool.
        /// This class itself is not thread safe. Only a single thread can call QueueUserWorkItem safely.
        /// </summary>
        public class ThreadPool
        {
            private static ThreadPool s_instance = null;
            public static void InitInstance()
            {
                if (s_instance == null) {
                    InitInstance(128, 0);
                }
            }
            public static bool InitInstance(int queueSize, int threadNum)
            {
                if (s_instance != null) {
                    Debug.LogWarning("TreadPool instance is already created.");
                    return false;
                }
                s_instance = new ThreadPool(queueSize, threadNum);
                return true;
            }
            public static ThreadPool Instance
            {
                get { return s_instance; }
            }
            public static void QueueUserWorkItem(WaitCallback callback, object state)
            {
                s_instance.EnqueueTask(callback, state);
            }

            private Thread[] m_threadPool;
            struct TaskInfo
            {
                public WaitCallback callback;
                public object args;
            }
            private TaskInfo[] m_taskQueue;
            private int m_nPutPointer;
            private int m_nGetPointer;
            private int m_numTasks;
            private AutoResetEvent m_putNotification;
            private AutoResetEvent m_getNotification;
    #if !UNITY_WEBPLAYER
            // according to this page (https://docs.unity3d.com/401/Documentation/ScriptReference/MonoCompatibility.html),
            // Semaphore is not available on web player.
            private Semaphore m_semaphore;
    #endif
            private ThreadPool(int queueSize, int threadNum)
            {
    #if UNITY_WEBPLAYER
                threadNum = 1;
    #else
                if (threadNum == 0) {
                    threadNum = SystemInfo.processorCount;
                }
    #endif
                m_threadPool = new Thread[threadNum];
                m_taskQueue = new TaskInfo[queueSize];
                m_nPutPointer = 0;
                m_nGetPointer = 0;
                m_numTasks = 0;
                m_putNotification = new AutoResetEvent(false);
                m_getNotification = new AutoResetEvent(false);
    #if !UNITY_WEBPLAYER
                if (1 < threadNum) {
                    m_semaphore = new Semaphore(0, queueSize);
                    for (int i = 0; i < threadNum; ++i) {
                        m_threadPool[i] = new Thread(ThreadFunc);
                        m_threadPool[i].Start();
                    }
                }
                else
    #endif
                {
                    m_threadPool[0] = new Thread(SingleThreadFunc);
                    m_threadPool[0].Start();
                }
            }
            private void EnqueueTask(WaitCallback callback, object state)
            {
                while (m_numTasks == m_taskQueue.Length) {
                    m_getNotification.WaitOne();
                }
                m_taskQueue[m_nPutPointer].callback = callback;
                m_taskQueue[m_nPutPointer].args = state;
                ++m_nPutPointer;
                if (m_nPutPointer == m_taskQueue.Length) {
                    m_nPutPointer = 0;
                }
    #if !UNITY_WEBPLAYER
                if (m_threadPool.Length == 1) {
    #endif
                    if (Interlocked.Increment(ref m_numTasks) == 1) {
                        m_putNotification.Set();
                    }
    #if !UNITY_WEBPLAYER
                }
                else {
                    Interlocked.Increment(ref m_numTasks);
                    m_semaphore.Release();
                }
    #endif
            }
    #if !UNITY_WEBPLAYER
            private void ThreadFunc()
            {
                for (;;) {
                    m_semaphore.WaitOne();
                    int nCurrentPointer, nNextPointer;
                    do {
                        nCurrentPointer = m_nGetPointer;
                        nNextPointer = nCurrentPointer + 1;
                        if (nNextPointer == m_taskQueue.Length) {
                            nNextPointer = 0;
                        }
                    } while (Interlocked.CompareExchange(ref m_nGetPointer, nNextPointer, nCurrentPointer) != nCurrentPointer);
                    TaskInfo task = m_taskQueue[nCurrentPointer];
                    if (Interlocked.Decrement(ref m_numTasks) == m_taskQueue.Length - 1) {
                        m_getNotification.Set();
                    }
                    task.callback(task.args);
                }
            }
    #endif
            private void SingleThreadFunc()
            {
                for (;;) {
                    while (m_numTasks == 0) {
                        m_putNotification.WaitOne();
                    }
                    TaskInfo task = m_taskQueue[m_nGetPointer++];
                    if (m_nGetPointer == m_taskQueue.Length) {
                        m_nGetPointer = 0;
                    }
                    if (Interlocked.Decrement(ref m_numTasks) == m_taskQueue.Length - 1) {
                        m_getNotification.Set();
                    }
                    task.callback(task.args);
                }
            }
        }
    }
Cool implementation *thumbsup*. I’d like to delve deeper into multi-threading myself, and I’ll probably look to your code for reference 😀
By the way, regarding the issue where System.Threading.ThreadPool created 20 or so worker threads, I believe we can set the max number of concurrent threads now (as well as max number of concurrent file I/O operations). This is done by ThreadPool.SetMaxThreads(maxThreads, maxFileWriteOperations). If we use this, could the performance issues you initially described go away?
Thank you for the comment! I didn’t know that ThreadPool had a SetMaxThreads method. Yes, it could actually reduce the synchronization cost a lot. However, my thread pool is still faster than System.Threading.ThreadPool.
Currently I don’t have the iOS devices that I used for the earlier test, so I couldn’t repeat the same test. I just checked the FPS on my MacBook Air. The result was:
System ThreadPool: 100FPS
System ThreadPool + SetMaxThreads: 150FPS
My ThreadPool: 160FPS
I think the reason my thread pool is still faster is that the thread pool itself is not implemented to be thread-safe, which minimizes the synchronization cost.
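For reference, the SetMaxThreads call discussed in the comments above might be used roughly as follows. In .NET the two parameters are the maximum number of worker threads and the maximum number of asynchronous I/O (completion port) threads, and the call returns false if the runtime rejects the values, for example when a limit is smaller than the number of processors. `MaxThreadsExample` is a hypothetical name, and the sketch uses `Environment.ProcessorCount` instead of `SystemInfo.processorCount` so that it runs outside Unity; how Mono on a particular device treats the limit is not something this sketch verifies.

```csharp
using System;
using System.Threading;

public static class MaxThreadsExample
{
    // Returns true if the runtime accepted the new limit.
    public static bool LimitWorkersToProcessorCount()
    {
        int workerThreads, completionPortThreads;
        ThreadPool.GetMaxThreads(out workerThreads, out completionPortThreads);
        // Keep the I/O limit as it is and cap only the worker threads.
        return ThreadPool.SetMaxThreads(Environment.ProcessorCount, completionPortThreads);
    }
}
```

Checking the return value matters here, because a rejected call leaves the previous limits in place without raising an exception.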