Wednesday, June 8, 2011

[pymex] Threading

Threading didn't really work in the first version of pymex. You could create a Python thread, but it wouldn't actually get any execution time. I recently found out why, and fixed it!

Python has limited support for multi-threading. The Python interpreter is not fully thread-safe, so a global interpreter lock (GIL) acts as a mutex on the interpreter. Only the thread that holds the GIL can interact with Python objects or use the Python C API. The downside of the GIL is that Python threads take turns running instead of running concurrently. Thus, threads executing Python code can't take advantage of parallel hardware.

By default, Python runs in one main thread and the GIL is not initialized. Threads and the GIL are automatically initialized when a Python thread is created. Python then shares the GIL between threads so each thread can execute: the interpreter periodically switches between threads, and threads also give up the GIL in blocking calls like I/O, sleep, etc. This doesn't happen automatically inside C extensions; they must explicitly release the GIL using the C API.
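Here's a small sketch of that yielding behaviour in plain Python (nothing pymex-specific). The function names and the timings are made up for illustration. Because time.sleep gives up the GIL while it waits, several sleeping threads should overlap instead of queueing up one after another:

```python
import threading
import time


def sleeper(seconds, done, index):
    # time.sleep releases the GIL while waiting, so other threads can run.
    time.sleep(seconds)
    done[index] = True


def run_sleepers(count=4, seconds=0.2):
    """Start `count` sleeping threads, join them, and time the whole batch."""
    done = [False] * count
    threads = [
        threading.Thread(target=sleeper, args=(seconds, done, i))
        for i in range(count)
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start
    return done, elapsed


if __name__ == "__main__":
    done, elapsed = run_sleepers()
    print("all finished:", all(done))
    print("elapsed seconds:", round(elapsed, 2))
```

If the sleeps ran back to back, four of them would take at least 0.8 seconds; with the GIL released during each sleep, the batch should finish in roughly the time of a single sleep.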

The old version of pymex never released the GIL itself. The interpreter still released it internally, but only while the interpreter was running (during a call to PyRun_SimpleString). This meant that threads would only get CPU time during calls to pymex, and not during normal Matlab execution. Python scripts which joined their threads before ending didn't have a problem, but daemonized threads didn't work at all.

The fix was actually very simple. All I had to do was release the GIL using the C API before returning from the mex function, and reacquire it on reentry. This was done using PyEval_SaveThread (release) and PyEval_RestoreThread (reacquire). I also decided to release the GIL inside the matlab module during various (slow) mex calls, like mexPrintf, so that other Python threads can run in the meantime. Hopefully this improves Python's threading performance.

But there was one drawback: added complexity. Programmers now need to consider thread-safety. Unfortunately, Matlab's mex interface isn't thread-safe. By extension, the matlab module isn't thread-safe. Don't use the matlab module inside a thread; it is liable to segfault! There is one exception to this rule: it is safe to use the matlab module in a thread as long as the thread ends before the script ends (i.e. the thread is joined). That way the underlying mex calls only occur during the call to pymex, so memory access violations are avoided.
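As a rough sketch of the safe pattern, here is a script that joins every thread before it ends. The worker body is a stand-in I made up; in a real pymex script the calls into the matlab module would go where the comment is, and I'm deliberately not showing any matlab module functions here:

```python
import threading


def worker(results, index):
    # Hypothetical placeholder: calls into the matlab module would go here.
    # They are only safe because main() joins this thread before returning.
    results[index] = index * index


def main():
    results = [None] * 3
    threads = [
        threading.Thread(target=worker, args=(results, i))
        for i in range(3)
    ]
    for t in threads:
        t.start()
    # Join every thread so none of them outlives the script.
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(main())
```

The important part is the join loop at the end: no thread is left running once the script returns.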

Check out the pymex website for more details!
