Wednesday, June 8, 2011

[pymex] Threading

Threading didn't really work in the first version of pymex. You could create a Python thread, but it wouldn't actually get any execution time. I recently found out why, and fixed it!

Python has limited support for multi-threading. The Python interpreter is not fully thread-safe, so a global interpreter lock (GIL) is used as a sort of mutex on the interpreter. Only the thread that has acquired the GIL can interact with Python objects or use the Python C API. The negative side effect of the GIL is that Python threads run more or less one at a time, taking turns instead of executing concurrently. Thus, threads running Python code can't take advantage of parallel hardware.

By default, Python runs in one main thread and the GIL is not initialized. Threads and the GIL are automatically initialized when a Python thread is created. Python automatically shares the GIL between threads so each thread can execute. This is accomplished by threads yielding on certain functions like I/O, sleep, etc., and by the interpreter periodically switching between threads while it runs Python code. This doesn't happen automatically for C extensions; they must explicitly release the GIL using the C API.
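To make the yielding behaviour concrete, here is a minimal sketch in plain Python (nothing pymex-specific). The function name run_sleepers and its parameters are made up for this illustration. Each thread blocks in time.sleep, which gives up the GIL, so the threads should wait side by side rather than one after another.

```python
import threading
import time

def run_sleepers(n_threads=4, nap=0.2):
    """Start n_threads that each sleep for `nap` seconds; return total wall time."""
    def sleeper():
        # Blocking call: the GIL is released while this thread sleeps,
        # so the other threads are free to run (and sleep) in the meantime.
        time.sleep(nap)

    threads = [threading.Thread(target=sleeper) for _ in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

if __name__ == "__main__":
    elapsed = run_sleepers()
    print("4 threads, 0.2 s each, finished in %.2f s" % elapsed)
```

If the threads really ran strictly one after another, four 0.2 s naps would take about 0.8 s. Because sleeping releases the GIL, the total should come out close to a single nap.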

The old version of pymex never explicitly released the GIL. The interpreter would still release it on its own, but only while it was actually running (during a call to PyRun_SimpleString). This meant that threads would only get CPU time during calls to pymex, and not during normal Matlab execution. Python scripts that joined their threads before ending didn't have a problem, but daemonized threads didn't work at all.

The fix was actually very simple. All I had to do was release the GIL using the C API before returning from the mex function, and reacquire it on reentry. This was done using PyEval_SaveThread (release) and PyEval_RestoreThread (reacquire). I also decided to release the GIL inside the matlab module during various (slow) mex calls, like mexPrintf. I did this in the hope of improving Python's threading performance.
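The effect of the fix can be imitated in pure Python. This is only a toy model, not the pymex code: an ordinary threading.Lock (toy_gil) stands in for the real GIL, time.sleep stands in for Matlab running its own code, and count_ticks is a name invented for this sketch. The real fix lives in C and uses PyEval_SaveThread and PyEval_RestoreThread.

```python
import threading
import time

toy_gil = threading.Lock()  # stand-in for the GIL, NOT the real interpreter lock

def count_ticks(release_on_return, host_time=0.2):
    """Count how often a background thread gets to run while the 'host' is busy."""
    ticks = []
    stop = threading.Event()

    def background():
        while not stop.is_set():
            # The thread can only make progress while it holds the lock.
            if toy_gil.acquire(timeout=0.01):
                try:
                    if not stop.is_set():
                        ticks.append(1)
                finally:
                    toy_gil.release()
            time.sleep(0.005)

    toy_gil.acquire()            # inside the "mex function": we hold the lock
    t = threading.Thread(target=background)
    t.start()
    if release_on_return:
        toy_gil.release()        # analogue of PyEval_SaveThread before returning
    time.sleep(host_time)        # the host ("Matlab") runs its own code here
    if release_on_return:
        toy_gil.acquire()        # analogue of PyEval_RestoreThread on reentry
    stop.set()
    toy_gil.release()
    t.join()
    return len(ticks)

if __name__ == "__main__":
    print("old behaviour, ticks:", count_ticks(False))
    print("new behaviour, ticks:", count_ticks(True))
```

With release_on_return=False the background thread is starved for the whole host period, which mirrors the old behaviour. With release_on_return=True it gets to run while the host is busy.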

But there was one drawback: added complexity. Programmers now need to consider thread-safety. Unfortunately, Matlab's mex interface isn't thread-safe. By extension, the matlab module isn't thread-safe. Don't use the matlab module inside a thread; it is liable to segfault! There is one exception to this rule. It is safe to use the matlab module in a thread as long as the thread ends before the script ends (i.e. the thread is joined). That way the underlying mex calls only occur during the call to pymex, so memory access violations are avoided.
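The safe pattern is just the usual start-then-join. Here is a rough sketch in plain Python; run_script and worker are placeholder names, and the actual matlab module calls are left out since they depend on the pymex API.

```python
import threading

def run_script():
    """Body of a script as it might be passed to pymex (hypothetical example)."""
    results = []

    def worker():
        # Calls into the matlab module would go here (not shown).
        results.append(sum(range(1000)))

    t = threading.Thread(target=worker)
    t.start()
    # Join before the script ends, so the thread can never outlive
    # the call into pymex.
    t.join()
    return results

if __name__ == "__main__":
    print(run_script())
```

Leaving out the join (or marking the thread as a daemon) is what lets the thread keep running after control has returned to Matlab, which is the unsafe case described above.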

Check out the pymex website for more details!

Friday, June 3, 2011

I graduated!

I graduated! After four years of undergrad in EE and two years doing my MS in CompE, I'm finally done with school and ready for the real world!

School has been great, and I was very fortunate to work and study with great people. My advisor John Spletzer is a great guy, a genius, and has taught me more than I can remember. My coworkers, Jason Derenick, James Evans, Amy Forando, Chao Gao, Sean Kelly, Ben Mak, Tom Miller, Mike Sands, Constantin Savtchenko, Justin Sonntag, Dave Stolfo, and Nick Welton, made VADER Lab a great place to work. They are all intelligent, stimulating, and fun people. I could not thank John and everyone else enough for their support, insight, and hard work.

It's always sad to leave, but I'm also happy to be moving forward. All I need now is a job... on to the job search!