From Bugzilla Helper:
User-Agent: Mozilla/5.0 Galeon/1.2.7 (X11; Linux i686; U;) Gecko/20021220 Debian/1.2.7-5

Description of problem:
When threads are used in Python, it _always_ deadlocks within a couple of seconds. The script included below runs with LD_ASSUME_KERNEL=2.4.1 and deadlocks otherwise.

While debugging the problem I've discovered that Python uses a combination of mutexes and condition variables for locking, i.e. it explicitly signals other threads to wake up when a given lock is released. (Python/thread_pthread.h contains the pthread-specific thread implementation.)

The threading within Python works so that each thread releases the interpreter lock every 100 byte-code instructions, and this seems to cause the deadlock.

    from threading import *

    class TestThread(Thread):
        def __init__(self, id):
            Thread.__init__(self)
            self.id = id

        def run(self):
            for i in range(1,10000):
                pass
            print 'ready: %d' % self.id

    for i in range(1,10):
        t = TestThread(i)
        t.start()

Version-Release number of selected component (if applicable):
kernel-2.4.20-8

How reproducible:
Always

Steps to Reproduce:
1. run the program above with NPTL enabled
2. it should deadlock

Actual Results: the script deadlocked

Expected Results: the script should have finished.

Additional info:
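The mutex-plus-condition-variable scheme described in the report can be modelled in a few lines. This is only a minimal sketch written for current Python 3 (unlike the Python 2 reproducer); it is not the code in Python/thread_pthread.h, and the names CondLock and run_workers are hypothetical, chosen for illustration.

```python
import threading


class CondLock:
    """Hypothetical model of a lock built from a mutex and a condition variable."""

    def __init__(self):
        # The condition variable owns the mutex that protects the flag below.
        self._cond = threading.Condition(threading.Lock())
        self._locked = False

    def acquire(self):
        with self._cond:
            # Sleep until the current holder signals that the lock is free.
            while self._locked:
                self._cond.wait()
            self._locked = True

    def release(self):
        with self._cond:
            self._locked = False
            # Explicitly wake one waiter. If this wakeup were lost,
            # the waiters would sleep forever, which looks like a deadlock.
            self._cond.notify()


def run_workers(n_threads=9, iterations=1000):
    """Hammer the lock from several threads and return the shared counter."""
    lock = CondLock()
    counter = [0]

    def worker():
        for _ in range(iterations):
            lock.acquire()
            counter[0] += 1
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

With a correct condition variable implementation, run_workers() always returns n_threads * iterations. A lock of this shape depends entirely on the wakeup in release() being delivered, which is why a condvar bug in the thread library can hang every waiting thread.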
I cannot reproduce any hangs in several hundred runs on an SMP machine. But then, my system is fully updated. The originally shipped glibc had, I think, some issues with condvars, which Python uses. Update to the latest glibc version and the latest kernel. If you still see problems, report exactly what kind of hardware you're using.
the update to the latest libc+kernel solved the problem indeed. thanks.
The current code works.
We have kernel 2.4.20-18.9 and glibc-2.3.2-27.9 and are still getting fairly consistent thread deadlocks in a massively threaded python application. Interesting thing is that often an strace or new thread will free up the other threads. What is the "CURRENTRELEASE" that is supposed to solve this issue?

FWIW, here's what strace tells me about all the threads of a deadlocked process:

[root@demo9 root]# strace -p 3356 -p 3357 -p 12929 -p 3596 -p 3364 -p 3595
[pid 3356] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid 12929] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid 3596] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid 3364] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid 3595] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid 3357] futex(0xb015a04, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 3356] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid 12929] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid 3596] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid 3364] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid 3595] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid 3356] select(5, [4], [], [], {1, 160000} <unfinished ...>
[pid 3596] futex(0x965f88c, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 3364] select(0, NULL, NULL, NULL, {0, 820000} <unfinished ...>
[pid 3595] futex(0xbc2fc2c, FUTEX_WAIT, 0, NULL <unfinished ...>

After running this strace, thread 12929 is scheduled in and the deadlock releases.
Also, as previously noted, LD_ASSUME_KERNEL=2.4.1 makes the deadlocks go away.
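For anyone who wants to compare the two cases repeatedly, the LD_ASSUME_KERNEL workaround can be scripted. The following is a rough sketch in current Python 3, not something taken from this report: the helper names assumed_kernel_env and finishes_within are hypothetical, and the variable only matters on systems of this era that ship both thread libraries.

```python
import os
import subprocess
import sys


def assumed_kernel_env(version, base_env=None):
    """Return a copy of the environment with LD_ASSUME_KERNEL set or removed."""
    env = dict(os.environ if base_env is None else base_env)
    if version is None:
        # Leave the dynamic loader at its default (NPTL on the affected systems).
        env.pop("LD_ASSUME_KERNEL", None)
    else:
        env["LD_ASSUME_KERNEL"] = version
    return env


def finishes_within(cmd, timeout, assume_kernel=None):
    """Run cmd; return True if it exits before timeout seconds, else False."""
    try:
        subprocess.run(cmd, env=assumed_kernel_env(assume_kernel),
                       timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child when the timeout expires;
        # treat that as a suspected deadlock.
        return False
    return True
```

Assuming the reproducer is saved as threadtest.py (a made-up file name), finishes_within(["python", "threadtest.py"], 30, assume_kernel="2.4.1") would exercise the workaround, and the same call without assume_kernel would exercise the default thread library. A timeout only suggests a hang; it does not prove one.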
hmm... now an rpm just hung in...

[root@demo9 root]# strace -p 8367
futex(0x40586f20, FUTEX_WAIT, 0, NULL <unfinished ...>
I don't see any problems. If you really have some and they are not hardware related, you might want to try Severn, the just-released beta for the RHLP.