Description of problem:
An unkillable zombie thread results when a thread in epoll has signal 1 (SIGHUP) delivered, followed by signal 9 (SIGKILL).

Version-Release number of selected component (if applicable):
RHEL 4.7, kernel 2.6.9-78.0.8.ELsmp

How reproducible:
We have a multi-threaded application in which one thread performs epoll work. The process (thread group) is first delivered signal 1, which it doesn't handle correctly, and then signal 9. One thread enters the signal-delivery path and begins exiting. The other remains in epoll_wait(), blocked on ep->sem inside ep_poll(), and becomes an unkillable zombie.

Expected results:
Signal 9 should terminate both threads.

Additional info:
This is the information from examining a crash dump of the kernel:

crash> ps -g 4710
PID: 4710  TASK: f4cde730  CPU: 1  COMMAND: "ourdaemon"
  PID: 4712  TASK: f4d017f0  CPU: 1  COMMAND: "ourdaemon"

crash> bt 4710
PID: 4710  TASK: f4cde730  CPU: 1  COMMAND: "ourdaemon"
 #0 [f4cc9e44] schedule at c02de8bd
 #1 [f4cc9ea8] do_exit at c0124c8a
 #2 [f4cc9ec0] do_group_exit at c0124d7f
 #3 [f4cc9ed8] get_signal_to_deliver at c012d1d5
 #4 [f4cc9f00] do_signal at c0105bd4
 #5 [f4cc9fb8] do_notify_resume at c0105c80
 #6 [f4cc9fc0] system_call at c02e0ae1
    EAX: fffffffc  EBX: bffff30c  ECX: 00000000  EDX: 000000e3  DS:  007b
    ESI: bffff058  ES:  007b      EDI: 000000e3
    SS:  007b      ESP: bffff050  EBP: bffff0dc
    CS:  0073      EIP: 002767a2  ERR: 000000f0  EFLAGS: 00200286

crash> bt 4712
PID: 4712  TASK: f4d017f0  CPU: 1  COMMAND: "ourdaemon"
 #0 [f4cceeb4] schedule at c02de8bd
 #1 [f4ccef18] rwsem_down_write_failed at c02df536
 #2 [f4ccef40] .text.lock.eventpoll (via ep_events_transfer) at c017f799
 #3 [f4ccef68] ep_poll at c017f684
 #4 [f4ccefa8] sys_epoll_wait at c017ea1a
 #5 [f4ccefc0] system_call at c02e0a7c
    EAX: 00000100  EBX: 00000004  ECX: 00134fa8  EDX: 00000040  DS:  007b
    ESI: 00000852  ES:  007b      EDI: 00000002
    SS:  007b      ESP: 00134ef0  EBP: 00135388
    CS:  0073      EIP: 002767a2  ERR: 00000100  EFLAGS: 00200217

crash> sig -g 4710
PID: 4710  TASK: f4cde730  CPU: 1  COMMAND: "ourdaemon"
SIGNAL_STRUCT: f7117540  COUNT: 2
...
SHARED_PENDING
    SIGNAL: 0000000000000101
  SIGQUEUE:  SIG  SIGINFO
               9  f3a31188
               1  f32bb750

  PID: 4710  TASK: f4cde730  CPU: 1  COMMAND: "ourdaemon"
  SIGPENDING: yes
     BLOCKED: 00000000000a4a23
  PRIVATE_PENDING
      SIGNAL: 0000000000000000
    SIGQUEUE: (empty)

  PID: 4712  TASK: f4d017f0  CPU: 1  COMMAND: "ourdaemon"
  SIGPENDING: yes
     BLOCKED: 00000000000a5a23
  PRIVATE_PENDING
      SIGNAL: 0000000000000100
    SIGQUEUE: (empty)

crash> sig -s 0000000000000101
SIGHUP SIGKILL
crash> sig -s 0000000000000100
SIGKILL
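The `sig -s` commands above translate a pending-signal bitmask into names. In the kernel's sigset layout, signal n occupies bit n-1, so 0x101 has bits 0 and 8 set (signals 1 and 9). The following is a minimal Python sketch of that decoding. It assumes standard Linux signal numbering as exposed by Python's `signal` module, and the helper name `decode_sigset` is hypothetical, not part of crash.

```python
import signal


def decode_sigset(mask: int) -> list[str]:
    """Return signal names for every bit set in a kernel sigset mask.

    Assumes the kernel convention that signal n is stored in bit n-1.
    """
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            signum = bit + 1
            try:
                names.append(signal.Signals(signum).name)
            except ValueError:
                # Numbers Python has no name for (e.g. some real-time signals).
                names.append(f"SIG{signum}")
        bit += 1
    return names


if __name__ == "__main__":
    # SHARED_PENDING of the thread group, and PRIVATE_PENDING of PID 4712.
    print(decode_sigset(0x0000000000000101))  # ['SIGHUP', 'SIGKILL']
    print(decode_sigset(0x0000000000000100))  # ['SIGKILL']
```

The decoded sets show that SIGKILL is pending for PID 4712 as well as for the group. The thread nevertheless stays asleep in rwsem_down_write_failed(), which waits uninterruptibly.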
We narrowed this down to what appears to be a bug in the i386 rw_semaphore code. In a separate crash dump, we found three threads all sleeping in schedule() called from rwsem_down_failed() on ep->sem. That rw_semaphore had a count of 0xfffd0000 and a wait_list holding three rwsem_waiter structures, one for each sleeping thread. On i386, 0xfffd0000 is -0x00030000: three times RWSEM_WAITING_BIAS (-0x00010000) with zero active holders (count & RWSEM_ACTIVE_MASK == 0). So whichever thread last decremented count did not notice that the RWSEM_ACTIVE_MASK bits were zero and did not wake a waiter. We believe this might be the problem addressed by http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6-stable.git;a=commitdiff;h=b862f3b099f3ea672c7438c0b282ce8201d39dfc;hp=e2a3d40258fe20d205f8ed592e1e2c0d5529c2e1 We are currently testing a patch that changes the type of ep->sem to a plain semaphore, since the epoll code no longer takes a read lock on it.
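The argument about 0xfffd0000 depends on how the i386 rw_semaphore packs its state into one 32-bit count. The low 16 bits (RWSEM_ACTIVE_MASK) count active holders. Each sleeping waiter contributes one RWSEM_WAITING_BIAS, and a write holder contributes one as well. The Python sketch below models that encoding under the assumption that the bias values match the 2.6-era include/asm-i386/rwsem.h. The helper names `decode_count` and `waiters_stranded` are hypothetical.

```python
# Bias constants as assumed from 2.6-era include/asm-i386/rwsem.h.
RWSEM_ACTIVE_BIAS = 0x00000001
RWSEM_ACTIVE_MASK = 0x0000FFFF
RWSEM_WAITING_BIAS = -0x00010000
RWSEM_ACTIVE_WRITE_BIAS = RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS


def to_signed32(value: int) -> int:
    """Interpret a raw 32-bit count (as printed by crash) as a signed long."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def decode_count(raw: int) -> tuple[int, int]:
    """Split a count into (active holders, waiting-bias units).

    A write holder shows up as one active unit plus one waiting-bias unit.
    """
    count = to_signed32(raw)
    active = count & RWSEM_ACTIVE_MASK
    waiting_units = (count - active) // RWSEM_WAITING_BIAS
    return active, waiting_units


def waiters_stranded(raw: int) -> bool:
    """True if waiters are queued while nobody holds the lock.

    In a quiescent dump this means a wakeup was missed: whoever dropped
    the active count to zero should have woken the first waiter.
    """
    active, waiting_units = decode_count(raw)
    return active == 0 and waiting_units > 0


if __name__ == "__main__":
    # Value from the second crash dump: three sleepers on ep->sem.
    print(decode_count(0xFFFD0000))      # (0, 3)
    print(waiters_stranded(0xFFFD0000))  # True
```

This model only reads a snapshot. The slow path briefly passes through similar values while a waiter is being queued. Because all three threads were asleep in the dump, though, the zero-active state was stable rather than transient.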
Created attachment 338372: converts ep->sem from rw_semaphore to a normal semaphore
Created attachment 338373: backports Linus' assembly changes for lock-related code to 2.6.9
The patch converting ep->sem from an rw_semaphore to a normal semaphore has eliminated the issue for us. Additionally, we've backported the assembly changes to 2.6.9.
Could someone on the Red Hat team comment on this issue and our findings?
Upstream converted ep->sem to ep->mtx, using a mutex instead, in commit d47de16c7221968d3eab899d7540efa5ba77af5a. Would you mind trying that?
The problem occurs rarely, and we haven't been able to reproduce it reliably outside of production. After applying the two attached patches, it hasn't recurred. The upstream epoll code contains substantial other changes that may or may not be required to convert safely to a mutex, so I don't think we'd be willing to backport the current upstream code and roll it out to production. Note that the fixes in attachment 338373 would also apply to any other users of rw_semaphore.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Created attachment 514074: patch tested by customer
The customer tested the attached patch and reports that the problem is resolved. They would like this fix included in RHEL 4.9.
My customer (a large Wall Street firm) is also experiencing these symptoms on production x86_64 systems. We are interested in getting this patch tested and, hopefully, released, even this late in RHEL 4's production lifecycle. Their RHEL 4 systems will remain in service for quite a while.