484815 – Unkillable zombies result on epoll rw_semaphore contention

Summary: Unkillable zombies result on epoll rw_semaphore contention

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.7
Hardware:	i686
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Danny Feng
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-02-10 01:04 UTC by Alex Feinberg
Modified:	2018-11-14 15:50 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-06-14 20:58:47 UTC
Target Upstream Version:
Embargoed:

Attachments
Converts ep->sem from rw_semaphore to a normal semaphore (1.64 KB, patch) 2009-04-06 17:56 UTC, Alex Feinberg
Backports Linus' assembly changes for lock related code to 2.6.9 (10.01 KB, patch) 2009-04-06 17:56 UTC, Alex Feinberg
patch tested by customer (1.80 KB, patch) 2011-07-20 19:33 UTC, Guy Streeter

Description Alex Feinberg 2009-02-10 01:04:35 UTC

Description of problem:
An unkillable zombie thread results when a thread in epoll has signal 1 (SIGHUP) delivered, followed by signal 9 (SIGKILL).

Version-Release number of selected component (if applicable):
RHEL 4.7
Kernel 2.6.9-78.0.8.ELsmp

How reproducible:

We have a multi-threaded application in which one thread performs
epoll work. The process (thread group) is delivered signal 1, which it does not
handle correctly. Subsequently signal 9 is delivered. One of the threads goes
into signal handling and exits; the other remains in epoll and becomes an
unkillable zombie.

Expected results:

Signal 9 is supposed to terminate both threads.

Additional info:

The following information comes from examining a crash dump of the kernel:

crash> ps -g 4710
PID: 4710   TASK: f4cde730  CPU: 1   COMMAND: "ourdaemon"
  PID: 4712   TASK: f4d017f0  CPU: 1   COMMAND: "ourdaemon"
crash> bt 4710
PID: 4710   TASK: f4cde730  CPU: 1   COMMAND: "ourdaemon"
 #0 [f4cc9e44] schedule at c02de8bd
 #1 [f4cc9ea8] do_exit at c0124c8a
 #2 [f4cc9ec0] do_group_exit at c0124d7f
 #3 [f4cc9ed8] get_signal_to_deliver at c012d1d5
 #4 [f4cc9f00] do_signal at c0105bd4
 #5 [f4cc9fb8] do_notify_resume at c0105c80
 #6 [f4cc9fc0] system_call at c02e0ae1
    EAX: fffffffc  EBX: bffff30c  ECX: 00000000  EDX: 000000e3 
    DS:  007b      ESI: bffff058  ES:  007b      EDI: 000000e3
    SS:  007b      ESP: bffff050  EBP: bffff0dc
    CS:  0073      EIP: 002767a2  ERR: 000000f0  EFLAGS: 00200286 

crash> bt 4712
PID: 4712   TASK: f4d017f0  CPU: 1   COMMAND: "ourdaemon"
 #0 [f4cceeb4] schedule at c02de8bd
 #1 [f4ccef18] rwsem_down_write_failed at c02df536
 #2 [f4ccef40] .text.lock.eventpoll (via ep_events_transfer) at c017f799
 #3 [f4ccef68] ep_poll at c017f684
 #4 [f4ccefa8] sys_epoll_wait at c017ea1a
 #5 [f4ccefc0] system_call at c02e0a7c
    EAX: 00000100  EBX: 00000004  ECX: 00134fa8  EDX: 00000040 
    DS:  007b      ESI: 00000852  ES:  007b      EDI: 00000002
    SS:  007b      ESP: 00134ef0  EBP: 00135388
    CS:  0073      EIP: 002767a2  ERR: 00000100  EFLAGS: 00200217 

crash> sig -g 4710
PID: 4710   TASK: f4cde730  CPU: 1   COMMAND: "ourdaemon"
SIGNAL_STRUCT: f7117540  COUNT: 2
...
SHARED_PENDING
    SIGNAL: 0000000000000101
  SIGQUEUE:  SIG  SIGINFO
               9  f3a31188
               1  f32bb750

  PID: 4710   TASK: f4cde730  CPU: 1   COMMAND: "ourdaemon"
  SIGPENDING: yes
     BLOCKED: 00000000000a4a23
  PRIVATE_PENDING
      SIGNAL: 0000000000000000
    SIGQUEUE: (empty)

  PID: 4712   TASK: f4d017f0  CPU: 1   COMMAND: "ourdaemon"
  SIGPENDING: yes
     BLOCKED: 00000000000a5a23
  PRIVATE_PENDING
      SIGNAL: 0000000000000100
    SIGQUEUE: (empty)
crash> sig -s 0000000000000101
SIGHUP SIGKILL
crash> sig -s 0000000000000100
SIGKILL
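
The masks printed by crash above store signal n at bit n - 1, which is how 0x101 decodes to SIGHUP (1) and SIGKILL (9). A minimal sketch of that decoding follows; the helper name sigmask_contains is hypothetical and not part of crash or the kernel.

```c
#include <signal.h>
#include <stdint.h>

/* Returns 1 if signal number signo is set in a Linux-style mask word,
 * where signal n is stored at bit n - 1; returns 0 otherwise. */
static int sigmask_contains(uint64_t mask, int signo)
{
    if (signo < 1 || signo > 64)
        return 0;
    return (int)((mask >> (signo - 1)) & 1u);
}
```

Applied to the dump, sigmask_contains(0x101, SIGHUP) and sigmask_contains(0x101, SIGKILL) are both 1, and sigmask_contains(0xa4a23, SIGHUP) is also 1. The last result suggests SIGHUP was blocked in thread 4710, which would be consistent with it remaining in SHARED_PENDING.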

Comment 1 John G. Myers 2009-03-25 22:53:26 UTC

We narrowed this down to what appears to be a bug in the i386 rw_semaphore code.

In a separate crash dump, we found three threads all sleeping in schedule() called from rwsem_down_failed() on the ep->sem.  That rw_semaphore had a count value of 0xfffd0000 and a wait_list with three rwsem_waiter structures, one for each sleeping thread.

So whatever thread last decremented count did not notice that the active part of count (count & RWSEM_ACTIVE_MASK) was zero and did not wake a waiter.
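
The count value quoted above can be checked by hand. The sketch below assumes the i386 constants from include/asm-i386/rwsem.h in 2.6-era kernels (RWSEM_ACTIVE_MASK 0x0000ffff, RWSEM_WAITING_BIAS -0x00010000) and assumes that each queued waiter leaves exactly one waiting bias in count. The struct and function names are hypothetical helpers, not kernel code.

```c
#include <stdint.h>

/* Assumed i386 values from include/asm-i386/rwsem.h (2.6-era kernels). */
#define RWSEM_ACTIVE_MASK   0x0000ffff
#define RWSEM_WAITING_BIAS  (-0x00010000)

struct rwsem_count_parts {
    int32_t active;   /* number of active lock holders (low 16 bits) */
    int32_t waiting;  /* number of waiting biases folded into count */
};

/* Splits a raw rw_semaphore count into its active and waiting parts. */
static struct rwsem_count_parts decode_rwsem_count(int32_t count)
{
    struct rwsem_count_parts p;

    p.active = count & RWSEM_ACTIVE_MASK;
    p.waiting = (count - p.active) / RWSEM_WAITING_BIAS;
    return p;
}
```

For count = 0xfffd0000 (that is, -0x30000 as a signed 32-bit value) this gives active = 0 and waiting = 3: nobody holds the semaphore, yet three waiting biases remain, matching the three sleeping threads on wait_list.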

We believe this might be http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6-stable.git;a=commitdiff;h=b862f3b099f3ea672c7438c0b282ce8201d39dfc;hp=e2a3d40258fe20d205f8ed592e1e2c0d5529c2e1

We are currently testing a patch that changes the type of ep->sem to semaphore, since the epoll code no longer takes a read lock on it.

Comment 2 Alex Feinberg 2009-04-06 17:56:00 UTC

Created attachment 338372
Converts ep->sem from rw_semaphore to a normal semaphore

Comment 3 Alex Feinberg 2009-04-06 17:56:44 UTC

Created attachment 338373
Backports Linus' assembly changes for lock related code to 2.6.9

Comment 4 Alex Feinberg 2009-04-06 17:57:41 UTC

The patch converting ep->sem from an rw_semaphore to a normal semaphore has proved to eliminate the issue for us. Additionally, we've back-ported the assembly changes to 2.6.9.

Comment 5 Alex Feinberg 2009-04-17 18:27:18 UTC

Would anyone on the Red Hat team comment on the issue and our findings?

Comment 6 Danny Feng 2010-02-08 10:43:23 UTC

Upstream now uses a mutex (ep->mtx) in place of ep->sem, as of commit d47de16c7221968d3eab899d7540efa5ba77af5a. Would you mind giving that a try?

Comment 7 John G. Myers 2010-02-08 18:45:53 UTC

The problem occurs rarely and we haven't been able to reliably reproduce it outside of production. After applying the two attached patches, it hasn't reproduced. There are substantial other changes to the epoll code upstream which might or might not be required to safely convert to a mutex.

So I don't think we'd be willing to try backporting the current upstream code and rolling it out to production.

Note that the fixes in attachment 338373 would also apply to any other users of rw_semaphore.

Comment 15 RHEL Program Management 2011-03-17 12:49:00 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 20 Guy Streeter 2011-07-20 19:33:04 UTC

Created attachment 514074
patch tested by customer

The customer tested the attached patch and reports that the problem is resolved. They would like this fix in RHEL 4.9.

Comment 21 Matthew Whitehead 2011-11-23 15:38:51 UTC

My customer (a large Wall Street firm) is also experiencing these symptoms on production x86_64 systems. We are interested in getting this patch tested and hopefully released, even considering how late RHEL 4 is in its production life cycle. Their RHEL 4 systems aren't going away for quite a while.
