Bug 446409

Summary: RHEL4 U6 hang in epoll_wait
Product: Red Hat Enterprise Linux 4
Reporter: Issue Tracker <tao>
Component: kernel
Assignee: Josef Bacik <jbacik>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 4.6
CC: esandeen, fybanez, jlau, tao, vgoyal
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Last Closed: 2008-07-24 19:29:45 UTC
Type: ---
Attachments:
Description     Flags
analysis.txt    none
patch.txt       none
proposed fix.   none

Description Issue Tracker 2008-05-14 15:03:50 UTC
Escalated to Bugzilla from IssueTracker

Comment 2 Issue Tracker 2008-05-14 15:03:52 UTC
Uploading file vmcore_epoll2_179553.tgz
to dropbox.redhat.com/incoming
Estimated finish 45 min.
Size: 510415216
> md5sum vmcore_epoll2_179553.tgz
76781684802ba7eeca937eb17498eaea  vmcore_epoll2_179553.tgz



This event sent from IssueTracker by fleitner  [Support Engineering Group]
 issue 179553

Comment 4 Issue Tracker 2008-05-14 15:03:55 UTC
File uploaded: analysis.txt

This event sent from IssueTracker by fleitner  [Support Engineering Group]
 issue 179553
it_file 132870

Comment 5 Issue Tracker 2008-05-14 15:03:56 UTC
File uploaded: patch.txt

This event sent from IssueTracker by fleitner  [Support Engineering Group]
 issue 179553
it_file 132871

Comment 6 Issue Tracker 2008-05-14 15:03:58 UTC
Uploading two files.
"analysis.txt" analysis from customer
"patch.txt" patch file from customer - reported to fix the problem for the test case

Given this and the core dump, is an sosreport still required?



This event sent from IssueTracker by fleitner  [Support Engineering Group]
 issue 179553

Comment 7 Issue Tracker 2008-05-14 15:03:59 UTC
SEG,

Ok, this issue officially goes beyond my reach and understanding.  

------------
>>Provide time and date of the problem<<
n/a

>>Provide clear and concise problem description as it is understood at the
time of escalation<<

Basically, the system appears hung. Interrupts are still being processed, so
Alt-SysRq works. Any system calls that have timeouts will also have those
timers expire. Thus processes waiting in variations of the calls select,
futex, poll, sleep, etc. will all have their timers expire and be placed in
the runnable state, making the load average appear quite high when a crash
dump is examined. It appears that the epoll_wait call remains in kernel
context.

    * Observed behavior
      Customer's java process is calling epoll_wait and a hang ensues
    * Desired behavior 
      Not have a hang :)

>>State specific action requested of SEG<<

Take a look at the core that I've set up on CAS and compare the
customer's analysis in the "analysis.txt" file attached to this ticket
with what you see, and determine whether the customer is correct.  If the
customer is correct, then escalate the patch in "patch.txt" to BZ and
help us get this patch included in RHEL 4.8.

>>State whether or not a defect in the product is suspected<<

Yes, this appears to be a kernel bug.

>>This is especially important for severity one and two issues. What is
the impact to the customer when they experience this problem?<<

Requested this information, but haven't received it yet.

>>Location of core file<<

Your corefile is ready for you
You may view it at megatron.gsslab.rdu.redhat.com
Login with kerberos name/password
$ cd /cores/20080512103955/work

>>Misc info<<

Customer provided the core file from a non-smp kernel to allow for better
debugging, however this issue was first seen on the smp kernel.

Thanks
Jeremy West
/cores/20080512103955/work$ ./crash


Issue escalated to Support Engineering Group by: jwest.
Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by fleitner  [Support Engineering Group]
 issue 179553

Comment 8 Flavio Leitner 2008-05-14 15:06:22 UTC
The problem description part of analysis.txt:

1. PID 6976 (java) calls epoll_wait().

2. A file descriptor is ready, and ep_send_events() calls __put_user() to
copy the epoll_event structure to user space.

3. A page fault occurs because the user-space page has been paged out,
and a process switch occurs.
   At this point the epitem is linked to the txlist (fs/eventpoll.c line 1443)
on the stack of PID 6976.

4. Meanwhile, the file descriptor becomes ready again, and the same epitem
is linked to the rdllist of the eventpoll.

5. PID 8204 (java) runs before PID 6976 resumes.

6. PID 8204 calls epoll_wait().

7. The "if" block (fs/eventpoll.c line 1488) is not executed because
the rdllist of the eventpoll is not empty in ep_poll().

8. ep_events_transfer() (fs/eventpoll.c line 1531) runs, and
ep_collect_ready_items() (fs/eventpoll.c line 1454) runs.

9. ep_collect_ready_items() returns 0 (nothing ready) because the
epitem linked to the rdllist is still linked to the txlist of PID 6976.

10. ep_events_transfer() returns 0 as well.

11. ep_poll() checks the eventpoll again (fs/eventpoll.c line 1532).

Because steps 7-11 repeat from then on, the kernel hangs
without ever performing a process switch.
Interrupts, however, are still accepted.
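
The loop in steps 7-11 can be sketched as a small single-threaded
userspace model. This is only an illustration under simplifying
assumptions, not the kernel code: one item stands in for the epitem, two
flags stand in for its rdllist and txlist links, and every name ending in
_model is hypothetical. An optional callback stands in for anything that
lets the first task run between retries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of one epitem; not the kernel structure. */
struct item_model {
    bool on_rdllist;  /* linked on the shared ready list */
    bool on_txlist;   /* linked on some task's private transfer list */
};

/* Models the collection step: an item is taken only if no task holds it. */
static int collect_ready_model(struct item_model *it)
{
    assert(it != NULL);
    if (it->on_rdllist && !it->on_txlist) {
        it->on_txlist = true;
        it->on_rdllist = false;
        return 1;
    }
    return 0;  /* ready list is non-empty, yet nothing could be collected */
}

/* Models the first task finishing its copy and releasing the item. */
static void finish_transfer_model(struct item_model *it)
{
    it->on_txlist = false;
}

/*
 * Models the second task's retry loop. Returns the number of retries
 * before an event was collected, or -1 if max_spins ran out first.
 */
static int retry_loop_model(struct item_model *it, int max_spins,
                            void (*between_retries)(struct item_model *))
{
    for (int spins = 0; spins < max_spins; spins++) {
        if (collect_ready_model(it) > 0)
            return spins;
        if (between_retries != NULL)
            between_retries(it);  /* give the other task a chance to run */
    }
    return -1;
}
```

Starting from an item that is on both lists (the state after step 4),
the loop with no callback never makes progress however many retries it
is given, while the same loop succeeds once the first task is allowed
to finish between retries.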
<snipped>


Comment 9 Flavio Leitner 2008-05-14 15:07:07 UTC
Proposed patch:
--- fs/eventpoll.c.org   2008-05-12 19:30:23.000000000 +0900
+++ fs/eventpoll.c   2008-05-12 19:31:11.000000000 +0900
@@ -1529,8 +1529,10 @@
     * more luck.
     */
    if (!res && eavail &&
-       !(res = ep_events_transfer(ep, events, maxevents)) && jtimeout)
+       !(res = ep_events_transfer(ep, events, maxevents)) && jtimeout) {
+      schedule();
       goto retry;
+   }
    return res;
 }


Comment 10 Flavio Leitner 2008-05-14 15:12:29 UTC
Created attachment 305369 [details]
analysis.txt

Comment 11 Flavio Leitner 2008-05-14 15:13:05 UTC
Created attachment 305370 [details]
patch.txt

Comment 12 Issue Tracker 2008-05-14 17:54:32 UTC
By the end of work day 14-May-08 we need to provide:
1) IBM analysis of problem and patch provided.
2) Red Hat analysis of problem and patch provided.

This is impacting a customer and an official fix or high quality
workaround is requested to be delivered 15-May-08.

The deadline appears to be needed to meet a condition of the service
agreement.

I do not know what happens if this is not fixed in the next update.

The analysis by IBM agrees with the problem determination, and we feel
that the fix of adding a call to "schedule()" prior to "goto retry"
will solve the problem by allowing another process to empty the
private delivery list. This seems safe and appears to provide a solution.
However, it is unknown whether such a fix will be accepted by kernel.org. An
alternate approach may be to use a wait queue and wake the process when
the list has been emptied. Such a change is considerably more complex.



This event sent from IssueTracker by dmosby 
 issue 179553

Comment 13 Issue Tracker 2008-05-14 20:09:17 UTC
In continuing code analysis of eventpoll code IBM continues to believe
that the proposed patch does represent a fix that would work.
We think there are two other areas that should be considered:
1) Change the down_read/up_read in ep_events_transfer() to
   down_write/up_write.
2) The latest stable kernel.org (2.6.25.3) eventpoll.c file has
   had its locking completely rewritten. See whether this compiles when
   dropped into the RHEL source tree and whether it fixes the problem.

We have not yet tested either of these.



This event sent from IssueTracker by dmosby 
 issue 179553

Comment 16 Issue Tracker 2008-05-16 19:33:49 UTC
The customer requests a hotfix as soon as possible.
Actually they would have liked it a couple days ago.

I am receiving daily requests for status on this and
to be issued a fix.

If you are able to commit to a date for hotfix
please do this and begin preparation of a hotfix.




This event sent from IssueTracker by dmosby 
 issue 179553

Comment 19 Josef Bacik 2008-05-16 20:02:12 UTC
Created attachment 305739 [details]
proposed fix.

I agree with their summary; this problem goes away upstream because ep->sem was
converted to just a plain jane mutex.  Since that's not an option, changing the
down_read() to a down_write() in ep_events_transfer() is the best option at this
point to keep the second process from getting stuck in this infinite loop and
from keeping the other process from doing its work.  Please have the customer
test and verify this fixes their problem.
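
Why the read-to-write change helps can be sketched with a tiny
single-threaded model of a reader/writer semaphore. This is an
illustration only: the struct and the *_model functions are hypothetical,
and a false return stands in for "the task would sleep here", which is
what a blocking acquire does in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a reader/writer semaphore; not the kernel type. */
struct rwsem_model {
    int readers;  /* number of tasks currently holding it for read */
    bool writer;  /* true while one task holds it for write */
};

/* A reader may enter as long as no writer holds the semaphore. */
static bool try_read_model(struct rwsem_model *s)
{
    assert(s != NULL);
    if (s->writer)
        return false;  /* would sleep until the writer releases */
    s->readers++;
    return true;
}

/* A writer may enter only when nobody else holds the semaphore. */
static bool try_write_model(struct rwsem_model *s)
{
    assert(s != NULL);
    if (s->writer || s->readers > 0)
        return false;  /* would sleep until all holders release */
    s->writer = true;
    return true;
}
```

With read acquisition, two tasks can be inside the transfer path at the
same time, which is the precondition for the spin described in the
analysis. With write acquisition, the second task cannot enter while the
first still holds the semaphore, so it waits instead of spinning.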

Comment 20 Issue Tracker 2008-05-16 20:33:27 UTC
I will pass that patch on to customer. I set up a lab system
and can reproduce this so will see how the patch works on
that system as well.

Note that I tried to reproduce on an x86 system and could not
get the bug to trigger there. Only saw it when I moved
to an x86_64 system. Took about three tries with the "client"
program sending data to lock up. That was with a one cpu
(but dual core) smp kernel (2.6.9-67.ELsmp) and two gig memory.

As for workarounds, we could only think of 1) replacing epoll_wait()
with poll(), or 2) using mlock to prevent the page fault when
copying to user data. Unfortunately these are not possible, as
this is a large Java application. The current epoll_wait code is
either compiled as part of the application or part of the Java library.
At any rate, it can't be changed, so the only solution I can think of
is a hotfix until they can get the next RHEL4 release.
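
For an application whose source can be changed, workaround 2) might look
roughly like the sketch below. It is a minimal illustration, assuming the
caller owns the events buffer it passes to epoll_wait(); the helper name
lock_event_buffer is hypothetical, and mlock() can fail when the
RLIMIT_MEMLOCK limit is too low.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/mman.h>

/*
 * Pin the epoll event buffer in RAM so that the kernel's copy of
 * results into it should not hit a paged-out page.
 * Returns 0 on success, or -1 with errno set by mlock().
 */
static int lock_event_buffer(struct epoll_event *events, int maxevents)
{
    assert(events != NULL);
    assert(maxevents > 0);
    /* mlock() locks every page that overlaps the given range. */
    return mlock(events, (size_t)maxevents * sizeof(*events));
}
```

The intent is to call this once at startup, before entering the
epoll_wait() loop, and to keep using the same buffer for every call.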



This event sent from IssueTracker by dmosby 
 issue 179553

Comment 21 Issue Tracker 2008-05-16 23:17:01 UTC
I built a 2.6.9-67 kernel on a lab system and verified that I
hit the bug using this kernel. I applied the patch and was unable
to hit it. In addition to executing the "client" program several
times by hand, I ran it for 15 minutes in a shell script with a 5
second sleep to allow the memory consuming program to loop several
times. This still did not trigger the bug, whereas without the patch
I was able to hit it quite quickly in my testing.

Please advise me as to whether a hotfix can be created and, if so,
when it could be available. I know that this information
will be requested.



This event sent from IssueTracker by dmosby 
 issue 179553

Comment 25 Issue Tracker 2008-05-22 14:41:13 UTC
The patch has been tested on the actual customer application
and this solves the problem. Prior testing was using a small
test case.

The customer has a maintenance window 23-May-08 which would
allow installing a new kernel.

Is it possible to obtain a hotfix kernel by end of business
(Japan time) 23-May-08? They are UTC +9, so I believe we would
need this available for download by end of day 22-May-08 US time.
This is a very large customer and the problem has high visibility
in their organization as well as within IBM.



This event sent from IssueTracker by dmosby 
 issue 179553

Comment 29 Vivek Goyal 2008-05-29 20:51:53 UTC
Committed in 71.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 33 errata-xmlrpc 2008-07-24 19:29:45 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html

Comment 34 Marco Bill-Peter 2008-07-30 01:46:38 UTC
*** Bug 254218 has been marked as a duplicate of this bug. ***

Comment 36 Dave Anderson 2009-04-07 14:30:55 UTC
*** Bug 485073 has been marked as a duplicate of this bug. ***

Comment 44 Josef Bacik 2009-07-09 18:40:53 UTC
I don't have a reproducer; you'll have to talk to the customer.