Bug 446409
Summary: | RHEL4 U6 hang in epoll_wait | |
---|---|---|---
Product: | Red Hat Enterprise Linux 4 | Reporter: | Issue Tracker <tao>
Component: | kernel | Assignee: | Josef Bacik <jbacik>
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner>
Severity: | urgent | Priority: | urgent
Version: | 4.6 | CC: | esandeen, fybanez, jlau, tao, vgoyal
Target Milestone: | rc | Hardware: | All
OS: | Linux | Doc Type: | Bug Fix
Fixed In Version: | RHSA-2008-0665 | Last Closed: | 2008-07-24 19:29:45 UTC

Description
Issue Tracker
2008-05-14 15:03:50 UTC
Uploading file vmcore_epoll2_179553.tgz
to dropbox.redhat.com/incoming
Estimated finish 45 min.
Size: 510415216
> md5sum vmcore_epoll2_179553.tgz
76781684802ba7eeca937eb17498eaea vmcore_epoll2_179553.tgz
This event sent from IssueTracker by fleitner [Support Engineering Group]
issue 179553
File uploaded: analysis.txt
This event sent from IssueTracker by fleitner [Support Engineering Group]
issue 179553
it_file 132870

File uploaded: patch.txt
This event sent from IssueTracker by fleitner [Support Engineering Group]
issue 179553
it_file 132871

Uploading two files.
"analysis.txt": analysis from customer
"patch.txt": patch file from customer, reported to fix the testcase
Given this and the core dump, is an sosreport still required?
This event sent from IssueTracker by fleitner [Support Engineering Group]
issue 179553

SEG,
OK, this issue officially goes beyond my reach and understanding.
------------
>>Provide time and date of the problem<<
n/a

>>Provide clear and concise problem description as it is understood at the time of escalation<<
Basically, the system appears hung. Interrupts are being processed, so Alt-SysRq works. Any system calls that have timeouts will also have those timers expire. Thus processes waiting on variations of the calls select, futex, poll, sleep, etc. will all have the timer expire and be placed in the runnable state, making the load average appear quite high when a crash dump is examined. It appears that the epoll_wait remains in kernel context.
* Observed behavior
Customer's java process is calling epoll_wait and a hang ensues
* Desired behavior
Not have a hang :)

>>State specific action requested of SEG<<
Take a look at the core that I've set up on CAS and compare the customer's analysis in the "analysis.txt" file attached to this ticket with what you see, and determine if the customer is correct. If the customer is correct, then escalate the patch in "patch.txt" to BZ and help us get this patch included in RHEL 4.8.

>>State whether or not a defect in the product is suspected<<
Yes, this appears to be a kernel bug.

>>This is especially important for severity one and two issues. What is the impact to the customer when they experience this problem?<<
Requested this information, but haven't received it yet.

>>Location of core file<<
Your corefile is ready for you
You may view it at megatron.gsslab.rdu.redhat.com
Login with kerberos name/password
$ cd /cores/20080512103955/work

>>Misc info<<
Customer provided the core file from a non-smp kernel to allow for better debugging; however, this issue was first seen on the smp kernel.

Thanks
Jeremy West

/cores/20080512103955/work$ ./crash

Issue escalated to Support Engineering Group by: jwest.
Internal Status set to 'Waiting on SEG'
This event sent from IssueTracker by fleitner [Support Engineering Group]
issue 179553

The problem description part of analysis.txt:

1. PID 6976 (java) performs epoll_wait().
2. A file descriptor is ready, and ep_send_events() calls __put_user() to copy the epoll_event structure to user space.
3. A page fault occurs because the user-space page has been paged out, and a process switch occurs. At this point the epitem is linked to the txlist (fs/eventpoll.c line 1443) on the stack of PID 6976.
4. Meanwhile, the file descriptor becomes ready again, and the same epitem is linked to the rdllist of the eventpoll.
5. PID 8204 (java) runs before PID 6976 resumes.
6. PID 8204 performs epoll_wait().
7. The "if" block (fs/eventpoll.c line 1488) is not executed, because the rdllist of the eventpoll is not empty in ep_poll().
8. ep_events_transfer() (fs/eventpoll.c line 1531) runs, and ep_collect_ready_items() (fs/eventpoll.c line 1454) runs.
9. ep_collect_ready_items() returns 0 (nothing ready), because the epitem linked to the rdllist is still linked to the txlist of PID 6976.
10. ep_events_transfer() returns 0 as well.
11. ep_poll() checks the eventpoll again (fs/eventpoll.c line 1532).

Because steps 7-11 repeat from here on, the kernel hangs without ever switching processes. Interrupts are still accepted, however.

<snipped>

Proposed patch:

--- fs/eventpoll.c.org	2008-05-12 19:30:23.000000000 +0900
+++ fs/eventpoll.c	2008-05-12 19:31:11.000000000 +0900
@@ -1529,8 +1529,10 @@
 	 * more luck.
 	 */
 	if (!res && eavail &&
-	    !(res = ep_events_transfer(ep, events, maxevents)) && jtimeout)
+	    !(res = ep_events_transfer(ep, events, maxevents)) && jtimeout) {
+		schedule();
 		goto retry;
+	}
 
 	return res;
 }

Created attachment 305369 [details]
analysis.txt
Created attachment 305370 [details]
patch.txt
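
The loop described in steps 7-11 of the customer analysis can be illustrated with a small user-space model. This is only a sketch, not kernel code: `model_epitem`, `collect_ready()` and `poll_model()` are hypothetical names, and the `max_spins` bound exists only so the demonstration terminates (the loop in the report has no such bound).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of one epitem; not the kernel structure. */
struct model_epitem {
    bool on_rdllist;  /* linked on the eventpoll ready list */
    bool on_txlist;   /* still linked on another task's transfer list */
};

/* Models ep_collect_ready_items(): an item that is still on another
 * task's txlist is skipped, so it is not collected for this caller. */
static int collect_ready(const struct model_epitem *items, size_t n)
{
    int collected = 0;
    for (size_t i = 0; i < n; i++)
        if (items[i].on_rdllist && !items[i].on_txlist)
            collected++;
    return collected;
}

/* Models the retry loop in ep_poll(). Returns the number of events
 * delivered, 0 if the ready list is empty (the caller would sleep),
 * or -1 if max_spins iterations pass without any progress. */
static int poll_model(const struct model_epitem *items, size_t n, int max_spins)
{
    assert(max_spins > 0);
    for (int spin = 0; spin < max_spins; spin++) {
        bool eavail = false;
        for (size_t i = 0; i < n; i++)
            eavail = eavail || items[i].on_rdllist;
        if (!eavail)
            return 0;          /* nothing ready: would wait on the queue */
        int res = collect_ready(items, n);
        if (res)
            return res;        /* events delivered */
        /* res == 0 while eavail is set: "goto retry" without yielding */
    }
    return -1;
}
```

In this model an item with both flags set makes `poll_model()` burn every iteration and return -1, which corresponds to the second task spinning while the first task is stalled in its page fault.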
By the end of work day 14-May-08 we need to provide:
1) IBM analysis of problem and patch provided.
2) Red Hat analysis of problem and patch provided.

This is impacting a customer, and an official fix or high quality workaround is requested to be delivered 15-May-08. The deadline appears to be set in order to meet a condition of the service agreement. I do not know what happens if this is not fixed in the next update.

The analysis by IBM agrees with the problem determination, and we feel that the fix of adding a call to "schedule()" prior to "goto retry" will solve the problem by allowing another process to empty the private delivery list. This seems safe and seems to provide a solution. However, it is unknown if such a fix will be accepted by kernel.org. An alternate approach may be to use a wait queue and wake the process when the list has been emptied. Such a change is considerably more complex.
This event sent from IssueTracker by dmosby
issue 179553

In continuing code analysis of the eventpoll code, IBM continues to believe that the proposed patch does represent a fix that would work. We think there are two other areas that should be considered:
1) Change the down_read/up_read in ep_events_transfer() to down_write/up_write.
2) The latest stable kernel.org (2.6.25.3) eventpoll.c file has had its locking completely rewritten. See if this compiles if replaced in the RHEL source tree and fixes the problem.
We have not yet tested either of these.
This event sent from IssueTracker by dmosby
issue 179553

The customer requests a hotfix as soon as possible. Actually they would have liked it a couple of days ago. I am receiving daily requests for status on this and for a fix to be issued. If you are able to commit to a date for a hotfix, please do so and begin preparation of a hotfix.
This event sent from IssueTracker by dmosby
issue 179553

Created attachment 305739 [details]
proposed fix.
I agree with their summary. This problem goes away upstream because ep->sem was
converted to a plain mutex. Since that's not an option here, changing the
down_read() to a down_write() in ep_events_transfer() is the best option at this
point: it keeps the second process from getting stuck in this infinite loop and
keeping the other process from doing its work. Please have the customer test
and verify this fixes their problem.
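
Why the switch from a read lock to a write lock matters can be sketched with a toy, single-threaded model of a reader/writer semaphore. Everything here is a hypothetical illustration (`model_rwsem`, the `model_*` helpers and `second_task_can_enter()` are invented names, not kernel APIs), and the try-lock style is used only so the outcome can be observed without real threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a reader/writer semaphore; not the kernel rwsem. */
struct model_rwsem {
    int  readers;  /* number of read holders */
    bool writer;   /* true while a write holder exists */
};

static bool model_read_trylock(struct model_rwsem *s)
{
    if (s->writer)
        return false;          /* readers are excluded by a writer */
    s->readers++;
    return true;
}

static bool model_write_trylock(struct model_rwsem *s)
{
    if (s->writer || s->readers > 0)
        return false;          /* a writer needs exclusive access */
    s->writer = true;
    return true;
}

static void model_read_unlock(struct model_rwsem *s)
{
    assert(s->readers > 0);
    s->readers--;
}

static void model_write_unlock(struct model_rwsem *s)
{
    assert(s->writer);
    s->writer = false;
}

/* Returns true if a second task could enter the transfer path while a
 * first task is still inside it (for example, stalled in a page fault). */
static bool second_task_can_enter(bool use_write_lock)
{
    struct model_rwsem sem = { 0, false };

    /* Task A enters the transfer path and stalls there. */
    bool a_ok = use_write_lock ? model_write_trylock(&sem)
                               : model_read_trylock(&sem);
    assert(a_ok);

    /* Task B tries to enter the same path. */
    bool b_ok = use_write_lock ? model_write_trylock(&sem)
                               : model_read_trylock(&sem);

    if (b_ok) {
        if (use_write_lock)
            model_write_unlock(&sem);
        else
            model_read_unlock(&sem);
    }
    if (use_write_lock)
        model_write_unlock(&sem);
    else
        model_read_unlock(&sem);
    return b_ok;
}
```

With read locking the second task gets in and can find nothing to collect; with write locking it is kept out. In the kernel a blocked down_write() would sleep rather than fail, so the second task would wait for the first instead of spinning.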
I will pass that patch on to the customer. I set up a lab system and can reproduce this, so I will see how the patch works on that system as well. Note that I tried to reproduce on an x86 system and could not get the bug to trigger there. I only saw it when I moved to an x86_64 system. It took about three tries with the "client" program sending data to lock up. That was with a one-CPU (but dual-core) smp kernel (2.6.9-67.ELsmp) and two gigabytes of memory.

As for workarounds, we could only think of 1) replacing epoll_wait() with poll(), or 2) using mlock to prevent the page fault when copying to user data. Unfortunately these are not possible, as this is a large Java application. The current epoll_wait code is either compiled as part of the application or part of the Java library. At any rate, it can't be changed, so the only solution I can think of is a hotfix until they can get the next RHEL4 release.
This event sent from IssueTracker by dmosby
issue 179553

I built a 2.6.9-67 kernel on a lab system and verified that I hit the bug using this kernel. I applied the patch and was unable to hit it. In addition to executing the "client" program several times by hand, I ran it for 15 minutes in a shell script with a 5 second sleep to allow the memory-consuming program to loop several times. This still did not trigger the bug, whereas I was able to hit it quite quickly in my earlier testing.

Please advise me as to whether a hotfix can be created and, if so, when it could be available. I know that this information will be requested.
This event sent from IssueTracker by dmosby
issue 179553

The patch has been tested on the actual customer application and this solves the problem. Prior testing was using a small test case. The customer has a maintenance window on 23-May-08 which would allow installing a new kernel. Is it possible to obtain a hotfix kernel by end of business (Japan time) 23-May-08? They are UTC +9, so I believe we would need this available for download by end of day 22-May-08 US time.

This is a very large customer and the problem has high visibility in their organization as well as within IBM.
This event sent from IssueTracker by dmosby
issue 179553

Committed in 71.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2008-0665.html

*** Bug 254218 has been marked as a duplicate of this bug. ***

*** Bug 485073 has been marked as a duplicate of this bug. ***

I don't have a reproducer; you'll have to talk to the customer.
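
For applications whose source can be changed, the mlock workaround mentioned during testing might look roughly like the sketch below: the event buffer handed to epoll_wait() is pinned in memory so that copying events into it should not page-fault. The helper names `alloc_locked_events()` and `free_locked_events()` are hypothetical, and this is an untested illustration of the idea rather than a verified mitigation for this bug.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <sys/mman.h>

/* Hypothetical helper: allocate an epoll event buffer and pin it in RAM
 * with mlock(). Returns NULL with errno set on failure. Note that mlock()
 * works on whole pages, so neighbouring heap data may be pinned as well. */
static struct epoll_event *alloc_locked_events(int maxevents)
{
    if (maxevents <= 0) {
        errno = EINVAL;
        return NULL;
    }

    struct epoll_event *buf = calloc((size_t)maxevents, sizeof(*buf));
    if (buf == NULL)
        return NULL;

    if (mlock(buf, (size_t)maxevents * sizeof(*buf)) != 0) {
        int saved = errno;     /* keep the mlock() error across free() */
        free(buf);
        errno = saved;
        return NULL;
    }
    return buf;
}

/* Hypothetical helper: unpin and release a buffer obtained above. */
static void free_locked_events(struct epoll_event *buf, int maxevents)
{
    assert(maxevents > 0);
    if (buf == NULL)
        return;
    munlock(buf, (size_t)maxevents * sizeof(*buf));
    free(buf);
}
```

A caller would pass the returned buffer straight to `epoll_wait(epfd, buf, maxevents, timeout)`. mlock() can fail when the RLIMIT_MEMLOCK limit is too low, so the NULL return needs to be handled.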