Bug 150135
Summary: | Kernel OOPS in jbd While Running Network Stress | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Amit Bhutani <amit_bhutani> |
Component: | kernel | Assignee: | Stephen Tweedie <sct> |
Status: | CLOSED NEXTRELEASE | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 4.0 | CC: | davej, riel, robert_hentosh, thomas_chenault, wwlinuxengineering |
Hardware: | i686 | ||
OS: | Linux | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2005-04-22 20:08:50 UTC | Type: | --- |
Bug Blocks: | 150568 | ||
Description
Amit Bhutani
2005-03-02 20:17:48 UTC
Created attachment 111591 [details]
PE750_oops_trace_20050302
This is an OOPS trace from the last failure that did not have any sort of
special VLAN setup.
Created attachment 111593 [details]
PE750_sysrq_20050302
This is the SysRq output from the same failure as in the previous comment.
Created attachment 111614 [details]
sysreport from afflicted system
This is a bug I've seen reported elsewhere, but so far I have not been able to reproduce it nor to get to the bottom of it. I do have a couple of debugging patches which implement extensive ext3 buffer tracing and a few extra consistency checks; it would be very helpful if we could get the results of reproducing this problem with these patches in place.
Created attachment 111616 [details]
Debug patch 1 of 2: core ext3 buffer tracing
Run "make oldconfig" and enable CONFIG_BUFFER_DEBUG when using this patch.
ext3 may need to be built in, not modular.
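Before starting a long stress run on the debug kernel, it can be worth confirming that the build actually has the options the note above asks for. The following is a minimal sketch, not part of the attached patch: it parses the text of a kernel .config file and checks that CONFIG_BUFFER_DEBUG is enabled and that ext3 is built in. The helper names are hypothetical, and the use of CONFIG_EXT3_FS=y as the test for "built in, not modular" is an assumption based on the note.

```python
import re


def kconfig_option_state(config_text: str, option: str) -> str:
    """Return the value assigned to `option` (e.g. 'y' or 'm'),
    or 'n' if the option is marked "is not set" or is absent."""
    assign = re.compile(r"^" + re.escape(option) + r"=(.*)$")
    for line in config_text.splitlines():
        line = line.strip()
        match = assign.match(line)
        if match:
            return match.group(1)
        if line == f"# {option} is not set":
            return "n"
    return "n"


def debug_kernel_ready(config_text: str) -> bool:
    # Assumption: the debug patch needs CONFIG_BUFFER_DEBUG=y and, following
    # the note above, ext3 built in (CONFIG_EXT3_FS=y) rather than modular (=m).
    return (
        kconfig_option_state(config_text, "CONFIG_BUFFER_DEBUG") == "y"
        and kconfig_option_state(config_text, "CONFIG_EXT3_FS") == "y"
    )


if __name__ == "__main__":
    # Illustrative .config fragment; a real check would read the file
    # produced by "make oldconfig" in the kernel build tree.
    sample = "CONFIG_EXT3_FS=m\n# CONFIG_BUFFER_DEBUG is not set\n"
    print(debug_kernel_ready(sample))
```

With the sample fragment the check fails on both counts, which is the situation the note warns about: the debug option is off and ext3 is modular.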
Created attachment 111617 [details]
Debug patch 2 of 2: targeted assert tests for kjournald t_locked_list oops
This patch depends on the previous ext3-debug.patch.
Created attachment 111782 [details]
Fix for destroying in-use journal_head
The attached patch has been committed for U1 to fix this problem. Please
test with it applied and report the results.
Awesome! I am building a kernel with the patch from your previous comment right now. We will test as soon as possible. For what it's worth, I will still provide the failure output from the debug kernel that incorporated the patches from comments #5 and #6 by the end of the day, since that test was only kicked off this morning.
We ran the debug kernel until Friday afternoon without a failure, so it seems that the debug prints have masked the issue. On Friday we were able to set up another test platform, using the stock RHEL kernel, that exhibited the failure before the end of the day. We then switched both machines to the fixed kernel mentioned in #7. Both boxes ran the entire weekend without failure. That bodes well, but it is disconcerting that the debug kernel also masked the issue.
Robert has also done a code review and confirms that the fix is good; he has also run with it successfully for over a week.
Confirmed that "ext3-release-race.patch" is now in U1 Beta. Closing.