Bug 711519
Summary: | GFS2: resource group bitmap corruption resulting in panics and withdraws | |
---|---|---|---
Product: | Red Hat Enterprise Linux 5 | Reporter: | Benjamin Kahn <bkahn>
Component: | kernel | Assignee: | Robert Peterson <rpeterso>
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list>
Severity: | high | Docs Contact: |
Priority: | urgent | |
Version: | 5.8 | CC: | adas, adrew, ajb2, bmarzins, cww, John.Hadad, jwest, liko, mjuricek, pm-eus, rpeterso, rryder, rwheeler, swhiteho, syeghiay, teigland
Target Milestone: | rc | Keywords: | ZStream
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | kernel-2.6.18-238.15.1.el5 | Doc Type: | Bug Fix
Doc Text: | Multiple GFS2 nodes attempted to unlink, rename, or manipulate files at the same time, causing various forms of file system corruption, panics, and withdraws. This update adds multiple checks for dinode's i_nlink value to assure inode operations such as link, unlink, or rename no longer cause the aforementioned problems. | |
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2011-07-15 06:10:44 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 690555 | |
Bug Blocks: | | |
Attachments: | | |
Description
Benjamin Kahn
2011-06-07 17:51:53 UTC
Reassigning to myself; hope to post the 5.6.z patch shortly.

The patch was posted to rhkernel-list for inclusion into 5.6.z. Changing status to POST.

Created attachment 504744
Patch I posted

Comments aside, this is the patch I posted.

in kernel-2.6.18-238.15.1.el5
linux-2.6-fs-gfs2-fix-resource-group-bitmap-corruption.patch

Which test stream kernel is this in? (We're running -262 and I don't want to step back out of the fixes already in that.)

Answering my own question.... It's in -261.
We're running -262 and still seeing this occasionally under heavy load.

(In reply to comment #9)
> Answering my own question.... It's in -261.
>
> We're running -262 and still seeing this occasionally under heavy load.
The code to resolve 690555 was tested quite heavily by Red Hat and partners. One of the partners who deployed this code has in their lab one of the most aggressive GFS2 workloads that we're aware of, and has not documented a single occurrence of this issue on this code (when previously they could reproduce it reliably in 2.5 hours). I believe we're fairly confident that BZ 690555 has been successfully resolved.
Could you be hitting a new or different issue, then? It would be great if you could open a case with support so that my team and I can help you out with this issue. If you could open a case with sosreports from your cluster and a description of what you suspect, and let us know the approximate time and date of the last withdraw on -262 that you suspect to be 690555, then we can help. You could do this either at https://access.redhat.com/support/cases/new or by calling in. Also, please feel free to point to this note in this BZ, and I'm sure my colleagues will alert me to the case so that I can personally assist. Thanks in advance.

As luck would have it, I just had another instance and am generating a sosreport. It will be attached to support ticket #00353457.
Thanks
Alan

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Multiple GFS2 nodes attempted to unlink, rename, or manipulate files at the same time, causing various forms of file system corruption, panics, and withdraws. This update adds multiple checks for dinode's i_nlink value to assure inode operations such as link, unlink, or rename no longer cause the aforementioned problems.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2011-0927.html
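
For readers unfamiliar with the kind of check the technical note describes, here is a minimal user-space sketch of validating a link count before a link or unlink operation changes it. It is an illustration only and is not the attached patch: `struct demo_inode`, `demo_unlink()`, and `demo_link()` are hypothetical names, and the error codes are assumptions chosen for the example. The point is that an operation which finds the link count already at zero (for instance because another node removed the file first) should fail rather than modify the inode further.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical, simplified stand-in for an inode; NOT the GFS2 structure. */
struct demo_inode {
    unsigned int i_nlink;   /* number of directory entries pointing here */
};

/*
 * Drop one link. If the count is already zero, some other actor has
 * unlinked the inode in the meantime, so refuse instead of underflowing.
 */
static int demo_unlink(struct demo_inode *ip)
{
    if (ip->i_nlink == 0)
        return -ENOENT;     /* assumed error code for this sketch */
    ip->i_nlink--;
    return 0;
}

/*
 * Add one link. Refuse to resurrect an inode whose count has reached
 * zero, and refuse to overflow the counter.
 */
static int demo_link(struct demo_inode *ip)
{
    if (ip->i_nlink == 0)
        return -ENOENT;     /* assumed error code for this sketch */
    if (ip->i_nlink == UINT_MAX)
        return -EMLINK;
    ip->i_nlink++;
    return 0;
}
```

In a clustered file system the check would only be meaningful when made after the relevant cluster lock is held and the inode has been re-read; the sketch leaves locking out entirely.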