Bug 711519 - GFS2: resource group bitmap corruption resulting in panics and withdraws
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.8
Hardware: x86_64 Linux
Priority: urgent, Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Robert Peterson
QA Contact: Cluster QE
Keywords: ZStream
Depends On: 690555
Blocks:
Reported: 2011-06-07 13:51 EDT by Benjamin Kahn
Modified: 2011-07-15 02:10 EDT (History)
CC List: 16 users

See Also:
Fixed In Version: kernel-2.6.18-238.15.1.el5
Doc Type: Bug Fix
Doc Text:
Multiple GFS2 nodes attempted to unlink, rename, or manipulate files at the same time, causing various forms of file system corruption, panics, and withdraws. This update adds multiple checks for dinode's i_nlink value to assure inode operations such as link, unlink, or rename no longer cause the aforementioned problems.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-07-15 02:10:44 EDT


Attachments
Patch I posted (5.55 KB, patch)
2011-06-14 15:03 EDT, Robert Peterson
Description Benjamin Kahn 2011-06-07 13:51:53 EDT
This bug has been copied from bug #690555 and has been proposed
to be backported to 5.6 z-stream (EUS).
Comment 3 Robert Peterson 2011-06-14 12:20:15 EDT
Reassigning to myself; hope to post the 5.6.z patch shortly.
Comment 4 Robert Peterson 2011-06-14 15:02:02 EDT
The patch was posted to rhkernel-list for inclusion into 5.6.z.
Changing status to POST.
Comment 5 Robert Peterson 2011-06-14 15:03:00 EDT
Created attachment 504744
Patch I posted

Comments aside, this is the patch I posted.
Comment 6 Phillip Lougher 2011-06-17 05:14:25 EDT
in kernel-2.6.18-238.15.1.el5

linux-2.6-fs-gfs2-fix-resource-group-bitmap-corruption.patch
Comment 7 Alan Brown 2011-06-17 09:01:23 EDT
Which test stream kernel is this in? (We're running -262 and I don't want to step back out of the fixes already in that)
Comment 9 Alan Brown 2011-06-24 05:47:29 EDT
Answering my own question.... It's in -261.

We're running -262 and still seeing this occasionally under heavy load.
Comment 10 Adam Drew 2011-06-24 17:37:08 EDT
(In reply to comment #9)
> Answering my own question.... It's in -261.
> 
> We're running -262 and still seeing this occasionally under heavy load.

The code to resolve 690555 was tested quite heavily by Red Hat and partners. One of the partners who deployed this code has in their lab one of the most aggressive workloads that we're aware of on GFS2 and has not documented a single occurrence of this issue on this code (previously they could reproduce it reliably within 2.5 hours). We're fairly confident that BZ 690555 has been successfully resolved. Could you be hitting a new or different issue?

It would be great if you could open a case with support so that my team and I can help you out with this issue. If you could open a case with sosreports from your cluster, a description of what you suspect, and the approximate time and date of the last withdraw on the -262 kernel that you suspect to be 690555, then we can help. You could do this either here https://access.redhat.com/support/cases/new or by calling in. Also, please feel free to point to this note in this BZ and I'm sure my colleagues will alert me to the case so that I can personally assist.

Thanks in advance.
Comment 11 Alan Brown 2011-06-27 10:27:39 EDT
As luck would have it, I just had another instance and am generating a sosreport.

It will be attached to support ticket #00353457

Thanks
Alan
Comment 13 Martin Prpic 2011-07-12 07:51:44 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Multiple GFS2 nodes attempted to unlink, rename, or manipulate files at the same time, causing various forms of file system corruption, panics, and withdraws. This update adds multiple checks for dinode's i_nlink value to assure inode operations such as link, unlink, or rename no longer cause the aforementioned problems.
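
For readers who want a picture of what "checks for dinode's i_nlink value" can mean, here is a minimal userspace sketch. It is not the attached patch and not GFS2 code: `struct sketch_inode`, `sketch_unlink`, `sketch_link` and the `max_links` parameter are hypothetical names invented for illustration. The assumption is that the link count is re-read after the cluster lock is taken, and the operation is refused if another node already dropped the inode.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a dinode; NOT the GFS2 structure. */
struct sketch_inode {
    unsigned int i_nlink; /* link count, assumed re-read under the cluster lock */
};

/* Unlink: refuse if another node already dropped the last link. */
int sketch_unlink(struct sketch_inode *ip)
{
    assert(ip != NULL);
    if (ip->i_nlink == 0)
        return -ENOENT;   /* inode is already unlinked; do not touch it */
    ip->i_nlink--;
    return 0;
}

/* Link: refuse to resurrect a dead inode or exceed the link limit. */
int sketch_link(struct sketch_inode *ip, unsigned int max_links)
{
    assert(ip != NULL);
    if (ip->i_nlink == 0)
        return -ENOENT;   /* raced with an unlink on another node */
    if (ip->i_nlink >= max_links)
        return -EMLINK;   /* link count would overflow the assumed limit */
    ip->i_nlink++;
    return 0;
}
```

In this sketch a second unlink of the same inode returns `-ENOENT` instead of decrementing the count again, which is the kind of double operation the technical note describes as a source of corruption.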
Comment 14 errata-xmlrpc 2011-07-15 02:10:44 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0927.html
