Red Hat Bugzilla – Bug 711519
GFS2: resource group bitmap corruption resulting in panics and withdraws
Last modified: 2011-07-15 02:10:44 EDT
This bug has been copied from bug #690555 and has been proposed
to be backported to 5.6 z-stream (EUS).
Reassigning to myself; hope to post the 5.6.z patch shortly.
The patch was posted to rhkernel-list for inclusion into 5.6.z.
Changing status to POST.
Created attachment 504744
Patch I posted
Comments aside, this is the patch I posted.
Which test stream kernel is this in? (We're running -262 and I don't want to step back out of the fixes already in that)
Answering my own question.... It's in -261.
We're running -262 and still seeing this occasionally under heavy load.
(In reply to comment #9)
> Answering my own question.... It's in -261.
> We're running -262 and still seeing this occasionally under heavy load.
The code to resolve 690555 was tested quite heavily by Red Hat and partners. One of the partners who deployed this code has in their lab one of the most aggressive GFS2 workloads that we're aware of, and has not documented a single occurrence of this issue on this code (previously they could reliably reproduce it in 2.5 hours). We're fairly confident that BZ 690555 has been successfully resolved. Could you be hitting a new or different issue?
It would be great if you could open a case with support so that my team and I can help you out with this issue. Please include sosreports from your cluster, a description of what you suspect, and the approximate time and date of the last withdraw on the -262 kernel that you suspect to be 690555. You can open the case either here https://access.redhat.com/support/cases/new or by calling in. Also, please feel free to point to this note in this BZ and I'm sure my colleagues will alert me to the case so that I can personally assist.
Thanks in advance.
As luck would have it, I just had another instance and am generating a sosreport.
It will be attached to support ticket #00353457
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
When multiple GFS2 nodes attempted to unlink, rename, or otherwise manipulate files at the same time, various forms of file system corruption, panics, and withdraws could occur. This update adds multiple checks of the dinode's i_nlink value to ensure that inode operations such as link, unlink, or rename no longer cause the aforementioned problems.
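The idea behind the i_nlink checks can be illustrated with a minimal user-space sketch. This is not the patch from attachment 504744 and does not use real GFS2 code: the `mock_dinode` structure and the `mock_link` and `mock_unlink` functions are hypothetical names, and the choice of error codes is an assumption. It only shows the general pattern of validating the link count before changing it, so that an inode already unlinked by another node is neither resurrected nor decremented below zero.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an on-disk inode; not the real GFS2 structure. */
struct mock_dinode {
    uint32_t i_nlink;   /* number of directory entries referencing the inode */
};

/* Add a link only if the inode is still live and the count cannot overflow. */
static int mock_link(struct mock_dinode *ip)
{
    assert(ip != NULL);
    if (ip->i_nlink == 0)
        return -ENOENT;     /* already unlinked; linking would resurrect it */
    if (ip->i_nlink == UINT32_MAX)
        return -EMLINK;     /* link count would overflow */
    ip->i_nlink++;
    return 0;
}

/* Drop a link only if one remains, so the count never underflows. */
static int mock_unlink(struct mock_dinode *ip)
{
    assert(ip != NULL);
    if (ip->i_nlink == 0)
        return -ENOENT;     /* another caller already removed the last link */
    ip->i_nlink--;
    return 0;
}
```

In a clustered file system a check like this would only be meaningful once the inode has been locked and its current on-disk state read, since another node may have changed the link count in the meantime; the sketch leaves locking out entirely.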
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.