Bug 515267

Summary: NFS over GFS problem - invalid metadata block
Product: Red Hat Enterprise Linux 4 Reporter: Robert Peterson <rpeterso>
Component: GFS-kernel Assignee: Robert Peterson <rpeterso>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.9 CC: adas, ben.yarwood, bernhard.furtmueller, bkahn, bmarzins, bstevens, cfeist, edamato, grimme, hklein, jplans, mchristi, merz, michael.hagmann, pcfe, ra, rdassen, revers, rpeterso, rrottmann, swhiteho, tao
Target Milestone: --- Keywords: OtherQA, Reopened
Target Release: 4.9   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: GFS-kernel-2.6.9-86.1.el4 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 455696
: 674403 Environment:
Last Closed: 2011-02-16 16:34:06 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 455696    
Bug Blocks: 674403    
Attachments:
Description Flags
Patch to try none

Comment 1 Robert Peterson 2009-08-03 15:05:19 UTC
Created attachment 356047
Patch to try

This patch fixed the problem on my cluster.  I'd like the users
to try it and report whether it worked properly for them.

Comment 2 Robert Peterson 2009-08-03 15:06:17 UTC
Setting NEEDINFO flag until I hear back on the results from the
patch in comment #1.

Comment 4 Robert Peterson 2010-02-04 22:34:31 UTC
It's been six months and I still have not heard whether the
patch fixes the customer's problem.  I'll close this as
INSUFFICIENT_DATA for now.  If the results come in, we can re-open
it.

Comment 5 Thomas Merz 2010-06-07 12:59:51 UTC
We were not able to reproduce the issue using the newest Red Hat-provided RPMs for RHEL4, so the problem seems to be fixed.

Comment 7 Thomas Merz 2010-06-15 16:34:33 UTC
A small addition to my comment #5:
With the "patch to try" and the newest Red Hat-provided packages, we were not
able to reproduce the issue.

Comment 8 Robert Peterson 2010-06-16 13:32:19 UTC
I'll try to get this patch into 4.9 then.  Requesting ack flags
accordingly.

Comment 9 Robert Peterson 2010-06-17 21:06:38 UTC
The patch was pushed to the RHEL4 and RHEL49 branches of the
cluster git tree for inclusion into 4.9.  It was tested by
me a long time ago on the trin cluster, and by various customers
as shown in comment #6 above.  Changing status to POST.
Chris Feist does the builds for RHEL4 so I'm reassigning to him
to get this into a build.

Comment 11 Nate Straz 2011-01-14 21:16:04 UTC
I wrote a new regression test and was able to recreate the bug using RHEL 4.7.  I will let the regression test run on 4.9 over the weekend before marking this verified.

Comment 12 Nate Straz 2011-01-17 14:47:42 UTC
I hit the get_leaf assertion while running the new regression test.

GFS: fsid=dash-cluster:dash-cluster0.2: fatal: invalid metadata block
GFS: fsid=dash-cluster:dash-cluster0.2:   bh = 654416609 (type: exp=6, found=0)
GFS: fsid=dash-cluster:dash-cluster0.2:   function = get_leaf
GFS: fsid=dash-cluster:dash-cluster0.2:   file = /builddir/build/BUILD/gfs-kernel-2.6.9-87/up/src/gfs/dir.c, line = 438
GFS: fsid=dash-cluster:dash-cluster0.2:   time = 1295140811
GFS: fsid=dash-cluster:dash-cluster0.2: about to withdraw from the cluster
GFS: fsid=dash-cluster:dash-cluster0.2: waiting for outstanding I/O
------------[ cut here ]------------
kernel BUG at /builddir/build/BUILD/gfs-kernel-2.6.9-87/up/src/gfs/lm.c:190!
invalid operand: 0000 [#1]
Modules linked in: vfat fat nfs nfsd exportfs lockd nfs_acl lock_dlm(U) dm_cmirror(U) gnbd(U) lock_nolock(U) gfs(U) lock_harness(U) dlm(U) cman(U) parport_pc lp parport autofs4 i2c_dev i2c_core md5 ipv6 sunrpc cpufreq_powersave button battery ac uhci_hcd ehci_hcd i3000_edac edac_mc tg3 qla2400 qla2xxx scsi_transport_fc dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata sd_mod scsi_mod
CPU:    0
EIP:    0060:[<f912546c>]    Not tainted VLI
EFLAGS: 00010202   (2.6.9-94.EL) 
EIP is at gfs_lm_withdraw+0x50/0xbc [gfs]
eax: 00000044   ebx: f916f94c   ecx: f9148456   edx: dfb09da4
esi: f915b000   edi: 00000000   ebp: f915b000   esp: dfb09db8
ds: 007b   es: 007b   ss: 0068
Process find (pid: 10901, threadinfo=dfb09000 task=f5b412a0)
Stack: f916f94c cb3fd400 f9144647 f915b000 f914cb87 f916f94c f916f94c 27019ae1 
       00000000 00000006 00000000 f916f94c f9144a0f f916f94c f91464be 000001b6 
       f916f94c 4d3247cb ecb7cf1c f910e544 00000000 f9144a0f f91464be 000001b6 
Call Trace:
 [<f9144647>] gfs_metatype_check_ii+0x34/0x3f [gfs]
 [<f910e544>] get_leaf+0xc1/0xd5 [gfs]
 [<f911051d>] dir_e_read+0x1f2/0x2c9 [gfs]
 [<f9110c24>] gfs_dir_read+0x18/0x25 [gfs]
 [<f9131a9d>] filldir_reg_func+0x0/0x12c [gfs]
 [<f9131cd3>] readdir_reg+0x10a/0x12c [gfs]
 [<f9131a9d>] filldir_reg_func+0x0/0x12c [gfs]
 [<c0183d99>] filldir64+0x0/0x11a
 [<c0183d99>] filldir64+0x0/0x11a
 [<c0183d99>] filldir64+0x0/0x11a
 [<f9132098>] gfs_readdir+0x4e/0x5b [gfs]
 [<c0183a02>] vfs_readdir+0x8a/0xb7
 [<c018404f>] sys_getdents64+0x80/0xba
 [<c03246eb>] syscall_call+0x7/0xb
 [<c032007b>] packet_recvmsg+0xef/0x11a
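
For anyone comparing withdraw messages across nodes or test runs, the
"GFS: fsid=..." lines above can be pulled apart with a short script. This is
only a sketch: parse_gfs_withdraw and its dictionary keys are hypothetical
names invented for this example, not part of GFS or any Red Hat tool, and the
regular expressions assume the exact message layout shown in this comment.

```python
import re
from datetime import datetime, timezone

# Sample copied from the withdraw messages in this comment.
SAMPLE = """\
GFS: fsid=dash-cluster:dash-cluster0.2: fatal: invalid metadata block
GFS: fsid=dash-cluster:dash-cluster0.2:   bh = 654416609 (type: exp=6, found=0)
GFS: fsid=dash-cluster:dash-cluster0.2:   function = get_leaf
GFS: fsid=dash-cluster:dash-cluster0.2:   file = /builddir/build/BUILD/gfs-kernel-2.6.9-87/up/src/gfs/dir.c, line = 438
GFS: fsid=dash-cluster:dash-cluster0.2:   time = 1295140811
"""

# Assumed layout: "GFS: fsid=<fsid>: <message body>"
LINE_RE = re.compile(r"^GFS: fsid=(?P<fsid>\S+):\s+(?P<body>.*)$")
# Assumed layout: "bh = <number> (type: exp=<n>, found=<n>)"
TYPE_RE = re.compile(
    r"bh = (?P<bh>\d+) \(type: exp=(?P<exp>\d+), found=(?P<found>\d+)\)"
)


def parse_gfs_withdraw(log_text):
    """Collect the fields of one withdraw report into a dict (hypothetical helper)."""
    info = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip oops/register lines and anything else
        info.setdefault("fsid", m.group("fsid"))
        body = m.group("body")
        t = TYPE_RE.search(body)
        if t:
            info["bh"] = int(t.group("bh"))
            info["expected_type"] = int(t.group("exp"))
            info["found_type"] = int(t.group("found"))
        elif body.startswith("function = "):
            info["function"] = body[len("function = "):]
        elif body.startswith("file = "):
            path, sep, lineno = body[len("file = "):].rpartition(", line = ")
            if sep:
                info["file"] = path
                info["line"] = int(lineno)
        elif body.startswith("time = "):
            # The value is treated as seconds since the Unix epoch (UTC).
            seconds = int(body[len("time = "):])
            info["time"] = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return info


if __name__ == "__main__":
    for key, value in parse_gfs_withdraw(SAMPLE).items():
        print(f"{key}: {value}")
```

Read as a Unix timestamp, the "time" field places the withdraw on
2011-01-16 at 01:20:11 UTC, which fits the weekend run mentioned in
comment 11.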

Comment 13 Robert Peterson 2011-02-01 18:59:18 UTC
We discussed this problem in our weekly meeting.  We decided
that the patch makes things better, not worse, so although
the problem apparently isn't completely fixed, shipping the
patch in 4.9 is better than not shipping it.

Bug #674403 was opened to address any ongoing issues.
Changing status to ON_QA.

Comment 14 errata-xmlrpc 2011-02-16 16:34:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0276.html