Bug 606447 - GFS2 - kernel BUG at fs/gfs2/lm.c:109
Status: CLOSED DUPLICATE of bug 586008
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Platform: All Linux
Priority: low
Severity: medium
Assigned To: Abhijith Das
QA Contact: Cluster QE
Doc Type: Bug Fix
Reported: 2010-06-21 12:05 EDT by Jaroslav Kortus
Modified: 2010-11-09 08:09 EST
Last Closed: 2010-07-02 16:23:30 EDT

Attachments
reproducer (660 bytes, application/x-sh), 2010-06-21 12:06 EDT, Jaroslav Kortus
metadata from broken FS (1.12 MB, application/x-bzip2), 2010-06-21 12:07 EDT, Jaroslav Kortus

Description Jaroslav Kortus 2010-06-21 12:05:40 EDT
Description of problem:
During testing for bug 586006 I found a different problem:
Kernel OOPS:
GFS2: fsid=a_cluster:vedder0.2: fatal: invalid metadata block
GFS2: fsid=a_cluster:vedder0.2:   bh = 35502140 (magic number)
GFS2: fsid=a_cluster:vedder0.2:   function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 334
GFS2: fsid=a_cluster:vedder0.2: about to withdraw this file system
kernel BUG at fs/gfs2/lm.c:109!
pdflush[143]: bugcheck! 0 [1]
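The "invalid metadata block" message above comes from GFS2's metadata sanity check: every metadata block begins with a header whose magic field must equal the on-disk constant GFS2_MAGIC (0x01161970, from gfs2_ondisk.h); a mismatch sends the filesystem down the withdraw path, which is where the BUG at lm.c:109 fired. A minimal user-space sketch of that check (struct layout simplified; `check_magic` is a hypothetical stand-in for the kernel's check, and the big-endian byte-swap the kernel performs on the on-disk value is omitted):

```c
#include <stdint.h>

#define GFS2_MAGIC 0x01161970u  /* on-disk magic, from gfs2_ondisk.h */

/* Simplified start of every GFS2 metadata block header. */
struct gfs2_meta_header {
    uint32_t mh_magic;  /* must be GFS2_MAGIC */
    uint32_t mh_type;   /* metadata type (dinode, indirect, rgrp, ...) */
};

/* Returns 0 if the header carries the GFS2 magic, -1 otherwise.
 * In the kernel, a failure here leads to gfs2_lm_withdraw(), the
 * function named in the oops backtrace above. */
static int check_magic(const struct gfs2_meta_header *mh)
{
    return (mh->mh_magic == GFS2_MAGIC) ? 0 : -1;
}
```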


Version-Release number of selected component (if applicable):
2.6.18-194.3.1.el5

How reproducible:
About 20% of the time.
I could not reproduce this with a kernel containing the fix for bug 586006, so the two may be related. I have not hit it without the quota=on mount option either.

Steps to Reproduce:
1. Create a cluster and a GFS2 filesystem mounted with the -o quota=on option.
2. Run the reproducer for a couple of minutes.
3. Observe the crash; it will actually be the oops from bug 586006, hit many times.
  
Actual results:
oops, metadata corrupted

Expected results:
no oops

Additional info:
 GFS2: fsid=a_cluster:vedder0.2: fatal: invalid metadata block
GFS2: fsid=a_cluster:vedder0.2:   bh = 35502140 (magic number)
GFS2: fsid=a_cluster:vedder0.2:   function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 334
GFS2: fsid=a_cluster:vedder0.2: about to withdraw this file system
kernel BUG at fs/gfs2/lm.c:109!
pdflush[143]: bugcheck! 0 [1]
Modules linked in: nfs fscache nfs_acl lock_dlm gfs2 dlm configfs autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ipv6 xfrm_nalgo crypto_api vfat fat dm_multipath scsi_dh wmi power_meter hwmon button parport_pc lp parport sg lpfc scsi_transport_fc ide_cd e1000 cdrom dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd

Pid: 143, CPU 1, comm:              pdflush
psr : 00001010085a6010 ifs : 800000000000060f ip  : [<a0000002031060d0>]    Not tainted (2.6.18-194.3.1.el5)
ip is at gfs2_lm_withdraw+0x190/0x2a0 [gfs2]
unat: 0000000000000000 pfs : 000000000000060f rsc : 0000000000000003
rnat: a000000100b23668 bsps: 0000000000000004 pr  : 000000000000a541
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000002031060d0 b6  : a000000100011000 b7  : a0000001002b1c00
f6  : 1003e00000000000000a0 f7  : 1003e20c49ba5e353f7cf
f8  : 1003e00000000000004e2 f9  : 1003e000000000fa00000
f10 : 1003e000000003b9aca00 f11 : 1003e431bde82d7b634db
r1  : a000000100c478a0 r2  : a000000100a60750 r3  : a00000010098c060
r8  : 0000000000000023 r9  : a000000100a60780 r10 : a000000100a60780
r11 : 0000000000000000 r12 : e0000001feb1fb20 r13 : e0000001feb18000
r14 : a000000100a60750 r15 : 0000000000000000 r16 : a00000010098c068
r17 : e000000106767e18 r18 : 0000000000000000 r19 : 0000000000000000
r20 : a000000100889300 r21 : a000000100a47f20 r22 : a000000100a60758
r23 : a000000100a60758 r24 : a00000010080d054 r25 : 0000000000000000
r26 : a00000010080d05c r27 : a00000010080d040 r28 : a00000010080c008
r29 : 0000063ff9c00000 r30 : 0000000000000000 r31 : 0000000000000000
Call Trace:
 [<a000000100013b40>] show_stack+0x40/0xa0
                                sp=e0000001feb1f6b0 bsp=e0000001feb19498
 [<a000000100014470>] show_regs+0x870/0x8c0
                                sp=e0000001feb1f880 bsp=e0000001feb19440
 [<a000000100037e20>] die+0x1c0/0x2c0
Comment 1 Jaroslav Kortus 2010-06-21 12:06:29 EDT
Created attachment 425683 [details]
reproducer
Comment 2 Jaroslav Kortus 2010-06-21 12:07:41 EDT
Created attachment 425684 [details]
metadata from broken FS
Comment 3 Jaroslav Kortus 2010-06-21 12:09:14 EDT
gfs2_fsck is not able to fix the filesystem:

gfs2_fsck -yvvvvvv /dev/vedder/vedder0 
Initializing fsck
Initializing lists...
jid=0: Looking at journal...
jid=0: Journal is clean.
jid=1: Looking at journal...
jid=1: Journal is clean.
jid=2: Looking at journal...
jid=2: Replaying journal...
jid=2: Failed
Recovering journals (this may take a while)
(initialize.c:401)      <backtrace> - initialize()
#
Comment 4 Jaroslav Kortus 2010-06-21 12:13:15 EDT
Logs from a2 show that the node was fenced and the journal was replayed:

Jun 21 10:26:18 a2 fenced[4551]: fence "a1" success 
Jun 21 10:26:18 a2 kernel: GFS2: fsid=a_cluster:vedder0.0: jid=2: Trying to acquire journal lock...
Jun 21 10:26:18 a2 kernel: GFS2: fsid=a_cluster:vedder0.0: jid=2: Looking at journal...
Jun 21 10:26:18 a2 kernel: GFS2: fsid=a_cluster:vedder0.0: jid=2: Acquiring the transaction lock...
Jun 21 10:26:18 a2 kernel: GFS2: fsid=a_cluster:vedder0.0: jid=2: Replaying journal...
Jun 21 10:26:18 a2 kernel: GFS2: fsid=a_cluster:vedder0.0: jid=2: Replayed 5344 of 5345 blocks
Jun 21 10:26:18 a2 kernel: GFS2: fsid=a_cluster:vedder0.0: jid=2: Found 1 revoke tags
Jun 21 10:26:18 a2 kernel: GFS2: fsid=a_cluster:vedder0.0: jid=2: Journal replayed in 1s
Jun 21 10:26:18 a2 kernel: GFS2: fsid=a_cluster:vedder0.0: jid=2: Done
Comment 5 Robert Peterson 2010-06-22 11:45:12 EDT
I highly suspect this is a duplicate of Abhi's quota bug, but I'm reassigning the bug to him to make that assessment.
Comment 7 Abhijith Das 2010-07-02 08:50:54 EDT
I believe this is a duplicate of bug 586008. I'm requesting needinfo so the reporter can verify that this is the case. According to bug 586006, the fix went into 2.6.18-194.4.1.el5.
Comment 8 Jaroslav Kortus 2010-07-02 15:16:02 EDT
Can't reproduce it on 2.6.18-194.8.1.el5 (current RHN 5.5), ia64. The symptoms were most probably related, so I'd suggest closing this as a duplicate; I will reopen it if it pops up again.
Comment 9 Abhijith Das 2010-07-02 16:23:30 EDT
Closing as duplicate of bug 586008 which has been fixed already.

*** This bug has been marked as a duplicate of bug 586008 ***
