Bug 450156 - GFS2: kernel panic mounting volume
Status: CLOSED UPSTREAM
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Robert Peterson
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-06-05 12:02 EDT by Robert Peterson
Modified: 2008-06-13 12:44 EDT (History)
Doc Type: Bug Fix
Last Closed: 2008-06-13 12:44:40 EDT


Attachments:
Proposed patch to fix the problem (468 bytes, patch)
2008-06-09 13:04 EDT, Robert Peterson

Description Robert Peterson 2008-06-05 12:02:41 EDT
Description of problem:
Eric Sandeen had a 2TB GFS2 volume (/dev/sdc) on system
east-10.lab.bos.redhat.com.

Apparently the system was rebooted in the middle of a gfs2_fsck.
When the system came back, he tried to mount the volume, and it
panicked the kernel.  Eric says he did not use any special mount
parameters.

Version-Release number of selected component (if applicable):
RHEL5 running the 2.6.26-rc2 kernel (Linus's kernel), which is
pretty recent with respect to the mounting code (ops_fstype.c, mount.c
and such).

How reproducible:
Unknown.  I tried editing the superblock on one of my gfs2 volumes
so it looked the same, but I got an error message rather than a
kernel panic.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
Kernel panic

Expected results:
Error message

Additional info:
GFS2 (built May 30 2008 16:40:57) installed
BUG: unable to handle kernel NULL pointer dereference at 000000000000082c
IP: [<ffffffff804738e5>] _spin_lock_irq+0x6/0x16
PGD 11e829067 PUD 11dcdf067 PMD 0 
Oops: 0002 [1] SMP 
CPU 2 
Modules linked in: gfs2 autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6
cpufreq_ondemand dm_multipath sbs sbshc battery acpi_memhotplugd
Pid: 3904, comm: mount.gfs2 Not tainted 2.6.26-rc2 #2
RIP: 0010:[<ffffffff804738e5>]  [<ffffffff804738e5>] _spin_lock_irq+0x6/0x16
RSP: 0018:ffff81011d5efba0  EFLAGS: 00010092
RAX: 0000000000000100 RBX: 0000000000000828 RCX: 0000000000000001
RDX: ffff81021e9d8288 RSI: 0000000000000000 RDI: 000000000000082c
RBP: 0000000000000000 R08: 8000000000000000 R09: ffff81011f5cd220
R10: ffff81021e88a400 R11: ffff81011f5cd220 R12: 0000000000000828
R13: 0000000000000000 R14: ffff81021e8b3c00 R15: 0000000000000002
FS:  00007fbaf75756e0(0000) GS:ffff81011fa876c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000000082c CR3: 000000021bc9a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mount.gfs2 (pid: 3904, threadinfo ffff81011d5ee000, task ffff81011d997040)
Stack:  ffffffff804736e9 ffff81011f47a460 0000000000000007 ffff81011f47a280
 0000000000000282 0000000000000000 0000000000000000 0000000000000000
 ffffffffa039c639 ffff81011f5d3a40 ffffffff8029b896 ffff81011f5cd220
Call Trace:
 [<ffffffff804736e9>] __down_write_nested+0x12/0x8b
 [<ffffffffa039c639>] :gfs2:__gfs2_log_flush+0x1f/0x43a
 [<ffffffff8029b896>] d_kill+0x2e/0x43
 [<ffffffffa03a5baa>] :gfs2:gfs2_sync_fs+0x1a/0x1e
 [<ffffffff802c46c1>] vfs_quota_off+0x450/0x53e
 [<ffffffffa03a353f>] :gfs2:fill_super+0x0/0x731
 [<ffffffff8028dc0c>] deactivate_super+0x50/0x78
 [<ffffffff8028e2b3>] get_sb_bdev+0x10f/0x145
 [<ffffffffa03a2631>] :gfs2:gfs2_get_sb+0x13/0x2f
 [<ffffffff8028dcc7>] vfs_kern_mount+0x93/0x11b
 [<ffffffff8028dda2>] do_kern_mount+0x43/0xdc
 [<ffffffff802a23d1>] do_new_mount+0x5b/0x94
 [<ffffffff802a25c7>] do_mount+0x1bd/0x1e7
 [<ffffffff8026930a>] __alloc_pages_internal+0xe2/0x3c2
 [<ffffffff802a267b>] sys_mount+0x8a/0xcf
 [<ffffffff8020bee2>] tracesys+0xd5/0xda


Code: dc ff fe 07 48 8b 3c 24 e9 2e 3a dc ff 9c 58 fa ba 00 01 00 00 f0 66 0f c1
17 38 f2 74 06 f3 90 8a 17 eb f6 c3 fa b8 00 01 00 00  
RIP  [<ffffffff804738e5>] _spin_lock_irq+0x6/0x16
 RSP <ffff81011d5efba0>
CR2: 000000000000082c
Comment 1 Robert Peterson 2008-06-05 12:20:14 EDT
The metadata file is just over 20MB: too big to attach.
Comment 2 Robert Peterson 2008-06-09 13:04:44 EDT
Created attachment 308727
Proposed patch to fix the problem

This started with a not-too-improbable mount failure because the
locking protocol was never set back to its proper "lock_dlm" after the
system was rebooted in the middle of a gfs2_fsck.  That left a
(purposely) invalid locking protocol in the superblock, which caused an
error when the file system was mounted the next time.

When there's an error mounting, the VFS calls DQUOT_OFF, which calls
vfs_quota_off, which calls gfs2_sync_fs.  Next, gfs2_sync_fs calls
gfs2_log_flush, passing s_fs_info.  But because of the error, s_fs_info
had already been set to NULL, and so we have the kernel oops.

My solution in this patch is to test for the NULL value before passing
it.  I tested this patch and it fixes the problem.  I will post it to
cluster-devel shortly.
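
For illustration only, the NULL test described above can be sketched in
user-space C.  This is not the attached patch: the struct definitions are
cut-down stand-ins for the kernel's, and the names mock_log_flush,
sketch_sync_fs and the log_flushes counter are hypothetical.  The `wait`
parameter is an assumption about the sync_fs signature.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures (assumption: the real
 * ones have many more fields). */
struct gfs2_sbd {
	int log_flushes;	/* hypothetical counter, illustration only */
};

struct super_block {
	void *s_fs_info;	/* NULL after a failed mount */
};

/* Stand-in for gfs2_log_flush: it touches sdp right away, so a NULL
 * sdp models the oops seen in the trace. */
static void mock_log_flush(struct gfs2_sbd *sdp)
{
	assert(sdp != NULL);
	sdp->log_flushes++;
}

/* Guarded sync: only flush when the superblock still carries its
 * filesystem-private data. */
static int sketch_sync_fs(struct super_block *sb, int wait)
{
	if (wait && sb->s_fs_info)
		mock_log_flush(sb->s_fs_info);
	return 0;
}
```

With the guard in place, calling sketch_sync_fs on a super_block whose
s_fs_info is NULL returns 0 without reaching the flush, which matches the
expected result of an error message rather than a panic.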

I believe the problem was caused by changes in what the DQUOT_OFF
macro does in newer kernels.  That's why I couldn't recreate the
problem on a RHEL kernel.  I don't believe this affects RHEL.
Comment 3 Robert Peterson 2008-06-13 12:44:40 EDT
This patch is now posted to the -nmw upstream git tree for GFS2,
so I'm closing this bug as UPSTREAM.
