450156 – GFS2: kernel panic mounting volume

Summary: GFS2: kernel panic mounting volume

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Robert Peterson
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-06-05 16:02 UTC by Robert Peterson
Modified:	2008-06-13 16:44 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-06-13 16:44:40 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Proposed patch to fix the problem (468 bytes, patch) 2008-06-09 17:04 UTC, Robert Peterson

Description Robert Peterson 2008-06-05 16:02:41 UTC

Description of problem:
Eric Sandeen had a 2TB GFS2 volume (/dev/sdc) on system
east-10.lab.bos.redhat.com.

Apparently the system was rebooted in the middle of a gfs2_fsck.
When the system came back, he tried to mount the volume, and it
panicked the kernel.  Eric says he did not use any special mount
parameters.

Version-Release number of selected component (if applicable):
RHEL5 running the 2.6.26-rc2 kernel (Linus's kernel) which is
pretty recent wrt the mounting code (ops_fstype.c, mount.c and such).

How reproducible:
Unknown.  I tried editing the superblock on one of my gfs2 volumes
so it looked the same, but I got an error message rather than a
kernel panic.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
Kernel panic

Expected results:
Error message

Additional info:
GFS2 (built May 30 2008 16:40:57) installed
BUG: unable to handle kernel NULL pointer dereference at 000000000000082c
IP: [<ffffffff804738e5>] _spin_lock_irq+0x6/0x16
PGD 11e829067 PUD 11dcdf067 PMD 0 
Oops: 0002 [1] SMP 
CPU 2 
Modules linked in: gfs2 autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6
cpufreq_ondemand dm_multipath sbs sbshc battery acpi_memhotplugd
Pid: 3904, comm: mount.gfs2 Not tainted 2.6.26-rc2 #2
RIP: 0010:[<ffffffff804738e5>]  [<ffffffff804738e5>] _spin_lock_irq+0x6/0x16
RSP: 0018:ffff81011d5efba0  EFLAGS: 00010092
RAX: 0000000000000100 RBX: 0000000000000828 RCX: 0000000000000001
RDX: ffff81021e9d8288 RSI: 0000000000000000 RDI: 000000000000082c
RBP: 0000000000000000 R08: 8000000000000000 R09: ffff81011f5cd220
R10: ffff81021e88a400 R11: ffff81011f5cd220 R12: 0000000000000828
R13: 0000000000000000 R14: ffff81021e8b3c00 R15: 0000000000000002
FS:  00007fbaf75756e0(0000) GS:ffff81011fa876c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000000082c CR3: 000000021bc9a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mount.gfs2 (pid: 3904, threadinfo ffff81011d5ee000, task ffff81011d997040)
Stack:  ffffffff804736e9 ffff81011f47a460 0000000000000007 ffff81011f47a280
 0000000000000282 0000000000000000 0000000000000000 0000000000000000
 ffffffffa039c639 ffff81011f5d3a40 ffffffff8029b896 ffff81011f5cd220
Call Trace:
 [<ffffffff804736e9>] __down_write_nested+0x12/0x8b
 [<ffffffffa039c639>] :gfs2:__gfs2_log_flush+0x1f/0x43a
 [<ffffffff8029b896>] d_kill+0x2e/0x43
 [<ffffffffa03a5baa>] :gfs2:gfs2_sync_fs+0x1a/0x1e
 [<ffffffff802c46c1>] vfs_quota_off+0x450/0x53e
 [<ffffffffa03a353f>] :gfs2:fill_super+0x0/0x731
 [<ffffffff8028dc0c>] deactivate_super+0x50/0x78
 [<ffffffff8028e2b3>] get_sb_bdev+0x10f/0x145
 [<ffffffffa03a2631>] :gfs2:gfs2_get_sb+0x13/0x2f
 [<ffffffff8028dcc7>] vfs_kern_mount+0x93/0x11b
 [<ffffffff8028dda2>] do_kern_mount+0x43/0xdc
 [<ffffffff802a23d1>] do_new_mount+0x5b/0x94
 [<ffffffff802a25c7>] do_mount+0x1bd/0x1e7
 [<ffffffff8026930a>] __alloc_pages_internal+0xe2/0x3c2
 [<ffffffff802a267b>] sys_mount+0x8a/0xcf
 [<ffffffff8020bee2>] tracesys+0xd5/0xda


Code: dc ff fe 07 48 8b 3c 24 e9 2e 3a dc ff 9c 58 fa ba 00 01 00 00 f0 66 0f c1
17 38 f2 74 06 f3 90 8a 17 eb f6 c3 fa b8 00 01 00 00  
RIP  [<ffffffff804738e5>] _spin_lock_irq+0x6/0x16
 RSP <ffff81011d5efba0>
CR2: 000000000000082c

Comment 1 Robert Peterson 2008-06-05 16:20:14 UTC

The metadata file is just over 20MB: too big to attach.

Comment 2 Robert Peterson 2008-06-09 17:04:44 UTC

Created attachment 308727
Proposed patch to fix the problem

This started with a not-too-improbable mount failure because the
locking protocol was never set back to its proper "lock_dlm" after the
system was rebooted in the middle of a gfs2_fsck.  That left a
(purposely) invalid locking protocol in the superblock, which caused an
error when the file system was mounted the next time.

When there's an error mounting, the VFS calls DQUOT_OFF, which calls
vfs_quota_off, which in turn calls gfs2_sync_fs.  Next, gfs2_sync_fs
calls gfs2_log_flush, passing s_fs_info.  But due to the error,
s_fs_info had previously been set to NULL, and so we get the kernel oops.
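
The small fault address in the oops fits this explanation: RBX and R12
hold 0x828, and RDI and CR2 hold 0x82c, which is what you would get by
taking the address of a lock embedded in a structure whose base pointer
is NULL.  The following is only a user-space sketch of that arithmetic.
The model_* names and the layout are hypothetical, chosen so the offsets
match the register values above; they are not the real kernel
definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a semaphore with its spinlock at offset 4. */
struct model_rwsem {
	int32_t activity;
	int32_t wait_lock;	/* stands in for the spinlock being taken */
};

/* Hypothetical model of the per-filesystem data; the padding is an
 * assumption chosen so the semaphore lands at 0x828 (RBX in the oops). */
struct model_sbd {
	char pad[0x828];
	struct model_rwsem log_flush_lock;
};

/* Address the lock code would touch for a given base pointer. */
static uintptr_t model_fault_address(const struct model_sbd *sdp)
{
	return (uintptr_t)sdp
		+ offsetof(struct model_sbd, log_flush_lock)
		+ offsetof(struct model_rwsem, wait_lock);
}
```

With a NULL base pointer, the computed address is 0x828 + 4 = 0x82c,
the same value the oops reports in CR2.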

My solution in this patch is to test for the NULL value before passing
it.  I tested this patch and it fixes the problem.  I will post it to
cluster-devel shortly.
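
For readers without the attachment, the NULL test described above can be
sketched in user space roughly as follows.  This is not the attached
patch: the model_* types and functions are hypothetical stand-ins for
the super block, its s_fs_info pointer, and the log flush call, and the
assert only models the point where the kernel would oops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the filesystem's private data. */
struct model_sbd {
	int flush_count;	/* number of times the log was "flushed" */
};

/* Hypothetical stand-in for the super block. */
struct model_super_block {
	void *s_fs_info;	/* NULL after a failed mount, per comment 2 */
};

/* Stand-in for the log flush: it uses sdp unconditionally. */
static void model_log_flush(struct model_sbd *sdp)
{
	assert(sdp != NULL);	/* in the kernel this would be the oops */
	sdp->flush_count++;
}

/* Guarded sync: skip the flush when the private data is NULL. */
static int model_sync_fs(struct model_super_block *sb)
{
	if (sb->s_fs_info)
		model_log_flush(sb->s_fs_info);
	return 0;
}
```

With the guard in place, a sync on a super block left over from a failed
mount returns without touching the missing private data, while a normal
super block is still flushed.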

I believe the problem was caused by changes in what the DQUOT_OFF
macro does in newer kernels.  That's why I couldn't recreate the
problem on a RHEL kernel.  I don't believe this affects RHEL.

Comment 3 Robert Peterson 2008-06-13 16:44:40 UTC

This patch is now posted to the -nmw upstream git tree for GFS2,
so I'm closing this bug as UPSTREAM.
