426296 – GFS2: kernel panic mounting w/o enough journals


Summary: GFS2: kernel panic mounting w/o enough journals

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	Robert Peterson
QA Contact:	GFS Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-12-19 20:39 UTC by Robert Peterson
Modified:	2009-05-28 03:38 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-12-21 20:12:10 UTC
Target Upstream Version:
Embargoed:
Dependent Products:


Description Robert Peterson 2007-12-19 20:39:53 UTC

Description of problem:
I was trying to recreate bug #425421 to see whether the problem
affected only GFS1 or both GFS1 and GFS2.  When I tried to recreate
it, I got this:

GFS2: fsid=bobs_roth:roth_lv.1: can't mount journal #1
GFS2: fsid=bobs_roth:roth_lv.1: there are only 1 journals (0 - 0)
Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP: 
 [<ffffffff80102e0d>] list_del+0x1/0x71
PGD 6fc75067 PUD 6e944067 PMD 0 
Oops: 0000 [1] SMP 
last sysfs file: /kernel/dlm/roth_lv/event_done
CPU 1 
Modules linked in: lock_dlm(U) gfs2(U) dlm(U) configfs(U) autofs4(U) hidp(U)
rfcomm(U) l2cap(U) bluetooth(U) sunrpc(U) ipv6(U) dm_multipath(U) video(U)
sbs(U) backlight(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) ac(U)
parport_pc(U) lp(U) parport(U) joydev(U) ide_cd(U) sg(U) i2c_i801(U) cdrom(U)
i2c_core(U) serio_raw(U) tg3(U) pcspkr(U) dm_snapshot(U) dm_zero(U) dm_mirror(U)
dm_mod(U) qla2xxx(U) scsi_transport_fc(U) ata_piix(U) libata(U) sd_mod(U)
scsi_mod(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
Pid: 4170, comm: mount.gfs2 Not tainted 2.6.18-prep #1
RIP: 0010:[<ffffffff80102e0d>]  [<ffffffff80102e0d>] list_del+0x1/0x71
RSP: 0018:ffff8100684eb9c8  EFLAGS: 00010203
RAX: ffff81007eafe490 RBX: 0000000000000000 RCX: ffffffff803b1a20
RDX: ffff8100684eb9d8 RSI: ffff81006b7fe520 RDI: 0000000000000000
RBP: ffff81006b7fe000 R08: 000000000000000d R09: 0000000000000020
R10: 0000000000000000 R11: 0000000000000000 R12: ffff81007eafe480
R13: 0000000000000001 R14: ffff81007f971800 R15: 0000000000000000
FS:  00002aaaaaab9230(0000) GS:ffff810002f5af40(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000068ee7000 CR4: 00000000000006e0
Process mount.gfs2 (pid: 4170, threadinfo ffff8100684ea000, task ffff81006f550830)
Stack:  0000000000000000 ffffffff88320355 ffff81007eafe480 ffff81007eafe480
 ffff81006b7fe92c 00000000ffffffea ffff81006b7fe530 ffffffff88317397
 ffff8100688b51e8 0000000000000000 ffff81006b7fe000 ffffffff8830d1e4
Call Trace:
 [<ffffffff88320355>] :gfs2:gfs2_jindex_free+0x6a/0xac
 [<ffffffff88317397>] :gfs2:init_journal+0x524/0x53a
 [<ffffffff8830d1e4>] :gfs2:gfs2_glock_nq+0x1ae/0x1d4
 [<ffffffff8830efe2>] :gfs2:iget_set+0x0/0x14
 [<ffffffff8003e594>] wake_up_bit+0x11/0x22
 [<ffffffff8830eece>] :gfs2:gfs2_inode_lookup+0x1a4/0x1fb
 [<ffffffff8830bd45>] :gfs2:gfs2_glock_put+0x26/0x133
 [<ffffffff8831fb06>] :gfs2:gfs2_jindex_hold+0x54/0x18d
 [<ffffffff8831740c>] :gfs2:init_inodes+0x5f/0x1e5
 [<ffffffff88317dce>] :gfs2:fill_super+0x450/0x5a1
 [<ffffffff8830d691>] :gfs2:gfs2_glock_nq_num+0x3b/0x68
 [<ffffffff8008599b>] test_bdev_super+0x0/0xd
 [<ffffffff8831797e>] :gfs2:fill_super+0x0/0x5a1
 [<ffffffff80086983>] get_sb_bdev+0x10a/0x164
 [<ffffffff88316a5d>] :gfs2:gfs2_get_sb+0x13/0x2f
 [<ffffffff80086320>] vfs_kern_mount+0x93/0x11a
 [<ffffffff800863e9>] do_kern_mount+0x36/0x4d
 [<ffffffff8009b542>] do_mount+0x68c/0x6ff
 [<ffffffff8003e5a5>] autoremove_wake_function+0x0/0x2e
 [<ffffffff801d04fe>] do_sock_read+0xc0/0xcb
 [<ffffffff801d0b69>] sock_aio_read+0x4f/0x5e
 [<ffffffff8023d7ba>] thread_return+0x0/0xdf
 [<ffffffff80064ba5>] __alloc_pages+0x65/0x2ce
 [<ffffffff8009b63f>] sys_mount+0x8a/0xd0
 [<ffffffff80009d2d>] tracesys+0xd5/0xe0


Code: 48 8b 47 08 48 89 fb 48 8b 10 48 39 fa 74 1b 48 89 fe 31 c0 
RIP  [<ffffffff80102e0d>] list_del+0x1/0x71
 RSP <ffff8100684eb9c8>
CR2: 0000000000000008
 <0>Kernel panic - not syncing: Fatal exception
 
Version-Release number of selected component (if applicable):
gfs2-kmod-1.53-4.2

How reproducible:
Always

Steps to Reproduce:
1. service cman start (on all nodes)
2. service clvmd start (on all nodes)
3. mkfs -j1 (create the GFS2 file system with only one journal)
4. mount -t gfs2 from the first node
5. mount -t gfs2 from the second node
  
Actual results:
Kernel panic

Expected results:
Error message (no panic)

Additional info:
I backtracked this for a little while.  The error is not related to the
recent changes for bug #253990 and similar fixes.  Also, the glock that
receives the glock callback leading to the error is the one for the
master directory.

Comment 1 Robert Peterson 2007-12-19 21:28:53 UTC

The previous error might have been caused by jd->extent_list not being
initialized on the failure path.  The following statement should be
moved from its current location to the beginning of function
gfs2_jindex_hold:

INIT_LIST_HEAD(&jd->extent_list);

Even with that change, I still get this problem:

GFS2: fsid=bobs_roth:roth_lv.1: can't mount journal #1
GFS2: fsid=bobs_roth:roth_lv.1: there are only 1 journals (0 - 0)
Unable to handle kernel NULL pointer dereference at 0000000000000108 RIP: 
 [<ffffffff8830d684>] :gfs2:gfs2_foreach_page+0x2c/0x149
PGD 6ebdc067 PUD 6f9a2067 PMD 0 
Oops: 0000 [1] SMP 
last sysfs file: /kernel/dlm/roth_lv/event_done
CPU 1 
Modules linked in: lock_dlm(U) gfs2(U) dlm(U) configfs(U) autofs4(U) hidp(U)
rfcomm(U) l2cap(U) bluetooth(U) sunrpc(U) ipv6(U) dm_multipath(U) video(U)
sbs(U) backlight(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) ac(U)
parport_pc(U) lp(U) parport(U) joydev(U) ide_cd(U) i2c_i801(U) i2c_core(U)
tg3(U) serio_raw(U) sg(U) cdrom(U) pcspkr(U) dm_snapshot(U) dm_zero(U)
dm_mirror(U) dm_mod(U) qla2xxx(U) scsi_transport_fc(U) ata_piix(U) libata(U)
sd_mod(U) scsi_mod(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
Pid: 3090, comm: lock_dlm1 Not tainted 2.6.18-prep #1
RIP: 0010:[<ffffffff8830d684>]  [<ffffffff8830d684>] :gfs2:gfs2_foreach_page+0x2c/0x149
RSP: 0018:ffff81006a5f5d30  EFLAGS: 00010286
RAX: ffff81006a975000 RBX: ffff81006a530358 RCX: 000000000000000c
RDX: 0000000000000000 RSI: ffffffff8830dfd4 RDI: ffff81006a530358
RBP: 0000000000000000 R08: ffff81006a530358 R09: ffff810000000001
R10: ffff81006a5f5ec0 R11: 0000000000000001 R12: ffff81006a975000
R13: 0000000000000000 R14: ffffffff883225c0 R15: ffff81006a530358
FS:  0000000000000000(0000) GS:ffff810002f5af40(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000108 CR3: 000000006cd65000 CR4: 00000000000006e0
Process lock_dlm1 (pid: 3090, threadinfo ffff81006a5f4000, task ffff81006a973040)
Stack:  ffffffff883225c0 ffffffff8830dfd4 0000000000000003 0000000000000002
 ffff810002c1a380 0000000100000000 0000000000000001 0000000000000086
 0000000000000000 ffff81006fa11898 ffff81006a973040 000000000000006e
Call Trace:
 [<ffffffff8830dfd4>] :gfs2:gfs2_zap_glock_buffers+0x0/0x2b
 [<ffffffff8003e205>] keventd_create_kthread+0x0/0x61
 [<ffffffff8023d7ba>] thread_return+0x0/0xdf
 [<ffffffff8830db19>] :gfs2:gfs2_meta_inval+0x10/0x24
 [<ffffffff8830dce5>] :gfs2:inode_go_inval+0x13/0x51
 [<ffffffff8830c228>] :gfs2:drop_bh+0xbf/0x14c
 [<ffffffff8830c09d>] :gfs2:gfs2_glock_cb+0xca/0x153
 [<ffffffff8837f8e1>] :lock_dlm:gdlm_thread+0x516/0x5cd
 [<ffffffff80025c3c>] default_wake_function+0x0/0xe
 [<ffffffff8003e205>] keventd_create_kthread+0x0/0x61
 [<ffffffff8837f99f>] :lock_dlm:gdlm_thread1+0x0/0xa
 [<ffffffff8003e205>] keventd_create_kthread+0x0/0x61
 [<ffffffff8003e47b>] kthread+0xd4/0x109
 [<ffffffff8000aa51>] child_rip+0xa/0x11
 [<ffffffff8003e205>] keventd_create_kthread+0x0/0x61
 [<ffffffff8003e3a7>] kthread+0x0/0x109
 [<ffffffff8000aa47>] child_rip+0x0/0x11


Code: 48 8b 92 08 01 00 00 48 89 54 24 10 2b 88 98 00 00 00 4c 8b 
RIP  [<ffffffff8830d684>] :gfs2:gfs2_foreach_page+0x2c/0x149
 RSP <ffff81006a5f5d30>
CR2: 0000000000000108
 <0>Kernel panic - not syncing: Fatal exception

Comment 2 Robert Peterson 2007-12-19 21:49:52 UTC

This works properly in the gfs2-kmod-1.53-7 kernel module, so the
problem is probably due to recent changes.

Comment 3 Robert Peterson 2007-12-21 20:12:10 UTC

I verified this works properly with the -61 kernel plus all my patches
for bug #253990.  So it definitely has something to do with the latest
round of OOM patches.  I'm closing this one out on the assumption that
the OOM fix will not break this when the fix for bug #349271
eventually ships.
