Description of problem:
I was trying to recreate bug #425421 to see if the problem was a GFS1-only bug or both GFS1 and GFS2. When I tried to recreate, I got this:

GFS2: fsid=bobs_roth:roth_lv.1: can't mount journal #1
GFS2: fsid=bobs_roth:roth_lv.1: there are only 1 journals (0 - 0)
Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
 [<ffffffff80102e0d>] list_del+0x1/0x71
PGD 6fc75067 PUD 6e944067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /kernel/dlm/roth_lv/event_done
CPU 1
Modules linked in: lock_dlm(U) gfs2(U) dlm(U) configfs(U) autofs4(U) hidp(U) rfcomm(U) l2cap(U) bluetooth(U) sunrpc(U) ipv6(U) dm_multipath(U) video(U) sbs(U) backlight(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U) ide_cd(U) sg(U) i2c_i801(U) cdrom(U) i2c_core(U) serio_raw(U) tg3(U) pcspkr(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) qla2xxx(U) scsi_transport_fc(U) ata_piix(U) libata(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
Pid: 4170, comm: mount.gfs2 Not tainted 2.6.18-prep #1
RIP: 0010:[<ffffffff80102e0d>] [<ffffffff80102e0d>] list_del+0x1/0x71
RSP: 0018:ffff8100684eb9c8 EFLAGS: 00010203
RAX: ffff81007eafe490 RBX: 0000000000000000 RCX: ffffffff803b1a20
RDX: ffff8100684eb9d8 RSI: ffff81006b7fe520 RDI: 0000000000000000
RBP: ffff81006b7fe000 R08: 000000000000000d R09: 0000000000000020
R10: 0000000000000000 R11: 0000000000000000 R12: ffff81007eafe480
R13: 0000000000000001 R14: ffff81007f971800 R15: 0000000000000000
FS: 00002aaaaaab9230(0000) GS:ffff810002f5af40(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000068ee7000 CR4: 00000000000006e0
Process mount.gfs2 (pid: 4170, threadinfo ffff8100684ea000, task ffff81006f550830)
Stack: 0000000000000000 ffffffff88320355 ffff81007eafe480 ffff81007eafe480
 ffff81006b7fe92c 00000000ffffffea ffff81006b7fe530 ffffffff88317397
 ffff8100688b51e8 0000000000000000 ffff81006b7fe000 ffffffff8830d1e4
Call Trace:
 [<ffffffff88320355>] :gfs2:gfs2_jindex_free+0x6a/0xac
 [<ffffffff88317397>] :gfs2:init_journal+0x524/0x53a
 [<ffffffff8830d1e4>] :gfs2:gfs2_glock_nq+0x1ae/0x1d4
 [<ffffffff8830efe2>] :gfs2:iget_set+0x0/0x14
 [<ffffffff8003e594>] wake_up_bit+0x11/0x22
 [<ffffffff8830eece>] :gfs2:gfs2_inode_lookup+0x1a4/0x1fb
 [<ffffffff8830bd45>] :gfs2:gfs2_glock_put+0x26/0x133
 [<ffffffff8831fb06>] :gfs2:gfs2_jindex_hold+0x54/0x18d
 [<ffffffff8831740c>] :gfs2:init_inodes+0x5f/0x1e5
 [<ffffffff88317dce>] :gfs2:fill_super+0x450/0x5a1
 [<ffffffff8830d691>] :gfs2:gfs2_glock_nq_num+0x3b/0x68
 [<ffffffff8008599b>] test_bdev_super+0x0/0xd
 [<ffffffff8831797e>] :gfs2:fill_super+0x0/0x5a1
 [<ffffffff80086983>] get_sb_bdev+0x10a/0x164
 [<ffffffff88316a5d>] :gfs2:gfs2_get_sb+0x13/0x2f
 [<ffffffff80086320>] vfs_kern_mount+0x93/0x11a
 [<ffffffff800863e9>] do_kern_mount+0x36/0x4d
 [<ffffffff8009b542>] do_mount+0x68c/0x6ff
 [<ffffffff8003e5a5>] autoremove_wake_function+0x0/0x2e
 [<ffffffff801d04fe>] do_sock_read+0xc0/0xcb
 [<ffffffff801d0b69>] sock_aio_read+0x4f/0x5e
 [<ffffffff8023d7ba>] thread_return+0x0/0xdf
 [<ffffffff80064ba5>] __alloc_pages+0x65/0x2ce
 [<ffffffff8009b63f>] sys_mount+0x8a/0xd0
 [<ffffffff80009d2d>] tracesys+0xd5/0xe0
Code: 48 8b 47 08 48 89 fb 48 8b 10 48 39 fa 74 1b 48 89 fe 31 c0
RIP [<ffffffff80102e0d>] list_del+0x1/0x71
 RSP <ffff8100684eb9c8>
CR2: 0000000000000008
 <0>Kernel panic - not syncing: Fatal exception

Version-Release number of selected component (if applicable):
gfs2-kmod-1.53-4.2

How reproducible:
Always

Steps to Reproduce:
1. service cman start (on all nodes)
2. service clvmd start (on all nodes)
3. mkfs -j1
4. mount -tgfs2 from first node
5. mount -tgfs2 from second node

Actual results:
Kernel panic

Expected results:
Error message (no panic)

Additional info:
I backtracked this a little while. The error is not related to recent changes for 253990 and such. Also, the glock that's getting a glock callback, resulting in the error, is the master directory.
The previous error might have been due to the fact that the extent_list is not initialized in the failure case. The statement should be moved from its current location to the beginning of function gfs2_jindex_hold:

INIT_LIST_HEAD(&jd->extent_list);

Even with that change, I still get this problem:

GFS2: fsid=bobs_roth:roth_lv.1: can't mount journal #1
GFS2: fsid=bobs_roth:roth_lv.1: there are only 1 journals (0 - 0)
Unable to handle kernel NULL pointer dereference at 0000000000000108 RIP:
 [<ffffffff8830d684>] :gfs2:gfs2_foreach_page+0x2c/0x149
PGD 6ebdc067 PUD 6f9a2067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /kernel/dlm/roth_lv/event_done
CPU 1
Modules linked in: lock_dlm(U) gfs2(U) dlm(U) configfs(U) autofs4(U) hidp(U) rfcomm(U) l2cap(U) bluetooth(U) sunrpc(U) ipv6(U) dm_multipath(U) video(U) sbs(U) backlight(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U) ide_cd(U) i2c_i801(U) i2c_core(U) tg3(U) serio_raw(U) sg(U) cdrom(U) pcspkr(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) qla2xxx(U) scsi_transport_fc(U) ata_piix(U) libata(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
Pid: 3090, comm: lock_dlm1 Not tainted 2.6.18-prep #1
RIP: 0010:[<ffffffff8830d684>] [<ffffffff8830d684>] :gfs2:gfs2_foreach_page+0x2c/0x149
RSP: 0018:ffff81006a5f5d30 EFLAGS: 00010286
RAX: ffff81006a975000 RBX: ffff81006a530358 RCX: 000000000000000c
RDX: 0000000000000000 RSI: ffffffff8830dfd4 RDI: ffff81006a530358
RBP: 0000000000000000 R08: ffff81006a530358 R09: ffff810000000001
R10: ffff81006a5f5ec0 R11: 0000000000000001 R12: ffff81006a975000
R13: 0000000000000000 R14: ffffffff883225c0 R15: ffff81006a530358
FS: 0000000000000000(0000) GS:ffff810002f5af40(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000108 CR3: 000000006cd65000 CR4: 00000000000006e0
Process lock_dlm1 (pid: 3090, threadinfo ffff81006a5f4000, task ffff81006a973040)
Stack: ffffffff883225c0 ffffffff8830dfd4 0000000000000003 0000000000000002
 ffff810002c1a380 0000000100000000 0000000000000001 0000000000000086
 0000000000000000 ffff81006fa11898 ffff81006a973040 000000000000006e
Call Trace:
 [<ffffffff8830dfd4>] :gfs2:gfs2_zap_glock_buffers+0x0/0x2b
 [<ffffffff8003e205>] keventd_create_kthread+0x0/0x61
 [<ffffffff8023d7ba>] thread_return+0x0/0xdf
 [<ffffffff8830db19>] :gfs2:gfs2_meta_inval+0x10/0x24
 [<ffffffff8830dce5>] :gfs2:inode_go_inval+0x13/0x51
 [<ffffffff8830c228>] :gfs2:drop_bh+0xbf/0x14c
 [<ffffffff8830c09d>] :gfs2:gfs2_glock_cb+0xca/0x153
 [<ffffffff8837f8e1>] :lock_dlm:gdlm_thread+0x516/0x5cd
 [<ffffffff80025c3c>] default_wake_function+0x0/0xe
 [<ffffffff8003e205>] keventd_create_kthread+0x0/0x61
 [<ffffffff8837f99f>] :lock_dlm:gdlm_thread1+0x0/0xa
 [<ffffffff8003e205>] keventd_create_kthread+0x0/0x61
 [<ffffffff8003e47b>] kthread+0xd4/0x109
 [<ffffffff8000aa51>] child_rip+0xa/0x11
 [<ffffffff8003e205>] keventd_create_kthread+0x0/0x61
 [<ffffffff8003e3a7>] kthread+0x0/0x109
 [<ffffffff8000aa47>] child_rip+0x0/0x11
Code: 48 8b 92 08 01 00 00 48 89 54 24 10 2b 88 98 00 00 00 4c 8b
RIP [<ffffffff8830d684>] :gfs2:gfs2_foreach_page+0x2c/0x149
 RSP <ffff81006a5f5d30>
CR2: 0000000000000108
 <0>Kernel panic - not syncing: Fatal exception
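For anyone following the extent_list point in this comment, here is a small userspace sketch of why an uninitialized list head is dangerous for a cleanup path. It is not the gfs2 code: struct journal_desc and the two helper functions are hypothetical stand-ins, list_head is re-implemented minimally, and it assumes the descriptor comes back zero-filled from its allocator. Under that assumption a zeroed head does not look empty, so a loop of the form "while not empty, delete the first entry" would follow a NULL next pointer, which would be consistent with the first oops (list_del called with RDI = 0, faulting at offset 8).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head that points at itself in both directions. */
static void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static int list_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Hypothetical stand-in for a journal descriptor (not the real gfs2 struct). */
struct journal_desc {
    struct list_head extent_list;
    unsigned int jid;
};

/* Zero-filled descriptor, list head never initialized (assumed failure path). */
static int zeroed_head_reports_empty(void)
{
    struct journal_desc jd;

    memset(&jd, 0, sizeof(jd));
    /* next is NULL, which is not equal to &jd.extent_list, so this is 0. */
    return list_empty(&jd.extent_list);
}

/* Same descriptor, but the head is initialized before anything can fail. */
static int initialized_head_reports_empty(void)
{
    struct journal_desc jd;

    memset(&jd, 0, sizeof(jd));
    INIT_LIST_HEAD(&jd.extent_list);
    return list_empty(&jd.extent_list);
}
```

With the head initialized right after allocation, a cleanup loop guarded by list_empty() sees an empty list and never calls list_del() at all, which is the reasoning behind moving the statement earlier.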
This works properly with the gfs2-kmod-1.53-7 kernel module, so the panic is probably due to recent changes.
I verified this works properly with the -61 kernel plus all my patches for bug #253990. So it's definitely something to do with the latest round of oom patches. I'm closing this one out on the assumption that the oom fix will not cause this to break when the fix for bug #349271 eventually ships.