This is on the latest upstream gfs2 tree (should I be filing these bz's elsewhere?). I rm -rf'ed a kernel tree on my gfs2 fs and du -h'ed the fs on another node, and I got this panic on the box that was rm'ing. I just wanted it to withdraw like in 231910 :(.

Apr 5 17:34:34 rh5cluster1 kernel: BUG: unable to handle kernel paging request at virtual address 6b6b6bcb
Apr 5 17:34:34 rh5cluster1 kernel: printing eip:
Apr 5 17:34:34 rh5cluster1 kernel: c045d6eb
Apr 5 17:34:34 rh5cluster1 kernel: *pde = 00000000
Apr 5 17:34:34 rh5cluster1 kernel: Oops: 0000 [#1]
Apr 5 17:34:34 rh5cluster1 kernel: SMP
Apr 5 17:34:34 rh5cluster1 kernel: Modules linked in: autofs4 hidp rfcomm l2cap bluetooth lock_dlm gfs2 dlm configfs sunrpc nf_conntrack_netbios_ns nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 sg dm_multipath video sbs i2c_ec button dock battery asus_acpi ac parport_pc lp parport floppy i2c_piix4 i2c_core cfi_probe gen_probe scb2_flash ata_generic pcspkr mtdcore chipreg map_funcs pata_serverworks libata tg3 serio_raw rtc_cmos rtc_core rtc_lib dm_snapshot dm_zero dm_mirror dm_mod qla2xxx scsi_transport_fc sd_mod scsi_mod ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd
Apr 5 17:34:34 rh5cluster1 kernel: CPU: 0
Apr 5 17:34:34 rh5cluster1 kernel: EIP: 0060:[<c045d6eb>] Not tainted VLI
Apr 5 17:34:34 rh5cluster1 kernel: EFLAGS: 00010246 (2.6.21-rc5 #2)
Apr 5 17:34:34 rh5cluster1 kernel: EIP is at __filemap_fdatawrite_range+0x29/0x67
Apr 5 17:34:34 rh5cluster1 kernel: eax: 00000001 ebx: 0000000a ecx: 00000000 edx: 00000000
Apr 5 17:34:34 rh5cluster1 kernel: esi: 6b6b6b6b edi: ed0edea4 ebp: ed0edeb0 esp: ed0ede70
Apr 5 17:34:34 rh5cluster1 kernel: ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068
Apr 5 17:34:34 rh5cluster1 kernel: Process lock_dlm1 (pid: 3147, ti=ed0ed000 task=ed08e530 task.ti=ed0ed000)
Apr 5 17:34:34 rh5cluster1 kernel: Stack: 00000000 00000000 00000000 00000000 00000001 00000000 00000000 00000000
Apr 5 17:34:34 rh5cluster1 kernel: 00000000 00000000 00000000 00000000 00000000 e8d0f96c e8cfe134 ed0d5000
Apr 5 17:34:34 rh5cluster1 kernel: ed0edec4 c045d92a ffffffff 7fffffff 00000001 ed0eded4 f8c49b00 e8cfe134
Apr 5 17:34:34 rh5cluster1 kernel: Call Trace:
Apr 5 17:34:34 rh5cluster1 kernel: [<c0405ea6>] show_trace_log_lvl+0x1a/0x2f
Apr 5 17:34:34 rh5cluster1 kernel: [<c0405f56>] show_stack_log_lvl+0x9b/0xa3
Apr 5 17:34:34 rh5cluster1 kernel: [<c0406116>] show_registers+0x1b8/0x289
Apr 5 17:34:34 rh5cluster1 kernel: [<c0406300>] die+0x119/0x22e
Apr 5 17:34:34 rh5cluster1 kernel: [<c062090c>] do_page_fault+0x4e5/0x5bb
Apr 5 17:34:34 rh5cluster1 kernel: [<c061f044>] error_code+0x7c/0x84
Apr 5 17:34:34 rh5cluster1 kernel: [<c045d92a>] filemap_fdatawrite+0x26/0x28
Apr 5 17:34:34 rh5cluster1 kernel: [<f8c49b00>] inode_go_sync+0x4a/0x91 [gfs2]
Apr 5 17:34:34 rh5cluster1 kernel: [<f8c49b80>] inode_go_xmote_th+0x1e/0x21 [gfs2]
Apr 5 17:34:34 rh5cluster1 kernel: [<f8c485db>] gfs2_glock_xmote_th+0x2f/0x167 [gfs2]
Apr 5 17:34:34 rh5cluster1 kernel: [<f8c488e1>] run_queue+0x1ce/0x36d [gfs2]
Apr 5 17:34:34 rh5cluster1 kernel: [<f8c48aaa>] blocking_cb+0x2a/0x3c [gfs2]
Apr 5 17:34:34 rh5cluster1 kernel: [<f8c48af1>] gfs2_glock_cb+0x35/0x11a [gfs2]
Apr 5 17:34:34 rh5cluster1 kernel: [<f8c1a9cd>] gdlm_thread+0x5f5/0x65d [lock_dlm]
Apr 5 17:34:34 rh5cluster1 kernel: [<c0439093>] kthread+0xb3/0xdc
Apr 5 17:34:34 rh5cluster1 kernel: [<c0405b4f>] kernel_thread_helper+0x7/0x10
Apr 5 17:34:34 rh5cluster1 kernel: =======================
Apr 5 17:34:34 rh5cluster1 kernel: Code: 5d c3 55 89 e5 57 56 89 c6 53 bb 0a 00 00 00 fc 83 ec 34 31 c0 89 4d c4 8d 7d cc 89 d9 89 55 c0 f3 ab 8b 45 10 8b 55 c4 89 45 d0 <8b> 46 60 89 55 e4 8b 55 0c 01 c0 89 45 d8 8b 45 c0 89 55 ec 31
Apr 5 17:34:34 rh5cluster1 kernel: EIP: [<c045d6eb>] __filemap_fdatawrite_range+0x29/0x67 SS:ESP 0068:ed0ede70
Ok so this is what I think is happening. We are unlinking a file while somebody else is waiting for a lock on that inode. The unlink destroys the inode, but because there was a holder waiting on it, when the lock is granted it first goes to flush the data. That blows up because the inode has been slab-poisoned, so the moment we dereference the mapping, which is now the poison value, we die. Now to figure out who the hell is trying to grab a lock on the inode after we've deleted it.
I believe that this is fixed by Ben's patch of 2nd May: [GFS2] flush the glock completely in inode_go_sync. Ben/Josef, if you agree, then please close this one.
This does work with the fix for bz #231910.

*** This bug has been marked as a duplicate of 231910 ***