Bug 235462 - GFS2: panic while doing an rm -rf and a du -h on separate nodes
Summary: GFS2: panic while doing an rm -rf and a du -h on separate nodes
Keywords:
Status: CLOSED DUPLICATE of bug 231910
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Steve Whitehouse
QA Contact: Dean Jansa
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-04-05 21:48 UTC by Josef Bacik
Modified: 2009-05-28 03:34 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-07-31 22:33:55 UTC
Target Upstream Version:
Embargoed:


Attachments

Description Josef Bacik 2007-04-05 21:48:49 UTC
This is on the latest upstream gfs2 tree (should I be filing these bz's 
elsewhere?).

I rm -rf'ed a kernel tree on my gfs2 fs and du -h'ed the fs on another node 
and I got this panic on the box that was rm'ing.  I just wanted it to withdraw 
like in 231910 :(.

Apr  5 17:34:34 rh5cluster1 kernel: BUG: unable to handle kernel paging request at virtual address 6b6b6bcb
Apr  5 17:34:34 rh5cluster1 kernel:  printing eip:
Apr  5 17:34:34 rh5cluster1 kernel: c045d6eb
Apr  5 17:34:34 rh5cluster1 kernel: *pde = 00000000
Apr  5 17:34:34 rh5cluster1 kernel: Oops: 0000 [#1]
Apr  5 17:34:34 rh5cluster1 kernel: SMP
Apr  5 17:34:34 rh5cluster1 kernel: Modules linked in: autofs4 hidp rfcomm l2cap bluetooth lock_dlm gfs2 dlm configfs sunrpc nf_conntrack_netbios_ns nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 sg dm_multipath video sbs i2c_ec button dock battery asus_acpi ac parport_pc lp parport floppy i2c_piix4 i2c_core cfi_probe gen_probe scb2_flash ata_generic pcspkr mtdcore chipreg map_funcs pata_serverworks libata tg3 serio_raw rtc_cmos rtc_core rtc_lib dm_snapshot dm_zero dm_mirror dm_mod qla2xxx scsi_transport_fc sd_mod scsi_mod ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd
Apr  5 17:34:34 rh5cluster1 kernel: CPU:    0
Apr  5 17:34:34 rh5cluster1 kernel: EIP:    0060:[<c045d6eb>]    Not tainted VLI
Apr  5 17:34:34 rh5cluster1 kernel: EFLAGS: 00010246   (2.6.21-rc5 #2)
Apr  5 17:34:34 rh5cluster1 kernel: EIP is at __filemap_fdatawrite_range+0x29/0x67
Apr  5 17:34:34 rh5cluster1 kernel: eax: 00000001   ebx: 0000000a   ecx: 00000000   edx: 00000000
Apr  5 17:34:34 rh5cluster1 kernel: esi: 6b6b6b6b   edi: ed0edea4   ebp: ed0edeb0   esp: ed0ede70
Apr  5 17:34:34 rh5cluster1 kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
Apr  5 17:34:34 rh5cluster1 kernel: Process lock_dlm1 (pid: 3147, ti=ed0ed000 task=ed08e530 task.ti=ed0ed000)
Apr  5 17:34:34 rh5cluster1 kernel: Stack: 00000000 00000000 00000000 00000000 00000001 00000000 00000000 00000000
Apr  5 17:34:34 rh5cluster1 kernel:        00000000 00000000 00000000 00000000 00000000 e8d0f96c e8cfe134 ed0d5000
Apr  5 17:34:34 rh5cluster1 kernel:        ed0edec4 c045d92a ffffffff 7fffffff 00000001 ed0eded4 f8c49b00 e8cfe134
Apr  5 17:34:34 rh5cluster1 kernel: Call Trace:
Apr  5 17:34:34 rh5cluster1 kernel:  [<c0405ea6>] show_trace_log_lvl+0x1a/0x2f
Apr  5 17:34:34 rh5cluster1 kernel:  [<c0405f56>] show_stack_log_lvl+0x9b/0xa3
Apr  5 17:34:34 rh5cluster1 kernel:  [<c0406116>] show_registers+0x1b8/0x289
Apr  5 17:34:34 rh5cluster1 kernel:  [<c0406300>] die+0x119/0x22e
Apr  5 17:34:34 rh5cluster1 kernel:  [<c062090c>] do_page_fault+0x4e5/0x5bb
Apr  5 17:34:34 rh5cluster1 kernel:  [<c061f044>] error_code+0x7c/0x84
Apr  5 17:34:34 rh5cluster1 kernel:  [<c045d92a>] filemap_fdatawrite+0x26/0x28
Apr  5 17:34:34 rh5cluster1 kernel:  [<f8c49b00>] inode_go_sync+0x4a/0x91 [gfs2]
Apr  5 17:34:34 rh5cluster1 kernel:  [<f8c49b80>] inode_go_xmote_th+0x1e/0x21 [gfs2]
Apr  5 17:34:34 rh5cluster1 kernel:  [<f8c485db>] gfs2_glock_xmote_th+0x2f/0x167 [gfs2]
Apr  5 17:34:34 rh5cluster1 kernel:  [<f8c488e1>] run_queue+0x1ce/0x36d [gfs2]
Apr  5 17:34:34 rh5cluster1 kernel:  [<f8c48aaa>] blocking_cb+0x2a/0x3c [gfs2]
Apr  5 17:34:34 rh5cluster1 kernel:  [<f8c48af1>] gfs2_glock_cb+0x35/0x11a [gfs2]
Apr  5 17:34:34 rh5cluster1 kernel:  [<f8c1a9cd>] gdlm_thread+0x5f5/0x65d [lock_dlm]
Apr  5 17:34:34 rh5cluster1 kernel:  [<c0439093>] kthread+0xb3/0xdc
Apr  5 17:34:34 rh5cluster1 kernel:  [<c0405b4f>] kernel_thread_helper+0x7/0x10
Apr  5 17:34:34 rh5cluster1 kernel:  =======================
Apr  5 17:34:34 rh5cluster1 kernel: Code: 5d c3 55 89 e5 57 56 89 c6 53 bb 0a 00 00 00 fc 83 ec 34 31 c0 89 4d c4 8d 7d cc 89 d9 89 55 c0 f3 ab 8b 45 10 8b 55 c4 89 45 d0 <8b> 46 60 89 55 e4 8b 55 0c 01 c0 89 45 d8 8b 45 c0 89 55 ec 31
Apr  5 17:34:34 rh5cluster1 kernel: EIP: [<c045d6eb>] __filemap_fdatawrite_range+0x29/0x67 SS:ESP 0068:ed0ede70

Comment 1 Josef Bacik 2007-04-13 20:22:24 UTC
Ok, so this is what I think is happening.  We are unlinking a file while
somebody else is waiting for a lock on that inode.  The unlink destroys the
inode, but because there was a holder waiting on it, when the lock is granted
we first go to flush the data.  That fails because the inode has been freed
and poisoned, so as soon as we reference the mapping, which is now the poison
value, we die.  Now to figure out who the hell is trying to grab a lock on
the inode after we've deleted it.
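To make that sequence concrete, here is a minimal userspace sketch (hypothetical
code, not the actual gfs2/glock paths): one path frees the inode while another
still holds a pointer to it.  The kernel's slab poisoning fills freed memory
with 0x6b (POISON_FREE), so a pointer later read out of the stale inode comes
back as 0x6b6b6b6b, which matches %esi in the oops above; dereferencing a field
of that bogus pointer is the paging request at 6b6b6bcb.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the real structures; names and layout are made up for
 * illustration only. */
struct fake_mapping { unsigned long nrpages; };
struct fake_inode   { struct fake_mapping *mapping; };

int main(void)
{
	struct fake_inode *ip = malloc(sizeof(*ip));
	struct fake_mapping m = { 0 };

	ip->mapping = &m;

	/* Unlink path: the inode is destroyed.  Simulate the kernel's slab
	 * poisoning (POISON_FREE == 0x6b) instead of calling free(), so the
	 * stale contents can be inspected safely here. */
	memset(ip, 0x6b, sizeof(*ip));

	/* Later, the lock is granted to the waiting holder and the demote
	 * path tries to flush the inode's data.  It still uses the stale
	 * inode pointer, so the mapping it reads back is the poison pattern
	 * (0x6b6b6b6b on a 32-bit box); dereferencing it would fault, just
	 * like __filemap_fdatawrite_range() did in the oops. */
	printf("stale mapping pointer = %p\n", (void *)ip->mapping);

	free(ip);
	return 0;
}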

Comment 2 Steve Whitehouse 2007-06-05 13:28:19 UTC
I believe that this is fixed by Ben's patch of 2nd May: [GFS2] flush the glock
completely in inode_go_sync.

Ben/Josef, if you agree, then please close this one.



Comment 4 Ben Marzinski 2007-07-31 22:33:55 UTC
This does work with the fix for bz #231910.

*** This bug has been marked as a duplicate of 231910 ***

