Bug 146805

Summary: Oops in gfs_trans_add_quota after shooting a gulm slave and client
Product: [Retired] Red Hat Cluster Suite Reporter: Corey Marthaler <cmarthal>
Component: gfs Assignee: Ben Marzinski <bmarzins>
Status: CLOSED WORKSFORME QA Contact: GFS Bugs <gfs-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 4 CC: kanderso, kpreslan
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-01-04 20:07:10 UTC Type: ---

Description Corey Marthaler 2005-02-01 18:25:25 UTC
Description of problem:
I hit this while running revolver. I had a healthy gulm cluster running
genesis and accordion on every node, and then shot a slave (morph-03)
and a client (morph-04). Immediately following this, morph-05 Oops'ed:

Feb  1 12:13:02 morph-05 kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000000c
Feb  1 12:13:02 morph-05 kernel:  printing eip:
Feb  1 12:13:02 morph-05 kernel: f8b69169
Feb  1 12:13:02 morph-05 kernel: *pde = 33d1d001
Feb  1 12:13:02 morph-05 kernel: Oops: 0000 [#1]
Feb  1 12:13:02 morph-05 kernel: SMP
Feb  1 12:13:02 morph-05 kernel: Modules linked in: gnbd(U) lock_nolock(U) gfs(U) lock_gulm(U) lock_harness(U)
GFS: fsid=morph-cluster:gfs0.4: warning: assertion "change" failed
Feb  1 12:13:02 morph-05 kernel: GFS: fsid=morph-cluster:gfs0.4:  function = gfs_trans_add_quota
Feb  1 12:13:02 morph-05 kernel: GFS: fsid=morph-cluster:gfs0.4:  file = /usr/src/build/512195-i686/BUILD/smp/src/gfs/trans.c, line = 365
Feb  1 12:13:02 morph-05 kernel: GFS: fsid=morph-cluster:gfs0.4:  time = 1107281581
Feb  1 12:13:02 morph-05 kernel:  lpfc md5 ipv6 parport_pc lp parport autofs4 sunrpc e1000 microcode dm_mod uhci_hcd ehci_hcd button battery ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Feb  1 12:13:02 morph-05 kernel: CPU:    1
Feb  1 12:13:02 morph-05 kernel: EIP:    0060:[<f8b69169>]    Tainted: GF     VLI
Feb  1 12:13:02 morph-05 kernel: EFLAGS: 00010213   (2.6.9-5.ELsmp)
Feb  1 12:13:02 morph-05 kernel: EIP is at gfs_quota_get+0x51/0x1d3 [gfs]
Feb  1 12:13:02 morph-05 kernel: eax: 00000000   ebx: f8b11684   ecx: 00000000   edx: 00000000
Feb  1 12:13:02 morph-05 kernel: esi: 00000001   edi: 00000000   ebp: f3926600   esp: f3c98d4c
Feb  1 12:13:02 morph-05 kernel: ds: 007b   es: 007b   ss: 0068
Feb  1 12:13:02 morph-05 kernel: Process genesis (pid: 3923, threadinfo=f3c98000 task=f3d94130)
Feb  1 12:13:02 morph-05 kernel: Stack: 00000000 00000000 00000001 f8aed000 f8aed248 00000001 f8aed000 f3926600
Feb  1 12:13:02 morph-05 kernel:        f8b6a27c 00000001 f3926604 00000000 00000000 f3c77e18 00000000 f3c77e18
Feb  1 12:13:02 morph-05 kernel:        f3926600 f3c77e18 f8b6a396 00000000 f8aed000 00000000 f3c77e18 f8aed000
Feb  1 12:13:02 morph-05 kernel: Call Trace:
Feb  1 12:13:02 morph-05 kernel:  [<f8b6a27c>] gfs_quota_hold_m+0x8f/0x131 [gfs]
Feb  1 12:13:02 morph-05 kernel:  [<f8b6a396>] gfs_quota_lock_m+0x1f/0xc9 [gfs]
Feb  1 12:13:02 morph-05 kernel:  [<f8b53802>] inode_init_and_link+0x110/0x388 [gfs]
Feb  1 12:13:02 morph-05 kernel:  [<f8b4ffc4>] gfs_glock_nq_init+0x13/0x26 [gfs]
Feb  1 12:13:02 morph-05 kernel:  [<f8b50015>] gfs_glock_nq_num+0x2e/0x71 [gfs]
Feb  1 12:13:02 morph-05 kernel:  [<f8b53c29>] gfs_createi+0x1af/0x1f1 [gfs]
Feb  1 12:13:02 morph-05 kernel:  [<f8b4eabd>] lock_on_glock+0x64/0x6a [gfs]
Feb  1 12:13:02 morph-05 kernel:  [<f8b66345>] gfs_mkdir+0x60/0x2f8 [gfs]
Feb  1 12:13:02 morph-05 kernel:  [<c016071e>] permission+0x41/0x46
Feb  1 12:13:02 morph-05 kernel:  [<c0162972>] vfs_mkdir+0xab/0xe1
Feb  1 12:13:02 morph-05 kernel:  [<c0162a2d>] sys_mkdir+0x85/0xde
Feb  1 12:13:02 morph-05 kernel:  [<c01b76a4>] atomic_dec_and_lock+0x20/0x40
Feb  1 12:13:02 morph-05 kernel:  [<c01691b5>] dput+0xa1/0x19b
Feb  1 12:13:02 morph-05 kernel:  [<c0156263>] __fput+0xda/0x100
Feb  1 12:13:02 morph-05 kernel:  [<c02c62a3>] syscall_call+0x7/0xb
Feb  1 12:13:02 morph-05 kernel: Code: 24 0c 8d 82 8c 46 02 00 e8 d5 bc 75 c7 8b 44 24 0c 8b 5c 24 0c 8b 88 84 46 02 00 81 c3 84 46 02 00 39 d9 74 2c 8b 54 24 04 89 cf <39> 51 0c 75 16 8b 41 10 a8 01 0f 95 c2 83 7c 24 08 00 0f 94 c0
Feb  1 12:13:02 morph-05 kernel:  <0>Fatal exception: panic in 5 seconds


Version-Release number of selected component (if applicable):
Gulm <CVS> (built Jan 28 2005 16:39:38) installed
GFS <CVS> (built Jan 28 2005 16:39:51) installed


How reproducible:
Didn't try

Comment 1 Kiersten (Kerri) Anderson 2005-02-09 16:16:31 UTC
Giving this one to Ben and putting it on the blocker list.

Comment 2 Ben Marzinski 2005-02-11 18:36:00 UTC
O.k. So I can't reproduce this, but looking at the trace, here's what I know: In
gfs_quota_get(), the list of quota structs is corrupted. An element of the list
has a NULL next pointer. The only thing I can say for certain is that it's not
the list_head itself that gets messed up, but one of the elements on the list.

Comment 3 Ben Marzinski 2005-02-11 18:37:44 UTC
Just to clarify, the list that has the problem is sdp->sd_quota_list

Comment 4 Ben Marzinski 2005-02-11 20:57:34 UTC
I can't reproduce this. Corey can't reproduce this.  I looked through the code
related to the sd_quota_list, and I can't find any place where this could happen.
So unless this is a reproducible problem, I don't have any way of figuring out
what happened.

Comment 5 Kiersten (Kerri) Anderson 2005-02-23 17:39:21 UTC
Removing from the blocker list. If we can recreate it, it will get added back
at that time.