Bug 169010 - kernel BUG at include/asm/spinlock.h:109!
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: dlm
Version: 4
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: David Teigland
QA Contact: Cluster QE
Reported: 2005-09-22 00:37 EDT by Wendy Cheng
Modified: 2009-04-16 16:30 EDT (History)
Doc Type: Bug Fix
Last Closed: 2006-09-22 15:09:44 EDT
Description Wendy Cheng 2005-09-22 00:37:52 EDT
Description of problem:
The 2.6.9-11.ELsmp kernel panicked with the following backtrace:

Sep 17 11:06:03 fs1 php: PHP Notice:  Undefined index:  cases_id in
/opt/lib/PIS/DVI.php on line 326
Sep 17 11:06:05 fs1 php: PHP Notice:  Undefined index:  cases_id in
/opt/lib/PIS/DVI.php on line 326
Sep 17 11:06:05 fs1 crond(pam_unix)[23172]: session closed for user root
Sep 17 11:06:41 fs1 kernel: ------------[ cut here ]------------
Sep 17 11:06:41 fs1 kernel: kernel BUG at include/asm/spinlock.h:109!
Sep 17 11:06:41 fs1 kernel: invalid operand: 0000 [#1]
Sep 17 11:06:41 fs1 kernel: SMP
Sep 17 11:06:41 fs1 kernel: Modules linked in: iptable_nat ip_conntrack
iptable_filter ip_tables nfs nfsd exportfs lockd autofs4 i2c_dev i2c_core
lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 sunrpc dm_mod button
battery ac uhci_hcd ehci_hcd e1000 floppy sg ext3 jbd megaraid_mbox megaraid_mm
sd_mod scsi_mod
Sep 17 11:06:41 fs1 kernel: CPU:    0
Sep 17 11:06:41 fs1 kernel: EIP:    0060:[<c02c5f6e>]    Not tainted VLI
Sep 17 11:06:41 fs1 kernel: EFLAGS: 00010202   (2.6.9-11.ELsmp)
Sep 17 11:06:41 fs1 kernel: EIP is at _spin_unlock+0x1c/0x27
Sep 17 11:06:41 fs1 kernel: eax: 00000001   ebx: d5b27480   ecx: 00000001   edx:
e958de80
Sep 17 11:06:41 fs1 kernel: esi: e958de64   edi: f8ec6040   ebp: ced2df2c   esp:
f5a48efc
Sep 17 11:06:41 fs1 kernel: ds: 007b   es: 007b   ss: 0068
Sep 17 11:06:41 fs1 kernel: Process lock_dlm2 (pid: 2601, threadinfo=f5a48000
task=f5ed62b0)
Sep 17 11:06:41 fs1 kernel: Stack: f8e90d39 ced2df10 00000001 d5b27480 f8e90e02
00000001 ced2df10 f8e916f1
Sep 17 11:06:41 fs1 kernel:        f8ec6040 f8dc8000 00000000 f8dc8000 ced2df10
f5a48f4c 00000004 f8e92a93
Sep 17 11:06:41 fs1 kernel:        e21dd080 c382a200 f5a48f5c f8d7e635 2a2b898b
00000000 00000003 00000000
Sep 17 11:06:41 fs1 kernel: Call Trace:
Sep 17 11:06:41 fs1 kernel:  [<f8e90d39>] rq_demote+0x6d/0x98 [gfs]
Sep 17 11:06:41 fs1 kernel:  [<f8e90e02>] run_queue+0x5a/0xc1 [gfs]
Sep 17 11:06:41 fs1 kernel:  [<f8e916f1>] drop_bh+0x126/0x194 [gfs]
Sep 17 11:06:41 fs1 kernel:  [<f8e92a93>] gfs_glock_cb+0xa3/0x131 [gfs]
Sep 17 11:06:41 fs1 kernel:  [<f8d7e635>] process_complete+0x3b7/0x3bf [lock_dlm]
Sep 17 11:06:41 fs1 kernel:  [<f8d7e8b3>] dlm_async+0x276/0x2ff [lock_dlm]
Sep 17 11:06:41 fs1 kernel:  [<c011dc6f>] default_wake_function+0x0/0xc
Sep 17 11:06:41 fs1 kernel:  [<c011dc6f>] default_wake_function+0x0/0xc
Sep 17 11:06:41 fs1 kernel:  [<f8d7e63d>] dlm_async+0x0/0x2ff [lock_dlm]
Sep 17 11:06:41 fs1 kernel:  [<c0132e31>] kthread+0x73/0x9b
Sep 17 11:06:41 fs1 kernel:  [<c0132dbe>] kthread+0x0/0x9b
Sep 17 11:06:41 fs1 kernel:  [<c01041f1>] kernel_thread_helper+0x5/0xb
Sep 17 11:06:41 fs1 kernel: Code: 88 2d c0 f0 83 28 01 79 05 e8 5b ee ff ff c3
81 78 04 ad 4e ad de 89 c2 b1 01 74 08 0f 0b 6c 00 4f 88 2d c0 0f b6 02 84 c0 7e
08 <0f> 0b 6d 00 4f 88 2d c0 86 0a c3 f0 81 00 00 00 00 01 c3 f0 ff
Sep 17 11:06:41 fs1 kernel:  ------------[ cut here ]------------
Sep 17 11:06:41 fs1 kernel: Fatal exception: panic in 5 seconds
Sep 17 11:06:41 fs1 kernel: kernel BUG at include/asm/spinlock.h:109!
Sep 17 11:06:41 fs1 kernel: invalid operand: 0000 [#2]
Sep 17 11:06:41 fs1 kernel: SMP
Sep 17 11:06:41 fs1 kernel: Modules linked in: iptable_nat ip_conntrack
iptable_filter ip_tables nfs nfsd exportfs lockd autofs4 i2c_dev i2c_core
lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 sunrpc dm_mod button
battery ac uhci_hcd ehci_hcd e1000 floppy sg ext3 jbd megaraid_mbox megaraid_mm
sd_mod scsi_mod
Sep 17 11:06:41 fs1 kernel: CPU:    3
Sep 17 11:06:41 fs1 kernel: EIP:    0060:[<c02c5f6e>]    Not tainted VLI
Sep 17 11:06:41 fs1 kernel: EFLAGS: 00010202   (2.6.9-11.ELsmp)
Sep 17 11:06:41 fs1 kernel: EIP is at _spin_unlock+0x1c/0x27
Sep 17 11:06:41 fs1 kernel: eax: 00000001   ebx: d5b27480   ecx: 00000001   edx:
e958de80
Sep 17 11:06:41 fs1 kernel: esi: e958de64   edi: f8ec6040   ebp: e28c9064   esp:
f5a47efc
Sep 17 11:06:41 fs1 kernel: ds: 007b   es: 007b   ss: 0068
Sep 17 11:06:41 fs1 kernel: Process lock_dlm1 (pid: 2600, threadinfo=f5a47000
task=f5ed78b0)
Sep 17 11:06:41 fs1 kernel: Stack: f8e90d39 e28c9048 00000001 d5b27480 f8e90e02
00000001 e28c9048 f8e916f1
Sep 17 11:06:41 fs1 kernel:        f8ec6040 f8dc8000 00000000 f8dc8000 e28c9048
f5a47f4c 00000004 f8e92a93
Sep 17 11:06:41 fs1 kernel:        d9c11900 c382a200 f5a47f5c f8d7e635 2faa790e
00000000 00000003 00000000
Sep 17 11:06:41 fs1 kernel: Call Trace:
Sep 17 11:06:41 fs1 kernel:  [<f8e90d39>] rq_demote+0x6d/0x98 [gfs]
Sep 17 11:06:41 fs1 kernel:  [<f8e90e02>] run_queue+0x5a/0xc1 [gfs]
Sep 17 11:06:41 fs1 kernel:  [<f8e916f1>] drop_bh+0x126/0x194 [gfs]
Sep 17 11:06:41 fs1 kernel:  [<f8e92a93>] gfs_glock_cb+0xa3/0x131 [gfs]
Sep 17 11:06:41 fs1 kernel:  [<f8d7e635>] process_complete+0x3b7/0x3bf [lock_dlm]
Sep 17 11:06:41 fs1 kernel:  [<f8d7e8b3>] dlm_async+0x276/0x2ff [lock_dlm]
Sep 17 11:06:41 fs1 kernel:  [<c011dc6f>] default_wake_function+0x0/0xc
Sep 17 11:06:41 fs1 kernel:  [<c011dc6f>] default_wake_function+0x0/0xc
Sep 17 11:06:41 fs1 kernel:  [<f8d7e63d>] dlm_async+0x0/0x2ff [lock_dlm]
Sep 17 11:06:41 fs1 kernel:  [<c0132e31>] kthread+0x73/0x9b
Sep 17 11:06:41 fs1 kernel:  [<c0132dbe>] kthread+0x0/0x9b
Sep 17 11:06:41 fs1 kernel:  [<c01041f1>] kernel_thread_helper+0x5/0xb
Sep 17 11:06:41 fs1 kernel: Code: 88 2d c0 f0 83 28 01 79 05 e8 5b ee ff ff c3
81 78 04 ad 4e ad de 89 c2 b1 01 74 08 0f 0b 6c 00 4f 88 2d c0 0f b6 02 84 c0 7e
08 <0f> 0b 6d 00 4f 88 2d c0 86 0a c3 f0 81 00 00 00 00 01 c3 f0 ff
Sep 17 11:06:41 fs1 kernel:  <0>Fatal exception: panic in 5 seconds
Sep 17 11:07:04 fs1 kernel: CMAN: Being told to leave the cluster by node 1
Sep 17 11:07:04 fs1 kernel: CMAN: we are leaving the cluster.

Version-Release number of selected component (if applicable):
2.6.9-11.ELsmp
dlm-1.0.0-0-i686
dlm-kernel-smp-2.6.9-34.0-i686

How reproducible:
(unknown)

Additional info:
Two-node cluster with manual fencing. The sysreport and cluster.conf have been uploaded.
Comment 4 Wendy Cheng 2005-09-22 01:18:24 EDT
* No vmcore is available. I have asked the front-end support engineer to walk the
customer through the netdump/diskdump setup, in case there are more panics.
* This is a manual fencing cluster, so capturing a vmcore should be easy.
* It is not yet known whether this is a one-time panic or repeatable.
* Roughly browsing through the code ...

We crashed at the sanity check (lock->magic != SPINLOCK_MAGIC) within
_spin_unlock(). This implies the code had already passed the very same check in
spin_lock(). So I would say this panic came not from an uninitialized spinlock
but from trashed memory. One possibility is that the lock had been freed. Since
this happened on two different CPUs in two different threads (lock_dlm2 and
lock_dlm1), the locks could have been freed by gfs_glockd.
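
The reasoning above can be illustrated with a toy user-space model. This is only a sketch under stated assumptions, not the kernel implementation: the class `DebugSpinlock` and the helper `simulate_free` are hypothetical names, and the "is the lock held" check on unlock is an assumption about what a debug build may additionally verify. The magic value matches the `ad 4e ad de` bytes visible in the Code: line of the oops.

```python
SPINLOCK_MAGIC = 0xDEAD4EAD  # little-endian "ad 4e ad de" in the oops Code bytes


class DebugSpinlock:
    """Toy model of a spinlock with debug sanity checks (not kernel code)."""

    def __init__(self):
        self.magic = SPINLOCK_MAGIC
        self.locked = False

    def lock(self):
        # Same magic check as on unlock: an uninitialized lock fails here.
        if self.magic != SPINLOCK_MAGIC:
            raise RuntimeError("BUG: bad magic in lock")
        self.locked = True

    def unlock(self):
        if self.magic != SPINLOCK_MAGIC:
            raise RuntimeError("BUG: bad magic in unlock")
        # Assumption: a debug build may also verify the lock is held.
        if not self.locked:
            raise RuntimeError("BUG: unlock of a lock that is not held")
        self.locked = False


def simulate_free(lock):
    """Hypothetical: another thread frees the structure and the memory is reused."""
    lock.magic = 0
    lock.locked = False


def run_scenario():
    lock = DebugSpinlock()
    lock.lock()          # passes: memory is still intact at lock time
    simulate_free(lock)  # memory trashed between lock and unlock
    try:
        lock.unlock()
    except RuntimeError as exc:
        return str(exc)
    return "ok"


if __name__ == "__main__":
    print(run_scenario())
```

In this model, a lock that was never initialized would already fail in `lock()`, while one that is overwritten after being taken fails only in `unlock()`, which is the distinction the comment draws.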
Comment 5 Wendy Cheng 2005-09-22 11:38:34 EDT
I should have put a "?" at the end of the above comment, since I'm just making a
wild guess.
Comment 6 Kiersten (Kerri) Anderson 2006-09-22 15:06:40 EDT
Are we going to be able to solve this one, and/or have we been able to recreate
it? I would close it if we can't, since it has been sitting untouched for a year now.
Comment 7 Wendy Cheng 2006-09-22 15:09:44 EDT
The IT has been closed, so let's close this bugzilla too.
