Bug 143581 - Assertion failed in dlm/mount.c:339!
Status: CLOSED NOTABUG
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: dlm
Version: 4
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: David Teigland
QA Contact: Cluster QE
Depends On:
Blocks:
Reported: 2004-12-22 10:46 EST by Corey Marthaler
Modified: 2009-04-16 16:29 EDT

Doc Type: Bug Fix
Last Closed: 2005-01-04 22:36:13 EST


Description Corey Marthaler 2004-12-22 10:46:26 EST
Description of problem:
I was running an I/O load on all the nodes in the morph cluster and
after about 3 hours morph-02 hit this bug after seeing dlm errors. All
other nodes in the cluster were fine and continued with I/O. But
morph-02 now reports an Input/output error for the fs.
stat64("/mnt/corey0", 0x805c8ac)        = -1 EIO (Input/output error)


Console:
Dec 21 21:27:20 morph-02 kernel: Buffer I/O error on device
diapered_dm-0, logical block 30674157
Dec 21 21:27:20 morph-02 kernel: lost page write due to I/O error on
diapered_dm-0
Dec 21 21:27:20 morph-02 kernel: dlm: corey1: remote_stage error -105
8101af
Dec 21 21:27:20 morph-02 kernel: dlm: corey1: remote_stage error -105
8a0024
Dec 21 21:27:20 morph-02 kernel: dlm: corey1: remote_stage error -105
900020
Dec 21 21:27:20 morph-02 kernel: kernel BUG at
/usr/src/cluster/gfs-kernel/src/dlm/mount.c:339!
Dec 21 21:27:20 morph-02 kernel: invalid operand: 0000 [#1]
Dec 21 21:27:20 morph-02 kernel: SMP
Dec 21 21:27:20 morph-02 kernel: Modules linked in: gnbd lock_gulm
lock_nolock lock_dlm dlm gfs lock_harness cman ipv6 parport_pc lp
parport autofs4 sunrpc e1000 microcode dm_mod uhci_hcd ehci_hcd button
battery ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Dec 21 21:27:20 morph-02 kernel: CPU:    1
Dec 21 21:27:20 morph-02 kernel: EIP:    0060:[<f89adfce>]    Not
tainted VLI
Dec 21 21:27:20 morph-02 kernel: EFLAGS: 00010286   (2.6.9)
Dec 21 21:27:20 morph-02 kernel: EIP is at lm_dlm_withdraw+0x4e/0x70
[lock_dlm]
Dec 21 21:27:20 morph-02 kernel: eax: 00000001   ebx: f8ae2000   ecx:
f281bd80   edx: 00000246
Dec 21 21:27:20 morph-02 kernel: esi: f281a000   edi: f8afe910   ebp:
f8ae2000   esp: f281bd7c
Dec 21 21:27:20 morph-02 kernel: ds: 007b   es: 007b   ss: 0068
Dec 21 21:27:20 morph-02 kernel: Process growfiles (pid: 5318,
threadinfo=f281a000 task=f2ff81b0)
Dec 21 21:27:20 morph-02 kernel: Stack: f89b2425 00000153 f89b1d54
f89b241f 00a79110 f8a52006 f8a720bc f8afe910
Dec 21 21:27:20 morph-02 kernel:        f8a71218 f8afe910 dd46f000
f8a6e132 f8ae2000 f8a744c4 f8afe910 f8afe910
Dec 21 21:27:20 morph-02 kernel:        01cbfce9 00000000 f8afe910
f8a6e525 f8afe910 f8a71218 000004c0 f8afe910
Dec 21 21:27:20 morph-02 kernel: Call Trace:
Dec 21 21:27:20 morph-02 kernel:  [<f8a52006>]
gfs_lm_withdraw+0xd6/0x100 [gfs]
Dec 21 21:27:20 morph-02 kernel:  [<f8a6e132>]
gfs_meta_check_ii+0x62/0x80 [gfs]
Dec 21 21:27:20 morph-02 kernel:  [<f8a3eeb1>]
gfs_get_meta_buffer+0x251/0x2d0 [gfs]
Dec 21 21:27:20 morph-02 kernel:  [<f8a4d36d>]
gfs_copyin_dinode+0x2d/0x1b0 [gfs]
Dec 21 21:27:20 morph-02 kernel:  [<c011f2d0>]
default_wake_function+0x0/0x10
Dec 21 21:27:20 morph-02 kernel:  [<f8a4cb8d>] inode_go_lock+0x4d/0x60
[gfs]
Dec 21 21:27:20 morph-02 kernel:  [<f8a49b45>]
glock_wait_internal+0x125/0x250 [gfs]
Dec 21 21:27:20 morph-02 kernel:  [<f8a49fb2>] gfs_glock_nq+0x82/0x170
[gfs]
Dec 21 21:27:20 morph-02 kernel:  [<f8a4a76e>]
gfs_glock_nq_init+0x1e/0x40 [gfs]
Dec 21 21:27:20 morph-02 kernel:  [<f8a6250a>]
gfs_permission+0x4a/0x80 [gfs]
Dec 21 21:27:20 morph-02 kernel:  [<f8a624c0>] gfs_permission+0x0/0x80
[gfs]
Dec 21 21:27:20 morph-02 kernel:  [<c0169708>] permission+0x68/0x70
Dec 21 21:27:20 morph-02 kernel:  [<c016b1d7>] may_open+0x47/0x260
Dec 21 21:27:20 morph-02 kernel:  [<c016b4a1>] open_namei+0xb1/0x650
Dec 21 21:27:20 morph-02 kernel:  [<c015bb9d>] filp_open+0x2d/0x60
Dec 21 21:27:20 morph-02 kernel:  [<c015be08>] get_unused_fd+0x78/0xd0
Dec 21 21:27:20 morph-02 kernel:  [<c015bf4c>] sys_open+0x3c/0xa0
Dec 21 21:27:20 morph-02 kernel:  [<c0105f5d>] sysenter_past_esp+0x52/0x71
Dec 21 21:27:20 morph-02 kernel: Code: 1f 24 9b f8 89 44 24 0c b8 54
1d 9b f8 89 44 24 08 b8 53 01 00 00 89 44 24 04 e8 ce 4f 77 c7 c7 04
24 25 24 9b f8 e8 c2 4f 77 c7 <0f> 0b 53 01 54 1d 9b f8 c7 04 24 80 1d
9b f8 e8 ae 47 77 c7 8d
Dec 21 21:27:21 morph-02 kernel:  <4>printk: 8657 messages suppressed.
Dec 21 21:27:21 morph-02 kernel: dlm: corey1: remote_stage error -105
9a0075
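
The "Code:" line in the oops above can be cross-checked against the
reported file and line. The sketch below assumes the i386 2.6-era BUG()
layout with verbose bug reporting enabled (a ud2 opcode, 0f 0b, followed
by a 16-bit line number and a 32-bit pointer to the file name string).
The struct and function names are hypothetical and exist only for this
illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: fields recovered from the bytes at the
 * faulting EIP (the run starting at "<0f> 0b" in the Code: line). */
struct bug_trailer {
    uint16_t line;      /* source line number passed to BUG() */
    uint32_t file_ptr;  /* kernel address of the file name string */
};

/*
 * Decode an assumed i386 BUG() trailer: ud2 (0f 0b), then a
 * little-endian 16-bit line number, then a little-endian 32-bit pointer.
 * Returns 0 on success, -1 if the buffer is too short or does not start
 * with the ud2 opcode.
 */
int decode_bug_trailer(const uint8_t *code, size_t len,
                       struct bug_trailer *out)
{
    if (len < 8 || code[0] != 0x0f || code[1] != 0x0b)
        return -1;

    /* x86 stores the low byte first. */
    out->line = (uint16_t)(code[2] | (code[3] << 8));
    out->file_ptr = (uint32_t)code[4] |
                    ((uint32_t)code[5] << 8) |
                    ((uint32_t)code[6] << 16) |
                    ((uint32_t)code[7] << 24);
    return 0;
}
```

Fed the bytes 0f 0b 53 01 54 1d 9b f8 from the dump, this yields line
0x0153 (339 decimal) and file pointer f89b1d54, which matches the
"mount.c:339" in the BUG message and the 00000153 f89b1d54 pair at the
top of the stack dump.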


Version-Release number of selected component (if applicable):
DLM <CVS> (built Dec 21 2004 12:20:35) installed


How reproducible:
Didn't try
Comment 1 David Teigland 2004-12-30 02:55:31 EST
GFS has hit a problem (looks like i/o errors) and is trying to
"withdraw".  This is why you get the EIO from gfs and why you reach
the assert on line 339:

static void lm_dlm_withdraw(lm_lockspace_t *lockspace)
{
        DLM_ASSERT(FALSE,);
}

The kernel also appears to be out of memory.  I'm not sure if this
precedes the withdrawal or not.  Anyway, the dlm prints -ENOBUFS (the
-105 in the remote_stage errors) when it can't get any memory to send
a message.  Have you set panic_on_oops?
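
For reference, the numeric codes in the console output can be mapped
back to names. This is only a sketch: the table and the function name
are hypothetical, and the three values are the generic Linux errno
numbers (as on i386), hardcoded so the lookup does not depend on the
host's errno.h.

```c
#include <stddef.h>

/* Hypothetical lookup table; numbers are the generic Linux errno
 * values, not taken from the host's headers. */
struct errno_name {
    int num;
    const char *name;
};

static const struct errno_name linux_errnos[] = {
    { 5,   "EIO" },      /* Input/output error, as returned by stat64 */
    { 12,  "ENOMEM" },   /* Out of memory */
    { 105, "ENOBUFS" },  /* No buffer space available */
};

/* Accepts either the negative kernel-style value (-105) or the
 * positive errno (105); returns "unknown" for anything not listed. */
const char *linux_errno_name(int kernel_err)
{
    int num = kernel_err < 0 ? -kernel_err : kernel_err;
    size_t i;

    for (i = 0; i < sizeof linux_errnos / sizeof linux_errnos[0]; i++)
        if (linux_errnos[i].num == num)
            return linux_errnos[i].name;
    return "unknown";
}
```

So linux_errno_name(-105) gives "ENOBUFS", which is the value printed
in each "remote_stage error -105" line.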

I'm guessing this is all due to i/o errors and is not a bug.


Comment 2 Corey Marthaler 2005-01-04 15:29:54 EST
Whatever this issue is, I just hit it again while trying to reproduce
bz139958. I had I/O going on all 6 nodes of the cluster and shot 5 of
them. I then brought them all back up into the cluster. The node left
up replays the journals, spits out the thousands of lines of lock
information to the syslog, and then ends up hitting this bug. 
Comment 3 Corey Marthaler 2005-01-04 15:31:40 EST
This time the process was gfs_recoverd:


lock_dlm:  Assertion failed on line 339 of file
/usr/src/cluster/gfs-kernel/src/dlm/mount.c
lock_dlm:  assertion:  "FALSE"
lock_dlm:  time = 695792

------------[ cut here ]------------
kernel BUG at /usr/src/cluster/gfs-kernel/src/dlm/mount.c:339!
invalid operand: 0000 [#1]
SMP
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs
lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 microcode
dm_mod uhci_hcd ehci_hcd button battery ac ext3 jbd qla2300 qla2xxx
scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<f89adfce>]    Not tainted VLI
EFLAGS: 00010282   (2.6.9)
EIP is at lm_dlm_withdraw+0x4e/0x70 [lock_dlm]
eax: 00000001   ebx: f8af9658   ecx: f4845d00   edx: 00000292
esi: f89a9380   edi: f8afd910   ebp: 00000000   esp: f4845cfc
ds: 007b   es: 007b   ss: 0068
Process gfs_recoverd (pid: 4671, threadinfo=f4844000 task=f6cd8ef0)
Stack: f89b2425 00000153 f89b1d54 f89b241f 000a9df0 f89a845b f8ae1000
f8afd910
       f8a32fef f8a5309c f8afd910 00000004 f8afd910 f3104000 f8a4f1a2
f8ae1000
       f8a55550 f8afd910 f8afd910 080022c3 00000000 00000009 00000004
f8afd910
Call Trace:
 [<f89a845b>] lm_withdraw+0x3b/0x8a [lock_harness]
 [<f8a32fef>] gfs_lm_withdraw+0xcf/0xf0 [gfs]
 [<f8a4f1a2>] gfs_metatype_check_ii+0x72/0x90 [gfs]
 [<f8a48210>] foreach_descriptor+0x320/0x430 [gfs]
 [<f8a4894c>] gfs_recover_journal+0x1ec/0x390 [gfs]
 [<f8a48baf>] gfs_check_journals+0xbf/0xd0 [gfs]
 [<f8a1d2b7>] gfs_recoverd+0x47/0xb0 [gfs]
 [<f8a1d270>] gfs_recoverd+0x0/0xb0 [gfs]
 [<c01042b5>] kernel_thread_helper+0x5/0x10
Code: 1f 24 9b f8 89 44 24 0c b8 54 1d 9b f8 89 44 24 08 b8 53 01 00
00 89 44 24 04 e8 ce 4f 77 c7 c7 04 24 25 24 9b f8 e8 c2 4f 77 c7 <0f>
0b 53 01 54 1d 9b f8 c7 04 24 80 1d 9b f8 e8 ae 47 77 c7 8d
Comment 4 Corey Marthaler 2005-01-04 15:39:33 EST
reproduced w/o seeing the dlm errors
Comment 5 Corey Marthaler 2005-01-04 16:00:46 EST
It looks like you're right, Dave; this does appear to be caused by I/O
errors. I noticed that before reproducing this, I hit 144110, which
caused the following messages:

Jan  4 14:18:24 morph-01 kernel: Buffer I/O error on device
diapered_dm-0, logical block 178928990
Jan  4 14:18:24 morph-01 kernel: lost page write due to I/O error on
diapered_dm-0
Comment 6 David Teigland 2005-01-04 22:36:13 EST
lock_dlm is supposed to assert when gfs withdraws (due to i/o errors)
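
A user-space model of that behaviour, for illustration only: the
withdraw hook asserts on a constant false condition, so every call
produces the report seen in Comment 3. MODEL_ASSERT,
format_assert_report and model_withdraw are hypothetical names, and
unlike the real module this sketch only formats the message and does
not trap.

```c
#include <stdio.h>

#define FALSE 0

/* Hypothetical stand-in for the module's assert reporting: writes the
 * three-line message into buf and returns snprintf's result. */
static int format_assert_report(char *buf, size_t len, const char *expr,
                                int line, const char *file,
                                unsigned long now)
{
    return snprintf(buf, len,
                    "lock_dlm:  Assertion failed on line %d of file %s\n"
                    "lock_dlm:  assertion:  \"%s\"\n"
                    "lock_dlm:  time = %lu\n",
                    line, file, expr, now);
}

/* Evaluates to 0 when cond holds; otherwise formats the report.
 * #cond stringifies the argument as written, so FALSE prints as
 * "FALSE" rather than "0". */
#define MODEL_ASSERT(cond, buf, len, now)                          \
    ((cond) ? 0 : format_assert_report((buf), (len), #cond,        \
                                       __LINE__, __FILE__, (now)))

/* Model of the withdraw hook: the condition is constant false, so the
 * assertion fires on every call. */
int model_withdraw(char *buf, size_t len, unsigned long now)
{
    return MODEL_ASSERT(FALSE, buf, len, now);
}
```

Calling model_withdraw() with any buffer always takes the failure
branch, which is the point: reaching the withdraw path is what triggers
the assert, not a separate fault in lock_dlm.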
