Description of problem:
When our distributed I/O load (dd_io) started, I saw the following message on one of four nodes:

BUG: soft lockup - CPU#1 stuck for 10s! [glock_workqueue:3078]
Pid: 3078, comm: glock_workqueue
EIP: 0060:[<c0609a70>] CPU: 1
EIP is at _spin_lock+0x7/0xf
EFLAGS: 00000286 Not tainted (2.6.18-85.003 #1)
EAX: de43d2dc EBX: de43d2c0 ECX: 00000286 EDX: 00000200
ESI: de43d35c EDI: f6e3d740 EBP: 00000286
DS: 007b ES: 007b
CR0: 8005003b CR2: 08066204 CR3: 00726000 CR4: 000006d0
 [<f8d8f3f2>] glock_work_func+0xb/0x31 [gfs2]
 [<c0433524>] run_workqueue+0x78/0xb5
 [<f8d8f3e7>] glock_work_func+0x0/0x31 [gfs2]
 [<c0433dd8>] worker_thread+0xd9/0x10d
 [<c042028f>] default_wake_function+0x0/0xc
 [<c0433cff>] worker_thread+0x0/0x10d
 [<c04361f1>] kthread+0xc0/0xeb
 [<c0436131>] kthread+0x0/0xeb
 [<c0405c3b>] kernel_thread_helper+0x7/0x10
=======================

Version-Release number of selected component (if applicable):
kernel-2.6.18-85.003 - this kernel has the patch for bug 428751

How reproducible:
unknown

Steps to Reproduce:
1. run dd_io with the above kernel

Additional info:

lock_dlm2     D 2C039F97  2916  3474     11  3484  3168 (L-TLB)
 f5cbcec8 00000046 000006c5 2c039f97 000006c5 f7f9daa0 00000009 f7a7c550
 f7dfd000 2c03a8b9 000006c5 00000922 00000000 f7a7c65c c200c8e0 f67c3c40
 00000000 00000020 00000000 0000000f c07a09b0 c07a09ac c07a09b0 c07a09ac
Call Trace:
 [<c06084ee>] wait_for_completion+0x69/0x8d
 [<c042028f>] default_wake_function+0x0/0xc
 [<c0435fa8>] kthread_stop+0x4e/0x6c
 [<f8c99daa>] gdlm_withdraw+0x9c/0xb2 [lock_dlm]
 [<c04362bd>] autoremove_wake_function+0x0/0x2d
 [<f8d9490e>] gfs2_withdraw_lockproto+0x16/0x51 [gfs2]
 [<f8d91f90>] gfs2_lm_withdraw+0x63/0x7f [gfs2]
 [<f8da2cc5>] gfs2_assert_withdraw_i+0x1e/0x30 [gfs2]
 [<f8d8e74d>] xmote_bh+0x1c2/0x248 [gfs2]
 [<f8d8e850>] gfs2_glock_cb+0x7d/0xf6 [gfs2]
 [<f8c9a65e>] gdlm_thread+0x5b4/0x60a [lock_dlm]
 [<f8c9a6b4>] gdlm_thread2+0x0/0x7 [lock_dlm]
 [<c04361f1>] kthread+0xc0/0xeb
 [<c0436131>] kthread+0x0/0xeb
 [<c0405c3b>] kernel_thread_helper+0x7/0x10
=======================
gfs2_glockd   D 8ACEFD91  2920  3484     11  3486  3474 (L-TLB)
 f6287efc 00000046 018ef4e4 8acefd91 000006d6 c43bd598 0000000a f7b46aa0
 c06723c0 8acf1b6f 000006d6 00001dde 00000000 f7b46bac c200c8e0 f8d2f5c1
 00000000 c43bd580 c43bd580 fffffffd 00000000 00000000 f7b9362c f7b9362c
Call Trace:
 [<f8d2f5c1>] grant_pending_locks+0x62/0x137 [dlm]
 [<c0609721>] rwsem_down_write_failed+0x126/0x141
 [<f8d308d6>] __put_lkb+0x28/0xd5 [dlm]
 [<c0438c15>] .text.lock.rwsem+0x2b/0x3a
 [<f8d92cd3>] gfs2_log_flush+0x18/0x40c [gfs2]
 [<c0608391>] schedule+0x90d/0x9ba
 [<f8d8f8c8>] inode_go_sync+0x50/0xb8 [gfs2]
 [<f8d8e4a0>] gfs2_glock_drop_th+0x14/0xff [gfs2]
 [<f8d8eacd>] run_queue+0xa6/0x236 [gfs2]
 [<f8d8f000>] gfs2_glmutex_unlock+0x26/0x3c [gfs2]
 [<f8d8f0a3>] gfs2_reclaim_glock+0x8d/0x97 [gfs2]
 [<f8d87457>] gfs2_glockd+0x13/0xce [gfs2]
 [<c04362bd>] autoremove_wake_function+0x0/0x2d
 [<f8d87444>] gfs2_glockd+0x0/0xce [gfs2]
 [<c04361f1>] kthread+0xc0/0xeb
 [<c0436131>] kthread+0x0/0xeb
 [<c0405c3b>] kernel_thread_helper+0x7/0x10
=======================
gfs2_recoverd S 1E0FB83E  3672  3486     11  3488  3484 (L-TLB)
 f62d8f98 00000046 00000000 1e0fb83e 00000751 f7de1c50 00000007 c2109550
 c06723c0 1e0fc88b 00000751 0000104d 00000000 c210965c c200c8e0 c042dcc0
 c079ee00 f62d8fa0 00000286 fffffffd 00000000 00000000 00775684 00775684
Call Trace:
 [<c042dcc0>] lock_timer_base+0x15/0x2f
 [<f8d87512>] gfs2_recoverd+0x0/0x53 [gfs2]
 [<c0608ad4>] schedule_timeout+0x71/0x8c
 [<c042d3df>] process_timeout+0x0/0x5
 [<f8d87557>] gfs2_recoverd+0x45/0x53 [gfs2]
 [<c04361f1>] kthread+0xc0/0xeb
 [<c0436131>] kthread+0x0/0xeb
 [<c0405c3b>] kernel_thread_helper+0x7/0x10
=======================
gfs2_logd     D EB8D26DA  2868  3488     11  3489  3486 (L-TLB)
 f6cf9eb0 00000046 f88b878d eb8d26da 000006c5 00000000 0000000a f7ba2000
 c06723c0 eb8dc1b9 000006c5 00009adf 00000000 f7ba210c c200c8e0 c04d996c
 f6056440 c042ce42 f7f7feac fffffffd 00000000 00000000 c200c8e0 00000000
Call Trace:
 [<f88b878d>] dm_request+0xb5/0xd4 [dm_mod]
 [<c04d996c>] generic_unplug_device+0x15/0x22
 [<c042ce42>] getnstimeofday+0x30/0xb6
 [<c0608a31>] io_schedule+0x36/0x59
 [<c0473410>] sync_buffer+0x30/0x33
 [<c0608c08>] __wait_on_bit+0x33/0x58
 [<c04733e0>] sync_buffer+0x0/0x33
 [<c04733e0>] sync_buffer+0x0/0x33
 [<c0608c8f>] out_of_line_wait_on_bit+0x62/0x6a
 [<c04362ea>] wake_bit_function+0x0/0x3c
 [<c047338d>] __wait_on_buffer+0x1c/0x1f
 [<c0473dbd>] sync_dirty_buffer+0x86/0xb8
 [<f8d928d6>] log_write_header+0x132/0x304 [gfs2]
 [<f8d9301e>] gfs2_log_flush+0x363/0x40c [gfs2]
 [<f8d924e0>] gfs2_ail1_empty+0x13/0x7d [gfs2]
 [<f8d875f7>] gfs2_logd+0x92/0x13f [gfs2]
 [<f8d87565>] gfs2_logd+0x0/0x13f [gfs2]
 [<c04361f1>] kthread+0xc0/0xeb
 [<c0436131>] kthread+0x0/0xeb
 [<c0405c3b>] kernel_thread_helper+0x7/0x10
=======================
gfs2_quotad   S F4749BE3  2852  3489     11  8311  3488 (L-TLB)
 f6324f98 00000046 f7fff680 f4749be3 00000757 f6324f84 0000000a f7bc5aa0
 c06723c0 f474af1d 00000757 0000133a 00000000 f7bc5bac c200c8e0 c042dcc0
 c079ee00 f6324fa0 00000286 fffffffd 00000000 00000000 0076f29e 0076f29e
Call Trace:
 [<c042dcc0>] lock_timer_base+0x15/0x2f
 [<f8d876a4>] gfs2_quotad+0x0/0x12c [gfs2]
 [<c0608ad4>] schedule_timeout+0x71/0x8c
 [<c042d3df>] process_timeout+0x0/0x5
 [<f8d877be>] gfs2_quotad+0x11a/0x12c [gfs2]
 [<c04361f1>] kthread+0xc0/0xeb
 [<c0436131>] kthread+0x0/0xeb
 [<c0405c3b>] kernel_thread_helper+0x7/0x10
=======================
pdflush       S A1E85F24  2532  8311     11  3489 (L-TLB)
 d062afa0 00000046 c04d9a79 a1e85f24 000006a4 c04362bd 0000000a ce3b7550
 c20ef550 a1e87696 000006a4 00001772 00000001 ce3b765c c20136c4 fffffff4
 0000040c 00000020 00000001 00000000 00000000 00000021 00000001 d062afb8
Call Trace:
 [<c04d9a79>] blk_congestion_wait+0x5e/0x67
 [<c04362bd>] autoremove_wake_function+0x0/0x2d
 [<c045acdd>] pdflush+0x0/0x1a3
 [<c045ad94>] pdflush+0xb7/0x1a3
 [<c04361f1>] kthread+0xc0/0xeb
 [<c0436131>] kthread+0x0/0xeb
 [<c0405c3b>] kernel_thread_helper+0x7/0x10
=======================
d_doio        D EBDF4409  2376  8545      1  3180 (NOTLB)
 c52a5cac 00000082 00027d9f ebdf4409 000006c4 c52a5ca0 00000007 c210e000
 c20ef550 ed3dce39 000006c4 015e8a30 00000001 c210e10c c20136c4 c52a5cb4
 c04733e0 c0608c8f 00000002 ffffffff 00000000 00000000 c52a5cd8 00000000
Call Trace:
 [<c04733e0>] sync_buffer+0x0/0x33
 [<c0608c8f>] out_of_line_wait_on_bit+0x62/0x6a
 [<f8d8dc85>] just_schedule+0x5/0x8 [gfs2]
 [<c0608c08>] __wait_on_bit+0x33/0x58
 [<f8d8dc80>] just_schedule+0x0/0x8 [gfs2]
 [<f8d8dc80>] just_schedule+0x0/0x8 [gfs2]
 [<c0608c8f>] out_of_line_wait_on_bit+0x62/0x6a
 [<c04362ea>] wake_bit_function+0x0/0x3c
 [<f8d8dc7c>] wait_on_holder+0x27/0x2b [gfs2]
 [<f8d8ed29>] glock_wait_internal+0xcc/0x1d0 [gfs2]
 [<f8d8ef98>] gfs2_glock_nq+0x16b/0x18b [gfs2]
 [<f8d90013>] gfs2_glock_nq_atime+0xfa/0x2db [gfs2]
 [<f8d9668b>] gfs2_prepare_write+0xb5/0x32c [gfs2]
 [<c0456975>] generic_file_buffered_write+0x226/0x5a2
 [<c0420b5e>] rebalance_tick+0x11f/0x2e4
 [<c042a3e1>] current_fs_time+0x4a/0x55
 [<c0457197>] __generic_file_aio_write_nolock+0x4a6/0x52a
 [<c04e2704>] __next_cpu+0x12/0x21
 [<c041efa7>] find_busiest_group+0x177/0x462
 [<c04573f5>] generic_file_write+0x0/0x94
 [<c045734b>] __generic_file_write_nolock+0x86/0x9a
 [<c04362bd>] autoremove_wake_function+0x0/0x2d
 [<c0420b5e>] rebalance_tick+0x11f/0x2e4
 [<c0608ce3>] mutex_lock+0xb/0x19
 [<c045742f>] generic_file_write+0x3a/0x94
 [<c04573f5>] generic_file_write+0x0/0x94
 [<c04713ff>] vfs_write+0xa1/0x143
 [<c04719f1>] sys_write+0x3c/0x63
 [<c0404eff>] syscall_call+0x7/0xb
=======================
How reproducible:
every time

Steps to Reproduce:
1. run dd_io with the above kernel; a test case with a large buffer size will trigger it.
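For reference, a minimal sketch of that kind of load (assuming dd_io boils down to concurrent large-block dd writers; the actual harness, block sizes, and file names here are illustrative, and on the cluster the writes target the shared GFS2 mount rather than a local scratch directory):

```shell
# Hypothetical approximation of the dd_io workload: several concurrent
# dd writers using a large block size, each fsync'd at the end.
TARGET=${TARGET:-/tmp}          # on the real setup: a GFS2 mount point
for i in 1 2 3 4; do
    dd if=/dev/zero of="$TARGET/ddio_$i" bs=1M count=8 conv=fsync 2>/dev/null &
done
wait
ls -l "$TARGET"/ddio_*
rm -f "$TARGET"/ddio_*
```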
Does this still happen with the latest build? I presume not, since we've been running dd_io against it extensively and I've had no reports of this, so perhaps we can close this one too?
I haven't seen this in a while, but let's give it the standard six-month NEEDINFO treatment.
Pushing the severity down on the basis that this may well already be fixed.
*** This bug has been marked as a duplicate of 432057 ***