437498 – dlm_recv stuck in loop through lookup list

Summary: dlm_recv stuck in loop through lookup list

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	David Teigland
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-03-14 16:12 UTC by David Teigland
Modified:	2009-09-03 16:51 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-05-01 19:17:48 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description David Teigland 2008-03-14 16:12:41 UTC

Description of problem:

Dean's new dlm stress test caused this, probably when it tried to exit.

dlm_recv at 100% cpu

SysRq : Show CPUs
CPU2:
 ffff810102b4ff48 0000000000000000 ffff81007372b970 ffffffff8019cc11
 0000000000000000 ffff810080051400 0000000000000058 ffffffff8019cc40
 ffffffff8005e2fc ffffffff80022d96 ffff81013dc19860 ffff8100740f2000
Call Trace:
 <IRQ>  [<ffffffff8019cc11>] showacpu+0x0/0x3b
 [<ffffffff8019cc40>] showacpu+0x2f/0x3b
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff80022d96>] smp_call_function_interrupt+0x57/0x75
 [<ffffffff8005dc22>] call_function_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80062558>] __sched_text_start+0x148/0xaeb
 [<ffffffff8851a0b3>] :dlm:_request_lock+0x54/0x24c
 [<ffffffff8851a0b3>] :dlm:_request_lock+0x54/0x24c
 [<ffffffff8851a387>] :dlm:process_lookup_list+0x3b/0x58
 [<ffffffff8851b10e>] :dlm:_receive_message+0x384/0xb41
 [<ffffffff80063a5d>] mutex_lock+0xd/0x1d
 [<ffffffff8851b9c7>] :dlm:dlm_receive_buffer+0xf7/0x12b
 [<ffffffff8851efe4>] :dlm:dlm_process_incoming_buffer+0x100/0x138
 [<ffffffff8000eff8>] __alloc_pages+0x65/0x2ce
 [<ffffffff8852010a>] :dlm:process_recv_sockets+0x0/0x16
 [<ffffffff885211b7>] :dlm:receive_from_sock+0x68d/0x7f4
 [<ffffffff80033275>] lock_sock+0xa7/0xb2
 [<ffffffff80142cf2>] __next_cpu+0x19/0x28
 [<ffffffff8008984f>] find_busiest_group+0x20d/0x621
 [<ffffffff80049734>] worker_thread+0x0/0x122
 [<ffffffff8852011a>] :dlm:process_recv_sockets+0x10/0x16
 [<ffffffff8004ce1f>] run_workqueue+0x94/0xe4
 [<ffffffff80049734>] worker_thread+0x0/0x122
 [<ffffffff8009daed>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80049824>] worker_thread+0xf0/0x122
 [<ffffffff8008ab24>] default_wake_function+0x0/0xe
 [<ffffffff8009daed>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032518>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009daed>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003241a>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11
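
To make the dlm path in the trace above easier to read, the frames belonging to the dlm module can be pulled out on their own. The following is only a rough sketch: the regular expression and the helper name module_frames are assumptions made for illustration (they are not part of any dlm or kernel tooling), and the sample is a short excerpt of the trace above.

```python
import re

# Matches frames in the format used by the trace above, for example:
#   [<ffffffff8851a387>] :dlm:process_lookup_list+0x3b/0x58
# The ":module:" prefix is optional; core kernel symbols do not have one.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?::(?P<module>\w+):)?(?P<symbol>[\w.]+)\+"
    r"(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def module_frames(trace, module):
    """Return the symbols of frames in `module`, innermost first.

    Repeats that end up adjacent in the filtered list are collapsed.
    (Hypothetical helper, written only for this example.)
    """
    symbols = []
    for match in FRAME_RE.finditer(trace):
        if match.group("module") != module:
            continue
        symbol = match.group("symbol")
        if not symbols or symbols[-1] != symbol:
            symbols.append(symbol)
    return symbols


# Excerpt copied from the call trace in this report.
SAMPLE = """\
 [<ffffffff80062558>] __sched_text_start+0x148/0xaeb
 [<ffffffff8851a0b3>] :dlm:_request_lock+0x54/0x24c
 [<ffffffff8851a0b3>] :dlm:_request_lock+0x54/0x24c
 [<ffffffff8851a387>] :dlm:process_lookup_list+0x3b/0x58
 [<ffffffff8851b10e>] :dlm:_receive_message+0x384/0xb41
 [<ffffffff80063a5d>] mutex_lock+0xd/0x1d
 [<ffffffff8851b9c7>] :dlm:dlm_receive_buffer+0xf7/0x12b
"""

if __name__ == "__main__":
    print(" <- ".join(module_frames(SAMPLE, "dlm")))
```

Read innermost first, the excerpt gives _request_lock, called from process_lookup_list, called from _receive_message, called from dlm_receive_buffer, which matches the summary of dlm_recv looping through the lookup list.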

debugfs stress_waiters showed this (don't know if it's related):
f00020 1 2 resource13


Comment 1 David Teigland 2009-05-01 19:17:48 UTC

I'm keeping a note about this outside bz for whenever I happen to be working in this section of the code again.
