Bug 1276316

Summary: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
Product: Red Hat Enterprise Linux 6 Reporter: nishita <nishu01biswas>
Component: cluster Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED INSUFFICIENT_DATA QA Contact: cluster-qe <cluster-qe>
Severity: medium Docs Contact: Steven J. Levine <slevine>
Priority: unspecified    
Version: 6.5 CC: ccaulfie, cluster-maint, nishu01biswas, rpeterso, teigland
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
Cause: Consequence: Hardware possible, due to memory high load. Workaround (if any): kernel manipulation in grub.conf; memory high load alarm generating in service alarms. Result: kernel hung for 120 seconds
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-23 08:47:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description nishita 2015-10-29 12:43:13 UTC
Hi Team,

We have one Red Hat cluster setup with 4 nodes and an HP MSA2040 storage array attached.
Two of the main servers (HP DL380 G8) hang every hour.
The dmesg logs show:

Oct 28 14:03:23 TELD1 kernel: INFO: task expect:44222 blocked for more than 120 seconds.
Oct 28 14:03:23 TELD1 kernel:      Not tainted 2.6.32-431.29.2.el6.x86_64 #1
Oct 28 14:03:23 TELD1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 28 14:03:23 TELD1 kernel: expect        D 0000000000000001     0 44222  44217 0x00000080
Oct 28 14:03:23 TELD1 kernel: ffff880aa8e2fb88 0000000000000082 0000000000000000 0000000000000002
Oct 28 14:03:23 TELD1 kernel: ffff8813aed18740 00000000000000db ffff880a00000000 ffff880aa8e2fab8
Oct 28 14:03:23 TELD1 kernel: ffff881803e63058 ffff880aa8e2ffd8 000000000000fbc8 ffff881803e63058
Oct 28 14:03:23 TELD1 kernel: Call Trace:
Oct 28 14:03:23 TELD1 kernel: [<ffffffff811a0870>] ? pollwake+0x0/0x60
Oct 28 14:03:23 TELD1 kernel: [<ffffffff81529cf5>] schedule_timeout+0x215/0x2e0
Oct 28 14:03:23 TELD1 kernel: [<ffffffff81529973>] wait_for_common+0x123/0x180
Oct 28 14:03:23 TELD1 kernel: [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
Oct 28 14:03:23 TELD1 kernel: [<ffffffff81529a8d>] wait_for_completion+0x1d/0x20
Oct 28 14:03:23 TELD1 kernel: [<ffffffff81094df1>] flush_cpu_workqueue+0x61/0x90
Oct 28 14:03:23 TELD1 kernel: [<ffffffff81094ea0>] ? wq_barrier_func+0x0/0x20
Oct 28 14:03:23 TELD1 kernel: [<ffffffff810958c4>] flush_workqueue+0x54/0x80
Oct 28 14:03:23 TELD1 kernel: [<ffffffff81095905>] flush_scheduled_work+0x15/0x20
Oct 28 14:03:23 TELD1 kernel: [<ffffffff8133b0fc>] tty_ldisc_release+0x3c/0x90
Oct 28 14:03:23 TELD1 kernel: [<ffffffff8133544b>] tty_release_dev+0x40b/0x5e0
Oct 28 14:03:23 TELD1 kernel: [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
Oct 28 14:03:23 TELD1 kernel: [<ffffffff8133563e>] tty_release+0x1e/0x30
Oct 28 14:03:23 TELD1 kernel: [<ffffffff8118a6c5>] __fput+0xf5/0x210
Oct 28 14:03:23 TELD1 kernel: [<ffffffff8118a805>] fput+0x25/0x30
Oct 28 14:03:23 TELD1 kernel: [<ffffffff81185a4d>] filp_close+0x5d/0x90
Oct 28 14:03:23 TELD1 kernel: [<ffffffff81185b25>] sys_close+0xa5/0x100
Oct 28 14:03:23 TELD1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

OS: RHEL 6.5, 64-bit.

Kindly help.

Nishita

Comment 2 Christine Caulfield 2015-11-02 10:11:05 UTC
Hi

We'll need a lot more information than that to be able to debug the problem. If you have an account with Red Hat then please contact our support people and they'll help you to drill down and work out where the problem is happening. 

Looking at the kernel stack trace you posted, it seems very unlikely that it's cluster-related though.
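
As a rough illustration of how such a trace can be read, here is a minimal Python sketch that pulls the blocked task and the reliable call-trace frames out of a hung-task report and checks whether any frame looks cluster-related. The helper names (`parse_hung_task`, `touches`) are hypothetical, and the `dlm_`/`gfs2_` prefixes are only an assumption about how cluster-related kernel functions are typically named. The sample lines are copied from the log in the description.

```python
import re

# Header line of a hung-task report, e.g. "INFO: task expect:44222 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# One call-trace frame, e.g. "[<ffffffff81529cf5>] schedule_timeout+0x215/0x2e0".
# A leading "? " marks a frame the kernel considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<maybe>\? )?(?P<func>[A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_task(lines):
    """Return (comm, pid, secs, frames) for the first hung-task report, or None."""
    header = None
    frames = []
    for line in lines:
        if header is None:
            m = HUNG_RE.search(line)
            if m:
                header = (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
            continue
        m = FRAME_RE.search(line)
        if m and not m.group("maybe"):  # skip "?" (unreliable) frames
            frames.append(m.group("func"))
    if header is None:
        return None
    return header + (frames,)


def touches(frames, prefixes):
    """True if any frame name starts with one of the given prefixes."""
    return any(f.startswith(tuple(prefixes)) for f in frames)


# Lines copied from the report above (trace shortened).
SAMPLE = """\
Oct 28 14:03:23 TELD1 kernel: INFO: task expect:44222 blocked for more than 120 seconds.
Oct 28 14:03:23 TELD1 kernel: Call Trace:
Oct 28 14:03:23 TELD1 kernel: [<ffffffff811a0870>] ? pollwake+0x0/0x60
Oct 28 14:03:23 TELD1 kernel: [<ffffffff81529cf5>] schedule_timeout+0x215/0x2e0
Oct 28 14:03:23 TELD1 kernel: [<ffffffff810958c4>] flush_workqueue+0x54/0x80
Oct 28 14:03:23 TELD1 kernel: [<ffffffff8133b0fc>] tty_ldisc_release+0x3c/0x90
Oct 28 14:03:23 TELD1 kernel: [<ffffffff81185b25>] sys_close+0xa5/0x100
""".splitlines()

report = parse_hung_task(SAMPLE)
# Assumed prefixes for cluster-related code paths; adjust for the modules in use.
cluster_related = touches(report[3], ["dlm_", "gfs2_"])
tty_related = touches(report[3], ["tty_"])
```

On the sample, the frames run from `sys_close` through `tty_ldisc_release` into the workqueue flush, so the task is blocked while closing a terminal rather than in any cluster code path.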

Comment 3 nishita 2017-09-05 06:06:50 UTC
Thanks for the info.
Kindly close the bug.
I will raise it again in the future if needed.