1294765 – Login task on atomic host blocked for more than 120 seconds

Summary: Login task on atomic host blocked for more than 120 seconds

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	rhgs-server-container
Sub Component:
Version:	rhgs-3.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	---
Assignee:	Mohamed Ashiq
QA Contact:	Anoop
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-12-30 08:09 UTC by Shruti Sampat
Modified:	2016-01-19 06:28 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-01-19 06:28:59 UTC
Embargoed:
Dependent Products:

Attachments
Information about the machine when the issue occurred (9.54 KB, application/x-bzip) 2015-12-30 08:09 UTC, Shruti Sampat
Logs from /var/log/dmesg* (15.36 KB, application/x-bzip) 2015-12-30 08:10 UTC, Shruti Sampat

Description Shruti Sampat 2015-12-30 08:09:45 UTC

Created attachment 1110464
Information about the machine when the issue occurred

Description of problem:
-----------------------

The login task on an Atomic Host running the RHGS container is blocked for a long time; see the call trace below:

[83287.004116] INFO: task login:26187 blocked for more than 120 seconds.
[83287.005441] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[83287.007711] login           D ffff88030c6d0270     0 26187  19423 0x00000004
[83287.009135]  ffff88030b9dfd80 0000000000000082 ffff88030781e780 ffff88030b9dffd8
[83287.011544]  ffff88030b9dffd8 ffff88030b9dffd8 ffff88030781e780 ffff88030781e780
[83287.013958]  7fffffffffffffff ffff88030c6d0278 0000000000000001 ffff88030c6d0270
[83287.016309] Call Trace:
[83287.017394]  [<ffffffff8163a889>] schedule+0x29/0x70
[83287.018618]  [<ffffffff81638579>] schedule_timeout+0x209/0x2d0
[83287.020361]  [<ffffffff810b8a56>] ? try_to_wake_up+0x1b6/0x300
[83287.021642]  [<ffffffff810b8bf0>] ? wake_up_state+0x10/0x20
[83287.022934]  [<ffffffff8163c62a>] ldsem_down_write+0xea/0x255
[83287.024232]  [<ffffffff8163cce8>] tty_ldisc_lock_pair_timeout+0x88/0x120
[83287.025561]  [<ffffffff813b63ac>] tty_ldisc_hangup+0xcc/0x230
[83287.026872]  [<ffffffff813adb54>] __tty_hangup+0x344/0x490
[83287.028167]  [<ffffffff813adfb1>] tty_vhangup_self+0x21/0x50
[83287.029472]  [<ffffffff811dd8d3>] sys_vhangup+0x23/0x30
[83287.030717]  [<ffffffff816458c9>] system_call_fastpath+0x16/0x1b
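
For triaging traces like the one above, the hung-task header and the reliable call-trace frames can be pulled out of dmesg text with a short script. This is only a minimal sketch: the function name parse_hung_tasks and both regular expressions are illustrative assumptions based on the message format shown in this report, not part of any Red Hat tool, and frames marked "?" (unreliable) are skipped.

```python
import re

# Hung-task header as printed in this report, e.g.
# "[83287.004116] INFO: task login:26187 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)

# One call-trace frame, e.g.
# "[83287.017394]  [<ffffffff8163a889>] schedule+0x29/0x70"
# A leading "? " marks a frame the kernel considers unreliable.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+\[<[0-9a-f]+>\]\s+(?P<unreliable>\? )?(?P<sym>[\w.]+)\+0x"
)


def parse_hung_tasks(dmesg_text):
    """Return one dict per hung-task report found in dmesg_text.

    Hypothetical helper: assumes the message layout seen in this bug.
    """
    reports = []
    current = None
    for line in dmesg_text.splitlines():
        header = HUNG_RE.match(line)
        if header:
            current = {
                "timestamp": float(header.group("ts")),
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "blocked_secs": int(header.group("secs")),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.match(line)
        # Keep only reliable frames that belong to a report we have seen.
        if frame and current is not None and not frame.group("unreliable"):
            current["frames"].append(frame.group("sym"))
    return reports
```

Feeding it the saved contents of /var/log/dmesg gives one entry per report, which makes it easier to see whether every occurrence is stuck in the same tty_ldisc_hangup path.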

I found this solution in our Knowledgebase - https://access.redhat.com/solutions/31453

I have collected relevant data from the machine as described in that solution. Unfortunately, I don't have enough information to reproduce this issue. I will update this BZ when I have more information.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
Red Hat Enterprise Linux Atomic Host release 7.2
rhgs-server-rhel7:3.1.2-3

How reproducible:
-----------------
Intermittently

Steps to Reproduce:
-------------------
Clear steps not available at the moment.

Actual results:
---------------
Login task is hung.

Comment 1 Shruti Sampat 2015-12-30 08:10:21 UTC

Created attachment 1110465
Logs from /var/log/dmesg*

Comment 3 Humble Chirammal 2015-12-31 06:47:36 UTC

(In reply to Shruti Sampat from comment #0)

> Login task on atomic host running RHGS container is blocked for a long time,
> see call trace below -
> 
> [83287.004116] INFO: task login:26187 blocked for more than 120 seconds.
> [83287.005441] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [83287.007711] login           D ffff88030c6d0270     0 26187  19423
> 0x00000004
> [83287.009135]  ffff88030b9dfd80 0000000000000082 ffff88030781e780
> ffff88030b9dffd8
> [83287.011544]  ffff88030b9dffd8 ffff88030b9dffd8 ffff88030781e780
> ffff88030781e780
> [83287.013958]  7fffffffffffffff ffff88030c6d0278 0000000000000001
> ffff88030c6d0270
> [83287.016309] Call Trace:
> [83287.017394]  [<ffffffff8163a889>] schedule+0x29/0x70
> [83287.018618]  [<ffffffff81638579>] schedule_timeout+0x209/0x2d0
> [83287.020361]  [<ffffffff810b8a56>] ? try_to_wake_up+0x1b6/0x300
> [83287.021642]  [<ffffffff810b8bf0>] ? wake_up_state+0x10/0x20
> [83287.022934]  [<ffffffff8163c62a>] ldsem_down_write+0xea/0x255
> [83287.024232]  [<ffffffff8163cce8>] tty_ldisc_lock_pair_timeout+0x88/0x120
> [83287.025561]  [<ffffffff813b63ac>] tty_ldisc_hangup+0xcc/0x230
> [83287.026872]  [<ffffffff813adb54>] __tty_hangup+0x344/0x490
> [83287.028167]  [<ffffffff813adfb1>] tty_vhangup_self+0x21/0x50
> [83287.029472]  [<ffffffff811dd8d3>] sys_vhangup+0x23/0x30
> [83287.030717]  [<ffffffff816458c9>] system_call_fastpath+0x16/0x1b
> 


At this stage, this looks like resource congestion to me.

Are you testing this in a VM? If yes, how many CPUs are assigned to this VM? Can you set the number of vCPUs equal to the number of host CPUs and try to reproduce this issue?

Comment 4 Shruti Sampat 2016-01-12 07:11:16 UTC

(In reply to Humble Chirammal from comment #3)
> 
> At this stage, it looks like a resource congestion to me. 
> 
> Are you testing this in a VM ? If yes, how many CPUs are assigned to this VM
> ? Can you assign total number of vCPUs to the number of host CPUs and
> reproduce this issue ?

I tested with a VM whose number of vCPUs equals the number of host CPUs and kept it running for about a week. I haven't been able to reproduce this issue.
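
To record the CPU setup alongside a test run, the guest's vCPU count can be read from /proc/cpuinfo and compared with the host's CPU count. The following is a rough sketch under stated assumptions: the helper names count_processors and is_overcommitted are hypothetical, the "processor : N" line format is the one used on x86, and the host CPU count has to be supplied by whoever runs the check because it is not visible from inside the guest.

```python
def count_processors(cpuinfo_text):
    """Count 'processor : N' entries in /proc/cpuinfo-style text.

    Assumes the x86 layout, where each logical CPU has one such line.
    """
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


def is_overcommitted(total_vcpus, host_cpus):
    """Return True when more vCPUs are assigned than the host has CPUs."""
    return total_vcpus > host_cpus
```

Inside the guest, count_processors(open("/proc/cpuinfo").read()) gives the vCPU count; the host value would come from running the same check on the hypervisor.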

Comment 5 Mohamed Ashiq 2016-01-19 06:28:59 UTC

The issue is due to resource congestion: it is not reproducible when the VM is assigned as many vCPUs as the host has CPUs. Hence closing this bug. If the issue is reproduced again, please file a new RHBZ.
