Bug 1294765

Summary: Login task on atomic host blocked for more than 120 seconds
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Shruti Sampat <ssampat>
Component: rhgs-server-container
Assignee: Mohamed Ashiq <mliyazud>
Status: CLOSED NOTABUG
QA Contact: Anoop <annair>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rhgs-3.1
CC: hchiramm, mliyazud, pprakash, ssampat
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-01-19 06:28:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  Description                                              Flags
  Information about the machine when the issue occurred    none
  Logs from /var/log/dmesg*                                none

Description Shruti Sampat 2015-12-30 08:09:45 UTC
Created attachment 1110464
Information about the machine when the issue occurred

Description of problem:
-----------------------

The login task on an Atomic Host running the RHGS container is blocked for a long time. See the call trace below:

[83287.004116] INFO: task login:26187 blocked for more than 120 seconds.
[83287.005441] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[83287.007711] login           D ffff88030c6d0270     0 26187  19423 0x00000004
[83287.009135]  ffff88030b9dfd80 0000000000000082 ffff88030781e780 ffff88030b9dffd8
[83287.011544]  ffff88030b9dffd8 ffff88030b9dffd8 ffff88030781e780 ffff88030781e780
[83287.013958]  7fffffffffffffff ffff88030c6d0278 0000000000000001 ffff88030c6d0270
[83287.016309] Call Trace:
[83287.017394]  [<ffffffff8163a889>] schedule+0x29/0x70
[83287.018618]  [<ffffffff81638579>] schedule_timeout+0x209/0x2d0
[83287.020361]  [<ffffffff810b8a56>] ? try_to_wake_up+0x1b6/0x300
[83287.021642]  [<ffffffff810b8bf0>] ? wake_up_state+0x10/0x20
[83287.022934]  [<ffffffff8163c62a>] ldsem_down_write+0xea/0x255
[83287.024232]  [<ffffffff8163cce8>] tty_ldisc_lock_pair_timeout+0x88/0x120
[83287.025561]  [<ffffffff813b63ac>] tty_ldisc_hangup+0xcc/0x230
[83287.026872]  [<ffffffff813adb54>] __tty_hangup+0x344/0x490
[83287.028167]  [<ffffffff813adfb1>] tty_vhangup_self+0x21/0x50
[83287.029472]  [<ffffffff811dd8d3>] sys_vhangup+0x23/0x30
[83287.030717]  [<ffffffff816458c9>] system_call_fastpath+0x16/0x1b
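
For triage, a report like the one above can be pulled out of a larger dmesg capture with a small script. The following is only a minimal sketch: the function name `parse_hung_tasks`, the regular expressions, and the `SAMPLE` excerpt (a few lines copied from the trace above) are illustrative assumptions, not part of any Red Hat tooling, and the patterns assume the message layout shown in this report.

```python
import re

# Matches the kernel's hung-task warning, e.g.
# "[83287.004116] INFO: task login:26187 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# Matches one stack frame, e.g. "[<ffffffff8163a889>] schedule+0x29/0x70".
# A leading "? " marks a frame the kernel could not verify as part of the call chain.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?(?P<func>[\w.]+)\+0x"
)

# Excerpt of the trace from this report, used only as sample input.
SAMPLE = """\
[83287.004116] INFO: task login:26187 blocked for more than 120 seconds.
[83287.005441] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[83287.016309] Call Trace:
[83287.017394]  [<ffffffff8163a889>] schedule+0x29/0x70
[83287.018618]  [<ffffffff81638579>] schedule_timeout+0x209/0x2d0
[83287.020361]  [<ffffffff810b8a56>] ? try_to_wake_up+0x1b6/0x300
[83287.022934]  [<ffffffff8163c62a>] ldsem_down_write+0xea/0x255
"""


def parse_hung_tasks(dmesg_text):
    """Return one dict per hung-task report: task name, PID, timeout, reliable frames."""
    reports = []
    current = None
    in_trace = False
    for line in dmesg_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "secs": int(header.group("secs")),
                "frames": [],
            }
            reports.append(current)
            in_trace = False
            continue
        if current is None:
            continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            frame = FRAME_RE.search(line)
            if frame is None:
                in_trace = False  # first non-frame line ends the trace
            elif not frame.group("unreliable"):
                current["frames"].append(frame.group("func"))
    return reports


if __name__ == "__main__":
    for report in parse_hung_tasks(SAMPLE):
        print(report["comm"], report["pid"], report["secs"], report["frames"])
```

Frames marked with "?" are skipped so that the remaining list reflects the verified call chain, which here ends in the tty line-discipline lock (`ldsem_down_write`).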

I found this solution in our Knowledgebase: https://access.redhat.com/solutions/31453

I have collected relevant data from the machine as described in that solution. Unfortunately, I don't have enough information to reproduce this issue. I will update this BZ when I have more information.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
Red Hat Enterprise Linux Atomic Host release 7.2
rhgs-server-rhel7:3.1.2-3

How reproducible:
-----------------
Intermittently

Steps to Reproduce:
-------------------
Clear steps not available at the moment.

Actual results:
---------------
Login task is hung.

Comment 1 Shruti Sampat 2015-12-30 08:10:21 UTC
Created attachment 1110465
Logs from /var/log/dmesg*

Comment 3 Humble Chirammal 2015-12-31 06:47:36 UTC
(In reply to Shruti Sampat from comment #0)

> The login task on an Atomic Host running the RHGS container is blocked for a long time. See the call trace below:
> 
> [83287.004116] INFO: task login:26187 blocked for more than 120 seconds.
> [83287.005441] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [83287.007711] login           D ffff88030c6d0270     0 26187  19423 0x00000004
> [83287.009135]  ffff88030b9dfd80 0000000000000082 ffff88030781e780 ffff88030b9dffd8
> [83287.011544]  ffff88030b9dffd8 ffff88030b9dffd8 ffff88030781e780 ffff88030781e780
> [83287.013958]  7fffffffffffffff ffff88030c6d0278 0000000000000001 ffff88030c6d0270
> [83287.016309] Call Trace:
> [83287.017394]  [<ffffffff8163a889>] schedule+0x29/0x70
> [83287.018618]  [<ffffffff81638579>] schedule_timeout+0x209/0x2d0
> [83287.020361]  [<ffffffff810b8a56>] ? try_to_wake_up+0x1b6/0x300
> [83287.021642]  [<ffffffff810b8bf0>] ? wake_up_state+0x10/0x20
> [83287.022934]  [<ffffffff8163c62a>] ldsem_down_write+0xea/0x255
> [83287.024232]  [<ffffffff8163cce8>] tty_ldisc_lock_pair_timeout+0x88/0x120
> [83287.025561]  [<ffffffff813b63ac>] tty_ldisc_hangup+0xcc/0x230
> [83287.026872]  [<ffffffff813adb54>] __tty_hangup+0x344/0x490
> [83287.028167]  [<ffffffff813adfb1>] tty_vhangup_self+0x21/0x50
> [83287.029472]  [<ffffffff811dd8d3>] sys_vhangup+0x23/0x30
> [83287.030717]  [<ffffffff816458c9>] system_call_fastpath+0x16/0x1b


At this stage, this looks like resource congestion to me.

Are you testing this in a VM? If so, how many CPUs are assigned to it? Can you set the VM's number of vCPUs equal to the number of host CPUs and try to reproduce the issue?
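
One way to check the vCPU-to-host-CPU ratio is to compare the `processor` entries in `/proc/cpuinfo` from the guest and from the hypervisor. This is only a sketch under assumptions: the helper names `count_processors` and `vcpus_match_host` are made up for illustration, and it assumes both cpuinfo dumps were collected separately and use the x86 layout with one `processor : N` line per logical CPU.

```python
def count_processors(cpuinfo_text):
    """Count logical CPUs by counting 'processor' keys in /proc/cpuinfo text."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


def vcpus_match_host(guest_cpuinfo, host_cpuinfo):
    """Return True when the guest sees as many CPUs as the host has."""
    return count_processors(guest_cpuinfo) == count_processors(host_cpuinfo)


if __name__ == "__main__":
    # On the guest itself, the local count can be read directly.
    with open("/proc/cpuinfo") as f:
        print("CPUs visible here:", count_processors(f.read()))
```

The host-side dump would have to be gathered on the hypervisor and copied over before calling `vcpus_match_host`.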

Comment 4 Shruti Sampat 2016-01-12 07:11:16 UTC
(In reply to Humble Chirammal from comment #3)
> 
> At this stage, this looks like resource congestion to me.
> 
> Are you testing this in a VM? If so, how many CPUs are assigned to it?
> Can you set the VM's number of vCPUs equal to the number of host CPUs and
> try to reproduce the issue?

I tested with a VM whose number of vCPUs equals the number of host CPUs and kept it running for about a week. I haven't been able to reproduce the issue.

Comment 5 Mohamed Ashiq 2016-01-19 06:28:59 UTC
As the issue is due to resource congestion and is not reproducible when the VM is assigned a number of vCPUs equal to the number of host CPUs, I am closing this bug. If the issue becomes reproducible, please create a new RHBZ.