Bug 1294765 - Login task on atomic host blocked for more than 120 seconds
Status: CLOSED NOTABUG
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhgs-server-container
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Mohamed Ashiq
QA Contact: Anoop
Keywords: ZStream
Depends On:
Blocks:
Reported: 2015-12-30 03:09 EST by Shruti Sampat
Modified: 2016-01-19 01:28 EST

Doc Type: Bug Fix
Last Closed: 2016-01-19 01:28:59 EST
Type: Bug

Attachments
Information about the machine when the issue occurred (9.54 KB, application/x-bzip)
2015-12-30 03:09 EST, Shruti Sampat
Logs from /var/log/dmesg* (15.36 KB, application/x-bzip)
2015-12-30 03:10 EST, Shruti Sampat

Description Shruti Sampat 2015-12-30 03:09:45 EST
Created attachment 1110464
Information about the machine when the issue occurred

Description of problem:
-----------------------

The login task on an Atomic Host running the RHGS container was blocked for more than 120 seconds. See the call trace below:

[83287.004116] INFO: task login:26187 blocked for more than 120 seconds.
[83287.005441] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[83287.007711] login           D ffff88030c6d0270     0 26187  19423 0x00000004
[83287.009135]  ffff88030b9dfd80 0000000000000082 ffff88030781e780 ffff88030b9dffd8
[83287.011544]  ffff88030b9dffd8 ffff88030b9dffd8 ffff88030781e780 ffff88030781e780
[83287.013958]  7fffffffffffffff ffff88030c6d0278 0000000000000001 ffff88030c6d0270
[83287.016309] Call Trace:
[83287.017394]  [<ffffffff8163a889>] schedule+0x29/0x70
[83287.018618]  [<ffffffff81638579>] schedule_timeout+0x209/0x2d0
[83287.020361]  [<ffffffff810b8a56>] ? try_to_wake_up+0x1b6/0x300
[83287.021642]  [<ffffffff810b8bf0>] ? wake_up_state+0x10/0x20
[83287.022934]  [<ffffffff8163c62a>] ldsem_down_write+0xea/0x255
[83287.024232]  [<ffffffff8163cce8>] tty_ldisc_lock_pair_timeout+0x88/0x120
[83287.025561]  [<ffffffff813b63ac>] tty_ldisc_hangup+0xcc/0x230
[83287.026872]  [<ffffffff813adb54>] __tty_hangup+0x344/0x490
[83287.028167]  [<ffffffff813adfb1>] tty_vhangup_self+0x21/0x50
[83287.029472]  [<ffffffff811dd8d3>] sys_vhangup+0x23/0x30
[83287.030717]  [<ffffffff816458c9>] system_call_fastpath+0x16/0x1b

I found this solution in our Knowledgebase - https://access.redhat.com/solutions/31453

I have collected relevant data from the machine as described in that solution. Unfortunately, I don't have enough information to reproduce this issue. I will update this BZ when I have more information.
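
For anyone triaging similar reports, the hung-task message above can be pulled out of a saved dmesg dump with a short script. This is only a sketch: the function name `parse_hung_tasks` and the regular expressions are illustrative, written against the message format shown in this report, and other kernel versions may format the lines differently. Frames prefixed with `?` are skipped on the assumption that the kernel marks them as unreliable.

```python
import re

# Header line printed by the kernel's hung-task detector, e.g.
# "[83287.004116] INFO: task login:26187 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)

# One stack frame, e.g. "[83287.017394]  [<ffffffff8163a889>] schedule+0x29/0x70".
# A leading "? " before the symbol is captured so such frames can be skipped.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+\[<[0-9a-f]+>\]\s+"
    r"(?P<unreliable>\? )?(?P<func>[^\s+]+)\+0x"
)


def parse_hung_tasks(text):
    """Return one dict per hung-task report found in dmesg-style text."""
    reports = []
    current = None
    in_trace = False
    for line in text.splitlines():
        header = HUNG_RE.match(line)
        if header:
            current = {
                "timestamp": float(header.group("ts")),
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "timeout_secs": int(header.group("secs")),
                "frames": [],
            }
            reports.append(current)
            in_trace = False
            continue
        if current is None:
            continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            frame = FRAME_RE.match(line)
            if frame is None:
                in_trace = False  # first non-frame line ends the trace
            elif not frame.group("unreliable"):
                current["frames"].append(frame.group("func"))
    return reports
```

Run against the trace in this report, the innermost reliable frames would be `schedule`, `schedule_timeout` and `ldsem_down_write`, which is a quick way to see that the task is waiting on the tty line discipline semaphore during `vhangup`.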

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
Red Hat Enterprise Linux Atomic Host release 7.2
rhgs-server-rhel7:3.1.2-3

How reproducible:
-----------------
Intermittently

Steps to Reproduce:
-------------------
Clear steps not available at the moment.

Actual results:
---------------
Login task is hung.
Comment 1 Shruti Sampat 2015-12-30 03:10 EST
Created attachment 1110465
Logs from /var/log/dmesg*
Comment 3 Humble Chirammal 2015-12-31 01:47:36 EST
(In reply to Shruti Sampat from comment #0)

> Login task on atomic host running RHGS container is blocked for a long time,
> see call trace below -
At this stage, this looks like resource congestion to me.

Are you testing this in a VM? If so, how many CPUs are assigned to the VM? Can you set the VM's vCPU count equal to the number of host CPUs and try to reproduce the issue?
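
A rough way to gather the numbers asked for above from inside the guest is sketched below. The helper names `read_hung_task_timeout` and `vcpus_match_host` are hypothetical, and `host_cpus` is assumed to be supplied by hand (for example, from running `nproc` on the hypervisor), since the guest cannot see it directly.

```python
import os

# Path taken from the kernel message quoted in this report.
HUNG_TASK_TIMEOUT = "/proc/sys/kernel/hung_task_timeout_secs"


def read_hung_task_timeout(path=HUNG_TASK_TIMEOUT):
    """Return the hung-task timeout in seconds, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def vcpus_match_host(guest_vcpus, host_cpus):
    """True when the guest has exactly as many vCPUs as the host has CPUs."""
    return guest_vcpus == host_cpus


if __name__ == "__main__":
    # os.cpu_count() reports the CPUs visible to this (guest) OS.
    print("vCPUs visible in this guest:", os.cpu_count())
    print("hung_task_timeout_secs:", read_hung_task_timeout())
```

A value of 120 from `read_hung_task_timeout()` would correspond to the "blocked for more than 120 seconds" threshold in the trace.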
Comment 4 Shruti Sampat 2016-01-12 02:11:16 EST
(In reply to Humble Chirammal from comment #3)
> 
> At this stage, it looks like a resource congestion to me. 
> 
> Are you testing this in a VM ? If yes, how many CPUs are assigned to this VM
> ? Can you assign total number of vCPUs to the number of host CPUs and
> reproduce this issue ?

I tested with a VM whose vCPU count equals the number of host CPUs and kept it running for about a week. I have not been able to reproduce the issue.
Comment 5 Mohamed Ashiq 2016-01-19 01:28:59 EST
The issue appears to be caused by resource congestion: with the VM's vCPU count equal to the number of host CPUs, it is not reproducible. Closing this bug. If the issue recurs, please file a new RHBZ.