Bug 1693752 - SSH pseudo terminals zombied after Ansible playbook execution exits in Mistral Executor container
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 15.0 (Stein)
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: beta
Target Release: 15.0 (Stein)
Assignee: Emilien Macchi
QA Contact: Marius Cornea
Docs Contact: Andrew Burden
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-28 14:58 UTC by Emilien Macchi
Modified: 2019-09-26 10:49 UTC (History)
CC List: 4 users

Fixed In Version: openstack-tripleo-common-10.6.1-0.20190402170405.d19f18c.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-21 11:21:05 UTC
Target Upstream Version:
Embargoed:



Links
System                        | ID             | Private | Priority | Status | Summary                                                 | Last Updated
Github ansible ansible issues | 49270          | 0       | None     | closed | SSH pseudo terminal zombied after ansible playbook exit | 2020-09-22 18:53:51 UTC
Launchpad                     | 1821854        | 0       | None     | None   | None                                                    | 2019-03-28 14:59:10 UTC
OpenStack gerrit              | 648674         | 0       | None     | MERGED | Revert "Stop dumb-init usage"                           | 2020-09-22 18:53:47 UTC
Red Hat Product Errata        | RHEA-2019:2811 | 0       | None     | None   | None                                                    | 2019-09-21 11:21:23 UTC

Description Emilien Macchi 2019-03-28 14:58:35 UTC
Description of problem:

Since config-download is now the default, Ansible playbooks run from the Mistral Executor container and remotely apply the configuration to the Overcloud.

This creates a large number of SSH connection processes, and with no init process in the container, nothing reaps them once they exit, so they are left as zombies.
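A zombie is a child that has exited but whose parent has not yet called wait() on it; the kernel keeps its process-table entry (and its PID) until that happens. The following Python sketch is illustrative only and Linux-specific; `proc_state` and `demo_zombie` are hypothetical helpers, not part of Mistral or Ansible. It forks a child, lets it exit without reaping it, reads its state from /proc, and then reaps it:

```python
import os

def proc_state(pid):
    """Return the one-letter state field from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # comm is wrapped in parentheses and may contain spaces, so split after the last ')'
    return data.rpartition(")")[2].split()[0]

def demo_zombie():
    """Create a zombie, inspect it, then reap it. Returns (state, still_listed)."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child exits immediately
    # Wait for the exit but leave the child unreaped (WNOWAIT), so it stays a zombie
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state = proc_state(pid)  # 'Z': exited but not yet reaped
    os.waitpid(pid, 0)  # reaping frees the process-table entry
    return state, os.path.exists(f"/proc/{pid}")

print(demo_zombie())  # ('Z', False)
```

If no process ever makes the equivalent of that final waitpid() call, the entry stays behind, which is what the `[ssh] <defunct>` lines in the `ps` output of this report look like.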

Upstream Kolla uses dumb-init, which runs as PID 1 inside the container, launches the application as its child, and reaps orphaned processes once they exit.
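The reaping part of that behavior can be sketched in Python. Because an ordinary process cannot become PID 1, this sketch swaps in Linux's `PR_SET_CHILD_SUBREAPER` prctl (value 36 in `<linux/prctl.h>`), which makes the kernel reparent orphaned descendants to the caller, as it would to PID 1 in a container. It is an illustration under that assumption, not dumb-init's implementation; all function names are hypothetical.

```python
import ctypes
import os
import time

PR_SET_CHILD_SUBREAPER = 36  # constant from <linux/prctl.h>

def become_subreaper():
    """Ask the kernel to reparent orphaned descendants to this process
    (a stand-in for running as PID 1 inside a container)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

def reap_all():
    """Collect every child that has already exited; return their PIDs."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped

def spawn_orphan():
    """Fork a child that forks a grandchild and exits at once, orphaning the
    grandchild. Return the grandchild's PID."""
    r, w = os.pipe()
    mid = os.fork()
    if mid == 0:
        os.close(r)
        grandchild = os.fork()
        if grandchild == 0:
            time.sleep(0.2)  # outlive the intermediate parent
            os._exit(0)
        os.write(w, str(grandchild).encode())
        os._exit(0)
    os.close(w)
    grandchild = int(os.read(r, 32))
    os.close(r)
    os.waitpid(mid, 0)  # reap the intermediate child ourselves
    return grandchild
```

Calling `become_subreaper()`, then `spawn_orphan()`, and then `reap_all()` once the grandchild has exited returns the grandchild's PID. Without the prctl call, the orphan would be reparented to the nearest other subreaper or the namespace's init instead, and the caller would never see it.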

Downstream, we don't package dumb-init, so nothing in the container reaps these processes.



Version-Release number of selected component (if applicable):
OSP14 and OSP15

How reproducible:


Steps to Reproduce:
Deploy an Undercloud and then an Overcloud


Actual results:

As a result, many defunct (zombie) processes accumulate on the Undercloud. Example from the Mistral Executor container:

()[root@undercloud /]$ ps faux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       22047  0.0  0.0  12028  3344 pts/0    Ss   14:56   0:00 sh
root       22101  0.0  0.0  44092  3408 pts/0    R+   14:56   0:00  \_ ps faux
mistral        1  0.4  0.6 701904 160252 ?       Ss   04:24   3:05 /usr/bin/python3 /usr/bin/mistral-server --config-file=/etc/mistral/mistral.conf --log-file=/var/log/mistral/executor.log --server=executor
mistral     4146  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4147  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4149  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4150  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4152  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4153  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4155  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4156  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4158  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4159  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4161  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4162  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4164  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4165  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4167  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4168  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4404  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4405  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4426  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4427  0.0  0.0  46488  3032 ?        Ss   13:52   0:01 ssh: /var/lib/mistral/overcloud/ansible-ssh/9d4a290937 [mux]
mistral     4429  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4430  0.0  0.0  46456  3156 ?        Ss   13:52   0:01 ssh: /var/lib/mistral/overcloud/ansible-ssh/0eca69783a [mux]
mistral     4432  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4433  0.0  0.0  46308  3352 ?        Ss   13:52   0:01 ssh: /var/lib/mistral/overcloud/ansible-ssh/89230ae28c [mux]
mistral     4435  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4436  0.0  0.0  46356  3440 ?        Ss   13:52   0:00 ssh: /var/lib/mistral/overcloud/ansible-ssh/8af4971746 [mux]
mistral     4438  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4439  0.0  0.0  46356  3264 ?        Ss   13:52   0:00 ssh: /var/lib/mistral/overcloud/ansible-ssh/fd0b1d8d7b [mux]
mistral     4441  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4442  0.0  0.0  46336  3376 ?        Ss   13:52   0:00 ssh: /var/lib/mistral/overcloud/ansible-ssh/b35272565b [mux]
mistral     4445  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4446  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4447  0.0  0.0  46384  3164 ?        Ss   13:52   0:00 ssh: /var/lib/mistral/overcloud/ansible-ssh/2cb2323068 [mux]
mistral     4448  0.0  0.0  46472  3196 ?        Ss   13:52   0:00 ssh: /var/lib/mistral/overcloud/ansible-ssh/147c7420a3 [mux]
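A listing like the one above can also be summarized programmatically. The following Python sketch (Linux only; `defunct_by_command` is a hypothetical helper) reads each /proc/<pid>/stat file, the kernel status file that `ps` also reads for its STAT column, and counts zombie (state `Z`) processes by command name:

```python
import os
from collections import Counter

def defunct_by_command(proc_root="/proc"):
    """Count processes in state 'Z' (zombie), grouped by command name (Linux only)."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                data = f.read()
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process vanished between listdir() and open()
        # Format: "pid (comm) state ..."; comm may itself contain spaces or ')'
        close = data.rindex(")")
        comm = data[data.index("(") + 1 : close]
        state = data[close + 2]
        if state == "Z":
            counts[comm] += 1
    return counts
```

Run inside the Mistral Executor container before the fix, this would be expected to report a large count under `ssh`; sampling it periodically shows whether the count keeps growing.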

Expected results:

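1) The Mistral executor should not run as PID 1.
2) PID 1 should be an init (dumb-init) that runs the executor as its child and reaps the other processes, such as the SSH processes spawned by Ansible, once they exit, so nothing is left behind when the Ansible playbooks are done.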
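The expected layout can be sketched as a minimal init-style wrapper in Python. This is a hypothetical illustration (`run_as_init` is not part of dumb-init or Mistral): it starts the given command as its direct child, forwards SIGTERM and SIGINT to that child only (similar in spirit to dumb-init's `--single-child` mode), reaps every child that exits, and returns the main child's exit code. Reaping orphans this way only has an effect when the wrapper is PID 1 (or a subreaper), because only then are orphaned descendants reparented to it.

```python
import os
import signal

def run_as_init(argv):
    """Run argv as a child, forward SIGTERM/SIGINT to it, reap any child that
    exits, and return the main child's exit code."""
    child = os.fork()
    if child == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    def forward(signum, _frame):
        try:
            os.kill(child, signum)
        except ProcessLookupError:
            pass  # child already gone

    previous = {s: signal.signal(s, forward) for s in (signal.SIGTERM, signal.SIGINT)}
    try:
        while True:
            # Blocks until any child exits; this also reaps adopted orphans
            pid, status = os.wait()
            if pid == child:
                return os.waitstatus_to_exitcode(status)
    finally:
        for s, handler in previous.items():
            signal.signal(s, handler)
```

For example, `run_as_init(["sh", "-c", "exit 3"])` returns 3.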

Comment 1 Lon Hohberger 2019-03-28 15:01:41 UTC
It is important to note that this process list grows over time, effectively constituting a resource leak.

Comment 2 Emilien Macchi 2019-04-04 01:00:37 UTC
With the latest compose from today, dumb-init runs as PID 1 and the Mistral executor runs as its child:

$ sudo podman exec -it mistral_executor ps faux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
mistral        1  0.0  0.0   4208   640 ?        Ss   00:44   0:00 dumb-init --single-child ...
mistral        7  5.0  0.6 694040 154644 ?       R    00:44   0:42 /usr/bin/python3 /usr/bin/m...
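To confirm from inside the container which process is PID 1 without relying on `ps`, one could read /proc/1/cmdline, whose arguments are NUL-separated. A small Python sketch; the helper names are hypothetical:

```python
import os

def parse_cmdline(raw):
    """Split a NUL-separated /proc/<pid>/cmdline buffer into arguments."""
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]

def pid1_argv(proc_root="/proc"):
    """Return PID 1's command line as a list of strings (Linux only)."""
    with open(os.path.join(proc_root, "1", "cmdline"), "rb") as f:
        return parse_cmdline(f.read())

def is_dumb_init(argv):
    """True when the first argument names the dumb-init binary."""
    return bool(argv) and os.path.basename(argv[0]) == "dumb-init"
```

Inside the fixed mistral_executor container, `is_dumb_init(pid1_argv())` would be expected to return True, consistent with the `ps` output above.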

Comment 8 errata-xmlrpc 2019-09-21 11:21:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811

