Bug 1693752

Summary: SSH pseudo terminals zombied after Ansible playbook execution exits in Mistral Executor container
Product: Red Hat OpenStack
Reporter: Emilien Macchi <emacchi>
Component: openstack-tripleo-common
Assignee: Emilien Macchi <emacchi>
Status: CLOSED ERRATA
QA Contact: Marius Cornea <mcornea>
Severity: high
Docs Contact: Andrew Burden <aburden>
Priority: high
Version: 15.0 (Stein)
CC: jschluet, m.andre, mburns, slinaber
Target Milestone: beta
Keywords: Triaged
Target Release: 15.0 (Stein)
Hardware: All
OS: All
Fixed In Version: openstack-tripleo-common-10.6.1-0.20190402170405.d19f18c.el8ost
Doc Type: No Doc Update
Last Closed: 2019-09-21 11:21:05 UTC
Type: Bug

Description Emilien Macchi 2019-03-28 14:58:35 UTC
Description of problem:

Now that config-download is the default, we run Ansible playbooks from the Mistral Executor container, which remotely apply the configuration to the Overcloud.

This creates a lot of SSH connection processes, and if there is no init process in the container, nothing reaps the zombies they leave behind.

Upstream Kolla deploys dumb-init, which runs as PID 1 in the container, launches the application as a child process, and reaps orphaned processes.

Downstream, we don't package dumb-init, so we don't have any such mechanism in place.
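
For illustration, here is a minimal sketch of the reaping step that an init process performs and that a plain application running as PID 1 typically does not. This is not how dumb-init is implemented (dumb-init is a separate program that also forwards signals and adopts orphans because it is PID 1); the function names `reap_children` and `spawn_short_lived` are hypothetical, and the sketch only reaps direct children of the calling process.

```python
import os
import time


def reap_children():
    """Reap every child that has already exited and return their PIDs."""
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns at once if none has exited.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def spawn_short_lived(count):
    """Fork `count` children that exit immediately (stand-ins for ssh)."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit right away and become a zombie
        pids.append(pid)
    return pids


if __name__ == "__main__":
    children = spawn_short_lived(3)
    time.sleep(0.5)  # until reaped, the exited children show up as <defunct>
    print("spawned:", children)
    print("reaped: ", reap_children())
```

Without a loop like this running in PID 1, each exited child stays in the process table as a `<defunct>` entry.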



Version-Release number of selected component (if applicable):
OSP14 and OSP15

How reproducible:


Steps to Reproduce:
Deploy an Undercloud and then an Overcloud


Actual results:

As a result, we can see a lot of defunct (zombie) processes on the Undercloud. Example from the Mistral Executor container:

()[root@undercloud /]$ ps faux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       22047  0.0  0.0  12028  3344 pts/0    Ss   14:56   0:00 sh
root       22101  0.0  0.0  44092  3408 pts/0    R+   14:56   0:00  \_ ps faux
mistral        1  0.4  0.6 701904 160252 ?       Ss   04:24   3:05 /usr/bin/python3 /usr/bin/mistral-server --config-file=/etc/mistral/mistral.conf --log-file=/var/log/mistral/executor.log --server=executor
mistral     4146  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4147  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4149  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4150  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4152  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4153  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4155  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4156  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4158  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4159  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4161  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4162  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4164  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4165  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4167  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4168  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4404  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4405  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4426  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4427  0.0  0.0  46488  3032 ?        Ss   13:52   0:01 ssh: /var/lib/mistral/overcloud/ansible-ssh/9d4a290937 [mux]
mistral     4429  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4430  0.0  0.0  46456  3156 ?        Ss   13:52   0:01 ssh: /var/lib/mistral/overcloud/ansible-ssh/0eca69783a [mux]
mistral     4432  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4433  0.0  0.0  46308  3352 ?        Ss   13:52   0:01 ssh: /var/lib/mistral/overcloud/ansible-ssh/89230ae28c [mux]
mistral     4435  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4436  0.0  0.0  46356  3440 ?        Ss   13:52   0:00 ssh: /var/lib/mistral/overcloud/ansible-ssh/8af4971746 [mux]
mistral     4438  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4439  0.0  0.0  46356  3264 ?        Ss   13:52   0:00 ssh: /var/lib/mistral/overcloud/ansible-ssh/fd0b1d8d7b [mux]
mistral     4441  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4442  0.0  0.0  46336  3376 ?        Ss   13:52   0:00 ssh: /var/lib/mistral/overcloud/ansible-ssh/b35272565b [mux]
mistral     4445  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4446  0.0  0.0      0     0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral     4447  0.0  0.0  46384  3164 ?        Ss   13:52   0:00 ssh: /var/lib/mistral/overcloud/ansible-ssh/2cb2323068 [mux]
mistral     4448  0.0  0.0  46472  3196 ?        Ss   13:52   0:00 ssh: /var/lib/mistral/overcloud/ansible-ssh/147c7420a3 [mux]
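
To track how many zombies a listing like the one above contains, the output can be parsed. The following is a rough sketch, assuming the 11-column `ps faux` layout shown above; `count_defunct` is a hypothetical helper, and `SAMPLE` is a shortened copy of the listing with the PID 1 command line abbreviated.

```python
def count_defunct(ps_output):
    """Count zombie processes per command in `ps faux`-style output."""
    counts = {}
    for line in ps_output.splitlines():
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header row and blank or malformed lines
        stat, command = fields[7], fields[10]
        if stat.startswith("Z"):  # "Z" is the zombie process state
            counts[command] = counts.get(command, 0) + 1
    return counts


SAMPLE = """\
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
mistral        1  0.4  0.6 701904 160252 ?       Ss   04:24   3:05 /usr/bin/python3 /usr/bin/mistral-server
mistral     4146  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4147  0.0  0.0      0     0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral     4427  0.0  0.0  46488  3032 ?        Ss   13:52   0:01 ssh: /var/lib/mistral/overcloud/ansible-ssh/9d4a290937 [mux]
"""

print(count_defunct(SAMPLE))  # {'[ssh] <defunct>': 2}
```

Running this against successive captures would show whether the number of defunct entries keeps growing.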

Expected results:

1) Mistral executor should run as PID 2
2) Other processes should be managed as children of PID 1, which is dumb-init, and cleaned up when the Ansible playbooks are done.

Comment 1 Lon Hohberger 2019-03-28 15:01:41 UTC
It's important to note that this process list grows over time, effectively constituting a resource leak.

Comment 2 Emilien Macchi 2019-04-04 01:00:37 UTC
With the latest compose from today, dumb-init now runs as PID 1:

$ sudo podman exec -it mistral_executor ps faux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
mistral        1  0.0  0.0   4208   640 ?        Ss   00:44   0:00 dumb-init --single-child ...
mistral        7  5.0  0.6 694040 154644 ?       R    00:44   0:42 /usr/bin/python3 /usr/bin/m...

Comment 8 errata-xmlrpc 2019-09-21 11:21:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811