Description of problem:
Since config-download became the default, Ansible playbooks run from the Mistral Executor container and remotely apply the configuration on the Overcloud. Each run creates many SSH connection processes. The container has no init process to reap them, so the ones that exit are never cleaned up and remain as zombie (<defunct>) processes.

Upstream, Kolla images use dumb-init as PID 1. The application runs as its child, and dumb-init reaps zombie processes. Downstream, we do not package dumb-init, so no such mechanism is in place.

Version-Release number of selected component (if applicable):
OSP14 and OSP15

How reproducible:

Steps to Reproduce:
1. Deploy an Undercloud.
2. Deploy an Overcloud.

Actual results:
Many defunct processes accumulate on the Undercloud. Example from the Mistral Executor container:

()[root@undercloud /]$ ps faux
USER       PID %CPU %MEM    VSZ    RSS TTY      STAT START   TIME COMMAND
root     22047  0.0  0.0  12028   3344 pts/0    Ss   14:56   0:00 sh
root     22101  0.0  0.0  44092   3408 pts/0    R+   14:56   0:00  \_ ps faux
mistral      1  0.4  0.6 701904 160252 ?        Ss   04:24   3:05 /usr/bin/python3 /usr/bin/mistral-server --config-file=/etc/mistral/mistral.conf --log-file=/var/log/mistral/executor.log --server=executor
mistral   4146  0.0  0.0      0      0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral   4147  0.0  0.0      0      0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral   4149  0.0  0.0      0      0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral   4150  0.0  0.0      0      0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral   4152  0.0  0.0      0      0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral   4153  0.0  0.0      0      0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral   4155  0.0  0.0      0      0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral   4156  0.0  0.0      0      0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral   4158  0.0  0.0      0      0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral   4159  0.0  0.0      0      0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral   4161  0.0  0.0      0      0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral   4162  0.0  0.0      0      0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral   4164  0.0  0.0      0      0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral   4165  0.0  0.0      0      0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral   4167  0.0  0.0      0      0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral   4168  0.0  0.0      0      0 ?        Zs   13:51   0:00 [ssh] <defunct>
mistral   4404  0.0  0.0      0      0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral   4405  0.0  0.0      0      0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral   4426  0.0  0.0      0      0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral   4427  0.0  0.0  46488   3032 ?        Ss   13:52   0:01 ssh: /var/lib/mistral/overcloud/ansible-ssh/9d4a290937 [mux]
mistral   4429  0.0  0.0      0      0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral   4430  0.0  0.0  46456   3156 ?        Ss   13:52   0:01 ssh: /var/lib/mistral/overcloud/ansible-ssh/0eca69783a [mux]
mistral   4432  0.0  0.0      0      0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral   4433  0.0  0.0  46308   3352 ?        Ss   13:52   0:01 ssh: /var/lib/mistral/overcloud/ansible-ssh/89230ae28c [mux]
mistral   4435  0.0  0.0      0      0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral   4436  0.0  0.0  46356   3440 ?        Ss   13:52   0:00 ssh: /var/lib/mistral/overcloud/ansible-ssh/8af4971746 [mux]
mistral   4438  0.0  0.0      0      0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral   4439  0.0  0.0  46356   3264 ?        Ss   13:52   0:00 ssh: /var/lib/mistral/overcloud/ansible-ssh/fd0b1d8d7b [mux]
mistral   4441  0.0  0.0      0      0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral   4442  0.0  0.0  46336   3376 ?        Ss   13:52   0:00 ssh: /var/lib/mistral/overcloud/ansible-ssh/b35272565b [mux]
mistral   4445  0.0  0.0      0      0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral   4446  0.0  0.0      0      0 ?        Zs   13:52   0:00 [ssh] <defunct>
mistral   4447  0.0  0.0  46384   3164 ?        Ss   13:52   0:00 ssh: /var/lib/mistral/overcloud/ansible-ssh/2cb2323068 [mux]
mistral   4448  0.0  0.0  46472   3196 ?        Ss   13:52   0:00 ssh: /var/lib/mistral/overcloud/ansible-ssh/147c7420a3 [mux]

Expected results:
1) The Mistral executor should not run as PID 1; it should run as a child of dumb-init (e.g. as PID 2).
2) Other processes, such as the SSH connections spawned by Ansible, should be managed by dumb-init running as PID 1 and reaped once the Ansible playbooks are done.
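The mechanism behind the <defunct> entries can be illustrated with a small Python sketch (Linux only, since it reads /proc). A process that exits stays in the process table in state Z until its parent collects its exit status with wait(). Orphaned processes are handed to PID 1, so when PID 1 is an ordinary application that never calls wait(), which is presumably the case for mistral-server here, the zombies pile up. The reap_children() loop below is a simplified illustration of the reaping an init such as dumb-init performs; it is not dumb-init's actual code (a real init would also run it on SIGCHLD and forward signals to its child), and all helper names are hypothetical.

```python
import os
import time


def spawn_short_lived_children(n):
    """Fork n children that exit immediately; return their PIDs."""
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit at once without running cleanup handlers
        pids.append(pid)
    return pids


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            # The state follows the parenthesised comm field, which may
            # contain spaces, so split on the last ')'.
            return f.read().rsplit(")", 1)[1].split()[0]
    except (FileNotFoundError, ProcessLookupError):
        return None  # no longer in the process table (already reaped)


def reap_children():
    """Collect the exit status of every exited child without blocking.

    Hypothetical helper illustrating what an init does for zombies.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children remain at all
        if pid == 0:
            break  # children remain, but none has exited yet
        reaped.append(pid)
    return reaped


if __name__ == "__main__":
    pids = spawn_short_lived_children(3)
    time.sleep(1)  # give the children time to exit
    print([process_state(p) for p in pids])  # zombies (Z) until reaped
    reap_children()
    print([process_state(p) for p in pids])  # gone after reaping
```

Until reap_children() runs, the exited children show up exactly like the [ssh] <defunct> rows above; once their status is collected, their /proc entries disappear.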
Note that this process list grows over time, which makes it effectively a resource leak: each zombie keeps its process-table entry and PID until it is reaped.
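One way to watch the leak grow is to count zombie processes periodically. The sketch below is hypothetical tooling, not part of TripleO or Mistral: it scans /proc (Linux only) and groups processes in state Z by command name. If the problem is present, repeated runs inside the affected container would be expected to show the ssh count rising after each deployment.

```python
import os
from collections import Counter


def parse_stat(text):
    """Return (comm, state) from the contents of a /proc/<pid>/stat file.

    The comm field is wrapped in parentheses and may itself contain spaces
    or ')' characters, so split on the *last* ')'.
    """
    head, tail = text.rsplit(")", 1)
    comm = head.split("(", 1)[1]
    state = tail.split()[0]
    return comm, state


def count_zombies(proc_root="/proc"):
    """Count zombie processes visible under proc_root, grouped by command name."""
    zombies = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # process vanished between listdir() and open()
        if state == "Z":
            zombies[comm] += 1
    return zombies


if __name__ == "__main__":
    print(dict(count_zombies()))
```

Grouping by command name makes it easy to tell the ssh zombies apart from any unrelated ones.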
Verified with the latest compose from today:

$ sudo podman exec -it mistral_executor ps faux
USER       PID %CPU %MEM    VSZ    RSS TTY      STAT START   TIME COMMAND
mistral      1  0.0  0.0   4208    640 ?        Ss   00:44   0:00 dumb-init --single-child ...
mistral      7  5.0  0.6 694040 154644 ?        R    00:44   0:42 /usr/bin/python3 /usr/bin/m...

dumb-init now runs as PID 1, and the Mistral executor no longer does (it runs as PID 7).
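To make this verification scriptable, a check like the following could inspect PID 1's command line. It is a hypothetical helper, not something shipped with the images. /proc/<pid>/cmdline holds the process's argv as NUL-separated bytes, and the check simply compares the basename of argv[0] with dumb-init.

```python
import os


def argv_from_cmdline(raw):
    """Split the raw bytes of /proc/<pid>/cmdline into an argv list.

    Arguments are separated by NUL bytes and the buffer normally ends with one.
    """
    if raw.endswith(b"\0"):
        raw = raw[:-1]  # drop the terminating NUL
    if not raw:
        return []
    return [arg.decode(errors="replace") for arg in raw.split(b"\0")]


def runs_under_dumb_init(raw_pid1_cmdline):
    """Return True if PID 1's argv[0] names dumb-init (compared by basename)."""
    argv = argv_from_cmdline(raw_pid1_cmdline)
    return bool(argv) and os.path.basename(argv[0]) == "dumb-init"
```

Inside the container, pass it the bytes of /proc/1/cmdline opened in binary mode. On a fixed build it should return True, while on an affected build argv[0] is /usr/bin/python3 and it returns False.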
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:2811