Description of problem:

After a successful stack update, "openstack orchestration service list" shows dead heat-engine workers. As noted in BZ 1730994, this has no impact on the environment, but the customer would like these processes, when backend operations leave them marked as dead, to be cleaned out of the list automatically without manual intervention.

Version-Release number of selected component (if applicable):
~~~
heat-cfntools-1.3.0-2.el7ost.noarch                          Fri Nov 27 18:06:42 2020
openstack-heat-api-10.0.3-13.el7ost.noarch                   Fri Nov 27 18:22:27 2020
openstack-heat-api-cfn-10.0.3-13.el7ost.noarch               Fri Nov 27 18:22:31 2020
openstack-heat-common-10.0.3-13.el7ost.noarch                Fri Nov 27 18:22:23 2020
openstack-heat-engine-10.0.3-13.el7ost.noarch                Fri Nov 27 18:22:36 2020
openstack-tripleo-heat-templates-8.4.1-58.el7ost.noarch      Fri Nov 27 18:14:37 2020
puppet-heat-12.4.1-0.20200413050249.d61d033.el7ost.noarch    Fri Nov 27 18:13:51 2020
python2-heatclient-1.14.1-1.el7ost.noarch                    Fri Nov 27 18:14:32 2020
python-heat-agent-1.5.4-1.el7ost.noarch                      Fri Nov 27 18:14:34 2020
~~~

How reproducible:
Yes

Steps to Reproduce:
1. Perform a minor update on the stack.
2. Check the output of "openstack orchestration service list".

Actual results:
Dead heat-engine workers/processes appear in the "openstack orchestration service list" output after a successful stack update.

Expected results:
No dead heat-engine workers/processes should be listed after a successful stack update.

Additional info:
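For reference, the symptom looks roughly like this (illustrative output only; hostnames, engine IDs and timestamps are placeholders, and the exact columns depend on the python-heatclient version):
~~~
$ openstack orchestration service list
+---------------------------+-------------+--------------------------------------+---------------------------+--------+----------------------------+--------+
| Hostname                  | Binary      | Engine ID                            | Host                      | Topic  | Updated At                 | Status |
+---------------------------+-------------+--------------------------------------+---------------------------+--------+----------------------------+--------+
| controller-0.localdomain  | heat-engine | 00000000-0000-0000-0000-000000000001 | controller-0.localdomain  | engine | 2020-11-27T18:30:00.000000 | up     |
| controller-0.localdomain  | heat-engine | 00000000-0000-0000-0000-000000000002 | controller-0.localdomain  | engine | 2020-11-27T18:05:00.000000 | down   |
+---------------------------+-------------+--------------------------------------+---------------------------+--------+----------------------------+--------+
~~~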
docker stop kills processes after the default 10s timeout when the heat-engines are restarted after an update. That is why some heat-engines show status 'down': they were not stopped gracefully. We've increased that grace period to 60s in OSP 14 and above [1], which should reduce how often this occurs. In the meantime, a cron job running 'heat-manage service clean' should clean out those dead engine workers.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1641667#c8

*** This bug has been marked as a duplicate of bug 1641667 ***
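A minimal sketch of such a cron job, assuming heat-manage and the heat configuration are available directly on the controller node (on containerized deployments the command would typically need to be run inside the heat_engine container instead); the file name and schedule below are only examples:
~~~
# /etc/cron.d/heat-service-clean  (hypothetical file name and schedule)
# Once a day, purge heat-engine records that are marked dead.
# Runs as the heat user so that heat.conf and the DB credentials are readable.
0 4 * * * heat heat-manage service clean
~~~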