Description of problem:
When running overcloud deploy on an existing containerized HA cloud, one of the operations run on the host is to kill any spurious epmd that might be running there. The logic relies on lsns to determine whether an epmd process is running on the host (spurious) or is containerized (from the pacemaker-managed rabbitmq):

    for pid in $(pgrep epmd); do if [ "$(lsns -o NS -p $pid)" == "$(lsns -o NS -p 1)" ]; then kill $pid; break; fi; done

The problem with that logic is that if lsns errors out for whatever reason, both command substitutions produce empty output, so the string comparison always returns true and the containerized epmd is killed. This unexpectedly kills the messaging service and also prevents pacemaker from restarting it properly, for unrelated reasons.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. deploy a stack
2. force run an additional epmd on the host. As root on a controller:
     . /etc/rabbitmq/rabbitmq-env.conf
     epmd -daemon
3. use the same command as 1 to redeploy on top of the existing stack

Actual results:
all epmd processes are killed

Expected results:
Only the epmd from the host should be killed

Additional info:
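For illustration, the failure mode described above can be avoided by treating a failed namespace lookup as "do not kill". The following is only a minimal sketch under stated assumptions and is not necessarily what the upstream fix does: the helper names pid_ns, in_host_pid_ns and kill_host_epmd are hypothetical, and the lookup reads /proc/<pid>/ns/pid with readlink instead of calling lsns.

```shell
#!/bin/bash
# Hypothetical helper: print the PID-namespace identifier of a process.
# Fails (non-zero) when the identifier cannot be read or is empty.
pid_ns() {
    local ns
    ns=$(readlink "/proc/$1/ns/pid" 2>/dev/null) || return 1
    [ -n "$ns" ] || return 1
    printf '%s\n' "$ns"
}

# Hypothetical helper: succeed only when BOTH lookups succeed and match.
# A failed lookup is never treated as "same namespace as the host".
in_host_pid_ns() {
    local proc_ns host_ns
    proc_ns=$(pid_ns "$1") || return 1
    host_ns=$(pid_ns 1) || return 1
    [ "$proc_ns" = "$host_ns" ]
}

# Hypothetical wrapper mirroring the original loop: kill the first epmd
# found in the host PID namespace and leave containerized ones alone.
kill_host_epmd() {
    local pid
    for pid in $(pgrep epmd); do
        if in_host_pid_ns "$pid"; then
            kill "$pid"
            break
        fi
    done
}
```

Run as root, calling kill_host_epmd would replace the one-liner. The point of the design is that an error in the lookup makes the test false rather than true, so the containerized epmd survives when the namespace cannot be determined.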
Fixed upstream, and the fix is in stable/pike: https://review.openstack.org/#/c/524749/
Tested https://review.openstack.org/#/c/524749/: controller replacement completed, an instance was launched, and the overcloud is operable.
Is this only in case of controller replacement, or is this issue also reproducible with any config update changes in the overcloud, scaling, and/or other operations after the initial overcloud deployment?
(In reply to Jaromir Coufal from comment #3)
> Is this only in case of controller replacement or is this issue also
> reproducible in any config update changes in overcloud, scaling, and/or
> other operations post initial overcloud deployment?

I think this will not block minor updates, but it will block compute scaling.
*** Bug 1522785 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:0602