Bug 1519765 - containerized HA rabbitmq stops on re-deploy if lsns fails
Summary: containerized HA rabbitmq stops on re-deploy if lsns fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z2
Target Release: 12.0 (Pike)
Assignee: Damien Ciabrini
QA Contact: Artem Hrechanychenko
URL:
Whiteboard:
Duplicates: 1522785
Depends On:
Blocks: 1505293
Reported: 2017-12-01 12:25 UTC by Damien Ciabrini
Modified: 2022-07-09 10:31 UTC
CC List: 15 users

Fixed In Version: openstack-tripleo-heat-templates-7.0.9-1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-28 17:14:53 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1735698 | 0 | None | None | None | 2017-12-01 12:25:10 UTC
OpenStack gerrit | 524749 | 0 | 'None' | MERGED | Do not use lsns to kill non-containerized epmd on the host | 2020-12-10 08:41:52 UTC
Red Hat Issue Tracker | OSP-4787 | 0 | None | None | None | 2022-07-09 10:31:11 UTC
Red Hat Product Errata | RHSA-2018:0602 | 0 | None | None | None | 2018-03-28 17:15:58 UTC

Description Damien Ciabrini 2017-12-01 12:25:11 UTC
Description of problem:

When running overcloud deploy on an existing containerized HA cloud, one of the operations run on the host kills any spurious epmd that might be running there. The logic relies on lsns to determine whether an epmd process is running on the host (spurious) or is containerized (from pacemaker-managed rabbitmq).

            for pid in $(pgrep epmd); do if [ "$(lsns -o NS -p $pid)" == "$(lsns -o NS -p 1)" ]; then kill $pid; break; fi; done

The problem with that logic is that if lsns errors out for any reason, both command substitutions expand to an empty string, so the test always returns true and the containerized epmd is killed. This unexpectedly kills the messaging service and also prevents pacemaker from restarting it properly, for unrelated reasons.
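
The failure mode can be illustrated with a minimal sketch. It assumes a failing lsns prints nothing on stdout, and it uses a hypothetical helper, same_ns, that is not part of tripleo-heat-templates and is not necessarily how the merged fix works. The helper only reports a match when both namespace identifiers are non-empty.

```shell
# same_ns A B: succeed only when both namespace identifiers are non-empty
# and identical. Hypothetical helper, for illustration only.
same_ns() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" = "$2" ]
}

# When lsns fails, both command substitutions expand to the empty string.
pid_ns=""
host_ns=""

# Unguarded comparison, as in the one-liner above: "" equals "", so it matches.
if [ "$pid_ns" = "$host_ns" ]; then
    echo "unguarded: match, epmd would be killed"
fi

# Guarded comparison: empty output is treated as unknown, so nothing is killed.
if same_ns "$pid_ns" "$host_ns"; then
    echo "guarded: match"
else
    echo "guarded: no match, epmd left alone"
fi
```

In the loop above, such a guard would wrap the two lsns calls, for example same_ns "$(lsns -o NS -p $pid)" "$(lsns -o NS -p 1)", so that a failing lsns results in no process being killed rather than the wrong one.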


Version-Release number of selected component (if applicable):


How reproducible:
Always


Steps to Reproduce:
1. Deploy a stack.

2. Force an additional epmd to run on the host. As root on a controller:
. /etc/rabbitmq/rabbitmq-env.conf
epmd -daemon

3. Redeploy on top of the existing stack, using the same command as in step 1.

Actual results:
All epmd processes are killed

Expected results:
Only the epmd from the host should be killed

Additional info:

Comment 1 Damien Ciabrini 2017-12-03 15:52:57 UTC
Fixed upstream, including in stable/pike: https://review.openstack.org/#/c/524749/

Comment 2 Artem Hrechanychenko 2017-12-04 11:23:31 UTC
Tested https://review.openstack.org/#/c/524749/
Controller replacement completed, an instance was launched, and the overcloud is operable.

Comment 3 Jaromir Coufal 2017-12-04 20:31:37 UTC
Is this only in the case of controller replacement, or is this issue also reproducible with any config update changes in the overcloud, scaling, and/or other operations after the initial overcloud deployment?

Comment 4 Damien Ciabrini 2017-12-05 08:33:55 UTC
(In reply to Jaromir Coufal from comment #3)
> Is this only in case of controller replacement or is this issue also
> reproducible in any config update changes in overcloud, scaling, and/or
> other operations post initial overcloud deployment?

I think this will not block minor updates, but this will block compute scaling.

Comment 5 Eran Kuris 2017-12-06 14:19:52 UTC
*** Bug 1522785 has been marked as a duplicate of this bug. ***

Comment 20 errata-xmlrpc 2018-03-28 17:14:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0602

