1519765 – containerized HA rabbitmq stops on re-deploy if lsns fails

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-heat-templates
Sub Component:
Version:	12.0 (Pike)
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	z2
Target Release:	12.0 (Pike)
Assignee:	Damien Ciabrini
QA Contact:	Artem Hrechanychenko
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1522785
Depends On:
Blocks:	1505293

Reported:	2017-12-01 12:25 UTC by Damien Ciabrini
Modified:	2022-07-09 10:31 UTC
CC List:	15 users
Fixed In Version:	openstack-tripleo-heat-templates-7.0.9-1.el7ost
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-03-28 17:14:53 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1735698	None	None	None	2017-12-01 12:25:10 UTC
OpenStack gerrit	524749	'None'	MERGED	Do not use lsns to kill non-containerized epmd on the host	2020-12-10 08:41:52 UTC
Red Hat Issue Tracker	OSP-4787	None	None	None	2022-07-09 10:31:11 UTC
Red Hat Product Errata	RHSA-2018:0602	None	None	None	2018-03-28 17:15:58 UTC

Description Damien Ciabrini 2017-12-01 12:25:11 UTC

Description of problem:

When running overcloud deploy on an existing containerized HA cloud, one of the operations run on the host kills any spurious epmd that might be running there. The logic relies on lsns to determine whether an epmd process is running on the host (spurious) or is containerized (from the pacemaker-managed rabbitmq).

            for pid in $(pgrep epmd); do if [ "$(lsns -o NS -p $pid)" == "$(lsns -o NS -p 1)" ]; then kill $pid; break; fi; done

The problem with this logic is that if lsns errors out for any reason, both command substitutions yield the same (typically empty) output, so the test always returns true and the containerized epmd is killed. This unexpectedly kills the messaging service and also prevents pacemaker from restarting it properly, for unrelated reasons.
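
The failure mode can be reproduced without lsns at all. The following is a minimal sketch, not the merged fix: failing_lsns is a hypothetical stand-in for an lsns call that errors out, and unsafe_same_ns and safe_same_ns are illustrative names. It shows that comparing two empty outputs reports a match, while a variant that checks the exit status and rejects empty output does not.

```shell
# Hypothetical stand-in for "lsns -o NS -p PID" that always fails
# and prints nothing on stdout, to simulate the error case.
failing_lsns() {
    return 1
}

# Same shape as the original test: only the two outputs are compared.
unsafe_same_ns() {
    [ "$(failing_lsns "$1")" = "$(failing_lsns 1)" ]
}

# Defensive variant: a failed or empty lookup never counts as a match.
safe_same_ns() {
    ns_a=$(failing_lsns "$1") || return 1
    ns_b=$(failing_lsns 1) || return 1
    [ -n "$ns_a" ] && [ "$ns_a" = "$ns_b" ]
}

if unsafe_same_ns 12345; then unsafe_result="match"; else unsafe_result="no match"; fi
if safe_same_ns 12345; then safe_result="match"; else safe_result="no match"; fi

echo "unsafe test: $unsafe_result"   # prints "unsafe test: match"
echo "safe test: $safe_result"       # prints "safe test: no match"
```

With the unsafe form, every pid returned by pgrep would be treated as a host process as soon as the lookup fails, which matches the behaviour described above.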


Version-Release number of selected component (if applicable):


How reproducible:
Always


Steps to Reproduce:
1. deploy a stack

2. force run an additional epmd on the host. As root on a controller:
. /etc/rabbitmq/rabbitmq-env.conf 
epmd -daemon

3. use the same command as in step 1 to redeploy on top of the existing stack

Actual results:
all epmd processes are killed

Expected results:
Only the epmd from the host should be killed
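
To check which epmd processes belong to the host before and after the redeploy, the PID namespaces can be compared through procfs instead of lsns. This is a rough sketch under stated assumptions: it assumes Linux procfs, root privileges, and that the containerized epmd runs in its own PID namespace. PROC_ROOT, pid_ns and in_host_pid_ns are hypothetical names introduced here, and the loop only reports, it kills nothing.

```shell
# PROC_ROOT is a hypothetical override so the helpers can be exercised
# against a fake directory tree; on a real host it is /proc.
PROC_ROOT=${PROC_ROOT:-/proc}

# Print the PID namespace identifier of a process, or fail.
pid_ns() {
    readlink "$PROC_ROOT/$1/ns/pid"
}

# Succeed only if both lookups work and the identifiers are equal.
in_host_pid_ns() {
    ns_proc=$(pid_ns "$1") || return 1
    ns_init=$(pid_ns 1) || return 1
    [ -n "$ns_proc" ] && [ "$ns_proc" = "$ns_init" ]
}

# Classify every running epmd without killing anything.
for pid in $(pgrep epmd 2>/dev/null); do
    if in_host_pid_ns "$pid"; then
        echo "$pid: host"
    else
        echo "$pid: containerized or unknown"
    fi
done
```

Because a failed lookup falls into the "containerized or unknown" branch, an error can never cause a process to be classified as a host epmd.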

Additional info:

Comment 1 Damien Ciabrini 2017-12-03 15:52:57 UTC

Fixed and in stable/pike upstream in https://review.openstack.org/#/c/524749/

Comment 2 Artem Hrechanychenko 2017-12-04 11:23:31 UTC

Tested https://review.openstack.org/#/c/524749/
Controller replacement completed, an instance was launched, and the overcloud is operable.

Comment 3 Jaromir Coufal 2017-12-04 20:31:37 UTC

Is this only in the case of controller replacement, or is this issue also reproducible in any config update changes in the overcloud, scaling, and/or other operations after the initial overcloud deployment?

Comment 4 Damien Ciabrini 2017-12-05 08:33:55 UTC

(In reply to Jaromir Coufal from comment #3)
> Is this only in case of controller replacement or is this issue also
> reproducible in any config update changes in overcloud, scaling, and/or
> other operations post initial overcloud deployment?

I think this will not block minor updates, but this will block compute scaling.

Comment 5 Eran Kuris 2017-12-06 14:19:52 UTC

*** Bug 1522785 has been marked as a duplicate of this bug. ***

Comment 20 errata-xmlrpc 2018-03-28 17:14:53 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0602
