
Bug 1519765

Summary: containerized HA rabbitmq stops on re-deploy if lsns fails
Product: Red Hat OpenStack
Reporter: Damien Ciabrini <dciabrin>
Component: openstack-tripleo-heat-templates
Assignee: Damien Ciabrini <dciabrin>
Status: CLOSED ERRATA
QA Contact: Artem Hrechanychenko <ahrechan>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 12.0 (Pike)
CC: agurenko, aschultz, chjones, dciabrin, ekuris, jcoufal, jschluet, mburns, michele, mkrcmari, nalmond, ohochman, pkomarov, rbartal, rhel-osp-director-maint
Target Milestone: z2
Keywords: Triaged, ZStream
Target Release: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-7.0.9-1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-03-28 17:14:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1505293

Description Damien Ciabrini 2017-12-01 12:25:11 UTC
Description of problem:

When running overcloud deploy on an existing containerized HA cloud, one of the operations run on the host kills any spurious epmd that might be running there. The logic relies on lsns to determine whether an epmd is running on the host (spurious) or is containerized (from the pacemaker-managed rabbitmq).

            for pid in $(pgrep epmd); do if [ "$(lsns -o NS -p $pid)" == "$(lsns -o NS -p 1)" ]; then kill $pid; break; fi; done

The problem with this logic is that if lsns errors out for any reason, both command substitutions can expand to the same empty string, so the test always returns true and the containerized epmd is killed. This unexpectedly kills the messaging service, and for unrelated reasons it also prevents pacemaker from restarting it properly.
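
One way to make the test fail safe is sketched below. It is not necessarily what the upstream change does; it only illustrates guarding against a failed or empty lsns lookup. The function names ns_of, in_host_namespaces and kill_host_epmd are hypothetical, and the sketch assumes an lsns that supports the -n (no headings), -o and -p options.

```shell
# Print the namespace list for a pid. Hypothetical wrapper around lsns,
# kept separate so the lookup can be replaced or stubbed.
ns_of() {
    lsns -n -o NS -p "$1" 2>/dev/null
}

# Succeed only if both lookups worked, both returned non-empty output,
# and the outputs match. Any lsns failure means "do not kill".
in_host_namespaces() {
    pid_ns=$(ns_of "$1") || return 1
    host_ns=$(ns_of 1) || return 1
    [ -n "$pid_ns" ] && [ -n "$host_ns" ] && [ "$pid_ns" = "$host_ns" ]
}

# Same loop as the one-liner above, using the guarded test.
kill_host_epmd() {
    for pid in $(pgrep epmd); do
        if in_host_namespaces "$pid"; then
            kill "$pid"
            break
        fi
    done
}
```

With this guard, an lsns error leaves every epmd running, which is the safer outcome for the pacemaker-managed rabbitmq.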


Version-Release number of selected component (if applicable):


How reproducible:
Always


Steps to Reproduce:
1. Deploy a stack.

2. Force an additional epmd to run on the host. As root on a controller:
. /etc/rabbitmq/rabbitmq-env.conf
epmd -daemon

3. Use the same command as in step 1 to redeploy on top of the existing stack.

Actual results:
All epmd processes are killed.

Expected results:
Only the epmd from the host should be killed
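
To check which epmd survived a redeploy, something like the following sketch can be run as root on a controller. It compares the /proc/PID/ns/pid link of each epmd with that of PID 1. The names epmd_location, report_epmd and the PROC_ROOT override are hypothetical and exist only for this illustration.

```shell
# PROC_ROOT is a hypothetical override so the check can be pointed at a
# fake tree; it defaults to the real /proc.
PROC_ROOT=${PROC_ROOT:-/proc}

# Print "host", "container", or "unknown" for the given pid, based on
# whether its pid namespace link matches the one of PID 1.
epmd_location() {
    pid_ns=$(readlink "$PROC_ROOT/$1/ns/pid" 2>/dev/null) || pid_ns=
    host_ns=$(readlink "$PROC_ROOT/1/ns/pid" 2>/dev/null) || host_ns=
    if [ -z "$pid_ns" ] || [ -z "$host_ns" ]; then
        echo unknown
    elif [ "$pid_ns" = "$host_ns" ]; then
        echo host
    else
        echo container
    fi
}

# Report every epmd currently running (prints nothing if there is none).
report_epmd() {
    for pid in $(pgrep epmd); do
        echo "$pid $(epmd_location "$pid")"
    done
}
```

After a correct redeploy, report_epmd should list only an epmd marked "container".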

Additional info:

Comment 1 Damien Ciabrini 2017-12-03 15:52:57 UTC
Fixed upstream in stable/pike: https://review.openstack.org/#/c/524749/

Comment 2 Artem Hrechanychenko 2017-12-04 11:23:31 UTC
Tested https://review.openstack.org/#/c/524749/
Controller replacement completed, an instance was launched, and the overcloud is operable.

Comment 3 Jaromir Coufal 2017-12-04 20:31:37 UTC
Does this happen only during controller replacement, or is the issue also reproducible with config updates, scaling, and/or other operations after the initial overcloud deployment?

Comment 4 Damien Ciabrini 2017-12-05 08:33:55 UTC
(In reply to Jaromir Coufal from comment #3)
> Is this only in case of controller replacement or is this issue also
> reproducible in any config update changes in overcloud, scaling, and/or
> other operations post initial overcloud deployment?

I think this will not block minor updates, but this will block compute scaling.

Comment 5 Eran Kuris 2017-12-06 14:19:52 UTC
*** Bug 1522785 has been marked as a duplicate of this bug. ***

Comment 20 errata-xmlrpc 2018-03-28 17:14:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0602