Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1316215

Summary: Scaling out causes OverCloud outage
Product: Red Hat OpenStack
Reporter: David Juran <djuran>
Component: openstack-tripleo-heat-templates
Assignee: Jiri Stransky <jstransk>
Status: CLOSED CURRENTRELEASE
QA Contact: Shai Revivo <srevivo>
Severity: high
Docs Contact:
Priority: high
Version: 7.0 (Kilo)
CC: aschultz, djuran, emacchi, jcoufal, jslagle, mburns, rhel-osp-director-maint, sclewis, vcojot
Target Milestone: ---
Keywords: Reopened, Triaged, ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-10-18 20:30:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1484964
Bug Blocks:

Description David Juran 2016-03-09 17:09:46 UTC
Description of problem:
In the process of scaling out an OverCloud using OSP-d, e.g. adding a compute node, there is a period when the OverCloud becomes unavailable. This is far from ideal as it affects the uptime of OpenStack. Handling the services in such a way that one node would always be available to serve user requests would be much preferred.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.6-121.el7ost.noarch

How reproducible:
Every time

Steps to Reproduce:
1. Add a compute node
2. Listen to your users asking why OpenStack is down


Additional info:
In some cases I've seen nova-compute timing out the connection to RabbitMQ, causing the compute node to go off-line.

Comment 2 Mike Burns 2016-04-07 21:14:44 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 4 David Juran 2016-06-03 14:14:40 UTC

*** This bug has been marked as a duplicate of bug 1339559 ***

Comment 5 David Juran 2016-07-27 10:37:33 UTC
Actually, on closer look, I'm un-duplicating this bug. Bz 1339559 is about not restarting services when scaling out.
This bug is related, but covers a broader topic. What I would like to see is that when services are restarted, the restart is orchestrated in a rolling way, so that there is always a running control node. In other words, even when a service restart is required, end users will not experience a total outage.
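
To illustrate the rolling behaviour requested here, below is a minimal sketch and not how OSP-d or the tripleo-heat-templates actually orchestrate restarts. The function name rolling_restart and the restart and is_healthy callbacks are hypothetical placeholders for whatever mechanism restarts the services on a node and checks that they are serving requests again. The point is only that nodes are handled one at a time, and that the next node is left alone until the previous one has recovered.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    nodes: Iterable[str],
    restart: Callable[[str], None],      # hypothetical: restarts services on one node
    is_healthy: Callable[[str], bool],   # hypothetical: True once the node serves requests
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
) -> List[str]:
    """Restart nodes one at a time and return the nodes that recovered.

    Raises RuntimeError, leaving the remaining nodes untouched, if a node
    does not become healthy within timeout_s seconds.
    """
    done: List[str] = []
    for node in nodes:
        restart(node)
        deadline = time.monotonic() + timeout_s
        # Wait for this node to recover before touching the next one.
        while not is_healthy(node):
            if time.monotonic() >= deadline:
                raise RuntimeError(
                    f"{node} did not recover; stopping after {done}"
                )
            time.sleep(poll_s)
        done.append(node)
    return done
```

Because the loop stops at the first node that fails to recover, the nodes not yet restarted keep running and can continue to serve user requests.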

Comment 9 Emilien Macchi 2017-10-12 01:20:21 UTC
Have you tried https://access.redhat.com/solutions/2345231?

Comment 10 David Juran 2017-10-12 13:25:51 UTC
I don't have any immediate need for a workaround right now, but I am monitoring Bz 1421883 as the kbase suggests.

Comment 11 Alex Schultz 2017-10-18 20:30:49 UTC
This has been addressed in newer versions. Please upgrade to OSP 10, where this should no longer be an issue. We won't be fixing this for OSP 7.