Bug 1316215 - Scaling out causes OverCloud outage
Summary: Scaling out causes OverCloud outage
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: high
Target Milestone: ---
Target Release: ---
Assignee: Jiri Stransky
QA Contact: Shai Revivo
URL:
Whiteboard:
Depends On: 1484964
Blocks:
 
Reported: 2016-03-09 17:09 UTC by David Juran
Modified: 2019-12-16 05:30 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-18 20:30:49 UTC
Target Upstream Version:



Description David Juran 2016-03-09 17:09:46 UTC
Description of problem:
In the process of scaling out an OverCloud using OSP-d, e.g. adding a compute node, there is a period when the OverCloud becomes unavailable. This is far from ideal, as it affects the uptime of the OpenStack deployment. Handling the services in such a way that one node is always available to serve user requests would be much preferred.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.6-121.el7ost.noarch

How reproducible:
Every time

Steps to Reproduce:
1. Add a compute node
2. Listen to your users asking why OpenStack is down
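For reference, a scale-out of this kind is typically triggered by re-running the deploy with an increased node count. A minimal sketch, assuming the standard `ComputeCount` parameter of tripleo-heat-templates (the file name `scale-env.yaml` is illustrative):

```yaml
# scale-env.yaml -- illustrative environment file, not taken from this bug.
# ComputeCount is the tripleo-heat-templates scale parameter.
parameter_defaults:
  ComputeCount: 3   # e.g. previously 2; re-running the deploy adds one node
```

Passed to a re-run of `openstack overcloud deploy` with `-e scale-env.yaml`; on OSP 7 the CLI shorthand `--compute-scale` served the same purpose.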


Additional info:
In some cases I've seen nova-compute time out its connection to RabbitMQ, causing the compute node to go offline.
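As a hedged side note on those RabbitMQ timeouts: on Kilo-era nova the relevant knobs are the oslo.messaging heartbeat and RPC timeout settings. An illustrative nova.conf fragment (values are examples, not recommendations from this bug):

```ini
[DEFAULT]
# How long nova-compute waits for an RPC reply before giving up (seconds)
rpc_response_timeout = 60

[oslo_messaging_rabbit]
# Enable AMQP heartbeats so dead connections are detected and re-established
heartbeat_timeout_threshold = 60
heartbeat_rate = 2
```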

Comment 2 Mike Burns 2016-04-07 21:14:44 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 4 David Juran 2016-06-03 14:14:40 UTC

*** This bug has been marked as a duplicate of bug 1339559 ***

Comment 5 David Juran 2016-07-27 10:37:33 UTC
Actually, on closer look, I'm un-duplicating this bug. Bz 1339559 is about not restarting services when scaling out.
This bug is related but covers a broader topic: what I would like to see is that when services are restarted, the restart is orchestrated in a rolling fashion, so that there is always a running control node. In other words, even when a service restart is required, end-users will not experience a total outage.
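The rolling behaviour requested here can be sketched generically: take down at most one node at a time, and only when enough healthy peers remain to serve requests. A minimal Python sketch (not the tripleo implementation; `restart` and `is_healthy` are assumed callbacks supplied by the orchestrator):

```python
from typing import Callable, Iterable, List

def rolling_restart(nodes: Iterable[str],
                    restart: Callable[[str], None],
                    is_healthy: Callable[[str], bool],
                    min_healthy: int = 1) -> List[str]:
    """Restart nodes one at a time, refusing to take a node down
    unless enough of its peers are healthy to keep serving requests."""
    nodes = list(nodes)
    restarted = []
    for node in nodes:
        peers = [n for n in nodes if n != node]
        healthy_peers = [n for n in peers if is_healthy(n)]
        if len(healthy_peers) < min_healthy:
            raise RuntimeError(
                f"refusing to restart {node}: only "
                f"{len(healthy_peers)} healthy peer(s), need {min_healthy}")
        restart(node)                    # take the node down and bring it back
        if not is_healthy(node):         # verify recovery before moving on
            raise RuntimeError(f"{node} failed to come back healthy")
        restarted.append(node)
    return restarted
```

The health check before each restart is what prevents the total outage described above: if the cluster is already degraded, the orchestrator stops rather than taking down the last serving node.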

Comment 9 Emilien Macchi 2017-10-12 01:20:21 UTC
have you tried https://access.redhat.com/solutions/2345231 ?

Comment 10 David Juran 2017-10-12 13:25:51 UTC
I don't have an immediate need for a workaround right now, but I am monitoring Bz 1421883 as the kbase suggests.

Comment 11 Alex Schultz 2017-10-18 20:30:49 UTC
This has been addressed in newer versions. Please upgrade to OSP 10, where this should no longer be an issue. We won't be fixing this for OSP 7.

