Bug 1423001 - OpenStack Director updates to an even number of dedicated Ceph monitor nodes
Summary: OpenStack Director updates to an even number of dedicated Ceph monitor nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-validations
Version: 11.0 (Ocata)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 11.0 (Ocata)
Assignee: Giulio Fidente
QA Contact: Yogev Rabl
Docs Contact: Derek
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-02-16 19:59 UTC by Yogev Rabl
Modified: 2017-05-17 20:00 UTC (History)
CC: 12 users

Fixed In Version: openstack-tripleo-validations-5.4.0-6.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-17 20:00:07 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1245 normal SHIPPED_LIVE Red Hat OpenStack Platform 11.0 Bug Fix and Enhancement Advisory 2017-05-17 23:01:50 UTC
OpenStack gerrit 436194 None None None 2017-02-23 16:38:52 UTC

Description Yogev Rabl 2017-02-16 19:59:34 UTC
Description of problem:
OpenStack Director's templates were set to deploy the Ceph monitor as the single service on block-storage nodes. An update of the overcloud from 3 block-storage nodes passed, and the director created 4 monitors in quorum in the cluster.

Version-Release number of selected component (if applicable):
openstack-tripleo-validations-5.3.1-0.20170125194508.6b928f1.el7ost.noarch
openstack-tripleo-common-5.7.1-0.20170126235054.c75d3c6.el7ost.noarch
puppet-tripleo-6.1.0-0.20170127040716.d427c2a.el7ost.noarch
openstack-tripleo-puppet-elements-6.0.0-0.20170126053436.688584c.el7ost.noarch
openstack-tripleo-0.0.8-0.2.4de13b3git.el7ost.noarch
openstack-tripleo-heat-templates-6.0.0-0.20170127041112.ce54697.el7ost.1.noarch
openstack-tripleo-ui-2.0.1-0.20170126144317.f3bd97e.el7ost.noarch
python-tripleoclient-6.0.1-0.20170127055753.8ea289c.el7ost.noarch
openstack-tripleo-image-elements-6.0.0-0.20170126135810.00b9869.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. deploy an overcloud with a dedicated Ceph monitor node (as described in Bug 1232958)
2. add another node to the deployment command and update the overcloud

Actual results:
The update creates another Ceph monitor and adds it to the quorum.

Expected results:
OSP Director won't run the update and will show a message: can't deploy an even number of Ceph monitor services.
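A hedged sketch of the pre-flight check this report asks for. The function name and message are hypothetical; the director did not implement such a check, as the comments below discuss:

```python
def validate_mon_count(planned_mons: int) -> None:
    """Illustrative only: refuse a deployment that would leave an even
    number of Ceph monitors in the cluster."""
    if planned_mons % 2 == 0:
        raise ValueError(
            "can't deploy an even number (%d) of Ceph monitor services"
            % planned_mons)
```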


Additional info:

Comment 2 Alex Schultz 2017-02-16 22:18:37 UTC
We don't support this kind of validation for roles or node counts, so this would need to be an RFE.

Comment 3 Giulio Fidente 2017-02-17 08:44:59 UTC
Yogev, from the 'ceph status' output you sent me, we appear to have completed the update successfully, producing a 4-node cluster in a healthy state. So I agree it would be better to use an uneven number of monitors, but Ceph itself doesn't prevent you from using an even number, so I am not sure Director should.

Comment 4 Yogev Rabl 2017-02-20 15:28:05 UTC
In addition to the description:
A fresh deployment of two dedicated Ceph monitor nodes ended successfully with both of them in quorum. 
The templates were set to use the block-storage node as a dedicated Ceph monitor node. The Overcloud topology was:
 - 3 Control nodes
 - 2 Block storage nodes
 - 3 Ceph storage nodes (each with 10 OSDs) 
 - 2 Compute nodes
The deployment started without any warning or other indication that there would be an even number of Ceph monitors in the cluster.

Comment 5 Giulio Fidente 2017-02-20 21:26:19 UTC
I am adding a warning message to the post-deployment validations, printed if the cluster is in HEALTH_WARN state.

If Ceph returns HEALTH_OK with two or any other even number of ceph-mon instances, I don't think we should stop deployment of an even number of nodes in TripleO.
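A minimal sketch of what such a post-deployment check could look like, assuming a `ceph health` command is reachable from the validation host. The function name and warning text are illustrative, not the actual openstack-tripleo-validations code:

```python
import subprocess

def check_ceph_health(run=subprocess.run):
    """Illustrative sketch: return a warning string unless `ceph health`
    reports HEALTH_OK.  `run` is injectable so the check can be tested
    without a live cluster."""
    result = run(["ceph", "health"], capture_output=True, text=True)
    status = result.stdout.strip()
    if status.startswith("HEALTH_OK"):
        return None  # healthy cluster: no warning printed
    return "WARNING: Ceph cluster reports '%s'" % (status or "unknown health")
```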

Comment 7 Kyle Bader 2017-03-27 17:09:34 UTC
The problem isn't so much even or odd; it's that at least three monitors are required for HA. A transitional state of 4 monitors is not problematic. The problem arises during leader election (Paxos). There are situations where you would want an even number:

* Scaling from 3 monitors to an eventual 5
* Provisioning a 4th monitor with the intention of retiring an old monitor
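The Paxos majority rule behind this point can be shown with a short sketch (the function name is mine, not from any Ceph tooling): a cluster of n monitors needs floor(n/2) + 1 members in quorum, so 4 monitors tolerate no more failures than 3.

```python
def mon_fault_tolerance(n: int) -> int:
    """Monitor failures a cluster of n mons survives while keeping quorum."""
    quorum = n // 2 + 1  # Paxos needs a strict majority for leader election
    return n - quorum

for n in (3, 4, 5):
    print(f"{n} mons -> tolerates {mon_fault_tolerance(n)} failure(s)")
# 4 mons tolerate only 1 failure, the same as 3, so the extra (even)
# monitor adds load without adding availability.
```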

When an operator goes to deploy an HA OSP control plane, is this something that is enforced programmatically?

For example, if HA OSP requires 3 controller nodes, and the templates contain 2, do we block installation?

If we do block installation, then we should have a way of enforcing similar requirements for other components (e.g. Ceph).

Comment 8 Giulio Fidente 2017-03-27 19:19:37 UTC
hi Kyle, thanks for commenting.

Currently, OSPd does not enforce (or block) the deployment of a specific number of MONs, OSDs or even MDSs. Instead, this BZ added a post-install validation task that prints a warning message if, at the end of the deployment, the Ceph cluster is not in HEALTH_OK.

My goal is to make OSPd more verbose when Ceph is in a warning state; given that deploying an even number of monitors doesn't even trigger a warning in Ceph itself, I don't think OSPd should prevent it.

Comment 12 Yogev Rabl 2017-04-19 02:26:45 UTC
verified on openstack-tripleo-validations-5.4.0-7.el7ost.noarch

Comment 13 errata-xmlrpc 2017-05-17 20:00:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245

