Bug 1423001 - Openstack Director updates to an even number of dedicated Ceph monitor nodes
Summary: Openstack Director updates to an even number of dedicated Ceph monitor nodes
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-validations
Version: 11.0 (Ocata)
Hardware: x86_64
OS: Linux
Target Milestone: beta
Target Release: 11.0 (Ocata)
Assignee: Giulio Fidente
QA Contact: Yogev Rabl
Depends On:
Reported: 2017-02-16 19:59 UTC by Yogev Rabl
Modified: 2017-05-17 20:00 UTC
CC List: 12 users

Fixed In Version: openstack-tripleo-validations-5.4.0-6.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2017-05-17 20:00:07 UTC
Target Upstream Version:


System ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit 436194 | 0 | None | None | None | 2017-02-23 16:38:52 UTC
Red Hat Product Errata RHEA-2017:1245 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 11.0 Bug Fix and Enhancement Advisory | 2017-05-17 23:01:50 UTC

Description Yogev Rabl 2017-02-16 19:59:34 UTC
Description of problem:
Openstack Director's templates were set to deploy the Ceph monitor as the only service on block-storage nodes. An overcloud update that grew the cluster from 3 block-storage nodes passed, and the director created 4 monitors in quorum in the cluster.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Deploy an overcloud with a dedicated Ceph monitor node (as described in Bug 1232958).
2. Add another node to the deployment command and update the overcloud.

Actual results:
The update creates another Ceph monitor and adds it to the quorum.

Expected results:
OSP director won't run the update and instead shows a message: cannot deploy an even number of Ceph monitor services.
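A pre-deployment check of the kind requested here could be sketched as follows. This is a hypothetical helper, not the actual openstack-tripleo-validations code; the function name and messages are illustrative only:

```python
def validate_mon_count(mon_count):
    """Return a list of warnings for a requested Ceph monitor count.

    Hypothetical sketch of the validation requested in this bug; the
    real openstack-tripleo-validations checks differ.
    """
    warnings = []
    if mon_count < 3:
        warnings.append(
            "%d monitor(s) cannot provide HA; at least 3 are required."
            % mon_count)
    elif mon_count % 2 == 0:
        warnings.append(
            "An even number of monitors (%d) tolerates no more failures "
            "than %d would; use an odd count." % (mon_count, mon_count - 1))
    return warnings
```

Run before the stack update, such a check would flag the 3-to-4 monitor transition described in this report while still allowing odd counts through.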

Additional info:

Comment 2 Alex Schultz 2017-02-16 22:18:37 UTC
We don't support this kind of validation for roles or node counts, so this would need to be an RFE.

Comment 3 Giulio Fidente 2017-02-17 08:44:59 UTC
Yogev, from the 'ceph status' output you sent me, we appear to have completed the update successfully, producing a 4-node cluster in a healthy state. So I agree it would be better to use an uneven number of monitors, but Ceph itself doesn't prevent you from using an even number, so I am not sure director should.

Comment 4 Yogev Rabl 2017-02-20 15:28:05 UTC
In addition to the description:
A fresh deployment with two dedicated Ceph monitor nodes ended successfully, with both of them in quorum.
The templates were set to use the block-storage node as a dedicated Ceph monitor node. The Overcloud topology was:
 - 3 Control nodes
 - 2 Block storage nodes
 - 3 Ceph storage nodes (each with 10 OSDs) 
 - 2 Compute nodes
The deployment started without any warning or other indication that there would be an even number of Ceph monitors in the cluster.

Comment 5 Giulio Fidente 2017-02-20 21:26:19 UTC
I am adding a warning message in the post-deployment validations printed if the cluster is in HEALTH_WARN state.

If Ceph returns HEALTH_OK with two or any other even number of ceph-mon instances, I don't think we should stop deployment of an even number of nodes in TripleO.
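The post-deployment warning described above could be implemented along these lines. This is a minimal sketch, not the actual validation shipped in openstack-tripleo-validations; it assumes the input is the output of `ceph status --format json`, whose health key layout changed across Ceph releases:

```python
import json


def ceph_health_warning(status_json):
    """Return a warning string if the cluster is not HEALTH_OK, else None.

    status_json is assumed to be the output of `ceph status --format json`;
    both known locations of the health state are checked, since the layout
    moved between Ceph releases.
    """
    status = json.loads(status_json)
    health = status.get("health", {})
    state = health.get("status") or health.get("overall_status")
    if state != "HEALTH_OK":
        return "Ceph cluster reports %s; inspect 'ceph status' output." % state
    return None
```

A post-install task would feed the live `ceph status` output into such a function and print the returned message, which matches the warning-only behaviour chosen here.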

Comment 7 Kyle Bader 2017-03-27 17:09:34 UTC
The problem isn't so much even or odd; it's that three monitors are required for HA. A transitional state of 4 monitors is not problematic. The problem arises during leader election (Paxos). There are situations where you would want to have an even number:

* Scaling from 3 monitors to an eventual 5
* Provisioning a 4th monitor with the intention of retiring an old monitor
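The quorum arithmetic behind this point: a Paxos majority needs floor(n/2)+1 monitors, so the number of failures tolerated is n minus that majority, which comes out the same for 3 and 4 monitors. A small illustration (the helper name is made up for this sketch):

```python
def mon_fault_tolerance(n):
    """Monitors that can fail while a majority (Paxos) quorum survives."""
    majority = n // 2 + 1
    return n - majority

# 3 and 4 monitors both tolerate a single failure: the 4th monitor adds
# quorum overhead without adding resilience, which is why odd counts are
# recommended outside the transitional cases listed above.
```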

When an operator goes to deploy an HA OSP control plane, is this something that is enforced programmatically?

For example, if HA OSP requires 3 controller nodes, and the templates contain 2, do we block installation?

If we do block installation, then we should have a way of enforcing similar requirements for other components (e.g., Ceph).

Comment 8 Giulio Fidente 2017-03-27 19:19:37 UTC
Hi Kyle, thanks for commenting.

Currently OSPd does not enforce (or block) the deployment of a specific number of MONs, OSDs, or even MDSs. Instead, this BZ adds a post-install validation task that prints a warning message if the Ceph cluster is not in HEALTH_OK at the end of the deployment.

My goal is to make OSPd more verbose when Ceph is in a warning state; given that deploying an even number of monitors doesn't even trigger a warning in Ceph, I don't think OSPd should prevent it.

Comment 12 Yogev Rabl 2017-04-19 02:26:45 UTC
verified on openstack-tripleo-validations-5.4.0-7.el7ost.noarch

Comment 13 errata-xmlrpc 2017-05-17 20:00:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

