Bug 1684087 - master/etcd replicas should be protected or limited to be modified.
Summary: master/etcd replicas should be protected or limited to be modified.
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.1.0
Assignee: Abhinav Dahiya
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-02-28 11:37 UTC by Johnny Liu
Modified: 2019-04-10 10:10 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-03 17:48:54 UTC
Target Upstream Version:
Embargoed:



Description Johnny Liu 2019-02-28 11:37:35 UTC
Description of problem:
As far as I know, etcd runs co-located with the masters on the same machines. According to my understanding, an etcd cluster needs at least 3 members. If I am right, the installer should prevent the user from setting controlPlane.replicas below 3 or to an even number.
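
For illustration only, here is a minimal Go sketch of the kind of pre-flight check being requested; the function name and error messages are hypothetical and not taken from the openshift-install source:

// Hypothetical sketch of the requested validation, not actual installer code.
package main

import (
	"errors"
	"fmt"
)

// validateControlPlaneReplicas rejects replica counts below three or even counts.
func validateControlPlaneReplicas(replicas int) error {
	if replicas < 3 {
		return errors.New("controlPlane.replicas must be at least 3 for an HA etcd cluster")
	}
	if replicas%2 == 0 {
		return errors.New("controlPlane.replicas should be an odd number so etcd can keep quorum")
	}
	return nil
}

func main() {
	for _, n := range []int{1, 2, 3, 4} {
		if err := validateControlPlaneReplicas(n); err != nil {
			fmt.Printf("replicas=%d: %v\n", n, err)
		} else {
			fmt.Printf("replicas=%d: ok\n", n)
		}
	}
}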

Version-Release number of the following components:
v4.0.5-1-dirty

How reproducible:
Always

Steps to Reproduce:
1. Create install-config.yaml via the openshift-install tool.
2. Modify controlPlane.replicas to 1 in install-config.yaml (see the excerpt after these steps).
3. Trigger the install.
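
For reference, a trimmed install-config.yaml excerpt matching step 2; most fields are omitted and the metadata name is a placeholder:

apiVersion: v1
metadata:
  name: example-cluster        # placeholder name
controlPlane:
  name: master
  replicas: 1                  # changed from the default of 3 in step 2
compute:
- name: worker
  replicas: 3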

Actual results:
The installer gives no warning or error about the incorrect controlPlane.replicas number.
After the installation completes, oc commands fail because the apiserver is not ready, which in turn is caused by the etcd cluster not being ready.
# oc get node
The connection to the server api.qe-jialiu1.qe.devcluster.openshift.com:6443 was refused - did you specify the right host or port?


Expected results:
The installer should warn and abort the installation when controlPlane.replicas is modified to be less than 3 or not an odd number.

Additional info:
When controlPlane.replicas is modified to 2, the installation completes and the cluster runs well. But I do not think this is reasonable, because it does not give the etcd cluster any real fault tolerance.

Comment 1 Alex Crawford 2019-03-01 18:33:54 UTC

*** This bug has been marked as a duplicate of bug 1679772 ***

Comment 2 Johnny Liu 2019-03-07 10:52:03 UTC
I do not think this is a duplicate of bug 1679772.

This bug is about the user mis-configuring the master count in install-config.yaml during a fresh install, while bug 1679772 is about the user mistakenly running oc delete on a master, or deleting a master instance via the AWS API, as a day-2 operation.

In this bug, I am requesting that the installer validate the master count before triggering the install.

Comment 4 Abhinav Dahiya 2019-03-21 21:35:17 UTC
> If I am right, the installer should prevent the user from setting controlPlane.replicas below 3 or to an even number.
> When controlPlane.replicas is modified to 2, the installation completes and the cluster runs well. But I do not think this is reasonable, because it does not give the etcd cluster any real fault tolerance.

Actually, we require at least one master. Any configuration with >=1 master is a *valid* configuration.

For example, having 4 masters is not wrong; it is just that the 4th etcd member does not add to the high availability of the etcd cluster, i.e. it can still tolerate only one master being down.
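
To make that concrete, a small illustrative sketch (not from the etcd or installer source) of the usual quorum arithmetic, where quorum = n/2 + 1 (integer division) and tolerated failures = n - quorum:

// Illustrative quorum arithmetic for an etcd cluster of n members.
package main

import "fmt"

func main() {
	for _, n := range []int{1, 2, 3, 4, 5} {
		quorum := n/2 + 1      // majority needed to keep the cluster writable
		tolerated := n - quorum // members that can fail without losing quorum
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n", n, quorum, tolerated)
	}
}

For 3 members this prints a tolerance of 1 failure; for 4 members it still prints 1, which is why a 4th member does not improve availability.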




> The installer gives no warning or error about the incorrect controlPlane.replicas number.
> After the installation completes, oc commands fail because the apiserver is not ready, which in turn is caused by the etcd cluster not being ready.
> # oc get node
> The connection to the server api.qe-jialiu1.qe.devcluster.openshift.com:6443 was refused - did you specify the right host or port?

The default install-config for AWS gives you 3 control plane machines. If you purposefully choose 1 control plane machine, you, *the user*, have decided that HA is not a requirement, so the installer accepts the user's decision.

Comment 5 Johnny Liu 2019-03-22 03:10:51 UTC
(In reply to Abhinav Dahiya from comment #4)
> > If I am right, the installer should prevent the user from setting controlPlane.replicas below 3 or to an even number.
> > When controlPlane.replicas is modified to 2, the installation completes and the cluster runs well. But I do not think this is reasonable, because it does not give the etcd cluster any real fault tolerance.
> 
> Actually, we require at least one master. Any configuration with >=1 master is a *valid* configuration.
> 
> For example, having 4 masters is not wrong; it is just that the 4th etcd member does not add to the high availability of the etcd cluster, i.e. it can still tolerate only one master being down.
> 
> 
> 
> 
> > The installer gives no warning or error about the incorrect controlPlane.replicas number.
> > After the installation completes, oc commands fail because the apiserver is not ready, which in turn is caused by the etcd cluster not being ready.
> > # oc get node
> > The connection to the server api.qe-jialiu1.qe.devcluster.openshift.com:6443 was refused - did you specify the right host or port?
> 
> The default install-config for AWS gives you 3 control plane machines. If you purposefully choose 1 control plane machine, you, *the user*, have decided that HA is not a requirement, so the installer accepts the user's decision.

Just like your statement above, "Any configuration with >=1 master is a *valid* configuration": according to my test result, the cluster does not work at all. In other words, the user purposefully chose 1 control plane machine (which is a valid configuration), but the cluster does not work.

Comment 6 Abhinav Dahiya 2019-03-25 21:37:10 UTC
(In reply to Johnny Liu from comment #5)
> (In reply to Abhinav Dahiya from comment #4)
> > > If I am right, the installer should prevent the user from setting controlPlane.replicas below 3 or to an even number.
> > > When controlPlane.replicas is modified to 2, the installation completes and the cluster runs well. But I do not think this is reasonable, because it does not give the etcd cluster any real fault tolerance.
> > 
> > Actually, we require at least one master. Any configuration with >=1 master is a *valid* configuration.
> > 
> > For example, having 4 masters is not wrong; it is just that the 4th etcd member does not add to the high availability of the etcd cluster, i.e. it can still tolerate only one master being down.
> > 
> > 
> > 
> > 
> > > The installer gives no warning or error about the incorrect controlPlane.replicas number.
> > > After the installation completes, oc commands fail because the apiserver is not ready, which in turn is caused by the etcd cluster not being ready.
> > > # oc get node
> > > The connection to the server api.qe-jialiu1.qe.devcluster.openshift.com:6443 was refused - did you specify the right host or port?
> > 
> > The default install-config for AWS gives you 3 control plane machines. If you purposefully choose 1 control plane machine, you, *the user*, have decided that HA is not a requirement, so the installer accepts the user's decision.
> 
> Just like your statement above, "Any configuration with >=1 master is a *valid* configuration": according to my test result, the cluster does not work at all.

Can you provide details about what is not working? For example, all libvirt clusters are created with a single control plane host by default, and we have not seen bugs claiming the cluster does not work at all.

> In other words, the user purposefully chose 1 control plane machine (which is a valid configuration), but the cluster does not work.

Comment 7 Alex Crawford 2019-04-03 17:48:54 UTC
Closing due to inactivity. As far as we know, single node control planes work as intended.

Comment 8 Johnny Liu 2019-04-10 10:10:33 UTC
Just ran the same test using 4.0.0-0.nightly-2019-04-05-165550; a 1 master + 1 worker installation completed successfully.

