Description of problem:
As far as I know, etcd runs co-located with the masters on the same machines. According to my understanding, an etcd cluster needs at least 3 members. If that is right, the installer should protect the user from setting controlPlane.replicas to a value below 3, and the value must be an odd number.

Version-Release number of the following components:
v4.0.5-1-dirty

How reproducible:
Always

Steps to Reproduce:
1. Create install-config.yaml via the openshift-install tool.
2. Modify controlPlane.replicas to 1 in install-config.yaml.
3. Trigger the install.

Actual results:
The installer gives no warning or error about the incorrect controlPlane.replicas value. After the installation completes, the oc command fails because the apiserver is not ready, which in turn is caused by the etcd cluster not being ready.
# oc get node
The connection to the server api.qe-jialiu1.qe.devcluster.openshift.com:6443 was refused - did you specify the right host or port?

Expected results:
The installer should warn and exit the installation when controlPlane.replicas is modified to a value below 3 or to a number that is not odd.

Additional info:
When controlPlane.replicas is modified to 2, the installation completes and the cluster runs well. But I do not think this is reasonable, because it does not comply with etcd cluster disaster recoverability.
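For illustration only, here is a minimal sketch of the kind of pre-install check I am expecting. This is a hypothetical Go example written for this report, not code from the installer; the function name is made up and only the replicas value is assumed to have been read from install-config.yaml already.

package main

import (
	"errors"
	"fmt"
)

// validateControlPlaneReplicas is a hypothetical check (not the installer's
// actual code) illustrating the requested behaviour: reject fewer than 3
// control plane replicas, and reject even counts because an even member
// count adds no etcd fault tolerance.
func validateControlPlaneReplicas(replicas int64) error {
	if replicas < 3 {
		return errors.New("controlPlane.replicas must be at least 3 for an HA etcd cluster")
	}
	if replicas%2 == 0 {
		return errors.New("controlPlane.replicas should be an odd number")
	}
	return nil
}

func main() {
	// Print the result for a few example values, including the ones tested above.
	for _, n := range []int64{1, 2, 3, 4} {
		fmt.Printf("replicas=%d: %v\n", n, validateControlPlaneReplicas(n))
	}
}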
*** This bug has been marked as a duplicate of bug 1679772 ***
I do not think this is a duplicate of bug 1679772. This bug is about a user mis-configuring the master count in install-config.yaml for a fresh install, while bug 1679772 is about a user mistakenly running oc delete on a master, or deleting a master instance via the AWS API, as a day-2 operation. In this bug, I am requesting that the installer validate the master count before triggering the install.
> If that is right, the installer should protect the user from setting controlPlane.replicas to a value below 3, and the value must be an odd number.
> When controlPlane.replicas is modified to 2, the installation completes and the cluster runs well. But I do not think this is reasonable, because it does not comply with etcd cluster disaster recoverability.

Actually, we require at least one master. Any configuration with >=1 master is a *valid* configuration.

For example, having 4 masters is not wrong; it is just that the 4th etcd member does not add to the high availability of the etcd cluster, i.e. it can still tolerate only one master being down.

> The installer gives no warning or error about the incorrect controlPlane.replicas value.
> After the installation completes, the oc command fails because the apiserver is not ready, which in turn is caused by the etcd cluster not being ready.
> # oc get node
> The connection to the server api.qe-jialiu1.qe.devcluster.openshift.com:6443 was refused - did you specify the right host or port?

The default install-config for AWS gives you 3 control plane machines. If you purposefully choose 1 control plane machine, you *the user* have decided that HA is not a requirement, so the installer accepts the user's decision.
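To make the fault-tolerance point concrete, here is a quick illustrative calculation (my own sketch, not installer or etcd code): a cluster of n voting members needs a quorum of n/2+1 members, so it can tolerate n minus quorum failures. That is why 4 members tolerate no more failures than 3, and 1 or 2 members tolerate none.

package main

import "fmt"

// For an etcd cluster of n voting members, quorum = n/2 + 1 (integer
// division) and the number of member failures it can tolerate is n - quorum.
func main() {
	for n := 1; n <= 5; n++ {
		quorum := n/2 + 1
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n", n, quorum, n-quorum)
	}
}

Running it prints 0 tolerated failures for 1 or 2 members, 1 for 3 or 4 members, and 2 for 5 members.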
(In reply to Abhinav Dahiya from comment #4)
> > If that is right, the installer should protect the user from setting controlPlane.replicas to a value below 3, and the value must be an odd number.
> > When controlPlane.replicas is modified to 2, the installation completes and the cluster runs well. But I do not think this is reasonable, because it does not comply with etcd cluster disaster recoverability.
>
> Actually, we require at least one master. Any configuration with >=1 master
> is a *valid* configuration.
>
> For example, having 4 masters is not wrong; it is just that the 4th etcd
> member does not add to the high availability of the etcd cluster, i.e. it
> can still tolerate only one master being down.
>
> > The installer gives no warning or error about the incorrect controlPlane.replicas value.
> > After the installation completes, the oc command fails because the apiserver is not ready, which in turn is caused by the etcd cluster not being ready.
> > # oc get node
> > The connection to the server api.qe-jialiu1.qe.devcluster.openshift.com:6443 was refused - did you specify the right host or port?
>
> The default install-config for AWS gives you 3 control plane machines. If
> you purposefully choose 1 control plane machine, you *the user* have decided
> that HA is not a requirement, so the installer accepts the user's decision.

Going by your statement above - "Any configuration with >=1 master is a *valid* configuration" - according to my test result, the cluster does not work at all. In short, the user purposefully chose 1 control plane machine (which is a valid configuration), but the cluster does not work.
(In reply to Johnny Liu from comment #5)
> (In reply to Abhinav Dahiya from comment #4)
> > > If that is right, the installer should protect the user from setting controlPlane.replicas to a value below 3, and the value must be an odd number.
> > > When controlPlane.replicas is modified to 2, the installation completes and the cluster runs well. But I do not think this is reasonable, because it does not comply with etcd cluster disaster recoverability.
> >
> > Actually, we require at least one master. Any configuration with >=1 master
> > is a *valid* configuration.
> >
> > For example, having 4 masters is not wrong; it is just that the 4th etcd
> > member does not add to the high availability of the etcd cluster, i.e. it
> > can still tolerate only one master being down.
> >
> > > The installer gives no warning or error about the incorrect controlPlane.replicas value.
> > > After the installation completes, the oc command fails because the apiserver is not ready, which in turn is caused by the etcd cluster not being ready.
> > > # oc get node
> > > The connection to the server api.qe-jialiu1.qe.devcluster.openshift.com:6443 was refused - did you specify the right host or port?
> >
> > The default install-config for AWS gives you 3 control plane machines. If
> > you purposefully choose 1 control plane machine, you *the user* have decided
> > that HA is not a requirement, so the installer accepts the user's decision.
>
> Going by your statement above - "Any configuration with >=1 master is a
> *valid* configuration" - according to my test result, the cluster does not
> work at all.

Can you provide details about what is not working? For example, all libvirt clusters are created with a single control plane host by default, and we have not seen bugs claiming the cluster does not work at all.

> In short, the user purposefully chose 1 control plane machine
> (which is a valid configuration), but the cluster does not work.
Closing due to inactivity. As far as we know, single node control planes work as intended.
Just ran the same test using 4.0.0-0.nightly-2019-04-05-165550; the 1 master + 1 worker installation completed successfully.