Description of problem:
https://bugzilla.redhat.com/show_bug.cgi?id=1468579

The above BZ backported changes to OCP that require a ClusterID to be present on AWS resources. New clusters created with the 3.7 installer should have their cluster resources tagged, but existing clusters that are upgraded will need to add the following config stanza if the cloud provider is AWS:

kubernetesMasterConfig:
  controllerArguments:
    allow-untagged-cloud:
    - "true"

Version-Release number of the following components:

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:
Without the above configuration the atomic-openshift-master-controller process exits with a missing ClusterID message in the logs.

Expected results:
The atomic-openshift-master-controller process of upgraded clusters should run even without a ClusterID on cluster resources.
Rather than trying to compensate for this in master-config.yaml (where user-set options and auto-set options will collide), if we know that we're going to need this information and we know that we'll never be able to auto-create it, we should block the upgrade until the information is present. Compensating in the config is just going to lead to confusion later about whether we can or can't automatically remove it after doing the same upgrade block in 3.8.
*** Bug 1498934 has been marked as a duplicate of this bug. ***
*** Bug 1498643 has been marked as a duplicate of this bug. ***
Ok, I think ansible can gather the tags from the metadata API. I think what we'll do is query that API for all node and master hosts. If a node or master host has the AWS cloud provider configured and doesn't have a tag named "kubernetes.io/cluster/xxxx", we'll block the upgrade and install on 3.7 with a message that links to documentation. We need to get that documentation ready; it should explain both how to properly label new installations and how to retroactively label existing installations.
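For illustration, a minimal Ansible sketch of that kind of check (this is not the actual openshift-ansible implementation; module names are from the amazon.aws collection, and it assumes boto3 plus AWS credentials are available wherever the lookup runs):

  - name: Gather instance metadata (instance id and availability zone)
    amazon.aws.ec2_metadata_facts:

  - name: Look up the EC2 tags on this instance
    amazon.aws.ec2_instance_info:
      instance_ids:
        - "{{ ansible_ec2_instance_id }}"
      # derive the region by dropping the AZ letter, e.g. us-east-1a -> us-east-1
      region: "{{ ansible_ec2_placement_availability_zone[:-1] }}"
    register: instance_info

  - name: Block the upgrade when the cluster tag is missing
    fail:
      msg: >
        No kubernetes.io/cluster/xxxx tag found on this instance; see the
        cluster labeling documentation before retrying the install/upgrade.
    when: >
      instance_info.instances[0].tags.keys()
        | select('match', '^kubernetes\.io/cluster/')
        | list | length == 0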
@sdodson - All instances, volumes, security groups, and load balancers. The tag ops was asked to set (and did, for all but free-int & free-stg) was "KubernetesCluster". So the check should allow for that.
Agree we should allow KubernetesCluster - not the preferred tag, but supported by kube/OpenShift
(In reply to Justin Pierce from comment #5)
> @sdodson - All instances, volumes, and security & load balancers. The tag
> ops was asked to set (and did for all but free-int & free-stg) was
> "KubernetesCluster" . So the check should allow for that.

I don't know that we'll be able to do that very well in traditional BYO inventory scenarios.

Rob, which is the preferred tag? This comment[1] from Hemant indicates that for an OpenShift cluster it should be:

kubernetes.io/cluster/xxxx=SomeUniqueClusterId

1 - https://trello.com/c/PWwHHUc0/154-retrofit-existing-clusters-with-the-tags-needed-for-the-the-provisioner#comment-595cee234251514e70b52a32
Both

KubernetesCluster
kubernetes.io/cluster/xxxx

are valid. The first one is the "old" method, the second is the "new" method. I haven't heard any indication that the old method will be unacceptable anytime soon. For consistency, I would go with whichever method the Ops team used when they tagged their clusters.
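For the retroactive labeling case mentioned earlier, a hedged sketch of tagging existing instances with Ansible's ec2_tag module; aws_region, openshift_clusterid, and cluster_instance_ids are placeholder variables for this example, and the same task works with Key=KubernetesCluster if that is what Ops standardized on:

  - name: Retroactively tag existing cluster resources (new-style key shown)
    amazon.aws.ec2_tag:
      region: "{{ aws_region }}"
      resource: "{{ item }}"
      state: present
      tags:
        # Key=kubernetes.io/cluster/xxxx, Value=clusterid; the pre-3.6
        # equivalent is Key=KubernetesCluster, Value=clusterid
        "kubernetes.io/cluster/{{ openshift_clusterid }}": "{{ openshift_clusterid }}"
    # ec2_tag takes any EC2 resource id, so the same loop can also cover
    # volumes and security groups (ELBs are tagged through a different API)
    loop: "{{ cluster_instance_ids }}"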
All we're going to do for this bug is: when upgrading from 3.6 to 3.7 and AWS cloud provider credentials have been specified, we'll block the upgrade with a link to documentation explaining how to properly label instances and how to set the inventory variable that specifies the cluster id. Once a cluster id variable is set, the upgrade is unblocked under the assumption that the admin followed the documentation correctly.
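For reference, a minimal sketch of the relevant BYO inventory excerpt; the cloud provider variable names below are the standard openshift-ansible ones as I understand them for this release, and the values are placeholders:

  [OSEv3:vars]
  # cluster id matching the kubernetes.io/cluster/<clusterid> (or legacy
  # KubernetesCluster) tag on the cluster's AWS resources
  openshift_clusterid=mycluster

  # AWS cloud provider configuration
  openshift_cloudprovider_kind=aws
  openshift_cloudprovider_aws_access_key=<access key>
  openshift_cloudprovider_aws_secret_key=<secret key>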
According to https://bugzilla.redhat.com/show_bug.cgi?id=1372059#c11, we also need the cluster id check for 3.7 fresh install.
It appears that the current check implemented in v3.7.0-0.176.0 requires the "kubernetes.io/cluster" variation. We need this to be expanded to support KubernetesCluster, which was applied to all operations cluster resources.

Excerpt:

fatal: [54.162.175.222]: FAILED! => {
    "changed": false,
    "failed": true,
    "msg": "Ensure that the openshift_clusterid is set and that all infrastructure has the required tags.\nFor dynamic provisioning when using multiple clusters in different zones, tag each node with Key=kubernetes.io/cluster/xxxx,Value=clusterid where xxxx and clusterid are unique per cluster. In versions prior to 3.6, this was Key=KubernetesCluster,Value=clusterid.\nhttps://github.com/openshift/openshift-docs/blob/master/install_config/persistent_storage/dynamically_provisioning_pvs.adoc#available-dynamically-provisioned-plug-ins\n"
}

https://buildvm.openshift.eng.bos.redhat.com:8443/job/operations/job/deployment/job/starter/job/starter%252Fupgrade/40/
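Illustratively, the blocking condition from the earlier sketch could be widened to accept either key along these lines (again, not the actual openshift_sanitize_inventory code):

  - name: Block when neither tag style is present
    fail:
      msg: >
        Ensure the instance carries either a kubernetes.io/cluster/<clusterid>
        tag or the legacy KubernetesCluster tag.
    when: >
      instance_info.instances[0].tags.keys()
        | select('match', '^kubernetes\.io/cluster/')
        | list | length == 0
      and 'KubernetesCluster' not in instance_info.instances[0].tags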
*** Bug 1505464 has been marked as a duplicate of this bug. ***
Verified this bug with openshift-ansible-3.7.0-0.176.0.git.0.eec12b8.el7.noarch, and PASS.

1. rpm install on aws + no cloudprovider enabled + no openshift_clusterid, PASS.

2. rpm install on aws + cloudprovider enabled + no openshift_clusterid, FAIL (the install is blocked as expected):

TASK [openshift_sanitize_inventory : Ensure clusterid is set along with the cloudprovider] ***
Wednesday 25 October 2017 08:06:10 +0000 (0:00:00.030) 0:00:06.352 *****
fatal: [ec2-107-23-245-159.compute-1.amazonaws.com]: FAILED! => {"changed": false, "failed": true, "msg": "Ensure that the openshift_clusterid is set and that all infrastructure has the required tags.\nFor dynamic provisioning when using multiple clusters in different zones, tag each node with Key=kubernetes.io/cluster/xxxx,Value=clusterid where xxxx and clusterid are unique per cluster. In versions prior to 3.6, this was Key=KubernetesCluster,Value=clusterid.\nhttps://github.com/openshift/openshift-docs/blob/master/install_config/persistent_storage/dynamically_provisioning_pvs.adoc#available-dynamically-provisioned-plug-ins\n"}

3. rpm install on aws + cloudprovider enabled + openshift_clusterid, PASS.
*** Bug 1510878 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188