Description of problem:
https://bugzilla.redhat.com/show_bug.cgi?id=1468579

The above BZ backported changes to OCP that require a ClusterID to be present on AWS resources. New clusters created with the 3.7 installer should have their cluster resources tagged, but existing clusters that are upgraded will need to add the following config stanza if the cloud provider is AWS:

kubernetesMasterConfig:
  controllerArguments:
    allow-untagged-cloud:
    - "true"

Version-Release number of the following components:

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:
Without the above configuration the atomic-openshift-master-controller process exits with a missing ClusterID message in the logs.

Expected results:
The atomic-openshift-master-controller process of upgraded clusters should run even without a ClusterID on cluster resources.
Rather than trying to compensate for this in master-config.yaml (where user-set options and auto-set options will collide), if we know that we're going to need this information and we know that we'll never be able to auto-create it, we should block the upgrade until the information is present. Compensating in the config is just going to lead to confusion later about whether we can or can't automatically remove it after doing the same upgrade block in 3.8.
*** Bug 1498934 has been marked as a duplicate of this bug. ***
*** Bug 1498643 has been marked as a duplicate of this bug. ***
Ok, I think ansible can gather the tags from the metadata API. I think what we'll do is query that API for all node and master hosts. If a node or master host has the AWS cloud provider configured and doesn't have a tag named "kubernetes.io/cluster/xxxx", we'll block the upgrade and install on 3.7 with a message that links to documentation. We need to get that documentation ready; it should explain both how to properly label new installations and how to retroactively label existing installations.
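For illustration, a minimal Ansible sketch of that kind of check (this is not the actual openshift-ansible implementation; module names are from the amazon.aws collection, and it assumes boto3 plus AWS credentials are available wherever the lookup runs):

  - name: Gather instance metadata (instance id and availability zone)
    amazon.aws.ec2_metadata_facts:

  - name: Look up the EC2 tags on this instance
    amazon.aws.ec2_instance_info:
      instance_ids:
        - "{{ ansible_ec2_instance_id }}"
      # derive the region by dropping the AZ letter, e.g. us-east-1a -> us-east-1
      region: "{{ ansible_ec2_placement_availability_zone[:-1] }}"
    register: instance_info

  - name: Block the upgrade when the cluster tag is missing
    fail:
      msg: >
        No kubernetes.io/cluster/xxxx tag found on this instance; see the
        cluster labeling documentation before retrying the install/upgrade.
    when: >
      instance_info.instances[0].tags.keys()
        | select('match', '^kubernetes\.io/cluster/')
        | list | length == 0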
@sdodson - All instances, volumes, security groups, and load balancers. The tag ops was asked to set (and did, for all but free-int & free-stg) was "KubernetesCluster". So the check should allow for that.
Agree we should allow KubernetesCluster - not the preferred tag, but supported by kube/OpenShift
(In reply to Justin Pierce from comment #5)
> @sdodson - All instances, volumes, and security & load balancers. The tag
> ops was asked to set (and did for all but free-int & free-stg) was
> "KubernetesCluster" . So the check should allow for that.

I don't know that we'll be able to do that very well in traditional BYO inventory scenarios.

Rob, which is the preferred tag? This comment[1] from Hemant indicates that for an OpenShift cluster it should be:

kubernetes.io/cluster/xxxx=SomeUniqueClusterId

1 - https://trello.com/c/PWwHHUc0/154-retrofit-existing-clusters-with-the-tags-needed-for-the-the-provisioner#comment-595cee234251514e70b52a32
Both

KubernetesCluster
kubernetes.io/cluster/xxxx

are valid. The first one is the "old" method, the second is the "new" method. I haven't heard any indication that the old method will be unacceptable anytime soon. For consistency, I would go with whichever method the Ops team used when they tagged their clusters.
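For the retroactive labeling case mentioned earlier, a hedged sketch of tagging existing instances with Ansible's ec2_tag module; aws_region, openshift_clusterid, and cluster_instance_ids are placeholder variables for this example, and the same task works with Key=KubernetesCluster if that is what Ops standardized on:

  - name: Retroactively tag existing cluster resources (new-style key shown)
    amazon.aws.ec2_tag:
      region: "{{ aws_region }}"
      resource: "{{ item }}"
      state: present
      tags:
        # Key=kubernetes.io/cluster/xxxx, Value=clusterid; the pre-3.6
        # equivalent is Key=KubernetesCluster, Value=clusterid
        "kubernetes.io/cluster/{{ openshift_clusterid }}": "{{ openshift_clusterid }}"
    # ec2_tag takes any EC2 resource id, so the same loop can also cover
    # volumes and security groups (ELBs are tagged through a different API)
    loop: "{{ cluster_instance_ids }}"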
All we're going to do for this bug is: when upgrading from 3.6 to 3.7 and AWS cloud provider credentials have been specified, we'll block the upgrade with a link to documentation explaining how to properly label instances and how to set the inventory variable that specifies the cluster id. Once a cluster id variable is set, the upgrade is unblocked under the assumption that the admin followed the documentation correctly.
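For reference, a minimal sketch of the relevant BYO inventory excerpt; the cloud provider variable names below are the standard openshift-ansible ones as I understand them for this release, and the values are placeholders:

  [OSEv3:vars]
  # cluster id matching the kubernetes.io/cluster/<clusterid> (or legacy
  # KubernetesCluster) tag on the cluster's AWS resources
  openshift_clusterid=mycluster

  # AWS cloud provider configuration
  openshift_cloudprovider_kind=aws
  openshift_cloudprovider_aws_access_key=<access key>
  openshift_cloudprovider_aws_secret_key=<secret key>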
According to https://bugzilla.redhat.com/show_bug.cgi?id=1372059#c11, we also need the cluster id check for 3.7 fresh install.
It appears that the current check implemented in v3.7.0-0.176.0 requires the "kubernetes.io/cluster" variation. We need this to be expanded to support KubernetesCluster, which was applied to all operations cluster resources.

Excerpt:

fatal: [54.162.175.222]: FAILED! => {
    "changed": false,
    "failed": true,
    "msg": "Ensure that the openshift_clusterid is set and that all infrastructure has the required tags.\nFor dynamic provisioning when using multiple clusters in different zones, tag each node with Key=kubernetes.io/cluster/xxxx,Value=clusterid where xxxx and clusterid are unique per cluster. In versions prior to 3.6, this was Key=KubernetesCluster,Value=clusterid.\nhttps://github.com/openshift/openshift-docs/blob/master/install_config/persistent_storage/dynamically_provisioning_pvs.adoc#available-dynamically-provisioned-plug-ins\n"
}

https://buildvm.openshift.eng.bos.redhat.com:8443/job/operations/job/deployment/job/starter/job/starter%252Fupgrade/40/
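Illustratively, the blocking condition from the earlier sketch could be widened to accept either key along these lines (again, not the actual openshift_sanitize_inventory code):

  - name: Block when neither tag style is present
    fail:
      msg: >
        Ensure the instance carries either a kubernetes.io/cluster/<clusterid>
        tag or the legacy KubernetesCluster tag.
    when: >
      instance_info.instances[0].tags.keys()
        | select('match', '^kubernetes\.io/cluster/')
        | list | length == 0
      and 'KubernetesCluster' not in instance_info.instances[0].tags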
*** Bug 1505464 has been marked as a duplicate of this bug. ***
Verified this bug with openshift-ansible-3.7.0-0.176.0.git.0.eec12b8.el7.noarch, and PASS.

1. rpm install on aws + no cloudprovider enabled + no openshift_clusterid, PASS.

2. rpm install on aws + cloudprovider enabled + no openshift_clusterid, FAIL (the install is blocked as expected):

TASK [openshift_sanitize_inventory : Ensure clusterid is set along with the cloudprovider] ***
Wednesday 25 October 2017 08:06:10 +0000 (0:00:00.030) 0:00:06.352 *****
fatal: [ec2-107-23-245-159.compute-1.amazonaws.com]: FAILED! => {"changed": false, "failed": true, "msg": "Ensure that the openshift_clusterid is set and that all infrastructure has the required tags.\nFor dynamic provisioning when using multiple clusters in different zones, tag each node with Key=kubernetes.io/cluster/xxxx,Value=clusterid where xxxx and clusterid are unique per cluster. In versions prior to 3.6, this was Key=KubernetesCluster,Value=clusterid.\nhttps://github.com/openshift/openshift-docs/blob/master/install_config/persistent_storage/dynamically_provisioning_pvs.adoc#available-dynamically-provisioned-plug-ins\n"}

3. rpm install on aws + cloudprovider enabled + openshift_clusterid, PASS.
*** Bug 1510878 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188