Bug 1468579 - Missing Kubernetes Cluster ID tag from openshift cluster resources
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 3.6.0
Hardware: All
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.7.0
Assigned To: Robert Rati
QA Contact: DeShuai Ma
: OpsBlocker
Depends On:
Blocks:
Reported: 2017-07-07 08:42 EDT by Hemant Kumar
Modified: 2017-11-28 17:00 EST (History)
14 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Running multiple clusters in a single availability zone (AZ) in AWS requires that resources be tagged.
Consequence: Untagged clusters will not work properly.
Fix: The master controllers process now requires a ClusterID on resources in order to run. Existing resources need to be tagged manually.
Result: Multiple clusters in one AZ will work properly once tagged.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-28 17:00:15 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Hemant Kumar 2017-07-07 08:42:51 EDT
Description of problem:

All OpenShift cluster resources on AWS should carry some form of Kubernetes cluster ID. Unfortunately, this requirement is not documented in detail either upstream or in OpenShift.

The basic expectation of the AWS cloud provider is that each node has a tag with the key "KubernetesCluster" and a value that identifies the cluster. In addition, the same cluster ID must be configured in the cloud provider config file (aws.conf):

[Global]
KubernetesClusterTag=SomethingUnique

Once the above expectation is met, all AWS resources (such as volumes, security groups, and load balancers) created on demand by OpenShift automatically get the same KubernetesCluster tag.
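For nodes that are missing the tag, it could also be applied by hand. A minimal sketch using the aws CLI; the instance ID and cluster ID below are placeholders, not values from this bug:

```shell
# Sketch only: tag an instance so the AWS cloud provider can associate
# it with this cluster. Substitute real instance IDs and a real
# cluster ID; "SomethingUnique" mirrors the aws.conf example above.
CLUSTER_ID="SomethingUnique"
TAG="Key=KubernetesCluster,Value=${CLUSTER_ID}"

aws ec2 create-tags --resources i-0123456789abcdef0 --tags "$TAG"
```

The tag value must match the KubernetesClusterTag configured in aws.conf, otherwise the cloud provider will not recognize the node as part of the cluster.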

Untagged clusters can end up with broken storage and load balancer features if more than one cluster is present in the same region, so tagging the cluster using the aforementioned mechanism is strongly advised. There is a discussion upstream about making this mandatory and panicking on startup if KubernetesClusterTag is not configured.


Currently, clusters deployed using openshift-ansible-ops are untagged. In a few cases we even have more than one cluster running in the same region, and occasionally we see PV creation bugs as a result.

This bug has two parts:

1. Any new cluster created using openshift-ansible-ops should have the tag. We need to update the Ansible recipe to do that.

2. For existing clusters, or clusters going through an upgrade, we need to retroactively tag the nodes and update aws.conf. The problem with retroactively tagging nodes is that we are still left with AWS resources created on demand (such as volumes, load balancers, and security groups) which remain untagged. We need to find all AWS resources created by OpenShift and tag them as well:

- EBS volumes should be tagged. This is relatively easy: we can get all PVs and tag the backing EBS volumes with an Ansible script.
- Load balancers should be tagged.
- Security Groups should be tagged.
- ???
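The EBS item in the list above could be scripted roughly as follows. This is only a sketch, assuming oc, jq, and the aws CLI are configured, and CLUSTER_ID is a placeholder for the real cluster ID:

```shell
#!/bin/sh
# Sketch: tag every EBS volume that backs a PV with the cluster ID.
# CLUSTER_ID is hypothetical; substitute the value used in aws.conf.
CLUSTER_ID="SomethingUnique"

# PVs expose the volume as spec.awsElasticBlockStore.volumeID in the
# form "aws://<zone>/vol-...", so strip everything up to the last "/".
for vol in $(oc get pv -o json \
    | jq -r '.items[].spec.awsElasticBlockStore.volumeID // empty' \
    | sed 's|.*/||'); do
  aws ec2 create-tags \
      --resources "$vol" \
      --tags Key=KubernetesCluster,Value="$CLUSTER_ID"
done
```

Load balancers and security groups would need an analogous pass with their own describe/tag calls.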

I am not very familiar with the networking area, and I need some help figuring out how to find all networking AWS resources that need to be tagged.
Comment 5 Dan Yocum 2017-07-11 15:59:05 EDT
See also: https://bugzilla.redhat.com/show_bug.cgi?id=1365398
Comment 11 DeShuai Ma 2017-09-28 04:58:55 EDT
This bug needs to be verified on AWS with the 128 build. The latest version in the mirror repo is still 127, so we need to wait for the mirror repo to sync.
Comment 12 DeShuai Ma 2017-10-10 05:17:27 EDT
Verified on openshift v3.7.0-0.146.0

[root@ip-172-18-8-9 ~]# openshift version
openshift v3.7.0-0.146.0
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.1
Steps to verify:
1. Setup cluster env in aws and enable cloud-provider

2. Remove the "KubernetesCluster" tag from the instance, then restart atomic-openshift-master-controllers:
# systemctl restart atomic-openshift-master-controllers.service 
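The tag removal in step 2 can be done with the aws CLI; the instance ID here is a placeholder for the master instance:

```shell
# Remove only the KubernetesCluster tag from the test instance
# (placeholder instance ID), leaving its other tags intact.
aws ec2 delete-tags \
    --resources i-0123456789abcdef0 \
    --tags Key=KubernetesCluster
```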

3. Check atomic-openshift-master-controllers logs
Oct 10 05:07:30 ip-172-18-8-9.ec2.internal atomic-openshift-master-controllers[69790]: E1010 05:07:30.537355   69790 tags.go:94] Tag "KubernetesCluster" nor "kubernetes.io/cluster/..." not found; Kubernetes may behave unexpectedly.
Oct 10 05:07:30 ip-172-18-8-9.ec2.internal atomic-openshift-master-controllers[69790]: W1010 05:07:30.537373   69790 tags.go:78] AWS cloud - no clusterID filtering applied for shared resources; do not run multiple clusters in this AZ.
Oct 10 05:07:30 ip-172-18-8-9.ec2.internal atomic-openshift-master-controllers[69790]: F1010 05:07:30.537423   69790 controllermanager.go:179] error building controller context: no ClusterID Found.  A ClusterID is required for the cloud 
Oct 10 05:07:30 ip-172-18-8-9.ec2.internal systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=255/n/a
Oct 10 05:07:30 ip-172-18-8-9.ec2.internal systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.
Oct 10 05:07:30 ip-172-18-8-9.ec2.internal systemd[1]: atomic-openshift-master-controllers.service failed.

4. Set allow-untagged-cloud=true in /etc/origin/master/master-config.yaml:
kubernetesMasterConfig:
  controllerArguments:
    allow-untagged-cloud:
    - "true"

5. Check the controller log again: there are warnings, but the controller starts successfully.
Oct 10 05:12:30 ip-172-18-8-9 atomic-openshift-master-controllers: E1010 05:12:30.541450   70438 tags.go:94] Tag "KubernetesCluster" nor "kubernetes.io/cluster/..." not found; Kubernetes may behave unexpectedly.
Oct 10 05:12:30 ip-172-18-8-9 atomic-openshift-master-controllers: W1010 05:12:30.541471   70438 tags.go:78] AWS cloud - no clusterID filtering applied for shared resources; do not run multiple clusters in this AZ.
Oct 10 05:12:30 ip-172-18-8-9 atomic-openshift-master-controllers: W1010 05:12:30.541522   70438 controllermanager.go:422] detected a cluster without a ClusterID.  A ClusterID will be required in the future.  Please tag your cluster to avoid any future issues
Comment 15 errata-xmlrpc 2017-11-28 17:00:15 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
