Bug 1862209
| Summary: | master machines are newly created even when 3 masters are already created | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jatan Malde <jmalde> | ||||
| Component: | Installer | Assignee: | Abhinav Dahiya <adahiya> | ||||
| Installer sub component: | openshift-installer | QA Contact: | Yunfei Jiang <yunjiang> | ||||
| Status: | CLOSED ERRATA | Docs Contact: | |||||
| Severity: | high | ||||||
| Priority: | high | CC: | adahiya, mgugino, wking | ||||
| Version: | 4.5 | ||||||
| Target Milestone: | --- | ||||||
| Target Release: | 4.6.0 | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: |
Cause: Using platform.aws.userTags to add Name or kubernetes.io/cluster/ tags to resources created by the installer caused machine-api to fail to identify existing control plane machines.
Consequence: Because machine-api failed to identify the existing control plane machines, it created another set of control plane hosts, which caused problems with etcd cluster membership.
Fix: The installer no longer allows users to set error-prone tags in platform.aws.userTags.
Result: Users are prevented from adding tags that cause their clusters to have multiple control plane hosts and possibly broken etcd clusters.
|
Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2020-10-27 16:21:22 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
Description
Jatan Malde
2020-07-30 18:10:54 UTC
The issue is that the user modified the Machine objects prior to install to add custom instance tags. One of the tags used the `Name` key. As a result, the machine-controller could not find the instance.
items:
- apiVersion: machine.openshift.io/v1beta1
  kind: Machine
  metadata:
    ...
    labels:
      machine.openshift.io/cluster-api-cluster: cluster-id-xyz
    ...
    name: cluster-id-xyz-master-0
    namespace: openshift-machine-api
  spec:
    metadata: {}
    ...
      tags:
      - name: kubernetes.io/cluster/valid-cluster-id-here
        value: owned
      - name: CustomTagOk
        value: CustomValueOk
      - name: Name
        value: custom-value-put-here
    ...
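To illustrate why an overridden `Name` tag matters, here is a minimal Python sketch. It assumes (the report does not confirm the exact lookup) that the machine controller identifies an instance by matching both the cluster ownership tag and a `Name` tag equal to the Machine name. `find_instance`, the instance ID, and the sample data are hypothetical.

```python
from typing import Dict, List, Optional


def find_instance(
    instances: List[Dict], cluster_id: str, machine_name: str
) -> Optional[Dict]:
    """Return the first instance whose tags match the cluster and machine name.

    Assumption: the lookup keys on the `kubernetes.io/cluster/<id>` tag
    and on a `Name` tag equal to the Machine object's name.
    """
    cluster_key = f"kubernetes.io/cluster/{cluster_id}"
    for inst in instances:
        tags = inst["tags"]
        if tags.get(cluster_key) == "owned" and tags.get("Name") == machine_name:
            return inst
    return None


# Hypothetical instance carrying the custom tags from the Machine above.
instances = [
    {
        "id": "i-hypothetical",
        "tags": {
            "kubernetes.io/cluster/cluster-id-xyz": "owned",
            "CustomTagOk": "CustomValueOk",
            "Name": "custom-value-put-here",  # overrides the expected name
        },
    }
]

# The instance exists, but the lookup by Machine name misses it.
found = find_instance(instances, "cluster-id-xyz", "cluster-id-xyz-master-0")
```

Under that assumption, `found` is `None` even though the instance exists, so a controller would conclude that the machine has no backing instance and create a new one.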
We should protect against this problem in the machine-api. It's unclear to me how these tags got placed on the objects. The tags do appear in AWS, so perhaps the installer is honoring them from elsewhere?
I'm going to assign this to the installer team to investigate how this might happen. The tfstate file shows instances with a Name tag whose values shouldn't be there.
Moving this to installer. machine-api should also protect against this, but the root of this particular case is the installer IMO.

Jira ticket for tracking machine-api work: https://issues.redhat.com/browse/OCPCLOUD-934

Based on discussions with Michael, he recommends that we add validations like:
1. platform.aws.userTags should not allow the `Name` key, as that can prevent the master machines from being adopted.
2. The same field should also not allow any `kubernetes.io/clustername/*` keys.

Hello Abhinav,
From the PR and your comment 7 above, it seems the key prefix `kubernetes.io/clustername/` was blocked. I'm not sure whether it should be `kubernetes.io/cluster/`; please double-check. Thanks.

verified. FAILED.

>> error:
time="2020-08-18T11:40:02Z" level=info msg="API v1.19.0-rc.2+99cb93a-dirty up"
time="2020-08-18T11:40:02Z" level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
time="2020-08-18T12:10:02Z" level=info msg="Pulling debug logs from the bootstrap machine"
time="2020-08-18T12:10:10Z" level=debug msg="error: error executing jsonpath \"{range .items[*]}{.metadata.name}{\\\"\\\\n\\\"}{end}\": Error executing template: not in range, nothing to end. Printing more information for debugging the template:"
time="2020-08-18T12:10:12Z" level=debug msg="error: error executing jsonpath \"{range .items[*]}{.metadata.name}{\\\"\\\\n\\\"}{end}\": Error executing template: not in range, nothing to end. Printing more information for debugging the template:"
time="2020-08-18T12:10:13Z" level=debug msg="Collecting info from 10.0.87.51"
time="2020-08-18T12:10:13Z" level=debug msg="Collecting info from 10.0.63.41"
time="2020-08-18T12:10:13Z" level=debug msg="Collecting info from 10.0.76.245"
time="2020-08-18T12:10:14Z" level=info msg="Bootstrap gather logs captured here \"/home/ec2-user/46/yunjiang-bz209fix6/log-bundle-20200818121002.tar.gz\""

>> install-config
<--snip-->
platform:
  aws:
    region: us-east-2
    userTags:
      kubernetes.io/cluster/yunjiang: yunjiang
    subnets:
    - subnet-0e96ec3d5f40e7afc
    - subnet-02ae90227c72b06fb
    - subnet-03e313b800882f9e7
<--snip-->

attached install log and bootstrap logs.

Created attachment 1711799
install log and log-bundle

verified. PASS.
version: 4.6.0-0.nightly-2020-08-25-234625

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:4196

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
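
The validation discussed in this report (rejecting the `Name` key and cluster ownership keys in platform.aws.userTags) can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the installer's actual code: `validate_user_tags` and the error wording are hypothetical, and the prefix uses the `kubernetes.io/cluster/` form raised in the QA comment.

```python
from typing import Dict, List

# Assumption: the ownership-tag prefix to reject, as raised in the QA comment.
CLUSTER_TAG_PREFIX = "kubernetes.io/cluster/"


def validate_user_tags(user_tags: Dict[str, str]) -> List[str]:
    """Return one error message per disallowed key in userTags.

    An empty list means the tags passed this (hypothetical) check.
    """
    errors = []
    for key in sorted(user_tags):
        if key == "Name":
            errors.append(f"userTags[{key}]: the Name key is not allowed")
        elif key.startswith(CLUSTER_TAG_PREFIX):
            errors.append(
                f"userTags[{key}]: keys with prefix "
                f"{CLUSTER_TAG_PREFIX} are not allowed"
            )
    return errors


# The userTags from the failing install-config in this report.
errors = validate_user_tags({"kubernetes.io/cluster/yunjiang": "yunjiang"})
```

Collecting every error instead of stopping at the first lets a user fix all offending tags in one pass. Ordinary keys such as `CustomTagOk` produce no errors in this sketch.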