Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1862209

Summary: master machines are newly created even when 3 masters are already created
Product: OpenShift Container Platform Reporter: Jatan Malde <jmalde>
Component: InstallerAssignee: Abhinav Dahiya <adahiya>
Installer sub component: openshift-installer QA Contact: Yunfei Jiang <yunjiang>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: adahiya, mgugino, wking
Version: 4.5   
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Using platform.aws.userTags to add Name or kubernetes.io/cluster/ tags to resources created by the installer caused machine-api to fail to identify existing control plane machines. Consequence: Failure to identify existing control plane machines cause machine-api to create another set of control plane hosts creating problems with etcd cluster membership. Fix: The installer now does not allow users to set error prone tags in platform.aws.userTags Result: Users will be prevented from adding tags that cause their clusters to have multiple control plane hosts and possibly broken etcd clusters.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-27 16:21:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
install log and log-bundle none

Description Jatan Malde 2020-07-30 18:10:54 UTC
Description of problem:

IHAC installing OCP 4.5.3 on AWS Private VPC in IPI installation. 

The installation process of bootstrapping temporary control plane works fine, Only three masters are seen there. 

When the installation moves ahead and the machine-api-operator comes up, we could see 3 masters additionally getting created along with 3 worker nodes. 


Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Setup OCP 4.5.3 on AWS private VPC and user tags
2. Installation shows 6 masters when machine-api-operator comes up.

Actual results:

Expected results:

It is expected that the machine-controller get the already created masters which is not visible here and hence new masters are created in that place.

Additional info:

Additional details with the tags, vpc details and instance are in the private note to avoid sensitive data loss.

Comment 3 Michael Gugino 2020-07-30 18:39:42 UTC
The issue is the user modified the machine objects prior to install to add custom instance tags.  One of the tags was the name field.  This resulted in the machine-controller not finding the instance.

items:
- apiVersion: machine.openshift.io/v1beta1
  kind: Machine
  metadata:
    ...
    labels:
      machine.openshift.io/cluster-api-cluster: cluster-id-xyz
      ...
    name: cluster-id-xyz-master-0
    namespace: openshift-machine-api
  spec:
    metadata: {}
        ...
        tags:
        - name: kubernetes.io/cluster/valid-cluster-id-here
          value: owned
        - name: CustomTagOk
          value: CustomValueOk
        - name: Name
          value: custom-value-put-here
...

We should protect against this problem in the machine-api.  It's unclear to me how these tags got placed on the objects as the tags do appear in AWS, so perhaps the installer is honoring them from elsewhere?

I'm going to assign this to the installer team for investigation on how this might happen.  I see the tfstate file shows an instances with the name tag with values that shouldn't be there.

Comment 4 Michael Gugino 2020-07-30 18:40:44 UTC
Moving this to installer.  machine-api should also protect against this, but the root of this particular case is the installer IMO.

Comment 5 Michael Gugino 2020-07-30 18:48:56 UTC
Jira ticket for tracking machine-api work: https://issues.redhat.com/browse/OCPCLOUD-934

Comment 7 Abhinav Dahiya 2020-07-31 16:57:39 UTC
Based on discussions from Michael , he recommends that we add validations like

1. the platform.aws.userTags should not allow `Name` key as that can affect the master machines from getting adopted.
2. the same field should not also allow any keys `kubernete.io/clustername/*` keys.

Comment 10 Yunfei Jiang 2020-08-18 11:46:55 UTC
Hello Abhinav,

From the PR and your above comment 7, seems like the key with prefix `kubernetes.io/clustername/` was blocked, I'm not sure if it should be `kubernetes.io/cluster/`, just double confirm.

Thanks.

Comment 11 Yunfei Jiang 2020-08-19 05:57:49 UTC
verified. FAILED.


>> error:

time="2020-08-18T11:40:02Z" level=info msg="API v1.19.0-rc.2+99cb93a-dirty up"
time="2020-08-18T11:40:02Z" level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
time="2020-08-18T12:10:02Z" level=info msg="Pulling debug logs from the bootstrap machine"
time="2020-08-18T12:10:10Z" level=debug msg="error: error executing jsonpath \"{range .items[*]}{.metadata.name}{\\\"\\\\n\\\"}{end}\": Error executing template: not in range, nothing to end. Printing more information for debugging the template:"
time="2020-08-18T12:10:12Z" level=debug msg="error: error executing jsonpath \"{range .items[*]}{.metadata.name}{\\\"\\\\n\\\"}{end}\": Error executing template: not in range, nothing to end. Printing more information for debugging the template:"
time="2020-08-18T12:10:13Z" level=debug msg="Collecting info from 10.0.87.51"
time="2020-08-18T12:10:13Z" level=debug msg="Collecting info from 10.0.63.41"
time="2020-08-18T12:10:13Z" level=debug msg="Collecting info from 10.0.76.245"
time="2020-08-18T12:10:14Z" level=info msg="Bootstrap gather logs captured here \"/home/ec2-user/46/yunjiang-bz209fix6/log-bundle-20200818121002.tar.gz\""

>> install-config

<--snip-->
platform:
  aws:
    region: us-east-2
    userTags:
      kubernetes.io/cluster/yunjiang: yunjiang
    subnets:
    - subnet-0e96ec3d5f40e7afc
    - subnet-02ae90227c72b06fb
    - subnet-03e313b800882f9e7
<--snip-->

attached install log and bootstrap logs.

Comment 12 Yunfei Jiang 2020-08-19 05:58:55 UTC
Created attachment 1711799 [details]
install log and log-bundle

Comment 14 Yunfei Jiang 2020-08-26 06:41:07 UTC
verified. PASS.
version: 4.6.0-0.nightly-2020-08-25-234625

Comment 16 errata-xmlrpc 2020-10-27 16:21:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 17 Red Hat Bugzilla 2023-09-15 00:34:58 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days