Bug 1877281 - vSphere UPI missing clusterID tag
Summary: vSphere UPI missing clusterID tag
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Alberto
QA Contact: Milind Yadav
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-09-09 09:44 UTC by Alberto
Modified: 2023-09-18 00:22 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:39:21 UTC
Target Upstream Version:
Embargoed:




Links:
- GitHub: openshift/machine-api-operator pull 667 (closed) - Bug 1877281: [vSphere] Don't fail when tag is not found - last updated 2021-02-03 14:09:46 UTC
- Red Hat Product Errata: RHBA-2020:4196 - last updated 2020-10-27 16:39:23 UTC

Description Alberto 2020-09-09 09:44:55 UTC
Description of problem:

The installer creates a clusterID tag, which is then used on destroy to ensure no resources are leaked.

vSphere UPI clusters are missing the clusterID tag. This breaks the machine API's expectations. It can easily be overcome by having the vSphere machine controller relax the requirement that the clusterID tag exist.
However, orthogonally, we should try to align UPI/IPI day-2 assumptions for consumers as much as possible.
This bug is to ensure that UPI installer procedures create the clusterID tag.
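
For reference, the tag in question is a vSphere tag (in its own tag category) named after the cluster's infrastructure ID; IPI installs create it and attach it to cluster resources so that destroy, and per this bug the machine API, can identify what belongs to the cluster. A minimal Go sketch of creating such a tag with govmomi's vapi/tags client, purely illustrative (the category name, description and associable types are assumptions, not necessarily what the installer uses):

package clusterid

import (
	"context"
	"fmt"

	"github.com/vmware/govmomi/vapi/rest"
	"github.com/vmware/govmomi/vapi/tags"
)

// ensureClusterIDTag creates a tag category and a tag named after the cluster's
// infrastructure ID, roughly what the IPI Terraform provisions and what a UPI
// procedure would need to replicate. Names here are illustrative only.
func ensureClusterIDTag(ctx context.Context, rc *rest.Client, infraID string) (string, error) {
	m := tags.NewManager(rc)

	// One category per cluster; "SINGLE" cardinality means at most one tag from
	// this category can be attached to a given object.
	catID, err := m.CreateCategory(ctx, &tags.Category{
		Name:            infraID, // e.g. "mycluster-x7k9p" (hypothetical)
		Description:     "OpenShift cluster resources",
		Cardinality:     "SINGLE",
		AssociableTypes: []string{"VirtualMachine", "ResourcePool", "Folder", "Datastore"},
	})
	if err != nil {
		return "", fmt.Errorf("creating tag category: %w", err)
	}

	// The clusterID tag itself, also named after the infrastructure ID.
	tagID, err := m.CreateTag(ctx, &tags.Tag{
		Name:        infraID,
		Description: "OpenShift clusterID tag",
		CategoryID:  catID,
	})
	if err != nil {
		return "", fmt.Errorf("creating clusterID tag: %w", err)
	}
	return tagID, nil
}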


Version-Release number of selected component (if applicable):
4.6

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
UPI vSphere install has no clusterID tag created.

Expected results:
UPI vSphere install has the clusterID tag created.

Additional info:

Comment 1 Joseph Callen 2020-09-09 16:37:11 UTC
I think this bug has two parts:

1.) Documentation in the UPI section of the vSphere installation docs - creating this tag isn't currently described as a requirement.

2.) The UPI Terraform. This would be a fairly easy fix: take a look at the existing IPI Terraform:
https://github.com/openshift/installer/blob/a4decfbba65a79b8b995d666360d3a60f8de3454/data/data/vsphere/main.tf#L54-L70
and add the equivalent to the UPI template.
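
For UPI environments that don't consume the installer-provided Terraform at all, the same end state can be reached by attaching the tag to the cluster's VMs directly. A hedged Go sketch of that step, building on the category/tag created in the sketch further up (the inventory path, tag ID and client wiring are placeholders):

package clusterid

import (
	"context"
	"fmt"

	"github.com/vmware/govmomi/find"
	"github.com/vmware/govmomi/vapi/rest"
	"github.com/vmware/govmomi/vapi/tags"
	"github.com/vmware/govmomi/vim25"
)

// attachClusterIDTag attaches an already-created clusterID tag to every VM found
// under the given inventory path, e.g. the folder a UPI procedure placed its VMs in.
func attachClusterIDTag(ctx context.Context, vc *vim25.Client, rc *rest.Client, tagID, vmPath string) error {
	finder := find.NewFinder(vc)
	// vmPath is hypothetical, e.g. "/dc1/vm/mycluster-x7k9p/*".
	vms, err := finder.VirtualMachineList(ctx, vmPath)
	if err != nil {
		return fmt.Errorf("listing cluster VMs: %w", err)
	}

	m := tags.NewManager(rc)
	for _, vm := range vms {
		if err := m.AttachTag(ctx, tagID, vm.Reference()); err != nil {
			return fmt.Errorf("tagging %s: %w", vm.InventoryPath, err)
		}
	}
	return nil
}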

Comment 2 Abhinav Dahiya 2020-09-09 16:58:27 UTC
I think prescribing that a tag resource be created for UPI customers is not appropriate. Therefore it doesn't make sense for the installer team to make that a requirement for all customers.

As for machine-api objects,
- I think asking users to create a tag so that machine-api can correctly create machines is not too bad. If users want to use a feature, they can be asked to fulfil its requirements at that point. And the clusterID tag does look like an implicit requirement for machine-api.
- Also, if machine-api can create objects without the tag, I think that is fine as well, since users might want to use a different tag of their choosing, given this is UPI.

As for destroying resources with the installer, we already do not support or provide any guarantees that openshift-install destroy cluster will work with any UPI cluster. The user created the resources; the user needs to own their lifecycle.

Moving to cloud team to either document or allow.

Comment 3 Alberto 2020-09-10 08:03:03 UTC
>- Also, if machine-api can create objects without the tag, I think that is fine as well, since users might want to use a different tag of their choosing, given this is UPI.

The machine API can ignore the tag and operate successfully: https://github.com/openshift/machine-api-operator/pull/667
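
For illustration, the behavior that PR describes ("don't fail when tag is not found") amounts to treating a missing clusterID tag as a non-fatal condition during reconcile. A rough Go sketch of the pattern, not the PR's actual code; the helper and error names are hypothetical:

package vspheretags

import (
	"context"
	"errors"

	"k8s.io/klog/v2"
)

// errTagNotFound and findClusterTag are hypothetical stand-ins for the
// controller's real tag lookup; only the error handling matters here.
var errTagNotFound = errors.New("clusterID tag not found")

func findClusterTag(ctx context.Context, infraID string) (string, error) {
	// ...query vCenter for a tag named after infraID...
	return "", errTagNotFound
}

// reconcileTags attaches the clusterID tag when it exists and treats a missing
// tag as a no-op instead of a reconcile failure (the UPI case).
func reconcileTags(ctx context.Context, infraID string, attach func(tagID string) error) error {
	tagID, err := findClusterTag(ctx, infraID)
	if errors.Is(err, errTagNotFound) {
		klog.Infof("clusterID tag %q not found, skipping tag attachment", infraID)
		return nil
	}
	if err != nil {
		return err
	}
	return attach(tagID)
}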

Regardless, I think we should consolidate the outcome of IPI/UPI installs as much as possible unless there's a reason not to. This will make day-2 operations more predictable and will help keep manual/documented steps and the burden on users to a minimum.

I think the installer should also ensure this is created by the UPI scripts: https://github.com/openshift/installer/blob/a4decfbba65a79b8b995d666360d3a60f8de3454/data/data/vsphere/main.tf#L54-L70

Comment 5 Milind Yadav 2020-09-16 10:54:39 UTC
VERIFIED at:
[miyadav@miyadav vsphere]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-16-000734   True        False         153m    Cluster version is 4.6.0-0.nightly-2020-09-16-000734


Steps:
1. Create a machineset, referring to the gist below:
https://gist.github.com/miyadav/ab17954c1075db1533eee13f0ffda58c

2. New machine is in the Provisioned state:
[miyadav@miyadav vsphere]$ oc get machines -o wide
NAME                                 PHASE         TYPE   REGION   ZONE   AGE   NODE   PROVIDERID                                       STATE
vs-miyadav-0916-nzxpd-worker-sk6dt   Provisioned                          11m          vsphere://422bcb32-725b-d292-bd23-97d236a32ed2   poweredOn
Logs:

.
.
I0916 10:51:01.497117       1 reconciler.go:967] Getting network status: object reference: vm-6123
I0916 10:51:01.497146       1 reconciler.go:976] Getting network status: device: VM Network, macAddress: 00:50:56:ab:4c:e1
I0916 10:51:01.497152       1 reconciler.go:981] Getting network status: getting guest info
I0916 10:51:01.502360       1 reconciler.go:374] vs-miyadav-0916-nzxpd-worker-sk6dt: reconciling network: IP addresses: [{InternalDNS vs-miyadav-0916-nzxpd-worker-sk6dt}]
I0916 10:51:01.502411       1 reconciler.go:269] vs-miyadav-0916-nzxpd-worker-sk6dt: reconciling powerstate annotation
I0916 10:51:01.509322       1 reconciler.go:715] vs-miyadav-0916-nzxpd-worker-sk6dt: Updating provider status
I0916 10:51:01.523318       1 machine_scope.go:102] vs-miyadav-0916-nzxpd-worker-sk6dt: patching machine
I0916 10:51:01.708641       1 controller.go:293] vs-miyadav-0916-nzxpd-worker-sk6dt: has no node yet, requeuing
.

.
Additional info:
Not sure whether the machine being stuck in the Provisioned state is the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1861974 or something else, please have a look.

Comment 7 Milind Yadav 2020-09-24 09:18:14 UTC
Moved to VERIFIED as per Slack conversation; the machine stuck in the Provisioned state will be handled in 4.7.

Comment 10 errata-xmlrpc 2020-10-27 16:39:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 11 Red Hat Bugzilla 2023-09-18 00:22:21 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

