Bug 1877281

Summary: vSphere UPI missing clusterID tag
Product: OpenShift Container Platform
Component: Cloud Compute
Sub Component: Other Providers
Reporter: Alberto <agarcial>
Assignee: Alberto <agarcial>
QA Contact: Milind Yadav <miyadav>
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
CC: adahiya, jcallen
Version: 4.6
Target Release: 4.6.0
Last Closed: 2020-10-27 16:39:21 UTC
Type: Bug

Description Alberto 2020-09-09 09:44:55 UTC
Description of problem:

The installer creates a clusterID tag, which is then used on destroy to ensure no resources are leaked.

vSphere UPI clusters are missing the clusterID tag. This breaks the machine API's expectations, which can easily be overcome by having the vSphere machine controller relax the requirement that the clusterID tag exist.
However, orthogonally, we should try to align UPI/IPI day-2 assumptions for consumers as much as possible.
This bug is to ensure the UPI installer procedures create the clusterID tag.


Version-Release number of selected component (if applicable):
4.6


Actual results:
UPI vSphere install has no clusterID tag created.

Expected results:
UPI vSphere install has the clusterID tag created.


Comment 1 Joseph Callen 2020-09-09 16:37:11 UTC
I think this bug has two parts:

1.) Documentation in the UPI section of the vSphere installation docs: creating the tag isn't currently described as a requirement.

2.) The UPI terraform. This would be a fairly easy fix: take a look at the existing IPI terraform:
https://github.com/openshift/installer/blob/a4decfbba65a79b8b995d666360d3a60f8de3454/data/data/vsphere/main.tf#L54-L70
and add the equivalent resources to the UPI template, as in the sketch below.
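
For illustration, a minimal sketch of what the UPI template could gain, mirroring the linked IPI terraform. The resource names and the cluster_id variable here are illustrative, not the installer's actual code:

variable "cluster_id" {
  type = string
}

resource "vsphere_tag_category" "clusterid" {
  name        = "openshift-${var.cluster_id}"
  description = "Added by openshift-install do not remove"
  cardinality = "SINGLE"

  # Object types the tag may be attached to.
  associable_types = [
    "VirtualMachine",
    "ResourcePool",
    "Folder",
    "Datastore",
  ]
}

resource "vsphere_tag" "clusterid" {
  name        = var.cluster_id
  category_id = vsphere_tag_category.clusterid.id
  description = "Added by openshift-install do not remove"
}

The VM resources in the UPI template would then attach the tag through their tags argument (tags = [vsphere_tag.clusterid.id]) so that day-2 consumers find UPI VMs the same way they find IPI ones.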

Comment 2 Abhinav Dahiya 2020-09-09 16:58:27 UTC
I think prescribing that a tag resource be created for UPI customers is not appropriate. Therefore it doesn't make sense for the installer team to make that a requirement for all customers.

As for machine-api objects,
- I think asking users to create a tag so that machine-api can correctly create machines is not too bad: users who want to use a feature can be asked to fulfil its requirements. And it looks like the clusterID tag is an implicit requirement for machine-api.
- Also, if machine-api can create objects without the tag, I think that is also fine, as users might want to use another tag of their choosing since this is UPI.

As for destroying resources with the installer, we already do not support or provide any guarantees that openshift-install destroy cluster will work with any UPI cluster. The user created the resources, so the user needs to own their lifecycle.

Moving to the cloud team to either document the requirement or allow the tag's absence.

Comment 3 Alberto 2020-09-10 08:03:03 UTC
>- Also, if machine-api can create objects without the tag, I think that is also fine, as users might want to use another tag of their choosing since this is UPI.

The machine API can ignore the tag and operate successfully: https://github.com/openshift/machine-api-operator/pull/667

Regardless, we should consolidate the outcome of IPI/UPI installs as much as possible unless there's a reason not to. This will make day-2 operations more predictable and will help keep the manual/documented steps and burden for users to a minimum.

I think the installer should ensure this terraform is run by the UPI scripts: https://github.com/openshift/installer/blob/a4decfbba65a79b8b995d666360d3a60f8de3454/data/data/vsphere/main.tf#L54-L70

Comment 5 Milind Yadav 2020-09-16 10:54:39 UTC
VERIFIED at:
[miyadav@miyadav vsphere]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-16-000734   True        False         153m    Cluster version is 4.6.0-0.nightly-2020-09-16-000734


Steps:
1. Create a machineset; refer to the gist below:
https://gist.github.com/miyadav/ab17954c1075db1533eee13f0ffda58c

2. The new machine is in the Provisioned state:
[miyadav@miyadav vsphere]$ oc get machines -o wide
NAME                                 PHASE         TYPE   REGION   ZONE   AGE   NODE   PROVIDERID                                       STATE
vs-miyadav-0916-nzxpd-worker-sk6dt   Provisioned                          11m          vsphere://422bcb32-725b-d292-bd23-97d236a32ed2   poweredOn
Logs:

...
I0916 10:51:01.497117       1 reconciler.go:967] Getting network status: object reference: vm-6123
I0916 10:51:01.497146       1 reconciler.go:976] Getting network status: device: VM Network, macAddress: 00:50:56:ab:4c:e1
I0916 10:51:01.497152       1 reconciler.go:981] Getting network status: getting guest info
I0916 10:51:01.502360       1 reconciler.go:374] vs-miyadav-0916-nzxpd-worker-sk6dt: reconciling network: IP addresses: [{InternalDNS vs-miyadav-0916-nzxpd-worker-sk6dt}]
I0916 10:51:01.502411       1 reconciler.go:269] vs-miyadav-0916-nzxpd-worker-sk6dt: reconciling powerstate annotation
I0916 10:51:01.509322       1 reconciler.go:715] vs-miyadav-0916-nzxpd-worker-sk6dt: Updating provider status
I0916 10:51:01.523318       1 machine_scope.go:102] vs-miyadav-0916-nzxpd-worker-sk6dt: patching machine
I0916 10:51:01.708641       1 controller.go:293] vs-miyadav-0916-nzxpd-worker-sk6dt: has no node yet, requeuing
...
Additional info:
Not sure whether the machine being stuck in the Provisioned state is the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1861974 or something else; please have a look.

Comment 7 Milind Yadav 2020-09-24 09:18:14 UTC
Moved to VERIFIED as per Slack conversation; the machine stuck in the Provisioned state will be handled in 4.7.

Comment 10 errata-xmlrpc 2020-10-27 16:39:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 11 Red Hat Bugzilla 2023-09-18 00:22:21 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days