Description of problem:
The installer creates a clusterID tag, which is then used on destroy to ensure no resources are leaked. vSphere UPI clusters are missing the clusterID tag, which breaks the machine API's expectations. This could easily be overcome by the vSphere machine controller relaxing the requirement that the clusterID tag exist; orthogonally, however, we should try to align UPI/IPI day-2 assumptions for consumers as much as possible. This bug is to ensure UPI installer procedures create the clusterID tag.

Version-Release number of selected component (if applicable):
4.6

How reproducible:
Always

Steps to Reproduce:
1. Install a UPI cluster on vSphere following the documented procedure.

Actual results:
UPI vSphere install has no clusterID tag created.

Expected results:
UPI vSphere install has the clusterID tag created.

Additional info:
I think this bug has two parts:

1. Documentation in the UPI section of the vSphere installation docs - this isn't currently described as a requirement.
2. The UPI terraform. This would be a fairly easy fix: take a look at the existing IPI terraform at https://github.com/openshift/installer/blob/a4decfbba65a79b8b995d666360d3a60f8de3454/data/data/vsphere/main.tf#L54-L70 and add the same resources to the UPI template.
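For reference, a sketch of what such an addition might look like, modeled on the tag resources the linked IPI terraform creates. The variable names (var.cluster_id) and the exact list of associable types are assumptions and would need to match the UPI template's existing variables:

```terraform
# Sketch only: mirrors the tag category/tag pair the IPI terraform
# creates, so "openshift-install destroy" and the machine API can
# find cluster resources by tag.

resource "vsphere_tag_category" "category" {
  name        = "openshift-${var.cluster_id}"
  description = "Added by openshift-install do not remove"
  cardinality = "SINGLE"

  # Object types the tag may be attached to (assumed list).
  associable_types = [
    "VirtualMachine",
    "ResourcePool",
    "Folder",
    "Datastore",
  ]
}

resource "vsphere_tag" "tag" {
  name        = var.cluster_id
  category_id = vsphere_tag_category.category.id
  description = "Added by openshift-install do not remove"
}
```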
I think prescribing that a tag resource be created for UPI customers is not appropriate, so it doesn't make sense for the installer team to make that a requirement for all customers.

As for machine-api objects:
- I think asking users to create a tag so that machine-api can correctly create machines is not too bad. Since users want to use a feature, they can be asked to fulfill its requirements. And it looks like the clusterID tag is an implicit requirement for machine-api.
- Also, if machine-api can create objects without the tag, I think that is fine too, as users might want to use another tag of their own choosing since this is UPI.

As for destroying resources with the installer: we already do not support, or provide any guarantees, that openshift-install destroy cluster will work with any UPI cluster. The user created the resources; the user needs to own their lifecycle.

Moving to the cloud team to either document or allow.
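If the documentation route is taken, a UPI user could create the tag out-of-band with govc. A rough sketch, assuming a configured govc environment (GOVC_URL, credentials); the cluster ID value, naming convention ("openshift-&lt;cluster id&gt;", following IPI), and the VM path are all placeholders:

```shell
# Placeholder: the infrastructure name of the installed cluster.
CLUSTER_ID="example-cluster-x7k2p"

# Create a single-cardinality tag category for the cluster.
govc tags.category.create -d "Added by openshift-install do not remove" \
  -t VirtualMachine -t ResourcePool -t Folder -t Datastore \
  "openshift-${CLUSTER_ID}"

# Create the clusterID tag within that category.
govc tags.create -d "Added by openshift-install do not remove" \
  -c "openshift-${CLUSTER_ID}" "${CLUSTER_ID}"

# Attach the tag to each cluster object (VM path is a placeholder).
govc tags.attach -c "openshift-${CLUSTER_ID}" "${CLUSTER_ID}" \
  "/dc1/vm/${CLUSTER_ID}/${CLUSTER_ID}-master-0"
```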
> - Also, if machine-api can create objects without the tag, I think that is fine too, as users might want to use another tag of their own choosing since this is UPI.

The machine API can ignore the tag and operate successfully: https://github.com/openshift/machine-api-operator/pull/667

Regardless, I think we should consolidate the outcome of IPI/UPI installs as much as possible unless there's a reason not to. This will make day-2 operations more predictable and will keep manual/documented steps and user burden to a minimum.

I think the installer should ensure this is run in the UPI scripts: https://github.com/openshift/installer/blob/a4decfbba65a79b8b995d666360d3a60f8de3454/data/data/vsphere/main.tf#L54-L70
VERIFIED at:

[miyadav@miyadav vsphere]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-16-000734   True        False         153m    Cluster version is 4.6.0-0.nightly-2020-09-16-000734

Steps:
1. Create a machineset; refer to https://gist.github.com/miyadav/ab17954c1075db1533eee13f0ffda58c
2. New machine is in the Provisioned state:

[miyadav@miyadav vsphere]$ oc get machines -o wide
NAME                                 PHASE         TYPE   REGION   ZONE   AGE   NODE   PROVIDERID                                       STATE
vs-miyadav-0916-nzxpd-worker-sk6dt   Provisioned                          11m          vsphere://422bcb32-725b-d292-bd23-97d236a32ed2   poweredOn

Logs:
.
.
I0916 10:51:01.497117       1 reconciler.go:967] Getting network status: object reference: vm-6123
I0916 10:51:01.497146       1 reconciler.go:976] Getting network status: device: VM Network, macAddress: 00:50:56:ab:4c:e1
I0916 10:51:01.497152       1 reconciler.go:981] Getting network status: getting guest info
I0916 10:51:01.502360       1 reconciler.go:374] vs-miyadav-0916-nzxpd-worker-sk6dt: reconciling network: IP addresses: [{InternalDNS vs-miyadav-0916-nzxpd-worker-sk6dt}]
I0916 10:51:01.502411       1 reconciler.go:269] vs-miyadav-0916-nzxpd-worker-sk6dt: reconciling powerstate annotation
I0916 10:51:01.509322       1 reconciler.go:715] vs-miyadav-0916-nzxpd-worker-sk6dt: Updating provider status
I0916 10:51:01.523318       1 machine_scope.go:102] vs-miyadav-0916-nzxpd-worker-sk6dt: patching machine
I0916 10:51:01.708641       1 controller.go:293] vs-miyadav-0916-nzxpd-worker-sk6dt: has no node yet, requeuing
.
.

Additional info:
Not sure whether the machine being stuck in the Provisioned state is the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1861974 or something else; please have a look.
Moved to VERIFIED as per the Slack conversation; the machine stuck in the Provisioned state will be handled in 4.7.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days