Can you include the complete log from openshift-installer (the .openshift_install.log file)? As for the tag not getting cleaned up, that was fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1852720
Created attachment 1713210: Hive OCP VMware IPI provision with bad VIP for API and Ingress
The attached logs (https://bugzilla.redhat.com/attachment.cgi?id=1713210) show a successful destroy. Can you attach the logs from the failed destroy run?
The attached log is from the failure scenario. What piece of code is responsible for the uninstall pod that is having the problem?

```
cahl-vmware-uninstall-ltpnh   0/1   CrashLoopBackOff   44   3h25m
```
I retried this and wanted to add more info since I noticed more stuff in the namespace.

```
oc get all
NAME                                           READY   STATUS             RESTARTS   AGE
pod/cahl-vmware-bad-0-b2zrw-provision-hkf4b    0/3     Completed          0          35h
pod/cahl-vmware-bad-8-kspph-provision-fq8x8    0/3     Completed          0          31h
pod/cahl-vmware-bad-9-qq8d8-provision-kf4rk    0/3     Completed          0          26h
pod/cahl-vmware-bad-uninstall-hz5nb            0/1     CrashLoopBackOff   191        15h

NAME                                           COMPLETIONS   DURATION   AGE
job.batch/cahl-vmware-bad-0-b2zrw-provision    0/1           35h        35h
job.batch/cahl-vmware-bad-8-kspph-provision    0/1           31h        31h
job.batch/cahl-vmware-bad-9-qq8d8-provision    0/1           26h        26h
job.batch/cahl-vmware-bad-uninstall            0/1           15h        15h

oc logs pod/cahl-vmware-bad-uninstall-hz5nb
'/vsphere/./..data' -> '/etc/pki/ca-trust/source/anchors/./..data'
'/vsphere/./.cacert' -> '/etc/pki/ca-trust/source/anchors/./.cacert'
'/vsphere/./..2020_09_02_20_44_31.130428863' -> '/etc/pki/ca-trust/source/anchors/./..2020_09_02_20_44_31.130428863'
'/vsphere/./..2020_09_02_20_44_31.130428863/.cacert' -> '/etc/pki/ca-trust/source/anchors/./..2020_09_02_20_44_31.130428863/.cacert'
time="2020-09-03T12:40:37Z" level=debug msg="find attached objects on tag"
time="2020-09-03T12:40:37Z" level=fatal msg="Runtime error" error="list attached objects [cahl-vmware-bad-xk42s]: GET https://cicd-vcsa-01.cicd.red-chesterfield.com/rest/com/vmware/cis/tagging/tag/id:cahl-vmware-bad-xk42s: 404 Not Found"

oc logs job.batch/cahl-vmware-bad-uninstall
'/vsphere/./..data' -> '/etc/pki/ca-trust/source/anchors/./..data'
'/vsphere/./.cacert' -> '/etc/pki/ca-trust/source/anchors/./.cacert'
'/vsphere/./..2020_09_02_20_44_31.130428863' -> '/etc/pki/ca-trust/source/anchors/./..2020_09_02_20_44_31.130428863'
'/vsphere/./..2020_09_02_20_44_31.130428863/.cacert' -> '/etc/pki/ca-trust/source/anchors/./..2020_09_02_20_44_31.130428863/.cacert'
time="2020-09-03T12:40:37Z" level=debug msg="find attached objects on tag"
time="2020-09-03T12:40:37Z" level=fatal msg="Runtime error" error="list attached objects [cahl-vmware-bad-xk42s]: GET https://cicd-vcsa-01.cicd.red-chesterfield.com/rest/com/vmware/cis/tagging/tag/id:cahl-vmware-bad-xk42s: 404 Not Found"
```

A note in Slack by Andrew Butcher concerning the message in the logs:

```
Which would be a failure here https://github.com/openshift/installer/blob/master/pkg/destroy/vsphere/vsphere.go#L125 ? Maybe we need to update our installer vendor?
```
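To illustrate the failure mode in the log above: the destroy step asks vCenter for the objects attached to a tag, the tag lookup returns 404, and the run exits fatally, so the pod crash-loops. One way a destroy routine could tolerate this is to treat a missing tag as "nothing to clean up". The following is a minimal sketch of that idea in Python, not the installer's actual Go code; `TagNotFound`, `destroy_tagged_resources`, and `missing_tag` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Iterable, List


class TagNotFound(Exception):
    """Hypothetical error standing in for a 404 from the tagging API."""


def destroy_tagged_resources(
    tag_id: str,
    list_attached: Callable[[str], Iterable[str]],
    delete_object: Callable[[str], None],
) -> List[str]:
    """Delete every object attached to tag_id and return what was deleted.

    A missing tag is treated as "nothing to do" instead of a fatal error.
    """
    try:
        attached = list(list_attached(tag_id))
    except TagNotFound:
        # The tag does not exist, so nothing can be attached to it.
        return []
    deleted = []
    for obj in attached:
        delete_object(obj)
        deleted.append(obj)
    return deleted


def missing_tag(tag_id: str) -> Iterable[str]:
    # Fake lookup that mimics the 404 seen in the log above.
    raise TagNotFound(tag_id)
```

With `missing_tag` as the lookup, `destroy_tagged_resources("cahl-vmware-bad-xk42s", missing_tag, print)` returns an empty list rather than raising, which is the behavior that would let the uninstall job complete.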
As a workaround, you can remove the hive.openshift.io deprovision finalizer on the ClusterDeployment. In this case that should be safe, as you know nothing could have been created that needs cleanup. I don't know how we're going to solve this one. We've seen a similar bug floating around, with a bad certificate CA, that exhibits the same problem. Hive is not presently in the business of talking directly to cloud providers very often; we typically leave that to the installer. Abhinav, what do you think about an installer connectivity check for each cloud prior to generating the infraID? If this errored before the infraID was written, we would be in the clear.
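A minimal sketch of the finalizer workaround, assuming the finalizer string is `hive.openshift.io/deprovision` (confirm against `metadata.finalizers` on the actual object before using it). The helper `finalizer_merge_patch` is a hypothetical name; it only builds a JSON merge patch that keeps every finalizer except the deprovision one.

```python
import json
from typing import List

# Assumed finalizer name; confirm it against metadata.finalizers on the real object.
DEPROVISION_FINALIZER = "hive.openshift.io/deprovision"


def finalizer_merge_patch(current: List[str], remove: str = DEPROVISION_FINALIZER) -> str:
    """Return a JSON merge patch that keeps every finalizer except `remove`.

    A merge patch replaces the whole list, so the remaining finalizers
    must be listed explicitly.
    """
    remaining = [f for f in current if f != remove]
    return json.dumps({"metadata": {"finalizers": remaining}})
```

The resulting string could then be passed to something like `oc patch clusterdeployment <name> -n <namespace> --type=merge -p '<patch>'`. Only do this when you are sure no cloud resources were created, as noted above.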
We have noted the same inability to clean up for VMware when the VMware credentials provided are incorrect. The ClusterDeprovision seems stuck.

```
oc logs job.batch/cahl-vmware-bad-uninstall -n cahl-vmware-bad
'/vsphere/./..data' -> '/etc/pki/ca-trust/source/anchors/./..data'
'/vsphere/./.cacert' -> '/etc/pki/ca-trust/source/anchors/./.cacert'
'/vsphere/./..2020_10_14_12_32_46.770064044' -> '/etc/pki/ca-trust/source/anchors/./..2020_10_14_12_32_46.770064044'
'/vsphere/./..2020_10_14_12_32_46.770064044/.cacert' -> '/etc/pki/ca-trust/source/anchors/./..2020_10_14_12_32_46.770064044/.cacert'
time="2020-10-14T12:44:09Z" level=info msg="running file observer" files="[/vsphere-creds/..2020_10_14_12_32_46.887902698/password /vsphere-creds/..2020_10_14_12_32_46.887902698/username]"
I1014 12:44:09.787772 15 observer_polling.go:159] Starting file observer
time="2020-10-14T12:44:15Z" level=fatal msg="Runtime error" error="ServerFaultCode: Cannot complete login due to an incorrect user name or password."
```
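The bad-credentials case is the kind of failure the connectivity check proposed earlier in this thread would catch: if platform access is validated before the infraID is written, a failed check leaves nothing to deprovision. Below is a rough Python sketch of that ordering only. It is not how the installer is implemented; `prepare_install`, `PreflightError`, the `check_platform` callback, and the infra ID format are all assumptions made for illustration.

```python
import uuid
from typing import Callable, Dict


class PreflightError(Exception):
    """Raised when the platform cannot be reached or rejects the credentials."""


def prepare_install(
    cluster_name: str,
    check_platform: Callable[[], None],
    state: Dict[str, str],
) -> str:
    """Validate platform access first, and only then record an infra ID."""
    try:
        check_platform()
    except Exception as err:
        # Fail before any infra ID exists, so there is nothing to deprovision.
        raise PreflightError(f"platform check failed: {err}") from err
    # Hypothetical ID format: cluster name plus a short random suffix.
    infra_id = f"{cluster_name}-{uuid.uuid4().hex[:5]}"
    state["infraID"] = infra_id
    return infra_id
```

If `check_platform` raises (for example on a login failure like the one in the log above), `state` is left without an `infraID` key, so a later cleanup step can tell that nothing was provisioned.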
No doc update per Greg
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days