Bug 1874240 - [vsphere] unable to deprovision - Runtime error list attached objects
Summary: [vsphere] unable to deprovision - Runtime error list attached objects
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hive
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Akhil Rane
QA Contact: wang lin
URL:
Whiteboard:
Depends On: 1868755
Blocks:
 
Reported: 2020-08-31 18:56 UTC by cahl
Modified: 2023-09-15 00:47 UTC
9 users

Fixed In Version: 1.0.17
Doc Type: No Doc Update
Doc Text:
Clone Of: 1868755
Environment:
Last Closed: 2021-02-24 15:16:47 UTC
Target Upstream Version:
Embargoed:


Attachments
Hive OCP VMware IPI provision with bad VIP for API and Ingress (99.33 KB, application/octet-stream)
2020-08-31 20:29 UTC, cahl


Links
Red Hat Product Errata RHSA-2020:5633 (last updated 2021-02-24 15:17:27 UTC)

Comment 1 Abhinav Dahiya 2020-08-31 19:26:20 UTC
Can you include the complete log from the installer (the .openshift_install.log file)?

As for the tag not getting cleaned up, that was fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1852720

Comment 2 cahl 2020-08-31 20:29:29 UTC
Created attachment 1713210
Hive OCP VMware IPI provision with bad VIP for API and Ingress

Comment 3 Abhinav Dahiya 2020-08-31 20:39:22 UTC
The attached log (https://bugzilla.redhat.com/attachment.cgi?id=1713210) shows a successful destroy. Can you attach the logs from the failed destroy run?

Comment 4 cahl 2020-09-01 12:18:35 UTC
The attached log is from the failure scenario. Which piece of code is responsible for the uninstall pod that is failing?
```
cahl-vmware-uninstall-ltpnh           0/1     CrashLoopBackOff   44         3h25m
```

Comment 5 cahl 2020-09-03 12:46:48 UTC
I retried this and want to add more information, since I noticed additional resources in the namespace.

```
oc get all                                                     
NAME                                          READY   STATUS             RESTARTS   AGE
pod/cahl-vmware-bad-0-b2zrw-provision-hkf4b   0/3     Completed          0          35h
pod/cahl-vmware-bad-8-kspph-provision-fq8x8   0/3     Completed          0          31h
pod/cahl-vmware-bad-9-qq8d8-provision-kf4rk   0/3     Completed          0          26h
pod/cahl-vmware-bad-uninstall-hz5nb           0/1     CrashLoopBackOff   191        15h

NAME                                          COMPLETIONS   DURATION   AGE
job.batch/cahl-vmware-bad-0-b2zrw-provision   0/1           35h        35h
job.batch/cahl-vmware-bad-8-kspph-provision   0/1           31h        31h
job.batch/cahl-vmware-bad-9-qq8d8-provision   0/1           26h        26h
job.batch/cahl-vmware-bad-uninstall           0/1           15h        15h


oc logs pod/cahl-vmware-bad-uninstall-hz5nb 
'/vsphere/./..data' -> '/etc/pki/ca-trust/source/anchors/./..data'
'/vsphere/./.cacert' -> '/etc/pki/ca-trust/source/anchors/./.cacert'
'/vsphere/./..2020_09_02_20_44_31.130428863' -> '/etc/pki/ca-trust/source/anchors/./..2020_09_02_20_44_31.130428863'
'/vsphere/./..2020_09_02_20_44_31.130428863/.cacert' -> '/etc/pki/ca-trust/source/anchors/./..2020_09_02_20_44_31.130428863/.cacert'
time="2020-09-03T12:40:37Z" level=debug msg="find attached objects on tag"
time="2020-09-03T12:40:37Z" level=fatal msg="Runtime error" error="list attached objects [cahl-vmware-bad-xk42s]: GET https://cicd-vcsa-01.cicd.red-chesterfield.com/rest/com/vmware/cis/tagging/tag/id:cahl-vmware-bad-xk42s: 404 Not Found"


oc logs job.batch/cahl-vmware-bad-uninstall 
'/vsphere/./..data' -> '/etc/pki/ca-trust/source/anchors/./..data'
'/vsphere/./.cacert' -> '/etc/pki/ca-trust/source/anchors/./.cacert'
'/vsphere/./..2020_09_02_20_44_31.130428863' -> '/etc/pki/ca-trust/source/anchors/./..2020_09_02_20_44_31.130428863'
'/vsphere/./..2020_09_02_20_44_31.130428863/.cacert' -> '/etc/pki/ca-trust/source/anchors/./..2020_09_02_20_44_31.130428863/.cacert'
time="2020-09-03T12:40:37Z" level=debug msg="find attached objects on tag"
time="2020-09-03T12:40:37Z" level=fatal msg="Runtime error" error="list attached objects [cahl-vmware-bad-xk42s]: GET https://cicd-vcsa-01.cicd.red-chesterfield.com/rest/com/vmware/cis/tagging/tag/id:cahl-vmware-bad-xk42s: 404 Not Found"
```

A note in Slack from Andrew Butcher about the message in the logs:
```
Which would be a failure here https://github.com/openshift/installer/blob/master/pkg/destroy/vsphere/vsphere.go#L125 ?

Maybe we need to update our installer vendor?
```
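
The fatal line in the logs comes from the destroy step asking vCenter for the objects attached to a tag whose ID is the infraID (`cahl-vmware-bad-xk42s`). If provisioning failed before that tag was ever created, the lookup returns 404 and the whole run exits. A minimal sketch of a tolerant lookup, assuming a hypothetical `TagClient` interface and `ErrNotFound` sentinel rather than the real vSphere client types: a missing tag is treated as "nothing to clean up" instead of a fatal error. This illustrates the idea only; it is not the installer's actual fix.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel standing in for the
// "404 Not Found" that vCenter's tagging endpoint returned above.
var ErrNotFound = errors.New("404 Not Found")

// TagClient is a hypothetical interface standing in for a vSphere tagging client.
type TagClient interface {
	ListAttachedObjects(tagID string) ([]string, error)
}

// attachedObjects returns the objects attached to tagID. A tag that does not
// exist is reported as "no objects" so destroy can continue; any other error
// is wrapped and returned.
func attachedObjects(c TagClient, tagID string) ([]string, error) {
	objs, err := c.ListAttachedObjects(tagID)
	if errors.Is(err, ErrNotFound) {
		return nil, nil
	}
	if err != nil {
		return nil, fmt.Errorf("list attached objects [%s]: %w", tagID, err)
	}
	return objs, nil
}

// fakeClient simulates vCenter: any tag ID missing from the map does not exist.
type fakeClient map[string][]string

func (f fakeClient) ListAttachedObjects(tagID string) ([]string, error) {
	objs, ok := f[tagID]
	if !ok {
		return nil, fmt.Errorf("GET .../tag/id:%s: %w", tagID, ErrNotFound)
	}
	return objs, nil
}

func main() {
	c := fakeClient{"healthy-abc12": {"vm-1", "folder-1"}}

	// Tag was never created: no objects and no error, so cleanup can proceed.
	objs, err := attachedObjects(c, "cahl-vmware-bad-xk42s")
	fmt.Println(len(objs), err)

	// Tag exists: its attached objects are returned for deletion.
	objs, err = attachedObjects(c, "healthy-abc12")
	fmt.Println(objs, err)
}
```

The design choice is to make destroy idempotent: "already gone" and "never created" both mean there is nothing left to delete, while genuine API failures still surface.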

Comment 7 Devan Goodwin 2020-09-16 12:26:55 UTC
As a workaround, you can remove the hive.openshift.io/deprovision finalizer from the ClusterDeployment. In this case that should be safe, since you know nothing could have been created that needs cleanup.
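
The workaround amounts to deleting one entry from the ClusterDeployment's `metadata.finalizers` list; once no finalizers remain, Kubernetes completes the pending deletion without waiting for deprovision. A minimal sketch of that list operation on a plain slice, assuming `hive.openshift.io/deprovision` is the finalizer name and leaving fetching and updating the object to a client (or `oc edit`). Skipping deprovision is only safe when, as here, nothing was created in vSphere.

```go
package main

import "fmt"

// deprovisionFinalizer is assumed to be the finalizer Hive places on a
// ClusterDeployment; confirm it against the object's metadata.finalizers.
const deprovisionFinalizer = "hive.openshift.io/deprovision"

// removeFinalizer returns a copy of finalizers without any entry equal to
// name, keeping the remaining entries in their original order.
func removeFinalizer(finalizers []string, name string) []string {
	out := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	// "example.com/other" is a hypothetical unrelated finalizer that must be kept.
	current := []string{deprovisionFinalizer, "example.com/other"}
	fmt.Println(removeFinalizer(current, deprovisionFinalizer))
}
```

Removing only the named entry, rather than clearing the whole list, leaves any other controller's finalizer in place.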

I don't know how we're going to solve this one. We've seen a similar bug with a bad CA certificate that exhibits the same problem. Hive does not currently talk directly to cloud providers very often; we typically leave that to the installer.

Abhinav, what do you think about an installer connectivity check for each cloud before the infraID is generated? If that check errored before the infraID was written, we would be in the clear.

Comment 9 cahl 2020-10-14 12:46:55 UTC
We have seen the same inability to clean up on VMware when the provided VMware credentials are incorrect. The ClusterDeprovision appears to be stuck.

```
oc logs job.batch/cahl-vmware-bad-uninstall -n cahl-vmware-bad        
'/vsphere/./..data' -> '/etc/pki/ca-trust/source/anchors/./..data'
'/vsphere/./.cacert' -> '/etc/pki/ca-trust/source/anchors/./.cacert'
'/vsphere/./..2020_10_14_12_32_46.770064044' -> '/etc/pki/ca-trust/source/anchors/./..2020_10_14_12_32_46.770064044'
'/vsphere/./..2020_10_14_12_32_46.770064044/.cacert' -> '/etc/pki/ca-trust/source/anchors/./..2020_10_14_12_32_46.770064044/.cacert'
time="2020-10-14T12:44:09Z" level=info msg="running file observer" files="[/vsphere-creds/..2020_10_14_12_32_46.887902698/password /vsphere-creds/..2020_10_14_12_32_46.887902698/username]"
I1014 12:44:09.787772      15 observer_polling.go:159] Starting file observer
time="2020-10-14T12:44:15Z" level=fatal msg="Runtime error" error="ServerFaultCode: Cannot complete login due to an incorrect user name or password."
```

Comment 13 Jeana Routh 2021-02-11 15:04:07 UTC
No doc update per Greg

Comment 16 errata-xmlrpc 2021-02-24 15:16:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

Comment 17 Red Hat Bugzilla 2023-09-15 00:47:17 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

