Bug 1874240

Summary: [vsphere] unable to deprovision - Runtime error list attached objects
Product: OpenShift Container Platform Reporter: cahl
Component: Hive Assignee: Akhil Rane <arane>
Status: CLOSED ERRATA QA Contact: wang lin <lwan>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.5 CC: adahiya, arane, cahl, dgoodwin, gshereme, jcallen, jima, jrouth, lwan
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 1.0.17 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1868755 Environment:
Last Closed: 2021-02-24 15:16:47 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1868755    
Bug Blocks:    
Attachments:
Description Flags
Hive OCP VMware IPI provision with bad VIP for API and Ingress none

Comment 1 Abhinav Dahiya 2020-08-31 19:26:20 UTC
Can you include the complete log from openshift-installer (the .openshift_install.log file)?

As for the tag not getting cleaned up, that was fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1852720

Comment 2 cahl 2020-08-31 20:29:29 UTC
Created attachment 1713210 [details]
Hive OCP VMware IPI provision with bad VIP for API and Ingress

Comment 3 Abhinav Dahiya 2020-08-31 20:39:22 UTC
The attached logs https://bugzilla.redhat.com/attachment.cgi?id=1713210 show a successful destroy. Can you attach the logs from the failed destroy run?

Comment 4 cahl 2020-09-01 12:18:35 UTC
The attached log is from the failure scenario.  What piece of code is responsible for the uninstall pod that is having the problem? 
```
cahl-vmware-uninstall-ltpnh           0/1     CrashLoopBackOff   44         3h25m
```

Comment 5 cahl 2020-09-03 12:46:48 UTC
I retried this and wanted to add more info since I noticed more stuff in the namespace. 

```
oc get all                                                     
NAME                                          READY   STATUS             RESTARTS   AGE
pod/cahl-vmware-bad-0-b2zrw-provision-hkf4b   0/3     Completed          0          35h
pod/cahl-vmware-bad-8-kspph-provision-fq8x8   0/3     Completed          0          31h
pod/cahl-vmware-bad-9-qq8d8-provision-kf4rk   0/3     Completed          0          26h
pod/cahl-vmware-bad-uninstall-hz5nb           0/1     CrashLoopBackOff   191        15h

NAME                                          COMPLETIONS   DURATION   AGE
job.batch/cahl-vmware-bad-0-b2zrw-provision   0/1           35h        35h
job.batch/cahl-vmware-bad-8-kspph-provision   0/1           31h        31h
job.batch/cahl-vmware-bad-9-qq8d8-provision   0/1           26h        26h
job.batch/cahl-vmware-bad-uninstall           0/1           15h        15h


oc logs pod/cahl-vmware-bad-uninstall-hz5nb 
'/vsphere/./..data' -> '/etc/pki/ca-trust/source/anchors/./..data'
'/vsphere/./.cacert' -> '/etc/pki/ca-trust/source/anchors/./.cacert'
'/vsphere/./..2020_09_02_20_44_31.130428863' -> '/etc/pki/ca-trust/source/anchors/./..2020_09_02_20_44_31.130428863'
'/vsphere/./..2020_09_02_20_44_31.130428863/.cacert' -> '/etc/pki/ca-trust/source/anchors/./..2020_09_02_20_44_31.130428863/.cacert'
time="2020-09-03T12:40:37Z" level=debug msg="find attached objects on tag"
time="2020-09-03T12:40:37Z" level=fatal msg="Runtime error" error="list attached objects [cahl-vmware-bad-xk42s]: GET https://cicd-vcsa-01.cicd.red-chesterfield.com/rest/com/vmware/cis/tagging/tag/id:cahl-vmware-bad-xk42s: 404 Not Found"


oc logs job.batch/cahl-vmware-bad-uninstall 
'/vsphere/./..data' -> '/etc/pki/ca-trust/source/anchors/./..data'
'/vsphere/./.cacert' -> '/etc/pki/ca-trust/source/anchors/./.cacert'
'/vsphere/./..2020_09_02_20_44_31.130428863' -> '/etc/pki/ca-trust/source/anchors/./..2020_09_02_20_44_31.130428863'
'/vsphere/./..2020_09_02_20_44_31.130428863/.cacert' -> '/etc/pki/ca-trust/source/anchors/./..2020_09_02_20_44_31.130428863/.cacert'
time="2020-09-03T12:40:37Z" level=debug msg="find attached objects on tag"
time="2020-09-03T12:40:37Z" level=fatal msg="Runtime error" error="list attached objects [cahl-vmware-bad-xk42s]: GET https://cicd-vcsa-01.cicd.red-chesterfield.com/rest/com/vmware/cis/tagging/tag/id:cahl-vmware-bad-xk42s: 404 Not Found"
```  




A note in Slack from Andrew Butcher concerning the message in the logs:
```
Which would be a failure here https://github.com/openshift/installer/blob/master/pkg/destroy/vsphere/vsphere.go#L125 ?

Maybe we need to update our installer vendor?
```
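The fatal error in the logs above comes from looking up a tag that no longer exists. One way a destroy routine could tolerate that is to treat a "not found" answer as "already cleaned up" so that a retry can finish. The following is a minimal Go sketch of that idea only, not the installer's actual code: `tagClient`, `errNotFound`, `deleteTagIfPresent`, and `fakeClient` are hypothetical names, and the fake client stands in for the vSphere tagging API.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel standing in for a
// "404 Not Found" answer from the tagging endpoint.
var errNotFound = errors.New("404 Not Found")

// tagClient is a hypothetical, minimal view of a tagging API.
type tagClient interface {
	ListAttachedObjects(tagID string) ([]string, error)
	DeleteTag(tagID string) error
}

// deleteTagIfPresent removes a tag, treating a missing tag as
// already cleaned up instead of as a fatal error.
func deleteTagIfPresent(c tagClient, tagID string) error {
	objs, err := c.ListAttachedObjects(tagID)
	if errors.Is(err, errNotFound) {
		return nil // nothing left to clean up
	}
	if err != nil {
		return fmt.Errorf("list attached objects [%s]: %w", tagID, err)
	}
	// Simplification for this sketch: refuse to delete a tag that is
	// still attached to something.
	if len(objs) > 0 {
		return fmt.Errorf("tag %s still has %d attached objects", tagID, len(objs))
	}
	return c.DeleteTag(tagID)
}

// fakeClient is an in-memory stand-in used only to exercise the sketch.
type fakeClient struct {
	tags map[string][]string // tag ID -> attached object IDs
}

func (f *fakeClient) ListAttachedObjects(tagID string) ([]string, error) {
	objs, ok := f.tags[tagID]
	if !ok {
		return nil, fmt.Errorf("GET tag %s: %w", tagID, errNotFound)
	}
	return objs, nil
}

func (f *fakeClient) DeleteTag(tagID string) error {
	if _, ok := f.tags[tagID]; !ok {
		return fmt.Errorf("DELETE tag %s: %w", tagID, errNotFound)
	}
	delete(f.tags, tagID)
	return nil
}

func main() {
	c := &fakeClient{tags: map[string][]string{"present": {}}}
	// The tag from the log was never created, so the lookup reports not found.
	fmt.Println("missing tag:", deleteTagIfPresent(c, "cahl-vmware-bad-xk42s"))
	fmt.Println("existing tag:", deleteTagIfPresent(c, "present"))
}
```

With this shape, a destroy run against a cluster whose tag was never created returns cleanly rather than crash-looping. Whether the real fix took this form is not stated in this report.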

Comment 7 Devan Goodwin 2020-09-16 12:26:55 UTC
As a workaround, you can remove the hive.openshift.io deprovision finalizer on the clusterdeployment. In this case that should be safe, as you know nothing could have been created that needs cleanup.

I don't know how we're going to solve this one. We've seen a similar bug floating around with a bad certificate CA that exhibits the same problem. Hive does not currently talk directly to cloud providers very often; we typically leave that to the installer.

Abhinav, what do you think about an installer connectivity check for each cloud prior to generating the infraID? If this errored before the infraID was written, we would be in the clear.

Comment 9 cahl 2020-10-14 12:46:55 UTC
We have noted the same inability to clean up for VMware when the VMware credentials provided are incorrect. The ClusterDeprovision seems stuck.

```
oc logs job.batch/cahl-vmware-bad-uninstall -n cahl-vmware-bad        
'/vsphere/./..data' -> '/etc/pki/ca-trust/source/anchors/./..data'
'/vsphere/./.cacert' -> '/etc/pki/ca-trust/source/anchors/./.cacert'
'/vsphere/./..2020_10_14_12_32_46.770064044' -> '/etc/pki/ca-trust/source/anchors/./..2020_10_14_12_32_46.770064044'
'/vsphere/./..2020_10_14_12_32_46.770064044/.cacert' -> '/etc/pki/ca-trust/source/anchors/./..2020_10_14_12_32_46.770064044/.cacert'
time="2020-10-14T12:44:09Z" level=info msg="running file observer" files="[/vsphere-creds/..2020_10_14_12_32_46.887902698/password /vsphere-creds/..2020_10_14_12_32_46.887902698/username]"
I1014 12:44:09.787772      15 observer_polling.go:159] Starting file observer
time="2020-10-14T12:44:15Z" level=fatal msg="Runtime error" error="ServerFaultCode: Cannot complete login due to an incorrect user name or password."
```

Comment 13 Jeana Routh 2021-02-11 15:04:07 UTC
No doc update per Greg

Comment 16 errata-xmlrpc 2021-02-24 15:16:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

Comment 17 Red Hat Bugzilla 2023-09-15 00:47:17 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.